APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT
If I multiply two 64-bit integers (having values in [0, 2^64)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes …

REUSING A THREAD IN C++ FOR BETTER PERFORMANCE
In a previous post, I measured the time necessary to start a thread, execute a small job and return: auto mythread = std::thread([] { counter++; }); mythread.join(); The answer is thousands of nanoseconds. Importantly, that is the time as measured by the main thread. That is, sending the query, and getting back the result, takes …

ADDING A (PREDICTABLE) BRANCH TO EXISTING CODE CAN …
Software is full of “branches”. They often take the form of if-then clauses in code. Modern processors try to predict the result of branches, often long before evaluating them. Hard-to-predict branches are a performance challenge because when a processor fails to correctly predict a branch, it does useless work that must be thrown away. A …

GET INTO PAY SITES FOR FREE AS A GOOGLEBOT
This trick is very clever: many sites that limit access to documents let Googlebot (Google’s spidering agent) through. I think this is the case with some IEEE archives. So, you can simply tell your browser to identify itself as the user agent “Googlebot/2.1”. Voilà! You can go where Google can go. What a beautiful hack! With …

EVEN FASTER BITMAP DECODING
Bitmaps are a simple data structure used to represent sets of integers. For example, you can represent all sets of integers in [0,64) using a single 64-bit integer. When they are applicable, bitmaps are very efficient compared to the alternatives (e.g., a hash set). Unfortunately, extracting the set bits from a bitmap can be expensive.

QUICKLY PRUNING ELEMENTS IN SIMD VECTORS USING THE …
Modern processors have powerful vector instructions. However, some algorithms are tricky to implement using vector instructions. I often need to prune selected values from a vector. On x64 processors, we can achieve this result using table lookups and an efficient shuffle instruction. Building up the table each time gets tiring, however. Let us consider two …

ON HUMAN INTELLIGENCE… A PERSPECTIVE FROM COMPUTER SCIENCE
Whenever I read social scientists, there is often, implicit in the background, the concept of “intelligence” as a well-defined quantity. I have some amount of intelligence. Maybe you have a bit more. But what does computer science have to say about any of this? That is, what is this “intelligence” we are talking about?

PER-CORE FREQUENCY SCALING AND AVX-512: AN EXPERIMENT
Intel has fancy new instructions (AVX-512) that are powerful, especially for heavy numerical work. When a core uses the heaviest of these new instructions, the core’s frequency comes down to keep power usage within bounds. I wanted to test it out, so I wrote a little threaded program. It runs on four threads …

SOME USEFUL REGULAR EXPRESSIONS FOR …
In my blog post, My programming setup, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here are a few. It is commonly considered a faux pas to include …

COUNTING CYCLES AND INSTRUCTIONS ON THE APPLE M1 PROCESSOR
When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a few pages of memory, then the time elapsed is often not ideal …
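
To make the full 128-bit product from the M1 entry above concrete, here is a minimal C++ sketch. It assumes a GCC or Clang compiler, because it relies on the `unsigned __int128` extension (not available in MSVC). On x64, compilers typically emit a single `mul` for this; on 64-bit ARM, they typically emit `mul` for the low half and `umulh` for the high half. The `Product128` struct and `full_multiply` function are hypothetical names for illustration.

```cpp
#include <cstdint>

// Hypothetical result type: the high and low 64-bit halves of a 128-bit product.
struct Product128 {
  std::uint64_t high;
  std::uint64_t low;
};

// Full 64x64 -> 128-bit multiplication using the GCC/Clang
// unsigned __int128 extension (an assumption about the compiler).
Product128 full_multiply(std::uint64_t a, std::uint64_t b) {
  // Widen one operand first so the multiplication is done in 128 bits.
  unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
  return Product128{static_cast<std::uint64_t>(product >> 64),  // upper 64 bits
                    static_cast<std::uint64_t>(product)};       // lower 64 bits
}
```

For example, `full_multiply(UINT64_MAX, UINT64_MAX)` yields a high half of 0xFFFFFFFFFFFFFFFE and a low half of 1, since (2^64 − 1)^2 = 2^128 − 2^65 + 1.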

SCIENCE IS THE BELIEF IN THE IGNORANCE OF EXPERTS
“Science is the belief in the ignorance of experts,” said Richard Feynman. Feynman won the Nobel Prize in Physics. He was a remarkable educator: his lecture notes are still popular. He foresaw nanotechnology and quantum computing. He is credited with identifying the cause of the Space Shuttle Challenger disaster. There is a beautiful talk by …

COMPUTATIONAL OVERHEAD DUE TO DOCKER UNDER MACOS
It may come to dominate the running time. If you want to squeeze every ounce of computational performance out of your machine, it is likely that you should avoid the Docker overhead under macOS. A 3% overhead may prove to be unacceptable. However, for developing and benchmarking your code, it may well be an acceptable trade-off.

FAST FLOAT PARSING IN PRACTICE
In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problems is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is …

SETTING UP A ROCKPRO64 (POWERFUL SINGLE-CARD COMPUTER)
A few months ago, I ordered a ROCKPro64. If you are familiar with the Raspberry Pi, then it is a bit of the same thing: an inexpensive computer that comes in the form of a single card. The ROCKPro64 differs from the Raspberry Pi in that it is much closer in power to a normal PC. You …
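
As a point of reference for the fast float parsing entry above, here is a minimal C++ sketch of the `strtod` baseline that entry mentions, with the end-pointer and `errno` checks a caller usually needs. It is only the standard-library baseline, not a faster alternative, and `parse_double` is a hypothetical name.

```cpp
#include <cerrno>
#include <cstdlib>

// Baseline parser built on the standard strtod function.
// Returns true and stores the result in `out` only if the whole
// string is a number that strtod does not flag as out of range.
bool parse_double(const char* text, double& out) {
  char* end = nullptr;
  errno = 0;                                // strtod reports range errors via errno
  double value = std::strtod(text, &end);
  if (end == text) return false;            // no number was parsed
  if (*end != '\0') return false;           // trailing characters after the number
  if (errno == ERANGE) return false;        // overflow or underflow
  out = value;
  return true;
}
```

Keep in mind that `strtod` uses the current C locale for the decimal separator and also accepts leading whitespace, hexadecimal floats, and “inf”/“nan”, which a strict JSON parser would have to reject separately.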

REVISITING VERNOR VINGE’S “PREDICTIONS” FOR 2025
Vernor Vinge is a retired mathematics professor who became famous through his science-fiction novels. He is also known as one of the first to contemplate the idea of a “technological singularity”. There is debate as to what the technological singularity is, but the general idea goes as follows. At some point in the near future …

THE AVERAGE OF AVERAGES IS NOT THE AVERAGE
A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the …
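
A small numeric sketch of the point in the average-of-averages entry above, using made-up data: two groups, {1, 2, 3} and {10}. Averaging their averages gives (2 + 10) / 2 = 6, while the true average of all four values is 16 / 4 = 4. Weighting each group’s average by its size recovers the true average. The function names are hypothetical, and every group is assumed to be non-empty.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty group.
double mean(const std::vector<double>& values) {
  return std::accumulate(values.begin(), values.end(), 0.0) /
         static_cast<double>(values.size());
}

// Naive approach: average the per-group averages, ignoring group sizes.
double average_of_averages(const std::vector<std::vector<double>>& groups) {
  double sum = 0.0;
  for (const auto& group : groups) sum += mean(group);
  return sum / static_cast<double>(groups.size());
}

// Correct approach: weight each group's average by the number of values in it.
double weighted_average_of_averages(const std::vector<std::vector<double>>& groups) {
  double weighted_sum = 0.0;
  std::size_t total_count = 0;
  for (const auto& group : groups) {
    weighted_sum += mean(group) * static_cast<double>(group.size());
    total_count += group.size();
  }
  return weighted_sum / static_cast<double>(total_count);
}
```

With the groups {1, 2, 3} and {10}, `average_of_averages` returns 6 and `weighted_average_of_averages` returns 4; the two agree only when all groups have the same size.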

THE DAY I SUBSCRIBED TO A DOZEN PORN SITES…
This morning, I noticed some odd charges on my VISA card. They were attributed to sites such as videosupport1.com, bngvsupport.com, paysupport1.com, bdpayhelp.com. I called up my bank. They gave me the phone number of the company behind these pay sites and told me to ask what the charges were for. I called the company behind …

ON THE MEMORY USAGE OF MAPS IN JAVA
Though we have plenty of memory in our computers, there are still cases where you want to minimize memory usage, if only to avoid expensive cache misses. To compare the memory usage of various standard map data structures, I wrote a small program where I create a map from the value k to the value …

MEMORY ACCESS ON THE APPLE M1 PROCESSOR
When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “nearby” memory accesses as free. That is, if I …
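
A toy version of the cost model in the memory access entry above, under loudly stated assumptions: each distinct cache line touched counts as one access, and any access that falls in an already-touched line counts as free. The line size is a parameter (64 bytes is common on x64 processors), and `count_distinct_lines` is a hypothetical name. This only counts lines; it does not measure real latency.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Toy cost model: the cost of a sequence of memory accesses is the
// number of distinct cache lines it touches. Accesses to a line that
// was already touched are treated as free.
std::size_t count_distinct_lines(const std::vector<std::uintptr_t>& addresses,
                                 std::uintptr_t line_size) {
  std::unordered_set<std::uintptr_t> lines;
  for (std::uintptr_t address : addresses) {
    lines.insert(address / line_size);  // index of the line containing this address
  }
  return lines.size();
}
```

For the addresses {0, 8, 63, 64, 200}, this model charges 3 accesses with 64-byte lines and 2 with 128-byte lines, which shows how sensitive the model is to the assumed line size.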
Let us consider two Continue reading Quickly pruning PER-CORE FREQUENCY SCALING AND AVX-512: AN EXPERIMENT Intel has fancy new instructions (AVX-512) that are powerful, in part for heavy numerical work. When a core uses these heaviest of these new instructions, the core’s frequency comes down to maintain the power usage within bounds. I wanted to test it out so I wrote a little threaded program. It runs on four threads Continue reading Per-core frequency scaling and AVX-512: an experiment ON HUMAN INTELLIGENCE… A PERSPECTIVE FROM COMPUTER SCIENCE Whenever I read social scientists, there is often, implicit in the background, the concept of “intelligence” as a well defined quantity. I have some amount of intelligence. Maybe you have a bit more. But what does computer science have to say about any of this? That is, what is this “intelligence” we are talking about? Continue reading On human intelligence a perspective from JUNE 2021 – DANIEL LEMIRE'S BLOG I my previous blog post, I documented how one might proceed to compute the number of digits of an integer quickly. E.g., given the integer 999, you want 3 but given the integer 1000, you want 4. It is effectively the integer logarithm in base 10. On computers, you can quickly compute the integer logarithm Continue reading Computing the number of digits of an integer even faster MEMORY ACCESS ON THE APPLE M1 PROCESSOR When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “nearby” memory accesses as free. That is, if I Continue reading Memory access on the Apple M1 processor APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT If I multiply two 64-bit integers (having values in [0, 264)), the product requires 128 bits. 
Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes Continue reading Apple’s M1 processor and the FAST FLOAT PARSING IN PRACTICE In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problem is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is Continue reading Fast float parsing in practice EVEN FASTER BITMAP DECODING Bitmaps are a simple data structure used to represent sets of integers. For example, you can represent all sets of integers in [0,64) using a single 64-bit integer. When they are applicable, bitmaps are very efficient compared to the alternatives (e.g., a hash set). Unfortunately, extracting the bit sets in a bitmap can be expensive. Continue reading Even faster bitmap decoding SETTING UP A ROCKPRO64 (POWERFUL SINGLE-CARD COMPUTER A few months ago, I ordered ROCKPro64. If you are familiar with the Raspberry Pi, then it is a bit of the same an inexpensive computer that comes in the form of a single card. The ROCKPro64 differs from the Raspberry Pi in that it is much closer in power to a normal PC. You Continue reading Setting up a ROCKPro64 (powerful single-cardcomputer)
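The excerpt on reusing a thread in C++ measures the cost of creating a thread per job but stops before showing the alternative. One common approach, sketched here under our own assumptions (the `Worker` class and its `run` method are hypothetical names, not the post's code), is to start one std::thread once and hand it jobs through a mutex and a condition variable, so each job pays for a wake-up rather than for thread creation.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: one long-lived thread that runs submitted jobs one at a time.
class Worker {
public:
  Worker() : thread_([this] { loop(); }) {}

  ~Worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Hand a job to the worker thread and block until it has finished.
  // Jobs must not call run() on the same Worker (they execute under the lock).
  void run(std::function<void()> job) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !has_job_; }); // wait if another caller's job is pending
    job_ = std::move(job);
    has_job_ = true;
    cv_.notify_all();                             // wake the worker
    cv_.wait(lock, [this] { return !has_job_; }); // wait until the worker has run it
  }

private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return has_job_ || stop_; });
      if (has_job_) {
        job_();           // runs while holding the lock; acceptable for tiny jobs
        has_job_ = false;
        cv_.notify_all(); // wake the caller waiting in run()
      } else {
        return;           // stop_ was set and no job is pending
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool stop_ = false;
  std::thread thread_; // declared last so every other member is ready before loop() starts
};
```

Usage: construct one `Worker`, then call `w.run([&] { counter++; });` as often as needed; only the first construction pays the thread start-up cost.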
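The bitmap-decoding excerpt ends before any code. As a point of reference, here is a minimal sketch of the usual baseline (not necessarily the faster technique the post goes on to describe): repeatedly find the lowest set bit with a count-trailing-zeros instruction, record its position, and clear it. It assumes GCC or Clang for `__builtin_ctzll`; `decode_bitmap` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Append the positions of all set bits in `word` (each offset by `base`) to `out`.
// Assumes GCC/Clang: __builtin_ctzll is undefined for 0, so it is only called while word != 0.
void decode_bitmap(uint64_t word, uint32_t base, std::vector<uint32_t> &out) {
  while (word != 0) {
    uint32_t index = static_cast<uint32_t>(__builtin_ctzll(word)); // position of lowest set bit
    out.push_back(base + index);
    word &= word - 1; // clear the lowest set bit
  }
}
```

For an array of 64-bit words, call it with `base = 64 * i` for the i-th word. The loop runs once per set bit, so its cost depends on how many bits are set rather than on the word size.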
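For the 128-bit product excerpt: in C or C++ one usually asks the compiler for the full product rather than writing mul by hand. The sketch below assumes a 64-bit GCC or Clang target, where the non-standard `unsigned __int128` type is available. Compilers typically lower it to a single mul on x64 and to a mul/umulh pair on 64-bit ARM, but that depends on the compiler and flags. `Product128` and `full_multiply` are illustrative names.

```cpp
#include <cstdint>

// Hypothetical result type holding the high and low 64-bit halves of a 128-bit value.
struct Product128 {
  uint64_t high;
  uint64_t low;
};

// Full 128-bit product of two 64-bit integers.
// Assumes GCC/Clang on a 64-bit target (unsigned __int128 is a compiler extension).
Product128 full_multiply(uint64_t a, uint64_t b) {
  unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
  return {static_cast<uint64_t>(p >> 64), static_cast<uint64_t>(p)};
}
```

As a sanity check, (2^64 − 1)^2 = 2^128 − 2^65 + 1, so its high half is 2^64 − 2 and its low half is 1.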
FASTEST WAY TO COMPUTE THE GREATEST COMMON DIVISOR
Given two positive integers x and y, the greatest common divisor (GCD) z is the largest number that divides both x and y. For example, given 64 and 32, the greatest common divisor is 32. There is a fast technique to compute the GCD called the binary GCD algorithm or Stein’s algorithm. According to Wikipedia, …

REVISITING VERNOR VINGE’S “PREDICTIONS” FOR 2025
Vernor Vinge is a retired mathematics professor who became famous through his science-fiction novels. He is also famous as one of the first to contemplate the idea of a “technological singularity”. There is debate as to what the technological singularity is, but the general idea goes as follows. At some point in the near future …

MEMORY-LEVEL PARALLELISM: INTEL SKYLAKE VERSUS INTEL
All programmers know about multicore parallelism: your CPU is made of several nearly independent processors (called cores) that can run instructions in parallel. However, our processors are parallel in many different ways. I am interested in a particular form of parallelism called “memory-level parallelism” where the same processor can issue several memory requests.

THE AVERAGE OF AVERAGES IS NOT THE AVERAGE
A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the …
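The GCD excerpt names the binary GCD (Stein’s) algorithm without showing it. Here is a compact sketch under our own choices: it uses the GCC/Clang builtin `__builtin_ctzll` to strip all factors of two in one step, and `binary_gcd` is a hypothetical name. The idea is that gcd(x, y) is 2^k times the GCD of the odd parts, and for two odd numbers the larger can be replaced by their (even) difference, whose factors of two are then stripped again.

```cpp
#include <cstdint>
#include <utility>

// Binary GCD (Stein's algorithm) for 64-bit unsigned integers.
// Assumes GCC/Clang for __builtin_ctzll, which must never be called with 0.
uint64_t binary_gcd(uint64_t x, uint64_t y) {
  if (x == 0) return y;
  if (y == 0) return x;
  int shift = __builtin_ctzll(x | y); // 2^shift is the largest power of two dividing both
  x >>= __builtin_ctzll(x);           // make x odd
  do {
    y >>= __builtin_ctzll(y);         // make y odd (y is nonzero here)
    if (x > y) std::swap(x, y);       // keep x <= y
    y -= x;                           // difference of two odd numbers is even
  } while (y != 0);
  return x << shift;
}
```

With the excerpt’s example, `binary_gcd(64, 32)` strips the common factor 2^5, reduces both odd parts to 1, and returns 32.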
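The average-of-averages pitfall is easy to check with made-up numbers. Take the groups {1, 2, 3} and {10}: the mean of the two group means is (2 + 10) / 2 = 6, while the mean over all four values is 16 / 4 = 4. Weighting each group mean by its size recovers the true average. The function names below are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Mean of one group of values (assumes the group is non-empty).
double mean(const std::vector<double> &values) {
  return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}

// Naive approach: average the per-group averages, ignoring group sizes.
double average_of_averages(const std::vector<std::vector<double>> &groups) {
  double sum = 0.0;
  for (const auto &g : groups) sum += mean(g);
  return sum / groups.size();
}

// Correct approach: weight each group average by the number of values it covers.
double weighted_average(const std::vector<std::vector<double>> &groups) {
  double weighted_sum = 0.0;
  std::size_t count = 0;
  for (const auto &g : groups) {
    weighted_sum += mean(g) * g.size();
    count += g.size();
  }
  return weighted_sum / count;
}
```

When every group has the same number of values, the two functions agree; otherwise only the weighted version gives the overall average.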
DANIEL LEMIRE'S BLOG
Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist.

INSTRUCTIONS PER CYCLE: AMD ZEN 2 VERSUS INTEL
The performance of a processor is determined by several factors. For example, processors with a higher frequency tend to do more work per unit of time. Physics makes it difficult to produce processors that have a higher frequency. Modern processors can execute many instructions per cycle. Thus a 3.4GHz processor has 3.4 billion cycles per second, …
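The instructions-per-cycle excerpt breaks off mid-calculation. The underlying arithmetic is that throughput is frequency times IPC; the IPC of 2 below is purely an illustrative assumption, not a measured figure for Zen 2 or for any Intel processor.

```latex
\[
\text{instructions per second} = \text{frequency} \times \text{IPC},
\qquad
3.4 \times 10^{9}\,\tfrac{\text{cycles}}{\text{s}} \times 2\,\tfrac{\text{instructions}}{\text{cycle}}
= 6.8 \times 10^{9}\,\tfrac{\text{instructions}}{\text{s}}.
\]
```

So two processors at the same clock speed can differ in throughput in proportion to their IPC on a given workload.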
HOW FAST CAN YOU ALLOCATE A LARGE BLOCK OF MEMORY IN C++?
In C++, the most basic memory allocation code is just a call to the new operator:

    char *buf = new char[s];

According to a textbook interpretation, we just allocated s bytes. If you benchmark this line of code, you might find that it is almost entirely free on a per-byte basis for large values of s. But …
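To observe the effect the allocation excerpt describes, one can time the `new` call separately from the first write to every page. This is an assumption-laden micro-benchmark sketch: it assumes 4096-byte pages, and `AllocTimings` and `time_allocation` are hypothetical names. On systems that map memory lazily, the allocation itself is often cheap and most of the cost appears when pages are first touched, but results vary with the OS and the allocator.

```cpp
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical result: nanoseconds spent allocating, then touching, the block.
struct AllocTimings {
  long long allocate_ns;
  long long touch_ns;
};

AllocTimings time_allocation(std::size_t s) {
  using clock = std::chrono::steady_clock;
  const std::size_t page_size = 4096; // assumption: 4 KiB pages

  auto t0 = clock::now();
  std::unique_ptr<char[]> buf(new char[s]); // the allocation the excerpt discusses
  auto t1 = clock::now();
  volatile char *p = buf.get(); // volatile so the compiler cannot drop the writes
  for (std::size_t i = 0; i < s; i += page_size) {
    p[i] = 1; // the first write to each page typically makes the OS back it with memory
  }
  auto t2 = clock::now();

  auto ns = [](clock::duration d) {
    return static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
  };
  return {ns(t1 - t0), ns(t2 - t1)};
}
```

Calling it for a few sizes (say 1 MiB, 64 MiB, 1 GiB) and dividing each timing by `s` gives the per-byte costs to compare.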
SCIENCE IS THE BELIEF IN THE IGNORANCE OF EXPERTS
“Science is the belief in the ignorance of experts,” said Richard Feynman. Feynman won a Nobel Prize in physics. He was a remarkable educator: his lecture notes are still popular. He foresaw nanotechnology and quantum computing. He is credited with identifying the cause of the Space Shuttle Challenger disaster. There is a beautiful talk by …
COUNTING THE NUMBER OF MATCHING CHARACTERS IN TWO ASCII
Suppose that you give me two ASCII strings having the same number of characters. I wish to compute efficiently the number of matching characters (same position, same character). E.g., the strings ‘012c’ and ‘021c’ have two matching characters (‘0’ and ‘c’). The conventional approach in C would look as follows: uint64_t standard_matching_bytes(char * c1, char …
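The matching-characters excerpt stops at the signature of the conventional loop, so its body is not reproduced here. As an illustration only (not the post’s code), here is a byte-by-byte baseline next to a SWAR variant that compares eight bytes per step: XOR two 64-bit words so equal bytes become zero, then count the zero bytes with a standard bit trick. The function names are hypothetical, and both strings are assumed to have length n.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Baseline: compare one character at a time.
uint64_t matching_bytes_scalar(const char *a, const char *b, std::size_t n) {
  uint64_t count = 0;
  for (std::size_t i = 0; i < n; i++) {
    count += (a[i] == b[i]);
  }
  return count;
}

// Exact number of zero bytes in a 64-bit word.
static uint64_t count_zero_bytes(uint64_t z) {
  const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
  uint64_t t = (z & low7) + low7; // high bit of a byte is set iff its low 7 bits are nonzero
  t = ~(t | z | low7);            // now the high bit is set iff the whole byte is zero
  // Move each high bit to bit 0 of its byte, then sum the eight bytes with one multiply.
  return ((t >> 7) * 0x0101010101010101ULL) >> 56;
}

// SWAR variant: compare eight characters per iteration.
uint64_t matching_bytes_swar(const char *a, const char *b, std::size_t n) {
  uint64_t count = 0;
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    uint64_t wa, wb;
    std::memcpy(&wa, a + i, 8); // memcpy avoids alignment and aliasing problems
    std::memcpy(&wb, b + i, 8);
    count += count_zero_bytes(wa ^ wb); // equal bytes XOR to zero
  }
  for (; i < n; i++) { // leftover tail, fewer than 8 bytes
    count += (a[i] == b[i]);
  }
  return count;
}
```

The per-byte additions inside `count_zero_bytes` never exceed 0xFE, so no carry crosses a byte boundary and the count has no false positives.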
AVX-512: WHEN AND HOW TO USE THESE NEW INSTRUCTIONS
Our processors typically do computations using small data stores called registers. On 64-bit processors, 64-bit registers are frequently used. Most modern processors also have vector instructions and these instructions operate on larger registers (128-bit, 256-bit, or even 512-bit). Intel’s new processors have AVX-512 instructions. These instructions are capable of operating on large 512-bit …

BIG-O NOTATION AND REAL-WORLD PERFORMANCE
Classical Newtonian mechanics is always mathematically consistent. However, Newtonian mechanics assumes that bodies move without friction and that we stay far from the speed of light. When your car is stuck in the mud or you are running an intergalactic spaceship, frictionless Newtonian mechanics is the wrong model even though it remains mathematically consistent.

“CRACKING” RANDOM NUMBER GENERATORS (XOROSHIRO128
In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generator. O’Neill, a professor at Harvey Mudd …

DO NOT WASTE TIME WITH STL VECTORS
I spend a lot of time with the C++ Standard Template Library. It is available on diverse platforms, it is fast and it is (relatively) easy to learn. It has been perhaps too conservative at times: we only recently got a standard hash table data structure (with C++11). However, using STL is orders of magnitude …

WAS LIFE BETTER IN THE 1970S?
People from my generation often complain that their parents were better off. They are often quick to dismiss the Internet and smartphones as irrelevant to their well-being. Were they better off? Though it has recently peaked, the number of cars per person is higher than it was in the seventies. Current cars are much …
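The random-number excerpt turns on the generator’s hidden state. For concreteness, here is xoroshiro128+ written from its published reference algorithm as we recall it: the rotate/shift constants 24, 16 and 37 come from its 2018 revision (the original release used 55, 14 and 36), so treat the constants as an assumption to double-check. The demonstration point is that two generators holding the same 128-bit state produce identical output forever, which is why recovering the state lets you predict every future number.

```cpp
#include <cstdint>

// xoroshiro128+ generator: its entire hidden state is two 64-bit words.
struct Xoroshiro128Plus {
  uint64_t s[2];

  static uint64_t rotl(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

  uint64_t next() {
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const uint64_t result = s0 + s1; // the output is a simple function of the state
    s1 ^= s0;
    s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16); // constants a = 24, b = 16 (assumed 2018 revision)
    s[1] = rotl(s1, 37);                   // constant c = 37
    return result;
  }
};
```

Copying a `Xoroshiro128Plus` object copies its state, so the copy reproduces every later output of the original; a real attack must first infer that state from observed outputs.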
INSTRUCTIONS PER CYCLE: AMD ZEN 2 VERSUS INTEL The performance of a processor is determined by several factors. For example, processors with a higher frequency tend to do more work per unit of time. Physics makes it difficult to produce processors that have higher frequency. Modern processors can execute many instructions per cycle. Thus a 3.4GHz processor has 3.4 billion cycles per second, Continue reading Instructions per cycle: APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT If I multiply two 64-bit integers (having values in [0, 264)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes Continue reading Apple’s M1 processor and the HOW FAST CAN YOU ALLOCATE A LARGE BLOCK OF MEMORY IN C++ In C++, the most basic memory allocation code is just a call to the new operator: char *buf = new char; According to a textbook interpretation, we just allocated s bytes1. If you benchmark this line of code, you might find that it almost entirely free on a per-byte basis for large values of s. But Continue reading How fast can you allocate a large block of memory in C++? GET INTO PAY SITES FOR FREE AS A GOOGLEBOT This trick is very clever: many sites limiting access to documents, let Googlebot (Google’s spidering agent) through. I think this is the case with some IEEE archives. So, you can simply tell your browser to identify yourself as the user agent “Googlebot/2.1”. Voilà ! You can go where Google can go. What a beautiful hack! With Continue reading Get into pay sites for free as a Googlebot THE AVERAGE OF AVERAGES IS NOT THE AVERAGE A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. 
This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the Continue reading The average of averages is notthe average
SOME USEFUL REGULAR EXPRESSIONS FOR In my blog post, My programming setup, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here a few. It is commonly considered a faux pas to include Continue reading Some useful regular DANIEL LEMIRE'S BLOG Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist. MEMORY ACCESS ON THE APPLE M1 PROCESSOR When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “nearby” memory accesses as free. That is, if I Continue reading Memory access on the Apple M1 processor FAST FLOAT PARSING IN PRACTICE In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problem is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is Continue reading Fast float parsing in practice INSTRUCTIONS PER CYCLE: AMD ZEN 2 VERSUS INTEL The performance of a processor is determined by several factors. For example, processors with a higher frequency tend to do more work per unit of time. Physics makes it difficult to produce processors that have higher frequency. Modern processors can execute many instructions per cycle. 
Thus a 3.4GHz processor has 3.4 billion cycles per second, Continue reading Instructions per cycle: COUNTING CYCLES AND INSTRUCTIONS ON THE APPLE M1 PROCESSOR Counting cycles and instructions on the Apple M1 processor. When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a fewpages of
APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT If I multiply two 64-bit integers (having values in [0, 264)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes Continue reading Apple’s M1 processor and the HOW FAST CAN YOU ALLOCATE A LARGE BLOCK OF MEMORY IN C++ In C++, the most basic memory allocation code is just a call to the new operator: char *buf = new char; According to a textbook interpretation, we just allocated s bytes1. If you benchmark this line of code, you might find that it almost entirely free on a per-byte basis for large values of s. But Continue reading How fast can you allocate a large block of memory in C++? GET INTO PAY SITES FOR FREE AS A GOOGLEBOT This trick is very clever: many sites limiting access to documents, let Googlebot (Google’s spidering agent) through. I think this is the case with some IEEE archives. So, you can simply tell your browser to identify yourself as the user agent “Googlebot/2.1”. Voilà ! You can go where Google can go. What a beautiful hack! With Continue reading Get into pay sites for free as a Googlebot THE AVERAGE OF AVERAGES IS NOT THE AVERAGE A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the Continue reading The average of averages is notthe average
DANIEL LEMIRE'S BLOG Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist. SCIENCE IS THE BELIEF IN THE IGNORANCE OF EXPERTS Science is the belief in the ignorance of experts said Richard Feynman. Feynman had a Nobel prize in physics. He was a remarquable educator: his lecture notes are still popular. He foresaw nanotechnology and quantum computing. He is credited with identifying the cause of the Space Shuttle Challenger disaster. There is abeautiful talk by
APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT If I multiply two 64-bit integers (having values in [0, 264)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes Continue reading Apple’s M1 processor and the COUNTING THE NUMBER OF MATCHING CHARACTERS IN TWO ASCII Suppose that you give me two ASCII strings having the same number of characters. I wish to compute efficiently the number of matching characters (same position, same character). E.g., the strings ‘012c’ and ‘021c’ have two matching characters (‘0’ and ‘c’). The conventional approach in C would look as follow: uint64_t standard_matching_bytes(char * c1, char AVX-512: WHEN AND HOW TO USE THESE NEW INSTRUCTIONS Our processors typically do computations using small data stores called registers. On 64-bit processors, 64-bit registers are frequently used. Most modern processors also have vector instructions and these instructions operate on larger registers (128-bit, 256-bit, or even 512-bit). Intel’s new processors have AVX-512 instructions. These instructions are capable of operating on large 512-bit THE AVERAGE OF AVERAGES IS NOT THE AVERAGE A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the Continue reading The average of averages is notthe average
BIG-O NOTATION AND REAL-WORLD PERFORMANCE Classical Newtonian mechanics is always mathematically consistent. However, Newtonian mechanics assumes that bodies move without friction and that we stay far from the speed of light. When your car is stuck in the mud or you are running an intergalactic spaceship, frictionless Newtonian mechanics is the wrong model even though it remains mathematically consistent. “CRACKING” RANDOM NUMBER GENERATORS (XOROSHIRO128 In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generators. O’Neill, a professor at Harvey Mudd Continue reading “Cracking” random DO NOT WASTE TIME WITH STL VECTORS I spend a lot of time with the C++ Standard Template Library. It is available on diverse platforms, it is fast and it is (relatively) easy to learn. It has been perhaps too conservative at times: we only recently got a standard hash table data structure (with C++11). However, using STL is orders of magnitude Continue reading Do not waste time with STL vectors WAS LIFE BETTER IN THE 1970S? People from my generation often complain that their parents were better off. They are often quick to dismiss the Internet and smart phones as irrelevant to their well-being. Were they better off? Though it has recently peaked, the number of cars per person is higher than it was in the seventies. Current cars are much Continue reading Was life better in the 1970s? SOME USEFUL REGULAR EXPRESSIONS FOR In my blog post, My programming setup, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here a few. 
It is commonly considered a faux pas to include Continue reading Some useful regular DANIEL LEMIRE'S BLOG Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist. MEMORY ACCESS ON THE APPLE M1 PROCESSOR When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “nearby” memory accesses as free. That is, if I Continue reading Memory access on the Apple M1 processor FAST FLOAT PARSING IN PRACTICE In our work parsing JSON documents as quickly as possible, we found that one of the most challenging problem is to parse numbers. That is, you want to take the string “1.3553e142” and convert it quickly to a double-precision floating-point number. You can use the strtod function from the standard C/C++ library, but it is Continue reading Fast float parsing in practice INSTRUCTIONS PER CYCLE: AMD ZEN 2 VERSUS INTEL The performance of a processor is determined by several factors. For example, processors with a higher frequency tend to do more work per unit of time. Physics makes it difficult to produce processors that have higher frequency. Modern processors can execute many instructions per cycle. Thus a 3.4GHz processor has 3.4 billion cycles per second, Continue reading Instructions per cycle: COUNTING CYCLES AND INSTRUCTIONS ON THE APPLE M1 PROCESSOR Counting cycles and instructions on the Apple M1 processor. When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a fewpages of
APPLE’S M1 PROCESSOR AND THE FULL 128-BIT INTEGER PRODUCT If I multiply two 64-bit integers (having values in [0, 264)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two instructions to achieve the same result: mul computes Continue reading Apple’s M1 processor and the HOW FAST CAN YOU ALLOCATE A LARGE BLOCK OF MEMORY IN C++ In C++, the most basic memory allocation code is just a call to the new operator: char *buf = new char; According to a textbook interpretation, we just allocated s bytes1. If you benchmark this line of code, you might find that it almost entirely free on a per-byte basis for large values of s. But Continue reading How fast can you allocate a large block of memory in C++? GET INTO PAY SITES FOR FREE AS A GOOGLEBOT This trick is very clever: many sites limiting access to documents, let Googlebot (Google’s spidering agent) through. I think this is the case with some IEEE archives. So, you can simply tell your browser to identify yourself as the user agent “Googlebot/2.1”. Voilà ! You can go where Google can go. What a beautiful hack! With Continue reading Get into pay sites for free as a Googlebot THE AVERAGE OF AVERAGES IS NOT THE AVERAGE A fact that we teach in our OLAP class is that you can’t take the average of averages and hope it will match the average. This is a common enough mistake for people working with databases and doing number crunching. It is only true if all of the averages are computed over sets having the Continue reading The average of averages is notthe average

SOME USEFUL REGULAR EXPRESSIONS FOR…
In my blog post, “My programming setup”, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here are a few. It is commonly considered a faux pas to include…
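The regular-expressions excerpt above does not list the expressions themselves. What follows is therefore only a generic illustration of the kind of small rewrite a regular expression makes easy, using the C++ standard <regex> library. The pattern and the function name collapse_spaces are hypothetical and are not taken from the post.

```cpp
#include <regex>
#include <string>

// Illustrative only: collapse every run of two or more spaces into one space.
std::string collapse_spaces(const std::string& text) {
  static const std::regex runs_of_spaces(" {2,}");
  return std::regex_replace(text, runs_of_spaces, " ");
}
```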

SCIENCE IS THE BELIEF IN THE IGNORANCE OF EXPERTS
“Science is the belief in the ignorance of experts,” said Richard Feynman. Feynman received the Nobel Prize in Physics. He was a remarkable educator: his lecture notes are still popular. He foresaw nanotechnology and quantum computing. He is credited with identifying the cause of the Space Shuttle Challenger disaster. There is a beautiful talk by…

COUNTING THE NUMBER OF MATCHING CHARACTERS IN TWO ASCII STRINGS
Suppose that you give me two ASCII strings having the same number of characters. I wish to compute efficiently the number of matching characters (same position, same character). E.g., the strings ‘012c’ and ‘021c’ have two matching characters (‘0’ and ‘c’). The conventional approach in C would look as follows: uint64_t standard_matching_bytes(char * c1, char…
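The excerpt's code is cut off after the function signature. Below is a minimal C++ sketch of a conventional byte-by-byte loop. It assumes an explicit length parameter, and the name count_matching_bytes and its parameters are hypothetical, not the post's standard_matching_bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical baseline: count positions where two equal-length strings
// hold the same character, one byte at a time.
std::uint64_t count_matching_bytes(const char* c1, const char* c2, std::size_t length) {
  std::uint64_t matches = 0;
  for (std::size_t i = 0; i < length; i++) {
    matches += (c1[i] == c2[i]);  // a bool converts to 0 or 1
  }
  return matches;
}
```

On the excerpt's example, count_matching_bytes("012c", "021c", 4) counts the matching ‘0’ and ‘c’ and returns 2.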

AVX-512: WHEN AND HOW TO USE THESE NEW INSTRUCTIONS
Our processors typically do computations using small data stores called registers. On 64-bit processors, 64-bit registers are frequently used. Most modern processors also have vector instructions, and these instructions operate on larger registers (128-bit, 256-bit, or even 512-bit). Intel’s new processors have AVX-512 instructions. These instructions are capable of operating on large 512-bit…

BIG-O NOTATION AND REAL-WORLD PERFORMANCE
Classical Newtonian mechanics is always mathematically consistent. However, Newtonian mechanics assumes that bodies move without friction and that we stay far from the speed of light. When your car is stuck in the mud or you are running an intergalactic spaceship, frictionless Newtonian mechanics is the wrong model even though it remains mathematically consistent.

“CRACKING” RANDOM NUMBER GENERATORS (XOROSHIRO128…
In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generator. O’Neill, a professor at Harvey Mudd…

DO NOT WASTE TIME WITH STL VECTORS
I spend a lot of time with the C++ Standard Template Library. It is available on diverse platforms, it is fast and it is (relatively) easy to learn. It has been perhaps too conservative at times: we only recently got a standard hash table data structure (with C++11). However, using STL is orders of magnitude…

WAS LIFE BETTER IN THE 1970S?
People from my generation often complain that their parents were better off. They are often quick to dismiss the Internet and smartphones as irrelevant to their well-being. Were they better off? Though it has recently peaked, the number of cars per person is higher than it was in the seventies. Current cars are much…
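The random-number-generator excerpt above says that knowing the hidden state lets you predict every future output. The small C++ sketch of a xoroshiro128+ step below shows this directly: a copy of the state reproduces the sequence exactly. The shift and rotate constants are the ones commonly published for xoroshiro128+ and should be treated as an assumption, since the point does not depend on them. Recovering the state from observed outputs, which is what “cracking” refers to, is not shown.

```cpp
#include <cstdint>

// Sketch of a xoroshiro128+ generator. The constants (24, 16, 37) are
// those commonly published for xoroshiro128+; treat them as an assumption.
struct Xoroshiro128Plus {
  std::uint64_t s[2];  // the hidden state

  static std::uint64_t rotl(std::uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
  }

  std::uint64_t next() {
    const std::uint64_t s0 = s[0];
    std::uint64_t s1 = s[1];
    const std::uint64_t result = s0 + s1;  // output depends only on the state
    s1 ^= s0;
    s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16);
    s[1] = rotl(s1, 37);
    return result;
  }
};
```

Copying a generator (Xoroshiro128Plus b = a;) copies its state, so every later call to b.next() returns exactly what a.next() returns.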

REUSING A THREAD IN C++ FOR BETTER PERFORMANCE
In a previous post, I measured the time necessary to start a thread, execute a small job and return. auto mythread = std::thread([&] { counter++; }); mythread.join(); The answer is thousands of nanoseconds. Importantly, that is the time as measured by the main thread. That is, sending the query, and getting back the result, takes…
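Here is a rough sketch of the idea in the thread-reuse excerpt. It assumes a std::mutex and std::condition_variable handshake, and the class name reusable_worker is hypothetical; this is not necessarily the post's code. One long-lived thread runs many small jobs instead of a new std::thread being spawned per job.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: a single long-lived worker thread that runs submitted
// jobs one at a time, so the thread start-up cost is paid only once.
// Assumes a single submitting thread at a time.
class reusable_worker {
 public:
  reusable_worker() : thread_([this] { loop(); }) {}

  ~reusable_worker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      exiting_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Hand a job to the worker and block until the worker has run it.
  void run(std::function<void()> job) {
    std::unique_lock<std::mutex> lock(mutex_);
    job_ = std::move(job);
    has_job_ = true;
    cv_.notify_all();
    cv_.wait(lock, [this] { return !has_job_; });
  }

 private:
  void loop() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return has_job_ || exiting_; });
      if (!has_job_) return;  // woken up only to exit
      job_();                 // runs with the lock held: fine for tiny jobs
      has_job_ = false;
      cv_.notify_all();       // wake the thread waiting in run()
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::function<void()> job_;
  bool has_job_ = false;
  bool exiting_ = false;
  std::thread thread_;  // declared last: starts after the members above exist
};
```

Running the job while holding the lock keeps the sketch short, but it serializes all work. That is acceptable for a job as small as counter++ but not for long-running tasks.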

ADDING A (PREDICTABLE) BRANCH TO EXISTING CODE CAN…
Software is full of “branches”. They often take the form of if-then clauses in code. Modern processors try to predict the result of branches, often long before evaluating them. Hard-to-predict branches are a challenge performance-wise because when a processor fails to predict a branch correctly, it does useless work that must be thrown away. A…

EVEN FASTER BITMAP DECODING
Bitmaps are a simple data structure used to represent sets of integers. For example, you can represent all sets of integers in [0,64) using a single 64-bit integer. When they are applicable, bitmaps are very efficient compared to the alternatives (e.g., a hash set). Unfortunately, extracting the set bits from a bitmap can be expensive.

QUICKLY PRUNING ELEMENTS IN SIMD VECTORS USING THE…
Modern processors have powerful vector instructions. However, some algorithms are tricky to implement using vector instructions. I often need to prune selected values from a vector. On x64 processors, we can achieve this result using table lookups and an efficient shuffle instruction. Building up the table each time gets tiring, however. Let us consider two…

ON HUMAN INTELLIGENCE… A PERSPECTIVE FROM COMPUTER SCIENCE
Whenever I read social scientists, there is often, implicit in the background, the concept of “intelligence” as a well-defined quantity. I have some amount of intelligence. Maybe you have a bit more. But what does computer science have to say about any of this?

PER-CORE FREQUENCY SCALING AND AVX-512: AN EXPERIMENT
Intel has fancy new instructions (AVX-512) that are powerful, particularly for heavy numerical work. When a core uses the heaviest of these new instructions, the core’s frequency comes down to keep power usage within bounds. I wanted to test it out, so I wrote a little threaded program. It runs on four threads…
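For the bitmap-decoding excerpt above, here is a basic decoding loop in C++: the usual starting point that faster methods try to improve on, not the post's technique. It assumes a GCC or Clang compiler for the __builtin_ctzll intrinsic, and the function name decode_bitmap is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Sketch: return the positions of the set bits of a 64-bit bitmap, in
// increasing order. Uses the GCC/Clang builtin __builtin_ctzll (count
// trailing zeros); other compilers need an equivalent intrinsic.
std::vector<int> decode_bitmap(std::uint64_t bitmap) {
  std::vector<int> positions;
  while (bitmap != 0) {
    positions.push_back(__builtin_ctzll(bitmap));  // index of the lowest set bit
    bitmap &= bitmap - 1;                          // clear the lowest set bit
  }
  return positions;
}
```

For example, the bitmap 0b1011 represents the set {0, 1, 3}, and decode_bitmap returns those three positions in order.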

JUNE 2021 – DANIEL LEMIRE'S BLOG

COMPUTING THE NUMBER OF DIGITS OF AN INTEGER EVEN FASTER
In my previous blog post, I documented how one might proceed to compute the number of digits of an integer quickly. E.g., given the integer 999, you want 3 but given the integer 1000, you want 4. It is effectively the integer logarithm in base 10. On computers, you can quickly compute the integer logarithm…
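For the digit-count excerpt, here is one common way to combine a fast integer base-2 logarithm with a small table. This illustrates the general idea the excerpt describes (base-2 logarithm, then a correction) and is not necessarily the post's exact code. It assumes 32-bit unsigned inputs and a GCC or Clang compiler for __builtin_clz. The name digit_count is hypothetical.

```cpp
#include <cstdint>

// Sketch: number of decimal digits of a 32-bit unsigned integer.
// Step 1: floor(log2(x)) via count-leading-zeros (GCC/Clang builtin);
//         x | 1 avoids the undefined __builtin_clz(0).
// Step 2: (9 * floor_log2) >> 5 estimates floor(log10(x)) from below,
//         since 9/32 = 0.28125 is slightly less than log10(2) = 0.30103.
// Step 3: one comparison against 9, 99, 999, ... corrects the estimate.
int digit_count(std::uint32_t x) {
  static const std::uint32_t table[] = {9,      99,      999,      9999,     99999,
                                        999999, 9999999, 99999999, 999999999};
  const int floor_log2 = 31 - __builtin_clz(x | 1);
  int y = (9 * floor_log2) >> 5;  // always in [0, 8] for 32-bit inputs
  y += x > table[y];
  return y + 1;
}
```

On the excerpt's examples, digit_count(999) returns 3 and digit_count(1000) returns 4.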

SETTING UP A ROCKPRO64 (POWERFUL SINGLE-CARD COMPUTER)
A few months ago, I ordered a ROCKPro64. If you are familiar with the Raspberry Pi, then it is a bit of the same: an inexpensive computer that comes in the form of a single card. The ROCKPro64 differs from the Raspberry Pi in that it is much closer in power to a normal PC. You…

FASTEST WAY TO COMPUTE THE GREATEST COMMON DIVISOR
Given two positive integers x and y, the greatest common divisor (GCD) z is the largest number that divides both x and y. For example, given 64 and 32, the greatest common divisor is 32. There is a fast technique to compute the GCD called the binary GCD algorithm or Stein’s algorithm. According to Wikipedia,…
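The GCD excerpt names the binary GCD (Stein's algorithm). Below is a C++ sketch of one common formulation of it. The variants benchmarked in the post may differ, the function name binary_gcd is hypothetical, and the code assumes a GCC or Clang compiler for __builtin_ctzll.

```cpp
#include <cstdint>
#include <utility>

// Sketch of the binary GCD (Stein's algorithm) for 64-bit unsigned integers.
// Uses the GCC/Clang builtin __builtin_ctzll (count trailing zeros).
std::uint64_t binary_gcd(std::uint64_t u, std::uint64_t v) {
  if (u == 0) return v;
  if (v == 0) return u;
  const int shift = __builtin_ctzll(u | v);  // power of two shared by u and v
  u >>= __builtin_ctzll(u);                  // make u odd
  do {
    v >>= __builtin_ctzll(v);                // make v odd
    if (u > v) std::swap(u, v);              // keep u <= v
    v -= u;                                  // difference of two odd numbers is even
  } while (v != 0);
  return u << shift;                         // restore the shared factors of two
}
```

On the excerpt's example, binary_gcd(64, 32) returns 32. The loop replaces division with shifts and subtractions, which is what makes the method attractive on modern processors.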
SOME USEFUL REGULAR EXPRESSIONS FOR In my blog post, My programming setup, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here are a few. It is commonly considered a faux pas to include Continue reading Some useful regular
HOW FAST CAN YOU ALLOCATE A LARGE BLOCK OF MEMORY IN C++? In C++, the most basic memory allocation code is just a call to the new operator: char *buf = new char[s]; According to a textbook interpretation, we just allocated s bytes¹. If you benchmark this line of code, you might find that it is almost entirely free on a per-byte basis for large values of s. But Continue reading How fast can you allocate a large block of memory in C++?

“CRACKING” RANDOM NUMBER GENERATORS (XOROSHIRO128 In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number generators. O’Neill, a professor at Harvey Mudd Continue reading “Cracking” random

AVX-512: WHEN AND HOW TO USE THESE NEW INSTRUCTIONS Our processors typically do computations using small data stores called registers. On 64-bit processors, 64-bit registers are frequently used. Most modern processors also have vector instructions and these instructions operate on larger registers (128-bit, 256-bit, or even 512-bit). Intel’s new processors have AVX-512 instructions. These instructions are capable of operating on large 512-bit

THE DAY I SUBSCRIBED TO A DOZEN PORN SITES… This morning, I noticed some odd charges on my VISA card. They were attributed to sites such as videosupport1.com, bngvsupport.com, paysupport1.com, bdpayhelp.com. I called up my bank. They gave me the phone number of the company behind these pay sites and told me to ask what the charges were for. I called the company behind Continue reading The day I subscribed to a dozen

WAS LIFE BETTER IN THE 1970S? People from my generation often complain that their parents were better off. They are often quick to dismiss the Internet and smartphones as irrelevant to their well-being. Were they better off? Though it has recently peaked, the number of cars per person is higher than it was in the seventies. Current cars are much Continue reading Was life better in the 1970s?
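The random number generator snippet above says that a generator's hidden state determines every future output. A sketch of the generator named in the truncated title (presumably xoroshiro128+) makes this concrete. It assumes the constants 24, 16 and 37 from the reference implementation Blackman and Vigna published in 2018; an earlier release used different constants. The struct name is illustrative. The entire state is two 64-bit words, so whoever holds a copy of them reproduces the stream exactly. Recovering that state from observed outputs, which is the actual "cracking" in the post, is not shown.

```cpp
#include <cstdint>

// Sketch of xoroshiro128+. The hidden state is just two 64-bit words;
// they must not both be zero.
struct Xoroshiro128Plus {
  std::uint64_t s0, s1;

  static std::uint64_t rotl(std::uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
  }

  std::uint64_t next() {
    const std::uint64_t a = s0;
    std::uint64_t b = s1;
    const std::uint64_t result = a + b;  // output: sum of the two state words
    b ^= a;
    s0 = rotl(a, 24) ^ b ^ (b << 16);    // update the first word
    s1 = rotl(b, 37);                    // update the second word
    return result;
  }
};
```

For instance, after `Xoroshiro128Plus g{1, 2}; Xoroshiro128Plus copy = g;`, both `g.next()` and `copy.next()` return the same values forever. The first call returns 3, the sum of the two seed words.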
COMPUTING THE NUMBER OF DIGITS OF AN INTEGER EVEN FASTER In my previous blog post, I documented how one might proceed to compute the number of digits of an integer quickly. E.g., given the integer 999, you want 3 but given the integer 1000, you want 4. It is effectively the integer logarithm in base 10. On computers, you can quickly compute the integer logarithm Continue reading Computing the number of digits of an integer even faster

COUNTING CYCLES AND INSTRUCTIONS ON THE APPLE M1 PROCESSOR When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a few pages of
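The digit-counting snippet above treats the number of digits as an integer base-10 logarithm derived from a fast integer logarithm. The following sketch uses the well-known table-based trick from "Bit Twiddling Hacks". It is not necessarily the "even faster" method of the post. It assumes 32-bit unsigned inputs and GCC or Clang for `__builtin_clz`. The idea is to estimate floor(log10(x)) from floor(log2(x)) with the ratio 1233/4096, which is approximately log10(2), and then correct the estimate by at most one with a table of powers of ten.

```cpp
#include <cstdint>

// Number of decimal digits of a 32-bit unsigned integer.
// __builtin_clz (GCC/Clang) counts leading zero bits; it is undefined for 0.
int digit_count(std::uint32_t x) {
  static const std::uint32_t powers_of_10[] = {
      1u,      10u,      100u,      1000u,      10000u,
      100000u, 1000000u, 10000000u, 100000000u, 1000000000u};
  if (x == 0) { return 1; }  // by convention, 0 has one digit
  int lg2 = 31 - __builtin_clz(x);     // floor(log2(x))
  int t = ((lg2 + 1) * 1233) >> 12;    // estimate; 1233/4096 ~ log10(2)
  int lg10 = t - (x < powers_of_10[t] ? 1 : 0);  // floor(log10(x))
  return lg10 + 1;
}
```

With the snippet's examples, `digit_count(999)` returns 3 and `digit_count(1000)` returns 4.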
DO NOT WASTE TIME WITH STL VECTORS I spend a lot of time with the C++ Standard Template Library. It is available on diverse platforms, it is fast and it is (relatively) easy to learn. It has been perhaps too conservative at times: we only recently got a standard hash table data structure (with C++11). However, using STL is orders of magnitude Continue reading Do not waste time with STL vectors

REVISITING VERNOR VINGE’S “PREDICTIONS” FOR 2025 Vernor Vinge is a retired mathematics professor who became famous through his science-fiction novels. He is also famous as one of the first to contemplate the idea of a “technological singularity”. There is debate as to what the technological singularity is, but the general idea goes as follows. At some point in the near future Continue reading Revisiting Vernor Vinge’s

I STILL DON’T HAVE THE MULTIPLICATION TABLES MEMORIZED I read this on Slashdot: “I have a PhD in math, and I still don’t have the multiplication tables memorized.” Now I know I am not the only one! In other news, I still deduce my age from my birth date (takes me a minute or so each time); I was identified as having a Continue reading I still don’t have the multiplication tables memorized
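The STL snippet above notes that C++ only gained a standard hash table with C++11. That container is `std::unordered_map`, with `std::unordered_set` alongside it. Below is a minimal word-count sketch of its use, with average constant-time insertion and lookup. The function name `count_words` is illustrative. The post's own argument about vectors is truncated here and is not reproduced.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Count occurrences of each word using the C++11 standard hash table.
std::unordered_map<std::string, int> count_words(
    const std::vector<std::string>& words) {
  std::unordered_map<std::string, int> counts;
  counts.reserve(words.size());  // optional: avoid rehashing while inserting
  for (const auto& w : words) {
    ++counts[w];  // operator[] inserts a zero-initialized count for new keys
  }
  return counts;
}
```

For example, `count_words({"a", "b", "a"})` maps "a" to 2 and "b" to 1.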
DANIEL LEMIRE'S BLOG Daniel Lemire is a computer science professor at the University of Quebec (TELUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist.
* My home page
* My papers
* My software
SUBSCRIBE
You can subscribe to this blog by email.
WHERE TO FIND ME?
I am on Twitter and GitHub. You can also find Daniel Lemire:
* on Google Scholar, with 4k citations and over 75 peer-reviewed publications,
* on Facebook,
* and on LinkedIn.
Before the pandemic of 2020, you could meet Daniel in person, as he was organizing regular talks open to the public in Montreal: tribalab and technolab.
SUPPORT MY WORK!
I do not accept any advertising. However, you can support the blog with donations through PayPal. Please consider getting in touch if you are a supporter so that I can thank you.
RECENT POSTS
* Science and Technology (December 5th 2020)
* Interview by Adam Gordon Bell
* Java Buffer types versus native arrays: which is faster?
* Science and Technology links (November 28th 2020)
* How fast does interpolation search converge?
RECENT COMMENTS
* Morgan on Science and Technology links (November 14th 2020)
* Daniel Lemire on We released simdjson 0.3: the fastest JSON parser in the world is even better!
* Daniel Ryan on We released simdjson 0.3: the fastest JSON parser in the world is even better!
* Daniel Lemire on Science and Technology links (July 25th 2020)
* Tyle on Science and Technology links (July 25th 2020)
PAGES
* A short history of technology
* About me
* Book recommendations
* Interviews and talks
* My bets
* My favorite articles
* My readers
* My sayings
* Predictions
* Recommended video games
* Terms of use
* Write good papers
SCIENCE AND TECHNOLOGY (DECEMBER 5TH 2020)
* Researchers find that older people can lose weight just as easily as younger people.
* Google DeepMind claims to have solved the protein folding problem, an important problem in medicine. This breakthrough could greatly accelerate drug development and lead to new cures. Yet, not everyone is convinced that they actually solved the problem.
* “Indian Americans have risen to become the richest ethnicity in America, with an average household income of $126,891 (compared to the US average of $65,316). (…) Almost 40% of all Indians in the United States have a master’s, doctorate, or other professional degree, which is five times the national average.” (source)
* There is a popular idea in the US currently: we should just forgive all student debts. Catherine and Yannelis find that “universal and capped forgiveness policies are highly regressive, with the vast majority of benefits accruing to high-income individuals.”
* Researchers successfully deployed advanced genetic engineering techniques (based on CRISPR) against cancer in mice.
* Researchers rejuvenated the cells in the eyes of old mice, restoring their vision. (Source: Nature.)
* Remember all these studies claiming that birth order determined your fate, with older siblings more likely to go into science and younger siblings more likely to pursue artistic careers? It seems that these results do not replicate very well given a re-analysis. The effects are much weaker than initially believed and they do not necessarily go in the expected direction.
* Older people (over 70) have less zinc in their blood. Their zinc level predicts their mortality rate: the more zinc, the less likely they are to die.
* Shenzhen (China) has truly driverless cars on the roads.
* Centenarians have low levels of blood sugar, and they are less likely to suffer from diabetes than adults in general.
* We have an actual treatment to help people suffering from progeria, a crippling disease.
* Eating eggs is quite safe.
* The state of the art in image processing includes convolutional neural networks (CNNs). Though they give good results, they are computationally expensive. Google has adapted a technique from natural-language processing called transformers to the task and reports massive gains in computational efficiency.
Posted on December 5, 2020 by Daniel Lemire

INTERVIEW BY ADAM GORDON BELL

A few weeks ago, Adam Gordon Bell had me on his podcast. You can listen to it.
Here is the abstract:

> Did you ever meet somebody who seemed a little bit different than the rest of the world? Maybe they question things that others wouldn’t question or said things that others would never say. Daniel is a world-renowned expert on software performance, and one of the most popular open source developers. If you measure by GitHub followers. Today, he’s gonna share his story. It involves time at a research lab, teaching students in a new way. it will also involve upending people’s assumptions about IO performance. Elon Musk And Julia Roberts will come up a little bit more than you might expect.

I would not describe myself as “world renowned” about anything, but Adam needs to do a bit of promotion. My interview is right after an interview with Brian Kernighan: he is world renowned.
I also do not think that I am “different from the rest of the world”, though I have maybe given more thought than most to the need to be different. I have always been preoccupied with trying to do work that others do not do: sadly, it is much harder than it sounds. I usually talk mostly about my work, but Adam wanted to go a bit personal, like how I was initially struggling at school.

FURTHER READING: After giving this interview, I read Paul Graham’s latest essay. If you liked my interview, you will probably enjoy Graham’s essay. You might enjoy his essay in any case.

Posted on December 1, 2020 by Daniel Lemire
JAVA BUFFER TYPES VERSUS NATIVE ARRAYS: WHICH IS FASTER?

When programming in C, one has to allocate and de-allocate memory by hand. It is an error-prone process. In contrast, newer languages like Java often manage their memory automatically. Java relies on garbage collection. In effect, memory is allocated as needed by the programmer, and then Java figures out that some piece of data is no longer needed, and it reclaims the corresponding memory. The garbage collection process is fast and safe, but it is not free: despite decades of optimization, it can still cause major headaches to developers.
Java has native arrays (e.g., the int[] type). These arrays are typically allocated on the “Java heap”. That is, they are allocated and managed by Java as dynamic data, subject to garbage collection.
Java also has Buffer types such as the IntBuffer. These are high-level abstractions that can be backed by native Java arrays but also by other data sources, including data that is outside of the Java heap. Thus you can use Buffer types to avoid relying so much on the Java heap.
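To make the distinction concrete, here is a minimal Java sketch of three ways to obtain an IntBuffer with the standard java.nio API: `IntBuffer.allocate` (backed by a fresh heap array), `IntBuffer.wrap` (sharing an existing array), and a direct `ByteBuffer` viewed as ints (memory outside the Java heap). The class and method names (`BufferKinds`, `heapBuffer`, `wrappedBuffer`, `directBuffer`) are hypothetical, chosen for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Illustrative sketch (hypothetical class name): three kinds of IntBuffer.
public class BufferKinds {

    // Heap buffer: Java allocates a fresh int[] on the heap and wraps it.
    static IntBuffer heapBuffer(int n) {
        return IntBuffer.allocate(n);
    }

    // Wrapped buffer: shares storage with an existing array, so writes go through to it.
    static IntBuffer wrappedBuffer(int[] array) {
        return IntBuffer.wrap(array);
    }

    // Direct buffer: the bytes live outside the Java heap; we view them
    // as ints using the platform's native byte order.
    static IntBuffer directBuffer(int n) {
        return ByteBuffer.allocateDirect(n * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    public static void main(String[] args) {
        IntBuffer heap = heapBuffer(4);
        int[] backing = new int[4];
        IntBuffer wrapped = wrappedBuffer(backing);
        IntBuffer direct = directBuffer(4);

        wrapped.put(0, 7); // absolute put: the change is visible in backing[0]

        System.out.println("heap:    hasArray=" + heap.hasArray() + " isDirect=" + heap.isDirect());
        System.out.println("wrapped: backing[0]=" + backing[0]);
        System.out.println("direct:  hasArray=" + direct.hasArray() + " isDirect=" + direct.isDirect());
    }
}
```

Running `main` should show that the heap buffer exposes an accessible backing array while the direct buffer does not, and that a write through the wrapped buffer lands in the original array.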
But in my experience, Buffer types come with some performance penalty compared to native arrays. I would not say that Buffers are slow. In fact, given a choice between a Buffer and a stream (DataInputStream), you should strongly favour Buffer types.
However, they are not as fast as native arrays in my experience. I can create an array of 50,000 integers, either with “new int[50000]” or as “IntBuffer.allocate(50000)”. The latter should essentially create an array (on the Java heap) but wrapped with an IntBuffer “interface”.

A possible intuition is that wrapping an array with a high-level interface should be free. Though it is true that high-level abstractions can come with no performance penalty (and sometimes, even, performance gains), whether they do is an empirical matter. You should never just assume that your abstraction comes for free.

Because I am making an empirical statement, let us test it out empirically with the simplest test I can imagine. I am going to add one to every element in the array/IntBuffer.

```java
for (int k = 0; k < s.array.length; k++) {
    s.array[k] += 1;
}
```

```java
for (int k = 0; k < s.buffer.limit(); k++) {
    s.buffer.put(k, s.buffer.get(k) + 1);
}
```
I get the following results on my desktop (OpenJDK 14, 4.2 GHz Intel processor):

| Type | Time |
|---|---|
| int[] | 2.5 μs |
| IntBuffer | 12 μs |
That is, arrays are over 4 times faster than IntBuffers in this test. You can run the benchmark yourself if you’d like.
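For readers who want a quick, self-contained approximation, here is a rough Java sketch of the same experiment. It is not the benchmark behind the numbers above: the class name `ArrayVsIntBuffer` and the helper `timeNanos` are hypothetical, and naive `System.nanoTime` loops are sensitive to JIT warm-up, dead-code elimination and machine noise, so a dedicated harness such as JMH is the better tool for serious measurements.

```java
import java.nio.IntBuffer;

// Rough timing sketch (hypothetical class name), not a rigorous benchmark.
public class ArrayVsIntBuffer {

    // Add one to every element of a native array.
    static void incrementArray(int[] array) {
        for (int k = 0; k < array.length; k++) {
            array[k] += 1;
        }
    }

    // Add one to every element of an IntBuffer using absolute get/put,
    // which leave the buffer's position unchanged.
    static void incrementBuffer(IntBuffer buffer) {
        for (int k = 0; k < buffer.limit(); k++) {
            buffer.put(k, buffer.get(k) + 1);
        }
    }

    // Average wall-clock nanoseconds per call over many repetitions.
    static double timeNanos(Runnable task, int repetitions) {
        long start = System.nanoTime();
        for (int r = 0; r < repetitions; r++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) repetitions;
    }

    public static void main(String[] args) {
        final int size = 50_000;
        int[] array = new int[size];
        IntBuffer buffer = IntBuffer.allocate(size);
        int reps = 2_000;

        // Warm-up pass so the JIT compiler has a chance to optimize both loops.
        timeNanos(() -> incrementArray(array), reps);
        timeNanos(() -> incrementBuffer(buffer), reps);

        double arrayNs = timeNanos(() -> incrementArray(array), reps);
        double bufferNs = timeNanos(() -> incrementBuffer(buffer), reps);
        System.out.printf("int[]:     %.1f us%n", arrayNs / 1000);
        System.out.printf("IntBuffer: %.1f us%n", bufferNs / 1000);
    }
}
```

Absolute timings will depend on the JVM and the hardware; only the relative gap between the two loops is of interest here.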
My expectation is that many optimizations that Java applies to arrays are not applied to Buffer types. Of course, this tells us little about what happens when Buffers are used to map values from outside of the Java heap. My experience suggests that things can be even worse. Buffer types have not made native arrays obsolete, at least not as far as performance is concerned.

Posted on November 30, 2020 by Daniel Lemire

SCIENCE AND TECHNOLOGY LINKS (NOVEMBER 28TH 2020)

* Homework favours kids with wealthier and better educated parents. My own kids have access to two parents with a college education, including a father who is publishing mathematically-intensive research papers. Do you think for a minute that it is fair to expect kids who have poorly educated parents to compete on homework assignments? (Not that I help my kids all that much…)
* Though researchers have reported that animal populations are falling worldwide (presumably because of human beings), this trend is entirely driven by 3% of the animals that are strongly declining while most animals (vertebrates) are not in decline.
* The expansion of parental leave and child care subsidies has not affected gender inequalities in the workplace. (That is not an argument for abolishing parental leave and child care subsidies.)
* A hallucinogenic tea can help you grow new brain cells.
* It appears that aging is partially caused by aging factors found in our blood. In mice, researchers achieved rejuvenation (improved cognition and reduced inflammation) by diluting blood plasma. It confirms earlier work on the topic but shows rejuvenation in the brain. It does not mean that we know how to rejuvenate human beings, but it gives you a new angle of attack that is safe and inexpensive.
* A paper claims that hyperbaric oxygen therapy brings about rejuvenation in human beings. In effect, it shows a lengthening of the telomeres, the component of our DNA that grows shorter with each cell division. The lengthening is in some cells only. They also show a reduction in the number of senescent cells: these zombie cells that we tend to accumulate with age. The reduction in senescent cells is only for part of the body and it might be caused by the oxygen (which may kill the senescent cells). It is unclear how this expensive therapy compares with a good exercise regimen. We have reliable markers of biological age based on methylation and they were not used as part of this study.
* Countries that adopt a flat tax system (as opposed to the more common progressive system) grow richer exponentially faster. That is, though it may seem intuitive that richer people should pay a higher percentage of their income in taxes, it may come at a substantial cost with respect to overall wealth.
* Diabetes is related to a dysfunction of the pancreas. Thankfully, we can create insulin-producing cells, and we can even insert these cells in one’s pancreas. Sadly, they are soon attacked by the immune system and destroyed. It appears that progress is being made, and that viable cells have survived transplantation in the pancreas through a new technique that protects them from the immune system. It works in mice.
* Cochrane, a credible source when it comes to medical research, published a review of the evidence regarding masks and hand washing with respect to respiratory viral infections:

> There is uncertainty about the effects of face masks. The low‐moderate certainty of the evidence means our confidence in the effect estimate is limited, and that the true effect may be different from the observed estimate of the effect. The pooled results of randomised trials did not show a clear reduction in respiratory viral infection with the use of medical/surgical masks during seasonal influenza. There were no clear differences between the use of medical/surgical masks compared with N95/P2 respirators in healthcare workers when used in routine care to reduce respiratory viral infection. Hand hygiene is likely to modestly reduce the burden of respiratory illness. Harms associated with physical interventions were under‐investigated.

It does not follow that you should not wear masks or that you should avoid washing your hands. I do and I recommend you do too. However, you should be critical of any statement to the effect that science is telling us that masks and hand washing stop airborne viruses, especially when such statements are made in a political context.

Posted on November 29, 2020 by
Daniel Lemire

HOW FAST DOES INTERPOLATION SEARCH CONVERGE?

When searching in a sorted array, the standard approach is to rely on a binary search. If the input array contains N elements, after at most log2(N) + 1 queries in the sorted array, you will find the value you are looking for. The algorithm is well known, even by kids. You first guess that the value is in the middle, you check the value in the middle, you compare it against your target and go either to the upper half or to the lower half of the array based on the result of the comparison. Binary search only requires that the values be sorted.

What if the values are not only sorted, but they also follow a regular distribution? Maybe you are generating random values, uniformly distributed. Maybe you are using hash values. In a classical paper,
Perl et al. described a potentially more effective approach called interpolation search. It is applicable when you know the distribution of your data. The intuition is simple: instead of guessing that the target value is in the middle of your range, you adjust your guess based on the value. If the value is smaller than average, you aim near the beginning of the array. If the value is much larger than average, you guess that the index should be near the end. The expected search time is then much better: log(log(N)) for uniformly distributed data.

To gain some intuition, I quickly implemented interpolation search in C++ and ran a little experiment, generating large arrays and searching in them using interpolation search. As you can see, as you multiply the size of the array by 10, the number of hits or comparisons grows very slowly. Furthermore, interpolation search is likely to quickly get very close to the target. Thus the results are better than they look if memory locality is a factor.

| N | Hits |
|---|---|
| 100 | 2.9 |
| 1000 | 3.5 |
| 10000 | 3.8 |
| 100000 | 4.0 |
| 100000 | 4.5 |
| 1000000 | 4.6 |
| 10000000 | 4.9 |
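The experiment above was written in C++; to keep a single language on this page, here is a minimal Java sketch of interpolation search next to binary search, each counting how many array elements it probes. It assumes ints drawn uniformly at random and then sorted, queries for values known to be present, and it counts only the interpolated (or midpoint) guesses as probes, not the boundary checks, so it may not count "hits" exactly the way the table does. The class `InterpolationSearchDemo` and its `Result` type are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch (hypothetical names): interpolation search vs binary search.
public class InterpolationSearchDemo {

    static final class Result {
        final int index;  // position of the target, or -1 if absent
        final int probes; // number of guessed positions examined
        Result(int index, int probes) { this.index = index; this.probes = probes; }
    }

    static Result interpolationSearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1, probes = 0;
        // Only search while the target lies within [a[lo], a[hi]].
        while (lo <= hi && target >= a[lo] && target <= a[hi]) {
            int pos;
            if (a[hi] == a[lo]) {
                pos = lo; // all remaining values are equal: avoid dividing by zero
            } else {
                // Linear interpolation between (lo, a[lo]) and (hi, a[hi]).
                // 64-bit arithmetic: the product can overflow a 32-bit int.
                pos = lo + (int) (((long) target - a[lo]) * (hi - lo) / ((long) a[hi] - a[lo]));
            }
            probes++;
            if (a[pos] == target) return new Result(pos, probes);
            if (a[pos] < target) lo = pos + 1; else hi = pos - 1;
        }
        return new Result(-1, probes);
    }

    static Result binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1, probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids overflow
            probes++;
            if (a[mid] == target) return new Result(mid, probes);
            if (a[mid] < target) lo = mid + 1; else hi = mid - 1;
        }
        return new Result(-1, probes);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int n = 100; n <= 1_000_000; n *= 10) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = random.nextInt(Integer.MAX_VALUE);
            Arrays.sort(a);
            int queries = 1000;
            long interp = 0, binary = 0;
            for (int q = 0; q < queries; q++) {
                int target = a[random.nextInt(n)]; // a value known to be present
                interp += interpolationSearch(a, target).probes;
                binary += binarySearch(a, target).probes;
            }
            System.out.printf("N=%d interpolation=%.1f binary=%.1f%n",
                    n, interp / (double) queries, binary / (double) queries);
        }
    }
}
```

The binary search column should grow by roughly 3.3 probes per tenfold increase in N (log2 of 10), which gives a point of comparison for how slowly the interpolation column grows.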
You might object that such a result is inferior to a hash table, and I do expect well implemented hash tables to perform better, but you should be mindful that many hash table implementations gain performance at the expense of higher memory usage, and that they often lose the ability to visit the values in sorted order at high speed. It is also easier to merge two sorted arrays than to merge two hashtables.
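As an illustration of the last point, merging two sorted arrays takes a single linear pass: repeatedly copy the smaller of the two head elements. The following Java sketch (the class name `SortedMerge` is hypothetical) shows the idea; the output is itself sorted, so it can be searched or merged again without further work.

```java
import java.util.Arrays;

// Illustrative sketch (hypothetical class name): linear-time merge of two sorted arrays.
public class SortedMerge {

    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        // Repeatedly take the smaller head element (ties favour the first input).
        while (i < a.length && j < b.length) {
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        // Copy whatever remains of either input.
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{1, 4, 9}, new int[]{2, 3, 10, 12});
        System.out.println(Arrays.toString(merged)); // prints [1, 2, 3, 4, 9, 10, 12]
    }
}
```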
This being said, I am not aware of interpolation search being actually used productively in software today. If you have a reference to such an artefact, please share!

UPDATE: Some readers suggest that Bigtable relies on a form of interpolation search.

UPDATE: It appears that interpolation search was tested out in git (1, 2). Credit: Jeff King.

FURTHER READING: Interpolation search revisited by Muła
Posted on November 25, 2020 (updated November 26, 2020) by Daniel Lemire

THE DISAGREEABLE SCIENTIST CONJECTURE

If you are a nerd, the Internet is a candy store… if only you stay away from mainstream sites. Some of the best scientists have blogs and YouTube channels; they post their papers online. When they review a paper, they speak frankly, openly. Is the work good or irrelevant? You can agree or disagree, but their points are clear and well stated.

You may expect that researchers always work in this manner. That they always speak their mind. Nothing could be further from the truth in my experience. We have a classical power structure with a few people deciding on the Overton window. Here are the subjects we can discuss, here are the relevant topics. We have added layers and layers of filters to protect us against disruption. That is, there is free discussion… as long as you follow the beaten path. Here are some of the things that you must never discuss:
* These people in field X are getting nowhere. I think that their work is no good. We should move on and leave them behind.
* We have this theoretical model but it does not seem to help us very much in the real world; maybe we should drop it.

I find that the most interesting researchers break both of these barriers from time to time. In other words, they are not very reasonable.
My conjecture is that it is not an accident. To be precise, my conjecture is that the best scientists are disagreeable people. It is a technical statement. I am saying that they have the courage to offend as an intellectual. The business of research is bureaucratic. In a bureaucracy, the day to day goes much smoother if you are agreeable. But being disagreeable at times might help career-wise: you can demand to be respected, demand to be credited. That is certainly valuable to get ahead and be promoted.
But I am not thinking about the business of science, I am thinking about science itself. The progress of scientific knowledge needs disagreeable people. The statement itself is obvious: to bring a new idea into the fold, someone must first champion it, and new ideas tend to displace old ideas. So if you fear displeasing others, you will never bring anything disruptive to the table. But that is not what I mean. Or it is not the only thing that I mean. When we are thinking of new ideas, deciding whether to spend time on them, we weigh many factors in our head. If you are a strong conformist, you will automatically, without thinking, prune out really disruptive ideas. There are some papers you will even refuse to read for fear that you might get in trouble, be rejected by some of your peers.
I believe that it takes disagreeable people to pick up the dangerous ideas and pursue them. Science needs risk taking, but the risks are disproportionately taken by a few disagreeable people. To be clear, again, I use the term disagreeable in a technical manner: I do not mean that these people are not fun to have around. My conjecture is falsifiable. I believe that after controlling for the potential benefits to one’s career of being disagreeable (insisting on credit and fighting for oneself), we will find a strong correlation between breakthrough/disruptive research findings and being disagreeable.
It is a population-level prediction. I do not predict that a given individual will become known as the new Einstein. This being said, I have to wonder whether Einstein would have a YouTube channel where he voiced controversial opinions if he lived today. I bet he would. My conjecture also leads to a cultural-level prediction, though it becomes harder to formalize it. I believe that cultures that more strongly protect freedom of speech in the scientific domain will contribute disproportionately to science. And that is because a culture of freedom of speech encourages and supports open dissent with established ideas.
Posted on November 22, 2020 by Daniel Lemire

PROGRAMMING IS SOCIAL

Software programming looks at a glance like work best done in isolation. Nothing could be further from the truth in my experience. Though you may be working on your little program alone, you should not dismiss the social component of the work. I often say that “programming is social” to justify the fact that I know and practice multiple programming languages. I also use this saying to justify the popularity of programming languages like JavaScript, Go and even C and Java. Let me elaborate on what I mean by “programming is social”:

* Programmers reuse each other’s work on a massive scale. Programmers are lazy and refuse to do the same task again and again. So they code frequently needed operations into packages. They tend to distribute these packages. The most popular programming languages tend to have free, ready-made components to solve most problems already. JavaScript and Python have free and high-quality libraries and extensions for most things. So it pays to know popular programming languages.
* Most programmers encounter similar issues over time. Some programming difficulties are particularly vexing. Yet programmers are great at sharing questions and answers. Your ability to ask clear questions, to provide clear answers, and to read and understand both, is important to your ongoing success as a programmer. Some programming languages have the advantage as they benefit from an accumulated set of knowledge. A programming language like Java does well in this respect. It pays to use well-documented languages.
* Programming code is also, literally, a language. It is not uncommon that I will ask someone to code up their idea so I can understand it. Programming languages that are easy to read win: Go and Python. Often, it pays to use the programming language that your community favours, even if you share no code with them, just so you can communicate more easily. It may be possible to write an Android application in Go, for example. But you would be wiser to use something like Kotlin or Java, just because that is what your peers use.
* If you do great work, at some point you may need to teach others how they can continue your work or use your work. Teaching requires good communication. It is helpful to have clear code in a language that many people know.

Posted on November 19, 2020 by Daniel Lemire
DOUBLE-BLIND PEER REVIEW IS A BAD IDEA

When you submit a manuscript to a journal or to a conference, you do not know who reviews your manuscript. Increasingly, due to concerns with biases and homophily, journals and conferences are moving to a double-blind peer review where you have to submit your paper without disclosing your identity. There is also a competing move toward more openness where everyone’s identity is disclosed.

The intuition behind double-blind review is that it is harder to discriminate against people if you do not know their name and affiliation. Of course, editors and chairs still get to know your identity. The intuition behind open peer review is that if your reviews are published, you will be kept in check and may get punished if you are too biased. But people are concerned about their reviews or the reviews of their papers being published.

There are many undesirable biases involved in a professional setting. Of course, there are undesirable biases against some minorities and women. There are other biases as well. There are indications that the prestige of the author can be a determining factor when judging a piece of work. People generally tend to rate people who are like themselves more highly. There are undesirable orthodoxy biases as well: uncommon ideas are far more difficult to defend even when the most common ideas have not been revisited lately. Conventional affiliations are more highly rated than unconventional affiliations.

Yet we should not immediately accept that hiding the identity of the author is the solution. The mere fact that we recognize a problem, and that there is some action related to the problem, does not imply that we must proceed with that action. Our tendency to do so relies on a fallacy known as the politician’s syllogism.

The Australian government, motivated by a study that claimed blind auditions helped women, conducted an extensive evaluation of blind interviews and found the following:
> This study assessed whether women and minorities are discriminated against in the early stages of the recruitment process for senior positions in the Australian Public Service (APS). It also tested the impact of implementing a ‘blind’ or de-identified approach to reviewing candidates. Over 2,100 public servants from 15 agencies participated in the trial. They completed an exercise in which they shortlisted applicants for a hypothetical senior role in their agency. Participants were randomly assigned to receive application materials for candidates in standard form or in de-identified form (with information about candidate gender, race and ethnicity removed). Overall, the results indicate the need for caution when moving towards ’blind’ recruitment processes in the APS, as de-identification may frustrate efforts aimed at promoting diversity.
To be clear, what they found was the reverse of what they were expecting: blinding interviews made things slightly worse for women. And what about the study showing that blind auditions helped women get hired by orchestras? Its statistical analysis does not stand up to scrutiny.
And the left-leaning New York Times has recently published an essay arguing that blind auditions make orchestras less diverse.
Clearly, we believe that we can effectively combat undesirable prejudices in hiring since most employers do not hire based on a double-blind process. PhD students submit their theses for review without hiding their names. Nobody is advocating that research papers be published anonymously as a rule. Nobody is advocating that we stop broadcasting the name of our employers, where we got our degrees and so forth. Nobody is advocating that when we report on a research result, we hide the name of the journal… Yet if we wanted to present _pure research results_, that is what we would do: hide affiliations, journal names, author names.

So why would we not want to hide the identity of the researchers during peer review despite the apparent advantages?

Firstly, the evidence for the benefits of double-blind peer reviews is a set of anecdotes. Double-blind experiments can bring biases to light the same way a microscope can show you a bacterium: they are great inquiry tools, but not necessarily cures. What is scientific fact is that people have biases, homophily, and that you can, up to a point, anonymize content. However, the evidence for benefits is mixed. It is not clear that it helps women, for example. Do we get more participation from people outside the major universities over time under double-blind peer review? We do not know. Major conferences that did switch to double-blind peer review, like NeurIPS, are heavily dominated by a few elite institutions with almost no outsiders.

Secondly, telling someone from a poorly known organization, from a poor or non-English country or from a non-dominant gender identity that they need to hide who they are to be treated fairly is not entirely a positive message. I certainly want to live in a world where a woman can publish her work as a woman. Stressing biases without properly addressing them can render fields unattractive to those who might suffer from these biases.

Another concern is that double-blind renders open scholarship difficult.
I have been posting most of my papers online, prior to peer review, on arXiv or other servers, sometimes years before they are even submitted. I write all my software openly, engaging freely with multiple engineers and researchers. I practice what I call open scholarship. Obviously, it means I cannot reasonably take part in double-blind venues. Making open scholarship more difficult seems like a step backward. You can argue that you can still anonymize your contributions, in a bureaucratic manner, for the few days that the review lasts. But such a proposal dismisses the fact that open scholarship is primarily a cultural practice founded on the idea that the research happens in free and open networks.

And what happens after the work has been accepted? When the referees are biased, why would the readers not be biased as well? What is more important, the readers or the reviewers? Do we write papers to be published or to be read? I vote for the latter without hesitation. Yet, at best, double-blind peer review might help with getting papers accepted, but it does nothing for post-publication assessment. It is almost as if we thought that the end goal of the game was to get the research published in prestigious venues. Are we all about maximizing the impact factor or do we care about producing impactful research?

If you are to be consistent with your beliefs and you promote double-blind peer review, then you should also demand that we stop cataloguing and broadcasting affiliations. At a minimum, we should downplay the names of the authors: if we include them at all, they should be at the end of the paper, in small characters. If you are consistent with your beliefs, you should never, ever, give lists of names with affiliations. It seems logically incoherent for someone from an elite institution to be arguing for double-blind peer review while visibly broadcasting their elite institution.
In part, I believe that they end up with such an illogical result because they start from a fallacy, the politician’s syllogism. The San Francisco Declaration on Research Assessment tells us: “When involved in committees making decisions about funding, hiring, tenure, or promotion, make assessments based on scientific content rather than publication metrics.” Focusing on how papers get accepted misses the point of what we want to value. Yet a direct consequence of double-blind peer review is to make highly selective paper acceptance socially and politically more sustainable.

There is no free lunch. Double-blind peer review is not without cost. Blank reported that authors from outside academia have a lower acceptance rate under double-blind peer review, presumably because reviewers, when they can, tend to give a chance to outsiders despite the fact that outsiders do not conform to the field’s orthodoxy as well as insiders may. Moreover, Blank indicates that double-blind peer review is overall harsher.
This “harsh” nature has been replicated and quantified. Double-blind peer review manuscripts are less likely to be successful than single-blind peer review manuscripts.
So there are unintended consequences to double-blind peer review. Having harsher reviews and lower acceptance rates may not be a positive. A student may think: “Why continue to seek approval, when you can leave science and do something else where you’ll be appreciated?”
And is the harsh nature entirely a side-effect? The introduction of double-blind peer review is partly justified by the mission we give the reviewers: select only the very best work. Once we relax this constraint on reviewers, double-blind peer review becomes much less necessary. In some sense, double-blind peer review is a way to make an elitist system socially acceptable.

If we want, for example, to increase the representation of women, there are potentially other means that are less intrusive and more positive, such as including more women in the peer review process as reviewers, editors and so forth. The same applies to other biases. For example, you should ensure that people from small colleges, or from poorer or non-English countries, are represented. And what about including people who have less orthodox ideas? What about including more outsiders? What about what Stonebraker might call “consumers of the research”? Look at the most desirable conferences in computer science that have adopted double-blind peer review. How many are chaired by people from non-elite institutions? When they organize plenary talks, how many speakers are from non-elite institutions?
At a minimum, if we want to get more constructive reviews, we should give serious consideration to the demand that pre-publication peer reviews be published. Transparency is a good, practical strategy to fight undesirable biases and get people to be more constructive. We should be mindful that blinding a process, everything else being equal, makes it less transparent. In an open system, if I give rave reviews to my friends, and harsh reviews to ideas that I hate, I risk being exposed. In a fully blinded process, I can always claim impartiality. When everyone is blinded bureaucratically, people with unacceptable biases can maintain plausible deniability should they ever be caught.

And here is another idea. Do we need the crazy low acceptance rates? In computer science, it is common that fewer than 15% of all papers are accepted. Do we realize that the outcome is unavoidably a power hierarchy controlled by a select few who pick the winners? By accepting more papers, we would necessarily make biases in peer review less harmful. We would reduce the power of the select few. Open access journals like PLOS One have shown that you can turn peer review away from a selection of the winners to a pruning of the bad research, with good results. The argument used to be that the conference was to be held in a hotel with only so many rooms, but Zoom and YouTube have millions of rooms. Of course, the downside then is that hiring and promotion committees cannot simply count the number of papers at prestigious venues and they must read the papers and discuss them. It is hard work. And the candidate can no longer just offer a list of papers, they have to explain why their work matters in a way that we can understand.

I do not think that the initial submission is the right time to judge the importance of a piece of work. If you look at even the best venues, most of the accepted papers are not impactful. That’s not the authors’ fault. It is just that really impactful work is rare and unpredictable. And it often takes time before we can recognize it. And different people will value different papers.
By insisting that referees can reliably select the very best work, we fail to take into account the thoroughly documented limitations of pre-publication peer review. In some sense, by making it look more objective, we make things worse. We should just acknowledge that pre-publication reviews are intrinsically limited and build the system with these limitations in mind.
Though the problems that double-blind peer review seeks to address are real and significant, double-blind peer review is itself a rather crude and pessimistic solution that has several undesirable consequences. We can do better.

(Presented at the ACM Publications Board Meeting, November 19th 2020)

FURTHER READING: Gender and peer review

UPDATE: I love Peer Review: Implementing a “publish, then review” model of publishing
APPENDIX: Some selected reactions from Twitter…

> I agree wholeheartedly with @lemire. Fighting nepotism with double blind is like trying to stop a mudslide with your bare hands. It’s a law that the fuzzier the criteria to measure quality, the more success (perceived value) depends on network effects https://t.co/gSTLeA3npL 1/n
>
> (Balázs Kégl, @balazskegl, November 25, 2020)

> This thread by @balazskegl and post by @lemire make some good points.
>
> I’ve never been a big fan of double-blind reviewing. Just like there’s “security theater,” double-blind reviewing seems like “objectivity theater.” It makes people feel better without necessarily helping. https://t.co/nLAZLCCmM3
>
> (Lev Reyzin, @lreyzin, November 25, 2020)
Posted on November 19, 2020 (updated December 3, 2020). Author:
Daniel Lemire. 31 comments on Double-blind peer review is a bad idea
SCIENCE AND TECHNOLOGY LINKS (NOVEMBER 14TH 2020)
* COVID-19 forced enterprises to move to remote work. There have been decades of research showing that allowing workers to work remotely improves job satisfaction and productivity. It improves work-family balance. It reduces sick leave. Not absolutely everything is positive, but much of it is. So why are employers reluctant to allow remote work? According to some researchers, it has to do with worker selection. That is, everything else being equal, if you recruit people to work from home, you will tend to disproportionately attract people who are lazy or incompetent. (I am not sure how broadly applicable this idea is.)
* There is increasing evidence that Alzheimer’s begins in the gut.
* The claim that more people are alive today than have ever died appears to be wrong.
* Schools adopt face recognition technology.
* Increasing your protein consumption is likely to make your body more muscular: _slightly increasing current protein intake for several months by 0.1 g/kg/d in a dose-dependent manner over a range of doses from 0.5 to 3.5 g/kg/d may increase or maintain lean body mass_.
* Is social science free from political biases? Despite what they assume, social scientists are probably not free from such biases, and the consequences are probably quite bad, say Honeycutt and Jussim.
For example, papers finding biases against women receive far more citations than papers failing to find such biases, even though the papers finding biases might be far weaker methodologically.
* Measured intelligence in human beings varies by ethnic origin. Lasker et al. attempt to relate this effect to both skin color and European ancestry. They find that skin color is not a significant variable, while European ancestry appears to correlate well with measured intelligence. The whole topic is often considered to be outside of the Overton window, and most social scientists would consider such inquiries to be unacceptable. I personally object to the current state of intelligence research on other grounds: as a computer scientist, I find that psychologists play with the concept of intelligence without ever defining it properly. That is, while you might be measuring something, you should make sure that you really understand what you are measuring. Someone’s height is a well-defined attribute, but “intelligence” is not a comparably well-defined attribute. That you can quantify “something” does not imply that you know what you are measuring. I challenge psychologists to relate intelligence to the Church-Turing thesis.
* In mammals, babies can often repair injuries without scars, but this ability is quickly lost and adults accumulate scars over time. There is a protein found in the skin of baby mice that is usually not present in adult mice. When this protein is applied to the skin of adult mice, the adult skin regains the baby-skin ability to regenerate without scars. In effect, this single protein rejuvenates the adult skin.
* According to Carlsmith, we might be within range of being able to match the human brain using maybe tens of thousands of powerful processors. Using current technology, it would be costly, though not for corporations like Google. In fact, the cost is sufficiently low that the work could be done in secret, if Carlsmith is right.
* 1% of the world’s population emits 50% of the CO2 from commercial aviation.
* Apple has released a processor/system for their laptops called M1. It powers both the recently released MacBook Air and the smaller MacBook Pro. It has 16 billion transistors. Unsurprisingly, maybe, that is more than the number of transistors that you can find in the latest iPhone, which has about 12 billion transistors. But the iPhone 7 had about 3.3 billion transistors. The iPhone 5s had about a billion transistors. If you look at long-term charts of the number of transistors inside our systems, we appear to be maintaining an exponential growth in the number of transistors. Interpreted as an exponential growth in the number of transistors in commonly available processors, Moore’s law is very much alive, even though we keep hearing that the end is in sight.
In turn, this unavoidably leads to higher and higher performance as our chips are able to do more per unit of time. Interestingly, the power usage itself also tends to fall. The early Pentium 4 mobile processors at the beginning of the century consumed 35 Watts for the processor alone: you can probably charge your whole iPhone for a day of use using 35 Watts for 15 minutes. For comparison, your brain consumes about 20 Watts.
* Though we do not have an AIDS (HIV) vaccine yet, we might finally have a drug that reliably protects us (at least women) from getting infected.
Posted on November 14, 2020. Author:
Daniel Lemire. 4 comments on Science and Technology links (November 14th 2020)
XBOX SERIES X AND PLAYSTATION 5: EARLY IMPRESSIONS
This week, my family got a copy of each new major game console: the Microsoft Xbox Series X and the Sony PlayStation 5. I haven’t yet had time to try them out well, but I know enough to give my first impressions.
They are both very similar machines on the inside: the same kind of processor, the same kind of memory. Reportedly, the Xbox Series X has a few more cores, and it might be the faster of the two, but it is only fair to say that they are close. They sell at a comparable price. Both of these consoles look at first glance like an incremental upgrade on the previous generation. Though the PlayStation 5 is much taller than a PlayStation 4, it is basically functionally the same. I just removed the PlayStation 4 and put the PlayStation 5 in its place. In another room where we tried it, it would not work, but that was due to a bad HDMI cable: using the PlayStation 5 with the provided HDMI cable solved the problem. Upgrading to the PlayStation 5, we were able to bring back all our games, and they appear to work well. The PlayStation 5 controller is like nothing I have ever experienced before. It is not that the Xbox Series X has a bad controller: they appear to be much the same. But the PlayStation 5’s “haptic feedback” feels like a form of virtual reality. You can feel textures, and water, and so forth. It is also much slicker looking than the PlayStation 4 controller. It remains to be seen whether game makers will take advantage of the new controller. Both machines offer a qualitatively different experience: they are both very fast. So much faster than the previous generation that you get a real leap. Everything is snappier. The PlayStation 5 has a fast but tiny disk. Our disk is already almost full. There is no way to expand it right now. You can connect an external drive, but it will only help you with legacy (e.g., PlayStation 4) games. This is going to be a big problem, and quickly. The Xbox Series X has a more reasonable disk, but it will also fill quickly.
They are both quiet game consoles. They generate a fair amount of heat, but they do so quietly. The Sony PlayStation 5 appears to fully support Bluetooth components, while the Xbox Series X only supports hardware adopting Microsoft’s proprietary wireless technology. The Xbox Series X has a legacy USB port that can be used to recharge your controller… and not much else. You cannot hook a microphone or speakers to it. To connect a speaker or a microphone to your Xbox Series X without going through the television, you have to hook it up to the controller through a dongle. The Sony PlayStation 5 has modern and seemingly fully functional USB ports. The Xbox Series X has a wide range of games available through Microsoft Game Pass, and the price is attractive. There are few new-generation titles, but the Xbox Series X makes up for it in volume. The fact that there are relatively few (if any) games exclusive to the Xbox Series X probably makes it a less exciting console if you already have an Xbox. The Sony PlayStation 5 can play your PlayStation 4 games, and it has a few interesting titles coming out. I am looking forward to receiving and trying Spider-Man: Miles Morales. Overall, it is a great time to be a gamer, especially if you can afford these consoles. If not, you might rejoice in the fact that used Xbox and PlayStation consoles just got cheaper.
Posted on November 14, 2020. Author:
Daniel Lemire. Leave a comment on Xbox Series X and PlayStation 5: early impressions
BENCHMARKING THEOREM PROVERS FOR PROGRAMMING TASKS: YICES VS. Z3
One neat family of tools that most programmers should know about is “theorem provers”. If you went to college in computer science, you may have been exposed to them… but you may not think of using them when programming.
Though I am sure that they can be used to prove theorems, I have never used them for such a purpose. They are useful for quickly checking some assumptions and finding useful constants. Let me give a simple example.
Odd unsigned integers in software (with wraparound arithmetic, as in C, C++ or Go) have multiplicative inverses. That is, if you are given the number 3, you can find another number such that when you multiply it by 3, you get 1. There are efficient algorithms to find such multiplicative inverses, but a theorem prover can do it without any fuss or domain knowledge. You can write the following Python program using the z3 Python bindings:
from z3 import *

s = Solver()
a = BitVec('a', 64)   # a 64-bit unsigned (wraparound) variable
s.add(a * 3 == 1)     # ask for a value whose product with 3 is 1
s.check()
print(s.model())
It will return 12297829382473034411. As 64-bit unsigned integers, if you multiply 12297829382473034411 by 3, you get back 1. If there were no possible solution, the theorem prover would tell us so. So it can find useful constants, or prove that no constant can be found. For some related tasks, I have been using the popular z3 theorem prover and it has served me well. But it can be slow at times. So I asked Geoff Langdale for advice and he recommended yices, another theorem prover that might be faster for the kind of work that programmers do, e.g., using fixed-width integer values. Though I trust Geoff, I wanted to derive some measures. So I built the following benchmark. For all integers between 0 and 1000, I try to find a multiplicative inverse. It will not always work (even numbers do not have an inverse), but the theorem prover is left to figure that out.
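The efficient algorithms mentioned above do not need a solver at all. As an illustration, here is a minimal Python sketch using Newton's iteration, a different technique from what the z3 and yices scripts do: each step doubles the number of correct low-order bits. The function name inverse_mod_2_64 is made up for this example, and since Python integers are unbounded, the sketch emulates 64-bit wraparound with a mask.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit unsigned wraparound in Python


def inverse_mod_2_64(a: int):
    """Return x such that (a * x) mod 2**64 == 1, or None when a is even."""
    if a % 2 == 0:
        return None  # even numbers share the factor 2 with 2**64: no inverse
    x = a  # for odd a, a * a == 1 (mod 8), so x is already correct to 3 bits
    for _ in range(5):  # correct bits double each step: 3, 6, 12, 24, 48, 96
        x = (x * (2 - a * x)) & MASK64
    return x


inv3 = inverse_mod_2_64(3)
print(inv3)                 # 12297829382473034411
print((3 * inv3) & MASK64)  # 1

# Same sweep as the benchmark: every integer from 0 to 1000.
found = sum(1 for n in range(1001) if inverse_mod_2_64(n) is not None)
print(found)  # 500 (only the odd numbers have an inverse)
```

The solver is still handy when you do not know such a closed-form trick, which is the point of the benchmark.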
What are the results?
z3: 15 s
yices: 1 s
So, at least in this one test, yices is 15 times faster than z3. My Python scripts are available.
You can install z3 and yices by using the standard pip tool. Be mindful that the yices library itself must also be present on your system, but the authors provide easy instructions. I found the Python interface of yices to be quite painful compared to z3. So if performance is not a concern, z3 might serve you well. But why refer to performance? Go back to the numbers above. To solve 1000 inverse problems in 15 s is really quite slow on a per-number basis: it is 15 ms per number, or on the order of 60 million CPU cycles per number, assuming a processor running at about 4 GHz. And it is an easy problem. As you start asking more complicated questions, a theorem prover can quickly slow down to the point of becoming unusable. Being able to go just 10x faster can make a large difference in practice.
CAVEAT: It is just one test and it does not, in any way, establish the superiority (in general) of yices over z3.
Posted on November 8, 2020. Author:
Daniel Lemire. 6 comments on Benchmarking theorem provers for programming tasks: yices vs. z3
HOW WILL THE PANDEMIC IMPACT SOFTWARE PROGRAMMING JOBS?
Software programming is not for everyone, but among the careers that are mostly unregulated, and thus mostly free from rents, it has consistently been one of the best choices. You can earn more money if you embrace some professions that are regulated (e.g., medical professions), but if you are a recent immigrant, or someone who could not afford a college education, programming is a decent and accessible choice.
I expect that what makes it a good avenue is a mix of different unique features:
* It is relatively easy to tell a good programmer from a bad one. It is hard to produce correct and efficient software “by accident”. Thus even if you lack the best credentials, you can still “prove” that you are good, quickly.
* It is one of the few industries that have been consistently innovating. Thus there are always new jobs created. Once we are done putting businesses online, mobile applications appear, and so forth.
So what happens when a pandemic happens and remote work suddenly becomes the norm? It is impossible to predict the future, but I like to put my views in concrete terms, with a time stamp on them. I have been programming for decades and my impression is that you do not learn to program by taking classes. Not really. You can learn the basics that way, but nothing close to what you need to be a productive member of the industry. In this respect, programming is not unique. I do not think you can take Japanese classes and expect to show up in Tokyo and be a functional member of the city. Simply put, there is much that is not formalized. In programming, there is also the additional problem that the best programmers are often doing something else besides teaching. It is entirely possible that the very best historians are also teaching, but the very best programmers are programming, not teaching. You do not become a computer science professor based on your programming skills. In fact, most computer science professors have never released generally useful software. Thankfully, you can learn to program on your own. My youngest son just finished a complete video game, written in C# using Unity. It should appear on Steam soon. I never taught my son any programming. Not really. He did take a few classes for fun, but he is almost entirely self-taught.
Yet human beings are social creatures. If you want to “up your game”, you need to see what the very best people are doing; you need to be challenged by them. It is possible online. My best advice to people who wanted to become good programmers was to go and work with a master. If you work with someone who is a very good programmer, you will learn. You will learn faster than you ever could on your own. I, myself, have learned a lot from the wide range of superb programmers I have had the pleasure of working with. Of course, it is still possible for a junior programmer to work with an experienced master despite the pandemic. However, my impression is that it has become harder. I can only base it on my limited view of the world, but I am much less interested in taking in new graduate students and research assistants today. I had a “lab”: a room filled with graduate students and a few research assistants. These people would come to work, I would come in at random times during the day, we would chat, we would look at code on the giant white board. Sometimes, on Fridays, we would play games. There are even rumours that beer was available at times. The room is still there. I am no longer showing up. The white board is probably blank (I don’t know). I use Zoom extensively, but I cannot believe that it has the same effect. The camaraderie is gone. My experience might be unique, but if it is at all representative of what is happening, I bet that many junior folks are getting much less personal training and coaching than they otherwise would. If that is correct…
I predict that there will be fewer new hires. I expect that inexperienced programmers will be less appealing than ever. Any challenge making training and coaching harder is bound to reduce their number.
Meanwhile, people who know what they are doing and can be relied upon to work well from home are going to be more in demand than ever. Since this describes the very best programmers earning the very best salaries, what this suggests is that the salary distribution will spread even more. A few top programmers will receive the salaries that would have otherwise gone to the younger programmers. It may also lead to some industry concentration. If it is harder to find “fresh blood”, then it is harder to start a new company. Many of the local tech talks had less to do with the speakers and more to do with meeting new faces and discussing employment. We have been told for years how the secret to Silicon Valley was in the impromptu meetings at the local burger joint… What happens when people work from home? If the narrative about Silicon Valley was at all true, then you would expect fewer new companies. Longer term, I do not believe that this should impact the innovation rate in the software industry. People will adjust. However, I think that short-term job prospects for younger programmers are going to be difficult.
CREDIT: This blog post is motivated by an exchange on Twitter with Richard Startin and Ben Adams.
Posted on November 1, 2020. Author:
Daniel Lemire. 3 comments on How will the pandemic impact software programming jobs?
SCIENCE AND TECHNOLOGY LINKS (OCTOBER 31ST 2020)
* Amazon has 1 million employees.
* “The iPhone 12 contains a Lidar. The first 3D Lidar was released a decade ago and cost $75,000.” (Calum Chace)
* There is water on the Moon, possibly enough to make fuel.
* Good-looking people have larger social networks and may receive favorable treatment from others, but it is a mixed blessing. They are better supported, but might also be enticed to party more and invest more in sex, which takes time away from work.
* It looks like the regular use of skin creams could reduce inflammation in your whole body and thus, possibly, keep you healthier. (Speculative.)
* You can predict someone’s height within a few centimeters from their genes.
* We found new salivary glands hidden under our skull’s base.
* People are driving forklifts remotely from an office.
* Toronto (the Canadian city) is going to try out automated shuttles.
* Genes may predict mathematical abilities and related brain volume.
* Bees have five eyes.
* In vitro (in the laboratory), we have been able to regenerate cartilage. This will not help you in the near future if you have joint pain, but people in the future may fare better.
* As we age, we accumulate senescent cells and they are believed to cause trouble. Senolytics are mildly toxic compounds that target senescent cells and destroy them. Researchers found that a particular senolytic proved capable of improving frailty and cognitive functions in old mice. There are ongoing clinical trials regarding senolytic drugs in human beings, but we still have some time to go.
* In A global decline in research productivity? Evidence from China and Germany, the authors verify recent results related to the United States, pointing out that while the number of researchers is steadily increasing, high-value outputs do not seem to increase at a similar rate. One possible implication of these results is that, keeping everything else equal, increasing the number of researchers is wasteful. In fact, it may suggest that we are overinvesting in the production of new researchers (i.e., we might be training too many PhDs). My own take is that we are insufficiently preoccupied with research productivity. We encourage researchers to write grant applications, publish papers, acquire rents (i.e., patents), but innovation is based on a “throw over the wall” model from the researcher’s point of view. A typical researcher believes that it is not his or her purpose to enhance products, cure diseases and so forth. The simplistic approach of “getting more researchers” may therefore not translate into new innovative products and cancer cures. To get to Mars, we may need more people like Elon Musk and Jeff Bezos, more Moon projects, and fewer new PhDs. Even if you disagree with this last assertion, the fact is that it becomes harder and harder to justify training more PhDs in the hope of getting more prosperity.
Posted on October 31, 2020. Author:
Daniel Lemire. 1 comment on Science and Technology links (October 31st 2020)
WHAT THE HECK IS THE VALUE OF “-N % N” IN PROGRAMMING LANGUAGES?
When coding efficient algorithms having to do with hashing, random number generation or even cryptography, a common construction is the expression “-n%n”. My experience has been that it confuses many programmers, so let us examine it further. To illustrate, let us look at the implementation of std::uniform_int_distribution found in the GNU C++ library (Linux) and clean up the line in question:
threshold = -range % range;
The percent sign (%) in this expression refers to the modulo operation. It returns the remainder of the integer division. To simplify the discussion, let us assume that range is strictly positive since dividing by zero causes problems. We should pay attention to the leading minus sign (-). It is the unary operator that negates a value, and not the subtraction sign. There is a difference between “-range % range” and “0-range % range”. They are not at all equivalent. They will actually give you different values; the latter expression is always zero. And that is because of operator precedence. The negation operation has precedence over the modulo operation, which has precedence over the subtraction operation. Thus you can rewrite “-range % range” as “(-range) % range”. And you can write “0-range % range” as “0-(range % range)”.
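To see these expressions side by side, here is a small Python sketch. Python has no fixed-width unsigned integers, so the sketch assumes that masking with 2^64 - 1 faithfully emulates 64-bit unsigned wraparound; the helper names are hypothetical. It also checks that, for unsigned values, -range % range is simply 2^64 mod range, and that the form (max - range + 1) % range gives the same answer.

```python
# Python integers are signed and unbounded, so this sketch emulates
# 64-bit unsigned wraparound by masking (an assumption of the sketch).
BITS = 64
MASK = (1 << BITS) - 1  # the largest 64-bit unsigned value ("max")


def neg_mod_unsigned(r: int) -> int:
    """-r % r for a 64-bit unsigned r: the negation wraps to 2**64 - r first."""
    return ((-r) & MASK) % r


def zero_minus_mod(r: int) -> int:
    """0 - r % r parses as 0 - (r % r), which is 0 - 0."""
    return (0 - (r % r)) & MASK


def max_minus_r_plus_1_mod(r: int) -> int:
    """(max - r + 1) % r, a form that avoids unary negation."""
    return (MASK - r + 1) % r


for r in (3, 10, 1000):
    print(r, neg_mod_unsigned(r), zero_minus_mod(r), max_minus_r_plus_1_mod(r))
# 3 1 0 1
# 10 6 0 6
# 1000 616 0 616

# With signed integers, as in Python itself or Java, -r % r is zero.
print(-10 % 10)  # 0
```

Because 2^64 - range and 2^64 differ by a multiple of range, both leave the same remainder, which is why the unsigned result is generally non-zero.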
When the variable range is a signed integer, then the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero. So let us assume that the variable range is an unsigned type, as it is meant to be. In such cases, the expression is generally non-zero. We need to compute -range. What does it mean to negate an unsigned value?
When the variable range is an unsigned type, Visual Studio is likely to be unhappy at the expression -range. A recent Visual Studio returns the following warning:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Nevertheless, I believe that it is a well-defined operation in C++, Go and many other programming languages. Jonathan Adamczewski has a whole blog post on the topic which suggests that the Visual Studio warning is best explained by a historical deviation from the C++ standard by the Microsoft Visual Studio team. (Note that the current Visual Studio team seems committed to the standards going forward.) My favorite definition is that -range is defined by range + (-range) = 0. That is, it is the value such that when you add it to range, you get zero. Mathematicians would say that it is the “additive inverse”. In programming languages (like Go and C++) where unsigned integers wrap around, there is always one, and only one, additive inverse to every integer value. You can define this additive inverse without the unary negation: if max is the maximum value that you can represent, then you can replace -range by max - range + 1. Or, maybe more simply, by (0-range). And indeed, in the Swift programming language, this particular line was represented as follows:
let threshold = (0 &- range) % range
The Swift language has two subtraction operations, one that is not allowed to overflow (the usual ‘-’), and one that is allowed to overflow (‘&-’). It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer. In C#, the system will not let you negate an unsigned integer and will instead cast it as a signed integer, so you have to go the long way around if you want to remain in unsigned mode, like so…
threshold = (uint.MaxValue - scale + 1) % scale
This expression is unfortunately type specific (here uint). To conclude: you can learn a lot just by examining one line of code. To put it another way, programming is a much deeper and more complex practice than it seems at first. As I was telling a student of mine yesterday: you are not supposed to read new code and understand it right away all of the time.
Posted on October 28, 2020 (updated November 6, 2020). Author:
Daniel Lemire. 17 comments on What the heck is the value of “-n % n” in programming languages?
RIDICULOUSLY FAST UNICODE (UTF-8) VALIDATION
One of the most common “data types” in programming is the text string. When programmers think of a string, they imagine that they are dealing with a list or an array of characters. It is often a “good enough” approximation, but reality is more complex. The characters must be encoded into bits in some way. Most strings on the Internet, including this blog post, are encoded using a standard called UTF-8. The UTF-8 format represents “characters” using 1, 2, 3 or 4 bytes. It is a generalization of the ASCII standard, which uses just one byte per character. That is, an ASCII string is also a UTF-8 string.
It is slightly more complicated because, technically, what UTF-8 describes are code points, and a visible character, like some emojis, can be made of several code points… but it is a pedantic distinction for most programmers.
There are other standards. Some older programming languages like C# and Java rely on UTF-16. In UTF-16, you use two or four bytes per character. It seemed like a good idea at the time, but I believe that the consensus is increasingly moving toward using UTF-8 all the time, everywhere.
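To check these byte counts, here is a quick Python sketch; Python's str type holds code points, which makes the comparison easy. The sample characters and the helper name byte_counts are my own choices for illustration, including a thumbs-up emoji with a skin-tone modifier: one visible character made of two code points.

```python
def byte_counts(text: str):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for a string."""
    # "utf-16-le" avoids the 2-byte byte-order mark that plain "utf-16" adds.
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = [
    "A",                     # ASCII
    "\u00e9",                # é
    "\u20ac",                # €
    "\U0001F600",            # grinning face emoji
    "\U0001F44D\U0001F3FD",  # thumbs up + skin-tone modifier: one visible character
]
for text in samples:
    print(ascii(text), byte_counts(text))

# An ASCII string has the same bytes in ASCII and in UTF-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True
```

The counts come out as (1, 1, 2), (1, 2, 2), (1, 3, 2), (1, 4, 4) and (2, 8, 8): UTF-8 uses 1 to 4 bytes per code point, UTF-16 uses 2 or 4.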
What most character encodings have in common is that they are subject to constraints and that these constraints must be enforced. To put it another way, not any random sequence of bits is UTF-8. Thus you must validate that the strings you receive are valid UTF-8. Does it matter? It does. For example, Microsoft’s web server had a security vulnerability whereby one could send URIs that would appear to the security checks as being valid and safe, but once interpreted by the server, would allow an attacker to navigate on the disk of the server. Even if security is not a concern, you almost surely want to reject invalid strings before you store them in your database, as they are a form of corruption. So your programming languages, your web servers, your browsers, your database engines, all validate UTF-8 all of the time. If your strings are mostly just ASCII strings, then checks are quite fast and UTF-8 validation is no issue. However, the days when all of your strings were reliably ASCII strings are gone. We live in the world of emojis and international characters. Back in 2018, I started wondering… How fast can you validate UTF-8 strings?
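In Python, for instance, the built-in UTF-8 decoder in its default strict mode performs this validation. Here is a minimal sketch; the helper name is_valid_utf8 and the sample inputs are mine, chosen to show typical violations such as an overlong encoding of '/', the sort of non-canonical input that naive path checks can miss. It is a convenience, not a fast validator.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper name)."""
    try:
        data.decode("utf-8")  # the default "strict" error handler raises on bad input
        return True
    except UnicodeDecodeError:
        return False


examples = {
    b"caf\xc3\xa9": "valid: 'cafe' with an accented e",
    b"\xc0\xaf": "overlong encoding of '/'",
    b"\xed\xa0\x80": "encodes a UTF-16 surrogate (U+D800)",
    b"\xf4\x90\x80\x80": "beyond U+10FFFF",
    b"\xe2\x82": "truncated three-byte sequence",
    b"\x80": "lone continuation byte",
}
for data, description in examples.items():
    print(data, is_valid_utf8(data), description)
```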
The answer I got back then is a few CPU cycles per character. That may seem satisfying, but I was not happy. It took years, but I believe we have now arrived at what might be close to the best one can do: the lookup algorithm. It can be more than ten times faster than common fast alternatives. We wrote a research paper about it: Validating UTF-8 In Less Than One Instruction Per Byte (to appear at Software: Practice and Experience). We have also published our benchmarking software.
Because we have a whole research paper to explain it, I will not go into the details, but the core insight is quite neat. Most of the UTF-8 validation can be done by looking at pairs of successive bytes. Once you have identified all violations that you can detect by looking at all pairs of successive bytes, there is relatively little left to do (per byte).
Our processors all have fast SIMD instructions. They are instructions that operate on wide registers (128 bits, 256 bits, and so forth). Most of them have a “vectorized lookup” instruction that can take, say, 16 byte values (in the range 0 to 15) and look them up in a 16-byte table. Intel and AMD processors have the pshufb instruction that matches this description. A value in the range 0 to 15 is sometimes called a nibble; it spans 4 bits. A byte is made of two nibbles (the low and high nibble). In the lookup algorithm, we call a vectorized lookup instruction three times: once on the low nibble, once on the high nibble and once on the high nibble of the next byte. We have three corresponding 16-byte lookup tables. By choosing them just right, the bitwise AND of the three lookups will allow us to spot any error. Refer to the paper for more details, but the net result is that you can validate almost entirely a UTF-8 string using roughly 5 lines of fast C++ code without any branching… and these 5 lines validate blocks as large as 32 bytes at a time…
simd8<uint8_t> classify(simd8<uint8_t> input, simd8<uint8_t> previous_input) {
  // the byte preceding each byte of input (taken from previous_input at the start)
  auto prev1 = input.prev<1>(previous_input);
  auto byte_1_high = prev1.shift_right<4>().lookup_16(table1);   // high nibble of first byte
  auto byte_1_low = (prev1 & 0x0F).lookup_16(table2);            // low nibble of first byte
  auto byte_2_high = input.shift_right<4>().lookup_16(table3);   // high nibble of second byte
  return (byte_1_high & byte_1_low & byte_2_high);               // non-zero bits flag errors
}
It is not immediately obvious that this would be sufficient and 100% safe. But it is. You only need a few inexpensive additional technical steps. The net result is that on recent Intel/AMD processors, you need just under an instruction per byte to validate even the worst random inputs, and because of how streamlined the code is, you can retire more than three such instructions per cycle. That is, we use a small fraction of a CPU cycle (less than 1/3) per input byte in the worst case on a recent CPU. Thus we consistently achieve speeds of over 12 GB/s.
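To make the three-lookup idea in classify concrete, here is a toy scalar emulation in Python. The tables are deliberately simplified and are not the tables from the paper or from simdjson: they only detect three kinds of errors (a lead byte not followed by a continuation byte, a continuation byte following an ASCII byte, and the overlong two-byte leads 0xC0 and 0xC1). Each list indexing stands in for one lane of a vectorized lookup such as pshufb; the names are hypothetical.

```python
# Toy error bits (simplified; the real algorithm uses more bits and steps).
TOO_SHORT = 1 << 0   # lead byte followed by a non-continuation byte
TOO_LONG = 1 << 1    # ASCII byte followed by a continuation byte
OVERLONG_2 = 1 << 2  # 0xC0 or 0xC1 lead followed by a continuation byte
CARRY = TOO_SHORT | TOO_LONG  # bits the low-nibble table always lets through

# Indexed by the high nibble of the first byte of each pair.
BYTE_1_HIGH = ([TOO_LONG] * 8               # 0x0_..0x7_: ASCII
               + [0] * 4                    # 0x8_..0xB_: continuation
               + [TOO_SHORT | OVERLONG_2]   # 0xC_: two-byte lead
               + [TOO_SHORT] * 3)           # 0xD_..0xF_: other leads
# Indexed by the low nibble of the first byte: only 0xC0/0xC1 keep OVERLONG_2.
BYTE_1_LOW = [CARRY | OVERLONG_2] * 2 + [CARRY] * 14
# Indexed by the high nibble of the second byte.
BYTE_2_HIGH = ([TOO_SHORT] * 8                  # ASCII
               + [TOO_LONG | OVERLONG_2] * 4    # continuation
               + [TOO_SHORT] * 4)               # lead byte


def classify_pairs(data: bytes):
    """Return one error mask per pair of successive bytes (0: no error found)."""
    errors = []
    for prev, cur in zip(data, data[1:]):
        # Three 16-entry lookups, combined with a bitwise AND, as in classify.
        mask = (BYTE_1_HIGH[prev >> 4]
                & BYTE_1_LOW[prev & 0x0F]
                & BYTE_2_HIGH[cur >> 4])
        errors.append(mask)
    return errors


for data in (b"caf\xc3\xa9", b"a\x80", b"\xc3a", b"\xc0\xaf"):
    print(data, classify_pairs(data))
# b'caf\xc3\xa9' [0, 0, 0, 0]
# b'a\x80' [2]
# b'\xc3a' [1]
# b'\xc0\xaf' [4]
```

Note how no single table decides anything: an error bit survives only when all three lookups agree, which is what lets a few 16-entry tables encode many byte-pair rules at once.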
The lesson is that while lookup tables are useful, vectorized lookup tables are fundamental building blocks for high-speed algorithms. If you need to use the fast lookup UTF-8 validation function in a production setting, we recommend that you go through the simdjson library (version 0.5 or better). It is well tested and has features to make your life easier, like runtime dispatching. Though the simdjson library is motivated by JSON parsing, you can use it to just validate UTF-8 even when there is no JSON in sight. The simdjson library supports 64-bit ARM and x64 processors, with fallback functions for other systems. We package it as a single header file along with a single source file, so you can pretty much just drop it into your C++ project.
CREDIT: Muła popularized more than anyone the vectorized classification technique that is key to the lookup algorithm. To my knowledge, Keiser first came up with the three-lookup strategy. To my knowledge, the first practical (non-hacked) SIMD-based UTF-8 validation algorithm was crafted by K. Willets. Several people, including Z. Wegner, showed that you could do better. Travis Downs also provided clever insights on how to accelerate conventional algorithms.
FURTHER READING: If you like this work, you may like Base64 encoding and decoding at almost the speed of a memory copy (Software: Practice and Experience 50 (2), 2020) and Parsing Gigabytes of JSON per Second (VLDB Journal 28 (6), 2019).
Posted on October 20, 2020. Author:
Daniel Lemire. 20 comments on Ridiculously fast unicode (UTF-8) validation
SCIENCE AND TECHNOLOGY LINKS (OCTOBER 17TH 2020)
* Computer vision (i.e., artificial intelligence) and cameras are used in London to monitor citizens with respect to social distancing.
* A fecal transplant from old mice to young mice appears to “age the young mice”. It appears that the reverse might also work: fecal transplants from the young to the old could “rejuvenate the old”. (Speculative.)
* Using high levels of vitamin D supplements appears safe and has benefits for diseases like asthma and psoriasis.
* Obesity and type 2 diabetes are associated with low bone turnover along with an increased fracture risk, but we do not understand why.
* Men have greater variability in brain structure, and that is true at all ages.
Posted on October 17, 2020. Author:
Daniel Lemire. Leave a comment on Science and Technology links (October 17th 2020)
WHY IS 0.1 + 0.2 NOT EQUAL TO 0.3?
In most programming languages, the value 0.1 + 0.2 differs from 0.3. Let us try it out in Node (JavaScript):
> 0.1 + 0.2 == 0.3
false
Yet 1 + 2 is equal to 3. Why is that? Let us look at it a bit more closely. In most instances, your computer will represent numbers like 0.1 or 0.2 using binary64.
In this format, numbers are represented using a 53-bit mantissa (an integer between $2^{52}$ and $2^{53}$) multiplied by a power of two. When you type 0.1 or 0.2, the computer does not represent 0.1 or 0.2 exactly.
Instead, it tries to find the closest possible value. For the number 0.1, the best match is 7205759403792794 times $2^{-56}$, or about 0.10000000000000000555. Importantly, this is slightly larger than 0.1. The computer could have used 7205759403792793 times $2^{-56}$, or about 0.099999999999999991, instead, but it is a slightly worse approximation. For 0.2, the computer will use 7205759403792794 times $2^{-55}$, or about 0.2000000000000000111. Again, this is just slightly larger than 0.2.
What about 0.3? The computer will use 5404319552844595 times $2^{-54}$, or approximately 0.29999999999999998889776975, so just under 0.3. When the computer adds 0.1 and 0.2, it no longer has any idea what the original numbers are. It only has 0.10000000000000000555 and 0.2000000000000000111. When it adds them together, it seeks the best approximation to the sum of these two numbers. The exact sum of the two approximations is 5404319552844595.5 times $2^{-54}$, exactly halfway between two representable values, and the tie is broken toward the even mantissa. So the computer picks a value just above 0.3: 5404319552844596 times $2^{-54}$, or approximately 0.30000000000000004440. And that is why 0.1 + 0.2 is not equal to 0.3 in software. When you chain different sequences of approximations, even if the exact values would be equal, there is no reason to expect that your approximations will match.
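The mantissas and exponents quoted above can be checked directly. The following C++ sketch is not from the original post: the helper name significand53 is hypothetical, and it assumes a positive, normal double. It relies on std::frexp, which splits a value into a fraction in [0.5, 1) and a power of two, and on std::ldexp to scale that fraction by $2^{53}$, which yields the 53-bit integer mantissa exactly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: decompose a positive, normal binary64 value x as
// m * 2^e, where m is the 53-bit integer mantissa (2^52 <= m < 2^53).
std::int64_t significand53(double x, int& e) {
  assert(std::isnormal(x) && x > 0);
  int exp2 = 0;
  double frac = std::frexp(x, &exp2); // x == frac * 2^exp2, frac in [0.5, 1)
  e = exp2 - 53;                      // so x == (frac * 2^53) * 2^(exp2 - 53)
  // frac * 2^53 is an integer below 2^53, so the conversion is exact.
  return static_cast<std::int64_t>(std::ldexp(frac, 53));
}
```

Calling significand53 on 0.1 + 0.2 should give 5404319552844596 with an exponent of -54, whereas 0.3 gives 5404319552844595 with the same exponent: the two results differ by one unit in the last place, which is enough to make the comparison false.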
If you are working a lot with decimals, you can try to rely on another computer type, the decimal. It is much slower, but it does not have this exact problem since it is designed specifically for decimal values. In Python, for example:
>>> from decimal import Decimal
>>> Decimal(1)/Decimal(10) + Decimal(2)/Decimal(10)
Decimal('0.3')
However, decimals have other problems:
>>> Decimal(1)/Decimal(3)*Decimal(3) == Decimal(1)
False
What is going on? Just as binary64 cannot represent 0.1 exactly, the decimal type cannot represent 1/3 exactly: Decimal(1)/Decimal(3) is rounded to a fixed number of digits (0.333…3), and multiplying it by 3 yields 0.999…9 instead of 1. Why can’t computers support numbers the way human beings do?
Computers can do computations the way human beings do. For example, WolframAlpha has none of the problems above. Effectively, it gives the impression that it processes values a bit like human beings do. But it is slow. You may think that, since computers are so fast, there is no reason to put up with such inconveniences for the sake of speed. And that may well be true, but many software projects that start out believing that performance is irrelevant end up being asked to optimize later. And it can be really difficult to engineer speed back into a system that sacrificed performance at every step. Speed matters.
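To make the trade-off concrete, here is a minimal C++ sketch of exact rational arithmetic. This is a different technique from the decimal type, and it is not a claim about how WolframAlpha works. The Rational type is hypothetical: it stores a numerator and a positive denominator in lowest terms using std::gcd (C++17), and it ignores overflow of its 64-bit fields for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical exact rational number: num/den in lowest terms, den > 0.
// Overflow of the 64-bit fields is not handled (sketch only).
struct Rational {
  std::int64_t num;
  std::int64_t den;

  Rational(std::int64_t n, std::int64_t d = 1) : num(n), den(d) {
    assert(d != 0 && "denominator must be nonzero");
    if (den < 0) { // keep the sign in the numerator
      num = -num;
      den = -den;
    }
    std::int64_t g = std::gcd(num, den); // g >= 1 because den > 0
    num /= g;
    den /= g;
  }
};

Rational operator+(Rational a, Rational b) {
  return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}

Rational operator*(Rational a, Rational b) {
  return Rational(a.num * b.num, a.den * b.den);
}

Rational operator/(Rational a, Rational b) {
  return Rational(a.num * b.den, a.den * b.num); // asserts if b is zero
}

// Lowest terms with a positive denominator make the representation unique,
// so equality is a field-by-field comparison.
bool operator==(Rational a, Rational b) {
  return a.num == b.num && a.den == b.den;
}
```

With this type, 1/10 + 2/10 equals 3/10 and (1/3) × 3 equals 1 exactly. The price is that every operation needs several multiplications and a gcd computation, and numerators and denominators can grow quickly over long computations, which is the kind of slowdown the paragraph above warns about.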
AMD recently released its latest processors (zen3). They are expected to be 20% faster than their previous family of processors (zen2). This 20% performance boost is viewed as a remarkable achievement. Going only 20% faster is worth billions of dollars to AMD.
Posted on October 10, 2020 (updated October 12, 2020) by Daniel Lemire
SCIENCE AND TECHNOLOGY LINKS (OCTOBER 3RD 2020)
* The mortality rate for kids under five has fallen by 60% since 1990.
* Samsung’s new storage drives are both affordable and really fast (up to 7 GB/s).
* Alzheimer’s disease may be driven by overactivation of cerebral fructose metabolism. The researchers write: _we propose that Alzheimer’s disease is a modern disease driven by changes in dietary lifestyle in which fructose can disrupt cerebral metabolism and neuronal function_.
* Belly fat increases your mortality risk: _A nearly J shaped association was found between waist circumference and waist-to-height ratio and the risk of all cause mortality in men and women_. It does not appear that being fat in other ways is harmful: _larger hip circumference and thigh circumference were associated with a lower risk_. So instead of weighing yourself, you ought to watch your waist circumference.
* Elderly people in Finland are living longer while also being fitter and healthier.
* Investing in innovation pays: _innovation efforts produce social benefits that are many multiples of the investment costs_.
* Gender difference in occupational preferences is largely independent of individual, parental, and regional controls:
> We document that female apprentices tend to choose occupations that are oriented towards working with people, while male apprentices tend to favor occupations that involve working with things. In fact, our analysis suggests that this variable is by any statistical measure among the most important proximate predictors of occupational gender segregation.
Posted on October 3, 2020 by Daniel Lemire
HOW EXPENSIVE IS INTEGER-OVERFLOW TRAPPING IN C++?
Integers in programming languages have a valid range, but arithmetic operations can result in values that exceed such ranges. For example, adding two large integers can result in an integer that cannot be represented in the integer type. We often refer to such error conditions as overflows. In a programming language like Swift, an overflow will cause the program to abort its execution. The rationale is that once an arithmetic operation has failed, everything else the program might be doing is suspect and you are better off aborting the program. Most other programming languages are not so cautious. For example, a Rust program compiled in release mode will not abort by default. In C++, most compilers will simply ignore the overflow. However, popular compilers give you choices. When using GCC and clang, you can specify that signed integer overflows should result in a program crash (abort) using the -ftrapv flag. I was curious about the performance implications, so I wrote a small program that simply adds all of the values in a large array.
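The benchmark program is not reproduced here. As a hedged sketch of the kind of kernel described, assuming an array of int values (the element type used in the benchmark is not stated), the hypothetical function sum_all below adds every element. Building the same file once normally and once with -ftrapv, using GCC or clang, is how one would compare the two settings.

```cpp
#include <vector>

// Hypothetical kernel: add all values of a (large) array using signed int
// arithmetic. Built with -ftrapv, GCC and clang check each signed addition
// and abort the program on overflow; without the flag, signed overflow is
// undefined behavior and no check is emitted.
int sum_all(const std::vector<int>& values) {
  int total = 0;
  for (int v : values) {
    total += v; // the signed addition that -ftrapv instruments
  }
  return total;
}
```

For instance, one might compile with g++ -O2 and again with g++ -O2 -ftrapv (the optimization level here is an assumption), then time sum_all on a vector of millions of small values and divide by the element count to get a ns/int figure.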
The answer I sought turns out to depend critically on the choice of compiler:
                GCC 9          clang 9
no trapping     0.17 ns/int    0.11 ns/int
trapping        2.1 ns/int     0.32 ns/int
slowdown        12x            3x
With no trapping, the clang compiler beats GCC (0.11 vs. 0.17 ns/int) by a 50% margin, but this should not preoccupy us too much: it is a single microbenchmark.
What is a lot more significant is that enabling overflow trapping in GCC incurs an order-of-magnitude slowdown. Though it is only one microbenchmark, the size of the effect suggests that we should be concerned. Looking at the assembly, I find that the clang compiler generates sensible code on x64 processors, with simple jumps taken when an overflow is detected. Meanwhile, GCC seems to call poorly optimized runtime library functions. Overall, this one test does establish that checking for overflows can be expensive.
CREDIT: This blog post was motivated by an email from Stefan Kanthak.
Posted on September 23, 2020 by Daniel Lemire
SCIENCE AND TECHNOLOGY LINKS (SEPTEMBER 19TH 2020)
* A large city dating back 4,300 years has been discovered in China. It predates the Chinese civilization. At its center was a wide pyramid supporting a 20-acre palace. Little is known about the people living there other than the fact that they had relatively advanced technology.
* Genetic testing of dead warriors from 3000 years ago in Germany reveals that they could not drink milk. It is unclear when Europeans began to drink milk.
* Video games might be especially beneficial to boys, as they improve their literacy.
* Anti‐inflammatory treatment rescues age-related memory deficits in mice.
* People today live several years longer than they did thirty years ago. The core of the improvement is due to a reduction of deaths due to heart disease. Meanwhile, drug overdoses are more lethal than they were.
Posted on September 19, 2020 by Daniel Lemire