HOW TO PRINT STRINGS AND INTEGERS IN INTEL ASSEMBLY ON Print strings. Then, to print, we will call the sys_write system call: The value in %eax (4) indicates the system call we need (sys_write). The 1 in %ebx indicates that we want to write to the console. Finally, the last two parameters indicate the string to print and the size of the string. In Intel assembly, the int instruction launches an

CATCH: A POWERFUL YET SIMPLE C++ TEST FRAMEWORK Recently, I came across a new test framework for C++ programs: Catch. Until I found Catch, I was using Boost Test Framework. It works quite well, but the problem is that you need to build Boost and link to the Boost Test Framework, which is not very convenient.

INSTALL AND USE CLANG STATIC ANALYZER ON A CMAKE PROJECT I recently started a bit of work on my compiler (eddic) again. I started by adapting it to build on Clang with libc++. There were some minor adaptations to make it compile, but nothing really fancy. It n

C++17 MIGRATION OF EXPRESSION TEMPLATES LIBRARY (ETL In ETL, I have a make_temporary function. This function either forwards an ETL container or creates a temporary container from an ETL expression. This is based on a compile-time trait. The return type of the function is not the same in both cases. What you did in those cases before C++17 was use SFINAE and make two functions: template

DEEP LEARNING LIBRARY 1.0 Deep Learning Library 1.0 - Fast Neural Network Library. I'm very happy to announce the release of the first version of Deep Learning Library (DLL) 1.0. DLL is a neural network library with a focus on speed and ease of use. I started working on this library about 4 years ago for my Ph.D. thesis.

USE TEMPLIGHT AND TEMPLAR TO DEBUG C++ TEMPLATES templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system -std=c++14 main.cpp. All the templight options start with -Xtemplight, and then you can use any clang++ options. This will generate an a.memory.trace.pbf file in the current directory. You can then run Templar and use File > Open Trace to open the trace file.

C++11 CONCURRENCY TUTORIAL Indeed, in most cases, the std::atomic operations are implemented with lock-free operations that are much faster than locks. The C++11 Concurrency Library introduces atomic types as a template class: std::atomic. You can use any type you want with that template, and the operations on that variable will be atomic and so thread-safe.

SIMPLIFY YOUR TYPE TRAITS WITH C++14 VARIABLE TEMPLATES Often, if you write templated code, you have to write and use a lot of different traits. In this article, I'll focus on the traits that represent values, typically a boolean value. For instance,

SHORT REVIEW OF BULLSEYE COVERAGE Short review of Bullseye Coverage. Bullseye is a commercial code coverage analyzer. It is fully featured, with an export to HTML, to XML and even a specific GUI to see the application. It costs about $800, with a renewal fee of about $200 per year. I'm currently using gcov and passing the results to Sonar. This works well, but there are several
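
To make the std::atomic teaser above concrete, here is a minimal sketch of a shared counter incremented from several threads without a mutex. The helper name count_with_atomic and its parameters are assumptions made for this illustration and do not come from the original article. Note that std::atomic<T> requires T to be trivially copyable, and whether a given specialization is actually lock-free depends on the platform.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper (not from the original article): several threads
// increment one shared std::atomic<int>, and the final value is returned.
int count_with_atomic(int threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(threads);

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counter; // atomic read-modify-write, no lock required
            }
        });
    }

    // Wait for every worker before reading the result.
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

With a plain int instead of std::atomic<int>, the same loop would be a data race and the result would be unpredictable.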

VIVALDI + VIMIUM = FINALLY NO MORE FIREFOX! First, you have to display only the Vivaldi button in the settings page. Then, you can use this custom CSS: to hide the title completely! To get rid of the scroll bar, you need to use the Stylish extension and use this custom CSS: If you want to have full HTML5 video support, you need to install extra codecs.

C++17 MIGRATION OF EXPRESSION TEMPLATES LIBRARY (ETL if constexpr. The most exciting new thing in C++17 for me is the if constexpr statement. This is a really, really great thing. In essence, it's a normal if statement, but with one very important difference. The statement that is not taken (the else if the condition is true, or the if constexpr if the condition is false) is discarded. And what is interesting is what happens to discarded statements:

C++11 CONCURRENCY TUTORIAL In the previous article, we saw how to use mutexes to fix concurrency problems. In this post, we will continue to work on mutexes with more advanced techniques. We

JENKINS DECLARATIVE PIPELINE AND AWESOME GITHUB This worked quite well. Later on, Jenkins introduced the notion of Pipeline. Instead of a single set of commands to be executed, the build was defined as a multi-stage pipeline of commands. This is defined as a Groovy script. One big advantage of this is

RELEASE OF ZAPCC 1.0 Release of zapcc 1.0 - Fast C++ compiler. If you remember, I recently wrote about zapcc C++ compilation speed against gcc 5.4 and clang 3.9, in which I was comparing the beta version of zapcc against gcc and clang. I have just been informed that zapcc was released in version 1.0. I thought it was a good occasion to test it again.
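
The if constexpr teaser above, together with the make_temporary description from the ETL migration article, can be illustrated with a small sketch. Everything here is an assumption made for the example: is_container, iota_expr and make_temporary_sketch are hypothetical stand-ins, not ETL's actual types or implementation. The point is only that the discarded branch does not take part in return type deduction, so a single function can return a reference in one case and a value in the other.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: marks the types this sketch treats as containers.
template <typename T>
struct is_container : std::false_type {};

template <typename T>
struct is_container<std::vector<T>> : std::true_type {};

// Hypothetical "expression": evaluates lazily to the values 0..n-1.
struct iota_expr {
    int n;

    std::vector<int> evaluate() const {
        std::vector<int> out(n);
        for (int i = 0; i < n; ++i) {
            out[i] = i;
        }
        return out;
    }
};

// Forwards containers untouched, evaluates expressions into a temporary.
// Only the branch that is taken contributes to the deduced return type.
template <typename T>
decltype(auto) make_temporary_sketch(T&& value) {
    if constexpr (is_container<std::decay_t<T>>::value) {
        return std::forward<T>(value); // reference to the original container
    } else {
        return value.evaluate();       // new container, returned by value
    }
}
```

Before C++17, the same behaviour would typically need two overloads selected with SFINAE, which is the approach the earlier teaser describes.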

HOW TO SPEED UP RAID (5-6) GROWING WITH MDADM? Increase speed limits. The easiest thing to do is to increase the system speed limits on raid. You can see the current limits on your system by using these commands: sysctl dev.raid.speed_limit_min and sysctl dev.raid.speed_limit_max. These values are set in kibibytes per second (KiB/s).

HOW TO COMPUTE METRICS OF C++ PROJECT USING CCCC CCCC (C and C++ Code Counter) is a little command-line tool that generates metrics from the source code of a C or C++ project. The output of the tool is a simple HTML website with information about all your sources. CCCC generates not only information about the number of lines of code for each of your modules, but also complexity metrics like

HOW TO INSTALL A SPECIFIC VERSION OF GCC ON UBUNTU 11.04 Sometimes you need to install a specific version of gcc for some reason, for example when you need to have the same compiler version as the

BLOG BLOG("BAPTISTE WICHT"); Retirement Calculator. The biggest novelty in this version is the addition of a retirement calculator. This is still very basic, but it may give information on how

C++11 PERFORMANCE TIP: WHEN TO USE STD::POW? Update: I've added a new section for larger values of n. Recently, I've been wondering about the performance of std::pow(x, n). I'm talking here about the case when n is an integer. In the case when n is not an integer, I believe you should always use std::pow or another specialized library. In the case when n is an integer, you can actually replace it with the direct equivalent (for instance

C++ CONTAINERS BENCHMARK: VECTOR/LIST/DEQUE AND PLF Overall, for insertions, the vector and deque are the fastest for small types and the list is the fastest for the very large types. colony offers medium performance on this benchmark but is quite stable for different data types. When you know the size of the collection, you should always use reserve()

USE CLANG-TIDY FOR STATIC ANALYSIS AND INTEGRATION IN Use clang-tidy for static analysis and integration in Sonarqube. clang-tidy is an extensive linter for C++. It provides a complete framework for analysis of C++ code. Some of the checks are very simple, but some of them are very complete, and most of the checks from the clang-static-analyzer are integrated into clang-tidy.
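
The std::pow teaser above talks about replacing the call with a "direct equivalent" when n is an integer. As a minimal sketch of what that can look like for n = 3 (the function names cube_direct and cube_pow are assumptions for this illustration, and no performance claim is made here):

```cpp
#include <cmath>

// Hypothetical helper: x^3 written out as plain multiplications.
inline double cube_direct(double x) {
    return x * x * x;
}

// The same computation through the standard library.
inline double cube_pow(double x) {
    return std::pow(x, 3);
}
```

Both functions compute the same mathematical value; which one is faster depends on the compiler, the flags and the exponent, which is what the original article measures.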

MANAGE COMMAND-LINE OPTIONS WITH BOOST PROGRAM OPTIONS That's where Boost Program Options enters the game! Boost Program Options is one of the Boost C++ Libraries. It is a very powerful library to handle command-line options. You define all the options of the program and then Boost Program Options takes care of everything. It parses the command line, handles errors, gets values and even displays help.

NAMED OPTIONAL TEMPLATE PARAMETERS TO CONFIGURE A CLASS AT Extracting integral values. We will start with the parameter a that holds a value of type int with a default value of 1. Here is one way of writing it: struct a_id; template <int value> struct a : std::integral_constant<int, value> { using type_id = a_id; };. So, a is simply an integral constant with another typedef, type_id. Why do we need this id? Because a is a type template, we cannot
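
To show how a parameter like a could be looked up by its type_id, here is a minimal sketch. The struct a and a_id follow the teaser above; the second parameter b, its b_id, and the extractor get_value_int are hypothetical names introduced for this example and are not claimed to match the original article's code.

```cpp
#include <type_traits>

struct a_id;
struct b_id;

// Parameter "a" as described in the teaser above.
template <int value>
struct a : std::integral_constant<int, value> {
    using type_id = a_id;
};

// Hypothetical second parameter, only here to show that order does not matter.
template <int value>
struct b : std::integral_constant<int, value> {
    using type_id = b_id;
};

// Hypothetical extractor: yields the value of the first argument whose
// type_id is ID, or the default D when no argument matches.
template <typename ID, int D, typename... Args>
struct get_value_int : std::integral_constant<int, D> {};

template <typename ID, int D, typename T, typename... Args>
struct get_value_int<ID, D, T, Args...>
    : std::conditional_t<std::is_same<typename T::type_id, ID>::value,
                         std::integral_constant<int, T::value>,
                         get_value_int<ID, D, Args...>> {};
```

For example, get_value_int<a_id, 1, b<7>, a<5>>::value picks up the 5 from a<5>, while get_value_int<a_id, 1, b<7>>::value falls back to the default of 1.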

VIVALDI + VIMIUM = FINALLY NO MORE FIREFOX! How I replaced Pentadactyl with Vivaldi and Vimium. I've been using the Pentadactyl Firefox extension for a long time. This extension "vimifies" Firefox and it does a very good job of it.

INTEGER LINEAR TIME SORTING ALGORITHMS The numbers are impressive. In-place counting sort is 3-4 times faster than std::sort, and radix sort is twice as fast as std::sort! Bin sort does not perform very well, and counting sort, even if generally faster than std::sort, does not scale very well. Let's test with more duplicates (m = n /

C++11 CONCURRENCY TUTORIAL In the previous article, we saw advanced techniques about mutexes. In this post, we will continue to work on mutexes with more advanced techniques. We

C++ CONTAINERS BENCHMARK: VECTOR/LIST/DEQUE AND PLF More than three years ago, I wrote a benchmark of some of the STL containers, namely the vector, the list and the deque. Since this article was very popular, I decided to improve the benchmarks and collect all the results again.

USE TEMPLIGHT AND TEMPLAR TO DEBUG C++ TEMPLATES C++ has some very good tools to debug, profile and analyze source files and executables. This all works well for standard runtime programs. But when you are using templates, you sometimes want these tools to act at compile time.

PVS-STUDIO ON C++ LIBRARY REVIEW PVS-Studio is a commercial static analyzer for C, C++ and C#. It works on both Windows and Linux. It has been a long time since I wanted to

test it on my

your sources.

SHORT REVIEW OF BULLSEYE COVERAGE Bullseye is a commercial Code Coverage analyzer. It is fully-featured with an export to HTML, to XML and even a specific GUI to see the application.It costs about 800$, with a renewal fee of about 200$ per

year.

Blog blog("Baptiste Wicht");


BUDGETWARRIOR 1.0.1: ALLOCATION TRACKING, RETIREMENT CALCULATOR AND BUG FIXES

POSTED: 2018-04-03 10:58 -------------------------

I'm happy to announce the release of budgetwarrior 1.0.1. This new version contains a series of improvements over the 1.0 version and some new features.

I haven't been very active this last month. I have been working a bit on budgetwarrior for features I needed for my budget. I've also been contacted with questions on my thor operating system and since that point I've been doing some work on thor as well.

This new version of budgetwarrior has quite a few new features even though it's a minor version.

Note: The data from all the views is totally randomized and does not make sense ;)

RETIREMENT CALCULATOR

The biggest novelty in this version is the addition of a retirement calculator. This is still very basic, but it may give information on how close (or far) you are from early retirement. Here is what the view gives you:

Using your annual withdrawal rate and expected Rate Of Return, it can compute how many years you will need to reach your goal of Financial Independence (FI). It will also give you your FI ratio and some more information about your savings rate, income, expenses and so on. It's nothing very fancy but it can be very useful.
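To give an idea of the kind of computation involved, here is a minimal sketch in C++. The formula is an assumption (FI target = yearly expenses divided by the withdrawal rate, savings added once per year, returns compounded yearly) and the names `years_to_fi` and `fi_ratio` are hypothetical; this is not taken from the budgetwarrior sources.

```cpp
#include <cstddef>

// Hypothetical helper, not budgetwarrior's actual code.
// Assumptions: the FI target is yearly_expenses / withdrawal_rate,
// savings are added once per year and the return compounds yearly.
inline std::size_t years_to_fi(double net_worth, double yearly_savings, double yearly_expenses,
                               double withdrawal_rate, double rate_of_return) {
    const double target = yearly_expenses / withdrawal_rate;

    std::size_t years = 0;
    // Cap the loop so that an unreachable goal cannot spin forever
    while (net_worth < target && years < 200) {
        net_worth = net_worth * (1.0 + rate_of_return) + yearly_savings;
        ++years;
    }
    return years;
}

// FI ratio: fraction of the target already reached (1.0 means FI is reached)
inline double fi_ratio(double net_worth, double yearly_expenses, double withdrawal_rate) {
    return net_worth / (yearly_expenses / withdrawal_rate);
}
```

With a 4% withdrawal rate, the target under these assumptions is 25 times the yearly expenses, which is the usual rule of thumb.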

NEW FEATURES

I've also added a few graphs based on the budget information. The first is the visualization of the expenses over time:

This can be pretty useful to see how your expenses are going. Even if your income is going up, expenses should not necessarily go up (you should save more!).

Another new view can show your asset allocation over time and the current asset allocation of your entire net worth or specifically for your portfolio.

This is also really useful if you want to have a global view of your asset allocation into bonds, stocks and such.

There are also two other new minor features. You can now search expenses by name. This is really useful once you start having many expenses. Another new view is the Full aggregate view. Before, you could aggregate your expenses by month or year, now they can be aggregated since the beginning of the budget. With this, you can see how much you have spent on coffee since you started keeping track of your budget. For me, it's a lot! Both these features are available in the command line and in the web interface.

IMPROVEMENTS

There are also a few improvements with this new version. You can now set a default account (in the configuration file with default_account=X). It will be set by default in both the web view and the console view. The rebalance view has been made clearer. I've added a second batch update view with only the assets that are being used (amount > 0). And lastly, the yearly overview is now correctly displaying the savings rate of the previous year.

Finally, there are also a few bug fixes. That is the main reason I decided to release now. If you were using assets with different currencies, several views were not correctly using the exchange rate to display them. Moreover, the average expenses in the monthly overview were not correct. Finally, if you were editing old expenses after having archived the accounts, they could be edited with the wrong account.

INSTALLATION

If you are on Gentoo, you can install it using layman:

    layman -a wichtounet
    emerge -a budgetwarrior

If you are on Arch Linux, you can use this AUR repository (wait a few days for the new version to be updated).

For other systems, you'll have to install from sources:

    git clone --recursive git://github.com/wichtounet/budgetwarrior.git
    cd budgetwarrior
    git checkout 1.0.1
    make
    sudo make install

If you want to test the server mode, the default username is admin and the default password is 1234. You can change them in the configuration file with web_user and web_password.

CONCLUSION

Although it's a minor version, it improves and fixes quite a few things, especially for the web view. I encourage you to try it out. Don't hesitate to leave me a comment if you fail to use it or don't understand something ;)

There are still a few things that I want to do, as I said when I introduced the web version. The website still needs to be made faster. And the communication between the console and the server can also be improved.

If you are interested in the sources, you can download them on Github: budgetwarrior.

If you have a suggestion or you found a bug, please post an issue on Github.

If you have any comment, don't hesitate to contact me, either by leaving a comment on this post or by email.


I GOT RID OF VIVALDI BROWSER FOR GOOGLE CHROME

POSTED: 2018-03-16 08:31 -------------------------

About a year ago, I switched from Firefox to Vivaldi. This week, I decided to get rid of Vivaldi and replace it with Google Chrome. In this post, I'm going to outline the reasons why I got rid of it.

Before, I switched to Vivaldi because Firefox was dropping support for XUL/XPCOM extensions and I was using Pentadactyl. In fact, Pentadactyl was the only reason I was using Firefox. It was slow and bloated and a bit unstable, but the extension was making it worth it. Since they are dropping support for such extensions, I did not want to use Firefox anymore. So I switched to Vivaldi with Vimium. It's not as great as Firefox plus Pentadactyl. But it's a more customizable version of Google Chrome on which it's based.

But, in that year or so of using Vivaldi, I have had many issues. Some of them were not too bad and there were some workarounds. But they continued to pile up and none of them was fixed, so now I have decided it's too much.

Since the beginning, it always has been slow. It's not really bad, but still noticeable compared to Chrome. Especially opening Vivaldi is pretty bad. This is something I can live with, but they should really do something to make it faster.

The thing that I had the most issues with is multimedia. For instance Youtube (but all the other platforms have the same issues). The first problem with media is to get a video in fullscreen. Most of the time, when I press the fullscreen button on Youtube, it grays out the screen and I have to press ESC. If I do that around five to ten times, it finally goes fullscreen. It may be because of my multi-monitor setup but Google Chrome has no issues whatsoever with that. It's pretty painful to do, but again I could live with it since I don't use full screen a lot.

A second problem I had with media is that videos were running too fast. I'm not kidding, really too fast, not too slow. The media was running about twice as fast as normal, you could see the seconds going fast on Youtube. I have never seen this issue in any other tool, but it kept happening after starting Vivaldi. The fix was to restart Vivaldi every time this happened and then the video played normally.

Another problem I had from the beginning is to make all HTML5 videos work. You have to download the binary plugins from Chrome to let Vivaldi play all HTML5 videos. It's not a big deal, but the problem is that they are overwritten after each update of Vivaldi. So you have to do it all the time.

A new media issue I had on the last update of Vivaldi is with Flash. At the beginning it was working even if it was outdated. I just had to confirm to run it with a warning. But, since the last update, I only had the warning that it was outdated. But I could not confirm to use it, the option was not there anymore. And it was still happening after I updated Flash... The only option to run Flash was to use a private navigation window...

And finally, I had another big issue with the last version of Vivaldi as well. The browser keeps crashing on my work computer. It can stay up a few minutes and then it crashes: the complete interface is not updated anymore. I can still press the tabs and I can see the title of the window change, but the interface does not update. Again, it may come from my special window manager (I use awesome), but it's the only application not working...

With all these issues and especially the last two new problems, I decided it was time to cut my losses. So I reinstalled Google Chrome, transferred my plugins and everything worked like a charm. I still use Vimium to use vim bindings so my usage of the browser does not change.

Of course, I don't have the customization that I had with Vivaldi. I would really really like to get rid of the address bar in the browser. I would also like to significantly reduce the size of the tab bar. But I prefer to live without these improvements than with so many bugs. I think Vivaldi is a good idea, but with a terrible implementation.

I also considered qutebrowser as an alternative. But for now it's still missing many features that I don't want to get rid of. So I will stay with Google Chrome for the time being.

What about you? Do you have any experience with Vivaldi?


DECREASE DLL NEURAL NETWORK COMPILATION TIME WITH C++17

POSTED: 2018-02-07 11:39 -------------------------

Just last week, I migrated my Expression Templates Library (ETL) to C++17, and it is now also done in my Deep Learning Library (DLL). In ETL, this resulted in _much nicer code overall_, but no real improvement in compilation time.

The objective of the migration of DLL was two-fold. First, I also wanted to simplify some code, especially with if constexpr. But I also especially wanted to try to reduce the compilation time. In the past, I've already tried a few changes with C++17, with good results on the compilation of the entire test suite. While this is very good, this is not very representative of users of the library. Indeed, normally you'll have only one network in your source file, not several. The new changes will especially help in the case of many networks, but less in the case of a single network per source file.

This time, I decided to test the compilation on the examples. I've tested the nine official examples from the DLL library:

* mnist_dbn: A fully-connected Deep Belief Network (DBN) on the MNIST data set with three layers
* char_cnn: A special CNN with embeddings and merge and group layers for text recognition
* imagenet_cnn: A 12 layers Convolutional Neural Network (CNN) for Imagenet
* mnist_ae: A simple two-layers auto-encoder for MNIST
* mnist_cnn: A simple 6 layers CNN for MNIST
* mnist_deep_ae: A deep auto-encoder for MNIST, only fully-connected
* mnist_lstm: A Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM) cells
* mnist_mlp: A simple fully-connected network for MNIST, with dropout
* mnist_rnn: A simple RNN with simple cells for MNIST

This is really representative of what users can do with the library and I think it's a much better benchmark for compilation time. For reference, you can find the source code of all the examples online.

RESULTS

Let's start with the results. I've tested this at different stages of the migration with clang 5 and GCC 7.2. I tested the following steps:

* The original C++14 version
* Simply compiling in C++17 mode (-std=c++17)
* Using the C++17 version of the ETL library
* Upgrading DLL to C++17 (without ETL)
* ETL and DLL in C++17 versions

I've compiled each example independently in release_debug mode. Here are the results for G++ 7.2:

| Example           | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| C++14             | 37.818 | 32.944 | 33.511 | 15.403 | 29.998 | 16.911 | 24.745 | 18.974 | 19.006 |
| -std=c++17        | 38.358 | 32.409 | 32.707 | 15.810 | 30.042 | 16.896 | 24.635 | 19.134 | 19.027 |
| ETL C++17         | 36.045 | 31.000 | 30.942 | 15.322 | 28.840 | 16.747 | 24.151 | 18.208 | 18.939 |
| DLL C++17         | 35.251 | 32.577 | 32.854 | 15.653 | 29.758 | 16.851 | 24.606 | 19.098 | 19.146 |
| Final C++17       | 32.289 | 31.133 | 30.939 | 15.232 | 28.753 | 16.526 | 24.326 | 18.116 | 17.819 |
| Final Improvement | 14.62% | 5.49%  | 7.67%  | 1.11%  | 4.15%  | 2.27%  | 1.69%  | 4.52%  | 6.24%  |

The difference by just enabling C++17 is not significant. On the other hand, some significant gain can be obtained by using the C++17 version of ETL, especially for the DBN version and for the CNN versions. Except for the DBN case, the migration of DLL to C++17 did not bring any significant advantage. When everything is combined, the gains are more important :) In the best case, the example is 14.6% faster to compile.
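For reference, the "Final Improvement" row can be recomputed from the first and last rows of the table. The helper below is only a small sketch with a hypothetical name (`improvement_percent`), not part of DLL or of the benchmark scripts:

```cpp
// Relative compile-time improvement, in percent, between a baseline and a new timing
constexpr double improvement_percent(double before, double after) {
    return (before - after) / before * 100.0;
}

// Example 0 with G++ 7.2: 37.818 for C++14 and 32.289 for the final C++17 version
constexpr double example_0_gain = improvement_percent(37.818, 32.289); // about 14.62
```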

Let's see if it's the same with clang++ 5.0:

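| Example           | 0      | 1      | 2      | 3      | 4      | 5      | 6      | 7      | 8      |
|-------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| C++14             | 40.690 | 34.753 | 35.488 | 16.146 | 31.926 | 17.708 | 29.806 | 19.207 | 20.858 |
| -std=c++17        | 40.502 | 34.664 | 34.990 | 16.027 | 31.510 | 17.630 | 29.465 | 19.161 | 20.860 |
| ETL C++17         | 37.386 | 33.008 | 33.896 | 15.519 | 30.269 | 16.995 | 28.897 | 18.383 | 19.809 |
| DLL C++17         | 37.252 | 34.592 | 35.250 | 16.131 | 31.782 | 17.606 | 29.595 | 19.126 | 20.782 |
| Final C++17       | 34.470 | 33.154 | 33.881 | 15.415 | 30.279 | 17.078 | 28.808 | 18.497 | 19.761 |
| Final Improvement | 15.28% | 4.60%  | 4.52%  | 4.52%  | 5.15%  | 3.55%  | 3.34%  | 3.69%  | 5.25%  |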

First of all, as I have seen time after time, clang is still slower than GCC. It's not a big difference, but still significant. Overall, the gains are a bit higher on clang than on GCC, but not by much. Interestingly, the migration of DLL to C++17 is less interesting in terms of compilation time for clang. It seems even to slow down compilation on some examples. On the other hand, the migration of ETL is more important than on GCC.

Overall, every example is faster to compile using both libraries in C++17, but we don't have spectacular speed-ups. With clang, we have speedups from 3.3% to 15.3%. With GCC, we have speedups from 1.1% to 14.6%. It's not very high, but I'm already satisfied with these results.

C++17 IN DLL

Overall, the migration of DLL to C++17 was quite similar to that of ETL. You can take a look at my previous article if you want more details on the C++17 features I've used.

I've _replaced a lot of SFINAE functions_ with if constexpr. I've also replaced a lot of static_if with if constexpr. There was a large number of these in DLL's code. I also enabled all the constexpr that were commented out, waiting for this exact time :)

I was also thinking that I could replace a lot of meta-programming stuff with _fold expressions_. While I was able to replace a few of them, most of them were harder to replace with fold expressions. Indeed, the variadic pack is often hidden behind another class and therefore the pack is not directly usable from the network class or the group and merge layers classes. I didn't want to start a big refactoring just to use a C++17 feature, the current state of this code is fine.

I made some use of structured bindings as well, but again not as much as I was thinking. In fact, a lot of the time, I'm assigning the elements of a pair or tuple to existing variables, not declaring new variables, and unfortunately, you can only use structured bindings with an auto declaration.
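The limitation can be illustrated with a small sketch. The function `find_first_even` and the two callers are hypothetical and only serve the example; they are not DLL code:

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical function returning two values at once
inline std::pair<std::size_t, bool> find_first_even(const int* values, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (values[i] % 2 == 0) {
            return {i, true};
        }
    }
    return {n, false};
}

inline std::size_t with_new_variables(const int* values, std::size_t n) {
    // Structured bindings always declare new variables
    auto [index, found] = find_first_even(values, n);
    return found ? index : n;
}

inline std::size_t with_existing_variables(const int* values, std::size_t n) {
    std::size_t index = 0;
    bool found = false;
    // Assigning to variables that already exist still needs std::tie
    std::tie(index, found) = find_first_even(values, n);
    return found ? index : n;
}
```

Both callers return the same result; the second form is the one that cannot be converted to structured bindings.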

Overall, the _code is significantly better now_, but there was less impact than there was on ETL. It's also a smaller code base, so maybe this is normal and my expectations were too high ;)

CONCLUSION

The trunk of DLL is now a C++17 library :) I think this improves the quality of the code by a nice margin! Even though there is still some work to be done to improve the code, especially for the DBN pretraining code, the quality is quite good now. Moreover, the switch to C++17 made neural networks using the DLL library _faster to compile_, from 1.1% in the worst case to 15.3% in the best case!

I don't know when I will release the next version of DLL, but it will take some time. I'll especially have to polish the RNN support and add a sequence to sequence loss before I release the 1.1 version of DLL.

I'm quite satisfied with C++17 even if I would have liked a few more features to play with! I'm already a big fan of if constexpr, this can make the code much nicer and fold expressions are much more intuitive than their previous recursive template counterpart. I may also consider migrating some parts of the cpp-utils library, but if I do, it will only be through the use of conditionals in order not to break the other projects that are based on the library.


C++17 MIGRATION OF EXPRESSION TEMPLATES LIBRARY (ETL)

POSTED: 2018-02-02 14:03 -------------------------

I've finally decided to migrate my Expression Templates Library (ETL) project to C++17. I've been talking about doing that for a long time and I've published several releases without making the change, but the next version will be a C++17 library. The reason why I didn't want to rush the change was that this means the library needs a very recent compiler that may not be available to everybody. Indeed, after this change, the ETL library now needs at least GCC 7.1 or Clang 4.0.

I've already made some experiments in the past. For instance, by using if constexpr, I've managed to speed up compilation by 38% and I've also written an article about the fold expressions introduced in C++17. But I haven't migrated a full library yet. This is now done with ETL. In this article, I'll try to give some examples of improvements by using C++17.

This will only cover the C++17 features I'm using in the updated ETL library, I won't cover all of the new C++17 features.

IF CONSTEXPR

The most exciting new thing in C++17 for me is the if constexpr statement. This is a really really great thing. In essence, it's a normal if statement, but with one very important difference. The statement that is not taken (the else if the condition is true, or the if constexpr if the condition is false) is _discarded_. And what is interesting is what happens to _discarded_ statements:

* The body of a _discarded_ statement does not participate in return type deduction.
* The _discarded_ statement is not instantiated
* The _discarded_ statement can _odr-use_ a variable that is not defined
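As a minimal, self-contained sketch of the first two points (the functions `describe` and `length_of` are hypothetical and unrelated to ETL), note that the second point only applies inside a template, when the condition depends on a template parameter:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Point 1: the two branches return different types, only the branch that is
// taken takes part in the deduction of the return type.
template <typename T>
auto describe(const T& value) {
    if constexpr (std::is_arithmetic_v<T>) {
        return value * 2; // returns the (promoted) arithmetic type
    } else {
        return std::string("not a number"); // returns std::string
    }
}

// Point 2: value.size() is never instantiated for arithmetic types,
// because that branch is discarded when T is arithmetic.
template <typename T>
std::size_t length_of(const T& value) {
    if constexpr (std::is_arithmetic_v<T>) {
        return 1;
    } else {
        return value.size();
    }
}
```

With a normal if, `length_of(3.0)` would not compile since a double has no size() member.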

Personally, I'm especially interested in points 1 and 2. Let's start with an example where point 1 is useful. In ETL, I have a make_temporary function. This function either forwards an ETL container or creates a temporary container from an ETL expression. This is based on a compile-time trait. The return type of the function is not the same in both cases. What you did in those cases before C++17 was to use SFINAE and write two functions:

    template <typename E, cpp_enable_iff(is_dma<E>)>
    decltype(auto) make_temporary(E&& expr) {
        return std::forward<E>(expr);
    }

    template <typename E, cpp_enable_iff(!is_dma<E>)>
    decltype(auto) make_temporary(E&& expr) {
        return force_temporary(std::forward<E>(expr));
    }

One version of the function will forward and the other version will force a temporary and the return type can be different since these are two different functions. This is not bad, but still requires two functions where you only want to write one. However, in C++17, we can do much better using if constexpr:

    template <typename E>
    decltype(auto) make_temporary(E&& expr) {
        if constexpr (is_dma<E>) {
            return std::forward<E>(expr);
        } else {
            return force_temporary(std::forward<E>(expr));
        }
    }

I think this version is really superior to the previous one. We only have one function and the logic is much clearer!

Let's now see an advantage of point 2. In ETL, there are two kinds of matrices, matrices with compile-time dimensions (fast matrices) and matrices with runtime dimensions (dynamic matrices). When they are used, for instance for a matrix multiplication, I use static assertions for fast matrices and runtime assertions for dynamic matrices. Here is an example for the validation of the matrix-matrix multiplication:

    template <typename A, typename B, typename C, cpp_enable_iff(!all_fast<A, B, C>)>
    static void check(const A& a, const B& b, const C& c) {
        static_assert(all_2d<A, B, C>, "Matrix multiplication needs matrices");

        cpp_assert(
            dim<1>(a) == dim<0>(b)         //interior dimensions
                && dim<0>(a) == dim<0>(c)  //exterior dimension 1
                && dim<1>(b) == dim<1>(c), //exterior dimension 2
            "Invalid sizes for multiplication");

        cpp_unused(a);
        cpp_unused(b);
        cpp_unused(c);
    }

    template <typename A, typename B, typename C, cpp_enable_iff(all_fast<A, B, C>)>
    static void check(const A& a, const B& b, const C& c) {
        static_assert(all_2d<A, B, C>, "Matrix multiplication needs matrices");

        static_assert(
            dim<1, A>() == dim<0, B>()         //interior dimensions
                && dim<0, A>() == dim<0, C>()  //exterior dimension 1
                && dim<1, B>() == dim<1, C>(), //exterior dimension 2
            "Invalid sizes for multiplication");

        cpp_unused(a);
        cpp_unused(b);
        cpp_unused(c);
    }

Again, we use SFINAE to distinguish the two different cases. In that case, we cannot use a normal if since the value of the dimensions cannot be taken at compile-time for dynamic matrices, more precisely, some templates cannot be instantiated for dynamic matrices. As for the cpp_unused, we have to use it in the static version because the parameters are not used there, and in the dynamic version because they won't be used if the assertions are not enabled. Let's use if constexpr to avoid having two functions:

    template <typename A, typename B, typename C>
    static void check(const A& a, const B& b, const C& c) {
        static_assert(all_2d<A, B, C>, "Matrix multiplication needs matrices");

        if constexpr (all_fast<A, B, C>) {
            static_assert(dim<1, A>() == dim<0, B>()         //interior dimensions
                              && dim<0, A>() == dim<0, C>()  //exterior dimension 1
                              && dim<1, B>() == dim<1, C>(), //exterior dimension 2
                          "Invalid sizes for multiplication");
        } else {
            cpp_assert(dim<1>(a) == dim<0>(b)         //interior dimensions
                           && dim<0>(a) == dim<0>(c)  //exterior dimension 1
                           && dim<1>(b) == dim<1>(c), //exterior dimension 2
                       "Invalid sizes for multiplication");
        }

        cpp_unused(a);
        cpp_unused(b);
        cpp_unused(c);
    }

Since the _discarded_ statement won't be instantiated, we can now use a single function! We also avoid duplicating the first static assertion and the cpp_unused statements. Pretty great, right? But we can do better with C++17. Indeed, it added a nice new attribute, [[maybe_unused]]. Let's see what this gives us:

    template <typename A, typename B, typename C>
    static void check([[maybe_unused]] const A& a, [[maybe_unused]] const B& b, [[maybe_unused]] const C& c) {
        static_assert(all_2d<A, B, C>, "Matrix multiplication needs matrices");

        if constexpr (all_fast<A, B, C>) {
            static_assert(dim<1, A>() == dim<0, B>()         //interior dimensions
                              && dim<0, A>() == dim<0, C>()  //exterior dimension 1
                              && dim<1, B>() == dim<1, C>(), //exterior dimension 2
                          "Invalid sizes for multiplication");
        } else {
            cpp_assert(dim<1>(a) == dim<0>(b)         //interior dimensions
                           && dim<0>(a) == dim<0>(c)  //exterior dimension 1
                           && dim<1>(b) == dim<1>(c), //exterior dimension 2
                       "Invalid sizes for multiplication");
        }
    }

No more need for the cpp_unused trick :) This attribute tells the compiler that a variable or parameter can sometimes be unused, so the compiler does not emit a warning for it. The only thing that is not great with this attribute is that it's too long, 16 characters. It almost doubles the width of my check function signature. Imagine if you have more parameters, you'll soon have to use several lines. I wish there was a way to set an attribute for all parameters together or a shortcut. I'm considering whether to use a short macro in place of it, but haven't yet decided.

Just a note, if you have else if statements, you need to set them as constexpr as well! This was a bit weird for me, but you can think of it this way: if the condition is constexpr, then the if (or else if) is constexpr as well.

Overall, I'm really satisfied with the new if constexpr! This really makes the code much nicer in many cases, especially if you abuse metaprogramming like I do.

You may remember that I've coded a version of static if with C++14 in the past. This was able to solve point 2, but not point 1 and was much uglier. Now we have a good solution to it. I've replaced two of these in the current code with the new if constexpr.

FOLD EXPRESSIONS

For me, fold expressions are the second major feature of C++17. I won't go into too much detail here, since I've already talked about fold expressions in the past. But I'll show two examples of refactorings I've been able to do with this.

Here was the size() function of a static matrix in ETL before:

    static constexpr size_t size() {
        return mul_all<Dims...>;
    }

The Dims parameter pack comes from the declaration of fast_matrix:

    template <typename T, typename ST, order SO, size_t... Dims>
    struct fast_matrix_impl;

And mul_all is a simple helper that multiplies each value of the variadic parameter pack:

    template <size_t F, size_t... Dims>
    struct mul_all_impl final : std::integral_constant<size_t, F * mul_all_impl<Dims...>::value> {};

    template <size_t F>
    struct mul_all_impl<F> final : std::integral_constant<size_t, F> {};

    template <size_t F, size_t... Dims>
    constexpr size_t mul_all = mul_all_impl<F, Dims...>::value;

Before C++17, the only way to compute this result at compilation time was to use template recursion, either with types or with constexpr functions. I think this is pretty heavy just for computing a product. Now, with fold expressions, we can manipulate the parameter pack directly and rewrite our size function:

    static constexpr size_t size() {
        return (Dims * ...);
    }

This is much better! This clearly states that all the values of the parameter pack should be multiplied together. For instance 1,2,3 will become 1 * (2 * 3), since this is a unary right fold.
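As a standalone sketch (the function names `product` and `product_or_one` are hypothetical and not part of ETL), the same fold can be tried outside of a matrix class. A unary fold over `*` is ill-formed for an empty pack, so the second variant uses a binary fold with an initial value:

```cpp
#include <cstddef>

// Unary right fold: product<1, 2, 3>() expands to 1 * (2 * 3)
template <std::size_t... Dims>
constexpr std::size_t product() {
    return (Dims * ...);
}

// Binary left fold with an initial value, so that an empty pack is valid
// and yields 1: product_or_one<2, 3>() expands to ((1 * 2) * 3)
template <std::size_t... Dims>
constexpr std::size_t product_or_one() {
    return (std::size_t(1) * ... * Dims);
}

static_assert(product<1, 2, 3>() == 6, "1 * (2 * 3)");
static_assert(product_or_one<>() == 1, "an empty pack gives the initial value");
```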

Another place where I was using this was to code a trait that tests if a set of booleans are all true at compilation time:

    template <bool... B>
    constexpr bool and_v = std::is_same<
        cpp::tmp_detail::bool_list<true, B...>,
        cpp::tmp_detail::bool_list<B..., true>>::value;

I was using a nice trick here to test if all booleans are true. I don't remember where I picked it up, but it's quite nice and very fast to compile.

This was used for instance to test that a set of expressions are all single-precision floating points:

    template <typename... E>
    constexpr bool all_single_precision = and_v<(is_single_precision<E>)...>;

Now, we can get rid of the and_v trait and use the parameter pack directly:

    template <typename... E>
    constexpr bool all_single_precision = (is_single_precision<E> && ...);

I think using fold expressions results in much clearer syntax and better code and it's a pretty nice feature overall :)

As a note here, I'd like to mention that you can also use this syntax to call a function on each argument that you have, which makes for much nicer syntax as well and I'll be using that in DLL once I migrate it to C++17.
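Calling a function on each argument is done with a fold over the comma operator. The sketch below uses hypothetical helpers (`push_all`, `all_positive`) that are not part of ETL or DLL:

```cpp
#include <vector>

// Calls values.push_back(x) once for every argument, from left to right,
// using a fold over the comma operator
template <typename... Args>
void push_all(std::vector<int>& values, Args... args) {
    (values.push_back(args), ...);
}

// The same idea with the && operator: true if every argument is positive
template <typename... Args>
constexpr bool all_positive(Args... args) {
    return ((args > 0) && ...);
}
```

For instance, `push_all(v, 1, 2, 3)` appends 1, 2 and 3 in that order, and an empty pack is valid for both the comma and the && folds.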

MISCELLANEOUS

There are also a few more C++17 features that I've used to improve ETL, but that have a bit less impact.

A very nice feature of C++17 is the support for structured bindings. Often you end up with a function that returns several pieces of information in the form of a pair or a tuple or even a fixed-size array. You can use an object for this, but if you don't, you end up with code that is not terribly nice:

    size_t index;
    bool result;
    float alpha;
    std::tie(index, result, alpha) = my_function();

It's not terribly bad, but in these cases, you should be hoping for something better. With C++17, you can do better:

    auto [index, result, alpha] = my_function();

Now you can directly use auto to deduce the types of the three variables at once and you can get all the results in the variables at once as well :) I think this is really nice and some projects can really profit from it. In ETL, I've almost no use for this, but I'm going to be using that a bit more in DLL.

Something really nice to clean up the code in C++17 is the ability to declare nested namespaces in one line. Before, if you had a nested namespace etl::impl::standard for instance, you would do:

    namespace etl {
    namespace impl {
    namespace standard {

    // Something inside etl::impl::standard

    } // end of namespace standard
    } // end of namespace impl
    } // end of namespace etl

In C++17, you can do:

    namespace etl::impl::standard {

    // Something inside etl::impl::standard

    } // end of namespace etl::impl::standard

I think it's pretty neat :)

Another very small change is the ability to use the typename keyword in place of the class keyword when declaring template template parameters. Before, you had to declare:

    template <template <typename> class X>

now you can also use:

    template <template <typename> typename X>

It's just some syntactic sugar, but I think it's quite nice.

The last improvement that I want to talk about is one that probably very few know about but it's pretty neat. Since C++11, you can use the alignas(X) specifier for types and objects to specify on how many bytes you want to align these. This is pretty nice if you want to align on the stack. However, this won't always work for dynamic memory allocation. Imagine this struct:

    struct alignas(128) test_struct { char data; };

If you declare an object of this type on the stack, you have the guarantee that it will be aligned on 128 bytes. However, if you use new to allocate it on the heap, you don't have such a guarantee. Indeed, the problem is that 128 is greater than the maximum default alignment. This is called an over-aligned type. In such cases, the result will only be aligned on the max alignment of your system. Since C++17, new supports aligned dynamic memory allocation of over-aligned types. Therefore, you can use a simple alignas to allocate dynamic over-aligned types :) I need this in ETL for matrices that need to be aligned for vectorized code. Before, I was using a larger array with some padding in order to find an aligned element inside, but that is not very nice, now the code is much better.
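The aligned allocation can be checked with a short sketch. The helper names (`is_aligned_128`, `heap_allocation_is_aligned`) are hypothetical, and the code assumes a compiler and standard library in C++17 mode, where new (and therefore std::make_unique) honours alignas for over-aligned types:

```cpp
#include <cstdint>
#include <memory>

struct alignas(128) test_struct {
    char data;
};

// True when the address is a multiple of 128 bytes
inline bool is_aligned_128(const void* pointer) {
    return reinterpret_cast<std::uintptr_t>(pointer) % 128 == 0;
}

// Allocates a test_struct on the heap and checks the alignment of the result
inline bool heap_allocation_is_aligned() {
    auto object = std::make_unique<test_struct>();
    return is_aligned_128(object.get());
}
```

In C++14 mode, the same check may or may not pass, since the allocation is only guaranteed to satisfy the default maximum alignment.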

COMPILATION TIME

I've done a few tests to see how much impact these new features have on compilation time. Here, I'm benchmarking the compilation of the entire test suite in different compilation modes. I enabled most compilation options (all GPU and BLAS options, in order to make sure almost all of the library is compiled). Since I'm a bit short on time before going on vacation, I've only gathered the results with g++. Here are the results with G++ 7.2.0:

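------------------------------------------------
|            | debug | release | release_debug |
------------------------------------------------
| C++14      | 862s  | 1961s   | 1718s         |
| C++17      | 892s  | 2018s   | 1745s         |
| Difference | +3.4% | +2.9%   | +1.5%         |
------------------------------------------------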

Overall, I'm a bit disappointed by these results: it's around 3% slower to compile the C++17 version than the C++14 version. I was thinking that this would at least be as fast to compile as before. It seems that currently, with G++ 7.2, if constexpr is slower to compile than the equivalent SFINAE functions. I didn't do individual benchmarks of all the features I've migrated, so it may not be coming from if constexpr, but since it's the greatest change by far, it's the most likely candidate. Once I have a little more time, after my vacation, I'll try to see if that is also the case with clang.

Keep in mind that we are compiling the test suite here. The ETL test suite is using the manual selection mode of the library in order to be able to test all the possible implementations of each operation. This makes a considerable difference in performance. I expect better compilation time when this is used in automatic selection mode (the default mode). In the default mode, a lot more code can be disabled with if constexpr. I will test this next with the DLL library which I will also migrate to C++17.
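For readers who have not seen the two styles side by side, here is a minimal sketch of the kind of rewrite being benchmarked. The functions describe_sfinae and describe_constexpr are hypothetical and only illustrate the pattern; they are not taken from ETL.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// C++14 style: two overloads, one of which is removed by SFINAE.
template <typename T, std::enable_if_t<std::is_integral<T>::value, int> = 0>
std::string describe_sfinae(T) {
    return "integral";
}

template <typename T, std::enable_if_t<!std::is_integral<T>::value, int> = 0>
std::string describe_sfinae(T) {
    return "other";
}

// C++17 style: a single function, the discarded branch is not instantiated.
template <typename T>
std::string describe_constexpr(T) {
    if constexpr (std::is_integral_v<T>) {
        return "integral";
    } else {
        return "other";
    }
}
```

Both versions select the branch at compile time; the second one simply needs a single function instead of two mutually exclusive overloads.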

CONCLUSION

This concludes this report on the migration of my ETL library from C++14 to C++17. Overall, I'm really satisfied with the improvement of the code, it's much better. I'm a bit disappointed by the slight increase (around 3%) in compilation time, but it's not dramatic either. I'm still hoping that once it's used in DLL, I will see a decrease in compilation time, but we'll see that when I'm done with the migration of DLL to C++17, which may take some time since I'll have two weeks of vacation in China starting Friday. The new version is available only through the _master_ branch. It will probably be released as version 1.3 when I integrate some new features, but it will not be released as a new version on its own. You can take a look at the Github etl repository if you are interested.


BUDGETWARRIOR 1.0: WEB INTERFACE AND ASSET TRACKING!

POSTED: 2018-01-26 12:52

I'm happy to announce the release of budgetwarrior 1.0. This is a major change over the previous version.

WEB INTERFACE

Until now, budgetwarrior could only be used from the command line. This is fine for me, but not for everybody. Since I wanted to share my budget with my girlfriend, I needed something less nerdy ;) Therefore, I added support for _a web interface for budgetwarrior_. Every feature of the console application is now available in the web version. Moreover, since the web version offers _slightly better_ graphical capabilities, I added a few more graphs and somewhat more information in some places. I'm far from an expert in web interfaces, but I think I managed to get something not too bad together. There are still some things to improve that I'll go through in the future, but so far the web interface is pretty satisfying and it is MOBILE FRIENDLY! The web server is coded in C++ (who would have guessed...) and is embedded in the application. You need to use the server command to use it:

budget server

and the server will be launched (by default at localhost:8080). You can configure the port with server_port=X in the configuration file and the listen address with server_listen=X. You can access your server at http://localhost:8080. Here is what this will display:

[Screenshot of the main page; all the data is randomized]

The main page shows your assets, the current net worth, your monthly cash-flow and the state of your objectives. The menu will give you access to all the features of the application. You can add expenses and earnings, see reports, manage your assets and your objectives and so on. Basically, you can do everything you did in the application, but you have access to more visualization tools than you would on the console. For instance, you can access your fortune over time or see how your portfolio does in terms of currency.

Normally, unless I forgot something (in which case, I'll fix it), everything should be doable from the web interface. This is simply easier for people who are not as fond of the console as I am ;) The management is still the same: the server will write to the same file the base application uses. Therefore, you cannot use the server and the command line application on the same machine at the same time. Nevertheless, if the server is not running, you can still use the command line application. This could be useful if you want to use the web visualization while still using the command line tool for managing the budget.

The default user and password is admin:1234, but you can of course change it using web_password and web_user in the configuration. You can also disable the security if you are sure of yourself by setting server_secure=false in the configuration.

Currently, the server does not protect against concurrent modifications of the same data. It is very unlikely to happen with only a few people using the application, but I plan to improve that in the future.

SERVER MODE

Although it's not possible to use both the server and the command line application at the same time, it's possible to use the command line application in server mode. In this case, instead of reading and writing the data from the hard disk, the application will send requests to the server to read and write the data it needs. With this, you can use both the server and the command line application at the same time!

While running, the server exposes a simple API that can be used to get all the information about the budget data and that can also be used to add new expenses, earnings and so on directly to the server data. The API is also protected by authentication. Currently, the server does not support HTTPS. However, you can run it behind a proxy such as nginx which is running in HTTPS. This is what I'm doing. The server mode supports SSL from the client to the server; you just have to set server_ssl=true in the configuration. This is the mode I'm currently using and will continue using. With this, I can quickly do some modifications using the command line and, if I want to see advanced visualization, I just have to open my browser and everything is updated. Moreover, in the future, other people involved with my budget will be able to access the web interface. This also solves the synchronization problem in a much better way than before. Just as was the case with the server, this is not made to be used in parallel by different users. This should be perfectly fine for a small household.

ASSETS TRACKING

A few months ago, I added the ability to track assets in budgetwarrior. You can define the list of the assets you possess. The tool will then help you track the value of your assets. You can set your desired distribution of bonds, cash and stocks and the tool will help you see if you need to rebalance your assets. This will let you compute your net worth with budget asset value. Moreover, you can also set a few of your assets as your portfolio assets. These assets have a desired distribution and are handled differently. These are the assets you directly manage yourself, your investment portfolio. You can then track their value and see if they need rebalancing with budget asset rebalance. All these features are now also available in the web version.

BETTER CONSOLE USABILITY

A few months ago, I added some quality-of-life improvements to the console application.

You can now cycle through the list of possible values, for accounts for instance, in the console! This is done with the UP and DOWN keys. Now, I also added auto-completion with the TAB key. You can write Ins and it will complete to Insurances if you have an Insurances account in your budget. This makes it much faster to enter new expenses or to update asset values.
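The completion behaviour described above can be pictured with a small prefix search. This is only a sketch of the idea: the function complete and its signature are assumptions made for this illustration and are not budgetwarrior's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the first candidate that starts with `prefix`,
// or `prefix` itself when nothing matches.
inline std::string complete(const std::string& prefix,
                            const std::vector<std::string>& candidates) {
    for (const auto& candidate : candidates) {
        // compare() on the first prefix.size() characters is a starts-with test;
        // a candidate shorter than the prefix never compares equal.
        if (candidate.compare(0, prefix.size(), prefix) == 0) {
            return candidate;
        }
    }
    return prefix;
}
```

With the accounts Food, Insurances and Rent, typing Ins would complete to Insurances, while an unknown prefix is left untouched.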

INSTALLATION

If you are on Gentoo, you can install it using layman:

layman -a wichtounet

emerge -a budgetwarrior

If you are on Arch Linux, you can use this AUR repository (wait a few days for the new version to be updated).

For other systems, you'll have to install from sources:

git clone --recursive git://github.com/wichtounet/budgetwarrior.git

cd budgetwarrior

git checkout 1.0

make

sudo make install

CONCLUSION

Overall, even though I'm not a fan of web development, it was quite fun to add all these features to budgetwarrior, and I think they made it much better. This is a very significant change to the project since it almost doubled in number of source lines of code, but I think it's a change that was needed. I think these changes really make budgetwarrior more useful to a wider group of people and I'm pretty happy to have finally come around and implemented them. I still have a few things I plan to improve in the near future. First, I want to make the website a bit faster: there are many scripts and stylesheets being loaded that make the site a bit bloated. I'll also enable gzip compression of the website to speed things up. I will also ensure that the server can handle requests concurrently without corrupting the data (this should be simple since we don't need high performance). I may also add a new module to budgetwarrior to track your progress towards retirement if this is something you are interested in, but I haven't decided in what form exactly. Finally, I will also try to optimize the requests that are made between the server and the client when run in server mode. Indeed, it currently downloads almost all the data from the server, which is far from optimal. If you are interested in the sources, you can download them on Github: budgetwarrior.

If you have a suggestion or you found a bug, please post an issue on Github.

If you have any comment, don't hesitate to contact me, either by leaving a comment on this post or by email.


MY THESIS IS AVAILABLE: DEEP LEARNING FEATURE EXTRACTION FOR IMAGE PROCESSING

POSTED: 2018-01-15 15:11

I'm happy to say that I've finally put my thesis online and updated my Publications page.

I should have done that earlier but it slipped my mind, so there it is!

My thesis (Deep Learning Feature Extraction for Image Processing) is now available to download. Here is the abstract of the thesis:

In this thesis, we propose to use methodologies that automatically learn how to extract relevant features from images. We are especially interested in evaluating how these features compare against handcrafted features. More precisely, we are interested in the unsupervised training that is used for the Restricted Boltzmann Machine (RBM) and Convolutional RBM (CRBM) models. These models relaunched the Deep Learning interest of the last decade. During the time of this thesis, the auto-encoders approach, especially Convolutional Auto-Encoders (CAE) have been used more and more. Therefore, one objective of this thesis is also to compare the CRBM approach with the CAE approach. The scope of this work is defined by several machine learning tasks. The first one, handwritten digit recognition, is analysed to see how much the unsupervised pretraining technique introduced with the Deep Belief Network (DBN) model improves the training of neural networks. The second, detection and recognition of Sudoku in images, is evaluating the efficiency of DBN and Convolutional DBN (CDBN) models for classification of images of poor quality. Finally, features are learned fully unsupervised from images for a keyword spotting task and are compared against well-known handcrafted features. Moreover, the thesis was also oriented around a software engineering axis. Indeed, a complete machine learning framework was developed during this thesis to explore possible optimizations and possible algorithms in order to train the tested models as fast as possible.

If you are interested, you can:

* Read it on ResearchGate

* Directly download the PDF

I hope this will interest a few of you!

As always, if you have any question, don't hesitate to leave me a comment ;) As for the current projects, I'm still working on the next version of budgetwarrior, but I don't have any expected release date. It will depend on how much time I'm able to put into the project.


EXPRESSION TEMPLATES LIBRARY 1.2.1: FASTER GPU AND NEW FEATURES

POSTED: 2018-01-09 11:06

Happy new year to all my dear readers! It has been a while since I've posted on this blog. I've had to serve three weeks in the army and then I had two weeks of vacation. I've been actively working on budgetwarrior with a brand new web interface! More on that later ;)

Today, I'm happy to release the version 1.2.1 of my Expression Templates Library (ETL) project. This is a minor version but with significantly better GPU support and a few new features and bug fixes so I decided to release it now.

FASTER GPU SUPPORT

Last year, I implemented the support for the detection of advanced GPU patterns in ETL.

This significantly reduces the number of CUDA kernel calls that are being launched. For instance, each of the following expressions will be evaluated using a single GPU kernel:

yy = 1.1 * x + y

yy = x + 1.1 * y

yy = 1.1 * y + 1.2 * y

yy = 1.1 * x * y

yy = x / (1.1 * y)

This makes some operations significantly faster. Moreover, I've greatly reduced the number of device synchronizations in the library. In particular, I've removed almost all synchronization from the etl-gpu-blas sub library. This means that synchronization is mostly only done when data needs to go back to the CPU. For machine learning, this means at the end of the epoch, to compute the final error. This makes a HUGE difference in time; I didn't realize before that I was doing way too much synchronization. With these two changes, I've been able to attain _state of the art training performance on GPU_ with my Deep Learning Library (DLL) project!

Moreover, I've now added support for random number generation on the GPU and for shuffle operations as well.

NEW FEATURES

I've also added a few new features recently. They were especially added to support new features in DLL. Matrices and vectors can now be normalized in order to have zero-mean and unit-variance distribution. You can also merge matrices together. For now, there is no GPU support, so this will use CPU anyway. I plan to fix that later.
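As a reminder of what zero-mean, unit-variance normalization computes, here is a minimal standalone sketch on a std::vector. The function normalize is a hypothetical helper written for this illustration, not ETL's API, and it assumes the population variance (division by n); ETL's exact convention may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: rescales `values` in place so that their mean is 0
// and their (population) variance is 1.
inline void normalize(std::vector<double>& values) {
    if (values.empty()) {
        return;
    }

    const double n = static_cast<double>(values.size());

    double mean = 0.0;
    for (double v : values) {
        mean += v;
    }
    mean /= n;

    double variance = 0.0;
    for (double v : values) {
        variance += (v - mean) * (v - mean);
    }
    variance /= n;

    const double stddev = std::sqrt(variance);

    for (double& v : values) {
        // Constant input: only center it, to avoid dividing by zero.
        v = (stddev == 0.0) ? 0.0 : (v - mean) / stddev;
    }
}
```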

In addition to bias_batch_mean that I added before, I have now also added bias_batch_var, which computes the variance in place of the mean. This is mainly used for Batch Normalization in machine learning, but it may have some other usages. The GPU support has been added directly as well.

And the last feature is the support for embedding and the gradients of embedding. Again, this is totally related to machine learning, but can be very useful as well. I haven't had the time to develop the GPU version so far, but this will come as well.

PERFORMANCE

Nothing fancy on the CPU performance side, I only added vectorization for hyperbolic functions. This makes _tanh much faster on CPU_.

BUG FIXES

I fixed quite a few bugs in this version, which is one of the main reasons I released it:

1. When using large fast_matrix and aliasing was detected, there was a big chance of stack overflow occurring. This is now fixed by using a dynamic temporary.

2. Some assignables such as sub_view did not perform any detection for aliasing. This is now fixed and aliasing is detected everywhere.

3. fast_dyn_matrix can now be correctly used with _bool_.

4. The use of iterators was not always ensuring correct CPU/GPU consistency. This is now correctly handled.

5. The 4D convolutions on GPU were not using the correct flipping.

6. Fix small compilation bug with sub_matrix and GPU.

WHAT'S NEXT ?

I don't really know what will be in the next release. This should be the release 1.3. One possible idea would be to improve and review the support for sparse matrices, which is more than poor as of now. But I'm not really motivated to work on that :P Moreover, I'm now _actively_ working on the next release of budgetwarrior, which will probably still come this month.

I'm also still hesitating about switching to C++17 for the library to make it faster to compile, and also to clean some parts of the code. I would be able to remove quite some SFINAE with the new _if constexpr_, but I'm afraid this will make the library too difficult to use since it would need at least GCC 7 or clang 3.9.

DOWNLOAD ETL

You can download ETL on Github. If you are only interested in the 1.2.1 version, you can look at the Releases page or clone the tag 1.2.1. There are several branches:

* _master_ is the eternal development branch, may not always be stable

* _stable_ is a branch always pointing to the last tag, no development here

For future releases, there will always be tags pointing to the corresponding commits. You can also have access to previous releases on Github or via the release tags. The documentation is still a bit sparse. There are a few examples and the Wiki, but there still is work to be done. If you have questions on how to use or configure the library, please don't hesitate to ask. Don't hesitate to comment on this post if you have any comment on this library or any question. You can also open an Issue on Github if you have a problem using this library, or propose a Pull Request if you have any contribution you'd like to make to the library. Hope this may be useful to some of you :)


ADVANCED GPU PATTERNS OPTIMIZATION IN ETL

POSTED: 2017-11-26 15:44

The GPU performance of my Expression Templates Library (ETL) is pretty good when most of the time is spent inside expensive operations such as Matrix-Matrix Multiplication or convolutions. However, when most of the time is spent in linear kernels, performance is not great because this will invoke a lot of CUDA kernels. Indeed, the way it is done is that each sub-expression computes its result in a temporary GPU vector (or matrix) and these temporaries are passed through the expressions. For instance, this expression:

yy = 1.1 * x + 1.2 * y

will be executed on the GPU as something like this:

t1 = 1.1 * x

t2 = 1.2 * y

yy = t1 + t2

which results in three GPU kernels being invoked. In the CPU case, the complete expression will be executed as one CPU kernel, which is constructed with Expression Templates. Unfortunately, a CUDA kernel cannot be constructed in the same way since the CUDA compiler does not support general template metaprogramming. That's why I've implemented it using small kernels for each expression. Fortunately, we can do better with a bit more meta-programming. Indeed, there are some patterns that are repeated a lot and that can easily be implemented in CUDA kernels. I've started detecting a few of these patterns and, for each of them, a single CUDA kernel is executed. For instance, each of the following expressions can be executed with a single kernel:

yy = 1.1 * x + y

yy = x + 1.1 * y

yy = 1.1 * y + 1.2 * y

yy = 1.1 * x * y

yy = x / (1.1 * y)

This results in significant performance improvements for these expressions!
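The difference between the two evaluation strategies can be sketched on the CPU with plain loops, where each function stands in for one kernel launch. This is only an analogy written for this post's example yy = 1.1 * x + 1.2 * y: the names scale, add, axpby_unfused and axpby_fused are hypothetical and this is neither ETL code nor CUDA code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<float>;

// Stand-in for one small kernel: t = alpha * x
inline vec scale(float alpha, const vec& x) {
    vec t(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        t[i] = alpha * x[i];
    }
    return t;
}

// Stand-in for one small kernel: r = a + b
inline vec add(const vec& a, const vec& b) {
    assert(a.size() == b.size());
    vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = a[i] + b[i];
    }
    return r;
}

// Unfused evaluation: three passes over memory and two temporaries.
inline vec axpby_unfused(float alpha, const vec& x, float beta, const vec& y) {
    const vec t1 = scale(alpha, x);
    const vec t2 = scale(beta, y);
    return add(t1, t2);
}

// Fused evaluation: the detected pattern is computed in a single pass.
inline vec axpby_fused(float alpha, const vec& x, float beta, const vec& y) {
    assert(x.size() == y.size());
    vec yy(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        yy[i] = alpha * x[i] + beta * y[i];
    }
    return yy;
}
```

On a GPU, the saving comes on top of this from launching one kernel instead of three.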

I have tested these new improvements in my Deep Learning Library (DLL) project (not yet merged) and it resulted in 25% FASTER MOMENTUM COMPUTATION and 17% FASTER NESTEROV ADAM (NADAM). I'm going to continue to investigate which kernels need to be made faster for DLL and try to improve the overall performance. Currently, the GPU performance of DLL is very good for large convolutional networks, but could be improved for small fully-connected networks. Indeed, in that case, quite some time is spent outside the matrix-matrix multiplication and inside serial expressions for which the GPU support could be improved. Once I'm done with my optimizations, I'll probably post again on the blog with the latest results. All these new optimizations are now in the MASTER branch of the ETL project if you want to check it out. You can access the project on Github.


INITIAL SUPPORT FOR LONG SHORT TERM MEMORY (LSTM) IN DLL

POSTED: 2017-11-24 15:16

I'm really happy to announce that I just merged support for Long Short Term Memory (LSTM) cells into my Deep Learning Library (DLL) machine learning framework. Two weeks ago, I already merged support for Recurrent Neural Network (RNN).

It's nothing fancy yet, but forward propagation of LSTM and basic Backpropagation Through Time (BPTT) are now supported. It was not really complicated to implement the forward pass, but the backward pass is much more complicated for an LSTM than for an RNN. It took me quite a long time to figure out all the gradient formulas, and the documentation on that is quite scarce. For now, only the existing classification loss is supported for RNN and LSTM. As I said last time, I still plan to add support for sequence-to-sequence loss in order to be able to train models able to generate characters. However, I don't know when I'll be able to work on that. Now that I've got the code for LSTM, I should be able to implement a GRU cell and a NAS cell quite easily, I believe. For example, here is a simple LSTM used on MNIST for classification:

#include <memory>

#include "dll/neural/dense_layer.hpp"
#include "dll/neural/lstm_layer.hpp"
#include "dll/neural/recurrent_last_layer.hpp"
#include "dll/network.hpp"
#include "dll/datasets.hpp"

int main(int /*argc*/, char* /*argv*/ []) {
    // Load the dataset
    auto dataset = dll::make_mnist_dataset_nc(dll::batch_size<100>{}, dll::scale_pre<255>{});

    constexpr size_t time_steps      = 28;
    constexpr size_t sequence_length = 28;
    constexpr size_t hidden_units    = 100;

    // Build the network
    using network_t = dll::dyn_network_desc<
        dll::network_layers<
            dll::lstm_layer<time_steps, sequence_length, hidden_units>,
            dll::recurrent_last_layer<time_steps, hidden_units>,
            dll::dense_layer<hidden_units, 10, dll::softmax>
        >
        , dll::updater<dll::updater_type::ADAM> // Adam
        , dll::batch_size<100>                  // The mini-batch size
    >::network_t;

    auto net = std::make_unique<network_t>();

    // Display the network and dataset
    net->display();
    dataset.display();

    // Train the network for performance sake
    net->fine_tune(dataset.train(), 50);

    // Test the network on test set
    net->evaluate(dataset.test());

    return 0;
}

The network is quite similar to the one used previously with an RNN, just replace rnn with lstm and that's it. It starts with an LSTM layer, followed by a layer extracting the last time step and finally a dense layer with a softmax function. The network is trained with Adam for 50 epochs. You can change the activation function, the initializer for the weights and the biases and the number of steps for BPTT truncation. Here is the result I got on my last run:

------------------------------------------------------------
| Index | Layer                | Parameters | Output Shape |
------------------------------------------------------------
| 0     | LSTM (TANH) (dyn)    |      51200 |              |
| 1     | RNN(last)            |          0 |              |
| 2     | Dense(SOFTMAX) (dyn) |       1000 |              |
------------------------------------------------------------
Total Parameters: 52200

--------------------------------------------
| mnist | Size  | Batches | Augmented Size |
--------------------------------------------
| train | 60000 | 600     | 60000          |
| test  | 10000 | 100     | 10000          |
--------------------------------------------

Network with 3 layers

LSTM(dyn): 28x28 -> TANH -> 28x100

RNN(last): 28x100 -> 100

Dense(dyn): 100 -> SOFTMAX -> 10

Total parameters: 52200

Dataset

Training: In-Memory Data Generator

Size: 60000

Batches: 600

Testing: In-Memory Data Generator

Size: 10000

Batches: 100

Train the network with "Stochastic Gradient Descent"

Updater: ADAM

Loss: CATEGORICAL_CROSS_ENTROPY

Early Stop: Goal(error)

With parameters:

epochs=50

batch_size=100

learning_rate=0.001

beta1=0.9

beta2=0.999

epoch 0/50 batch 600/ 600 - error: 0.07943 loss: 0.28504 time 20910ms
epoch 1/50 batch 600/ 600 - error: 0.06683 loss: 0.24021 time 20889ms
epoch 2/50 batch 600/ 600 - error: 0.04828 loss: 0.18233 time 21061ms
epoch 3/50 batch 600/ 600 - error: 0.04407 loss: 0.16665 time 20839ms
epoch 4/50 batch 600/ 600 - error: 0.03515 loss: 0.13290 time 22108ms
epoch 5/50 batch 600/ 600 - error: 0.03207 loss: 0.12019 time 21393ms
epoch 6/50 batch 600/ 600 - error: 0.02973 loss: 0.11239 time 28199ms
epoch 7/50 batch 600/ 600 - error: 0.02653 loss: 0.10455 time 37039ms
epoch 8/50 batch 600/ 600 - error: 0.02482 loss: 0.09657 time 23127ms
epoch 9/50 batch 600/ 600 - error: 0.02177 loss: 0.08422 time 41766ms
epoch 10/50 batch 600/ 600 - error: 0.02453 loss: 0.09382 time 29765ms
epoch 11/50 batch 600/ 600 - error: 0.02575 loss: 0.09796 time 21449ms
epoch 12/50 batch 600/ 600 - error: 0.02107 loss: 0.07833 time 42056ms
epoch 13/50 batch 600/ 600 - error: 0.01877 loss: 0.07171 time 24673ms
epoch 14/50 batch 600/ 600 - error: 0.02095 loss: 0.08481 time 20878ms
epoch 15/50 batch 600/ 600 - error: 0.02040 loss: 0.07578 time 41515ms
epoch 16/50 batch 600/ 600 - error: 0.01580 loss: 0.06083 time 25705ms
epoch 17/50 batch 600/ 600 - error: 0.01945 loss: 0.07046 time 20903ms
epoch 18/50 batch 600/ 600 - error: 0.01728 loss: 0.06683 time 41828ms
epoch 19/50 batch 600/ 600 - error: 0.01577 loss: 0.05947 time 27810ms
epoch 20/50 batch 600/ 600 - error: 0.01528 loss: 0.05883 time 21477ms
epoch 21/50 batch 600/ 600 - error: 0.01345 loss: 0.05127 time 44718ms
epoch 22/50 batch 600/ 600 - error: 0.01410 loss: 0.05357 time 25174ms
epoch 23/50 batch 600/ 600 - error: 0.01268 loss: 0.04765 time 23827ms
epoch 24/50 batch 600/ 600 - error: 0.01342 loss: 0.05004 time 47232ms
epoch 25/50 batch 600/ 600 - error: 0.01730 loss: 0.06872 time 22532ms
epoch 26/50 batch 600/ 600 - error: 0.01337 loss: 0.05016 time 30114ms
epoch 27/50 batch 600/ 600 - error: 0.01842 loss: 0.07049 time 40136ms
epoch 28/50 batch 600/ 600 - error: 0.01262 loss: 0.04639 time 21793ms
epoch 29/50 batch 600/ 600 - error: 0.01403 loss: 0.05292 time 34096ms
epoch 30/50 batch 600/ 600 - error: 0.01185 loss: 0.04456 time 35420ms
epoch 31/50 batch 600/ 600 - error: 0.01098 loss: 0.04180 time 20909ms
epoch 32/50 batch 600/ 600 - error: 0.01337 loss: 0.04687 time 30113ms
epoch 33/50 batch 600/ 600 - error: 0.01415 loss: 0.05292 time 37393ms
epoch 34/50 batch 600/ 600 - error: 0.00982 loss: 0.03615 time 20962ms
epoch 35/50 batch 600/ 600 - error: 0.01178 loss: 0.04830 time 29305ms
epoch 36/50 batch 600/ 600 - error: 0.00882 loss: 0.03408 time 38293ms
epoch 37/50 batch 600/ 600 - error: 0.01148 loss: 0.04341 time 20841ms
epoch 38/50 batch 600/ 600 - error: 0.00960 loss: 0.03701 time 29204ms
epoch 39/50 batch 600/ 600 - error: 0.00850 loss: 0.03094 time 39802ms
epoch 40/50 batch 600/ 600 - error: 0.01473 loss: 0.05136 time 20831ms
epoch 41/50 batch 600/ 600 - error: 0.01007 loss: 0.03579 time 29856ms
epoch 42/50 batch 600/ 600 - error: 0.00943 loss: 0.03370 time 38200ms
epoch 43/50 batch 600/ 600 - error: 0.01205 loss: 0.04409 time 21162ms
epoch 44/50 batch 600/ 600 - error: 0.00980 loss: 0.03674 time 32279ms
epoch 45/50 batch 600/ 600 - error: 0.01068 loss: 0.04133 time 38448ms
epoch 46/50 batch 600/ 600 - error: 0.00913 loss: 0.03478 time 20797ms
epoch 47/50 batch 600/ 600 - error: 0.00985 loss: 0.03759 time 28885ms
epoch 48/50 batch 600/ 600 - error: 0.00912 loss: 0.03295 time 41120ms
epoch 49/50 batch 600/ 600 - error: 0.00930 loss: 0.03438 time 21282ms
Restore the best (error) weights from epoch 39
Training took 1460s

Evaluation Results

error: 0.02440

loss: 0.11315

evaluation took 1000ms

Again, nothing fancy yet, but this example has not been optimized for performance nor for accuracy. I also made a few changes to the RNN layer. I added support for biases and improved the code as well for performance and readability. All this support is now in the MASTER branch of the DLL project if you want to check it out. You can also check out the example online: mnist_lstm.cpp

You can access the project on Github.
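For readers unfamiliar with the forward pass mentioned earlier in this post, here is a minimal sketch of one LSTM time step using the textbook equations, reduced to a single input and a single hidden unit. All the names (gate_params, lstm_params, lstm_state, lstm_step) are hypothetical; DLL's actual layer works on batches of matrices and its code is organized differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameters of one gate: input weight, recurrent weight, bias.
struct gate_params {
    double w;
    double u;
    double b;
};

// The four sets of parameters of a standard LSTM cell.
struct lstm_params {
    gate_params input;
    gate_params forget;
    gate_params output;
    gate_params candidate;
};

// Hidden state h and cell state c carried from one time step to the next.
struct lstm_state {
    double h;
    double c;
};

inline double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

inline double pre_activation(const gate_params& p, double x, double h) {
    return p.w * x + p.u * h + p.b;
}

// One forward step: gates from x and the previous h, then the new c and h.
inline lstm_state lstm_step(const lstm_params& p, const lstm_state& prev, double x) {
    const double i = sigmoid(pre_activation(p.input, x, prev.h));
    const double f = sigmoid(pre_activation(p.forget, x, prev.h));
    const double o = sigmoid(pre_activation(p.output, x, prev.h));
    const double g = std::tanh(pre_activation(p.candidate, x, prev.h));

    lstm_state next;
    next.c = f * prev.c + i * g;     // forget part of the old cell, add new content
    next.h = o * std::tanh(next.c);  // expose a gated view of the cell state
    return next;
}
```

BPTT then has to propagate gradients back through each of these products and activations for every time step, which is where most of the complexity of the backward pass comes from.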

Currently, I'm working on the GPU performance again. The performance of some operations is still not as good as I want it to be, especially complex operations like those used in Adam and Nadam. Currently, there are many calls to GPU BLAS libraries and I want to try to extract some more optimized patterns. Once it's done, I'll post more on that on the blog.


DLL: PRETTY PRINTING AND LIVE OUTPUT

POSTED: 2017-11-19 15:15

I've greatly improved the display of my Deep Learning Library (DLL). I know this is generally not the most important point in a machine learning framework, but first impressions are important. Therefore, I decided it was time to get a nicer output in the console for training networks. A network or a dataset can be displayed using the display() function. I've added a display_pretty() function to them to display it more nicely. I've also added the dll::dump_timers_nice() function to do the same for dll::dump_timers(). I've also improved the display for the results of the batches during training. Now, the display is updated every 100ms and it also displays the current estimated time until the end of the epoch. With that, the user should have a much better idea of what's going on during training, especially when the epochs are taking a long time to complete. Here is a full output of the training of a fully-connected network on MNIST (mnist_mlp.cpp):

------------------------------------------------------------
| Index | Layer                | Parameters | Output Shape |
------------------------------------------------------------
| 0     | Dense(SIGMOID) (dyn) |     392000 |              |
| 1     | Dropout(0.50)(dyn)   |          0 |              |
| 2     | Dense(SIGMOID) (dyn) |     125000 |              |
| 3     | Dropout(0.50)(dyn)   |          0 |              |
| 4     | Dense(SOFTMAX) (dyn) |       2500 |              |
------------------------------------------------------------
Total Parameters: 519500

--------------------------------------------
| mnist | Size  | Batches | Augmented Size |
--------------------------------------------
| train | 60000 | 600     | 60000          |
| test  | 10000 | 100     | 10000          |
--------------------------------------------

Train the network with "Stochastic Gradient Descent"

Updater: NADAM

Loss: CATEGORICAL_CROSS_ENTROPY

Early Stop: Goal(error)

With parameters:

epochs=50

batch_size=100

learning_rate=0.002

beta1=0.9

beta2=0.999

epoch 0/50 batch 600/ 600 - error: 0.04623 loss: 0.15097 time 3230ms
epoch 1/50 batch 600/ 600 - error: 0.03013 loss: 0.09947 time 3188ms
epoch 2/50 batch 600/ 600 - error: 0.02048 loss: 0.06565 time 3102ms
epoch 3/50 batch 600/ 600 - error: 0.01593 loss: 0.05258 time 3189ms
epoch 4/50 batch 600/ 600 - error: 0.01422 loss: 0.04623 time 3160ms
epoch 5/50 batch 600/ 600 - error: 0.01112 loss: 0.03660 time 3131ms
epoch 6/50 batch 600/ 600 - error: 0.01078 loss: 0.03546 time 3200ms
epoch 7/50 batch 600/ 600 - error: 0.01003 loss: 0.03184 time 3246ms
epoch 8/50 batch 600/ 600 - error: 0.00778 loss: 0.02550 time 3222ms
epoch 9/50 batch 600/ 600 - error: 0.00782 loss: 0.02505 time 3119ms
epoch 10/50 batch 600/ 600 - error: 0.00578 loss: 0.02056 time 3284ms
epoch 11/50 batch 600/ 600 - error: 0.00618 loss: 0.02045 time 3220ms
epoch 12/50 batch 600/ 600 - error: 0.00538 loss: 0.01775 time 3444ms
epoch 13/50 batch 600/ 600 - error: 0.00563 loss: 0.01803 time 3304ms
epoch 14/50 batch 600/ 600 - error: 0.00458 loss: 0.01598 time 3577ms
epoch 15/50 batch 600/ 600 - error: 0.00437 loss: 0.01436 time 3228ms
epoch 16/50 batch 600/ 600 - error: 0.00360 loss: 0.01214 time 3180ms
epoch 17/50 batch 600/ 600 - error: 0.00405 loss: 0.01309 time 3090ms
epoch 18/50 batch 600/ 600 - error: 0.00408 loss: 0.01346 time 3045ms
epoch 19/50 batch 600/ 600 - error: 0.00337 loss: 0.01153 time 3071ms
epoch 20/50 batch 600/ 600 - error: 0.00297 loss: 0.01021 time 3131ms
epoch 21/50 batch 600/ 600 - error: 0.00318 loss: 0.01103 time 3076ms
epoch 22/50 batch 600/ 600 - error: 0.00277 loss: 0.00909 time 3090ms
epoch 23/50 batch 600/ 600 - error: 0.00242 loss: 0.00818 time 3163ms
epoch 24/50 batch 600/ 600 - error: 0.00267 loss: 0.00913 time 3229ms
epoch 25/50 batch 600/ 600 - error: 0.00295 loss: 0.00947 time 3156ms
epoch 26/50 batch 600/ 600 - error: 0.00252 loss: 0.00809 time 3066ms
epoch 27/50 batch 600/ 600 - error: 0.00227 loss: 0.00773 time 3156ms
epoch 28/50 batch 600/ 600 - error: 0.00203 loss: 0.00728 time 3158ms
epoch 29/50 batch 600/ 600 - error: 0.00240 loss: 0.00753 time 3114ms
epoch 30/50 batch 600/ 600 - error: 0.00263 loss: 0.00864 time 3099ms
epoch 31/50 batch 600/ 600 - error: 0.00210 loss: 0.00675 time 3096ms
epoch 32/50 batch 600/ 600 - error: 0.00163 loss: 0.00628 time 3120ms
epoch 33/50 batch 600/ 600 - error: 0.00182 loss: 0.00611 time 3045ms
epoch 34/50 batch 600/ 600 - error: 0.00125 loss: 0.00468 time 3140ms
epoch 35/50 batch 600/ 600 - error: 0.00183 loss: 0.00598 time 3093ms
epoch 36/50 batch 600/ 600 - error: 0.00232 loss: 0.00711 time 3068ms
epoch 37/50 batch 600/ 600 - error: 0.00170 loss: 0.00571 time 3057ms
epoch 38/50 batch 600/ 600 - error: 0.00162 loss: 0.00530 time 3115ms
epoch 39/50 batch 600/ 600 - error: 0.00155 loss: 0.00513 time 3226ms
epoch 40/50 batch 600/ 600 - error: 0.00150 loss: 0.00501 time 2987ms
epoch 41/50 batch 600/ 600 - error: 0.00122 loss: 0.00425 time 3117ms
epoch 42/50 batch 600/ 600 - error: 0.00108 loss: 0.00383 time 3102ms
epoch 43/50 batch 600/ 600 - error: 0.00165 loss: 0.00533 time 2977ms
epoch 44/50 batch 600/ 600 - error: 0.00142 loss: 0.00469 time 3009ms
epoch 45/50 batch 600/ 600 - error: 0.00098 loss: 0.00356 time 3055ms
epoch 46/50 batch 600/ 600 - error: 0.00127 loss: 0.00409 time 3076ms
epoch 47/50 batch 600/ 600 - error: 0.00132 loss: 0.00438 time 3068ms
epoch 48/50 batch 600/ 600 - error: 0.00130 loss: 0.00459 time 3045ms
epoch 49/50 batch 600/ 600 - error: 0.00107 loss: 0.00365 time 3103ms
Restore the best (error) weights from epoch 45
Training took 160s

Evaluation Results

error: 0.01740

loss: 0.07861

evaluation took 67ms

-----------------------------------------------------------------------------
| %        | Timer                         | Count  | Total     | Average   |
-----------------------------------------------------------------------------
| 100.000% | net:train:ft                  |      1 |  160.183s |  160.183s |
| 100.000% | net:trainer:train             |      1 |  160.183s |  160.183s |
|  99.997% | net:trainer:train:epoch       |     50 |  160.178s |  3.20356s |
|  84.422% | net:trainer:train:epoch:batch |  30000 |  135.229s | 4.50764ms |
|  84.261% | sgd::train_batch              |  30000 |  134.971s | 4.49904ms |
|  44.404% | sgd::grad                     |  30000 |  71.1271s |  2.3709ms |
|  35.453% | sgd::forward                  |  30000 |  56.7893s | 1.89298ms |
|  32.245% | sgd::update_weights           |  90000 |  51.6505s | 573.894us |
|  32.226% | sgd::apply_grad:nadam         | 180000 |  51.6211s | 286.783us |
|  28.399% | dense:dyn:forward             | 180300 |  45.4903s | 252.303us |
|  17.642% | dropout:train:forward         |  60000 |  28.2595s |  470.99us |
|  13.707% | net:trainer:train:epoch:error |     50 |   21.957s |  439.14ms |
|  12.148% | dense:dyn:gradients           |  90000 |  19.4587s | 216.207us |
|   4.299% | sgd::backward                 |  30000 |  6.88546s | 229.515us |
|   3.301% | dense:dyn:backward            |  60000 |  5.28729s |  88.121us |
|   0.560% | dense:dyn:errors              |  60000 | 896.471ms |  14.941us |
|   0.407% | dropout:backward              |  60000 | 651.523ms |  10.858us |
|   0.339% | dropout:test:forward          |  60000 | 542.799ms |   9.046us |
|   0.161% | net:compute_loss:CCE          |  60100 | 257.915ms |   4.291us |
|   0.099% | sgd::error                    |  30000 |  158.33ms |   5.277us |
-----------------------------------------------------------------------------

I hope this will make the output of the machine learning framework more useful.

All of this support is now in the master branch of the DLL project, if you want to check it out. You can also check out the example online:
