Building ATLAS
03 Feb 2016
Today I attempted to build ATLAS (Automatically Tuned Linear Algebra Software), an optimised linear algebra package which finely tunes itself to your system during the build process. While ATLAS is available through many package managers, these non-optimised builds miss out on much of the benefit of using this library.
I found building ATLAS to be more involved than building many related packages such as LAPACK; however, with a little perseverance it was achievable.
Disable CPU Throttling
During the build, ATLAS tunes itself by running a large number of timing measurements. For these timings to be accurate, CPU throttling must be disabled. Throttling is the process whereby modern CPUs reduce their clock frequency while idle to save power.
I was able to achieve this on my system (running Arch Linux on a computer with twin Intel(R) Xeon(R) E5-2630s) through use of the cpupower package. First, install the package:
and then assign the “performance” power governor:
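A sketch of what these two steps might look like, assuming Arch's pacman and the frequency-set subcommand of cpupower. The helper all_cores_use_governor is a hypothetical name introduced here to confirm that the change took effect on every core; it only reads the standard sysfs files.

```shell
# The two steps themselves need root, so they are shown as comments:
#   sudo pacman -S cpupower
#   sudo cpupower frequency-set -g performance

# all_cores_use_governor GOVERNOR [CPU_ROOT]   (hypothetical helper)
# Succeeds only if every cpuN/cpufreq/scaling_governor file under CPU_ROOT
# contains GOVERNOR. CPU_ROOT defaults to the real sysfs location.
all_cores_use_governor() {
    want=$1
    root=${2:-/sys/devices/system/cpu}
    found=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # skip unmatched globs and unreadable files
        found=1
        [ "$(cat "$f")" = "$want" ] || return 1
    done
    [ "$found" -eq 1 ]                   # fail if no governor files were found at all
}
```

After setting the governor, running all_cores_use_governor performance && echo "all cores set" should print the message only if every core reports the performance governor.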
This unfortunately did not work by itself on my processors due to an apparent problem with the intel_pstate scaling driver. To resolve this, restart the computer with the additional kernel parameter intel_pstate=disable, and then execute
to enable the generic governors. Set the performance governor as above, then proceed with building ATLAS.
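After rebooting, it may be worth confirming that the kernel parameter was actually picked up before setting the governor again. The following is a minimal sketch that reads /proc/cmdline; has_kernel_param is a hypothetical helper name, not part of any package mentioned here.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]   (hypothetical helper)
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline, the command line of the running kernel.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then require an exact line match.
    tr ' ' '\n' < "$file" | grep -qx -- "$param"
}

# Usage on a live system:
#   has_kernel_param intel_pstate=disable && echo "intel_pstate is disabled"
# The active scaling driver can also be inspected directly:
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
```

With intel_pstate disabled, the second command typically reports a generic driver such as acpi-cpufreq rather than intel_pstate, though this depends on the kernel configuration.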
Avoiding Hyperthreading
For reasons they are better placed to explain, the ATLAS authors strongly discourage the use of hyperthreading. To find the IDs of unique cores on my system, I used the great likwid package. To install it:
Use the likwid-topology command to print, among other information, a table of all the hyperthreads and their mappings to physical sockets and cores. This allowed me to determine that threads 0-11 (out of 0-23) would map on to unique cores.
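If likwid is not available, the same mapping can be derived from lscpu, which ships with util-linux. This is an alternative to the likwid-topology approach described above, not a reproduction of it. The sketch below assumes the parseable output of lscpu -p=CPU,CORE,SOCKET; unique_core_ids is a hypothetical helper name.

```shell
# unique_core_ids   (hypothetical helper)
# Reads `lscpu -p=CPU,CORE,SOCKET` output on stdin and prints, space-separated,
# the first logical CPU ID seen for each (socket, core) pair.
unique_core_ids() {
    awk -F, '
        /^#/ { next }                  # skip the header comments lscpu emits
        NF < 3 { next }                # ignore anything that is not CPU,CORE,SOCKET
        !seen[$3 "," $2]++ { ids = ids (ids == "" ? "" : " ") $1 }
        END { print ids }
    '
}

# Usage on a live system:
#   lscpu -p=CPU,CORE,SOCKET | unique_core_ids
```

Any logical CPU that is not in the printed list shares a physical core with one that is, so it is a hyperthread sibling to leave out.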
Build ATLAS
We are now ready to build ATLAS itself. Download both ATLAS and the latest LAPACK release, unpack ATLAS, and create a build directory. To build the software, first run the configure command in the build directory:
The first argument points to the LAPACK tarball you downloaded. The second lists the IDs of the unique threads from the step above. The next two make the build system use a more accurate timer during tuning (replace MHZ with your clock speed in MHz, and omit these options if the build will be competing with other processes for resources). The final argument sets the maximum level of verbosity.
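A sketch of how such a configure line might be assembled from the description above. The flag names (--with-netlib-lapack-tarfile, --force-tids, -D c -DPentiumCPS, -v 2) are recalled from the ATLAS installation guide and should be treated as assumptions to check against ../configure --help for your ATLAS version. The helper atlas_configure_cmd is hypothetical, and it assumes the build directory sits directly inside the unpacked ATLAS source tree. It only prints the command so that it can be reviewed before running.

```shell
# atlas_configure_cmd TARBALL MHZ ID...   (hypothetical helper)
# Prints a configure command line for the given LAPACK tarball, clock speed
# in MHz, and list of thread IDs to use.
atlas_configure_cmd() {
    tarball=$1
    mhz=$2
    shift 2
    # --force-tids is assumed to take the thread count followed by the IDs.
    printf '../configure --with-netlib-lapack-tarfile=%s --force-tids="%s %s" -D c -DPentiumCPS=%s -v 2\n' \
        "$tarball" "$#" "$*" "$mhz"
}

# Usage (path and clock speed are placeholders):
#   atlas_configure_cmd /path/to/lapack.tgz MHZ 0 1 2 3 4 5 6 7 8 9 10 11
```

Printing rather than executing keeps the thread list and clock speed visible for a final check, since a wrong value here would only show up much later as a poorly tuned build.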
Once the configuration is complete, run the build with
When complete, consider running the tests:
and if desired, install to the system location with:
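The three steps above presumably correspond to the standard ATLAS make targets. The sketch below assumes the targets build, check, ptcheck and install as named in the ATLAS installation guide; build_atlas and the MAKE and INSTALL variables are hypothetical conveniences introduced here.

```shell
# build_atlas   (hypothetical helper)
# Runs the build and both test suites in order, stopping at the first failure.
# Set MAKE="echo make" to preview the commands without running them, and
# INSTALL=yes to install afterwards (uses sudo).
build_atlas() {
    make_cmd=${MAKE:-make}
    $make_cmd build   || return 1    # build and tune; this is the long step
    $make_cmd check   || return 1    # serial sanity tests
    $make_cmd ptcheck || return 1    # threaded sanity tests
    if [ "${INSTALL:-no}" = "yes" ]; then
        sudo $make_cmd install       # copy libraries and headers to the prefix
    fi
}
```

Run it from the build directory once configure has finished. Leaving INSTALL unset keeps the install step manual, which is useful if you want to inspect the test results first.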