Tuesday, 23 August 2016

Compiling and installing GROMACS 2016 using the Intel C/C++ and Fortran compilers, with CUDA support


Content:

1. Introduction.

2. Setting the building environment.

3. Short notes on AVX-capable CPUs support available in GROMACS.

4. Downloading and installing CUDA.

5. Compiling and installing OpenMPI.

6. Compiling and installing GROMACS.

7. Invoking GROMACS.


GROMACS is an open source software package for performing molecular dynamics simulations. It also provides an excellent set of tools which can be used to analyze the results of the simulations. GROMACS is fast and robust and its code supports a wide range of run-time and compile-time optimizations. This document explains how to compile GROMACS 2016 on CentOS 7 and Scientific Linux 7 with CUDA support, using the Intel C/C++ and Fortran compilers.

Before starting with the compilation, be sure you are aware of the following:

  • Do not use the latest version of GROMACS for production right after its official release, unless you are a developer or just want to see what is new. Every software product based on such a huge amount of source code might contain some critical bugs at the beginning. Wait for 1-2 weeks after the release date and then check the GROMACS user-support forums carefully. If no critical bugs are reported there (or minor ones which might affect your simulations in particular), you can compile the latest release of GROMACS. Even then, test the build against some known simulation results of yours. If you see no big differences (or only expected ones), you can proceed with putting the latest GROMACS release into production on your system.

  • If you administer an HPC facility where the compute nodes are equipped with different processors, you most probably need to compile the GROMACS code separately to match the features of each CPU type. To do so, create a build host for each CPU type by using nodes that match that type, compile GROMACS there, and then clone the installation to the rest of the nodes of the same CPU type (see the sketch after this list for a quick way to group nodes by CPU model).

  • Always use the latest CUDA release compatible with the particular GROMACS release (carefully check the GROMACS GPU documentation).

  • When compiling GROMACS, always let it build its own FFTW library. That really boosts the performance of GROMACS.

  • Compiling OpenMPI with the Intel compiler is not of critical importance (the system OpenMPI libraries provided by the Linux distributions could be employed instead), but it might improve the performance of the simulations. The Intel C/C++ and Fortran compiler also provides its own MPI support that could be used instead, but having a freshly compiled OpenMPI helps you stay up to date with recent MPI development. Before starting the compilation, be absolutely sure which libraries and compiler options you need to successfully compile your custom OpenMPI and GROMACS!

  • Use the latest Intel C/C++ and Fortran Compiler if possible. That largely guarantees that the specific processor features of the CPU and GPU will be taken into account by the C/C++ and Fortran compilers during the compilation process.
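
As a quick way to group nodes by CPU type (mentioned in the second bullet above), you could query the CPU model on every node and count the distinct values. The snippet below is only a sketch: it assumes password-less ssh access and uses node01, node02, and node03 as placeholders for your own host names:

$ for node in node01 node02 node03; do ssh ${node} 'grep -m1 "model name" /proc/cpuinfo'; done | sort | uniq -c

Each group in the output corresponds to one CPU type, and therefore to one separate GROMACS build.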

 

2. Setting the building environment.

Before starting, be sure your build folder is created. You might need an unprivileged user to perform the compilation. Open this document to see how to do that:

https://vessokolev.blogspot.com/2016/08/speeding-up-your-scientific-python-code.html

See paragraphs 2, 3, and 4 there.
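
A minimal sketch of such a setup, assuming an unprivileged user named builder and the build directory /home/builder/compile used later in this document (the linked article above describes the full procedure):

# useradd -m builder
# su - builder
$ mkdir -p /home/builder/compile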

 

3. Short notes on AVX-capable CPUs support available in GROMACS.

If the output of the execution of cat /proc/cpuinfo shows the avx2 CPU flag (see the flags line in the example below):

$ cat /proc/cpuinfo
...
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
stepping : 2
microcode : 0x37
cpu MHz : 1221.156
cache size : 30720 KB
physical id : 0
siblings : 24
core id : 13
cpu cores : 12
apicid : 27
initial apicid : 27
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bogomips : 4589.54
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

then your CPU supports Intel® Advanced Vector Extensions 2 (Intel® AVX2). GROMACS supports AVX2 and that feature significantly boosts the performance of the simulations when computing bonded interactions. More on these CPU architecture features here:

https://software.intel.com/en-us/articles/how-intel-avx2-improves-performance-on-server-applications
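
A quicker way to test only for that flag, instead of reading the whole output, is to grep for it. The command below prints avx2 once if any of the logical CPUs reports the flag, and nothing otherwise:

$ grep -o -m1 avx2 /proc/cpuinfo
avx2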

 

4. Downloading and installing CUDA.

You need to install the CUDA rpm packages on both the build host and the compute nodes. The easiest and most efficient way to do so, and to get updates later (when any are available), is through yum. To install the NVidia CUDA yum repository file, visit:

https://developer.nvidia.com/cuda-downloads

and download the repository rpm file offered on that page.

An alternative way to get the repository rpm file is to browse the NVidia CUDA repository directory at:

http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64

scroll down there, find and download the rpm file named "cuda-repo-rhel7-*" (select the most recent one), and then install it locally by using yum localinstall:

# yum localinstall /path/to/cuda-repo-rhel7-*.rpm

Once ready with the CUDA repository installation, become root (or a superuser) and install the CUDA Toolkit rpm packages:

# yum install cuda

Note that the installation process takes time, which depends mainly on the network connectivity and the performance of the local system. Also note that yum automatically installs (through dependencies in the rpm packages) DKMS, to support rebuilding the NVidia kernel modules when a new kernel is booted.
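
Once the packages are installed (and the node has been rebooted, so that the NVidia kernel module is loaded), you can quickly verify both the driver and the toolkit. Note that nvcc is usually not in the default PATH, hence the full path used below:

# nvidia-smi
# /usr/local/cuda/bin/nvcc --version

nvidia-smi should list the GPUs visible to the driver, and nvcc --version should report the version of the installed CUDA Toolkit.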

If you do not expect to use your compute nodes for compiling any code with CUDA support, but only to execute compiled binary code there, you might not need to install all rpm packages through the meta package "cuda" (as shown above). You could instead specify which of the packages you really need to install there (see the repository). To preview all packages provided by the NVidia CUDA repository, execute:

# yum --disablerepo="*" --enablerepo="cuda" list available

An alternative way to preview all packages available in the repository "cuda" is to use the locally cached sqlite3 database of that repository:

# yum makecache
# HASH=`ls -p /var/cache/yum/x86_64/7/cuda/ | grep -v '/$' | grep primary.sqlite.bz2 | awk -F "-" '{print $1}'`
# cp /var/cache/yum/x86_64/7/cuda/${HASH}-primary.sqlite.bz2 ~/tmp
# cd ~/tmp
# bunzip2 ${HASH}-primary.sqlite.bz2
# sqlite3 ${HASH}-primary.sqlite
sqlite> select name,version,arch,summary from packages;

 

5. Compiling and installing OpenMPI

Be sure you have the build environment set up as explained before. Then install the packages hwloc-devel and valgrind-devel:

# yum install hwloc-devel valgrind-devel

and finally proceed with the configuration, compilation, and installation:

$ cd /home/builder/compile
$ . ~/.intel_env
$ . /usr/local/appstack/.appstack_env
$ wget https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.0.tar.bz2
$ tar jxvf openmpi-2.0.0.tar.bz2
$ cd openmpi-2.0.0
$ ./configure --prefix=/usr/local/appstack/openmpi-2.0.0 --enable-ipv6 --enable-mpi-fortran --enable-mpi-cxx --with-cuda --with-hwloc
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/openmpi-2.0.0 /usr/local/appstack/openmpi
$ export PATH=/usr/local/appstack/openmpi/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/openmpi/lib:$LD_LIBRARY_PATH

Do not forget to update the variables PATH and LD_LIBRARY_PATH by editing their values in the file /usr/local/appstack/.appstack_env. The OpenMPI installation compiled and installed this way provides your applications with more up-to-date MPI tools and libraries than those that might be provided by the recent Intel C/C++/Fortran Compiler package.
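
The exact content of /usr/local/appstack/.appstack_env is specific to your setup, but after this step it could contain lines like the following (only a sketch, based on the paths used above):

export PATH=/usr/local/appstack/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/appstack/openmpi/lib:$LD_LIBRARY_PATH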

 

6. Compiling and installing GROMACS

Be sure you have the build environment set up as explained before and OpenMPI installed as shown above. Then proceed with the GROMACS compilation and installation:

$ cd /home/builder/compile
$ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2016.tar.gz
$ tar zxvf gromacs-2016.tar.gz
$ cd gromacs-2016
$ . ~/.intel_env
$ . /usr/local/appstack/.appstack_env
$ cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/appstack/gromacs-2016 -DGMX_MPI=ON -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DMPI_C_LIBRARIES=/usr/local/appstack/openmpi/lib/libmpi.so -DMPI_C_INCLUDE_PATH=/usr/local/appstack/openmpi/include -DMPI_CXX_LIBRARIES=/usr/local/appstack/openmpi/lib/libmpi.so -DMPI_CXX_INCLUDE_PATH=/usr/local/appstack/openmpi/include
$ gmake
$ gmake install
$ export PATH=/usr/local/appstack/gromacs-2016/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/gromacs-2016/lib64:$LD_LIBRARY_PATH

Do not forget to update the variables PATH and LD_LIBRARY_PATH by editing their values in the file /usr/local/appstack/.appstack_env.
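
After this step the file could contain, in addition to the OpenMPI paths added earlier, lines like the following (again only a sketch, based on the paths used above):

export PATH=/usr/local/appstack/openmpi/bin:/usr/local/appstack/gromacs-2016/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/appstack/openmpi/lib:/usr/local/appstack/gromacs-2016/lib64:$LD_LIBRARY_PATH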

 

7. Invoking GROMACS

To invoke GROMACS compiled and installed by following the instructions in this document, you need to have the executable gmx_mpi (not gmx!) in your PATH environment variable, as well as the path to libgromacs_mpi.so in LD_LIBRARY_PATH. You may set those paths by appending to your .bashrc the line:

. /usr/local/appstack/.appstack_env

If you do not want to write this line down in .bashrc, you may execute it on the command line only when you need to invoke gmx_mpi.
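
For example, a typical session might look like the one below, where topol.tpr is just a placeholder for your own run input file and the number of MPI ranks depends on your hardware:

$ . /usr/local/appstack/.appstack_env
$ gmx_mpi --version
$ mpirun -np 4 gmx_mpi mdrun -s topol.tpr

gmx_mpi --version confirms that the binary is found and shows its build options, while mpirun starts the actual simulation.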

