Content:
2. Enabling the use of EPEL repository
3. Installing multiple CUDA versions on CentOS 7
4. Monitoring the NVidia GPU device by nvidia-smi
5. Making the software state in the NVIDIA driver persistent
6. Installing cuDNN for multiple versions of CUDA
7. Installing Intel Python 3 and tensorflow-gpu
8. Testing the CUDA and cuDNN installation
8.1. Testing if cuDNN library is loadable
8.2. Testing the CUDA Python 3 integration by using Numba
8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu
1. Introduction
This publication describes how to install multiple versions of CUDA and cuDNN on the same system running CentOS 7 to support various applications, and tensorflow in particular (via tensorflow-gpu). The recipes provided bellow can be follow when adding GPU computing support for compute nodes, which are part of HPC cluster.
2. Enabling the use of EPEL repository
This is an optional step, applicable if the configuration for using EPEL repository is not presented in /etc/yum.repos.d. EPEL is required here because the installation of the nvidia graphics driver, part of CUDA packages, requires the presence of DKMS in the system in advance. That package is included in EPEL. To use EPEL install first its repository package:
# yum install epel-release # yum update
The dkms RPM package will be installed later, as a dependence required by the CUDA packages (see next section).
3. Installing multiple CUDA versions on CentOS 7
The most reasonable question here is why do we need multiple version of CUDA installed and supported locally on the system. Its answer is straightforward - it is all about the application software specific requirements. Some software products are very specific about the version of CUDA.
The most rational way to install the CUDA packages on CentOS 7 is through yum. NVidia provides the configuration files for using their yum repositories as a separate RPM package, which might be downloaded here:
https://developer.nvidia.com/cuda-downloads
To initiate the download consequently select Linux > x86_64 > CentOS > 7 > rpm (network) > Download as shown in the screen shots bellow:
and installed by following the instructions given bellow the "Download" button.
From time to time some inconsistencies appear in the CUDA yum repository. To prevent any problems they might cause edit the file /etc/yum.repos.d/cuda.repo by changing there the line:
enabled=1
into
enabled=0
From now on, every time an access to the CUDA repository RPM packages is required, do supply the command line option --enablerepo=cuda to yum.
After finishing with the yum configuration install the RPM packages containing the versions of CUDA currently supported by the vendor:
# yum --enablerepo=cuda install cuda-8-0 cuda-9-0 cuda-9-1
That will install plenty of packages. Take into account their installation size and prepare to meet that demand for disk space.
If, by any chance, the installer misses to install the packages nvidia-kmod, xorg-x11-drv-nvidia, xorg-x11-drv-nvidia-libs, and xorg-x11-drv-nvidia-gl, install them separately:
# yum --enablerepo=cuda install nvidia-kmod xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-gl
4. Monitoring the NVidia GPU device by nvidia-smi
The tool nvidia-smi is part of the package xorg-x11-drv-nvidia. It shows the current status of the NVidia GPU device:
$ nvidia-smi
Sun May 6 17:15:10 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K620 On | 00000000:02:00.0 Off | N/A |
| 34% 36C P8 1W / 30W | 1MiB / 2000MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
That tool is useful to check how many applications are currently running on the GPU device, what is the temperature there, the consumed power, the utilization rate, and what amount of memory is taken by the applications.
5. Making the software state in the NVIDIA driver persistent
To prevent the driver from releasing the NVidia GPU device, when that device is not in use by any process, the daemon nvidia-persistenced (part of the package xorg-x11-drv-nvidia) needs to be enabled and started:
# systemctl enable nvidia-persistenced # systemctl start nvidia-persistenced
6. Installing cuDNN for multiple versions of CUDA
The cuDNN library and header files can be downloaded from the web page of the vendor at:
https://developer.nvidia.com/cudnn
Note that a proper user registration is required to obtain the cuDNN files. Also, you need to download the archives with cuDNN library and header files for each and every CUDA version locally installed and supported. Which process, in turn, will end up bringing the following files into the download directory:
cudnn-8.0-linux-x64-v6.0.tgz cudnn-9.0-linux-x64-v7.tgz cudnn-9.1-linux-x64-v7.tgz
To proceed with the installation, unpack the content of the archives into the respective CUDA installation folders and recreate the database with the dynamic linker run time bindings, by executing (as root or super user) the command lines:
# tar --strip-components 1 -xf cudnn-8.0-linux-x64-v6.0.tgz -C /usr/local/cuda-8.0 # tar --strip-components 1 -xf cudnn-9.0-linux-x64-v7.tgz -C /usr/local/cuda-9.0 # tar --strip-components 1 -xf cudnn-9.1-linux-x64-v7.tgz -C /usr/local/cuda-9.1 # ldconfig /
It is recommended to check the successful archive unpacking and the proper recreation of the database with the dynamic linker run time bindings, by listing the database cache and grep the output for locating the string "cudnn" in it:
$ ldconfig -p | grep cudnn
The grep result indicating successful cuDNN installation, will look like:
libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7 libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7 libcudnn.so.6 (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6 libcudnn.so (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so
Do not become confused due to the multiple declarations made for libcudnn.so in the database (as seen in the output above). Seemingly, that indicates a collision, but note that each of libcudnn.so files is a symlink and it also provides an unique version number. That number is used by the tensorflow libraries to find which of the files matches best the version requirements.
7. Installing Intel Python 3 and tensorflow-gpu
If Intel Python 3 is not available in the system, follow the instructions given here:
https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo
on how to install it. It is a single RPM package (mind its large installation size of several gigabytes) which contains tensorflow but (currently) does not include tensorflow-gpu module. Once Intel Python 3 is available the tensorflow-gpu module could be installed by invoking pip (the one provided by Intel Python 3).
Do not install tensorflow-gpu or any other module for Intel Python 3 as root or super user. Avoid any module installations inside the /opt/intel/intelpython3/ folder. Instead, perform the installation as unprivileged user and append the --user option to pip:
$ /opt/intel/intelpython3/bin/pip install --user tensorflow-gpu
The output information generated during the installation process should look like:
Collecting tensorflow-gpu
Downloading https://files.pythonhosted.org/packages/59/41/ba6ac9b63c5bfb90377784e29c4f4c478c74f53e020fa56237c939674f2d/tensorflow_gpu-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (216.2MB)
100% |████████████████████████████████| 216.3MB 7.8kB/s
Collecting protobuf>=3.4.0 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/74/ad/ecd865eb1ba1ff7f6bd6bcb731a89d55bc0450ced8d457ed2d167c7b8d5f/protobuf-3.5.2.post1-cp36-cp36m-manylinux1_x86_64.whl (6.4MB)
100% |████████████████████████████████| 6.4MB 266kB/s
Collecting gast>=0.2.0 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/5c/78/ff794fcae2ce8aa6323e789d1f8b3b7765f601e7702726f430e814822b96/gast-0.2.0.tar.gz
Collecting termcolor>=1.1.0 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Requirement already satisfied: wheel>=0.26 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting tensorboard<1.9.0,>=1.8.0 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/59/a6/0ae6092b7542cfedba6b2a1c9b8dceaf278238c39484f3ba03b03f07803c/tensorboard-1.8.0-py3-none-any.whl (3.1MB)
100% |████████████████████████████████| 3.1MB 545kB/s
Collecting grpcio>=1.8.6 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/c8/b8/00e703183b7ae5e02f161dafacdfa8edbd7234cb7434aef00f126a3a511e/grpcio-1.11.0-cp36-cp36m-manylinux1_x86_64.whl (8.8MB)
100% |████████████████████████████████| 8.8MB 195kB/s
Collecting astor>=0.6.0 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/b2/91/cc9805f1ff7b49f620136b3a7ca26f6a1be2ed424606804b0fbcf499f712/astor-0.6.2-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.13.3 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting absl-py>=0.1.6 (from tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/90/6b/ba04a9fe6aefa56adafa6b9e0557b959e423c49950527139cb8651b0480b/absl-py-0.2.0.tar.gz (82kB)
100% |████████████████████████████████| 92kB 8.8MB/s
Requirement already satisfied: setuptools in /opt/intel/intelpython3/lib/python3.6/site-packages (from protobuf>=3.4.0->tensorflow-gpu)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Collecting bleach==1.5.0 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting markdown>=2.6.8 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/6d/7d/488b90f470b96531a3f5788cf12a93332f543dbab13c423a5e7ce96a0493/Markdown-2.6.11-py2.py3-none-any.whl (78kB)
100% |████████████████████████████████| 81kB 8.9MB/s
Collecting html5lib==0.9999999 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)
100% |████████████████████████████████| 890kB 1.7MB/s
Building wheels for collected packages: gast, termcolor, absl-py, html5lib
Running setup.py bdist_wheel for gast ... done
Stored in directory: /home/vesso/.cache/pip/wheels/9a/1f/0e/3cde98113222b853e98fc0a8e9924480a3e25f1b4008cedb4f
Running setup.py bdist_wheel for termcolor ... done
Stored in directory: /home/vesso/.cache/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6
Running setup.py bdist_wheel for absl-py ... done
Stored in directory: /home/vesso/.cache/pip/wheels/23/35/1d/48c0a173ca38690dd8dfccfa47ffc750db48f8989ed898455c
Running setup.py bdist_wheel for html5lib ... done
Stored in directory: /home/vesso/.cache/pip/wheels/50/ae/f9/d2b189788efcf61d1ee0e36045476735c838898eef1cad6e29
Successfully built gast termcolor absl-py html5lib
Installing collected packages: protobuf, gast, termcolor, html5lib, bleach, markdown, tensorboard, grpcio, astor, absl-py, tensorflow-gpu
Successfully installed absl-py-0.2.0 astor-0.6.2 bleach-1.5.0 gast-0.2.0 grpcio-1.11.0 html5lib-0.9999999 markdown-2.6.11 protobuf-3.5.2.post1 tensorboard-1.8.0 tensorflow-gpu-1.8.0 termcolor-1.1.0
NOTE: The files brought by the tensorflow-gpu installation to the local file system will be located under ${HOME}/.local/lib/python3.6/site-packages/ directory!
8. Testing the CUDA and cuDNN installation
8.1. Testing if cuDNN library is loadable
That kind of test is very easy to perform. If it returns no error that means all symbols brought by the library libcudnn.so are known to the Python 3 interpreter.
To perform the test create the Python 3 script:
import ctypes
t=ctypes.cdll.LoadLibrary("libcudnn.so")
print(t._name)
save it as a file under the name cudnn_loading_cheker.py and then execute the script:
$ /opt/intel/intelpython3/bin/python3 cudnn_loading_cheker.py
If the libcudnn.so is successfully loaded the script will return the name of the library file:
libcudnn.so
and rise an error message otherwise.
8.2. Testing the CUDA Python 3 integration by using Numba
Along with the other modules for scientific computing and data analysis, the Intel Python 3 package supplies Numba. To perform GPU computing based on CUDA, the Numba jit compiler requires the environmental variables NUMBAPRO_NVVM and NUMBAPRO_LIBDEVICE both properly declared before start compiling any Python code containing GPU instructions. Those variables should point to the installation tree of the latest version of CUDA:
$ export NUMBAPRO_NVVM=/usr/local/cuda-9.1/nvvm/lib64/libnvvm.so.3.2.0 $ export NUMBAPRO_LIBDEVICE=/usr/local/cuda-9.1/nvvm/libdevice
It is highly recommendable to declare these variables in ${HOME}/.bashrc file.
Once the variables are declared and loaded, execute the test script /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py:
$ /opt/intel/intelpython3/bin/python3 /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py
In case of successful execution the script will exit by displaying the message:
. ---------------------------------------------------------------------- Ran 1 test in 0.093s OK
8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu
A simple script for testing tensorflow-gpu can be found here:
https://github.com/yaroslavvb/stuff/blob/master/matmul_benchmark.py
It should be downloaded and then executed by using Intel Python 3 interpreter:
$ /opt/intel/intelpython3/bin/python3 matmul_benchmark.py
and in case of successful execution the following result will appear on the screen:
/opt/intel/intelpython3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters 2018-05-06 16:21:22.591713: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2018-05-06 16:21:22.684411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2018-05-06 16:21:22.684824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.92GiB 2018-05-06 16:21:22.684855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0 2018-05-06 16:21:23.151861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix: 2018-05-06 16:21:23.151903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0 2018-05-06 16:21:23.151916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N 2018-05-06 16:21:23.152061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1692 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0) 8192 x 8192 matmul took: 1.34 sec, 817.99 G ops/sec







