Sunday, 6 May 2018

Installing Intel Python 3, tensorflow-gpu, and multiple versions of CUDA and cuDNN on CentOS 7


1. Introduction

2. Enabling the use of EPEL repository

3. Installing multiple CUDA versions on CentOS 7

4. Monitoring the NVidia GPU device by nvidia-smi

5. Making the software state in the NVIDIA driver persistent

6. Installing cuDNN for multiple versions of CUDA

7. Installing Intel Python 3 and tensorflow-gpu

8. Testing the CUDA and cuDNN installation

8.1. Testing if cuDNN library is loadable

8.2. Testing the CUDA Python 3 integration by using Numba

8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu

This publication describes how to install multiple versions of CUDA and cuDNN on the same system running CentOS 7 to support various applications, and tensorflow in particular (via tensorflow-gpu). The recipes provided bellow can be follow when adding GPU computing support for compute nodes, which are part of HPC cluster.

This is an optional step, applicable if the configuration for using EPEL repository is not presented in /etc/yum.repos.d. EPEL is required here because the installation of the nvidia graphics driver, part of CUDA packages, requires the presence of DKMS in the system in advance. That package is included in EPEL. To use EPEL install first its repository package:

# yum install epel-release
# yum update

The dkms RPM package will be installed later, as a dependence required by the CUDA packages (see next section).

The most reasonable question here is why do we need multiple version of CUDA installed and supported locally on the system. Its answer is straightforward - it is all about the application software specific requirements. Some software products are very specific about the version of CUDA.

The most rational way to install the CUDA packages on CentOS 7 is through yum. NVidia provides the configuration files for using their yum repositories as a separate RPM package, which might be downloaded here:

To initiate the download consequently select Linux > x86_64 > CentOS > 7 > rpm (network) > Download as shown in the screen shots bellow:

and installed by following the instructions given bellow the "Download" button.

From time to time some inconsistencies appear in the CUDA yum repository. To prevent any problems they might cause edit the file /etc/yum.repos.d/cuda.repo by changing there the line:




From now on, every time an access to the CUDA repository RPM packages is required, do supply the command line option --enablerepo=cuda to yum.

After finishing with the yum configuration install the RPM packages containing the versions of CUDA currently supported by the vendor:

# yum --enablerepo=cuda install cuda-8-0 cuda-9-0 cuda-9-1

That will install plenty of packages. Take into account their installation size and prepare to meet that demand for disk space.

If, by any chance, the installer misses to install the packages nvidia-kmod, xorg-x11-drv-nvidia, xorg-x11-drv-nvidia-libs, and xorg-x11-drv-nvidia-gl, install them separately:

# yum --enablerepo=cuda install nvidia-kmod xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-gl

The tool nvidia-smi is part of the package xorg-x11-drv-nvidia. It shows the current status of the NVidia GPU device:

$ nvidia-smi

Sun May  6 17:15:10 2018
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Quadro K620         On   | 00000000:02:00.0 Off |                  N/A |
| 34%   36C    P8     1W /  30W |      1MiB /  2000MiB |      0%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |

That tool is useful to check how many applications are currently running on the GPU device, what is the temperature there, the consumed power, the utilization rate, and what amount of memory is taken by the applications.

To prevent the driver from releasing the NVidia GPU device, when that device is not in use by any process, the daemon nvidia-persistenced (part of the package xorg-x11-drv-nvidia) needs to be enabled and started:

# systemctl enable nvidia-persistenced
# systemctl start nvidia-persistenced

The cuDNN library and header files can be downloaded from the web page of the vendor at:

Note that a proper user registration is required to obtain the cuDNN files. Also, you need to download the archives with cuDNN library and header files for each and every CUDA version locally installed and supported. Which process, in turn, will end up bringing the following files into the download directory:


To proceed with the installation, unpack the content of the archives into the respective CUDA installation folders and recreate the database with the dynamic linker run time bindings, by executing (as root or super user) the command lines:

# tar --strip-components 1 -xf cudnn-8.0-linux-x64-v6.0.tgz -C /usr/local/cuda-8.0
# tar --strip-components 1 -xf cudnn-9.0-linux-x64-v7.tgz -C /usr/local/cuda-9.0
# tar --strip-components 1 -xf cudnn-9.1-linux-x64-v7.tgz -C /usr/local/cuda-9.1
# ldconfig /

It is recommended to check the successful archive unpacking and the proper recreation of the database with the dynamic linker run time bindings, by listing the database cache and grep the output for locating the string "cudnn" in it:

$ ldconfig -p | grep cudnn

The grep result indicating successful cuDNN installation, will look like: (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/ (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/ (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/ (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/ (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/ (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/

Do not become confused due to the multiple declarations made for in the database (as seen in the output above). Seemingly, that indicates a collision, but note that each of files is a symlink and it also provides an unique version number. That number is used by the tensorflow libraries to find which of the files matches best the version requirements.

If Intel Python 3 is not available in the system, follow the instructions given here:

on how to install it. It is a single RPM package (mind its large installation size of several gigabytes) which contains tensorflow but (currently) does not include tensorflow-gpu module. Once Intel Python 3 is available the tensorflow-gpu module could be installed by invoking pip (the one provided by Intel Python 3).

Do not install tensorflow-gpu or any other module for Intel Python 3 as root or super user. Avoid any module installations inside the /opt/intel/intelpython3/ folder. Instead, perform the installation as unprivileged user and append the --user option to pip:

$ /opt/intel/intelpython3/bin/pip install --user tensorflow-gpu

The output information generated during the installation process should look like:

Collecting tensorflow-gpu
  Downloading (216.2MB)
    100% |████████████████████████████████| 216.3MB 7.8kB/s 
Collecting protobuf>=3.4.0 (from tensorflow-gpu)
  Downloading (6.4MB)
    100% |████████████████████████████████| 6.4MB 266kB/s 
Collecting gast>=0.2.0 (from tensorflow-gpu)
Collecting termcolor>=1.1.0 (from tensorflow-gpu)
Requirement already satisfied: wheel>=0.26 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting tensorboard<1.9.0,>=1.8.0 (from tensorflow-gpu)
  Downloading (3.1MB)
    100% |████████████████████████████████| 3.1MB 545kB/s 
Collecting grpcio>=1.8.6 (from tensorflow-gpu)
  Downloading (8.8MB)
    100% |████████████████████████████████| 8.8MB 195kB/s 
Collecting astor>=0.6.0 (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.13.3 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting absl-py>=0.1.6 (from tensorflow-gpu)
  Downloading (82kB)
    100% |████████████████████████████████| 92kB 8.8MB/s 
Requirement already satisfied: setuptools in /opt/intel/intelpython3/lib/python3.6/site-packages (from protobuf>=3.4.0->tensorflow-gpu)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Collecting bleach==1.5.0 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Collecting markdown>=2.6.8 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading (78kB)
    100% |████████████████████████████████| 81kB 8.9MB/s 
Collecting html5lib==0.9999999 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading (889kB)
    100% |████████████████████████████████| 890kB 1.7MB/s 
Building wheels for collected packages: gast, termcolor, absl-py, html5lib
  Running bdist_wheel for gast ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/9a/1f/0e/3cde98113222b853e98fc0a8e9924480a3e25f1b4008cedb4f
  Running bdist_wheel for termcolor ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6
  Running bdist_wheel for absl-py ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/23/35/1d/48c0a173ca38690dd8dfccfa47ffc750db48f8989ed898455c
  Running bdist_wheel for html5lib ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/50/ae/f9/d2b189788efcf61d1ee0e36045476735c838898eef1cad6e29
Successfully built gast termcolor absl-py html5lib
Installing collected packages: protobuf, gast, termcolor, html5lib, bleach, markdown, tensorboard, grpcio, astor, absl-py, tensorflow-gpu
Successfully installed absl-py-0.2.0 astor-0.6.2 bleach-1.5.0 gast-0.2.0 grpcio-1.11.0 html5lib-0.9999999 markdown-2.6.11 protobuf-3.5.2.post1 tensorboard-1.8.0 tensorflow-gpu-1.8.0 termcolor-1.1.0

NOTE: The files brought by the tensorflow-gpu installation to the local file system will be located under ${HOME}/.local/lib/python3.6/site-packages/ directory!

That kind of test is very easy to perform. If it returns no error that means all symbols brought by the library are known to the Python 3 interpreter.

To perform the test create the Python 3 script:

import ctypes



save it as a file under the name and then execute the script:

$ /opt/intel/intelpython3/bin/python3

If the is successfully loaded the script will return the name of the library file:

and rise an message otherwise.

Along with the other modules for scientific computing and data analysis, the Intel Python 3 package supplies Numba. To perform GPU computing based on CUDA, the Numba jit compiler requires the environmental variables NUMBAPRO_NVVM and NUMBAPRO_LIBDEVICE both properly declared before start compiling any Python code containing GPU instructions. Those variables should point to the installation tree of the latest version of CUDA:

$ export NUMBAPRO_NVVM=/usr/local/cuda-9.1/nvvm/lib64/
$ export NUMBAPRO_LIBDEVICE=/usr/local/cuda-9.1/nvvm/libdevice

It is highly recommendable to declare these variables in ${HOME}/.bashrc file.

Once the variables are declared and loaded, execute the test script /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/

$ /opt/intel/intelpython3/bin/python3 /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/

In case of successful execution the script will exit by displaying the message:

Ran 1 test in 0.093s


A simple script for testing tensorflow-gpu can be found here:

It should be downloaded and then executed by using Intel Python 3 interpreter:

$ /opt/intel/intelpython3/bin/python3

and in case of successful execution the following result will appear on the screen:

/opt/intel/intelpython3/lib/python3.6/site-packages/h5py/ FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-05-06 16:21:22.591713: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-06 16:21:22.684411: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-06 16:21:22.684824: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.92GiB
2018-05-06 16:21:22.684855: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2018-05-06 16:21:23.151861: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-06 16:21:23.151903: I tensorflow/core/common_runtime/gpu/]      0 
2018-05-06 16:21:23.151916: I tensorflow/core/common_runtime/gpu/] 0:   N 
2018-05-06 16:21:23.152061: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1692 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0)

 8192 x 8192 matmul took: 1.34 sec, 817.99 G ops/sec

No comments:

Post a Comment