
Using high-grade ECC temporary keys with Apache and mod_ssl on CentOS 7

To support ECDHE-based SSL ciphers, your Apache server (through mod_ssl) needs a temporary ECC key generated and loaded on each start or restart of the httpd daemon. While on RHEL 8 and CentOS 8 you can have such a key automatically generated and loaded, every time the Apache daemon is invoked, simply by adding the declaration:

SSLOpenSSLConfCmd Curves P-384

to the SSL virtual host section in /etc/httpd/conf.d/ssl.conf, there is no such support in mod_ssl 2.4.6 - the version shipped with CentOS 7. That does not mean mod_ssl 2.4.6 does not support ECDHE and the use of temporary ECC keys. It does, but instead of allowing the temporary ECC key size to be configured as a parameter in /etc/httpd/conf.d/ssl.conf, it sticks to a 256-bit secp256k1 key only. A key of that size might not be considered secure enough for use in long-running Apache instances. One workaround is to manually generate an ECC key using secp384r1 (that is, a high-grade key) and add it to the end of the certificate file (the file pointed to by the SSLCertificateFile declaration in /etc/httpd/conf.d/ssl.conf).

To generate the key, use openssl:

$ openssl ecparam -genkey -name secp384r1

The output will look like:

-----BEGIN EC PARAMETERS-----
BgUrgQQAIg==
-----END EC PARAMETERS-----
-----BEGIN EC PRIVATE KEY-----
MIGkAgEBBDCL5996z0+JsNAR2boRU0zULurtLbKbILiysJx3BbWEWFrlkuXL11BS
MI9bqrYNXjOgBwYFK4EEACKhZANiAAQISZYiUhc3GUXCaxOfsNun9KBiMyp9yAR3
qGH2NR1Va51Q4WfS4X0XPCaa3w3gA5g69MQV8aak3BVnbE27Q+yAZ5zi+dSNt5VU
Jg1tGgZmzX+SfJ6WWtMSv0Aa3r62UEE=
-----END EC PRIVATE KEY-----

Add it to the end of the certificate file, right after the line:

-----END CERTIFICATE-----

Save the changes to the file and reload or restart Apache's daemon httpd:

$ sudo systemctl reload httpd
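The steps above can be sketched as one short shell sequence. The certificate path in the commented-out lines is an assumption; use the file from your SSLCertificateFile declaration:

```shell
# Generate the temporary secp384r1 key into a temporary file.
TMPKEY=$(mktemp)
openssl ecparam -genkey -name secp384r1 -out "$TMPKEY"

# Confirm the key really uses secp384r1 before touching the live file.
openssl ec -in "$TMPKEY" -noout -text | grep "ASN1 OID"

# Append the key to the certificate file and reload httpd
# (uncomment once the certificate path is verified):
# sudo sh -c "cat $TMPKEY >> /etc/httpd/ssl/server.crt"
# sudo systemctl reload httpd
rm -f "$TMPKEY"
```

Checking the curve first protects the live certificate file from a typo in the `-name` argument.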

To check whether the server has successfully loaded and is using the newly generated 384-bit temporary key, use openssl:

$ openssl s_client -connect your_server_name:443

In case of success, you should see the following line in the result:

Server Temp Key: ECDH, P-384, 384 bits

NOTES: The mod_ssl configuration should include at least one ECDHE cipher. Do not try to go to a 521-bit key, for it is not (yet) supported by some browsers.


Using SmartCard-HSM 4K USB-Token to work with ECDSA keys in RHEL, CentOS, and Enterprise Linux, versions 7 and 8

Content:

1. Introduction

2. Connecting the device

3. Downloading, compiling, and installing the latest stable release of OpenSC and HSM code

4. Initializing the device (setting the PINs)

5. Generating 521-bit EC key pair

6. Using the USB token as OpenSSL PKCS#11 engine

7. Generating CSR, based on a key pair previously stored in the token, through OpenSSL PKCS#11 engine

8. Importing X.509 certificate to the token storage

 

1. Introduction

In modern cryptographic systems, the use of Elliptic Curve Cryptography (ECC) has become a de facto standard. Meanwhile, even now (it is 2019), it is quite rare to find a cryptographic token that supports the generation and use of ECDSA keys up to 521 bits, supports the SHA-512 hash function (embedded in the processor of the token), and comes with a driver for Linux support. Most of the available tokens support mostly RSA key pairs, and even if they support ECDSA, that support is partial and the maximum key size is limited below 521 bits.

Surprisingly, you can find a reliable crypto token that supports 521-bit ECDSA keys and SHA-512 at:

cardomatic.de

The SmartCard-HSM 4K USB-Token they have for sale is supported through the CCID driver library and through OpenSC (version 0.16 or higher). Downloading, compiling, and installing locally both the latest OpenSC and HSM code releases is sufficient to employ the SmartCard-HSM 4K USB-Token in OpenSSL (as an engine), Mozilla Firefox, and Mozilla Thunderbird (as a security device).

The notes given below explain how to make the SmartCard-HSM 4K USB-Token device available in a Linux-based system running CentOS 7, Scientific Linux 7, Red Hat Enterprise Linux 7, and (possibly) Red Hat Enterprise Linux 8, how to initialize the device, and how to use the OpenSC tools (like pkcs11-tool and pkcs15-tool) to interact with it.

 

2. Connecting the device

Install the following packages (there is a good chance they are already installed):

# yum install pcsc-lite pcsc-lite-ccid opensc

Enable the daemon pcscd to be loaded when starting/restarting the system:

# systemctl enable pcscd

and run it manually afterwards:

# systemctl start pcscd

Now you can connect the SmartCard-HSM 4K USB-Token device to the system. Records like those given below will be added to /var/log/messages by syslog:

May 10 15:00:06 argus kernel: usb 3-10: new full-speed USB device number 9 using xhci_hcd
May 10 15:00:06 argus kernel: usb 3-10: New USB device found, idVendor=04e6, idProduct=5816
May 10 15:00:06 argus kernel: usb 3-10: New USB device strings: Mfr=1, Product=2, SerialNumber=5
May 10 15:00:06 argus kernel: usb 3-10: Product: uTrust 3512 SAM slot Token
May 10 15:00:06 argus kernel: usb 3-10: Manufacturer: Identiv
May 10 15:00:06 argus kernel: usb 3-10: SerialNumber: 55511247701280

Their appearance there does not necessarily mean the device is accessible to applications. It only indicates that the USB device is properly recognized. In addition, the device will appear under the Vendor Name: SCM Microsystems, Inc., Vendor ID: 04e6, and Product ID: 5816, in the output of lsusb:

Bus 003 Device 009: ID 04e6:5816 SCM Microsystems, Inc.

If detailed debugging information regarding the device detection performed by the pcscd daemon is required, modify the file /usr/lib/systemd/system/pcscd.service (used by systemd to invoke the daemon pcscd) by adding the option --debug to the end of the line starting with "ExecStart":

ExecStart=/usr/sbin/pcscd --foreground --auto-exit --debug

Save the changes to the file, reload the systemd unit declarations:

# systemctl daemon-reload

and finally restart the daemon pcscd:

# systemctl restart pcscd

At that moment, start monitoring the new content appended to /var/log/messages (# tail -f /var/log/messages will do), then unplug and plug in the device, and watch what kind of messages the syslog daemon appends to the file.

To prevent fast growth of /var/log/messages on disk, immediately after finishing the analysis of the debug information, restore the original declarations in /usr/lib/systemd/system/pcscd.service (remove the --debug option), reload the unit files, and restart the daemon pcscd!

 

3. Downloading, compiling, and installing the latest stable release of OpenSC and HSM code

First, install the required RPM packages (if they are not already installed):

# yum install git gcc gcc-c++ make

Then fetch the code of the sc-hsm-embedded project, hosted on GitHub, and configure, compile, and install it (change the destination folder after --prefix, if needed):

$ git clone https://github.com/CardContact/sc-hsm-embedded.git
$ cd sc-hsm-embedded
$ ./configure --prefix=/usr/local/sc-hsm-embedded
$ make
$ sudo make install

General note: Red Hat Enterprise Linux 8 (as well as CentOS 8 and Scientific Linux 8) ships OpenSC version 0.19. That means one does not need to compile the OpenSC code, as explained below, on systems running EL8.

Once the installation is successfully completed, download, compile, and install the latest stable release of the OpenSC code (change the destination folder after --prefix and --with-completiondir, if needed):

$ wget https://github.com/OpenSC/OpenSC/releases/download/0.19.0/opensc-0.19.0.tar.gz
$ tar xvf opensc-0.19.0.tar.gz
$ cd opensc-0.19.0
$ ./configure --prefix=/usr/local/opensc-0.19.0 --with-completiondir=/usr/local/opensc-0.19.0/share/bash-completion/completions
$ make
$ sudo make install

The next step is to add the folders containing the installed binaries and libraries to the PATH and LD_LIBRARY_PATH environment variables. Those additions can be made either system-wide (available to all users) or user-specific (available only to particular users):

  • system-wide:

    Add the following export declarations:

    export PATH=/usr/local/opensc-0.19.0/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/sc-hsm-embedded/lib:/usr/local/opensc-0.19.0/lib:$LD_LIBRARY_PATH

    to the file /etc/profile.d/opensc.sh (create the file; it does not exist there by default).

  • user-specific:

    Open each ~/.bashrc file and append (to the end) the following lines:

    export PATH=/usr/local/opensc-0.19.0/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/sc-hsm-embedded/lib:/usr/local/opensc-0.19.0/lib:$LD_LIBRARY_PATH
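The effect of the PATH export can be verified in a fresh shell; a minimal sketch, assuming the --prefix value used above:

```shell
# Prepend the install prefix to PATH, exactly as in the declarations above.
export PATH=/usr/local/opensc-0.19.0/bin:$PATH

# The first PATH entry is now the OpenSC bin directory, so its tools
# shadow any older system-wide copies:
echo "$PATH" | tr ':' '\n' | head -1
```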

 

4. Initializing the device (setting the PINs)

The SmartCard-HSM 4K USB-Token device comes uninitialized! That means the device cannot be used unless the initialization process is completed first.

Connect the token to the system and invoke the tool sc-hsm-tool, specifying the SO PIN (SO stands for "Security Officer"; change the SO PIN 3537363231383830, used in the example below, to another unique 16-digit number and do not forget it) and the user's PIN (change 123456 to another number):

$ sc-hsm-tool --initialize --so-pin 3537363231383830 --pin 123456

Here, the number after --so-pin is the SO PIN and the one after --pin is the user's PIN. Note that the SO PIN is the most important PIN, for it allows write access to all slots of the device storage. It can also change every PUK and PIN previously stored there (including the SO PIN itself). So do not lose the SO PIN!


5. Generating 521-bit EC key pair

Invoke the tool pkcs11-tool and specify the key type (EC:secp521r1 for a 521-bit EC key) and a key pair ID and label (they help later to access the keys):

$ pkcs11-tool --module libsc-hsm-pkcs11.so -l --keypairgen --key-type EC:secp521r1 --id 10 --label "EC pair ID 10"

The user's PIN will be requested next, to grant access to the device storage and processor:

Enter PKCS#11 token PIN for SmartCard-HSM:

If the provided user's PIN is correct, the key generation process takes about 2-3 seconds to complete.

The most important part of the command line above is the ID number assigned to the key pair (10 in the example, but you may reserve any other number not already taken, including hexadecimal number syntax), because from that moment onward the keys of that pair can be selected and involved in operations only if their ID number is specified.

 

6. Using the USB token as OpenSSL PKCS#11 engine

Install the package openssl-pkcs11 to enable the OpenSSL PKCS#11 engine functionality:

# yum install openssl-pkcs11

After that, store the following minimal OpenSSL configuration into a local file (openssl_hsm.cnf, for the sake of our illustration process):

openssl_conf = openssl_def

[openssl_def]
engines = engine_section

[engine_section]
pkcs11 = pkcs11_section

[pkcs11_section]
engine_id = libsc-hsm-pkcs11
dynamic_path = /usr/lib64/engines-1.1/pkcs11.so
MODULE_PATH = /usr/local/opensc-0.19.0/lib/libsc-hsm-pkcs11.so
init = 0


[ req ]
distinguished_name      = req_distinguished_name
string_mask = utf8only

[ req_distinguished_name ]

The only declaration in openssl_hsm.cnf one might need to change is that of MODULE_PATH:

MODULE_PATH = /usr/local/opensc-0.19.0/lib/libsc-hsm-pkcs11.so

Replace /usr/local/opensc-0.19.0 there with the actual folder where the file libsc-hsm-pkcs11.so is stored (note that the file is brought there during the installation of the HSM code).

 

7. Generating CSR, based on a key pair previously stored in the token, through OpenSSL PKCS#11 engine

If the OpenSSL PKCS#11 engine configuration file is available, generating a CSR (PKCS#10 block) requires specifying the key pair ID (that is the number assigned to the key pair during key pair generation):

$ openssl req -new -subj "/CN=server.example.com" -out request.pem -sha512 -config openssl_hsm.cnf -engine pkcs11 -keyform engine -key 10

One can expand both command line arguments and the set of declarations stored in openssl_hsm.cnf, if the CSR should contain some specific extensions, requested by the certificate issuer.
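The resulting CSR can be inspected like any other PKCS#10 file, with no token involved. A sketch using a throwaway software key as a stand-in for the token key (all file names below are illustrative):

```shell
# Throwaway secp521r1 key - a software stand-in for the token key pair.
openssl ecparam -genkey -name secp521r1 -out /tmp/demo521.key

# Build a CSR with the same subject as in the example above.
openssl req -new -key /tmp/demo521.key -subj "/CN=server.example.com" \
    -sha512 -out /tmp/request.pem

# Check the subject and the self-signature of the CSR.
openssl req -in /tmp/request.pem -noout -subject -verify
```

The same `openssl req -in ... -noout` inspection works on the request.pem produced through the engine.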

 

8. Importing X.509 certificate to the token storage

That type of operation is possible only if the X.509 certificate is stored in DER (binary) format. For example, if the file SU_ECC_Root_CA.crt contains a PEM formatted X.509 certificate, that certificate can be converted into DER format by using openssl:

$ openssl x509 -in SU_ECC_Root_CA.crt -outform DER -out SU_ECC_Root_CA.der
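The conversion can be rehearsed end-to-end without the real CA file, using a throwaway self-signed certificate (all file names and the subject below are illustrative):

```shell
# Create a throwaway EC key and self-signed certificate in PEM format.
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:secp384r1 \
    -keyout /tmp/demo.key -out /tmp/demo.crt -days 1 -nodes -subj "/CN=demo"

# Convert the PEM certificate to DER (binary) format.
openssl x509 -in /tmp/demo.crt -outform DER -out /tmp/demo.der

# DER files must be read back with -inform DER:
openssl x509 -in /tmp/demo.der -inform DER -noout -subject
```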

To import the DER formatted X.509 certificate to the USB storage token, use the tool pkcs11-tool:

$ pkcs11-tool --module libsc-hsm-pkcs11.so --login --write-object SU_ECC_Root_CA.der --type cert --id `uuidgen | tr -d -` --label "SU_ECC_ROOT_CA"

The label string after --label, naming the X.509 certificate, should be specified in a way that hints at the purpose of that certificate.


Installing Intel Python 3, tensorflow-gpu, and multiple versions of CUDA and cuDNN on CentOS 7

Content:

1. Introduction

2. Enabling the use of EPEL repository

3. Installing multiple CUDA versions on CentOS 7

4. Monitoring the NVidia GPU device by nvidia-smi

5. Making the software state in the NVIDIA driver persistent

6. Installing cuDNN for multiple versions of CUDA

7. Installing Intel Python 3 and tensorflow-gpu

8. Testing the CUDA and cuDNN installation

8.1. Testing if cuDNN library is loadable

8.2. Testing the CUDA Python 3 integration by using Numba

8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu


1. Introduction

This publication describes how to install multiple versions of CUDA and cuDNN on the same system running CentOS 7, to support various applications, and tensorflow in particular (via tensorflow-gpu). The recipes provided below can be followed when adding GPU computing support to compute nodes that are part of an HPC cluster.


2. Enabling the use of EPEL repository

This is an optional step, applicable if the configuration for using the EPEL repository is not present in /etc/yum.repos.d. EPEL is required here because the installation of the NVidia graphics driver, part of the CUDA packages, requires the presence of DKMS in the system in advance. That package is included in EPEL. To use EPEL, first install its repository package:

# yum install epel-release
# yum update

The dkms RPM package will be installed later, as a dependency required by the CUDA packages (see the next section).


3. Installing multiple CUDA versions on CentOS 7

The most reasonable question here is why we need multiple versions of CUDA installed and supported locally on the same system. The answer is straightforward - it is all about application-specific software requirements. Some software products are very specific about the version of CUDA.

The most rational way to install the CUDA packages on CentOS 7 is through yum. NVidia provides the configuration files for using their yum repositories as a separate RPM package, which can be downloaded here:

https://developer.nvidia.com/cuda-downloads

To initiate the download, consecutively select Linux > x86_64 > CentOS > 7 > rpm (network) > Download, and then install the package by following the instructions given below the "Download" button.

From time to time, some inconsistencies appear in the CUDA yum repository. To prevent any problems they might cause, edit the file /etc/yum.repos.d/cuda.repo by changing the line:

enabled=1

into

enabled=0
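The same edit can be made non-interactively with sed. The sketch below rehearses it on a stand-in copy; point sed at /etc/yum.repos.d/cuda.repo (as root) for the real file:

```shell
# Stand-in copy of the repo file, to rehearse the edit safely.
printf '[cuda]\nname=cuda\nenabled=1\n' > /tmp/cuda.repo

# Disable the repository by default.
sed -i 's/^enabled=1$/enabled=0/' /tmp/cuda.repo

grep '^enabled=' /tmp/cuda.repo
```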

From now on, every time access to the CUDA repository RPM packages is required, supply the command line option --enablerepo=cuda to yum.

After finishing with the yum configuration install the RPM packages containing the versions of CUDA currently supported by the vendor:

# yum --enablerepo=cuda install cuda-8-0 cuda-9-0 cuda-9-1

That will install plenty of packages. Take their installation size into account and make sure enough disk space is available.

If, by any chance, the installer fails to install the packages nvidia-kmod, xorg-x11-drv-nvidia, xorg-x11-drv-nvidia-libs, and xorg-x11-drv-nvidia-gl, install them separately:

# yum --enablerepo=cuda install nvidia-kmod xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-gl

4. Monitoring the NVidia GPU device by nvidia-smi

The tool nvidia-smi is part of the package xorg-x11-drv-nvidia. It shows the current status of the NVidia GPU device:

$ nvidia-smi

Sun May  6 17:15:10 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K620         On   | 00000000:02:00.0 Off |                  N/A |
| 34%   36C    P8     1W /  30W |      1MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

That tool is useful for checking how many applications are currently running on the GPU device, the temperature, the consumed power, the utilization rate, and the amount of memory taken by the applications.


5. Making the software state in the NVIDIA driver persistent

To prevent the driver from releasing the NVidia GPU device when that device is not in use by any process, the daemon nvidia-persistenced (part of the package xorg-x11-drv-nvidia) needs to be enabled and started:

# systemctl enable nvidia-persistenced
# systemctl start nvidia-persistenced

6. Installing cuDNN for multiple versions of CUDA

The cuDNN library and header files can be downloaded from the web page of the vendor at:

https://developer.nvidia.com/cudnn

Note that a user registration is required to obtain the cuDNN files. Also, you need to download the archives with the cuDNN library and header files for each and every CUDA version locally installed and supported. That process, in turn, will bring the following files into the download directory:

cudnn-8.0-linux-x64-v6.0.tgz
cudnn-9.0-linux-x64-v7.tgz
cudnn-9.1-linux-x64-v7.tgz

To proceed with the installation, unpack the content of the archives into the respective CUDA installation folders and recreate the database with the dynamic linker run-time bindings, by executing (as root) the command lines:

# tar --strip-components 1 -xf cudnn-8.0-linux-x64-v6.0.tgz -C /usr/local/cuda-8.0
# tar --strip-components 1 -xf cudnn-9.0-linux-x64-v7.tgz -C /usr/local/cuda-9.0
# tar --strip-components 1 -xf cudnn-9.1-linux-x64-v7.tgz -C /usr/local/cuda-9.1
# ldconfig /

It is recommended to verify the successful unpacking of the archives and the proper recreation of the dynamic linker database, by listing the database cache and grepping the output for the string "cudnn":

$ ldconfig -p | grep cudnn

A grep result indicating successful cuDNN installation will look like:

libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7
libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7
libcudnn.so.6 (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6
libcudnn.so (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so
libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so
libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so

Do not be confused by the multiple entries for libcudnn.so in the database (as seen in the output above). Seemingly, that indicates a collision, but note that each libcudnn.so file is a symlink and also provides a unique version number. That number is used by the tensorflow libraries to find which of the files best matches the version requirements.


7. Installing Intel Python 3 and tensorflow-gpu

If Intel Python 3 is not available in the system, follow the instructions given here:

https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo

on how to install it. It is a single RPM package (mind its large installation size of several gigabytes) which contains tensorflow but (currently) does not include the tensorflow-gpu module. Once Intel Python 3 is available, the tensorflow-gpu module can be installed by invoking pip (the one provided by Intel Python 3).

Do not install tensorflow-gpu or any other module for Intel Python 3 as root. Avoid any module installations inside the /opt/intel/intelpython3/ folder. Instead, perform the installation as an unprivileged user and append the --user option to pip:

$ /opt/intel/intelpython3/bin/pip install --user tensorflow-gpu

The output information generated during the installation process should look like:

Collecting tensorflow-gpu
  Downloading https://files.pythonhosted.org/packages/59/41/ba6ac9b63c5bfb90377784e29c4f4c478c74f53e020fa56237c939674f2d/tensorflow_gpu-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (216.2MB)
    100% |████████████████████████████████| 216.3MB 7.8kB/s 
Collecting protobuf>=3.4.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/74/ad/ecd865eb1ba1ff7f6bd6bcb731a89d55bc0450ced8d457ed2d167c7b8d5f/protobuf-3.5.2.post1-cp36-cp36m-manylinux1_x86_64.whl (6.4MB)
    100% |████████████████████████████████| 6.4MB 266kB/s 
Collecting gast>=0.2.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/5c/78/ff794fcae2ce8aa6323e789d1f8b3b7765f601e7702726f430e814822b96/gast-0.2.0.tar.gz
Collecting termcolor>=1.1.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Requirement already satisfied: wheel>=0.26 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting tensorboard<1.9.0,>=1.8.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/59/a6/0ae6092b7542cfedba6b2a1c9b8dceaf278238c39484f3ba03b03f07803c/tensorboard-1.8.0-py3-none-any.whl (3.1MB)
    100% |████████████████████████████████| 3.1MB 545kB/s 
Collecting grpcio>=1.8.6 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/c8/b8/00e703183b7ae5e02f161dafacdfa8edbd7234cb7434aef00f126a3a511e/grpcio-1.11.0-cp36-cp36m-manylinux1_x86_64.whl (8.8MB)
    100% |████████████████████████████████| 8.8MB 195kB/s 
Collecting astor>=0.6.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/b2/91/cc9805f1ff7b49f620136b3a7ca26f6a1be2ed424606804b0fbcf499f712/astor-0.6.2-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.13.3 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting absl-py>=0.1.6 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/90/6b/ba04a9fe6aefa56adafa6b9e0557b959e423c49950527139cb8651b0480b/absl-py-0.2.0.tar.gz (82kB)
    100% |████████████████████████████████| 92kB 8.8MB/s 
Requirement already satisfied: setuptools in /opt/intel/intelpython3/lib/python3.6/site-packages (from protobuf>=3.4.0->tensorflow-gpu)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Collecting bleach==1.5.0 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting markdown>=2.6.8 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/6d/7d/488b90f470b96531a3f5788cf12a93332f543dbab13c423a5e7ce96a0493/Markdown-2.6.11-py2.py3-none-any.whl (78kB)
    100% |████████████████████████████████| 81kB 8.9MB/s 
Collecting html5lib==0.9999999 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)
    100% |████████████████████████████████| 890kB 1.7MB/s 
Building wheels for collected packages: gast, termcolor, absl-py, html5lib
  Running setup.py bdist_wheel for gast ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/9a/1f/0e/3cde98113222b853e98fc0a8e9924480a3e25f1b4008cedb4f
  Running setup.py bdist_wheel for termcolor ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6
  Running setup.py bdist_wheel for absl-py ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/23/35/1d/48c0a173ca38690dd8dfccfa47ffc750db48f8989ed898455c
  Running setup.py bdist_wheel for html5lib ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/50/ae/f9/d2b189788efcf61d1ee0e36045476735c838898eef1cad6e29
Successfully built gast termcolor absl-py html5lib
Installing collected packages: protobuf, gast, termcolor, html5lib, bleach, markdown, tensorboard, grpcio, astor, absl-py, tensorflow-gpu
Successfully installed absl-py-0.2.0 astor-0.6.2 bleach-1.5.0 gast-0.2.0 grpcio-1.11.0 html5lib-0.9999999 markdown-2.6.11 protobuf-3.5.2.post1 tensorboard-1.8.0 tensorflow-gpu-1.8.0 termcolor-1.1.0

NOTE: The files brought by the tensorflow-gpu installation to the local file system will be located under ${HOME}/.local/lib/python3.6/site-packages/ directory!
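The per-user location can be confirmed from the interpreter itself; the standard site module reports where pip install --user places modules:

```python
import site

# Directory used by "pip install --user" for the running interpreter;
# for Intel Python 3.6 it resolves under ${HOME}/.local/lib/python3.6/.
print(site.getusersitepackages())
```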

8. Testing the CUDA and cuDNN installation

8.1. Testing if cuDNN library is loadable

That kind of test is very easy to perform. If it returns no error, that means all symbols brought by the library libcudnn.so are known to the Python 3 interpreter.

To perform the test create the Python 3 script:

import ctypes

# Ask the dynamic linker to load the cuDNN shared library
t = ctypes.cdll.LoadLibrary("libcudnn.so")

# Print the name of the successfully loaded library
print(t._name)

save it as a file under the name cudnn_loading_checker.py and then execute the script:

$ /opt/intel/intelpython3/bin/python3 cudnn_loading_checker.py

If libcudnn.so is successfully loaded, the script will return the name of the library file:

libcudnn.so

and raise an error message otherwise.
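The same check can be wrapped in a small helper that reports success instead of raising. The function name below is ours, and it is demonstrated with the C math library (present on any Linux system), since cuDNN may not be installed on the machine at hand:

```python
import ctypes
import ctypes.util


def is_loadable(name):
    """Return True if the dynamic linker can load the given shared library."""
    try:
        ctypes.cdll.LoadLibrary(name)
        return True
    except OSError:
        return False


# Demonstrated with libm; substitute "libcudnn.so" on a CUDA host.
print(is_loadable(ctypes.util.find_library("m")))
print(is_loadable("libdefinitely-missing.so"))
```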

8.2. Testing the CUDA Python 3 integration by using Numba

Along with the other modules for scientific computing and data analysis, the Intel Python 3 package supplies Numba. To perform GPU computing based on CUDA, the Numba JIT compiler requires the environment variables NUMBAPRO_NVVM and NUMBAPRO_LIBDEVICE to be properly declared before compiling any Python code containing GPU instructions. Those variables should point to the installation tree of the latest version of CUDA:

$ export NUMBAPRO_NVVM=/usr/local/cuda-9.1/nvvm/lib64/libnvvm.so.3.2.0
$ export NUMBAPRO_LIBDEVICE=/usr/local/cuda-9.1/nvvm/libdevice

It is highly recommended to declare these variables in the ${HOME}/.bashrc file.
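For example, the declarations can be appended to ${HOME}/.bashrc with a heredoc (the paths assume CUDA 9.1 under /usr/local, as installed above; adjust to the newest locally installed version):

```shell
# Append the Numba CUDA variables to ~/.bashrc; they take effect in new shells.
cat >> "${HOME}/.bashrc" << 'EOF'
export NUMBAPRO_NVVM=/usr/local/cuda-9.1/nvvm/lib64/libnvvm.so.3.2.0
export NUMBAPRO_LIBDEVICE=/usr/local/cuda-9.1/nvvm/libdevice
EOF
```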

Once the variables are declared and loaded, execute the test script /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py:

$ /opt/intel/intelpython3/bin/python3 /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py

In case of successful execution the script will exit by displaying the message:

.
----------------------------------------------------------------------
Ran 1 test in 0.093s

OK

8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu

A simple script for testing tensorflow-gpu can be found here:

https://github.com/yaroslavvb/stuff/blob/master/matmul_benchmark.py

It should be downloaded and then executed by using the Intel Python 3 interpreter:

$ /opt/intel/intelpython3/bin/python3 matmul_benchmark.py

and in case of successful execution a result like the following will appear on the screen:

/opt/intel/intelpython3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-05-06 16:21:22.591713: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-06 16:21:22.684411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-06 16:21:22.684824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.92GiB
2018-05-06 16:21:22.684855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-06 16:21:23.151861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-06 16:21:23.151903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-06 16:21:23.151916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-06 16:21:23.152061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1692 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0)

 8192 x 8192 matmul took: 1.34 sec, 817.99 G ops/sec


Compiling and installing GROMACS 2016 by using Intel C/C++ and Fortran compiler, and adding CUDA support


Content:

1. Introduction.

2. Setting the building environment.

3. Short notes on AVX-capable CPUs support available in GROMACS.

4. Downloading and installing CUDA.

5. Compiling and installing OpenMPI.

6. Compiling and installing GROMACS.

7. Invoking GROMACS.


1. Introduction.

GROMACS is open source software for performing molecular dynamics simulations. It also provides an excellent set of tools which can be used to analyze the results of the simulations. GROMACS is fast and robust, and its code supports a wide range of run-time and compile-time optimizations. This document explains how to compile GROMACS 2016 on CentOS 7 and Scientific Linux 7 with CUDA support, by means of using the Intel C/C++ and Fortran compiler.

Before starting with the compilation, be sure that you are aware of the following:

  • Do not use the latest version of GROMACS for production right after its official release, unless you are a developer or just want to see what is new. Every software product based on such a huge amount of source code might contain some critical bugs at the beginning. Wait for 1-2 weeks after the release date and then carefully check the GROMACS user-support forums. If you see no critical bugs reported there (or minor bugs which might affect your simulations in particular), you can compile the latest release of GROMACS. Even then, test the build against some known simulation results of yours. If you see no big differences (or see some expected ones), you can proceed with putting the latest GROMACS release into production on your system.

  • If you administer an HPC facility where the compute nodes are equipped with different processors, you most probably need to compile the GROMACS code separately to match each CPU type. To do so, create a build host for each CPU type by using nodes of that type. Compile GROMACS there and then clone the installation to the rest of the nodes of the same CPU type.

  • Always use the latest CUDA version compatible with the particular GROMACS release (carefully check the GROMACS GPU documentation).

  • During the compilation of GROMACS, always build its own FFTW library. That really boosts the performance of GROMACS.

  • Compiling OpenMPI with the Intel compiler is not of critical importance (the system OpenMPI libraries provided by the Linux distribution could be employed instead), but it might raise the performance of the simulations. The Intel C/C++ and Fortran compiler suite also provides its own MPI support that could be used, but having a freshly compiled OpenMPI helps you stay up to date with recent MPI development. Before starting the compilation, be absolutely sure which libraries and compiler options you need to successfully compile your custom OpenMPI and GROMACS!

  • Use the latest Intel C/C++ and Fortran Compiler if possible. That largely guarantees that the specific processor features of the GPU and CPU will be taken into account by the C/C++ and Fortran compilers during the compilation process.

 

2. Setting the building environment.

Before starting, be sure your build folder is created. You might need an unprivileged user to perform the compilation. See this document for how to do that:

https://vessokolev.blogspot.com/2016/08/speeding-up-your-scientific-python-code.html

See paragraphs 2, 3, and 4 there.

 

3. Short notes on AVX-capable CPUs support available in GROMACS.

If the result of executing cat /proc/cpuinfo shows the avx2 CPU flag (in the flags line below):

$ cat /proc/cpuinfo
...
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
stepping : 2
microcode : 0x37
cpu MHz : 1221.156
cache size : 30720 KB
physical id : 0
siblings : 24
core id : 13
cpu cores : 12
apicid : 27
initial apicid : 27
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bogomips : 4589.54
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

then your CPU supports Intel® Advanced Vector Extensions 2 (Intel® AVX2). GROMACS supports AVX2, and that feature significantly boosts the performance of the simulations when computing bonded interactions. More on that CPU architecture feature here:

https://software.intel.com/en-us/articles/how-intel-avx2-improves-performance-on-server-applications
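A quick way to test for the flag without reading the whole of /proc/cpuinfo is a small guarded check (the file exists only on Linux, so the check degrades gracefully elsewhere):

```shell
# Check whether the running CPU advertises AVX2 support.
if [ -r /proc/cpuinfo ] && grep -qw avx2 /proc/cpuinfo; then
    echo "AVX2: supported"
else
    echo "AVX2: not detected"
fi
```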

 

4. Downloading and installing CUDA.

You need to install the CUDA rpm packages on both the build host and the compute nodes. The easiest and most efficient way to do so, and to get updates later (when available), is through yum. To install the NVidia CUDA yum repository file, visit:

https://developer.nvidia.com/cuda-downloads

and download the repository rpm file.

An alternative way to get the repository rpm file is to browse the NVidia CUDA repository directory at:

http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64

scroll down there, and find and download the rpm file named "cuda-repo-rhel7-*" (select the most recent one). Then install it locally by using yum localinstall:

# yum localinstall /path/to/cuda-repo-rhel7-*.rpm

Once ready with the CUDA repository installation, become root (or use sudo) and install the CUDA Toolkit rpm packages:

# yum install cuda

Note that the installation process takes time, depending mainly on the network connectivity and the performance of the local system. Also note that yum automatically installs (through dependencies in the rpm packages) DKMS, to support rebuilding the NVidia kernel modules when booting a new kernel.

If you do not expect to use your compute nodes for compiling any code with CUDA support, and will only execute compiled binary code there, you might not need to install all rpm packages through the meta package "cuda" (as shown above). You could specify which of the packages you really need to install there (see the repository). To preview all packages provided by the NVidia CUDA repository, execute:

# yum --disablerepo="*" --enablerepo="cuda" list available

An alternative way to preview all packages available in the "cuda" repository is to use the locally cached sqlite3 database of that repository:

# yum makecache
# HASH=`ls -p /var/cache/yum/x86_64/7/cuda/ | grep -v '/$' | grep primary.sqlite.bz2 | awk -F "-" '{print $1}'`
# cp /var/cache/yum/x86_64/7/cuda/${HASH}-primary.sqlite.bz2 ~/tmp
# cd ~/tmp
# bunzip2 ${HASH}-primary.sqlite.bz2
# sqlite3 ${HASH}-primary.sqlite
sqlite> select name,version,arch,summary from packages;

 

5. Compiling and installing OpenMPI

Be sure you have the building environment set up as explained before. Then install the packages hwloc-devel and valgrind-devel:

# yum install hwloc-devel valgrind-devel

and finally proceed with the configuration, compilation, and installation:

$ cd /home/builder/compile
$ . ~/.intel_env
$ . /usr/local/appstack/.appstack_env
$ wget https://www.open-mpi.org/software/ompi/v2.0/downloads/openmpi-2.0.0.tar.bz2
$ tar jxvf openmpi-2.0.0.tar.bz2
$ cd openmpi-2.0.0
$ ./configure --prefix=/usr/local/appstack/openmpi-2.0.0 --enable-ipv6 --enable-mpi-fortran --enable-mpi-cxx --with-cuda --with-hwloc
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/openmpi-2.0.0 /usr/local/appstack/openmpi
$ export PATH=/usr/local/appstack/openmpi/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/openmpi/lib:$LD_LIBRARY_PATH

Do not forget to update the variables PATH and LD_LIBRARY_PATH by editing their values in the file /usr/local/appstack/.appstack_env. The OpenMPI installation thus compiled and installed provides your applications with more up-to-date MPI tools and libraries than might be shipped with the recent Intel C/C++/Fortran Compiler package.
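A minimal sketch of what /usr/local/appstack/.appstack_env could contain after this step (the exact contents of that file are an assumption based on the export lines above); it is written to a temporary file here so the sketch can be sourced and checked without touching /usr/local:

```shell
# Hypothetical contents of /usr/local/appstack/.appstack_env after the
# OpenMPI step, staged in a temporary file for demonstration.
ENVFILE=$(mktemp)
cat > "$ENVFILE" <<'EOF'
export PATH=/usr/local/appstack/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/appstack/openmpi/lib:$LD_LIBRARY_PATH
EOF
. "$ENVFILE"                       # sourcing it exposes the OpenMPI tools
echo "$PATH" | grep -o '/usr/local/appstack/openmpi/bin'
```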

 

6. Compiling and installing GROMACS

Be sure you have the building environment set up as explained before, and OpenMPI installed as shown above. Then proceed with the GROMACS compilation and installation:

$ cd /home/builder/compile
$ wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2016.tar.gz
$ tar zxvf gromacs-2016.tar.gz
$ cd gromacs-2016
$ . ~/.intel_env
$ . /usr/local/appstack/.appstack_env
$ cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/appstack/gromacs-2016 -DGMX_MPI=ON -DGMX_BUILD_OWN_FFTW=ON -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DMPI_C_LIBRARIES=/usr/local/appstack/openmpi/lib/libmpi.so -DMPI_C_INCLUDE_PATH=/usr/local/appstack/openmpi/include -DMPI_CXX_LIBRARIES=/usr/local/appstack/openmpi/lib/libmpi.so -DMPI_CXX_INCLUDE_PATH=/usr/local/appstack/openmpi/include
$ gmake
$ gmake install
$ export PATH=/usr/local/appstack/gromacs-2016/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/gromacs-2016/lib64:$LD_LIBRARY_PATH

Do not forget to update the variables PATH and LD_LIBRARY_PATH by editing their values in the file /usr/local/appstack/.appstack_env.

 

7. Invoking GROMACS

To invoke the GROMACS build compiled and installed by following the instructions in this document, you need to have the executable gmx_mpi (not gmx!) in your PATH environment variable, as well as the path to libgromacs_mpi.so in LD_LIBRARY_PATH. You may set those paths by appending to your .bashrc the line:

. /usr/local/appstack/.appstack_env

If you do not want to write this down in .bashrc, you may instead execute the line above on the command line whenever you need to invoke gmx_mpi.
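Once the environment is loaded, a typical parallel run could look like the following sketch (the input name topol and the rank count are placeholders, not taken from this document; the check is guarded so it reports instead of failing when gmx_mpi is absent):

```shell
# Guarded invocation sketch: run mdrun through the MPI-enabled binary
# if it is available, otherwise explain what is missing.
if command -v gmx_mpi >/dev/null 2>&1; then
    mpirun -np 4 gmx_mpi mdrun -deffnm topol   # 4 MPI ranks, hypothetical input
else
    echo "gmx_mpi not on PATH: source /usr/local/appstack/.appstack_env first"
fi
```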


Creative Commons - Attribution 2.5 Generic. Powered by Blogger.
