
Lightweight parser for locating entries in the FreeRADIUS log files by MAC address

The goal of developing and supporting this parser script is to help analyze the log files maintained by the FreeRADIUS daemon radiusd (the content of those files is plain text). The script reads those files line by line and, whenever a line matches the pattern typical for WiFi clients' authentication requests, performs an additional check: it compares the MAC address of the client's WiFi adapter against the one passed to the script as an invoking parameter. If a match is found, the script prints the line to the standard output.
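
For illustration only, here is a minimal Python 3 sketch of the approach (not the actual script from the repository; the MAC notation matched by the regular expression and the normalize() helper are assumptions made for this example, since radiusd deployments log the Calling-Station-Id in different formats):

import re
import sys

# Match MAC addresses written in the common colon- or dash-separated notation.
MAC_RE = re.compile(r'(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}')

def normalize(mac):
    # Drop separators and lowercase, so different notations compare equal.
    return re.sub(r'[^0-9a-f]', '', mac.lower())

def main(log_path, target_mac):
    target = normalize(target_mac)
    with open(log_path) as log:
        for line in log:
            if any(normalize(m) == target for m in MAC_RE.findall(line)):
                print(line, end='')

if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])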

The script might be of help to administrators of FreeRADIUS servers that are part of the Eduroam infrastructure. Its code could be modified to support specific tasks, like grouping the entries in the log files or providing a front end to Zabbix agents.

You can get the code of the script and more information at:

https://github.com/vessokolev/radius_log_parser


Installing Intel Python 3, tensorflow-gpu, and multiple versions of CUDA and cuDNN on CentOS 7

Content:

1. Introduction

2. Enabling the use of EPEL repository

3. Installing multiple CUDA versions on CentOS 7

4. Monitoring the NVIDIA GPU device by nvidia-smi

5. Making the software state in the NVIDIA driver persistent

6. Installing cuDNN for multiple versions of CUDA

7. Installing Intel Python 3 and tensorflow-gpu

8. Testing the CUDA and cuDNN installation

8.1. Testing if cuDNN library is loadable

8.2. Testing the CUDA Python 3 integration by using Numba

8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu


1. Introduction

This publication describes how to install multiple versions of CUDA and cuDNN on the same system running CentOS 7, in order to support various applications, and tensorflow in particular (via tensorflow-gpu). The recipes provided below can be followed when adding GPU computing support to compute nodes that are part of an HPC cluster.


2. Enabling the use of EPEL repository

This is an optional step, applicable if the configuration for using the EPEL repository is not present in /etc/yum.repos.d. EPEL is required here because the installation of the NVIDIA graphics driver, part of the CUDA packages, requires the presence of DKMS in the system in advance. That package is included in EPEL. To use EPEL, first install its repository package:

# yum install epel-release
# yum update

The dkms RPM package will be installed later, as a dependency required by the CUDA packages (see the next section).


3. Installing multiple CUDA versions on CentOS 7

The most reasonable question here is why we need multiple versions of CUDA installed and supported locally on the same system. The answer is straightforward: it is all about application-specific requirements. Some software products are very specific about the version of CUDA they work with.

The most rational way to install the CUDA packages on CentOS 7 is through yum. NVIDIA provides the configuration files for using their yum repositories as a separate RPM package, which can be downloaded here:

https://developer.nvidia.com/cuda-downloads

To initiate the download, select in sequence Linux > x86_64 > CentOS > 7 > rpm (network) > Download, and install the downloaded repository package by following the instructions given below the "Download" button.

From time to time some inconsistencies appear in the CUDA yum repository. To prevent any problems they might cause, edit the file /etc/yum.repos.d/cuda.repo by changing the line:

enabled=1

into

enabled=0

From now on, every time access to the CUDA repository RPM packages is required, supply the command line option --enablerepo=cuda to yum.

After finishing with the yum configuration, install the RPM packages containing the versions of CUDA currently supported by the vendor:

# yum --enablerepo=cuda install cuda-8-0 cuda-9-0 cuda-9-1

That will install plenty of packages. Take into account their installation size and make sure enough disk space is available.

If, by any chance, the installer fails to install the packages nvidia-kmod, xorg-x11-drv-nvidia, xorg-x11-drv-nvidia-libs, and xorg-x11-drv-nvidia-gl, install them separately:

# yum --enablerepo=cuda install nvidia-kmod xorg-x11-drv-nvidia xorg-x11-drv-nvidia-libs xorg-x11-drv-nvidia-gl

4. Monitoring the NVIDIA GPU device by nvidia-smi

The tool nvidia-smi is part of the package xorg-x11-drv-nvidia. It shows the current status of the NVIDIA GPU device:

$ nvidia-smi

Sun May  6 17:15:10 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K620         On   | 00000000:02:00.0 Off |                  N/A |
| 34%   36C    P8     1W /  30W |      1MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

That tool is useful for checking how many applications are currently running on the GPU device, its temperature, the consumed power, the utilization rate, and the amount of memory taken by the applications.
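
For scripted monitoring (for example, when feeding the readings to a Zabbix agent), nvidia-smi also offers a machine-readable query mode. The minimal Python 3 sketch below (an illustration, not part of the installation procedure) collects a few of the supported fields; run nvidia-smi --help-query-gpu for the complete list:

import subprocess

# Query a few GPU health fields in plain CSV form, without the table decoration.
fields = 'temperature.gpu,utilization.gpu,power.draw,memory.used,memory.total'
output = subprocess.check_output(
    ['nvidia-smi', '--query-gpu=' + fields, '--format=csv,noheader'])
print(output.decode().strip())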


5. Making the software state in the NVIDIA driver persistent

To prevent the driver from releasing the NVIDIA GPU device when that device is not in use by any process, the daemon nvidia-persistenced (part of the package xorg-x11-drv-nvidia) needs to be enabled and started:

# systemctl enable nvidia-persistenced
# systemctl start nvidia-persistenced

6. Installing cuDNN for multiple versions of CUDA

The cuDNN library and header files can be downloaded from the web page of the vendor at:

https://developer.nvidia.com/cudnn

Note that a user registration is required to obtain the cuDNN files. Also, you need to download the archives with the cuDNN library and header files for each and every CUDA version locally installed and supported. That process will, in turn, bring the following files into the download directory:

cudnn-8.0-linux-x64-v6.0.tgz
cudnn-9.0-linux-x64-v7.tgz
cudnn-9.1-linux-x64-v7.tgz

To proceed with the installation, unpack the content of the archives into the respective CUDA installation folders and recreate the database with the dynamic linker run-time bindings, by executing (as root or superuser) the command lines:

# tar --strip-components 1 -xf cudnn-8.0-linux-x64-v6.0.tgz -C /usr/local/cuda-8.0
# tar --strip-components 1 -xf cudnn-9.0-linux-x64-v7.tgz -C /usr/local/cuda-9.0
# tar --strip-components 1 -xf cudnn-9.1-linux-x64-v7.tgz -C /usr/local/cuda-9.1
# ldconfig /

It is recommended to verify the successful unpacking of the archives and the proper recreation of the dynamic linker database, by listing the database cache and grepping the output for the string "cudnn":

$ ldconfig -p | grep cudnn

A grep result indicating a successful cuDNN installation will look like:

libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so.7
libcudnn.so.7 (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so.7
libcudnn.so.6 (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6
libcudnn.so (libc6,x86-64) => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so
libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.0/targets/x86_64-linux/lib/libcudnn.so
libcudnn.so (libc6,x86-64) => /usr/local/cuda-9.1/targets/x86_64-linux/lib/libcudnn.so

Do not become confused by the multiple entries for libcudnn.so in the database (as seen in the output above). Seemingly, that indicates a collision, but note that each of the libcudnn.so files is a symlink and also carries a unique version number. That number is used by the tensorflow libraries to find which of the files best matches the version requirements.


7. Installing Intel Python 3 and tensorflow-gpu

If Intel Python 3 is not available in the system, follow the instructions given here:

https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-yum-repo

on how to install it. It is a single RPM package (mind its large installation size of several gigabytes) which contains tensorflow but (currently) does not include the tensorflow-gpu module. Once Intel Python 3 is available, the tensorflow-gpu module can be installed by invoking pip (the one provided by Intel Python 3).

Do not install tensorflow-gpu or any other module for Intel Python 3 as root or superuser. Avoid any module installations inside the /opt/intel/intelpython3/ folder. Instead, perform the installation as an unprivileged user and append the --user option to pip:

$ /opt/intel/intelpython3/bin/pip install --user tensorflow-gpu

The output information generated during the installation process should look like:

Collecting tensorflow-gpu
  Downloading https://files.pythonhosted.org/packages/59/41/ba6ac9b63c5bfb90377784e29c4f4c478c74f53e020fa56237c939674f2d/tensorflow_gpu-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (216.2MB)
    100% |████████████████████████████████| 216.3MB 7.8kB/s 
Collecting protobuf>=3.4.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/74/ad/ecd865eb1ba1ff7f6bd6bcb731a89d55bc0450ced8d457ed2d167c7b8d5f/protobuf-3.5.2.post1-cp36-cp36m-manylinux1_x86_64.whl (6.4MB)
    100% |████████████████████████████████| 6.4MB 266kB/s 
Collecting gast>=0.2.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/5c/78/ff794fcae2ce8aa6323e789d1f8b3b7765f601e7702726f430e814822b96/gast-0.2.0.tar.gz
Collecting termcolor>=1.1.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Requirement already satisfied: wheel>=0.26 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting tensorboard<1.9.0,>=1.8.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/59/a6/0ae6092b7542cfedba6b2a1c9b8dceaf278238c39484f3ba03b03f07803c/tensorboard-1.8.0-py3-none-any.whl (3.1MB)
    100% |████████████████████████████████| 3.1MB 545kB/s 
Collecting grpcio>=1.8.6 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/c8/b8/00e703183b7ae5e02f161dafacdfa8edbd7234cb7434aef00f126a3a511e/grpcio-1.11.0-cp36-cp36m-manylinux1_x86_64.whl (8.8MB)
    100% |████████████████████████████████| 8.8MB 195kB/s 
Collecting astor>=0.6.0 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/b2/91/cc9805f1ff7b49f620136b3a7ca26f6a1be2ed424606804b0fbcf499f712/astor-0.6.2-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.13.3 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Requirement already satisfied: six>=1.10.0 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorflow-gpu)
Collecting absl-py>=0.1.6 (from tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/90/6b/ba04a9fe6aefa56adafa6b9e0557b959e423c49950527139cb8651b0480b/absl-py-0.2.0.tar.gz (82kB)
    100% |████████████████████████████████| 92kB 8.8MB/s 
Requirement already satisfied: setuptools in /opt/intel/intelpython3/lib/python3.6/site-packages (from protobuf>=3.4.0->tensorflow-gpu)
Requirement already satisfied: werkzeug>=0.11.10 in /opt/intel/intelpython3/lib/python3.6/site-packages (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
Collecting bleach==1.5.0 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/33/70/86c5fec937ea4964184d4d6c4f0b9551564f821e1c3575907639036d9b90/bleach-1.5.0-py2.py3-none-any.whl
Collecting markdown>=2.6.8 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/6d/7d/488b90f470b96531a3f5788cf12a93332f543dbab13c423a5e7ce96a0493/Markdown-2.6.11-py2.py3-none-any.whl (78kB)
    100% |████████████████████████████████| 81kB 8.9MB/s 
Collecting html5lib==0.9999999 (from tensorboard<1.9.0,>=1.8.0->tensorflow-gpu)
  Downloading https://files.pythonhosted.org/packages/ae/ae/bcb60402c60932b32dfaf19bb53870b29eda2cd17551ba5639219fb5ebf9/html5lib-0.9999999.tar.gz (889kB)
    100% |████████████████████████████████| 890kB 1.7MB/s 
Building wheels for collected packages: gast, termcolor, absl-py, html5lib
  Running setup.py bdist_wheel for gast ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/9a/1f/0e/3cde98113222b853e98fc0a8e9924480a3e25f1b4008cedb4f
  Running setup.py bdist_wheel for termcolor ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/7c/06/54/bc84598ba1daf8f970247f550b175aaaee85f68b4b0c5ab2c6
  Running setup.py bdist_wheel for absl-py ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/23/35/1d/48c0a173ca38690dd8dfccfa47ffc750db48f8989ed898455c
  Running setup.py bdist_wheel for html5lib ... done
  Stored in directory: /home/vesso/.cache/pip/wheels/50/ae/f9/d2b189788efcf61d1ee0e36045476735c838898eef1cad6e29
Successfully built gast termcolor absl-py html5lib
Installing collected packages: protobuf, gast, termcolor, html5lib, bleach, markdown, tensorboard, grpcio, astor, absl-py, tensorflow-gpu
Successfully installed absl-py-0.2.0 astor-0.6.2 bleach-1.5.0 gast-0.2.0 grpcio-1.11.0 html5lib-0.9999999 markdown-2.6.11 protobuf-3.5.2.post1 tensorboard-1.8.0 tensorflow-gpu-1.8.0 termcolor-1.1.0

NOTE: The files brought by the tensorflow-gpu installation to the local file system will be located under ${HOME}/.local/lib/python3.6/site-packages/ directory!
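
A quick, optional way to confirm that the interpreter resolves the module from the user site-packages (assuming the CUDA libraries are already visible to the dynamic linker) is to print the module version and location:

$ /opt/intel/intelpython3/bin/python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.__file__)"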

8. Testing the CUDA and cuDNN installation

8.1. Testing if cuDNN library is loadable

This kind of test is very easy to perform. If it returns no error, all symbols brought by the library libcudnn.so are known to the Python 3 interpreter.

To perform the test, create the following Python 3 script:

import ctypes

# Ask the dynamic linker to load the cuDNN shared library; this raises
# an OSError if the library or any of its dependencies cannot be resolved.
t = ctypes.cdll.LoadLibrary("libcudnn.so")

# Print the name under which the library was loaded.
print(t._name)

save it as a file under the name cudnn_loading_checker.py, and then execute the script:

$ /opt/intel/intelpython3/bin/python3 cudnn_loading_checker.py

If libcudnn.so is successfully loaded, the script will print the name of the library file:

libcudnn.so

and raise an error otherwise.

8.2. Testing the CUDA Python 3 integration by using Numba

Along with the other modules for scientific computing and data analysis, the Intel Python 3 package supplies Numba. To perform GPU computing based on CUDA, the Numba JIT compiler requires the environment variables NUMBAPRO_NVVM and NUMBAPRO_LIBDEVICE to be properly declared before compiling any Python code containing GPU instructions. Those variables should point to the installation tree of the latest version of CUDA:

$ export NUMBAPRO_NVVM=/usr/local/cuda-9.1/nvvm/lib64/libnvvm.so.3.2.0
$ export NUMBAPRO_LIBDEVICE=/usr/local/cuda-9.1/nvvm/libdevice

It is highly recommended to declare these variables in the ${HOME}/.bashrc file.

Once the variables are declared and loaded, execute the test script /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py:

$ /opt/intel/intelpython3/bin/python3 /opt/intel/intelpython3/lib/python3.6/site-packages/numba/cuda/tests/cudapy/test_matmul.py

In case of successful execution the script will exit by displaying the message:

.
----------------------------------------------------------------------
Ran 1 test in 0.093s

OK
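
In addition to the bundled test, a small kernel can be compiled and launched directly. The sketch below (written for this post, not part of the Numba distribution) adds two vectors on the GPU and verifies the result on the host:

import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    # Each CUDA thread handles one element of the arrays.
    i = cuda.grid(1)
    if i < out.size:
        out[i] = x[i] + y[i]

n = 1000000
x = np.arange(n, dtype=np.float32)
y = 2 * x
out = np.zeros_like(x)

# Launch enough blocks of 256 threads to cover all n elements.
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)

assert np.allclose(out, x + y)
print('GPU vector addition: OK')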

8.3. Testing the CUDA Python 3 integration by using tensorflow-gpu

A simple script for testing tensorflow-gpu can be found here:

https://github.com/yaroslavvb/stuff/blob/master/matmul_benchmark.py

It should be downloaded and then executed using the Intel Python 3 interpreter:

$ /opt/intel/intelpython3/bin/python3 matmul_benchmark.py

and in case of successful execution output similar to the following will appear on the screen:

/opt/intel/intelpython3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-05-06 16:21:22.591713: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-06 16:21:22.684411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-06 16:21:22.684824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Quadro K620 major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.92GiB
2018-05-06 16:21:22.684855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-06 16:21:23.151861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-06 16:21:23.151903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-05-06 16:21:23.151916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-05-06 16:21:23.152061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1692 MB memory) -> physical GPU (device: 0, name: Quadro K620, pci bus id: 0000:01:00.0, compute capability: 5.0)

 8192 x 8192 matmul took: 1.34 sec, 817.99 G ops/sec
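
For a shorter, self-contained check, the sketch below (assembled for this post using the TensorFlow 1.x graph API, not taken from the repository above) multiplies two random matrices and logs the device placement, which should mention the GPU:

import tensorflow as tf

# Build a small graph that multiplies two random matrices.
a = tf.random_normal([1024, 1024])
b = tf.random_normal([1024, 1024])
c = tf.matmul(a, b)

# log_device_placement reports the device chosen for each operation;
# allow_soft_placement falls back to the CPU if no GPU is usable.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print('matmul result shape:', sess.run(c).shape)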


Speeding up your scientific Python code on CentOS and Scientific Linux by using Intel Compilers


Content:

1. Introduction.

2. Installing the rpm packages needed during the compilation process.

3. Create an unprivileged user to run the compilation process.

4. Create folder for installing the compiled packages.

5. Setting the Intel Compilers environment variables.

6. Compiling and installing SQLite library.

7. Compiling and installing Python 2.7.

8. Compiling and installing BLAS and LAPACK.

9. Installing setuptools.

10. Compiling and installing Cython.

11. Compiling and installing NumPy, SciPy, and Pandas.

12. Compiling and installing Matplotlib (optional).

13. Compiling and installing HDF5 support for Python - h5py.

14. Testing the installed Python modules.

15. Using the installed Python modules.


1. Introduction.

The goal of this document is to describe an easy, safe, and illustrative way to bring more speed to your scientific Python code by compiling Python and a set of important modules (like sqlite3, NumPy, SciPy, Pandas, and h5py) using Intel Compilers. The recipes described below run the compilation and installation as an unprivileged user, which is the safest way to do so. The installation scheme used here also prevents potential conflicts between the packages installed by the distribution package manager and the ones brought to the local system by following these recipes.

The document is specific to the Linux distributions CentOS and Scientific Linux - two of the most widely used Linux distributions in science. With minor changes the recipes can easily be adapted for other Linux distributions which support Intel Compilers.

Note that the compilation recipes provided below use optimizations specific to the processor of the build machine (the -xHost flag). Feel free to change that if you want to spread the product of the compilation over a compute cluster. The recipes might also be collected and executed as a single configuration and installation script; they are given below separately, mainly to make the details of each package compilation more visible to the reader.

2. Installing the rpm packages needed during the compilation process.

The following packages have to be installed in advance using yum in order to support the compilation process: gcc, gcc-c++, gcc-gfortran, gcc-objc, gcc-objc++, libtool, cmake, ncurses-devel, openssl-devel, bzip2-devel, zlib-devel, readline-devel, gdbm-devel, tk-devel, and bzip2. Install them all at once:

# yum install gcc gcc-c++ gcc-gfortran gcc-objc gcc-objc++ libtool cmake ncurses-devel openssl-devel bzip2-devel zlib-devel readline-devel gdbm-devel tk-devel bzip2

 

3. Create an unprivileged user to run the compilation process.

The default settings for creating a user in RHEL, CentOS, and SL are good enough in this case:

# useradd builder

The user name chosen for running the compilation process is "builder", but you might choose a different user name if "builder" is already taken or reserved. Finally, set the password for this new user and/or install an OpenSSH public key (in /home/builder/.ssh/authorized_keys) if this account is supposed to be accessed remotely.

 

4. Create folder for installing the compiled packages.

This documentation uses /usr/local/appstack as the destination folder. To avoid using "root" or a superuser during the compilation and installation process, create that folder and make it owned by "builder":

# mkdir -p /usr/local/appstack
# chown -R builder:builder /usr/local/appstack

Create (as user "builder") an empty file /usr/local/appstack/.appstack_env:

$ touch /usr/local/appstack/.appstack_env
$ chmod 644 /usr/local/appstack/.appstack_env

This file will later be provided to users who want to update their shell environment variables in order to use the alternatively compiled packages stored in /usr/local/appstack.

 

5. Setting the Intel Compilers environment variables.

If the Intel Compilers packages are properly installed and accessible to the user "builder", the following variables have to be exported to make the Intel compilers the default C/C++ and Fortran compilers:

export CC=icc
export CXX=icpc
export CFLAGS='-O3 -xHost -ip -no-prec-div -fPIC'
export CXXFLAGS='-O3 -xHost -ip -no-prec-div -fPIC'
export FC=ifort
export FCFLAGS='-O3 -xHost -ip -no-prec-div -fPIC'
export CPP='icc -E'
export CXXCPP='icpc -E'

Unless it is really necessary, these variables should not appear in either /home/builder/.bashrc or /home/builder/.bash_profile. A possible way to load them occasionally (only when they are needed) is to create the file /home/builder/.intel_env and place the export declarations there. They can then be loaded within the current bash shell session by executing:

$ . ~/.intel_env

 

6. Compiling and installing SQLite library.

The SQLite library is actively used in a wide range of scientific software applications. To make the library faster, compile its code with the Intel C/C++ compiler. Here is the recipe for doing that (consider using the latest stable version of SQLite!):

$ mkdir -p /home/builder/compile
$ cd /home/builder/compile
$ . ~/.intel_env
$ wget https://sqlite.org/2016/sqlite-autoconf-3130000.tar.gz
$ tar zxvf sqlite-autoconf-3130000.tar.gz
$ cd sqlite-autoconf-3130000
$ ./configure --prefix=/usr/local/appstack/sqlite-3.13.0 --enable-shared --enable-readline --enable-fts5 --enable-json1
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/sqlite-3.13.0 /usr/local/appstack/sqlite3
$ export PATH=/usr/local/appstack/sqlite3/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/sqlite3/lib:$LD_LIBRARY_PATH

The last two command lines update the user's environment variables PATH and LD_LIBRARY_PATH, so the next compilation within the same bash shell session can find the SQLite library and executables. Also update PATH and LD_LIBRARY_PATH in the file /usr/local/appstack/.appstack_env, which is supposed to be sourced by the users to get the paths to the alternatively compiled executable binaries and libraries.

 

7. Compiling and installing Python 2.7.

To make the execution of Python code faster, Python 2.7 should be compiled using the Intel C/C++ compiler. Note that compiling Python this way makes it very hard to use the Python modules provided by the RPM packages. Hence all required Python modules should also be built in the same manner (custom compilation using Intel Compilers) and linked to the custom-compiled version of Python. In scientific practice it is important to have a fast SQLite Python interface; to have it built in, SQLite ought to be compiled with the Intel C/C++ Compiler as described above. Be sure that all required rpm packages are installed in advance, as explained in "Installing the rpm packages needed during the compilation process". Finally, follow this recipe to compile and install a custom Python 2.7 distribution (always use the latest stable Python 2.7 version!):

$ cd /home/builder/compile
$ wget https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tar.xz
$ tar Jxvf Python-2.7.12.tar.xz
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ cd Python-2.7.12
$ ./configure --prefix=/usr/local/appstack/python-2.7.12 --without-gcc --enable-ipv6 --enable-shared CFLAGS=-I/usr/local/appstack/sqlite3/include LDFLAGS=-L/usr/local/appstack/sqlite3/lib CPPFLAGS=-I/usr/local/appstack/sqlite3/include
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/python-2.7.12 /usr/local/appstack/python2
$ export PATH=/usr/local/appstack/python2/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/python2/lib:$LD_LIBRARY_PATH
$ export PYTHONPATH=/usr/local/appstack/python2/lib

The last three lines of the recipe update the environment variables PATH and LD_LIBRARY_PATH in the currently running bash shell session, and create a new one - PYTHONPATH (a critically important variable for locating Python modules). They help the next compilation (if the same bash shell session is used for it). Also update these variables in the file /usr/local/appstack/.appstack_env so that the Python 2.7 installation folders come first in the search paths:

$ export PATH=/usr/local/appstack/python2/bin:/usr/local/appstack/sqlite3/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/python2/lib:/usr/local/appstack/sqlite3/lib:$LD_LIBRARY_PATH

IMPORTANT! Do not forget to include in /usr/local/appstack/.appstack_env the Python path declaration:

export PYTHONPATH=/usr/local/appstack/python2/lib

Otherwise none of the modules compiled below will work properly!

 

8. Compiling and installing BLAS and LAPACK.

In order to compile and install the scipy library, one needs the BLAS and LAPACK libraries compiled and installed locally. It is enough to compile the LAPACK tarball, since it includes the BLAS code and, if compiled properly, provides the libblas.so shared library. To speed up the execution of any code that uses LAPACK and BLAS, compile the LAPACK source code using the Intel Fortran Compiler according to the recipe given below (always use the latest stable version of LAPACK!):

$ cd /home/builder/compile
$ wget http://www.netlib.org/lapack/lapack-3.6.1.tgz
$ tar zxvf lapack-3.6.1.tgz
$ cd lapack-3.6.1
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/appstack/lapack-3.6.1 -DCMAKE_INSTALL_LIBDIR=/usr/local/appstack/lapack-3.6.1/lib64 -DBUILD_SHARED_LIBS=1
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/lapack-3.6.1 /usr/local/appstack/lapack
$ export LD_LIBRARY_PATH=/usr/local/appstack/lapack/lib64:$LD_LIBRARY_PATH

The last line of the recipe updates the environment variable LD_LIBRARY_PATH within the currently used bash shell session. It helps the next compilation (if the same bash shell session is used). Also update LD_LIBRARY_PATH in the file /usr/local/appstack/.appstack_env so that the LAPACK installation folder comes first in the search path:

$ export LD_LIBRARY_PATH=/usr/local/appstack/lapack/lib64:/usr/local/appstack/python2/lib:/usr/local/appstack/sqlite3/lib:$LD_LIBRARY_PATH

An alternative method for providing the BLAS and LAPACK libraries to scipy is to compile and install ATLAS. Yet another way is to use the BLAS and LAPACK libraries already compiled as static libraries and shipped within the Intel C/C++ and Fortran Compiler installation tree. For more details take a look at this discussion:

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/611135

The method for obtaining the BLAS and LAPACK libraries proposed in this document brings in the latest version of these libraries and is easy to perform.

 

9. Installing setuptools.

Setuptools is needed when installing modules external to the Python distribution. The installation process is very short and easy:

$ cd /home/builder/compile
$ wget https://bootstrap.pypa.io/ez_setup.py
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ python2 ez_setup.py

 

10. Compiling and installing Cython.

Cython provides C-extensions for Python and is required by a variety of Python modules - NumPy, SciPy, and Pandas in particular. Its installation is simple and follows the recipe below (use the latest stable version of Cython!):

$ cd /home/builder/compile
$ wget https://pypi.python.org/packages/c6/fe/97319581905de40f1be7015a0ea1bd336a756f6249914b148a17eefa75dc/Cython-0.24.1.tar.gz
$ tar zxvf Cython-0.24.1.tar.gz
$ cd Cython-0.24.1
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ python2 setup.py install

 

11. Compiling and installing NumPy, SciPy, and Pandas.

NumPy, SciPy, and Pandas are only three of the Python libraries whose development is coordinated by SciPy.org. The Python modules they provide are usually "a must" in scientific practice. In many cases they can replace or even surpass their commercially developed and distributed rivals. There are more Python modules in that family, but they either do not require such a specific compilation (SymPy, IPython) or might not be usable without a running graphical environment (Matplotlib). The recipe below shows how to compile and install NumPy, SciPy, and Pandas (use their latest stable versions!):

$ cd /home/builder/compile
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ export BLAS=/usr/local/appstack/lapack/lib64
$ export LAPACK=/usr/local/appstack/lapack/lib64
$ wget https://github.com/numpy/numpy/archive/v1.11.1.tar.gz
$ wget https://github.com/scipy/scipy/releases/download/v0.18.0/scipy-0.18.0.tar.gz
$ wget https://pypi.python.org/packages/11/09/e66eb844daba8680ddff26335d5b4fead77f60f957678243549a8dd4830d/pandas-0.18.1.tar.gz
$ tar zxvf v1.11.1.tar.gz
$ tar zxvf scipy-0.18.0.tar.gz
$ tar zxvf pandas-0.18.1.tar.gz
$ cd numpy-1.11.1
$ python2 setup.py install
$ cd ..
$ cd scipy-0.18.0
$ python2 setup.py install
$ cd ..
$ cd pandas-0.18.1
$ python2 setup.py install
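
To optionally verify that the freshly compiled NumPy has picked up the custom BLAS and LAPACK, print its build configuration; the /usr/local/appstack/lapack paths should appear in the output:

$ python2 -c "import numpy; numpy.show_config()"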

 

12. Compiling and installing Matplotlib (optional).

The direct use of Matplotlib requires a graphical user environment, which in most cases is not available in distributed computing setups. Nevertheless, if Matplotlib needs to be present in the system, it can be compiled and installed in the same manner as NumPy, SciPy, and Pandas above. To provide at least one image output driver, the libpng-devel rpm package has to be installed locally:

# yum install libpng-devel

After that, follow the recipe below to compile and install the Matplotlib module for Python (use the latest stable version of Matplotlib!):

$ cd /home/builder/compile
$ wget https://github.com/matplotlib/matplotlib/archive/v1.5.2.tar.gz
$ tar zxvf v1.5.2.tar.gz
$ cd matplotlib-1.5.2
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ python2 setup.py install

 

13. Compiling and installing HDF5 support for Python - h5py.

HDF5 support is essential when using Python to access and manage large data structures of different types quickly and adequately. Currently the low-level interface to HDF5 in Python is provided by the module h5py. To compile h5py, one first needs to compile the HDF5 framework and install it locally, so its libraries are accessible to h5py. Note that by default both CentOS and SL provide HDF5 support, but the executables and libraries their RPM packages bring to the system are compiled using GCC. Therefore, if the goal is to achieve high speed of the Python code when using HDF5, both the HDF5 libraries and the h5py module should be compiled using the Intel C/C++ and Fortran compilers. The example below shows how to do that:

$ cd /home/builder/compile
$ wget http://www.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.0-patch1/src/hdf5-1.10.0-patch1.tar.bz2
$ wget https://github.com/h5py/h5py/archive/2.6.0.tar.gz
$ tar jxvf hdf5-1.10.0-patch1.tar.bz2
$ tar zxvf 2.6.0.tar.gz
$ cd hdf5-1.10.0-patch1
$ . ~/.intel_env # Execute this if the previous bash shell session containing the compiler environmental variables has been closed!
$ . /usr/local/appstack/.appstack_env # Execute this if the previous bash shell session containing the environmental variables has been closed!
$ ./configure --prefix=/usr/local/appstack/hdf5-1.10.0-patch1 --enable-fortran --enable-cxx --enable-shared --enable-optimization=high
$ gmake
$ gmake install
$ ln -s /usr/local/appstack/hdf5-1.10.0-patch1 /usr/local/appstack/hdf5
$ export PATH=/usr/local/appstack/hdf5/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/appstack/hdf5/lib:$LD_LIBRARY_PATH
$ export HDF5_DIR=/usr/local/appstack/hdf5
$ cd ..
$ cd h5py-2.6.0
$ python2 setup.py install

If the compilation and installation are successful, remove the folders containing the source code of the compiled modules. Also append the export declaration:

export HDF5_DIR=/usr/local/appstack/hdf5

to the file /usr/local/appstack/.appstack_env, because otherwise the module h5py cannot be imported. Also update there the environment variables PATH and LD_LIBRARY_PATH to include the paths to the installed HDF5 binaries and libraries.

Note that there is also a high-level interface to HDF5 for Python, called PyTables. Currently (August 2016) it can be compiled only against HDF5 version 1.8.
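
As an optional smoke test of the freshly built h5py module, the short Python script below (written for this post; the file name under /tmp is arbitrary) writes a small dataset and reads it back:

import numpy as np
import h5py

# Write a small dataset, then read it back, to confirm that h5py works
# against the custom-built HDF5 libraries.
with h5py.File('/tmp/appstack_h5py_test.h5', 'w') as f:
    f.create_dataset('values', data=np.arange(100, dtype=np.float64))

with h5py.File('/tmp/appstack_h5py_test.h5', 'r') as f:
    print(f['values'][:10])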

 

14. Testing the installed Python modules.

The simplest way to test the successfully compiled and installed Python modules is to load them from within a Python shell. Before starting this test, do not forget to export the environment variables from the file /usr/local/appstack/.appstack_env in order to access the customized version of Python as well as all necessary customized libraries. Then run the test:

$ . /usr/local/appstack/.appstack_env # Do this only if the environment variables are not loaded yet!
$ for i in numpy scipy pandas h5py ; do echo "import ${i}" | python > /dev/null 2>&1 && echo "${i} has been successfully imported" ; done

If all requested modules are imported successfully the following output messages are expected to appear in the current bash shell window:

numpy has been successfully imported
scipy has been successfully imported
pandas has been successfully imported
h5py has been successfully imported

If the name of any of the requested modules does not appear there, try to import that module manually, like this (the example given below checks NumPy):

$ python
Python 2.7.12 (default, Aug 1 2016, 20:41:13)
[GCC Intel(R) C++ gcc 4.8 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy

and check the displayed error message to find out how to fix the problem. Very often people try to import a freshly compiled module into the Python shell by invoking python while the bash working directory is still the folder containing the source code used for compiling that module. That is not a proper way to import any Python module: in that case the current folder contains module files that get picked up first and prevent the installed module from being imported properly.
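
A quick way to diagnose that situation is to print the path the module is actually imported from; if it points into the source tree instead of the installation under /usr/local/appstack, change the working directory and try again:

$ python2 -c "import numpy; print(numpy.__file__)"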

 

15. Using the installed Python modules.

To use the modules installed this way, it is enough to invoke the custom-compiled Python version after loading the environment variables:

$ . /usr/local/appstack/.appstack_env
