Build a Deep Learning Environment by Compiling GPU-Enabled TensorFlow
It took me a while to build a deep learning environment after getting my new RTX 2080 graphics card. Various errors occurred during setup, probably because the RTX series was too new to be well supported. In the end, I got a working environment by compiling GPU-enabled TensorFlow 1.13 myself.
1. Configuration
Graphics card: Nvidia RTX 2080
OS: Ubuntu 18.04
Nvidia driver version: 415.27
Python: 3.6
CUDA: 10.0
cuDNN: 7.4.2
2. Framework
The installation sequence is:
(1) install the Nvidia driver (2) install CUDA (3) install cuDNN (4) compile and install TensorFlow
2–1 Install Nvidia Driver
Two methods can be found online to install the Nvidia driver.
Method 1: through a third-party apt repository
Before installing the Nvidia driver, disable the default graphics driver:
$ sudo gedit /etc/modprobe.d/blacklist.conf
# write the following to the end of the blacklist.conf file
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
# update the initramfs in the terminal
$ sudo update-initramfs -u
# reboot
$ sudo reboot
# check whether the drivers were disabled successfully
$ lsmod | grep nouveau # no output means success
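The `lsmod | grep nouveau` check can also be scripted. The following is only a minimal Python sketch, not part of the original procedure; the function name `is_module_loaded` is hypothetical, and it assumes the usual `lsmod` layout in which the module name is the first column of each row.

```python
def is_module_loaded(lsmod_output, module):
    """Return True if `module` is listed as a loaded module in `lsmod` output.

    Only the first column (the module name) is compared, so a module that
    merely appears in another module's "Used by" column does not count.
    """
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False
```

To use it on a real machine, pass it the text returned by `subprocess.check_output(["lsmod"]).decode()`; a result of `False` for `"nouveau"` corresponds to the empty `grep` output above.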
Install the Nvidia driver:
Press Ctrl+Alt+F3 (or F4, F5) to switch to command-line mode (Ctrl+Alt+F2 switches back).
# remove the previous Nvidia driver, if any
$ sudo apt remove 'nvidia-*'
# add the graphics-drivers PPA
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
# detect the supported drivers
$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001E87sv000010DEsd000012A6bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-418 - third-party free recommended
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-415 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin
# nvidia-driver-415 is installed here; version 418 was installed at
# first, but TensorFlow always failed to run with it
$ sudo apt-get install nvidia-driver-415
Method 2: through the .run file from the Nvidia website (I tried this and it never succeeded, so it is omitted here)
2–2 Install CUDA
The Nvidia CUDA Installation Guide for Linux is here. The following is the method I used.
Download the CUDA 10.0 Toolkit run file (cuda_10.0.130_410.48_linux.run) for Ubuntu 18.04 from the Nvidia website.
# suppose it was downloaded to the ~/Downloads folder
$ cd ~/Downloads
# add execute permission
$ chmod +x cuda_10.0.130_410.48_linux.run
# execute the .run file
$ sudo ./cuda_10.0.130_410.48_linux.run
# environment setup
$ gedit ~/.bashrc
# add the following to the end of the .bashrc file
export PATH=$PATH:/usr/local/cuda-10.0/bin/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64/:/usr/local/cuda-10.0/include/
# reload the environment variables by reopening the terminal,
# or run the following in the terminal
$ source ~/.bashrc
$ sudo ldconfig
# check the installation of CUDA
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
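If you want to confirm the CUDA release from a script rather than by eye, the `nvcc --version` output shown above can be parsed. This is a minimal Python sketch under the assumption that the output contains a `release X.Y` fragment as in the sample; the function name `parse_nvcc_release` is hypothetical.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`.

    Returns None when no release string is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For the output above this gives `(10, 0)`, which is the CUDA version this build of TensorFlow 1.13 was configured against.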
2–3 Install cuDNN
The Nvidia cuDNN installation documentation is here. I installed from a Tar file. (I also tried installing from a Debian file, but then cudnn.h and libcudnn* were not placed under /usr/local/cuda/, which led to errors later; it may need some extra configuration that I did not know about.)
Installing from a Tar file:
Download cudnn-10.0-linux-x64-v7.4.2.24.tgz from the Nvidia website.
# suppose cudnn-10.0-linux-x64-v7.4.2.24.tgz is downloaded to
# ~/Downloads folder
$ cd ~/Downloads
# Unzip the cuDNN package
$ tar -xzvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
# Copy the following files into the CUDA Toolkit directory, and
# change the file permissions.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
# check the cuDNN version
$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
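The `grep` above prints the `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` macro definitions from cudnn.h. As a rough sketch (the function name `parse_cudnn_version` is hypothetical, and it assumes each macro is defined as a plain integer on its own `#define` line), the same information can be read in Python:

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None.

    Looks for lines of the form '#define CUDNN_MAJOR 7' and so on.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Reading `/usr/local/cuda/include/cudnn.h` and passing its contents to this function should give `(7, 4, 2)` for the package installed in this section.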
2–4 Compile and Install TensorFlow
The official TensorFlow build-from-source guide is here.
2–4–1 Install the build tool: Bazel
The documentation for installing Bazel on Ubuntu is here.
Version: 0.19.2. I first tried 0.24.0 and 0.23.0, and both failed. I then checked TensorFlow's official tested build configurations, which use Bazel 0.19.2.
Download bazel-0.19.2-installer-linux-x86_64.sh here.
# suppose bazel-0.19.2-installer-linux-x86_64.sh is downloaded to
# ~/Downloads folder
$ cd ~/Downloads
# install the prerequisites
$ sudo apt-get install pkg-config zip g++ zlib1g-dev unzip python
# add the execute permission and run
$ chmod +x bazel-0.19.2-installer-linux-x86_64.sh
$ ./bazel-0.19.2-installer-linux-x86_64.sh --user
# environment setup
$ gedit ~/.bashrc
# add the following to the end of the .bashrc file
export PATH=$PATH:$HOME/bin # Bazel is installed in $HOME/bin
# reload the environment variables by reopening the terminal,
# or run the following in the terminal
$ source ~/.bashrc
$ sudo ldconfig
# check the Bazel version
$ bazel version
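Because the build only worked for me with Bazel 0.19.2, it can be worth checking the installed version before starting a long build. Below is a minimal Python sketch; the function name `parse_bazel_label` is hypothetical, and it assumes `bazel version` prints a `Build label:` line.

```python
def parse_bazel_label(bazel_output):
    """Return the version string from the 'Build label:' line, or None."""
    for line in bazel_output.splitlines():
        if line.startswith("Build label:"):
            return line.split(":", 1)[1].strip()
    return None
```

Comparing the result with `"0.19.2"` gives an early warning if a newer Bazel (such as 0.23.0 or 0.24.0, which failed here) is the one on the PATH.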
2–4–2 Install TensorFlow package dependencies
$ sudo apt install python-dev python-pip # or python3-dev python3-pip
$ pip install -U --user pip six numpy wheel setuptools mock
$ pip install -U --user keras_applications==1.0.6 --no-deps
$ pip install -U --user keras_preprocessing==1.0.5 --no-deps
# downgrade from gcc-7 (the default in Ubuntu 18.04) to gcc-4.8 (as described here)
$ sudo apt-get install gcc-4.8 gcc-4.8-multilib g++-4.8 g++-4.8-multilib
# switch to gcc-4.8 and g++-4.8
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 40
$ sudo update-alternatives --config gcc
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 40
$ sudo update-alternatives --config g++
2–4–3 Download the TensorFlow Source Code and Configure the Build
# git clone
$ git clone https://github.com/tensorflow/tensorflow.git
# inside the tensorflow folder
$ cd tensorflow
# checkout tensorflow version
$ git checkout r1.13
# configure the build
# Answer the following questions as shown and accept the defaults
# for the rest (just press Enter)
# Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
# Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
# Do you wish to build TensorFlow with CUDA support? [y/N]: y
# Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.0
# Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.4
# Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]: 1.3
$ ./configure
2–4–4 Build
# Bazel build with GPU support; this takes a fairly long time (~1h)
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
# build the package; a .whl file is generated in /tmp/tensorflow_pkg
$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
2–4–5 Install the Package
$ pip install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp36-cp36m-linux_x86_64.whl
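After installing the wheel, it is worth checking that TensorFlow actually sees the GPU. The following is a minimal sketch rather than an official procedure: the helper names `gpu_device_names` and `list_local_gpus` are hypothetical, and it assumes the TensorFlow 1.x `device_lib.list_local_devices()` call, whose entries expose `name` and `device_type` attributes.

```python
def gpu_device_names(devices):
    """Return the names of all devices whose device_type is exactly "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """List GPU devices visible to the locally installed TensorFlow 1.x."""
    # Imported lazily so gpu_device_names() can be used without TensorFlow.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

Run `list_local_gpus()` from a directory other than the TensorFlow source tree, so that Python imports the installed package instead of the source folder. An empty list suggests the build or the driver/CUDA/cuDNN setup did not work as intended.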