Introduction and goal
Before I jumped into the field of deep learning, my first thoughts were about the hardware I would need to run deep learning models. Of course I could have used cloud services such as Amazon AWS GPU instances, but when I saw their pricing I realized that this wasn't a viable solution in the long run. Truth is, I also wanted to replace my old RX 280 with a shiny new GTX 1070 for gaming... And to my surprise this GPU turned out to be much faster than what Amazon could offer at a much higher price! You can find the benchmark I used for the comparison at the end of the post.
We won't discuss how to set up the hardware required for this tutorial. Rather, my goal here is to set up the software. I assume you already have a machine with a CUDA-capable NVIDIA GPU and a Linux OS installed on it (here we will use Ubuntu 16.04 LTS x64). If you have an AMD/ATI GPU most of this tutorial still applies to you, but I won't cover the installation on this kind of hardware.
This post will be split into 2 parts:
- The first one will teach you how to set up your environment locally so that you can use your machine with all the necessary tools for DL.
- The second part will focus on using your machine remotely with security in mind so that you can access it and turn it on/off from anywhere in the world.
Most of the knowledge gained here (if not all) could also be applied to other deep learning frameworks such as PyTorch.
By the end of this post you will be able to run deep learning models out of the box, on a machine as effective as an Amazon AWS GPU instance.
This tutorial targets 2 types of audience: those with a basic computer science background who would like to properly set up a secure remote environment for deep learning, and those who don't have a CS background but would like to have their own deep learning rig. As these 2 audiences are distinct, parts of this tutorial are in collapsed boxes which mainly target the second audience. Those with a CS background can skip these pieces of information if they already know the subject.
Installation of Anaconda
Anaconda is a very useful tool to switch between different virtual environments.
What is a virtual environment? Why use it?
A virtual environment is a kind of sandbox where your system programs get replaced by new ones from this environment.
For example, let's say you have Python 2 installed on your machine but you have to launch a Python 3 script. How would you do that? You can either install Python 3 alongside Python 2 and, if you want to choose one of the 2, you will either do
But now you notice that the script requires Tensorflow version 0.12 and you have Tensorflow 1.0 on your machine. So what would you do?
Well, you replace one with the other, but then you get dependency conflicts because the older version depends on older libraries.
Now you scratch your head and tell yourself: "All this mess will only result in breaking my Python installation, and if I want to run code with TF 1.0 I will have to reinstall it". This is where virtual environments come into play: they allow you to switch between multiple installations with only 1 command.
To install it open your terminal and type:
If you get an output like Python 2.x.x, then type:
If instead you get an output with Python 3.x.x, use:
Then use the command:
chmod +x Anaconda2-4.3.1-Linux-x86_64.sh
to give execute permission to the file you just downloaded.
Now run it with:
Accept the license and the default location. Let it install the required files, then type yes when asked to prepend Anaconda to your PATH, and you're all set!
Your terminal then needs to reload the ~/.bashrc file, which contains the path to your Anaconda installation. You can also close and reopen your console to apply the changes.
Now if you type:
At this point you should get a list of parameters to use with conda. If you don't, restart this section from the beginning and check that you followed the instructions properly.
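If you prefer an explicit check, the following is a minimal sketch that asks the shell whether it can find a conda executable. The variable name conda_status is only used for illustration and is not part of Anaconda.

```shell
# Check whether the shell can find a `conda` command on PATH.
if command -v conda >/dev/null 2>&1; then
    conda_status="found"
    echo "conda found at: $(command -v conda)"
else
    conda_status="missing"
    echo "conda not found: check the PATH line in ~/.bashrc"
fi
```

If it reports that conda is missing right after the installation, reloading ~/.bashrc or opening a new terminal is usually the first thing to try.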
Creating a virtual environment with Anaconda
To create a virtual environment with Anaconda, use the following command:
conda create -n deep-learning python=3.5 anaconda
This command will create an environment called deep-learning which runs Python 3.5 and whose base libraries are the ones included by default with Anaconda. Accept the package installation and let it finish its work.
Now if you type:
conda env list
You should have something like this:
# conda environments:
#
deep-learning            /home/ekami/anaconda2/envs/deep-learning
root                  *  /home/ekami/anaconda2
You just created your first virtual environment!
The * next to /home/ekami/anaconda2 marks the environment you are currently in. Right now we are in conda's default Python environment, but let's move to our deep-learning environment with this command:
source activate deep-learning
Once you're done you should see something similar to this:
(deep-learning) [email protected]:~$
The (deep-learning) prefix at the beginning indicates the virtual environment you are in.
Now if you type:
you should get something like this:
Python 3.5.2 :: Anaconda 4.3.1 (64-bit)
If you want to go back to your original environment just type:
Perfect, now let's move on and install the required packages.
Installing CUDA & cuDNN
[This part is irrelevant if you want to use Tensorflow on your CPU]
To run your deep learning models on the GPU you'll need a library called CUDA.
What is CUDA?
CUDA is a set of libraries which let you do GPGPU (general-purpose computing on graphics processing units), allowing Tensorflow to "bridge" its code and your physical GPU. Just as you use Tensorflow functions to create and run your deep learning models, Tensorflow uses CUDA functions to run them on the GPU.
Paste your download link into a wget command. For example:
Before running the installer you need to switch to a new tty (a full screen console if you like). To do so press Ctrl + Alt + F1 on your keyboard, log in and type:
sudo service lightdm stop
This command will stop your graphical interface so that you can install the NVIDIA drivers.
In this same tty, from the folder where you downloaded cuda, run this command:
chmod +x ./cuda_8.0.61_375.26_linux-run && sudo ./cuda_8.0.61_375.26_linux-run
Then answer the questions as below:
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: n
Installing the NVIDIA display driver...
And let it finish.
You can answer "no" when asked to install the GPU drivers if you have already installed the proprietary graphics drivers for your GPU.
If you're not sure, answer "yes".
If you run into a conflict with the open source Nouveau driver, you may need to blacklist it first with these commands.
When it's done you will have to add the CUDA dynamic and static libraries to your
What are PATH and LD_LIBRARY_PATH?
PATH is an environment variable in Linux and other Unix-like operating systems that tells the shell (the thing in which you issue your commands; the one we use by default is called bash) which directories to search for executable files. So basically when you type:
How does your shell know what conda is? It looks through the directories listed in your PATH variable, and if the program exists in one of them, it executes it. Otherwise you get a conda: command not found error. If you want to see what your PATH variable looks like, type:
You'll see all the directories your shell searches when you issue a command.
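Since PATH is one long colon-separated string, it can be easier to read with one directory per line. A minimal sketch (the variable name path_dirs is only for illustration):

```shell
# Print each directory listed in PATH on its own line, in search order.
path_dirs=$(printf '%s\n' "$PATH" | tr ':' '\n')
printf '%s\n' "$path_dirs"
```

The shell searches these directories from top to bottom, so when two directories contain a program with the same name, the one listed first wins.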
In a similar fashion to PATH, LD_LIBRARY_PATH is an environment variable used not to look for executables, but to look for what we call dynamic libraries or shared libraries.
What are dynamic and static libraries?
Static libraries are files ending with .a, whereas dynamic libraries are files ending with .so. Static libraries are included in a program when it is compiled (for example the echo program may directly include something like display.a internally to be able to display something on the screen). As the library is internal, the program ends up as 1 big executable file which does not rely on external files.
As for dynamic libraries, they are excluded from the compilation: they are files living on their own somewhere on your hard drive, and the program loads them when it runs.
In our case we can see where the CUDA dynamic libraries are located by typing:
You'll see a bunch of .so files there. Tensorflow will look for these .so files, but to allow it to do so we need to tell it where to look, that is, set LD_LIBRARY_PATH to point to this directory.
To add CUDA to your environment open the .bashrc file with this command:
Scroll until you reach the end of the file; you should see something like this:
# added by Anaconda2 4.3.1 installer
export PATH="/home/ekami/anaconda2/bin:$PATH"
That looks familiar right? Now you need to specify where CUDA is such that you end up with something like this:
# added by Anaconda2 4.3.1 installer
export PATH="/home/ekami/anaconda2/bin:/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
The : character is used as a separator between directories.
Now close the file: in the nano editor, press Ctrl + X. It will ask if you want to save the file; type Y then press Enter.
Finally reload your .bashrc file with:
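The export lines above add the CUDA directories every time .bashrc is loaded, so reloading it several times in the same terminal repeats the same entries. As an optional variation, here is a minimal sketch that only adds a directory when it is not already in the list. The helper name prepend_dir is hypothetical; it is not provided by Anaconda or CUDA.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list
# unless it is already present, and print the resulting list.
prepend_dir() {
    dir="$1"
    list="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;                  # already present
        *)          printf '%s\n' "${dir}${list:+:$list}" ;;  # add in front
    esac
}

# Same directories as in the .bashrc lines above.
PATH=$(prepend_dir /usr/local/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_dir /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

The ${list:+:$list} part only inserts the : separator when the existing list is not empty, which avoids a stray trailing colon when LD_LIBRARY_PATH was never set.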
And you're done with CUDA! Yay!
Now let's install cuDNN. You thought you could open the champagne? Not yet!
What is cuDNN?
From the NVIDIA website:
cuDNN is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
You can imagine cuDNN as some .so libraries which could have been shipped with CUDA but are not, because not everyone doing GPGPU wants to do deep learning. So just consider them as a set of libraries living alongside CUDA.
To install it, go to this page, register on the NVIDIA website, then download the file for Linux systems as below:
You should now have downloaded a .tgz file. Go into the directory where you saved it and uncompress it:
cd ~/Downloads/ && tar -xvf cudnn-8.0-linux-x64-v5.1.tgz
A new directory named cuda will be created. Go into it and copy all its contents to your CUDA directory by running the following command:
cd cuda && sudo cp -vfR * /usr/local/cuda/
Now if you run this command:
ls /usr/local/cuda/lib64/ | grep libcudnn.so
You should have an output similar to this:
libcudnn.so libcudnn.so.5 libcudnn.so.5.1.10
If not, then you missed something. Restart this section and follow each step carefully.
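The same check can be wrapped in a small function that gives a clear yes/no answer. This is only a sketch; has_cudnn is a hypothetical helper name, and the directory is the one used in this tutorial.

```shell
# Hypothetical helper: succeed if the given directory contains
# at least one file whose name starts with libcudnn.so
has_cudnn() {
    ls "$1"/libcudnn.so* >/dev/null 2>&1
}

if has_cudnn /usr/local/cuda/lib64; then
    echo "cuDNN libraries found"
else
    echo "cuDNN libraries missing: repeat the copy step"
fi
```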
Finally restart your system with:
Congratulations! You just passed the least intuitive part of this tutorial!
Installing the required packages for deep learning
Let's start by installing OpenBLAS, which is a package used to accelerate tensor operations on the CPU:
sudo apt-get install libopenblas-dev liblapack-dev
Now move to your deep-learning virtual environment, if you're not already in it, with:
source activate deep-learning
We will install the following packages:
- The Scipy library collection
What is Scipy?
Scipy here refers to a collection of optimized libraries for scientific computing (the SciPy ecosystem), which includes libraries such as Numpy, Matplotlib and Pandas.
conda install -c anaconda scipy
Finally, to install Tensorflow you have 2 choices. If you want to run Tensorflow on your CPU, you just have to run this command:
pip install tensorflow
If you want it to use your GPU, which is probably the case if you followed this tutorial entirely, run this one instead:
pip install tensorflow-gpu
(If you're an AMD GPU user you may want to take a look at the community Tensorflow version for OpenCL)
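Before going further you may want to confirm that the Python of your active environment can actually import Tensorflow. The following is a minimal sketch; the variable name tf_status is only used for illustration.

```shell
# Report whether the `python` on PATH can import tensorflow.
if ! command -v python >/dev/null 2>&1; then
    tf_status="no-python"
elif python -c 'import tensorflow' >/dev/null 2>&1; then
    tf_status="ok"
    # Print the installed version.
    python -c 'import tensorflow as tf; print(tf.__version__)'
else
    tf_status="import-failed"
fi
echo "tensorflow check: $tf_status"
```

If the result is import-failed, first check that the deep-learning environment is activated, since pip installs the package into whichever environment is active.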
Finally you have to tell Jupyter to recognize your virtual environment as a kernel. We can simply do this with nb conda kernels:
conda install nb_conda
This program tells Jupyter to recognize your installed anaconda environment.
Now close your current terminal so we can properly test Jupyter from a new terminal session.
And that's all! We're done for this part :)
Now comes the fun part!
You won't have to do much here as Jupyter is shipped with Anaconda. But let's test that our environment works properly.
To do so I invite you to run a handcrafted benchmark which will try to recognize the handwritten digits of the MNIST dataset and output the time it took to do so.
To download it run:
Then run jupyter with:
This will open a new tab in your web browser where you can open the tensorflow_benchmark.ipynb file from Jupyter.
Now select the virtual environment we created previously:
Then click on "Kernel > Restart & Run all"
If everything went fine and you used tensorflow on your GPU you should have something like this from the console in which you ran Jupyter:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
And at the end of the last cell you should start to see things like this:
Iter 12800, Minibatch Loss= 2820.980713, Training Accuracy= 0.85938
Iter 25600, Minibatch Loss= 1172.836670, Training Accuracy= 0.91406
...
As you run Jupyter locally, you don't really need to secure it. In Part 2 we will expose it to the internet, which means that anyone will be able to reach it, especially cryptocurrency miners... Remember, Jupyter can serve as a file explorer and, even worse, can execute code. That's a hacker's heaven if you don't secure it properly!
Everything is now set up, congratulations!! :)
Now if you don't like working in Jupyter notebooks you can still use a text editor such as Atom or Visual Studio Code, but personally I prefer using PyCharm: it's free and really powerful :) .
Here I'll just show you how to run PyCharm with the conda environment we just created. In the second part I'll show you how to use your PyCharm instance locally and execute code remotely.
Copy and paste this code, which you can find here (it is the same as the one in the Jupyter notebook), into a file named script.py.
File > Settings then navigate to Project: script.py > Project Interpreter.
From the drop-down list you can select your Anaconda environment. Select it, then press
Let the indexing process finish, then go to the little Edit configuration on the drop-down arrow in the top right corner:
A new window will appear. Click on the + sign and choose
Give it a name and, in the configuration tab, fill the Script entry with the path to your
Now click on the ... next to Environment variables: and add the variable LD_LIBRARY_PATH so it looks like this:
OK then the final result should look like this:
OK, now you're ready to launch your script by clicking on the green "play" button.
If everything works, perfect, you're done! :)
A big thanks to Akshay Lakhi for his corrections and feedback.
If you guys want clarifications, find any typos or improvements to add, don't hesitate to leave your feedback in the comments!
In Part 2 we will learn how to take what we did so far and expose it to the internet, stay tuned!