Machine learning - configuration

Recently I reinstalled my whole machine for machine learning and realized how hard it is to install everything from scratch, so I made some notes about how to reinstall it. I hope they are useful to other people.

Inspiration

Some of the ideas in this post were taken from these two links.

1. Install a normal ubuntu 16.04 (server)

  • Download the ISO image from http://releases.ubuntu.com/16.04/
  • Write it to a USB stick
  • Follow the installation instructions
  • Install the SSH service so we can do all the following steps remotely
  • Modify /etc/network/interfaces so the machine has a static IP (a sketch of these last two steps follows below)
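
A minimal sketch of those last two steps. The interface name (eth0 here; on 16.04 it will probably be something like enp3s0) and every address except the machine's own 192.168.11.10 are assumptions for a typical home network, so adjust them to yours:

sudo apt-get install openssh-server  

# /etc/network/interfaces
auto eth0
iface eth0 inet static
    address 192.168.11.10
    netmask 255.255.255.0
    gateway 192.168.11.1
    dns-nameservers 8.8.8.8

Then restart networking (sudo systemctl restart networking) or simply reboot.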

2. CUDA

Now let's install CUDA

  1. Go to https://developer.nvidia.com/cuda-downloads and download CUDA 8.0; you must be registered to download it
  2. sudo apt-get install gcc g++ make build-essential
  3. sudo sh cuda_8.0.61_375.26_linux-run
    3.1. say yes to installing the driver
    3.2. install the CUDA toolkit
    3.3. do not install the samples
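
The installer does not set up the PATH for you (it only prints a reminder at the end). A sketch of the lines I would append to ~/.bashrc, assuming the default install prefix /usr/local/cuda, plus a quick sanity check of the driver and compiler:

export PATH=/usr/local/cuda/bin:$PATH  
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH  

nvidia-smi  
nvcc --version  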

3. cudnn

  1. Download cuDNN version 6.0 from https://developer.nvidia.com/rdp/cudnn-download (https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v6/prod/8.0_20170307/cudnn-8.0-linux-x64-v6.0-tgz)
  2. Install cuDNN by copying the headers and libraries into the CUDA tree:
tar zxf cudnn-8.0-linux-x64-v6.0.tgz  
sudo cp cuda/lib64/* /usr/local/cuda/lib64/  
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/  
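
To double check which version ended up installed, the version macros in the copied header can be inspected:

grep -A 2 "define CUDNN_MAJOR" /usr/local/cuda/include/cudnn.h  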

4. install anaconda

  1. Download the Linux command-line installer (the Python 3.6 version) from https://www.continuum.io/downloads
  2. wget https://repo.continuum.io/archive/Anaconda3-4.3.1-Linux-x86_64.sh
  3. bash Anaconda3-4.3.1-Linux-x86_64.sh
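
If you let the installer prepend Anaconda to the PATH in .bashrc (it asks at the end), reload the shell and check that the Anaconda Python is the one being picked up:

source ~/.bashrc  
conda --version  
python --version  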

5. install jupyter

Jupyter already comes with Anaconda, but we are going to configure it :) Generate a config file and a password hash (the second command asks for a password and prints its hash):

jupyter notebook --generate-config  
python -c "from notebook.auth import passwd; p = passwd(); print(p)"  

Modify the generated config at $HOME/.jupyter/jupyter_notebook_config.py

and add these lines at the end:

c.NotebookApp.ip = '*'  
c.NotebookApp.open_browser = False  
c.NotebookApp.port = 9999

c.NotebookApp.password = u'sha1:67c9e60bb8b6:9ffede0825894254b2e042ea597d771089e11aed'  

where the password is the hash generated by the script above (replace my sha1 with yours).

TODO: make Jupyter start automatically on each restart.
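
One simple option for that (a sketch I have not actually set up yet) is a cron entry that launches the notebook at boot; the anaconda3 path assumes the default installer location in my home directory:

# run `crontab -e` and add a line like this
@reboot /home/kozko/anaconda3/bin/jupyter notebook --config=/home/kozko/.jupyter/jupyter_notebook_config.py >> /home/kozko/jupyter.log 2>&1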

6. install tensorflow-gpu (with GPU support)

  • Create a new environment for tensorflow, so we don't mix packages
conda create --name tensorflow python=3.6  
source activate tensorflow  
  • Follow the guide on https://www.tensorflow.org/install/install_linux#InstallingAnaconda
export TF_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp36-cp36m-linux_x86_64.whl

pip install --ignore-installed --upgrade $TF_URL  
  • Test that tensorflow is really using the GPU
python -c "import tensorflow as tf; sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))"  

You should see something like this

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:  
name: GeForce GTX 1060 6GB  
major: 6 minor: 1 memoryClockRate (GHz) 1.7085  
pciBusID 0000:01:00.0  
Total memory: 5.93GiB  
Free memory: 5.49GiB  
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0  
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y  
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)  
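
As a slightly stronger check, a small matmul can be run with device placement logging on; just a sketch using the same TF 1.x session API:

python -c "import tensorflow as tf; a = tf.constant([[1.0, 2.0]]); b = tf.constant([[3.0], [4.0]]); sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)); print(sess.run(tf.matmul(a, b)))"  

It should print [[ 11.]] and the log should show the MatMul op placed on /gpu:0.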

7. Install pytorch

conda create --name pytorch  
source activate pytorch  
conda install pytorch torchvision cuda80 -c soumith  
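
A quick check that this build actually sees the GPU, using the standard torch.cuda API:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  

It should print the version and True.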

8. install torch

git clone https://github.com/torch/distro.git ./torch --recursive  
cd ./torch; bash install-deps;  
./install.sh

Open a new shell (or source ~/.bashrc, since the installer adds Torch to it) and execute th to be sure that everything is installed correctly

th  
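
A one-liner can go a bit further and create a tensor on the GPU; the cutorch part assumes install.sh built the CUDA packages (it should, since CUDA is already installed):

th -e "print(torch.rand(2, 2)); require 'cutorch'; print(torch.rand(2, 2):cuda())"  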

9. install caffe - Could not make it work :(

Fortunately Caffe is available as an official NVIDIA Docker image; anyway, here is the guide I followed:

http://caffe.berkeleyvision.org/installation.html

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev 

sudo apt-get install --no-install-recommends libboost-all-dev
git clone https://github.com/BVLC/caffe;  
cd caffe;  
cp Makefile.config.example Makefile.config  

modify the Makefile.config, I changed the following (the resulting lines are sketched after this list):

  • uncommented the CUDNN usage
  • uncommented ANACONDA_HOME and set it to $HOME/anaconda3; also uncommented the next lines about PYTHON_INCLUDE and changed them from 2.7 to 3.6
  • uncommented PYTHON_LIB for anaconda
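
Those bullet points translate to roughly the following lines; this is sketched from memory, so the include paths may differ slightly (the Python 3.6 headers usually live under python3.6m):

USE_CUDNN := 1
ANACONDA_HOME := $(HOME)/anaconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
        $(ANACONDA_HOME)/include/python3.6m \
        $(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include
PYTHON_LIB := $(ANACONDA_HOME)/lib

Then build and run the tests: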
make all  
make test  
make runtest  

IT DID NOT WORK :(

10. nvidia-docker

First, install Docker on Ubuntu:

sudo apt-get install \  
    linux-image-extra-$(uname -r) \
    linux-image-extra-virtual

sudo apt-get install \  
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

sudo apt-key fingerprint 0EBFCD88  
sudo add-apt-repository \  
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

sudo apt-get update

sudo apt-get install docker-ce
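
Docker itself can be verified at this point with the standard hello-world image:

sudo docker run --rm hello-world  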

Now let's install nvidia-docker :) from its GitHub releases page:

wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb  
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb  

check that it works by running

sudo nvidia-docker run --rm nvidia/cuda nvidia-smi  

you should see something like this

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 0000:01:00.0      On |                  N/A |
| 24%   24C    P2    23W / 120W |    385MiB /  6071MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
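
Since the native Caffe build failed (section 9), one option is to run Caffe inside a container instead. A sketch, assuming the bvlc/caffe:gpu image from Docker Hub (the image name is an assumption on my part, not something I have tested here):

sudo nvidia-docker run --rm bvlc/caffe:gpu caffe device_query -gpu 0  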

11. sshfs

To share data between the Linux machine and my Mac, I installed sshfs. On the Linux machine:

sudo apt-get install sshfs  

and on my mac

 brew cask install osxfuse
 brew install homebrew/fuse/sshfs

and to mount the data at /Volumes/ml

### add my key to the machinelearning
cat $HOME/.ssh/id_rsa.pub | ssh kozko@192.168.11.10 "mkdir -p ~/.ssh && cat >>  ~/.ssh/authorized_keys"

sudo umount /Volumes/ml  
sudo mkdir -p /Volumes/ml  
sudo sshfs -o allow_other,defer_permissions,IdentityFile=~/.ssh/id_rsa kozko@192.168.11.10:/home/kozko /Volumes/ml
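
After that the remote home directory behaves like a local folder; a quick check:

ls /Volumes/ml  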