Dockerized Prediction.IO


Overview

Prediction.IO is an Open Source Machine Learning Server.
During a conversation with a good friend, I was informed that he and his team were having problems setting up an official stack and using the engine with their code. He suggested that having a dockerized version of the stack would help.

After thinking about this for a while, I searched GitHub for previous work to see if anyone had attempted to dockerize the solution. Indeed, I found an old project at https://github.com/vovimayhem/docker-prediction-io by Roberto Quintanilla; however, it had some problems:

  1. It hadn't been updated in more than a year.
  2. It had internal dependencies that were not included in the project.
  3. It used PostgreSQL instead of Elasticsearch.
  4. Even after recreating the internal dependencies it used, I ran into SSL problems, so I couldn't run tests to confirm it was working correctly.

I decided to take up the task of updating this project, moving the solution to ActionML's PredictionIO v0.9.7-aml.

For those wondering about the differences between the standard Prediction.IO and the ActionML version, ActionML provides a comparison on their website.

This version has the added benefit of working with The Universal Recommender, which I used to test that the stack was working correctly.

In this post, I'll go into detail on how to set up this solution on a local computer and how to run the Universal Recommender Template as a test to confirm everything is working as it should.

Setting up

First, clone the repo from https://github.com/krusmir/docker-prediction-io and change into the directory you cloned it into.

Afterwards, make sure to run:

git submodule init && git submodule update

This will pull the Universal Recommender Template, which will be used for testing later on.
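If the submodule came down correctly, the template sources will appear under docker_volumes/universal, the directory that docker-compose.yml later mounts into the pio container:

# the Universal Recommender Template submodule lands here
ls docker_volumes/universal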

For building the stack, run:

docker-compose -p TestEnv build
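As a side note, if you later change only the prediction.io image, you can rebuild a single service instead of the whole stack (assuming the service is named pio, as the container name testenv_pio_1 used later suggests):

# rebuild only the pio service; the service name is an assumption
docker-compose -p TestEnv build pio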

Drink a cup of coffee, juice, or whatever you fancy, since creating and compiling the prediction.io Docker image will take a while.

While you wait for it to build, you can check the Dockerfile for prediction.io. You will notice that the image is not optimized (i.e., it does not combine multiple commands per RUN statement or use similar tricks). This is done on purpose: it is quite frustrating and time consuming to hit an error while downloading (if your internet connection is as intermittent as mine) and then debug to find where the build went wrong. I'd rather have a bigger image where I can backtrack when an error is found than optimize the image size. Feel free to combine all the statements if you feel optimizing the image size is more important than easy backtracking, and to add custom commands to the Dockerfile if you deem it necessary.
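To make the tradeoff concrete, here is a purely illustrative Dockerfile sketch; the file name and URL below are made up, not taken from the actual Dockerfile:

# Illustrative sketch only -- the URL and file name are hypothetical.
# Separate RUN steps: each one becomes a cached layer, so if the
# download fails you can fix it and resume without redoing earlier steps.
RUN wget http://example.com/predictionio.tar.gz
RUN tar -xzf predictionio.tar.gz

# The size-optimized alternative chains everything into a single layer,
# but any failure restarts the whole step:
# RUN wget http://example.com/predictionio.tar.gz && \
#     tar -xzf predictionio.tar.gz && \
#     rm predictionio.tar.gz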


... Enjoy your beverage now ...


OK, so if you are here, the build must have finished successfully.

[Screenshot: successful build output]

If your screen looks different, that's OK; I had previously built the solution, so my output differs from a first-time build.

Before proceeding: a pet peeve of mine is to have the rest of the images pulled before starting the stack, so if you are like me, run:

docker-compose -p TestEnv pull

Either way, bring the stack up with:

docker-compose -p TestEnv up -d

To see the logs and confirm the application is working:

docker-compose -p TestEnv logs -f
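If the combined output is too noisy, you can follow a single service instead (again assuming the pio service name):

# follow logs for one service only
docker-compose -p TestEnv logs -f pio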

If all seems right, congrats! You have a Prediction.IO stack running.

Now, let's run some tests to confirm everything is working as it should.

Testing

Now for the fun part: is the stack really working?

For testing the stack, we'll need to enter the pio container and run some commands.

First, check the stack using:

docker-compose -p TestEnv ps

Enter the pio container using the name assigned to it by your stack; in my case it is:

docker exec -it testenv_pio_1 bash

and then run pio status. You should see something like the following:

[Screenshot: pio status output]

Everything is looking good so far. Now let's run the Universal Recommender Template (which we cloned previously using the git submodule commands).

You will notice there is a universal folder in the home directory when you access the pio container:

[Screenshot: the universal directory inside the container]

The universal directory is mounted on the container and corresponds to the ./docker_volumes/universal directory in the root of the repository (as defined in docker-compose.yml). This is the same repository you pulled earlier with the git submodule commands.
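For reference, the mount in docker-compose.yml looks something like this (a sketch, not the literal file; the target path inside the container is an assumption):

pio:
  volumes:
    # host path (the git submodule) mapped into the container;
    # the container-side path /home/prediction-io/universal is assumed
    - ./docker_volumes/universal:/home/prediction-io/universal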

To be able to run the examples, we need to install pip in the pio container. But since the container runs as a non-root user (i.e., prediction-io), we'll need to install pip in user space. That will let us install virtualenv with pip (again in user space), and then we will create a Python virtualenv with all the dependencies needed to run the tests.

Do the following inside the pio container:

mkdir python
cd python
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user
[Screenshot: installing pip in user space]
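You can confirm the user-space install worked; with --user, pip lands under ~/.local:

# pip was installed with --user, so the binary lives in ~/.local/bin
~/.local/bin/pip --version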

Once pip is installed in userspace, we can install the rest of the tools we need:

~/.local/bin/pip install virtualenv --user
~/.local/bin/virtualenv prediction.io
source prediction.io/bin/activate
[Screenshot: installing virtualenv with the user-space pip]
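You can verify the virtualenv is active by checking which interpreter python now resolves to:

# with the virtualenv active, this should point inside prediction.io/bin
which python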

While inside the Python virtualenv, we can test using the Universal Recommender Template.

Go to the universal directory:

cd ~/universal

However, before proceeding, we need to make one small modification to a file in the universal repo. In another terminal, go to the root of the repo and look at the difference between the original file and the one we will replace it with:

diff -c docker_volumes/engine.json docker_volumes/universal/examples/handmade-engine.json
[Screenshot: diff of engine.json]

The only difference is the following line:

"es.nodes"":"elasticsearch",

We are just specifying the name of the Elasticsearch nodes in the sparkConf section.
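If you want to sanity-check that this hostname resolves from inside the pio container, you can query Elasticsearch directly (assuming it listens on its default HTTP port, 9200):

# "elasticsearch" is the compose service name; 9200 is the default HTTP port
wget -qO- http://elasticsearch:9200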

Just copy the provided file over the one in the submodule with:

cp docker_volumes/engine.json docker_volumes/universal/examples/handmade-engine.json

And now we can run the tests in the original console (the one with the Python virtualenv):

./examples/integration-test
[Screenshot: integration tests running]

Note: the tests are quite taxing on your machine. Make sure you have a decent system to run them, otherwise they might fail. If you have problems running the tests, run the integration-test script line by line by copy-pasting each line into the console; that makes the run a little less taxing.
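For example, you can print the script from the universal directory and then issue its commands one at a time:

# print the test script so its steps can be copied individually
cat examples/integration-test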


That should be it. Now you have a running Prediction.IO environment on your local machine.
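When you are done experimenting, you can stop the stack and remove its containers from a host terminal with:

docker-compose -p TestEnv down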

Please share and comment, and suggest what you would like to see dockerized or any DevOps recommendations I might provide.

Note: this article was previously published in 2016 at https://d.evops.pw.