Deploying with Seldon
Introduction
This tutorial demonstrates how to train and deploy an MNIST model using Neu.ro and Seldon. It is based on a basic PyTorch example.
Here are the steps we'll perform:
Train an MNIST model in a Neu.ro job and save the serialized model to the platform storage
Wrap the model into a Seldon Core inference HTTP server and build an inference image
Test the inference server as a Neu.ro job
Deploy the inference server to a Seldon Core cluster
Prerequisites
First, make sure that you have the Neu.ro CLI client installed and configured:
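For reference, installing and logging in might look like this (the package names are assumptions and may differ depending on your platform version):

```bash
pip install -U neuro-cli neuro-extras
neuro login
```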
You will also need to have Seldon Core up and running. It's assumed that you have kubectl configured locally so that you can create all the necessary K8S resources.
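If Seldon Core was installed into the seldon-system namespace (an assumption; adjust to your setup), a quick sanity check could be:

```bash
kubectl get pods -n seldon-system
```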
Training
First, we need to copy two files from the repository we mentioned in the introduction:
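Assuming the upstream pytorch/examples repository layout, the two files (main.py and requirements.txt) can be fetched like this; the branch name may differ:

```bash
curl -O https://raw.githubusercontent.com/pytorch/examples/main/mnist/main.py
curl -O https://raw.githubusercontent.com/pytorch/examples/main/mnist/requirements.txt
```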
If you check the contents of main.py, the path to the resulting serialized model is baked right into the code:
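At the time of writing, the relevant snippet at the end of the upstream example's main() looks like this:

```python
if args.save_model:
    torch.save(model.state_dict(), "mnist_cnn.pt")
```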
This doesn't look flexible enough for our purposes, so we would need to patch the code and expose a command-line option for specifying the path. For the sake of simplicity, we won't do that in this tutorial; instead, we'll copy the resulting model to the desired location once the training process finishes.
The serialized model should be copied to a mounted storage volume under a chosen path. To accomplish that, we need to write a Dockerfile for the training job and save it as train.Dockerfile for further use. The image will be based on a pre-built PyTorch image; the complete list of such images can be found on PyTorch's Docker Hub page.
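Here is a sketch of what train.Dockerfile could look like; the base image tag and the /storage path are assumptions:

```dockerfile
FROM pytorch/pytorch:latest

WORKDIR /workspace
COPY . /workspace
RUN pip install -r requirements.txt

# Default target path for the serialized model; overridden at run time.
ENV MODEL_PATH=/storage/mnist_cnn.pt

# 1) train the model and save it to the default location (mnist_cnn.pt);
# 2) copy the saved model to the mounted storage volume.
CMD bash -c "python main.py --save-model && cp mnist_cnn.pt $MODEL_PATH"
```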
As you can see, the CMD is meant to perform two operations:
Train a model and save it in the default location
Copy the saved model into the location in which we mounted our storage volume
Now, let's build the image. The following command doesn't require a running Docker Engine locally: the build itself is performed on the Neu.ro side.
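The build command looks roughly like this (check neuro-extras image build --help for the exact options in your version):

```bash
neuro-extras image build -f train.Dockerfile . image:examples/mnist:train
```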
The command above instructs Neu.ro to copy the build context (the current working directory . in this case), use the build steps from train.Dockerfile, and save the resulting image as image:examples/mnist:train in the Neu.ro registry. We can check the registry contents by using the following command:
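For example:

```bash
neuro image ls
```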
Now that we have the image available within Neu.ro, we can actually run a training job.
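A sketch of the run command; the volume layout and MODEL_PATH value are assumptions that must match the Dockerfile above:

```bash
neuro run --preset gpu-k80-small \
    --volume storage:examples/mnist:/storage:rw \
    --env MODEL_PATH=/storage/mnist_cnn.pt \
    image:examples/mnist:train
```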
Note that we explicitly specify the MODEL_PATH environment variable, which points to the mounted storage volume. This job takes up to 4 minutes using the gpu-k80-small preset.
We can then check that the serialized model has actually appeared on the storage:
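For example:

```bash
neuro ls -l storage:examples/mnist
```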
Deployment
Now that we have successfully trained our model, we can start actually using it.
Building the server
neuro-extras provides scaffolding for wrapping the model's code into a functional inference HTTP server that you can run on Neu.ro or in any other container runtime.
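The scaffolding command looks roughly like this (see neuro-extras seldon --help for the exact syntax in your version):

```bash
neuro-extras seldon init-package .
```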
The command above creates two files:
seldon_model.py is a Python module responsible for implementing the interface that Seldon Core expects. Typically, this module reads a serialized model from a mounted storage volume, deserializes it, and uses it to run predictions on incoming data points.
seldon.Dockerfile is a predefined Dockerfile that assembles your code and the Seldon Core inference HTTP server for further use.
Let's implement the interface first. We should add some missing imports at the top of seldon_model.py:
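A sketch of the imports the code below assumes, including the Net model class from main.py, which must be placed next to seldon_model.py so that it can be imported:

```python
import io

import torch
from PIL import Image
from torchvision import transforms

from main import Net  # the model definition from the training script
```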
Then, we need to replace the existing dummy constructor with the actual model loading procedure.
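A sketch of what the constructor could look like inside the scaffolded model class; the model path is an assumption and must match the volume mount described below:

```python
def __init__(self):
    # Deserialize the trained model from the mounted storage volume.
    self._device = torch.device("cpu")  # inference runs on CPU
    self._model = Net().to(self._device)
    state_dict = torch.load("/storage/mnist_cnn.pt", map_location=self._device)
    self._model.load_state_dict(state_dict)
    self._model.eval()
```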
Note that the code above assumes that the model resides at a particular path. We will need to mount a storage volume under /storage to make it work. Another point to note is that we'll be running inference on CPU.
Similarly, we need to redefine the predict method. Since we're dealing with an image classification problem in this example, we want our inference HTTP server to receive an image as binary data (bytes), predict the classes, and return a JSON document with the resulting scores.
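A sketch of the predict method under those assumptions; the normalization constants are the ones used by the upstream MNIST example, and the Seldon wrapper takes care of serializing the returned list into a JSON response:

```python
def predict(self, X, features_names=None):
    # X arrives as raw image bytes (the binData form field).
    image = Image.open(io.BytesIO(X)).convert("L").resize((28, 28))
    tensor = transforms.ToTensor()(image)
    tensor = transforms.Normalize((0.1307,), (0.3081,))(tensor)
    with torch.no_grad():
        scores = self._model(tensor.unsqueeze(0))  # shape: [1, 10]
    # The upstream Net returns log-probabilities; return them as a plain list.
    return scores.squeeze(0).tolist()
```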
We are ready to build the inference image in the same way we built the training one:
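For example:

```bash
neuro-extras image build -f seldon.Dockerfile . image:examples/mnist:seldon
```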
We see that our registry now has one more image.
Running the server on Neu.ro
Before pushing the newly trained and built model to production, let's take a glimpse at how it works within Neu.ro. We need to submit a job that exposes port 5000 (the default port of the Seldon Core HTTP server) and mounts a storage volume with the serialized model to the path mentioned above.
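A sketch of such a job; the preset and the HTTP-related flag names are assumptions and may differ between CLI versions:

```bash
neuro run --preset cpu-small \
    --name example-mnist \
    --http-port 5000 --no-http-auth \
    --volume storage:examples/mnist:/storage:ro \
    image:examples/mnist:seldon
```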
Note the output of the command above: we should copy the value of the HTTP URL field to form a curl command later.
Assuming the HTTP URL value was https://example-mnist--user.jobs.neuro-ai-public.org.neu.ro/, we can now create a curl command. The Seldon Core HTTP server expects binary data sent as the binData form field:
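For example (2.png is a hypothetical sample image of a handwritten digit 2):

```bash
curl -F 'binData=@2.png' \
    https://example-mnist--user.jobs.neuro-ai-public.org.neu.ro/predict
```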
The array element with index 2 has the highest score, as expected.
If the testing job is no longer needed, we can simply kill it to release the resources:
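Assuming the job was named example-mnist as above:

```bash
neuro kill example-mnist
```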
Running the server on Seldon Core
After a successful test of our inference server, we need to push the result to production.
Since your Seldon Core deployment typically resides outside of Neu.ro (for example, in your on-premise K8S cluster), we need to instruct K8S and Seldon to pull the newly-built model image, as well as the corresponding serialized model, into the cluster. neuro-extras makes it easy to create the required resources:
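A heavily hedged sketch of what this could look like; the exact subcommand names and arguments vary between neuro-extras versions, so check neuro-extras k8s --help and neuro-extras seldon --help before running anything:

```bash
# Create K8S secrets so the cluster can pull from the Neu.ro registry and storage.
neuro-extras k8s generate-secret | kubectl apply -f -
neuro-extras k8s generate-registry-secret | kubectl apply -f -

# Generate a SeldonDeployment resource pointing at our image and serialized model.
neuro-extras seldon generate-deployment \
    image:examples/mnist:seldon \
    storage:examples/mnist/mnist_cnn.pt | kubectl apply -f -
```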
At last, let's test our production setup. We will use the same curl command, but change the URL so that it points to your K8S ingress gateway.
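A sketch with placeholder values; depending on your ingress and Seldon Core version, the request may instead need to be a JSON payload with a base64-encoded binData field rather than a multipart form:

```bash
curl -F 'binData=@2.png' \
    http://<ingress-host>/seldon/<namespace>/<deployment-name>/api/v1.0/predictions
```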
Once again, we see that the result is 2, and the setup is working properly.