Serving Models With Model Deployment Controller
Custom-built app for no-code model inference server deployment on the platform
How it works
Our Model Deployment Controller is a web application that connects to your MLFlow instance, retrieves the list of Registered Models (in the Staging or Production stages), and lets you create an MLFlow Serving job (or a Triton job, if your model was saved in ONNX format) that serves the chosen model. You can select the image you want to use for serving; we provide a list of sensible defaults curated by us. Additionally, you can specify whether the model endpoints should be protected by the platform auth.
Prerequisites
MLFlow
You need to have a running MLFlow instance with the --serve-artifacts flag enabled. If this instance was launched via the Dashboard, you will be able to connect to it automatically: the MLFLOW_TRACKING_URI is injected into the model deployment job. Otherwise, you need to pass it manually (e.g. by mounting a secret into an environment variable from the Dashboard, or by passing the appropriate value when running the deployment app job via Neuro CLI).

Additionally, if your MLFlow instance is protected by auth, you need to pass MLFLOW_TRACKING_TOKEN to the deployment app as well.
Note: currently, only mlflow>=2.0 instances are actively supported.
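For reference, here is a minimal sketch of launching such an instance by hand with proxied artifact access enabled; the backend store URI and artifacts destination are placeholders you would replace with your own storage locations.

```bash
# Sketch: a minimal MLFlow server with artifact proxying enabled.
# The backend store URI and artifacts destination are placeholders.
mlflow server \
  --host 0.0.0.0 --port 5000 \
  --backend-store-uri sqlite:///mlflow.db \
  --artifacts-destination ./mlartifacts \
  --serve-artifacts
```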
Starting the Model Deployment Controller
The model deployment web app can be launched either via the Dashboard or via the CLI.
Via Dashboard
On the Dashboard page, select Model Deployment Controller and click the RUN A JOB button. In the dialogue that opens, you can choose the preset you would like to use to run the deployment controller (this is not the preset that will be used for model serving). If you've previously launched an instance of MLFlow via the Dashboard, you will see a notice that the URI of this instance will be passed to the Deployment Controller via an environment variable. Otherwise, you need to pass the URI of your instance to the Controller as the MLFLOW_TRACKING_URI environment variable (e.g. via a secret).
If your MLFlow instance uses authentication, you might need to supply MLFLOW_TRACKING_TOKEN as well.
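If you prefer to manage these values as platform secrets, the sketch below shows one way to add a secret via the CLI and reference it as an environment variable; the secret name is arbitrary, and the secret: reference syntax is assumed to match your platform's Neuro CLI version.

```bash
# Store the MLFlow token as a platform secret (the name "mlflow-token" is arbitrary)
neuro secret add mlflow-token <your-token>

# Reference the secret as an env variable when launching a job
# (sketch; see the full run command in the "Via CLI" section below)
neuro run --env MLFLOW_TRACKING_TOKEN=secret:mlflow-token ...
```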
Note that we automatically mount a folder from your storage, which will be used as the model repository in case you want to deploy to Triton Inference Server.
After pressing RUN, you will be redirected to the newly started Model Deployment Controller.
Via CLI
If you need more flexibility, you can run the Deployment Controller via neuro-cli. Below is an example of a command that launches the equivalent of the job started via the Dashboard.
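This is only a sketch: the controller image name, preset, storage path, and MLFlow URI are placeholders that you should replace with the actual values for your cluster.

```bash
# Sketch: launch the Model Deployment Controller as a platform job.
# The image name, preset, and storage path below are hypothetical placeholders.
neuro run \
  --name model-deployment-controller \
  --preset cpu-small \
  --volume storage:triton-model-repository:/model-repository:rw \
  --env MLFLOW_TRACKING_URI=https://<your-mlflow-instance> \
  --env MLFLOW_TRACKING_TOKEN=secret:mlflow-token \
  <deployment-controller-image>
```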
Model Deployment
Use the web app to select the registered model from the MLFlow instance (it must be in the Staging or Production stage), the server type (MLFlow or Triton), the resource preset, the image name and tag (we provide recommended defaults, but you can also choose any of your own images), and whether to require platform auth for server access.
If you opted for Triton server deployment, you will also be able to select an already running Triton instance (if one was previously used to deploy models) or create a new one.
Clicking Deploy will start the deployment process, and you'll be able to see your model in the Deployed models tab.
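Once an MLFlow Serving endpoint is up, you can sanity-check it with a request to its /invocations endpoint. The sketch below assumes the mlflow>=2.0 JSON input format; the endpoint URL is a placeholder, and the Authorization header is only needed if you enabled platform auth for the deployment.

```bash
# Sketch: query an MLFlow Serving endpoint (mlflow>=2.0 JSON input format).
# The URL is a placeholder; drop the Authorization header if platform auth is disabled.
curl -X POST https://<your-model-endpoint>/invocations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <platform-auth-token>" \
  -d '{"inputs": [[1.0, 2.0, 3.0]]}'
```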
Examples
This Jupyter Notebook provides an example of training models and exporting them to MLFlow, including the ONNX format for Triton inference.