TorchServe
TorchServe takes a PyTorch deep learning model and wraps it in a set of REST APIs. It comes with a built-in web server that you run from the command line. This server accepts command line arguments such as the model(s) you want to serve, along with optional parameters controlling port, host, and logging.
TIR makes deploying PyTorch models as easy as pushing code. You upload your TorchServe model archive (more on this later) to E2E Object Storage (EOS), and that's it. TIR automatically launches containers, downloads the model from the EOS bucket to a local directory on the container, and starts the TorchServe web server. You get not only automated containerized deployment but also monitoring and maintenance features in the TIR dashboard.
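For reference, this is roughly what TIR automates for you. A minimal local launch, sketched under the assumption of a model-store directory containing mnist.mar and a config directory with config.properties, looks like:
# start TorchServe locally, serving the mnist archive from ./model-store
torchserve --start \
  --model-store ./model-store \
  --models mnist=mnist.mar \
  --ts-config ./config/config.properties

# stop the server when done
torchserve --stop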
Some features of TorchServe on TIR:
Automated Deployments from E2E Object Storage (EOS bucket)
Automatic restart on failures
E2E Managed TLS certificates
Token based Authentication
Manual or Automated Scaling
Optional Persistent disks (to reduce boot time when the model downloads on restarts)
REST (HTTP) and gRPC endpoints
Readiness and Liveness Checks
Quick Start
This section focuses on serving model files (deployment) without much discussion on the model files themselves, where they come from, and how they are made. We will learn more about model development in later sections.
Install dependencies:
TIR deployments require the MinIO CLI (mc) to upload the model archive to E2E Object Storage (EOS).
Follow the instructions below to install the MinIO CLI (mc) on your local machine. If you are using TIR Notebooks, you can skip this step, as they come pre-installed with mc.
Create a directory to store the model weights.
# make a model directory
mkdir mnist && mkdir ./mnist/model-store && mkdir ./mnist/config
cd mnist/model-store
Download a trained model (torch archive)
wget https://objectstore.e2enetworks.net/iris/mnist/model-store/mnist.mar
Download a torchserve config file
# go to the config directory
cd ../config && wget https://objectstore.e2enetworks.net/iris/mnist/config/config.properties
Create a TIR Model
Go to TIR Dashboard
Go to Model Repository in Inference section.
Create a Model with the name my-mnist. If prompted, select New E2E Object Store Bucket as the model type.
When the model is created, copy the mc alias command for the model.
Run the mc alias command on your command line:
mc config host add my-mnist https://objectstore.e2enetworks.net <access-key> <secret-key>
You can also note down the bucket name (auto-created by TIR) with this command, as we will need it in the next section:
mc ls my-mnist/
Upload the model and config to TIR Model
Run the following commands from your command line to upload the model-store and config directories to the TIR model bucket.
# return to the model directory (top)
cd ..

# run this command to get the bucket name
mc ls my-mnist/

# copy contents to the model bucket in E2E Object Store
mc cp -r * my-mnist/<enter-bucket-name-here>

# if all goes well, you should see the directories in the model bucket
mc ls my-mnist/<enter-bucket-name-here>
Create an Inference Service
We have our model weights and config in E2E Object Storage now. Follow these steps to create an inference service in TIR.
Go to TIR Dashboard
Go to Deployments
Create a new deployment. When prompted, select the torchserve framework and the my-mnist model.
Follow the instructions from the Sample API Request section to test the service.
Developer Workflow
PyTorch is the most commonly used model training toolkit. Once a model is trained, a typical developer workflow looks like this:
Save the model (.pt) to file system.
Write a custom API handler (optional)
Make a model archive (.mar) using torch-model-archiver
Prepare a config file to set runtime behaviour of the service. In TIR, this step is not optional.
Run torchserve command to launch the service. For TIR users, this step is completely automated.
For sample code, click here
Make a Model Archive
Torch Model Archiver is a tool for creating archives of trained PyTorch models that can be consumed by TorchServe for inference.
After you save a trained model to the file system, use the torch-model-archiver utility to generate an archive file (.mar). This file will need to be pushed to the EOS bucket (more details on this in the following sections).
$ torch-model-archiver -h
usage: torch-model-archiver [-h] --model-name MODEL_NAME --version MODEL_VERSION_NUMBER
--model-file MODEL_FILE_PATH --serialized-file MODEL_SERIALIZED_PATH
--handler HANDLER [--runtime {python,python3}]
[--export-path EXPORT_PATH] [-f] [--requirements-file] [--config-file]
--model-name
Enter a unique name for this model. This name is important because your API endpoint depends on it. For example, if the model name is mnist, the endpoint will look like https://../mnist/infer
--version
This is optional. You may choose to set a version number, but you would then also have to create a version in the EOS bucket. More on that in the Push Model Updates section.
--model-file
This is the file path of the model definition, e.g. mnist.py, which defines the PyTorch model.
--serialized-file
This is the actual model weights file. The extension will be .pt
--handler
You can use built-in handlers like base_handler, image_classifier, text_classifier, object_detector, or more from the list here. You can also write your own custom handler, as in this example.
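Putting these options together, an example invocation for the mnist model could look like the sketch below; mnist.py and mnist_cnn.pt are placeholder names for your own model definition and weights files:
# package the model definition, weights and handler into mnist.mar in ./model-store
torch-model-archiver \
  --model-name mnist \
  --version 1.0 \
  --model-file mnist.py \
  --serialized-file mnist_cnn.pt \
  --handler image_classifier \
  --export-path model-store -f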
TorchServe Config File
The default configuration of TorchServe can be overridden through a config file. In most cases, this will be necessary. For example, the default behaviour is to write metrics to the log file, but you may want to push them to Prometheus.
A sample config.properties file is shown below. It is the same config you may have used in the Quick Start section.
# Modifications to the following parameters are supported.
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
enable_envvars_config=true
install_py_dep_per_model=true
# Below are certain defaults that TIR uses. We recommend not to change these.
# inference_address=http://0.0.0.0:8080
# management_address=http://0.0.0.0:8081
# metrics_address=http://0.0.0.0:8082
# grpc_inference_port=7070
# grpc_management_port=7071
# enable_metrics_api=true
# model_store=/mnt/models/model-store
Note
You can learn more about the config params here: https://pytorch.org/serve/configuration.html#config-properties-file
Package and Push Model Updates
Now that we have covered how to prepare the config file and the model archive, we can package them together.
Create the directories as shown below:
mkdir my-model && mkdir -p my-model/config && mkdir -p my-model/model-store
Now move the config.properties file into the config folder:
cp /path/to/config.properties my-model/config/
Move the model archive into model-store:
cp /path/to/model.mar my-model/model-store
Push the contents of the my-model directory to the TIR Model. You can find the steps for this by locating your model in the TIR dashboard and following the instructions on the Setup tab (a sketch of the typical mc commands follows this list).
Create a new Inference Service, or restart an existing one if you just meant to update the model version.
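The push itself typically boils down to the same mc commands used in the Quick Start; the bucket name below is a placeholder for the one auto-created with your TIR model:
# from the my-model directory, copy config and model-store to the TIR model bucket
cd my-model
mc cp -r * my-mnist/<enter-bucket-name-here>

# verify the upload
mc ls my-mnist/<enter-bucket-name-here>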
Connecting to the Service Endpoint
TorchServe does not provide authentication for its endpoints, but TIR handles this for you.
All inference endpoints in TIR are secured with an auth token. You can create new API tokens from the API Tokens section in the TIR dashboard.
Once you have an API token, firing requests is easy. Each Inference Service is equipped with the following endpoints:
Checking Status of Endpoint
# This request returns status of the mnist endpoint
# we created in Quick Start section. Also note,
# the model name here matches exactly to the
# model name used in torch-model-archiver utility.
curl -v -H "Authorization: Bearer $AUTH_TOKEN" -X GET \
https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist
# response: {Active: true}
Predict Endpoint
The format of predict api:
Predict v1
verb: POST
Path: /v1/models/<modelname>:predict
Request payload: {"instances": []}
Response: {"predictions": []}
When submitting a prediction or inference request, the important thing to get right is the request format.
The request body for the predict API must be a JSON object formatted as follows:
{
"instances": <value>|<(nested)list>|<list-of-objects>
}
A sample request to the mnist endpoint is shown below. Here, the image is converted to base64 format.
curl -v -H "Authorization: Bearer $AUTH_TOKEN" \
  -X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
  -d '{
    "instances": [
      {
        "data": "iVBORw0KGgoAAAANSUhEUgAAABwAAAAcCAAAAABXZoBIAAAAw0lEQVR4nGNgGFggVVj4/y8Q2GOR83n+58/fP0DwcSqmpNN7oOTJw6f+/H2pjUU2JCSEk0EWqN0cl828e/FIxvz9/9cCh1zS5z9/G9mwyzl/+PNnKQ45nyNAr9ThMHQ/UG4tDofuB4bQIhz6fIBenMWJQ+7Vn7+zeLCbKXv6z59NOPQVgsIcW4QA9YFi6wNQLrKwsBebW/68DJ388Nun5XFocrqvIFH59+XhBAxThTfeB0r+vP/QHbuDCgr2JmOXoSsAAKK7bU3vISS4AAAAAElFTkSuQmCC",
        "target": 0
      }
    ]
  }'
Note
To run the above request, you will need to convert your image to base64 format. You can use OS utilities like base64 on macOS (e.g. base64 -i <in-file> -o <outfile>) or libraries available in your language of choice (e.g. the base64 module in Python).
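If you prefer to script this end to end from a shell, a minimal sketch looks like the following; digit.png is a placeholder for your own input image, and the endpoint URL is the same one used above:
# encode the image and wrap it in the "instances" payload expected by the predict API
B64=$(base64 < digit.png | tr -d '\n')

curl -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST https://infer.e2enetworks.net/project/<project>/endpoint/<inference>/v1/models/mnist:predict \
  -d "{\"instances\": [{\"data\": \"$B64\", \"target\": 0}]}"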
In the following example, images from a directory are batched together into a single prediction request.
import base64
import pathlib

# data_dir and endpoint are assumed to be defined earlier in your client code
data_directory = pathlib.Path(data_dir)
digit_files = list(data_directory.glob("mnist-images-directory/*"))
instances = []
for digit_file in digit_files:
    with open(digit_file, "rb") as f:
        instances.append({"data": base64.b64encode(f.read()).decode("utf-8")})
response = endpoint.predict(instances=instances)
Attention
If your application restricts you to a specific request and response format and cannot adhere to the format above, you may consider writing a custom container.
Monitoring
The TorchServe containers in TIR capture detailed logs and metrics.
Logging
You can view detailed logs of an inference service by selecting the endpoint in the Deployments section.
Metrics
By default, metrics are printed in the log. To view them in the TIR dashboard, we recommend adding the following line to the config.properties file:
metrics_format=prometheus
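With this setting, TorchServe exposes metrics in Prometheus text format on its metrics port (8082 in the default TIR config shown earlier). When running the server locally, for example, you can sanity-check the output as shown below; inside TIR you would normally rely on the dashboard instead:
# scrape the TorchServe metrics endpoint (Prometheus text format)
curl http://127.0.0.1:8082/metrics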
Advanced Use-cases
Extending built-in container
TIR does not restrict you to pre-built frameworks. You can write your own container and publish it on TIR. To see an example, read Custom Containers in TIR
Large Models and Multi-GPUs
TIR supports the full functionality of TorchServe, including multi-GPU training and deployments. You can find multi-GPU development guidelines for PyTorch here.
Examples
For more samples on using TorchServe, visit the official TorchServe repo.