Setting Up the Workloads
Step 5: Setting Up the Docker Containers for the Triton Server and Client
In this step, we will set up Docker containers for the Triton server and client and configure the example model repository.
1. Clone the Triton Inference Server Repository
Clone the server repository to get the example model repository.
git clone -b r25.08 https://github.com/triton-inference-server/server.git
cd server/docs/examples
2. Change the Fetch Model Link
To fetch the example models, you'll need to update the model download link used by fetch_models.sh. You can do this by following the changes suggested in the PR linked here.
After making the necessary changes, run the script to fetch the models:
./fetch_models.sh
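If the script completes successfully, the downloaded models are placed in a model_repository directory next to the script (assuming the default layout of the example scripts). A quick sanity check:
ls model_repository
You should see one subdirectory per example model.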
3. Run Triton Server in Docker
Run the Triton server in a Docker container. Ensure that your model repository path is correctly mapped.
docker run -it --net=host --pid=host --name=triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models
This will start the Triton server with the models available in the model_repository.
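If the host has NVIDIA GPUs and the NVIDIA Container Toolkit installed, you can expose them to the server by adding a --gpus flag; the following is a sketch of the same command with GPU access (adjust the container name, tag, and paths to your setup):
docker run -it --gpus=all --net=host --pid=host --name=triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models
Without a GPU, the example models can typically still run on the CPU.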
4. Verify Triton Server is Running
Once startup completes, you should see logs in the terminal indicating that the server started successfully and that the models from the repository were loaded.
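From another terminal, you can also confirm readiness through Triton's HTTP health endpoint (this assumes the default HTTP port 8000, which is reachable on localhost here because the container uses --net=host):
curl -v localhost:8000/v2/health/ready
A 200 OK response means the server is ready to accept inference requests.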
5. Setting Up the Triton Inference Client
Pull the Triton Client Docker Image. To install the Triton client, pull the Docker image for the Triton SDK:
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
Replace <xx.yy> with the Triton version you want to use; it should match the version of the server container you started above.
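As a quick smoke test, you can start the client container and run the bundled image_client example against the server; this is a sketch that assumes the densenet_onnx example model was fetched in step 2 and that the server is using its default ports:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
The second command runs inside the client container and should print the top three classifications for the sample image.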
Once these steps are complete, the Triton server should be running and the Triton client image will be ready to use for interacting with the server.