Setting Up the Workloads
Step 5: Setting Up the Docker Containers for the Triton Server and Client
In this step, we will set up Docker containers for the Triton server and client and configure the example model repository.
1. Clone the Triton Inference Server Repository
Clone the server repository to get the example model repository.
git clone -b r25.08 https://github.com/triton-inference-server/server.git
cd server/docs/examples
2. Change the Fetch Model Link
To fetch the example models, you'll need to update the model download link used by fetch_models.sh. You can do this by following the changes suggested in the PR linked here.
After making the necessary changes, run the script to fetch the models:
./fetch_models.sh
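If the script completes successfully, the downloaded models are placed in a model_repository directory next to the script (assuming the default layout of the example scripts). A quick sanity check:
ls model_repository
You should see one subdirectory per example model.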
3. Run Triton Server in Docker
Run the Triton server in a Docker container. Ensure that your model repository path is correctly mapped.
docker run -it --net=host --pid=host --name=triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models
This will start the Triton server with the models available in the model_repository.
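If the host has NVIDIA GPUs and the NVIDIA Container Toolkit installed, you can expose them to the server by adding a --gpus flag; the following is a sketch of the same command with GPU access (adjust the container name, tag, and paths to your setup):
docker run -it --gpus=all --net=host --pid=host --name=triton-server -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:25.08-py3 tritonserver --model-repository=/models
Without a GPU, the example models can typically still run on the CPU.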
4. Verify Triton Server is Running
Once startup completes, you should see logs in the terminal indicating that the server started successfully and that the models from the repository were loaded.
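From another terminal, you can also confirm readiness through Triton's HTTP health endpoint (this assumes the default HTTP port 8000, which is reachable on localhost here because the container uses --net=host):
curl -v localhost:8000/v2/health/ready
A 200 OK response means the server is ready to accept inference requests.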
5. Setting Up the Triton Inference Client
Pull the Triton Client Docker Image. To install the Triton client, pull the Docker image for the Triton SDK:
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
Replace <xx.yy> with the Triton version you want to use; it should match the version of the server container you started above.
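As a quick smoke test, you can start the client container and run the bundled image_client example against the server; this is a sketch that assumes the densenet_onnx example model was fetched in step 2 and that the server is using its default ports:
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
The second command runs inside the client container and should print the top three classifications for the sample image.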
Once these steps are complete, the Triton server should be running and the Triton client image will be ready to use for interacting with the server.