How can I run Ollama in a serverless container on DataCrunch's cloud?
- Create a DataCrunch account and top it up
- Go to the "Serverless containers" page and press "New deployment"
- Choose a GPU. Make sure it has enough VRAM for the model you plan to run in Ollama; an H100 or H200 is generally a good choice to start experimenting with.
- Set the container image to: `docker.io/ollama/ollama:0.11.5`
- Set the exposed HTTP port to: `11434`
- Set the healthcheck path to: `/api/tags`
- Now update your deployment and wait until it reaches the running state. Make sure "Start command" is off. To pull a model to your Ollama deployment, run the following bash script:
```bash
#!/bin/bash
PAYLOAD='{
  "name": "llama3.1:8b"
}'
curl -X POST "https://containers.datacrunch.io/{name of your deployment}/api/pull" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```
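A note on what the pull call returns: `/api/pull` streams back newline-delimited JSON status objects, ending with a success status once the model is ready. A minimal sketch of checking the final status from a captured response (the sample lines below are illustrative; real output also includes layer-download progress lines):

```bash
#!/bin/bash
# Sample of the newline-delimited JSON that /api/pull streams back
# (illustrative; a real response also contains download-progress lines):
RESPONSE='{"status":"pulling manifest"}
{"status":"verifying sha256 digest"}
{"status":"success"}'

# The last status line tells you whether the pull completed:
FINAL=$(printf '%s\n' "$RESPONSE" | tail -n 1)
echo "$FINAL"
```

If the final line is not a success status, re-run the pull before attempting inference.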
Now test inference with your Ollama deployment:
```bash
#!/bin/bash
PAYLOAD='{
  "model": "llama3.1:8b",
  "messages": [
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": false
}'
curl -X POST "https://containers.datacrunch.io/{name of your deployment}/api/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```
Remember to replace `$TOKEN` with your DataCrunch API token and `{name of your deployment}` with the actual name of your deployment.
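Since the chat request sets `"stream": false`, `/api/chat` responds with a single JSON object and the reply text sits under `message.content`. A minimal sketch of extracting it, using an abridged sample response (the content shown is illustrative, not an actual model output):

```bash
#!/bin/bash
# With "stream": false, /api/chat returns one JSON object whose reply text
# lives under message.content. Abridged, illustrative sample response:
RESPONSE='{"model":"llama3.1:8b","message":{"role":"assistant","content":"Deep learning is a subset of machine learning."},"done":true}'

# Pull out the assistant reply with python3 (jq works just as well):
REPLY=$(printf '%s' "$RESPONSE" \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["message"]["content"])')
echo "$REPLY"
```

With `"stream": true` (the default) you would instead get newline-delimited JSON chunks and would need to concatenate each chunk's `message.content`.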