Run Ollama on the cloud server

How can I run Ollama in a serverless container on DataCrunch’s cloud?

  1. Create a DataCrunch account and top it up.
  2. Go to the “Serverless containers” page and press “New deployment”.

In the new deployment:

  1. Choose a GPU. Make sure it has enough VRAM for the model you plan to run on Ollama; an H100 or H200 is generally a good choice to start experimenting with.
  2. Set the container image to docker.io/ollama/ollama:0.11.5, the exposed HTTP port to 11434, and the healthcheck path to /api/tags.
  3. Now update your deployment and wait until it reaches the running state. Make sure “Start command” is off. To pull a model to your Ollama deployment, run the following script (bash):

```bash
#!/bin/bash
PAYLOAD='{
  "name": "llama3.1:8b"
}'
curl -X POST "https://containers.datacrunch.io/{name of your deployment}/api/pull" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```
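Pulling a large model can take a while. To check which models have finished downloading, you can query the same /api/tags endpoint used for the healthcheck. A minimal sketch, where `my-ollama` is a placeholder deployment name:

```bash
#!/bin/bash
# Placeholder deployment name -- replace with your own.
DEPLOYMENT="my-ollama"
BASE_URL="https://containers.datacrunch.io/${DEPLOYMENT}"

# /api/tags lists the models currently pulled to this Ollama instance.
curl -s --max-time 30 \
  -H "Authorization: Bearer $TOKEN" \
  "${BASE_URL}/api/tags"
```

If the model you pulled appears in the returned list, it is ready for inference.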

Now test inference with your Ollama deployment:

```bash
#!/bin/bash
PAYLOAD='{
  "model": "llama3.1:8b",
  "messages": [
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": false
}'
curl -X POST "https://containers.datacrunch.io/{name of your deployment}/api/chat" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```
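Ollama also exposes an OpenAI-compatible API under /v1, so clients written against the OpenAI chat format can point at the deployment without changes. A sketch of the same request against that endpoint, again with `my-ollama` as a placeholder deployment name:

```bash
#!/bin/bash
# Placeholder deployment name -- replace with your own.
DEPLOYMENT="my-ollama"

PAYLOAD='{
  "model": "llama3.1:8b",
  "messages": [
    {"role": "user", "content": "What is deep learning?"}
  ]
}'

# Ollama serves an OpenAI-compatible chat endpoint at /v1/chat/completions.
curl -s --max-time 30 -X POST \
  "https://containers.datacrunch.io/${DEPLOYMENT}/v1/chat/completions" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD"
```

The response follows the OpenAI chat-completion schema rather than Ollama's native /api/chat format.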

Remember to replace $TOKEN with your DataCrunch API token and {name of your deployment} with the actual name of your deployment.
