Hi, I want to deploy a fine-tuned LLM using a serverless container. Since the model is quite large (>100 GB), it's not practical to download it before each request.
I have also noticed that sometimes a new request spins up a new container, which downloads the image again. How can I make sure that my image and its model weights are downloaded only once?
