Hi, I want to deploy a fine-tuned LLM using a serverless container. Since the model is quite large (>100 GB), it's not practical to download it before each request.
I have also noticed that sometimes a new request spins up a new container, which downloads the image again. How can I make sure that my image and its model weights are downloaded only once?
