I'm considering building a fine-tuning and inference platform for Llama that would let customers host their fine-tuned models. Would each fine-tuned model need dedicated infrastructure, or could multiple models share the same infrastructure? Are there existing solutions for this?