Deploying Open-Source LLMs As APIs

Open-source LLMs are all the rage, along with concerns about data privacy with closed-source LLM APIs. This tutorial goes through how to deploy your own open-source LLM API using Hugging Face + AWS.

7 min read · Jul 9, 2023

Super intelligent AI llama prompted by author, generated using Leonardo.AI

While ChatGPT and GPT-4 have taken the world of AI by storm in the last half year, open-source models are catching up, slowly but surely. There has been a lot of ground to cover to match the performance of OpenAI's models. In many cases, ChatGPT and GPT-4 are clear winners due to their quality and competitive pricing.

But open-source models will always have value over closed APIs like ChatGPT/GPT-4 for certain business cases. I have spoken with folks in industries like legal, healthcare, and finance who have concerns over data and customer privacy. These companies would rather spend thousands of dollars a month (or more) to run open-source models on their own cloud instances (think AWS, Google Cloud, Azure) than send data through OpenAI APIs that are used by everyone. These folks understand that right now, open-source LLMs might not perform as well as ChatGPT/GPT-4, and may end up being 10X more expensive due to the costs involved in training, deploying, and hosting models with tens or hundreds of billions of parameters. But they…

Written by Skanda Vivek