I recently wrote an article about deploying LLMs as APIs using AWS, and someone commented that they would like a similar article using Azure ML instead. So I decided to write this article discussing how to deploy LLM endpoints on Azure ML. I also compare the price, ease, and customizability of LLM deployments on Azure ML versus AWS.
Microsoft has formed a string of successful partnerships in the Generative AI era, including with OpenAI and GitHub Copilot. Recently, Microsoft CEO Satya Nadella gave a talk at Microsoft Inspire, laying out a vision of Generative AI as a more natural interface to virtually every application and as a reasoning engine that works on top of all your data, giving users more power.
Microsoft has also partnered with the open-source platform Hugging Face to make many of its models accessible on Microsoft's Azure cloud platform. The catalog also includes the recently released Llama 2 model family, which has been the talk of the town over the past week. In this blog, I go through how to deploy open-source LLMs as endpoints on Microsoft Azure ML and how to call these models through an API.
First of all, you need to create a workspace in Azure ML. Once you do that, you can visit the model catalog, which offers an impressive list of models. As mentioned above, I was surprised to see Llama 2 there, given how recently it was released.
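Connecting to the workspace can also be scripted rather than done through the studio UI. The sketch below assumes the Azure ML Python SDK v2 (`azure-ai-ml`) and `azure-identity` are installed and that you are already authenticated (for example via `az login`); the subscription, resource group, and workspace names are placeholders you would fill in yourself.

```python
def get_ml_client(subscription_id, resource_group, workspace_name):
    """Return an MLClient bound to an existing Azure ML workspace.

    Assumes `pip install azure-ai-ml azure-identity` and an authenticated
    session (e.g. `az login`). Imports are deferred so this function can be
    defined even where the SDK is not installed.
    """
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=subscription_id,      # placeholder values
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )
```

With a client in hand you can manage workspace assets programmatically, though the curated model catalog itself is easiest to browse in the studio UI.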
Next, you can click on a model of your choice and deploy it. Unfortunately, I was not able to deploy any billion-parameter LLMs, as I did not have enough GPU quota and would need to request a quota increase.
However, I was able to deploy a RoBERTa language model fine-tuned for question answering (Q&A).
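Once the endpoint is up, Azure ML studio's "Consume" tab shows a REST scoring URL and an API key. The sketch below, using only the Python standard library, shows one way to call such a Q&A endpoint. The URL, key, and request schema here are assumptions for illustration: extractive Q&A models commonly accept a question/context pair, but you should check the model card and the endpoint's test tab for the exact payload your deployment expects.

```python
import json
import urllib.request

# Placeholders -- replace with the values from your endpoint's "Consume" tab.
ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"


def build_qa_payload(question, context):
    """Build a JSON body for an extractive Q&A model (assumed schema)."""
    body = {"inputs": {"question": question, "context": context}}
    return json.dumps(body).encode("utf-8")


def score(question, context):
    """POST a question/context pair to the deployed endpoint."""
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=build_qa_payload(question, context),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `score("Who released Llama 2?", "Meta released Llama 2 in July 2023.")` would then return the model's extracted answer span, assuming the endpoint follows this schema.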