
Unlocking Coding Magic: Setting Up Your Own Local LLM in VS Code

As developers, we’re always on the lookout for tools that boost our productivity. The recent rise of tools powered by Large Language Models (LLMs), like GitHub Copilot, has been nothing short of magical. But what if you’re not ready to hop on the Copilot train? Maybe your budget doesn’t allow for the monthly subscription, or perhaps your company policies frown upon third-party services. Fear not! In this article, we’ll explore an alternative: setting up your very own local LLM, DeepSeek Coder, right within Visual Studio Code.


Why Go Local?

Before we dive into the setup, let’s discuss why a local LLM might be the right choice for you:


Cost-Effective: No subscription fees here! By hosting your LLM locally, you’ll save those hard-earned dollars for your favorite caffeinated beverage.


Compliance: Some organizations have strict policies regarding external services. A self-hosted solution keeps you compliant and keeps sensitive data on your own hardware, addressing privacy concerns without compromising functionality.


Customization: Want a different model? A local setup lets you swap in whichever model suits your needs, and new and better models are coming out all the time!


VS Code Integration

Let’s dive into setting up the Continue Extension for Visual Studio Code (VS Code). Installing the extension is relatively straightforward. You can find out more about Continue by visiting their website: https://continue.dev/


Install the Continue Extension
  1. Open Visual Studio Code.

  2. Navigate to the Extensions Marketplace by clicking on the Extensions icon in the Activity Bar on the side of VS Code.

  3. In the search box, type “Continue” to filter the extensions.

  4. Look for the “Continue” extension and click the Install button.

  5. Once installed, you’ll see the Continue logo in the left sidebar.

  6. If you don’t see it on the sidebar, you can press Ctrl+Alt+M to open the window. 
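
If you prefer the terminal, you can also install the extension with the VS Code CLI. The extension ID below is what the Marketplace listed at the time of writing; double-check it on the extension’s Marketplace page:

code --install-extension Continue.continue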


Next, we need to host a local LLM using a provider, and for this tutorial, we’ll use llama.cpp.


Setting Up llama.cpp

Continue integrates with a number of providers, so if llama.cpp isn’t your first choice, you can find other options here: https://continue.dev/docs/model-setup/select-provider


For our purposes, llama.cpp is a simple tool and easy to set up. Before we proceed, let’s understand what llama.cpp is all about. 


Quick note: the following was tested in Ubuntu, but you should be able to follow similar steps in your environment. Just make sure you have enough RAM! 


What is llama.cpp?

Simply stated, llama.cpp will host our model and serve it up to Continue. It’s a powerful open-source project that implements efficient inference for Meta’s LLaMA architecture (and many other models) in C/C++. Here are some key points about llama.cpp:

  1. Universal Compatibility: llama.cpp is designed as a CPU-first C/C++ library, making it easy to run in a variety of environments, whether you’re on a PC, a Mac, or another platform.

  2. LLM Hosting: llama.cpp serves as our model host. It provides a server endpoint that we’ll seamlessly integrate with Visual Studio Code.

Now, let’s roll up our sleeves and set up our llama.cpp server. Execute the following steps in your terminal.


Setup
  1. Clone the llama.cpp repository:

git clone https://github.com/ggerganov/llama.cpp

  2. Build the server:

Navigate to the cloned folder: cd llama.cpp

Execute the following command to build the llama.cpp server: make

  3. Install the Python dependencies:

python3 -m pip install -r requirements.txt
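
As a quick sanity check that the build succeeded, you can ask the freshly built server binary for its help text (binary names and flags occasionally change between llama.cpp versions, so consult the repository README if this doesn’t match your checkout):

./server --help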


And there you have it! Your very own LLM server, ready to assist you in your coding endeavors. Now, let’s go pick a model!


Choosing Your Model

When it comes to LLMs, variety is the spice of code. The Continue model selection page (https://continue.dev/docs/model-setup/select-model) provides a delightful array of LLMs. Take your pick! These models range from compact to substantial, each offering a unique balance of accuracy and memory footprint.


Our chosen model for this journey is the DeepSeek Coder 7B Instruct GGUF model (https://huggingface.co/LoneStriker/deepseek-coder-7b-instruct-v1.5-GGUF/tree/main). At 7 billion parameters it isn’t as powerful as the 33 billion parameter version, but it strikes a good balance of size and performance.


Memory Considerations 

Before proceeding, let’s discuss memory requirements. LLMs come in various sizes, measured in billions of parameters: 3B, 7B, 13B, 20B, 30B, 33B, 34B, 65B, 70B, and so on. Generally, the larger the model, the smarter it becomes. For instance:

  • Imagine the 70B model as a well-read companion who engages in thoughtful discussions, referencing classic literature and sharing profound insights. The 3B model, on the other hand, resembles a chatty parrot: it repeats phrases but lacks depth.

Now, let’s talk about the actual memory footprint. A full-precision (16-bit) model occupies 2 bytes per parameter, so a 3B model would theoretically require about 6GB of RAM. However, thanks to quantization, you can obtain a “compressed” version of the model that needs far less. Choose wisely based on your machine’s capabilities.


Think of quantization as compressing the model into a more manageable size. Quantized models come in flavors: q2, q3, q4, q5, q6, and q8. The smaller the number, the more compact the model, at the cost of some precision. For instance, a 34B q3 model is only about 17GB, quite the diet compared to the full 68GB.

Remember the rule of thumb: you are generally better off running a small quantization of a bigger model than a large quantization of a smaller model. A 34B q3 is going to be smarter and better than a 13B q8.
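
To make that concrete for the model we’ll use below (these are rough figures; check the actual file sizes listed on the model’s Hugging Face page):

  • 7B at full 16-bit precision: 7 billion parameters × 2 bytes ≈ 14GB

  • 7B at q8 (roughly one byte per parameter): ≈ 7GB

  • 7B at q4 (roughly half a byte per parameter): ≈ 4GB

Whatever you choose, leave a few gigabytes of headroom on top of the file size for the context window and the rest of your system.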


Saving the Model 

Once you’ve downloaded your model, save it into the llama.cpp/models folder. 
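
For example, to pull the Q8_0 file referenced in the server command below straight from the Hugging Face repository linked above (swap in a different file name if you chose another quantization, and check the repo’s Files tab if the download fails):

cd llama.cpp
wget -P models/ https://huggingface.co/LoneStriker/deepseek-coder-7b-instruct-v1.5-GGUF/resolve/main/deepseek-coder-7b-instruct-v1.5-Q8_0.gguf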


Now, armed with knowledge, let’s continue our quest to integrate this magical LLM into Visual Studio Code and set up the Continue Extension with llama.cpp.


Before we go back into VS Code, let’s start our llama.cpp server. Go back to the terminal and run the following to get the server started. 


./server -c 4096 --host 0.0.0.0 -t 16 --mlock -m models/deepseek-coder-7b-instruct-v1.5-Q8_0.gguf


Make sure to update the model path to point at the file you downloaded.
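
Once the server is running, you can sanity-check it from another terminal with a quick completion request (the llama.cpp server listens on port 8080 by default):

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "// a JavaScript function that reverses a string\n", "n_predict": 64}'

If JSON with generated text comes back, the model is loaded and ready for Continue.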


Configure Continue

We’ve installed the Continue extension, chosen a model, and picked llama.cpp as our provider. Next, we need to tie all of that together so we can start coding! Let’s hop back into VS Code.


Configure the Llama Provider

  1. Open Continue by pressing Ctrl+Alt+M

  2. At the bottom of the extension, add a provider by selecting the ‘+’ next to the model dropdown. 



  3. Select the llama.cpp provider.


  4. Select the DeepSeek Coder model.


  5. Once you’ve finished the configuration, select the DeepSeek option in the model dropdown.
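
If you prefer editing configuration by hand, Continue also stores its settings in a JSON config file (typically ~/.continue/config.json, reachable from the gear icon in the Continue panel). The exact fields can vary between Continue versions, but an entry for our llama.cpp setup would look something like this:

{
  "models": [
    {
      "title": "DeepSeek Coder 7B (llama.cpp)",
      "provider": "llama.cpp",
      "model": "deepseek-coder-7b-instruct",
      "apiBase": "http://localhost:8080"
    }
  ]
}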



Getting Started

That’s all there is to it! That wasn’t so painful, and now you have a locally hosted LLM for all of your coding needs. You can play around with different models and providers to find the tools that work best for you.


One thing to note about the models used by Continue: you can chat with them like ChatGPT! Try asking questions and see if the model can come up with solutions for you.



Conclusion

Congratulations! You’ve just set up your own local LLM. Now, as you code away in VS Code, remember that you’ve got a coding buddy right there. Happy coding!
