
How I run local LLMs with Ollama in the CLI and Emacs

Published on Oct 13, 2024

Running an offline local LLM has never been this easy. As someone who resisted the hype over the past two years, I finally caved in, and I now find myself enjoying downloading and running different models on my machine, all without leaving the comfort of the CLI or Emacs.

In this post, based on my own setup, I go through the steps you can follow to use these LLMs as a private alternative to the paid services of OpenAI and the like.

Disclaimer: The experience you will have running local LLMs varies significantly with your hardware. That level of detail is outside the scope of this post. As a rule of thumb, for text-to-text generation, computers with more than 16GB of RAM should suffice. For text-to-image, or vice versa, having a GPU is advised.

Ollama

If you want to run local LLMs, the first thing you need to do is choose one or more models. There are several ways to do that. The one I found easiest was Ollama, a piece of software that runs on your computer and allows you to manage, customize, and run a range of models.

Once downloaded, Ollama's executable starts a local server on http://localhost:11434. This opens the door to the many services that can subsequently leverage that address.
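A quick way to check that the server is actually up is to hit it directly with curl; this sketch assumes Ollama's default port and uses the /api/tags endpoint, which lists the models installed locally:

curl http://localhost:11434           # should reply that Ollama is running
curl http://localhost:11434/api/tags  # JSON list of locally installed models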

Choosing your model(s)

Ollama itself doesn't bundle any models out of the box; it's up to you to choose. If you want a chat-like alternative to ChatGPT, I suggest you start with Meta's recently released Llama 3.2. You can do that by running the following in your terminal:

ollama --version # make sure ollama is installed; my version is 0.4.4
ollama pull llama3.2

As you can see, you're not in offline territory yet, since this command downloads a roughly 2GB model to your computer. That size is honestly impressive for an LLM; I wonder whether, three years ago, anybody would have believed so much information could be compressed into such a small space.

Once it's downloaded, you can turn off your internet connection and, right inside your command line, lo and behold: the power of a local LLM.

ollama run llama3.2

>>> what is the capital of Brazil?
The capital of Brazil is Brasília. It was officially declared the capital in 1960 and has been the country's seat of government ever since.
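By the way, ollama run also accepts a prompt as an argument, so you can ask one-off questions without entering the interactive prompt (the question below just repeats the example above):

ollama run llama3.2 "what is the capital of Brazil?"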

Once you're done playing, you might want to look for another model for other purposes. As a developer, I was hyped to try qwen2.5-coder, a model that has achieved impressive benchmark results on coding challenges.
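Pulling it works exactly like before; the sketch below assumes the default tag Ollama resolves for the model (you can also pin a specific size, such as qwen2.5-coder:7b):

ollama pull qwen2.5-coder
ollama run qwen2.5-coder "Write a Python function that reverses a string"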

Inside the CLI

For most people, Ollama's run might already be a good-enough option for running a local LLM from the command line. I, however, prefer to use Simon Willison's LLM tool, since it provides powerful features like templates and embeddings, which may be content for another post. Simon's LLM has out-of-the-box support for ChatGPT, but you can integrate it with Ollama through the llm-ollama plugin. A more step-by-step guide can be found in this blog post.
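The setup boils down to installing the CLI and the plugin; the sketch below assumes you install via pip (pipx works too), and that the Ollama server from earlier is running:

pip install llm          # Simon Willison's LLM CLI
llm install llm-ollama   # plugin that exposes Ollama's local models to llm
llm models               # the Ollama models should now appear in this list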

Essentially, LLM (the CLI tool) allows for a more conventional experience than the REPL-like one provided by Ollama.

$ llm -m llama3.2 "what is a Roland 909?"
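If you do want an ongoing conversation rather than one-off prompts, the tool also has a chat mode (assuming a reasonably recent version of llm):

$ llm chat -m llama3.2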

I also like to pipe values to feed the prompt:

$ echo "preposterous" | llm -m llama3.2 "How may letters are in $input?"

The word "preposterous" has 12 letters: p-r-e-p-o-s-t-e-r-o-u-s.

More examples here.

Inside Emacs

Having access to a local LLM from the CLI is nice, but it wouldn't be complete without it also being accessible inside Emacs. The exciting news came this week, when the, now unfortunately named, chatgpt-shell announced support for multiple models. The update not only included support for Anthropic's Claude and Google's Gemini, but also for some locally installed Ollama models.

By running the command M-x chatgpt-shell you land in a special buffer that allows you to chat with the LLM. To switch between models you can press C-c C-v or execute M-x chatgpt-shell-swap-model.
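If you don't have the package yet, a minimal installation sketch, assuming MELPA is already in your package-archives, looks like this:

(use-package chatgpt-shell
  ;; install chatgpt-shell from MELPA if it isn't present yet
  :ensure t)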

The package proves especially useful when I want to copy code back and forth between chatgpt-shell's buffer and the one I am coding in. The other benefit is that the package displays Markdown correctly, including code blocks.

Another interesting feature of the package is its support for system prompts, which allows you to steer responses in a given direction and define different personas for your LLM models. You can inspect the existing ones with C-h v chatgpt-shell-system-prompts, and new ones can be added by appending to the variable with:

(add-to-list
   'chatgpt-shell-system-prompts
   (cons "Machine Learning Tutor"
         (chatgpt-shell--append-system-info "You are a Machine Learning Tutor AI, dedicated to guiding senior software engineers in their journey to become proficient machine learning engineers. Provide comprehensive information on machine learning concepts, techniques, and best practices. Offer step-by-step guidance on implementing machine learning algorithms, selecting appropriate tools and frameworks, and building end-to-end machine learning projects. Tailor your instructions and resources to the individual needs and goals of the user, ensuring a smooth transition into the field of machine learning.")))

The example above was taken from the ChatGPT-System-Prompts repository.

To swap the system prompt inside a chatgpt-shell buffer, run M-x chatgpt-shell-swap-system-prompt or press C-c C-s to see a list of the available ones. You can see in the screenshot below how the LLM suggests different books depending on the active system prompt.

Closing thoughts

Offline local LLMs offer a privacy-oriented alternative to paid services and democratize the usage of AI tools with impressive performance outside the walled gardens of Big Tech. However, they are not yet on par with their state-of-the-art counterparts, nor are they devoid of issues such as hallucinations or biases. Inherent problems of training large models, such as energy consumption, are also still present. Use your moral compass to judge how to make the best use of them. Enjoy!