How to Run AI Models Offline on Linux

Part 1: How I Set Up My Local LLM (Large Language Model)

I am a tech enthusiast who loves getting my hands dirty with new tools and ideas. I work as an IT support engineer and am now starting from the fundamentals of AI, working my way through LLMs, and eventually diving into things like n8n workflows or anything else I pick up along the way. I write here to share my work, learnings, and experiences so that it helps anyone who is just dipping their toes into this vast ocean. And honestly, writing it all down helps me just as much!

If you’ve procrastinated on running AI models offline and exploring them, I’m in the same boat. I planned to run models locally, even bought a new laptop for it, but five months later my GPU still isn’t doing any meaningful work. So, here I am writing about my current setup and my experiences building a small lab. This is just the beginning — moving forward I plan to create workflows that make my life and work easier.

Although I have many learnings and experiences to share, I’ll focus here on the tools I used for my lab setup.

My laptop's core specifications:

  • Model: Lenovo LOQ 15IRX10

  • Processor: Intel® Core™ i7-13650HX

  • RAM: 24 GB

  • Graphics card: NVIDIA GeForce RTX 5060 (8 GB)

  • Operating system: Pop!_OS 24.04 LTS

Why Pop!_OS:

When I decided to set up my AI lab, I wanted an isolated environment to protect sensitive data on my main OS. So I installed Pop!_OS on an external SSD and boot into it whenever I need the lab. I chose Pop!_OS because:

  • It works well with graphics cards. There is a separate ISO that comes with the NVIDIA drivers preinstalled, so you just need to download the ISO that matches your graphics card.

  • It's based on Ubuntu, which in turn derives from Debian, so it uses the same package management and software repositories I'm familiar with.

  • And of course, Linux is light on background resource usage, which leaves more memory and compute for the LLM itself.
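
After the first boot, a quick way to confirm that the NVIDIA driver is actually loaded is to ask nvidia-smi, the utility that is installed alongside the driver. Below is a minimal sketch, assuming nvidia-smi is on your PATH when the driver is present. The function name gpu_summary is just an illustrative helper, not something Pop!_OS provides.

```shell
#!/bin/sh
# Print GPU name, total VRAM, and driver version, or a hint if the
# NVIDIA driver tools are missing.
# gpu_summary is a hypothetical helper name used for illustration.
gpu_summary() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the NVIDIA driver may not be installed"
        return 1
    fi
    # Ask only for the fields we care about, as one CSV line per GPU.
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}

# Run the check; "|| true" keeps the script going if no driver is found.
gpu_summary || true
```

If the driver is working, you should see one line per GPU with its name, total memory, and driver version.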

Why Ollama:

Although you can use LM Studio, Node.js, or even Docker to run an LLM, I chose Ollama because n8n has a dedicated built-in Ollama node. I plan to work on n8n workflows in the future, and this will make that work easier and more efficient.
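
What makes this kind of integration possible is that Ollama exposes a local HTTP API, by default on port 11434, which other tools can call. Here is a rough sketch of calling it with curl, assuming Ollama is running locally with its default settings. The helper names build_payload and ask_ollama are my own, and the JSON is built naively: a prompt containing double quotes or backslashes would need proper escaping.

```shell
#!/bin/sh
# Default address of the local Ollama server; override if yours differs.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for the /api/generate endpoint.
# Naive on purpose: it does not escape quotes inside the prompt.
# build_payload is a hypothetical helper name.
build_payload() {
    model=$1
    prompt=$2
    printf '{"model":"%s","prompt":"%s","stream":false}' "$model" "$prompt"
}

# Send one prompt to the model and print the raw JSON reply.
# ask_ollama is a hypothetical helper name.
ask_ollama() {
    curl -fsS "$OLLAMA_URL/api/generate" \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1" "$2")"
}

# Example (needs a running Ollama server and an already pulled model):
# ask_ollama gemma4:e4b "Why is the sky blue?"
```

The reply is a JSON object, and the generated text sits in its response field.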

Installing Ollama and running your first LLM:

Now, let's get into action: install Ollama, pull a model, and run it. Some of the commands below use sudo, so you need to enter your user password whenever prompted. You can either enter it each time you're asked or run sudo su first to avoid repeated prompts, but keep in mind that everything you run in that shell then has root privileges.

  1. Open your terminal and run the command below to install Ollama, or to update an existing installation:

    curl -fsSL https://ollama.com/install.sh | sh

    curl downloads the install script from the given URL, and the pipe (|) passes it to your system's shell (sh), which executes it on your computer.

    Note: If the above command fails or the URL is broken, refer to the official Ollama download page for the latest command.

  2. Run systemctl start ollama to start the Ollama service. You can check its status by running systemctl status ollama.

  3. Once Ollama is active and running, you can download any model by running ollama pull <modelname>. The ollama run command runs a downloaded model, and it will also download the model first if it's not already present on your computer.

    I'm gonna use Google's gemma4, a 4 billion parameter model, for now. Keep in mind that models with more parameters generally give better results, but they also need more memory and compute. Choose a model based on your computer's specifications and your goals. I recommend models with fewer than 12 billion parameters unless you have a high-end machine.

    You can browse the Ollama Library to choose a model and find its "model tag" for installation.

    I used the command below to download and run gemma4 (4 billion parameters):

    ollama run gemma4:e4b

  4. Gemma4 now runs in your terminal, and you can query it as you like and adjust its parameters. Below is a screenshot of my query.

    Pro Tip: If you want to check whether the model is running on the CPU or the GPU, use the following command:
    ollama ps

    I am happy that the model is running entirely on my GPU. If the model doesn't fit in VRAM, part of the load shifts to the CPU and system RAM, which makes responses slower.
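
Steps 2 to 4 above can also be strung together in a small script. This is only a sketch, assuming systemd manages the ollama service and using the same gemma4:e4b tag as above. The function names ollama_active, model_present, and main are hypothetical helpers I made up for illustration.

```shell
#!/bin/sh
# Model tag to use; swap in any tag from the Ollama Library.
MODEL="${MODEL:-gemma4:e4b}"

# Succeeds if the ollama systemd service is currently active.
ollama_active() {
    systemctl is-active --quiet ollama
}

# Succeeds if the given tag shows up in the NAME column of `ollama list`.
model_present() {
    ollama list | awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

main() {
    # Step 2: start the service only if it is not already running.
    ollama_active || sudo systemctl start ollama
    # Step 3: download the model only if it is missing.
    model_present "$MODEL" || ollama pull "$MODEL"
    # Step 4: send a single prompt instead of opening the interactive chat.
    ollama run "$MODEL" "Say hello in one sentence."
    # Pro tip: show whether the model sits on the GPU, the CPU, or both.
    ollama ps
}

# Uncomment the next line to run the whole sequence:
# main
```

Passing the prompt as an argument to ollama run makes it answer once and exit, which is handy in scripts. Leave the prompt out to get the interactive session shown earlier.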

Something's not right yet:

I understand the result may not be what you expected. It didn't match my vision either. I would like a clean interface, the ability to switch seamlessly between multiple downloaded models, and an option to upload images and documents easily without constantly finding and typing file paths. This is exactly what I will be working on next.

Building a Local AI Lab

Part 1 of 2

Follow along as I transform my idle laptop into a fully functional offline AI lab. In this series, I document my journey from procrastinating to running powerful Large Language Models (LLMs) completely locally. You will learn how to set up a dedicated Linux environment (Pop!_OS), install Ollama, run models like Gemma directly from the terminal, and eventually build a sleek, user-friendly Web GUI to manage your models seamlessly. This is the perfect guide for developers and enthusiasts who want to protect their data, escape cloud subscription costs, and put their GPUs to work.

Up next

How to Add a GUI to Your Local LLM on Linux (Ollama + Open WebUI on Docker)

Part 2: Step-by-step setup for Docker, Open WebUI, and connecting local Ollama models on Debian Linux
