How to Run AI Models Offline on Linux
Part 1: How I Set Up My Local LLM (Large Language Model)

If you’ve procrastinated on running AI models offline and exploring them, I’m in the same boat. I planned to run models locally, even bought a new laptop for it, but five months later my GPU still isn’t doing any meaningful work. So, here I am writing about my current setup and my experiences building a small lab. This is just the beginning — moving forward I plan to create workflows that make my life and work easier.
Although I have many learnings and experiences to share, I’ll focus here on the tools I used for my lab setup.
My laptop's core specifications:
Model: Lenovo LOQ 15IRX10
Processor: Intel® Core™ i7-13650HX
RAM: 24 GB
Graphics card: NVIDIA GeForce RTX 5060 (8 GB)
Operating system: Pop!_OS 24.04 LTS
Why Pop!_OS:
When I decided to set up my AI lab, I wanted an isolated environment to protect sensitive data on my main OS. So I installed Pop!_OS on an external SSD and boot into it whenever I need the lab. I chose Pop!_OS because:
It works well with NVIDIA graphics cards. There is a separate ISO that ships with the NVIDIA driver preinstalled, so you just need to download the correct ISO for your graphics card.
It's based on Ubuntu, which is itself derived from Debian, so it uses the package management and software repositories I'm familiar with.
And of course, Linux is resource-efficient, which leaves more RAM and GPU memory available for running LLMs.
Why Ollama:
Although you can use LM Studio, Node.js, or even Docker to run an LLM, I chose Ollama because n8n has a dedicated built-in Ollama node. I plan to build n8n workflows in the future, and this will make that work easier and more efficient.
Installing Ollama and running your first LLM:
Now, let's get into action: install Ollama, pull an LLM, and run it. Some of the commands below need administrator privileges, so enter your user password whenever sudo prompts for it. You can either enter it each time you are asked or run sudo su once now to avoid repeated prompts.
Open your terminal and run the command below to install Ollama, or to update an already installed version:
curl -fsSL https://ollama.com/install.sh | sh
curl downloads the script from the given URL, and the pipe (|) passes it to your system's shell (sh), which executes it on your computer.
Note: If the above command fails or the URL in the command is broken, please refer to the official Ollama download page to verify the latest command.
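If piping a script straight into sh makes you uneasy, one option is to download it first, read it, and only then run it. The sketch below is just an illustration: the file name install.sh and the helper names have and install_ollama_reviewed are my own, not part of Ollama. The last part checks that the ollama command actually landed on your PATH, whichever install method you used.

```shell
# Hypothetical helper: succeeds if the given command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: download the installer to a file so it can be
# inspected before it runs. Nothing is executed until you confirm.
install_ollama_reviewed() {
    curl -fsSL https://ollama.com/install.sh -o install.sh || return 1
    less install.sh
    printf 'Run the installer now? [y/N] '
    read -r answer
    [ "$answer" = "y" ] && sh install.sh
}

# Verify the result of an install (either method).
if have ollama; then
    ollama --version
else
    echo "ollama not found on PATH; the install may not have completed"
fi
```

To use the reviewed route, call install_ollama_reviewed in your terminal instead of running the one-line command.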
Run the command below to start Ollama:
systemctl start ollama
You can check its status by running:
systemctl status ollama
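Besides systemctl, a quick way to confirm that the server is really answering is to query its HTTP endpoint. The sketch below assumes Ollama's default local address, http://localhost:11434; if you changed it, pass your own URL. The function name ollama_api_status is hypothetical.

```shell
# Hypothetical helper: prints "reachable" or "unreachable" for the given URL.
# Falls back to Ollama's default local address when no URL is passed.
ollama_api_status() {
    url="${1:-http://localhost:11434}"
    if curl -fsS --max-time 3 "$url" >/dev/null 2>&1; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# Ask systemd for a one-word state such as "active" or "inactive".
systemctl is-active ollama 2>/dev/null || true

echo "Ollama API: $(ollama_api_status)"
```

If the service is active but the API is unreachable, the output of systemctl status ollama is the first place I would look.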
Once Ollama is active and running, you can download any LLM by running:
ollama pull <modelname>
The ollama run command not only runs a downloaded model, but also downloads the model first if it's not already present on your computer.
I'm gonna use Google's gemma4, a 4 billion parameter model, for now. Keep in mind that models with more parameters generally give better results, but they also require more memory and compute. Choose a model based on your computer's specifications and your goals. I recommend choosing models with fewer than 12 billion parameters unless you have a high-end computer.
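To make the trade-off between parameters and resources concrete, here is a back-of-the-envelope estimate of the memory taken by the weights alone: parameters × bits per weight ÷ 8. This is a rough sketch, not how Ollama reports sizes. It ignores the context cache and runtime overhead, and the function name estimate_weights_gb and the bit widths are my own assumptions for illustration. The download size shown in the Ollama Library is the more reliable guide.

```shell
# Hypothetical helper: rough size of model weights in GB (1 GB = 10^9 bytes).
# $1 = parameters in billions, $2 = bits per weight (for example 16, 8, or 4).
estimate_weights_gb() {
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 4 4     # 4B parameters at 4 bits  -> 2.0
estimate_weights_gb 12 4    # 12B parameters at 4 bits -> 6.0
estimate_weights_gb 4 16    # 4B parameters at 16 bits -> 8.0
```

By this estimate, a 12 billion parameter model at 4 bits already needs about 6 GB for its weights, which leaves little headroom on an 8 GB card like mine.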
You can browse the Ollama Library to choose a model and find its "model tag" for installation.
I used the command below to download and run gemma4 (4 billion parameters):
ollama run gemma4:e4b
Gemma4 now runs in your terminal, and you can query it as you like, setting any parameters you want. Below is a screenshot of my query.
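You don't have to stay in the interactive prompt. ollama run also accepts the prompt as an argument, and the local server exposes an HTTP API (by default on port 11434) that other tools can call. The sketch below shows both. The helper names build_payload and ask_ollama are hypothetical, the prompt is only an example, and the payload builder does no JSON escaping, so keep quotes and backslashes out of the prompt.

```shell
# Hypothetical helper: build a minimal JSON body for Ollama's /api/generate.
# No escaping is done, so the prompt must not contain quotes or backslashes.
build_payload() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# Hypothetical helper: send one prompt to the local server and print the raw JSON reply.
ask_ollama() {
    curl -fsS http://localhost:11434/api/generate -d "$(build_payload "$1" "$2")"
}

# Usage (needs a running server and a downloaded model):
#   ollama run gemma4:e4b "Explain what a model tag is in one sentence."
#   ask_ollama gemma4:e4b "Explain what a model tag is in one sentence."
```

The API reply is a JSON object, and the generated text sits in its response field.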
Pro Tip: To check whether a loaded model is running on the CPU or the GPU, use the following command:
ollama ps
I am happy that the model is running entirely on my GPU. If a model doesn't fit in VRAM, Ollama offloads part of it to the CPU and system RAM, which slows down responses.
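To see how close you are to filling the VRAM, you can compare what ollama ps reports with the driver's own numbers from nvidia-smi. This is a small sketch that assumes the NVIDIA driver tools are installed; the function name show_gpu_memory is hypothetical.

```shell
# Hypothetical helper: print GPU name and memory usage, or a hint if the tool is missing.
show_gpu_memory() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
    else
        echo "nvidia-smi not found; is the NVIDIA driver installed?"
    fi
}

show_gpu_memory

# List the models Ollama currently has loaded, if ollama is installed.
if command -v ollama >/dev/null 2>&1; then
    ollama ps || echo "could not reach the Ollama server"
fi
```

In the ollama ps output, the PROCESSOR column tells you whether a model sits fully on the GPU or is split between CPU and GPU.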
Something's not right yet:
I understand the result was probably not what you expected. It didn't match my vision either. I would like a clean interface, the ability to switch seamlessly between multiple downloaded models, and an easy way to upload images and documents without constantly finding and typing file paths. This is exactly what I will be working on next.

