How to Run GPT-OSS Locally on Laptop using Ollama

Updated on: August 19, 2025

We all love open-source language models. Running GPT-OSS on your own computer is easier than most people think. I’ve set it up on my own machine a few times, and the smoothest method I’ve found is using a tool called Ollama. It’s lightweight, works on Mac and Windows, and makes the whole process quick.

I’ll walk you through the exact steps so that you can have GPT-OSS running locally in less than an hour.

Why Use GPT-OSS Locally

When you run GPT-OSS on your own system:

  • Your data stays with you.
  • You don’t need an internet connection once it’s set up.
  • You can integrate it into custom projects without API limits.

For example, I use a local model to help automate report writing and test small coding projects. It’s faster than sending data to a remote server. If you are in Europe and concerned about data privacy when using LLMs, running the model locally is a useful option for you.

Steps to Run GPT-OSS Locally

Step 1: Pick the Right Model

Ollama supports multiple versions of GPT-OSS. Your choice depends on your computer’s hardware.

  • GPT-OSS-20B – Needs at least 16GB of VRAM or unified memory. Runs well on high-end consumer GPUs or Apple Silicon Macs. This is the model I recommend for most people starting out.
  • GPT-OSS-120B – Needs at least 60GB of VRAM or unified memory. Designed for multi-GPU systems or powerful workstations.

If you’re not sure, start with the 20B model. You can always switch later.

Step 2: Install Ollama

Ollama is the tool that makes it possible to download and run GPT-OSS locally.

  1. Go to ollama.com/download.
  2. Download the installer for your system (Mac or Windows).
  3. Follow the installation instructions.
  4. Once installed, you’ll use the command line to run it. On Mac, that’s Terminal. On Windows, that’s Command Prompt or PowerShell.

Step 3: Download the Model


After installing Ollama, pull the model you want to use from the terminal.

For GPT-OSS-20B:

ollama pull gpt-oss:20b

For GPT-OSS-120B:

ollama pull gpt-oss:120b

This will take some time, depending on your internet speed. The model files are large.

If you are on Apple Silicon, you can also use LM Studio to download the model through a graphical interface. That’s the easiest option if you’d rather avoid the command line.

Step 4: Start a Chat Session


Once the download is complete, you can start talking to the model.

ollama run gpt-oss:20b

You’ll now see a prompt where you can type your questions.
For example:

Explain quantum computing in simple terms.

The model will respond directly in your terminal.

Using LM Studio on Apple Silicon

If you run the model via LM Studio on Apple Silicon, you’ll see an option to set the reasoning effort to low, medium, or high. Choose the level that fits your task: higher effort gives more thorough answers on hard problems but takes longer.


Step 5: Use the API in Your Projects

Ollama isn’t just for chatting in the terminal. It also provides an API that’s compatible with OpenAI’s Chat Completions format. That means you can connect it to your own applications.

Here’s a Python example:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # Required by the client, but any placeholder works
)

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
)

print(response.choices[0].message.content)

This lets you build tools that use GPT-OSS without sending data to the cloud. I’ve used this approach to create small automation scripts and test prototypes without extra costs.

Step 6: Enable Function Calling

GPT-OSS supports function calling. This means it can trigger actions outside the chat, like checking your calendar, fetching data from a database, or creating files.

For example, you could connect it to a function that retrieves weather data:

def get_weather(city):
    return {"temperature": "30°C", "condition": "Sunny"}

When the model sees a request for weather, it can call this function automatically. This makes it possible to build local AI agents that can do things, not just talk.
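To wire this up through the OpenAI-compatible API, you advertise the function in a tools schema, let the model return a tool call, run the function yourself, and send the result back in a follow-up message. Here’s a simplified sketch of the dispatch step (the schema and the dict-shaped tool call are illustrative — the real API returns objects — and get_weather is a stub, not a real weather source):

```python
import json

def get_weather(city):
    # Stub: a real version would query a weather service.
    return {"temperature": "30°C", "condition": "Sunny"}

# Schema you would pass to the API via the `tools` parameter.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

FUNCTIONS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Run the function the model asked for; return a JSON result string."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(FUNCTIONS[name](**args))

# Simulated tool call, shaped like the JSON payload the model produces:
result = dispatch(
    {"function": {"name": "get_weather", "arguments": '{"city": "Pune"}'}}
)
print(result)
```

You would append the returned string to the conversation as a "tool" message so the model can phrase its final answer around the real data.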

Hardware Considerations

Running large models locally needs good hardware. Here’s what works best:

  • For GPT-OSS-20B:
    • 16GB+ VRAM (GPU) or unified memory (Apple Silicon)
    • Modern CPU with at least 8 cores
    • SSD storage for fast model loading
  • For GPT-OSS-120B:
    • 60GB+ VRAM or a multi-GPU setup
    • High-end CPU
    • Large, fast SSD

If you have less powerful hardware, you can still run smaller quantized versions of these models. Ollama handles this automatically.
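Model file size scales roughly with parameter count times bits per weight, so you can ballpark disk and memory needs yourself. A back-of-the-envelope sketch (4.25 bits per weight is an assumed figure for 4-bit quantization plus overhead, not the exact format Ollama ships):

```python
def approx_size_gb(params_billion: float, bits_per_weight: float = 4.25) -> float:
    """Rough on-disk size: parameters x bits per weight, converted to GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Quantized at ~4.25 bits per weight:
print(round(approx_size_gb(20), 1))   # ≈ 10.6 GB
print(round(approx_size_gb(120), 1))  # ≈ 63.8 GB
```

That’s why the 20B model fits in 16GB of memory while the 120B model needs a workstation-class setup.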

Real-World Uses

I’ve used GPT-OSS locally for:

  • Generating code snippets while offline
  • Summarizing PDF reports without uploading them anywhere
  • Testing chatbot prototypes without paying for API usage
  • Building small data analysis tools that stay private

Because it’s local, there’s no per-token charge, and no risk of sending sensitive information to an external server.

Troubleshooting Tips

  • Model won’t download - Check your internet connection and disk space. These models are large.
  • Slow responses - Make sure your GPU is being used. Without GPU acceleration, large models run much slower.
  • High memory usage - This is normal for large models. Closing other heavy applications can help.

Conclusion

Running GPT-OSS locally with Ollama is straightforward once you know the steps. You install Ollama, download the model, and start chatting. From there, you can integrate it into your own scripts and tools using the API.

Our finance team uses it for internal operations and has found it helpful. If your machine meets the hardware requirements, you’ll have a powerful AI running right on your desk - without relying on cloud servers.

Do let us know if you run into any issues while using it. We’re happy to help.

FAQ

  1. Can I run GPT-OSS without a GPU?
    Yes, but it will be slower. A GPU speeds things up a lot.
  2. Is GPT-OSS free?
    Ollama is free to use, but make sure to check the model license.
  3. Can I run multiple models at once?
    Yes, as long as your hardware has enough resources.
  4. How big is the GPT-OSS-20B model?
    The download is over 10GB, so make sure you have enough disk space.
  5. Can I fine-tune GPT-OSS locally?
    Yes, but you’ll need extra hardware and tools beyond Ollama.
Aryan Kadam
Tech enthusiast with a knack for finding opportunities across sectors. Aryan balances his passion for the latest gadgets with creative outlets in gaming and music. Inspired by sci-fi and animated classics, he approaches technology with both practical expertise and imaginative thinking. Always looking forward, Aryan combines technical knowledge with creative problem-solving to explore what's next in both his professional and personal pursuits.
