I know this is dumb, but after Chris Brousseau’s presentation I wanted to try running an LLM locally.
chatting
- Get Ollama because I’m trying to be a 1337 pwrus3r.
- Get Ubuntu (oh yeah I’m on Windows bruh).
wsl --install -d Ubuntu
- Let’s get a reasoning model for chatting.
ollama pull qwen3:14b
- Let’s also get a coding model for programming later.
ollama pull freehuntx/qwen3-coder:14b
- Use all the resources. Open Ollama and in the Settings set the context window to 256k.
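Before anything fancier, a quick sanity check from any terminal that the chat model actually loads and answers. The prompt is just an example, and --verbose should also print eval-rate stats, which puts numbers behind the speed comments below.
ollama run --verbose qwen3:14b "Why is the sky blue?"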
The Ollama UI is good enough. Is it snappy? No, I’d say about 4x slower than the hosted models I’m used to, but not 10x slower. I chose Qwen3 as the first shot; I want to try others next. I also got a separate coding model just so I can exercise having a build model that is separate from the plan model. Both are needed for what’s next: programming.
Checking vRAM:
nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
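That prints the total in MiB with no units, so the twelve-gigabyte card referenced in the math below shows up as 12288.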
Alternative models
- planning: MFDoom/deepseek-r1-tool-calling:14b, gemma3:12b
- coding: qwen3-coder:30b-a3b, phi4
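If you end up hoarding a few of these, ollama list shows everything that has been pulled and how much disk each model takes:
ollama list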
programming
- Start in Windows land, so PowerShell, to open up some access (OLLAMA_HOST=0.0.0.0 lets WSL reach the server) and tune performance (one model loaded at a time, unloaded after a minute of idling).
[Environment]::SetEnvironmentVariable('OLLAMA_HOST', '0.0.0.0', 'User')
[Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "1", "User")
[Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "1m", "User")
- Open the firewall too in Windows land.
New-NetFirewallRule -DisplayName "Allow WSL to Ollama" -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow
- Turn on mirrored networking in Windows land.
@'
[wsl2]
networkingMode=mirrored
'@ | Out-File -FilePath "$HOME\.wslconfig" -Append -Encoding utf8
- Restart WSL.
wsl --shutdown
- Get OpenCode from inside Ubuntu.
curl -fsSL https://opencode.ai/install | bash
- Give OpenCode from inside Ubuntu access to Ollama (there’s a quick reachability check after the config below).
echo 'export OLLAMA_HOST=host.docker.internal' >> ~/.bashrc
- Reconfigure OpenCode from inside Ubuntu.
vim ~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "local-ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama Windows",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "freehuntx/qwen3-coder:14b": { "name": "Qwen 3 Coder (Build)" },
        "qwen3:14b": { "name": "Qwen 3 (Plan)" }
      }
    }
  },
  "default_agent": "build",
  "agent": {
    "plan": {
      "mode": "primary",
      "model": "local-ollama/qwen3:14b",
      "temperature": 0.3,
      "num_ctx": 65536,
      "description": "Architectural reasoning and strategy",
      "tools": {
        "bash": false,
        "edit": false,
        "write": false,
        "read": true
      }
    },
    "build": {
      "mode": "primary",
      "model": "local-ollama/freehuntx/qwen3-coder:14b",
      "temperature": 0,
      "num_ctx": 65536,
      "description": "High-speed code execution and editing",
      "prompt": "You are a professional coding agent. You MUST use the provided tools (bash, write, edit) to fulfill requests.",
      "tools": {
        "bash": true,
        "edit": true,
        "write": true,
        "read": true
      }
    }
  }
}
The 65536 is a 64k context window. The math here is that fourteen billion parameters, quantized to roughly four bits each, come to about nine gigabytes, and a 64k context window adds about two and a half gigabytes of cache on top. 9 + 2.5 = 11.5, which is under the twelve gigabytes of vRAM I found earlier. Staying below that limit, with headroom for your OS, keeps the entire workload on the GPU.
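Before pointing OpenCode at it, a quick check from inside Ubuntu that the Windows-side Ollama is actually reachable; with mirrored networking, localhost should answer on port 11434 with the list of installed models:
curl -s http://localhost:11434/api/tags
Once a model has served a request, ollama ps (run wherever the Ollama CLI lives, the Windows side here) should report it as 100% GPU if the arithmetic above holds.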
testing
I put this prompt in Plan mode first.
You are a staff engineer at Google. Plan how to write a local markdown-based journal. It should have three parts:
A journal.py script that takes a string and saves it to a file named YYYY-MM-DD.md in an entries/ folder.
It should automatically append a timestamp to each entry.
If the entries/ folder doesn’t exist, it should create it.
After about ten minutes, the plan was ready. Then in Build mode I said “OK build it.” About three more minutes to this journal.py:
import sys
from datetime import datetime
import os


def main():
    # Read from standard input
    entry = sys.stdin.read().strip()

    # Generate current timestamp
    now = datetime.now()
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")

    # Create entries directory if it doesn't exist
    entries_dir = "entries/"
    if not os.path.exists(entries_dir):
        os.makedirs(entries_dir)

    # Create filename based on current date
    filename = os.path.join(entries_dir, now.strftime("%Y-%m-%d.md"))

    # Append entry with timestamp to the file
    with open(filename, "a") as f:
        f.write("---\n")
        f.write(f"timestamp: {timestamp}\n")
        f.write("---\n")
        f.write(f"{entry}\n")


if __name__ == "__main__":
    main()
Not bad. Note that this isn’t that impressive (I wouldn’t ask an intern this question). But at least it works.
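Verifying is just a matter of piping something in and reading the file back. The entry text here is only an example, this assumes python3 is on the PATH inside Ubuntu, and the filename is whatever today’s date is.
echo "Ran a 14B model on my own GPU today." | python3 journal.py
cat "entries/$(date +%F).md"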