I know this is dumb, but after Chris Brousseau’s presentation I wanted to try running an LLM locally.
chatting
- Get Ollama because I’m trying to be a 1337 pwrus3r.
- Get Ubuntu (oh yeah I’m on Windows bruh).
wsl --install -d Ubuntu
- Let’s get a reasoning model for chatting.
ollama pull qwen3:14b
- Let’s also get a coding model for programming later.
ollama pull freehuntx/qwen3-coder:14b
- Use all the resources. Open Ollama and in the Settings set the context window to 256k.
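Before anything fancier, a quick sanity check from any terminal that the chat model actually loads and answers. The prompt is just an example, and --verbose should also print eval-rate stats, which puts numbers behind the speed comments below.
ollama run --verbose qwen3:14b "Why is the sky blue?"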
The Ollama UI is good enough. Is it snappy? No, I’d say about 4x slower than the hosted models I’m used to, but not 10x slower. I chose Qwen3 as the first shot; I want to try others next. I also got a separate coding model just so I can exercise having a build model that is separate from the plan model. Both are needed for what’s next: programming.
Checking vRAM:
nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
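That prints the total in MiB with no units, so the twelve-gigabyte card referenced in the math below shows up as 12288.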
Alternative models
- planning: MFDoom/deepseek-r1-tool-calling:14b, gemma3:12b
- coding: qwen3-coder:30b-a3b, phi4
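If you end up hoarding a few of these, ollama list shows everything that has been pulled and how much disk each model takes:
ollama list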
programming
- Start in Windows land, so PowerShell, to open up some access (OLLAMA_HOST=0.0.0.0 lets WSL reach the server) and tune performance (one model loaded at a time, unloaded after a minute of idling).
[Environment]::SetEnvironmentVariable('OLLAMA_HOST', '0.0.0.0', 'User')
[Environment]::SetEnvironmentVariable("OLLAMA_MAX_LOADED_MODELS", "1", "User")
[Environment]::SetEnvironmentVariable("OLLAMA_KEEP_ALIVE", "1m", "User")
- Open the firewall too in Windows land.
New-NetFirewallRule -DisplayName "Allow WSL to Ollama" -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow
- Turn on mirrored networking in Windows land.
@'
[wsl2]
networkingMode=mirrored
'@ | Out-File -FilePath "$HOME\.wslconfig" -Append -Encoding utf8
- Restart WSL.
wsl --shutdown
- Get OpenCode from inside Ubuntu.
curl -fsSL https://opencode.ai/install | bash
- Give OpenCode from inside Ubuntu access to Ollama (there’s a quick reachability check after the config below).
echo 'export OLLAMA_HOST=host.docker.internal' >> ~/.bashrc
- Reconfigure OpenCode from inside Ubuntu.
vim ~/.config/opencode/opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "local-ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama Windows",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "freehuntx/qwen3-coder:14b": { "name": "Qwen 3 Coder (Build)" },
        "qwen3:14b": { "name": "Qwen 3 (Plan)" }
      }
    }
  },
  "default_agent": "build",
  "agent": {
    "plan": {
      "mode": "primary",
      "model": "local-ollama/qwen3:14b",
      "temperature": 0.3,
      "num_ctx": 65536,
      "description": "Architectural reasoning and strategy",
      "tools": {
        "bash": false,
        "edit": false,
        "write": false,
        "read": true
      }
    },
    "build": {
      "mode": "primary",
      "model": "local-ollama/freehuntx/qwen3-coder:14b",
      "temperature": 0,
      "num_ctx": 65536,
      "description": "High-speed code execution and editing",
      "prompt": "You are a professional coding agent. You MUST use the provided tools (bash, write, edit) to fulfill requests.",
      "tools": {
        "bash": true,
        "edit": true,
        "write": true,
        "read": true
      }
    }
  }
}
The 65536 is a 64k context window. The math here is that fourteen billion parameters, quantized to roughly four bits each, come to about nine gigabytes, and a 64k context window adds about two and a half gigabytes of cache on top. 9 + 2.5 = 11.5, which is under the twelve gigabytes of vRAM I found earlier. Staying below that limit, with headroom for your OS, keeps the entire workload on the GPU.
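Before pointing OpenCode at it, a quick check from inside Ubuntu that the Windows-side Ollama is actually reachable; with mirrored networking, localhost should answer on port 11434 with the list of installed models:
curl -s http://localhost:11434/api/tags
Once a model has served a request, ollama ps (run wherever the Ollama CLI lives, the Windows side here) should report it as 100% GPU if the arithmetic above holds.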
testing
I put this prompt in Plan mode first.
You are a staff engineer at Google. Plan how to write a local markdown-based journal. It should have three parts:
A journal.py script that takes a string and saves it to a file named YYYY-MM-DD.md in an entries/ folder.
It should automatically append a timestamp to each entry.
If the entries/ folder doesn’t exist, it should create it.
After about ten minutes, the plan was ready. Then in Build mode I said “OK build it.” About three more minutes to this journal.py:
import sys
from datetime import datetime
import os


def main():
    # Read from standard input
    entry = sys.stdin.read().strip()

    # Generate current timestamp
    now = datetime.now()
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")

    # Create entries directory if it doesn't exist
    entries_dir = "entries/"
    if not os.path.exists(entries_dir):
        os.makedirs(entries_dir)

    # Create filename based on current date
    filename = os.path.join(entries_dir, now.strftime("%Y-%m-%d.md"))

    # Append entry with timestamp to the file
    with open(filename, "a") as f:
        f.write("---\n")
        f.write(f"timestamp: {timestamp}\n")
        f.write("---\n")
        f.write(f"{entry}\n")


if __name__ == "__main__":
    main()
Not bad. Note that this isn’t that impressive (I wouldn’t ask an intern this question). But at least it works.
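Verifying is just a matter of piping something in and reading the file back. The entry text here is only an example, this assumes python3 is on the PATH inside Ubuntu, and the filename is whatever today’s date is.
echo "Ran a 14B model on my own GPU today." | python3 journal.py
cat "entries/$(date +%F).md"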