Good morning, Nostr. Who's running local LLMs? What are the best models that can run at home for coding on a beefy PC? In 2026, I want to dig into local LLMs more and stop leaning on Claude and Gemini as much. I know I can use Maple for more private AI, but I prefer running my own model. I also like that there are no restrictions on models run locally. I know hardware is the bottleneck here; hopefully these things become more efficient.
Replies (14)
24 GB VRAM (3090, 7900-class cards): the latest Mistral 24B, Qwen3 32B, and Qwen3 30B-A3B (MoE).
48 GB: 70B-size models at decent quants; Mistral dev large at lobotomized quants. Mistral dev large is the main one in this bracket, though there might be other good 70Bs released lately.
96 GB: gpt-oss-120b.
This is to fit everything in VRAM. With MoEs (Qwen3 30B-A3B, gpt-oss) you can get by with VRAM + system RAM without ruining your speed, depending on how fast your RAM is.
But it's usually a speed hit, so I don't use anything that doesn't fit in VRAM.
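For reference, a minimal sketch of that VRAM + RAM split using the llama-cpp-python bindings; the model path, layer count, and context size are placeholder assumptions you would tune for your own card, not anyone's actual config.

# Sketch: partially offloading a GGUF model so some layers sit in VRAM and the
# rest spills to system RAM. Assumes llama-cpp-python built with GPU support;
# the model path and the numbers below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen3-30b-a3b-q4_k_m.gguf",  # placeholder GGUF file
    n_gpu_layers=32,  # layers kept in VRAM; -1 offloads everything, 0 is CPU-only
    n_ctx=8192,       # context window; a bigger window means a bigger KV cache
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])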
Yup, and about 2 TB of storage just to fit all those models.
Thank you 🙏. Current cards I own: a 3090 and a 7900 XTX. Someone suggested dual 3090s. I'll check out the models, thanks.
@Kajoozie Maflingo uses local LLMs for his image generation, not coding. But he may have some insight.
Local LLMs are a blessing, free and loyal to you 😛
My setup is an i9-13900K + 64 GB at 6400 MT/s + RTX 4090.
The absolute best all-around AI is Llama 3.3, but it is a bit outdated and slow. Newer MoE models like Llama 4 and gpt-oss are flashier and faster, but they are mostly experts at hallucinating.
People will also suggest DeepSeek, but generally speaking 24 GB of VRAM is just too small for "reasoning models" to actually be an improvement. I haven't tried some of the more recent developments, but I have some hope.
If someone were to train a Llama 3.3-like model but focus it on tool use, like reading the source code and documentation for the libraries you have installed, then I think it could be very good.
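To make that concrete, here is a rough sketch of handing a local model a read-the-source tool over an OpenAI-compatible chat endpoint (llama-server and Ollama both expose one); the port, model name, and the read_file tool are made up for illustration, not an existing setup.

# Sketch: offering a hypothetical read_file tool to a local model via an
# OpenAI-compatible /v1/chat/completions endpoint. The URL, model name, and
# tool schema are assumptions for illustration.
import json
import requests

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool
        "description": "Return the text of a source file from the local project.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def read_file(path: str) -> str:
    # The tool itself: just read whatever file the model asked for.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder local server
    json={
        "model": "local-model",  # placeholder model name
        "messages": [{"role": "user", "content": "How does utils.py parse the config file?"}],
        "tools": TOOLS,
    },
).json()

# If the model chose to call the tool, run it and show a bit of the result.
for call in resp["choices"][0]["message"].get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    print(read_file(args["path"])[:500])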
I don't think you can really run anything unless you have a card with a minimum of 16 GB of VRAM. Even then, the model you can run would be maybe a quarter of Sonnet's performance. You need something like four 24 GB cards to get close.
As I understand it, the rough rule of thumb is about 1B parameters per 1 GB of RAM at 8-bit quantization.
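As a back-of-the-envelope check (my own numbers, not from the thread): weight memory is roughly parameters × bits-per-weight / 8, plus some overhead for context and the KV cache.

# Rough weight-memory estimate at a given quantization. Pure arithmetic; the
# quant choices and the 20% overhead fudge factor are assumptions.
def approx_mem_gb(params_billion: float, bits_per_weight: float, overhead: float = 0.2) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ~ 1 GB
    return weights_gb * (1 + overhead)

for name, params, bits in [("24B @ ~Q6", 24, 6.5), ("32B @ ~Q4", 32, 4.5), ("70B @ ~Q4", 70, 4.5)]:
    print(f"{name}: ~{approx_mem_gb(params, bits):.0f} GB")  # ~23, ~22, ~47 GB

Which lines up with the 24 GB and 48 GB brackets mentioned above.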
I use local LLMs exclusively, mostly for coding: two used 24 GB 3090s, which gives 48 GB of VRAM. It runs models up to 70B with very fast performance.
When inferring or training:
1. It uses a lot of power, peaking around 800 W
2. It spins up the fans pretty loudly
I don't think it's necessary to go local for open-source coding, though. Maple (mostly gpt-oss-120b) is great for that. I do think it's necessary to go local for uncensored models, training with your own data, and discussing things that don't fit mainstream bullshit narratives.
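If it helps, here is roughly what that dual-3090 layout looks like with llama-cpp-python's tensor_split option; the model file and the even 50/50 split are placeholders, not the actual config from this reply.

# Sketch: splitting a 70B-class GGUF across two 24 GB GPUs. The model path and
# the even split are assumptions; in practice the ratio gets tuned per card.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3.3-70b-instruct-q4_k_m.gguf",  # placeholder file
    n_gpu_layers=-1,          # offload every layer; a Q4 70B fits in 2 x 24 GB
    tensor_split=[0.5, 0.5],  # fraction of the model placed on GPU 0 and GPU 1
    n_ctx=8192,
)
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Refactor this recursive function to be iterative."}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])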
I got a 3090; might pick up another. Thank you!
Dual 3090s are (somehow) still the sweet spot for local.
Played with Ollama before, thanks!
