I'm suuuper late to the party, I know, but I was pleasantly surprised by how good the 3B Llama 3.2 model is to use on my 16GB 2020 MacBook M1, via Ollama and @Simon Willison's stellar "llm" package.
I'm referring to this here:
And this llm package:
It feels comparable in speed to the lightweight models on Perplexity.ai, and fast enough that I'm happy just running queries locally.
Really nice piece of CLI design, too - it's a real pleasure to use.

Llama 3.2 goes small and multimodal · Ollama Blog
Ollama partners with Meta to bring Llama 3.2 to Ollama.
LLM: A CLI utility and Python library for interacting with Large Language Models
