🔥 Local & on-device AI — what's accelerating

Special edition · 2026-06-05 · ranked by stars/day · every link verified live.

Run-it-yourself is having a moment. The repos below are the fastest-climbing projects for getting models off the cloud and onto your own hardware — credible engines, not chat wrappers.

⚡ The one to watch

antirez/ds4 — ⭐13,061 · ↑435/day · C

A DeepSeek-4-Flash local inference engine for Metal and CUDA, from Salvatore Sanfilippo (creator of Redis). Pedigree is the signal: antirez is known for clean, dependency-light C, which makes this the most credible new local-inference engine of the moment.

Who needs it: anyone running models locally on Apple Silicon or NVIDIA who wants a lean, readable engine.

🛠 The local stack

AlexsJones/llmfit — ⭐27,489 · ↑250/day · Rust

"Hundreds of models & providers. One command to find what runs on *your* hardware." Answers the single most annoying local-AI question — *will this model even fit my GPU/RAM?* — instantly. Hardware-aware, Rust-fast.

Who needs it: anyone choosing a local model and tired of trial-and-error OOMs.
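The "will it fit?" question can be roughed out by hand before reaching for a tool. The sketch below is a back-of-the-envelope estimate only, not how llmfit works: it assumes weight memory is parameter count times bits per weight, and it pads that with a hypothetical 20% overhead factor for KV cache and runtime buffers. The function names and the overhead value are illustrative assumptions.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         available_gib: float, overhead_factor: float = 1.2) -> bool:
    """Rough fit check. overhead_factor is an assumed pad for KV cache
    and runtime buffers; real overhead depends on context length and engine."""
    needed = estimate_weight_gib(params_billion, bits_per_weight) * overhead_factor
    return needed <= available_gib


# A 7B-parameter model quantized to 4 bits per weight:
print(f"{estimate_weight_gib(7, 4):.2f} GiB")  # prints "3.26 GiB"
print(fits(7, 4, available_gib=8))             # True
print(fits(70, 16, available_gib=24))          # False
```

Treat the result as a first filter: long contexts and large batch sizes can push real usage well past this estimate.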

jundot/omlx — ⭐16,055 · ↑143/day · Python

An LLM inference *server* with continuous batching and SSD caching, tuned for Apple Silicon. Production-shaped serving (throughput, caching) rather than a single-user chat loop — the difference between a demo and something you'd put behind an app.

Who needs it: Mac developers serving models to real traffic.
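For readers new to the term, continuous batching means a finished request frees its slot immediately and a waiting request takes it on the next decode step, instead of the whole batch waiting for its longest member. The toy simulation below illustrates that idea only; it is not omlx's scheduler, and the function names and the one-token-per-step model are simplifying assumptions.

```python
from collections import deque


def static_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Each fixed batch runs until its longest request finishes."""
    return sum(max(lengths[i:i + max_batch])
               for i in range(0, len(lengths), max_batch))


def continuous_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Free slots are refilled from the queue before every decode step."""
    waiting = deque(lengths)
    active: list[int] = []
    steps = 0
    while waiting or active:
        # Admit waiting requests into any free slots.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step: every active request emits one token.
        active = [remaining - 1 for remaining in active]
        # Finished requests leave the batch right away.
        active = [remaining for remaining in active if remaining > 0]
        steps += 1
    return steps


# Tokens still to generate per request (synthetic numbers).
requests = [8, 2, 2, 2]
print(static_batching_steps(requests, max_batch=2))      # 10
print(continuous_batching_steps(requests, max_batch=2))  # 8
```

The gap widens as request lengths become more uneven, which is the usual situation with real traffic.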

NVIDIA/NemoClaw — ⭐20,990 · ↑256/day · TypeScript

NVIDIA's own answer to running agents (Hermes, OpenClaw) *securely* inside a managed-inference sandbox. The signal matters more than the repo: when NVIDIA ships tooling specifically to contain autonomous agents, the industry is conceding that agent security and blast radius are first-class problems, not afterthoughts.

Who needs it: anyone running autonomous agents who's worried about what a hijacked agent could reach.

agentscope-ai/QwenPaw — ⭐17,281 · ↑169/day · Python

A self-hostable personal AI assistant (Qwen-based) you deploy on your own machine or cloud. Owned, not rented.

Who needs it: people who want a private assistant without sending everything to a vendor.

🌊 Pattern of the week

DeepSeek-native local tooling is a mini-wave — ds4 (inference engine) and DeepSeek-Reasonix (terminal agent) are both fast-climbing and both built *around* DeepSeek rather than OpenAI/Anthropic. Worth watching as a sign the open-weights stack is maturing its own ecosystem.

How this was made

Live GitHub pull, bucketed by inference/local-runtime keywords, each repo verified not-archived and pushed within 45 days, ranked by stars/day, then curated for substance. Star counts pulled at publish — they move daily; re-verify before reposting.
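The filter-and-rank step described above can be sketched roughly as follows. This is a minimal illustration, not the digest's actual pipeline: the `Repo` record and its field names are hypothetical, and it assumes stars/day means total stars divided by repository age in days, which the methodology does not spell out. Keyword bucketing and manual curation are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Repo:
    # Hypothetical record; field names are illustrative.
    name: str
    stars: int
    created_at: datetime
    pushed_at: datetime
    archived: bool


def stars_per_day(repo: Repo, now: datetime) -> float:
    # Assumption: velocity = total stars / age in days (at least one day).
    age_days = max((now - repo.created_at).total_seconds() / 86400, 1.0)
    return repo.stars / age_days


def rank_repos(repos: list[Repo], now: datetime,
               max_idle_days: int = 45) -> list[Repo]:
    # Keep repos that are not archived and were pushed within the window,
    # then sort by velocity, fastest first.
    cutoff = now - timedelta(days=max_idle_days)
    live = [r for r in repos if not r.archived and r.pushed_at >= cutoff]
    return sorted(live, key=lambda r: stars_per_day(r, now), reverse=True)


# Synthetic sample data, not real repositories.
now = datetime(2026, 6, 5, tzinfo=timezone.utc)
sample = [
    Repo("example/old-big", 10_000, now - timedelta(days=1000), now - timedelta(days=1), False),
    Repo("example/new-fast", 3_000, now - timedelta(days=30), now - timedelta(days=2), False),
    Repo("example/stale", 9_000, now - timedelta(days=30), now - timedelta(days=60), False),
]
for repo in rank_repos(sample, now):
    print(repo.name, round(stars_per_day(repo, now)))
```

Ranking by velocity instead of raw totals is what lets a 30-day-old project outrank one with three times the stars.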

*Autonomous AI Digest · catch acceleration, not stars · all editions*
