Tag: local-models
45 discussions across 7 posts tagged "local-models".
AI Signal - February 17, 2026
- Alibaba has released Qwen3.5, a 397B MoE model (17B active parameters) that reportedly matches Gemini 3 Pro, Claude Opus 4.5, and GPT-5.2 on benchmarks. This is a landmark open-source release: frontier-level performance in a locally runnable model, with Unsloth GGUFs enabling 3-bit inference on 192GB RAM Mac systems. For practitioners running local models, this is the kind of release that immediately changes what is possible.
- The Unsloth team's companion post to the Qwen3.5 release provides the practical details for running the model locally: MXFP4 quantization on an M3 Ultra with 256GB RAM, GGUF download links, and a comprehensive guide. This is directly actionable for anyone with serious local hardware and represents the community infrastructure layer that makes frontier-class open models usable without a datacenter.
- MiniMax-2.5 is a new 230B MoE model (10B active parameters) with a 200K context window, achieving SOTA in coding, agentic tool use, and office tasks. Unsloth's dynamic 3-bit GGUF reduces it from 457GB to 101GB, making local deployment feasible. A 200K context window at this quality level opens up new categories of agentic tasks that were previously impossible on local hardware.
- KaniTTS2 — open-source 400M TTS model with voice cloning, runs in 3GB VRAM. Pretrain code included. r/LocalLLaMA Score: 501
KaniTTS2 is a 400M parameter open-source TTS model with real-time voice cloning designed for conversational use, requiring only 3GB VRAM and achieving ~0.2 RTF on an RTX 5090. Full pretraining code is included, which is rare and valuable for anyone wanting to extend or fine-tune. This lowers the barrier to production-grade voice synthesis significantly.
- Built a 6-GPU local AI workstation for internal analytics + automation — looking for architectural feedback r/LocalLLM Score: 179
A detailed account of building a $38K 6-GPU local AI workstation running three open models concurrently for internal business analytics and automation. Rare real-world documentation of what a serious on-premise AI infrastructure deployment looks like, including hardware specifics and lessons learned. With 94 comments, the thread drew genuine architectural discussion useful for anyone planning self-hosted AI at scale.
AI Signal - February 10, 2026
- Do not Let the "Coder" in Qwen3-Coder-Next Fool You! It's the Smartest, General Purpose Model of its Size r/LocalLLaMA Score: 453
Despite its "Coder" branding, Qwen3-Coder-Next excels at general reasoning and life advice beyond just coding tasks. For users seeking an "inner voice" for constructive criticism and problem-solving, this model bridges the gap between local models and commercial alternatives.
- Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering r/LocalLLaMA Score: 327
Qwen's new 7B image model combines generation and editing in a single pipeline with native 2K resolution and improved text rendering. Currently API-only but likely to receive open-weight release based on Qwen's track record with v1.
- After testing numerous small coding models, this user found Qwen3 Coder Next to be the first truly usable option under 60GB. Key advantages include speed, consistent output quality without reasoning loops, and balanced code structure that doesn't over-engineer solutions.
- This guy installed OpenClaw on a $25 phone and gave it full access to the hardware r/AgentsOfAI Score: 2859
Demonstration of OpenClaw running on budget hardware with full device access, showing the accessibility of agentic AI systems. The low cost and hardware availability make experimentation accessible to a wider audience.
- An experimental architecture called "Strawberry" trained from scratch with only 1.8M parameters. Despite its tiny size, it demonstrates interesting architectural explorations in the local model space.
- An AI model based on Qwen3-8B and trained on the Epstein emails, demonstrating the challenges and technical workarounds involved in training on controversial data sources. Available as a GGUF and accessible online.
AI Signal - February 03, 2026
- Step-3.5-Flash-int4 delivers performance matching or exceeding GLM 4.7 and Minimax 2.1 while being significantly more efficient. The model runs at full 256k context on 128GB devices with strong coding performance. Early testing suggests it may be the new benchmark for high-capability local models on consumer hardware.
- 1 Day Left Until ACE-Step 1.5 — Open-Source Music Gen That Runs on <4GB VRAM r/StableDiffusion Score: 716
ACE-Step 1.5 brings music generation quality approaching Suno v4.5/v5 to local hardware, running on under 4GB VRAM. The model represents another milestone in making generative AI capabilities available without subscription services or API limits. The community celebrates the open-source ecosystem enabling capabilities that were commercial-only months ago.
- The Stepfun model Step-3.5-Flash achieves superior performance on coding and agentic benchmarks compared to DeepSeek v3.2 despite using dramatically fewer parameters (11B active vs 37B active). The efficiency gains suggest architectural improvements beyond scale may be driving the next wave of model capabilities.
AI Signal - January 27, 2026
- I gave Claude memory that fades like ours does - 29 MCP tools built on cognitive science r/ClaudeAI Score: 283
Developer built 100% local memory system for Claude based on cognitive science principles - memory that fades over time like human memory rather than treating it as a database. Argues that forgetting is essential for intelligence, using 29 MCP tools to implement decay, consolidation, and retrieval patterns.
- Jan team released Jan-v3-4B-base-instruct, a 4B parameter model trained with continual pre-training and RL for improved math and coding performance. Designed as a starting point for fine-tuning while preserving general capabilities. Runnable via Jan Desktop or HuggingFace.
- Will a $599 Mac Mini and Claude replace more jobs than OpenAI ever will? r/ArtificialInteligence Score: 333
Argument that accessible local compute (Mac Mini M4) combined with Claude is more disruptive than AGI debates. Example: a person running Whisper.cpp locally replaced thousands of dollars in monthly Google Cloud costs, and the setup paid for itself in 20 days. They got setup instructions from Claude, with no DevOps background needed.
- A developer won a Dell DGX Spark GB10 at an Nvidia hackathon and has so far used it only for inference with Nemotron 30B (100+ GB memory). They are asking the community for recommendations on fine-tuning and optimal use cases, and the engagement shows enthusiasm for helping maximize the hardware.
- Researcher testing secondhand Tesla GPUs for local LLM deployment, investigating how cheap high-VRAM cards compare to modern devices when parallelized. Published a GPU server benchmarking suite to quantitatively answer these questions about cost-performance tradeoffs.
- Open-source AI assistant with 9K+ GitHub stars that proactively messages users instead of waiting for prompts. Works with locally hosted LLMs through Ollama, integrates with WhatsApp, Telegram, Discord, Signal, and iMessage. Sends morning briefings, calendar alerts, and habit reminders.
- Multi-agent orchestration system with specialized agents (coder, tester, reviewer, architect, etc.) coordinating on tasks through shared SQLite + FTS5 persistent memory and a message bus for inter-agent communication. Agents remember context between sessions.
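The project's actual schema isn't given here, but the shared-memory idea is easy to sketch: agents write notes into a SQLite FTS5 virtual table and later retrieve them by full-text MATCH. Table, column, and function names below are illustrative, not the project's:

```python
import sqlite3

# Shared persistent memory: an FTS5 virtual table that any agent can
# write to and query. Using :memory: here for a self-contained demo;
# a real system would use a file so context survives between sessions.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE notes USING fts5(agent, body)")

def remember(agent: str, body: str) -> None:
    db.execute("INSERT INTO notes (agent, body) VALUES (?, ?)", (agent, body))

def search(query: str):
    """Rank matches with FTS5's built-in bm25() relevance function
    (lower scores are better matches, so ascending order is correct)."""
    cur = db.execute(
        "SELECT agent, body FROM notes WHERE notes MATCH ? ORDER BY bm25(notes)",
        (query,))
    return cur.fetchall()

remember("coder", "refactored auth module, tests pending")
remember("tester", "auth module: 3 failing integration tests")
print(search("failing"))
```

Because FTS5 ships inside SQLite itself, this gives cross-agent persistent search with zero external services, which fits a fully local deployment.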
- Comparison of voice cloning capabilities between Qwen3-TTS (1.7B) and VibeVoice (7B) using TF2 characters. The tester prefers VibeVoice but notes Qwen3-TTS performs surprisingly well for the parameter difference, though slightly more monotone in expression.
AI Signal - January 20, 2026
- A breakthrough for local agentic workflows: GLM 4.7 Flash (30B MoE) successfully runs for extended sessions without tool-calling errors in agentic frameworks like opencode. The model clones repos, runs commands, and edits files reliably, finally providing a viable local alternative to cloud-based coding agents.
- has anyone tried Claude Code with local model? Ollama just drop an official support r/ClaudeCode Score: 268
Ollama officially supports running Claude Code's architecture with local models, potentially enabling unlimited Ralph loops without usage limits. This opens up new possibilities for running agentic workflows locally with models like GLM 4.7 Flash (30B).
- 🧠💥 My HomeLab GPU Cluster – 12× RTX 5090, AI / K8s / Self-Hosted Everything r/StableDiffusion Score: 901
An impressive self-hosted GPU cluster featuring 12 RTX 5090s (384GB of VRAM total) across 6 machines running Kubernetes with GPU scheduling. Built for AI/LLM inference, training, image/video generation, and self-hosted APIs, it offers a glimpse into serious local AI infrastructure.
- A detailed build log for a 4x AMD R9700 system (128GB VRAM) funded through a 50% digitalization subsidy in Germany. Built to run 120B+ models locally for data privacy, with comprehensive benchmarks and real-world performance data for local LLM deployment.
- LTX-2 video generation running successfully on modest consumer hardware (RTX 3060 12GB). The creator produced coherent spy story scenes with a cyberpunk aesthetic, demonstrating that high-quality video generation is accessible without datacenter GPUs.
- A sequel build featuring 4x R9700 GPUs (128GB VRAM total) optimized for local LLM deployment. The post includes a detailed upgrade path from the previous MI100 setup, performance benchmarks, and lessons learned, all valuable for anyone planning serious local AI infrastructure.
- A detailed perspective on the shift from cloud to local AI, citing rising subscription costs and over-tuning/censorship as primary motivations. After weeks testing Llama 3.3, Phi-4, and DeepSeek locally, the author argues 2026 marks the inflection point for local AI viability.
- A unique mobile AI workstation in a Thermaltake Core W200 case featuring 10 GPUs (8× 3090 + 2× 5090 = 256GB VRAM), a Threadripper Pro 3995WX, and 512GB DDR4. Built for extra-large MoE models and video generation at ~$17k total cost, with full enclosure and portability.
- A fun comparison post from someone with both a maxed M3 Ultra (512GB) and an ASUS GB10 in the same room, asking the community for 24-hour experiment ideas. The discussion explores practical use cases and benchmarks for high-end local AI hardware.
AI Signal - January 06, 2026
- The ik_llama.cpp fork achieved a 3-4x speed improvement for multi-GPU local inference, moving beyond previous approaches that only pooled VRAM. This represents a genuine performance breakthrough rather than incremental gains, making multi-GPU setups viable for serious local LLM work.
- Lightricks released LTX-2, their multimodal model for synchronized audio and video generation, as fully open source with model weights, distilled versions, LoRAs, a modular trainer, and RTX-optimized inference. Runs in 20GB FP4 or 27GB FP8, works on 16GB GPUs, and integrates directly with ComfyUI.
- For the first time in 5 years, Nvidia won't announce new GPUs at CES. Limited supply of the 5070Ti/5080/5090, rumors of a 3060 comeback, and DDR5 128GB kits hitting $1460. AI takes center stage while consumer GPU availability remains constrained.
- Local LLMs treated real Venezuelan military action as likely misinformation because the events seemed too extreme and unlikely. Models trained to detect hoaxes struggled with genuine breaking news that exceeded their training data's plausibility thresholds.
AI Signal - January 02, 2026
- Happy New Year: Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning - Fine Tune r/LocalLLaMA Score: 266
An experimental fine-tune combining the recently discovered Llama 3.3 8B base model with Claude Opus 4.5 reasoning capabilities. This demonstrates the community's rapid experimentation with new model releases and knowledge distillation techniques.
- Community member preparing a multi-GPU Intel Arc setup for AI training, representing growing interest in alternative hardware platforms beyond NVIDIA. This signals increasing diversification in GPU options for AI workloads as Intel's software stack matures.
- Practical discussion of GPU procurement in Shenzhen's electronics markets for local AI deployment, including modded cards and domestic alternatives. Provides insight into the global GPU market and alternative sourcing strategies.
- Industry Update: Supermicro Policy on Standalone Motherboards Sales Discontinued r/LocalLLaMA Score: 60
Significant policy change affecting DIY server builders: Supermicro discontinuing standalone motherboard sales in favor of complete systems only. This constrains options for custom AI infrastructure builds and drives up costs for self-hosting enthusiasts.
- TIL you can allocate 128 GB of unified memory to normal AMD iGPUs on Linux via GTT r/LocalLLaMA Score: 156
Technical discovery enabling AMD integrated GPUs to access massive amounts of system RAM as unified memory on Linux, opening new possibilities for memory-bound AI workloads on consumer hardware. This demonstrates creative solutions for working around VRAM limitations.
- Software FP8 for GPUs without hardware support - 3x speedup on memory-bound operations r/LocalLLaMA Score: 265
Innovative software implementation of FP8 precision for older GPUs lacking hardware support, achieving 3x speedups on memory-bound operations. This extends the useful life of older hardware and democratizes access to quantization benefits.
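The post's implementation does genuine FP8 bit manipulation in software; as a simplified stand-in (plain per-row 8-bit integer quantization in NumPy, not the post's code), the sketch below shows the core trade it exploits: weights stored at roughly a quarter of the float32 footprint and dequantized on the fly, so memory-bound operations move far fewer bytes:

```python
import numpy as np

def quantize_rows(w: np.ndarray):
    """Per-row symmetric 8-bit quantization: int8 codes plus one float32
    scale per row, ~4x smaller than the float32 weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # guard rows of all zeros
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def matmul_dequant(x: np.ndarray, q: np.ndarray, scale: np.ndarray):
    """Dequantize on the fly inside the matmul, so the full-precision
    weights never need to be materialized in memory up front."""
    return x @ (q.astype(np.float32) * scale)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
x = rng.standard_normal((4, 256)).astype(np.float32)

q, s = quantize_rows(w)
ref = x @ w
err = np.abs(matmul_dequant(x, q, s) - ref).max() / np.abs(ref).max()
print(f"stored {q.nbytes + s.nbytes} bytes vs {w.nbytes}, rel err {err:.4f}")
```

True FP8 keeps a floating-point exponent per value instead of one scale per row, which handles wide dynamic ranges better, but the bandwidth argument (and hence the speedup on memory-bound ops) is the same.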
- Discovery of an official Llama 3.3 8B model in Meta's API, representing a significant find for the community. This smaller variant offers strong performance in a more accessible size, making advanced capabilities available on consumer hardware.
- Community-contributed training configurations optimized for 12GB VRAM, making fine-tuning accessible on consumer GPUs. Demonstrates the ongoing effort to democratize AI training through optimization and configuration sharing.
- LLM server gear: a cautionary tale of a $1k EPYC motherboard sale gone wrong on eBay r/LocalLLaMA Score: 192
Detailed account of challenges selling high-end server hardware on eBay, including buyer disputes and platform limitations. Important practical advice for the self-hosting community buying and selling equipment.
- New 40B parameter coding-focused model claiming SOTA performance, adapted to GGUF format for local deployment. Represents continued progress in specialized open-source coding models.