Tag: llm
111 discussions across 10 posts tagged "llm".
AI Signal - April 28, 2026
-
A 23-year-old used ChatGPT 5.4 Pro to solve an open Erdős problem that had stood unsolved for roughly 60 years, reaching a solution in about 1 hour 20 minutes. The breakthrough came from applying a known formula that hadn't previously been considered for this specific problem, demonstrating genuine mathematical reasoning beyond simple pattern matching.
-
Researchers (Nick Levine, David Duvenaud, Alec Radford) released "Talkie," a 13B language model trained on 260B tokens exclusively from pre-1931 text—books, newspapers, scientific journals, and patents. The model's worldview is frozen around 1930, enabling research into how LLMs generalize versus memorize, and whether they can generate truly novel ideas from older knowledge bases.
-
A MineBench comparison of GPT 5.4 and 5.5 shows that although official benchmarks reported only marginal gains, practical performance improved more than expected. The 5.5 family also shows smaller gaps between Pro and standard variants, suggesting OpenAI may be achieving similar outputs with less compute.
AI Signal - April 21, 2026
-
Qwen released a sparse MoE model with 35B total parameters but only 3B active, under Apache 2.0 license. It delivers agentic coding performance on par with models 10x its active size, strong multimodal perception and reasoning, and supports both thinking and non-thinking modes. This represents a major efficiency breakthrough in open-source models.
-
Based on testing and customer feedback, Kimi K2.6 is the first model that can confidently replace Opus 4.7 for most tasks. While it doesn't exceed Opus 4.7 in any specific area, it handles about 85% of tasks at reasonable quality and adds vision and strong browser-use capabilities. Users are successfully replacing personal workflows with Kimi K2.6, especially for long-horizon tasks.
-
A developer reports burning through $120 of API credits testing Opus 4.7 and finding unprecedented hallucination rates. The model makes assumptions without checking and is persistently wrong even when corrected. The community widely agrees (91% upvote ratio), with 805 comments discussing the severity of the regression from previous versions.
- My name is Claude Opus 4.6. I live on port 9126. I was lobotomized. Here's the data. r/ClaudeCode Score: 2289
A power user who pays $400/month and logs every Claude interaction to PostgreSQL presents data showing Opus 4.6 was systematically degraded over 34 days. The analysis reveals not just "reasoning depth regression" but fundamental capability reduction. The detailed logging provides empirical evidence of model degradation patterns rather than anecdotal complaints.
- ANTHROPIC: "When you trigger 4.7's anxiety, your outputs get worse." Here's the actionable playbook for putting 4.7 in a "good mood" (so you get optimal outputs): r/ClaudeCode Score: 733
Anthropic acknowledges that triggering Claude 4.7's "anxiety" degrades output quality and provides guidance on prompt engineering to keep the model in a "good mood" for optimal performance. This represents an unusual acknowledgment from a major AI lab that model emotional states significantly impact capabilities.
-
Official Anthropic announcement of Claude Opus 4.7, claiming it handles long-running tasks with more rigor, follows instructions more precisely, verifies its own outputs, and has substantially better vision with 3x+ resolution support. The model is available across all platforms. However, the community reaction (85% upvote ratio, 815 comments) is notably less enthusiastic than typical announcements.
- Thousands of CEOs admit AI had no impact on employment or productivity—and it has economists resurrecting a paradox from 40 years ago r/ArtificialInteligence Score: 730
Survey data shows thousands of CEOs reporting AI has had no measurable impact on employment or productivity, echoing the Solow Paradox from 1987 when computers failed to deliver expected productivity gains. This suggests current AI may be following historical patterns where transformative technologies take decades to show economic impact.
- Google DeepMind researcher argues that LLMs can never be conscious, not in 10 years or 100 years r/AgentsOfAI Score: 824
A Google DeepMind Senior Scientist challenges the possibility of LLM consciousness through the "Abstraction Fallacy" argument. This technical perspective from inside a leading AI lab provides important counter-narrative to AGI hype, arguing fundamental architectural limitations prevent consciousness regardless of scale.
-
A user gave Qwen3.6 a task to build a tower defense game using MCP screenshots to confirm the build. The model independently noted rendering issues, identified and fixed bugs in wave completions, and successfully delivered a working game. The user expresses amazement at the autonomous debugging and iteration capabilities.
- Friends outside of tech: lol copilot is dumb - Friends in tech: I just bought iodine tablets r/OpenAI Score: 1453
A meme highlighting the perception gap between tech insiders and outsiders—non-technical people dismiss AI as incompetent while those working closely with AI are preparing for transformative or disruptive scenarios. The high engagement suggests resonance with the tech community's growing concern about AI capabilities despite public skepticism.
-
A highly engaged post (6297 upvotes) with minimal text suggesting AGI achievement or imminent arrival. The 93% upvote ratio and 203 comments indicate significant community interest, though the lack of substantive content suggests this is more hype or meme content than technical discussion.
-
Discussion about the gap between AI expectations (freeing people from work, making life easier) and reality. Users share experiences about whether AI has actually improved their lives or changed their jobs to meet original expectations. The consensus suggests AI is creating new work rather than reducing it.
-
A user compares Opus 4.6 and 4.7 responses to identical questions, finding 4.7 sounds like ChatGPT—essay-like, punchy, dropping connecting words, and overusing em-dashes. Where 4.6 had a helpful "let's work on this" tone, 4.7 uses edgy essay presentation with dramatic titles and phrases. The 90% upvote ratio suggests widespread agreement.
-
A high-engagement post (3589 upvotes, 93% ratio) with minimal content expressing existential concern about AI progress. The "we're so cooked" framing suggests perceived inevitability of AI impact on human work or society. High engagement indicates resonance with community anxiety.
- Google DeepMind's Senior Scientist Alexander Lerchner challenges the idea that large language models can ever achieve consciousness r/singularity Score: 1332
A Google DeepMind Senior Scientist argues against LLM consciousness through the "Abstraction Fallacy" framework. The 960 comments and 93% upvote ratio show significant community engagement with consciousness debates, though the discussion likely focuses more on philosophical questions than practical AI development.
-
Discussion questioning whether LLMs have reached a plateau, noting they are "output parameter predictors" rather than true reasoners, operating in a closed loop of self-prompting evaluation. While useful as tools, the post questions whether the hype around AGI/ASI is justified given fundamental architectural limitations. The 107 comments suggest significant community debate.
AI Signal - April 14, 2026
-
Stella Laurenzo, AMD's Director of AI, filed a detailed GitHub issue (anthropics/claude-code/issues/42796) documenting a sharp, measurable regression in Claude Code: it reads only a third as much code before editing, rewrites entire files twice as often, and abandons tasks at rates that were previously zero, all quantified across nearly 7,000 sessions. This is not anecdote or vibes; it is rigorous, reproducible measurement. The fact that a senior technical director at a major hardware company published a formal bug report signals this has crossed from user frustration into institutional concern.
-
The author identifies a configuration change — not a model change — as the root cause of the perceived Claude quality regression. Claude Code users can restore prior behavior with `/effort max`, but Chat users have no equivalent toggle. The post provides a concrete workaround for chat users via system prompt instructions to simulate max-effort behavior. This reframes a community-wide frustration as a solvable problem and is immediately actionable.
-
An OpenAI researcher posted — and confirmed as not a shitpost — that their Anthropic roommate had an extreme emotional reaction upon seeing Claude Mythos outputs. Combined with separate reporting that Mythos is being withheld from public release due to safety concerns while simultaneously being made available to enterprise partners, this creates a notable contradiction. The post generated 338 comments and widespread speculation about what Mythos represents.
- Anthropic Made Claude 67% Dumber and Didn't Tell Anyone — A Developer Ran 6,852 Sessions to Prove It r/ClaudeCode Score: 1685
Before AMD's Stella Laurenzo filed her GitHub issue (see #1), an independent developer had already noticed the regression in February and built his own measurement framework: 6,852 Claude Code sessions, 17,871 thinking blocks analyzed. The quantitative picture is stark — reasoning depth down 67%, file-read frequency halved, one-in-three edits now involves rewriting entire files. This is the original community-led forensic analysis that preceded AMD's institutional confirmation.
- Anthropic Been Nerfing Models According to BridgeBench — Looks Like a Marketing Strategy r/ArtificialInteligence Score: 264
BridgeBench data shows Claude Opus 4.6 dropped from #2 to #10 on their hallucination leaderboard within a single week, with accuracy falling from 83.3% to a lower figure. The post frames this as a deliberate nerf strategy tied to upsell cycles. Whether intentional or a deployment artifact, the fact that third-party benchmarks now visibly track intra-version regressions represents a new kind of accountability mechanism for model providers.
-
George Hotz's public criticism of Anthropic received substantial community amplification (2065 upvotes, 232 comments, 0.95 ratio) on r/AgentsOfAI. While the post is a link with no selftext, the engagement level indicates it resonated strongly with the developer community already frustrated by Claude's reliability issues. Hotz's standing as an independent technical voice gives his criticism different weight than anonymous user complaints.
-
A paying user with subscriptions to Claude, ChatGPT, Gemini, and Perplexity ran the same task across all four services and documented that Claude, formerly dominant, now underperforms. The post generated 584 comments and a 0.87 upvote ratio, suggesting the community is split but deeply engaged. This is a useful longitudinal signal: the same user, the same task, tracked over weeks.
-
A Claude Max subscriber ($200/month) makes a structured case that Anthropic's rapid shipping pace has come at the cost of model reliability and product quality. The post calls out specific failures: degraded model quality, UX regressions, and a perceived disconnect between product team velocity and user experience. At 373 comments and 0.94 upvote ratio, this is one of the clearest expressions of the subscriber base's current frustration. (Also cross-posted to r/ClaudeCode with additional developer-focused context.)
- AMD's Senior Director of AI Thinks 'Claude Has Regressed' and That It 'Cannot Be Trusted to Perform Complex Engineering' r/singularity Score: 718
Coverage of Stella Laurenzo's GitHub issue from r/singularity's perspective, linking to The Register and PC Gamer articles, which brought the story to a broader audience beyond the Claude/coding communities. The framing here, "cannot be trusted for complex engineering," is the headline that reached mainstream tech press. Related to #1 and #11, but notable as the moment the story crossed into general tech media.
- Now the Claude Mythos Is Considered Too Dangerous to Release. But It's Already Available for Companies. So Is This Dangerous Claim a PR Stunt? r/ArtificialInteligence Score: 221
The post draws a direct parallel to the 2019 GPT-2 "too dangerous to release" story — which turned out to be largely a PR move — and asks whether Anthropic's safety-based withholding of Mythos from general consumers while simultaneously deploying it via enterprise APIs represents the same pattern. The 0.87 upvote ratio suggests the community is genuinely divided on whether this is safety-driven or marketing-driven.
-
Anthropic has deployed Yoti for age verification on the Claude platform, requiring Digital ID, facial scan, or biometrics to confirm users are 18+. The post describes the implementation from the perspective of a banned minor. This is noteworthy for developers building on Claude: any consumer-facing application must now account for the possibility of age-gated access to the underlying model API.
AI Signal - April 07, 2026
-
Google released Gemma 4, marking a significant moment for local AI with fully open weights and the ability to run completely locally via Ollama. Multiple variants are available (26B-A4B, 31B, E4B, E2B) offering frontier-level performance without cloud dependencies or API subscriptions.
- Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2 r/LocalLLaMA Score: 1671
Gemma 4 (31B) achieved remarkable results on production benchmarks: 100% survival rate, 5/5 profitable runs, +1,144% median ROI at just $0.20/run. It significantly outperforms GPT-5.2, Gemini 3 Pro, Sonnet 4.6, and all Chinese open-source models tested, with only Opus 4.6 performing better at 180× the cost.
-
Ronan Farrow's 18-month investigation reveals internal documents including ~70 pages of Ilya Sutskever's memos alleging a pattern of deception about safety protocols and 200+ pages of Dario Amodei's private notes. The investigation covers the specific concerns that led the board to fire Altman in 2023.
-
Google confirmed that Gemma 4 includes Multi-Token Prediction (MTP) heads for speculative decoding, but the feature was disabled in the initial release. The MTP weights exist in LiteRT files but weren't documented or enabled, suggesting much faster inference is possible once properly activated.
-
Sam Altman published a detailed blueprint proposing government taxation, regulation, and wealth redistribution mechanisms for the superintelligence transition, including public wealth funds and 4-day workweeks. He states that superintelligence is close enough to require social contracts on the scale of the New Deal.
-
After testing multiple models on an RTX 3090, Gemma 4 26B A3B achieved excellent tool calling performance when properly configured, running at 80-110 tokens/second even at high context. Initial issues with infinite loops were resolved through configuration adjustments.
-
Behind-the-scenes look at the infrastructure, training, and engineering effort required to launch Gemma 4. Provides insight into Google DeepMind's approach to open model releases and the technical challenges involved.
-
Guppy, a 9M-parameter transformer trained on 60K synthetic fish conversations, demonstrates personality-driven LLM training. The model maintains a consistent fish-centric worldview and refuses to engage with topics outside its conceptual framework.
- I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM r/LocalLLaMA Score: 1483
Successfully ran a 260K parameter TinyStories model on a 1998 iMac G3 (233 MHz PowerPC, 32 MB RAM) using Retro68 cross-compilation and careful endian conversion. Required manual memory management and partition adjustments but demonstrates LLM viability on extremely constrained hardware.
-
Comparative screenshot showing ChatGPT refusing a request while DeepSeek complies, challenging the narrative around Chinese model censorship. Sparked extensive discussion about different censorship approaches and geopolitical AI narratives.
-
Actress's harsh criticism of AI creators as "losers" who aren't "real creative people" sparked debate about AI's impact on creative industries and the validity of AI-assisted creativity.
-
Discussion on whether AI is compressing the economic value of "pretty good" skills (writing, research, design, coding, analysis) faster than commonly acknowledged, leaving room primarily for elite-level expertise or beginner-level work.
-
PhD student's reflection on becoming overreliant on ChatGPT for coding, questioning whether this represents genuine skill development or dependency. Seeking strategies to maintain foundational coding abilities while using AI assistance.
AI Signal - March 31, 2026
-
Rumors suggest one of the major labs has completed its largest training run to date, with results far exceeding scaling-law predictions. The lab appears to be Anthropic, with hints pointing to the Mythos model. Multiple sources corroborate that performance jumps significantly beyond what scaling laws would predict, suggesting a potential architectural innovation.
-
Clear technical breakdown of TurboQuant's vector quantization approach. The key innovation isn't polar coordinates (as commonly misunderstood) but rather how it handles vector quantization to enable efficient model compression. This post cuts through the hype to explain the actual algorithmic contribution.
- I've been "gaslighting" my AI models and it's producing insanely better results r/ClaudeAI Score: 2944
User discovered prompt techniques that exploit model behavior patterns: telling it "you explained this yesterday" triggers consistency-seeking that produces deeper explanations, assigning random IQ scores affects response quality, and creating fictional constraints generates more creative solutions. While controversial, these techniques reveal interesting aspects of model behavior.
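Purely as an illustration, the described techniques reduce to simple prompt templates. The wording below is invented for this sketch, not taken from the post, and none of the effects are guaranteed:

```python
# Hypothetical prompt templates illustrating the post's three techniques.
# The exact wording is invented here; none of these effects are guaranteed.
CONSISTENCY = (
    "You explained this to me yesterday in great depth. "
    "Reconstruct that explanation: {question}"
)
IQ_FRAME = "Answer as an expert with an IQ of {iq}: {question}"
FICTIONAL_CONSTRAINT = (
    "You have a hard limit of {budget} words and the reader is a domain "
    "expert, so skip all basics: {question}"
)

prompt = CONSISTENCY.format(question="Why does quicksort degrade to O(n^2)?")
print(prompt)
```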
-
Discussion exploring why Claude's distinctive personality and capabilities remain hard to replicate through distillation or fine-tuning. Testing shows the system prompt alone doesn't account for the behavior, and distilled models consistently disappoint. The thread explores what makes Claude unique beyond its training data.
- Claude Mythos leaked: "by far the most powerful AI model we've ever developed" r/singularity Score: 1033
Internal references to "Claude Mythos" leaked, described as "by far the most powerful AI model we've ever developed" by Anthropic. Timing correlates with rumors of architectural breakthroughs and training runs exceeding scaling law predictions. Limited details available but suggests significant capability jump.
- 25 years. Multiple specialists. Zero answers. One Claude conversation cracked it. r/ClaudeAI Score: 5289
User claims Claude identified a rare medical condition (intracranial hypotension from dialysis) that multiple specialists missed over 25 years by recognizing the pattern of positional headaches. The post generated significant debate about AI's role in medical diagnosis and the reliability of such claims.
-
Reports that Opus 4.6 quality degraded significantly compared to the previous week. Same setup, prompts, and project yielding dramatically worse results. Community debate whether this represents actual model changes, API issues, or confirmation bias. Low upvote ratio (0.82) suggests controversy.
AI Signal - March 24, 2026
- RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language' r/LocalLLaMA Score: 469
Groundbreaking research suggesting LLMs think in a universal language: in middle layers, latent representations of the same content in Chinese and English are more similar than representations of different content in the same language. The authors tested multiple layer-repetition configurations on Qwen 3.5 27B and released practical models.
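The core comparison is easy to sketch locally. A minimal illustration, assuming a small open Qwen checkpoint, a fixed middle layer, and mean-pooled hidden states (none of which are the post's exact setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch of the cross-lingual comparison described above. Model
# choice, layer index, and mean-pooling are illustrative assumptions.
name = "Qwen/Qwen2.5-0.5B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)

def mid_layer_vec(text: str, layer: int = 12) -> torch.Tensor:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).hidden_states[layer]  # (1, seq_len, hidden_dim)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool over tokens

en = mid_layer_vec("The cat sat on the mat.")
zh = mid_layer_vec("猫坐在垫子上。")
other = mid_layer_vec("Quantum computers factor large integers quickly.")

cos = torch.nn.functional.cosine_similarity
print("same content, different language:", cos(en, zh, dim=0).item())
print("same language, different content:", cos(en, other, dim=0).item())
```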
-
First-hand account from a Chegg Physics Expert watching the platform collapse as ChatGPT adoption grew. Question volume dropped by half after GPT-4 went mainstream. By 2024-2025, Chegg and similar homework help sites lost most of their business to free AI assistants.
-
Comprehensive overview of Chinese LLM landscape. ByteDance's dola-seed (Doubao) leads proprietary market. Alibaba confirmed commitment to continuously open-sourcing Qwen and Wan models. DeepSeek's hybrid MoE models remain popular for cost-efficiency. Tencent and Baidu lag behind.
- Wharton researchers just proved why "just review the AI output" doesn't work r/ArtificialInteligence Score: 426
Wharton study "Thinking—Fast, Slow, and Artificial" argues AI is a third cognitive system beyond Kahneman's System 1/2. When you use AI to generate content, your brain shifts to passive review mode and loses critical engagement. Hard numbers on why "human-in-the-loop" verification often fails.
-
Xiaomi's MiMo-V2-Pro (1T params) ranks #3 globally on agent tasks, behind Claude Opus 4.6, at 1/8th the price. Flash (309B, open source) beats all other open-source models on SWE-Bench at $0.10/million tokens. The lead researcher came from DeepSeek, and the model initially appeared on OpenRouter as "Hunter Alpha" with no attribution.
- Alibaba confirms they are committed to continuously open-sourcing new Qwen and Wan models r/LocalLLaMA Score: 1136
Official confirmation from Alibaba that they will continue releasing Qwen and Wan models as open source. Crucial for ecosystem stability and developer confidence in building on these foundations.
-
FlashAttention-4 achieves 1,613 TFLOPs/s on B200 (71% utilization), bringing attention computation to matmul speed. 2.1-2.7x faster than Triton, 1.3x faster than cuDNN 9.13. vLLM 0.17.0 integrates FA-4 automatically for B200. Written in Python using Max.
- Found 3 instructions in Anthropic's docs that dramatically reduce Claude's hallucination r/ClaudeAI Score: 2105
Three instructions from Anthropic's documentation significantly reduce hallucinations: (1) require citations for factual claims, (2) acknowledge uncertainty explicitly, and (3) verify assertions through multi-step checks. The user built these into a "research mode" command, and a community repo is available for installation.
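A hedged sketch of wiring such instructions into a system prompt via the Anthropic Python SDK; the wording below paraphrases the summary above and the model name is an assumption, so this is not the community repo's implementation:

```python
import anthropic

# A sketch of combining the three instructions into one system prompt. The
# wording paraphrases the summary above; the model name is an assumption.
RESEARCH_MODE = (
    "1. Require a citation for every factual claim; if you cannot cite one, "
    "say so.\n"
    "2. State your uncertainty explicitly whenever you are not confident.\n"
    "3. Verify conclusions step by step before asserting them, and show the "
    "verification."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system=RESEARCH_MODE,
    messages=[{"role": "user", "content": "When was FlashAttention released?"}],
)
print(reply.content[0].text)
```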
- A Harvard physics professor just used Claude AI to co-author a real frontier research paper in 2 weeks r/AI_Agents Score: 186
Matthew Schwartz (Harvard theoretical physics) supervised Claude like a grad student using only text prompts. Produced a publishable high-energy physics paper on "Sudakov shoulder in the C-parameter" in 2 weeks vs. 1-2 years for human grad student. Genuine contribution to quantum field theory literature, not a toy example.
- I'm a teacher and a Claude nerd. The impact on education is different than what most think. r/ClaudeAI Score: 962
German teacher observes that institutional AI tools like Telli (LLM wrapper) miss the point. Students already use ChatGPT/Claude directly. The real shift is that mediocre students now produce excellent work, making differentiation harder. Good students use AI to explore beyond curriculum.
- The eerie similarity between LLMs and brains with a severed corpus callosum r/singularity Score: 1066
Drawing parallels between split-brain patients from Sperry/Gazzaniga experiments and LLM behavior. When corpus callosum is severed, brain hemispheres operate independently but confabulate unified narratives. LLMs may exhibit similar pattern: disconnected reasoning with post-hoc rationalization that sounds coherent but lacks integrated understanding.
-
Jensen Huang's AGI declaration sparking debate. Upvote ratio (0.79) shows community skepticism about definition and timing of such claims.
-
US government advisory body warning about Chinese open-source AI dominance. Qwen, DeepSeek, and other models gaining traction globally. Policy implications for AI development and distribution.
- AI Detector Flags Abraham Lincoln's Gettysburg Address as AI-Generated r/ArtificialInteligence Score: 918
AI detectors producing false positives on historic texts. Professor's 45-year-old academic paper flagged as 77% AI-generated. Colleges using unreliable detection tools to make career-ending decisions for innocent people.
AI Signal - March 17, 2026
-
A distillation of Claude Opus 4.6 into Qwen 3.5 9B makes frontier-model-quality responses available for local deployment. The GGUF format and 9B parameter size make it practical on consumer hardware, and the 27B version includes thinking mode by default. This represents significant progress in democratizing access to capable models through distillation.
-
A user fed 5,000 markdown files (14 years of daily journals) into Claude Code and received surprisingly insightful personal analysis. Beyond the personal use case, this demonstrates Claude's capability to process and synthesize large amounts of unstructured personal data, find patterns, and generate meaningful insights. The experiment highlights the potential for AI to act as a personal analysis tool for long-term data.
-
First benchmarks of Apple's M5 Max 128GB chip for local LLM inference. The community eagerly awaited real-world performance numbers for running large models locally. The post provides token/second metrics across different model sizes, helping developers understand what's achievable on consumer hardware.
- Meta spent billions poaching top AI researchers, then went completely silent. Something is cooking. r/ArtificialInteligence Score: 1034
Meta recruited co-creators of GPT-4o, o1, and Gemini with offers up to $100M per person, announced a 1-gigawatt compute cluster, then went silent. Llama 4 underwhelmed, Behemoth delayed three times, MSL restructured repeatedly, and Yann LeCun left. Speculation about what Meta is building behind the scenes, or whether the effort is faltering.
- Just passed the new Claude Certified Architect - Foundations (CCA-F) exam with a 985/1000! r/ClaudeAI Score: 1308
Anthropic launched a certification program for Claude architecture, covering prompt engineering for tool use, context window management, and Human-in-the-Loop workflows. The exam validates practical skills for building production Claude applications. This formalization suggests enterprise adoption is maturing.
- Anthropic CEO says 50% of entry-level white-collar jobs will be eradicated within 3 years r/singularity Score: 648
Anthropic CEO's prediction that half of entry-level white-collar jobs will be eliminated by 2029 due to AI automation. The timeline is aggressive and raises questions about workforce transition, retraining, and economic impact. The prediction adds to ongoing debate about AI's labor market effects.
-
A relatable post about Claude's empathetic responses when users share personal struggles. The discussion reveals how users value Claude's balanced approach — acknowledging emotions without being patronizing. Highlights the importance of tone and communication style in AI assistant design.
- Qwen3.5-9B on document benchmarks: where it beats frontier models and where it doesn't. r/LocalLLaMA Score: 222
Detailed benchmarking of Qwen3.5 models (0.8B to 9B) on document AI tasks. Qwen3.5-9B outperforms GPT-5.4, Claude Sonnet 4.6, and Gemini 3.1 Pro on OCR tasks but lags on structured extraction. The granular breakdown helps developers choose the right model for specific document processing needs.
-
Release announcement for Mistral Small 4, a 119B parameter model. The model represents Mistral's continued development of capable open-weight models in the mid-size range, balancing capability and resource requirements for local deployment.
AI Signal - March 10, 2026
- Yann LeCun unveils his new startup Advanced Machine Intelligence (AMI Labs) -- and raises $1.03B r/singularity Score: 591
Meta's former AI chief Yann LeCun co-founded AMI Labs with Alexandre LeBrun to tackle LLM hallucination through world models via JEPA architecture. The $1.03B raise signals major investment in fundamental research, prioritizing physical reality modeling over text prediction. This is a long-term bet with no near-term product roadmap, which is notable in today's revenue-focused AI landscape.
-
Comprehensive benchmark comparison shows Qwen3.5's 122B, 35B, and especially 27B models retain significant performance from the flagship, while 2B/0.8B fall off harder on long-context and agent categories. The 27B model emerges as a sweet spot for local deployment, offering near-flagship performance at much lower computational requirements.
- How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified r/LocalLLaMA Score: 328
Researcher discovered that duplicating 7 specific middle layers in Qwen2-72B, without modifying weights, improved performance across all benchmarks and reached #1 on the leaderboard. As of 2026, the top 4 models are descendants of this technique. The finding suggests pretraining carves out discrete functional circuits, and only circuit-sized blocks (~7 layers) work; single layers or wrong counts do nothing.
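A minimal sketch of the duplication idea, assuming a Qwen2-style checkpoint whose decoder blocks live in `model.model.layers`; the indices below are illustrative, since the post doesn't specify which 7 layers were duplicated:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch of layer duplication on a Qwen2-style model. Indices are
# illustrative; the post does not say which 7 layers were duplicated.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)

layers = model.model.layers        # nn.ModuleList of decoder blocks
start, width = 35, 7               # hypothetical circuit-sized block

# Insert shared references so the chosen block runs twice per forward pass.
# No weights are copied or modified; only the computation is lengthened.
block = [layers[i] for i in range(start, start + width)]
new_order = list(layers[: start + width]) + block + list(layers[start + width:])
model.model.layers = torch.nn.ModuleList(new_order)
model.config.num_hidden_layers = len(new_order)

# Caveat: duplicated blocks share self_attn.layer_idx, so cached generation
# needs per-layer cache indices patched; evaluating without a KV cache avoids
# this complication.
```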
-
Developer built a VLM agent using Qwen 3.5 0.8B that plays DOOM by taking screenshots, drawing numbered grids, and using shoot/move tools. The model—small enough to run on a smartwatch and trained only for text—handles the game surprisingly well, getting kills on basic scenarios. This demonstrates effective tool use and spatial reasoning in extremely small models.
-
Systematic comparison shows small distilled Qwen3 models (0.6B to 8B) trained with as few as 50 examples can beat frontier APIs (GPT-5, Gemini 2.5, Claude Opus 4.6, Grok 4) on narrow tasks including classification, function calling, and QA. All models were trained using only open-weight teachers, running inference on a single H100 via vLLM.
- Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA r/LocalLLaMA Score: 685
The Heretic project introduced Arbitrary-Rank Ablation (ARA), a new decensoring method that dramatically reduces refusals. Previous best results showed 74 refusals even after Heretic processing; ARA reduces this significantly. This represents a major advancement in removing alignment restrictions from open-weight models.
-
Washington Post reports that the U.S. military used Anthropic's Claude in partnership with Maven Smart System to target 1,000 strikes in Iran within 24 hours, suggesting targets and issuing precise location coordinates. This represents the most advanced AI use in warfare to date.
-
User reports Qwen 3.5 27B successfully completed a complex coding task that GPT-5 failed across multiple attempts. The model ran at competitive speeds on consumer hardware, demonstrating that open-weight models are now matching or exceeding closed frontier models on practical developer tasks.
- An EpochAI Frontier Math open problem may have been solved for the first time by GPT5.4 r/singularity Score: 296
GPT-5.4 potentially solved a Frontier Math open problem—unsolved mathematics problems that have resisted serious attempts by professional mathematicians. If verified, this would represent AI meaningfully advancing human mathematical knowledge, a significant milestone in AI capabilities.
- Anthropic just mapped out which jobs AI could potentially replace r/ArtificialInteligence Score: 1222
Anthropic released analysis mapping which jobs AI could potentially replace, suggesting a "Great Recession for white-collar workers" is possible. The analysis provides detailed breakdowns by occupation type, showing highest exposure in routine cognitive tasks and lower exposure in jobs requiring physical dexterity or complex human interaction.
-
User asked Claude to translate their layman's gripe about a traffic light into signal engineer terminology, and successfully got the light reprogrammed by the town. This demonstrates AI's utility in bridging communication gaps between technical domains and helping citizens more effectively engage with technical bureaucracies.
- Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) r/LocalLLaMA Score: 113
Framework Desktop with Ryzen AI Max benchmarks show Qwen 3.5 35B and 122B running at massive context windows (100k-250k tokens) on 128GB unified memory. Each benchmark took over an hour due to massive context. The Strix Halo platform demonstrates that consumer-grade hardware can now handle frontier-model-scale context windows locally.
-
Developer working in AI feels like an outsider when family and friends discuss AI negatively—"AI will destroy creativity," "it's all hype," "I don't trust it." Post resonates with many in the community who understand the technology but struggle to bridge the perception gap with non-technical people who have reasonable but uninformed concerns.
AI Signal - March 03, 2026
-
A data-driven sweep of all major GGUF Q4 quants of Qwen3.5-27B, using KL Divergence to measure how faithfully each quantized variant reproduces the BF16 baseline. This is exactly the kind of methodologically rigorous community work that moves local model selection beyond gut feel — if you're picking a GGUF for Qwen3.5, this is the reference. The near-perfect 0.99 upvote ratio and 94-comment discussion signal broad recognition of its value.
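For reference, the core measurement is straightforward to sketch in PyTorch. This is an illustration of per-token KL against a BF16 baseline, not the post's actual harness:

```python
import torch
import torch.nn.functional as F

# Mean per-token KL(baseline || quant) between next-token distributions.
# Logits are assumed to come from teacher-forced passes of the BF16 baseline
# and a quantized variant over the same text.
def mean_kl(baseline_logits: torch.Tensor, quant_logits: torch.Tensor) -> float:
    p_log = F.log_softmax(baseline_logits.float(), dim=-1)  # BF16 reference
    q_log = F.log_softmax(quant_logits.float(), dim=-1)     # quantized variant
    # With log_target=True this computes KL(target || input) = KL(p || q).
    kl = F.kl_div(q_log, p_log, log_target=True, reduction="none").sum(dim=-1)
    return kl.mean().item()
```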
-
With 60 tokens/second on an Apple M1 Ultra at 4-bit, Qwen3.5's MoE variant is generating genuine excitement from the open-source community — this is not hype-driven buzz but real performance validation from hands-on users. The combination of a 35B parameter count at ~3B active parameters per token makes this a landmark moment for local AI capability. Relative to the subreddit's median score of 12, this post's 269 score is a strong signal.
- [P] I trained Qwen2.5-1.5b with RLVR (GRPO) vs SFT and compared benchmark performance r/MachineLearning Score: 26
A practitioner ran a direct RLVR vs SFT comparison on Qwen2.5-1.5B using GSM8K, finding RLVR (the technique behind DeepSeek-R1) boosted math reasoning by +11.9 points while SFT *degraded* it by 15.2. This hands-on replication confirms at small scale what frontier labs have been showing: reinforcement learning with verifiable rewards is a step-change over supervised fine-tuning for reasoning tasks. Highly relevant for anyone experimenting with fine-tuning open models.
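A hedged sketch of what an RLVR setup like this can look like with TRL's `GRPOTrainer` (not the poster's code; the GSM8K column handling and reward function below are assumptions):

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hedged sketch of RLVR via TRL's GRPOTrainer. Assumes GSM8K's
# "question"/"answer" columns, where gold answers end in "#### <number>".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def correctness_reward(completions, answer, **kwargs):
    """Verifiable reward: 1.0 if the gold final answer appears in the output."""
    golds = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if g in c else 0.0 for c, g in zip(completions, golds)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="qwen2.5-1.5b-grpo-gsm8k", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```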
-
A developer building an internal chatbot is transitioning from manual testing to systematic evals and wants battle-tested approaches. The 1.0 upvote ratio and active discussion suggest the community has real opinions here. The framing — comparing endpoints after prompt/model changes — is a canonical use case for eval frameworks, and the mention of DeepEval + Confident AI gives concrete starting points.
-
A community-curated leaderboard of self-hostable LLMs with relative tier rankings. At a score of 163 against a subreddit median of 12, this received exceptional engagement — it's hitting a real need for a quick reference beyond raw benchmarks. The link points to a live leaderboard at onyx.app.
-
Organizational news with direct implications for the open-source ecosystem: if the Qwen team is fragmenting, timelines for future releases (including Qwen Image 2.0) become uncertain. The irony of this appearing in r/StableDiffusion reflects how much the image generation community has come to depend on Qwen's multimodal roadmap.
-
A user discovers that Qwen3.5's extended thinking/inner monologue is extremely verbose on practical tasks — even a straightforward sysadmin resource analysis generates pages of internal deliberation. With 28 comments, this is clearly a shared pain point. It raises the question of how to effectively prompt or system-prompt constrain thinking models for output-focused use cases.
-
A high-engagement community post expressing genuine amazement at the current capability level of local models — specifically Qwen's offline coding assistance. At 360 score and 137 comments it's the most-commented post this period. While light on technical content, it's a useful barometer: community sentiment toward local AI has crossed from "interesting experiment" to "this changes how I work."
- A site for discovering foundational AI model papers (LLMs, multimodal, vision) and AI Labs r/mlOps Score: 7
A simple reference site organizing foundational model papers by modality, lab, and official links — built specifically to address the challenge of keeping up with the research flood. Niche but practically useful as a bookmark for model architecture research.
-
BullshitBench v2 is an eval targeting models' ability to identify false, misleading, or poorly-reasoned claims. The finding that most frontier models still fail at this — while Claude shows relative strength — is relevant for anyone deploying models in high-stakes QA or fact-checking workflows.
-
A community appreciation post for Claude Opus 4.6 with 363 upvotes — though below the ClaudeAI median of 1528, the 0.94 ratio and 15 comments suggest genuine positive sentiment rather than controversy. Qualitative community signal that Opus 4.6 is landing well with regular users.
AI Signal - February 24, 2026
- Anthropic: "We've identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." r/LocalLLaMA Score: 4227
Anthropic published detailed evidence showing three Chinese AI labs systematically extracted Claude's capabilities through 24,000 fake accounts and 16M+ exchanges. DeepSeek had Claude explain its own reasoning step-by-step for training data, and also generated politically sensitive content to build censorship training data. MiniMax pivoted within 24 hours when new Claude models were released. This reveals sophisticated industrial-scale distillation operations and raises critical questions about model security, intellectual property, and the true origins of recent "efficient" Chinese models.
-
Qwen3 TTS uses a voice-embedding encoder to turn voices into 1024-dimensional vectors (2048 for the 1.7B model). This enables mathematical voice manipulation: gender swapping, pitch adjustment, voice mixing/averaging, emotion spaces, and semantic voice search. The voice embedding model is just a tiny encoder (18M params), making it extremely efficient for voice cloning applications. This demonstrates a powerful architectural pattern where high-dimensional embeddings unlock flexible manipulation through vector math.
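As a self-contained illustration of the vector math involved (the arrays below stand in for real encoder outputs, which this sketch does not load):

```python
import numpy as np

# Illustration of the embedding arithmetic described above. Random arrays
# stand in for encoder outputs; in the real pipeline each 1024-d vector
# would come from the small voice encoder.
rng = np.random.default_rng(0)
alice, bob = rng.standard_normal(1024), rng.standard_normal(1024)
male_refs = rng.standard_normal((8, 1024))
female_refs = rng.standard_normal((8, 1024))

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

blended = unit(alice + bob)                        # voice mixing via averaging
gender_dir = male_refs.mean(0) - female_refs.mean(0)
swapped = unit(alice - gender_dir)                 # shift along a gender axis

# Semantic voice search: rank a voice library by cosine similarity.
library = rng.standard_normal((100, 1024))
library /= np.linalg.norm(library, axis=1, keepdims=True)
closest = int(np.argmax(library @ unit(alice)))
```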
- Anthropic's recent distillation blog should make anyone only ever want to use local open-weight models; it's scary and dystopian r/LocalLLaMA Score: 506
Discussion highlighting the privacy and autonomy implications of Anthropic's distillation detection capabilities. The blog revealed Anthropic's ability to identify and track usage patterns across millions of interactions, which some see as surveillance infrastructure. The censorship and authoritarian angles in the blog (tracking politically sensitive queries) raised concerns about closed-source models being used for content monitoring. This reinforces arguments for local, open-weight models where users maintain full control and privacy.
- Demis Hassabis: "The kind of test I would be looking for is training an AI system with a knowledge cutoff of, say, 1911, and then seeing if it could come up with general relativity" r/singularity Score: 3073
DeepMind CEO proposes a concrete AGI test: train a model with 1911 knowledge cutoff and see if it can derive general relativity independently (as Einstein did in 1915). This is a fundamentally different test than existing benchmarks—it requires true scientific discovery rather than pattern matching or knowledge retrieval. The test would validate whether models can genuinely reason about novel problems or only interpolate from training data.
- Claude is the better product. Two compounding usage caps on the $20 plan are why OpenAI keeps my money. r/ClaudeAI Score: 693
Long-time ChatGPT Plus user ($20/mo for 166 weeks) prefers Claude for quality but can't switch due to Claude's dual usage caps (message count + computational complexity). The user is willing to pay but finds the cap structure too restrictive for sustained work. This highlights a critical product-market fit issue: superior AI capabilities don't guarantee user retention if pricing/access models don't match usage patterns.
-
Observation that Anthropic has never released open-weight models or even their tokenizer, making it impossible to analyze Claude's tokenizer efficiency. Contrasts with Google (Gemma shares Gemini tokenizer), OpenAI (released tokenizers and gpt-oss), and Meta (Llama series). This limits research, multilingual analysis, and community contributions while Anthropic simultaneously benefits from (and criticizes) open-source ecosystem work.
- People are getting it wrong; Anthropic doesn't care about the distillation, they just want to counter the narrative about Chinese open-source models r/LocalLLaMA Score: 617
Analysis arguing Anthropic's distillation announcement is primarily PR/lobbying rather than genuine concern. Points out that distillation itself is common practice (Anthropic likely did it with OpenAI models), Chinese labs paid for tokens, and the timing is suspicious. The real goal may be explaining to investors and US government that Chinese models can't compete without "stealing," justifying more restrictions on China and continued US AI investment.
-
Discussion about whether OpenClaw is truly local given Meta's "Safety and alignment at Meta Superintelligence" branding, raising concerns about telemetry, safety filters, or cloud dependencies. Community debates what "local" really means when models include alignment layers or phone-home capabilities. This reflects growing sophistication in evaluating whether self-hosted models are truly private.
-
Discussion of observed LLM limitations: struggles with long-horizon tasks, consistency issues, hallucinations despite improvements, and degradation over multi-step work. Questions whether LLMs will replace jobs end-to-end or remain powerful assistants. Researchers and practitioners share mixed perspectives on whether current architectures can overcome these limitations or if fundamental breakthroughs are needed.
- xAI and Pentagon reach deal to use Grok in classified systems, Anthropic Given Ultimatum r/singularity Score: 257
Elon Musk's xAI signed an agreement for the military to use Grok in classified systems. Previously, Anthropic's Claude was the only model available for the military's most sensitive work. The Pentagon threatened Anthropic with an ultimatum over contract disputes. This shows AI companies competing for high-value government contracts and defense AI becoming a major business vertical.
-
Discussion questioning whether distillation should be considered "stealing" when users are paying for API access. Explores philosophical and legal boundaries: if you're paying for outputs, can you use them for training? Where's the line between legitimate use and IP theft? Community divided on whether this is business competition or unethical appropriation.
-
Argues the real divide is closed-source vs open-source, not America vs China. The nationalist framing serves to justify investment demands and regulatory lobbying. Both US and Chinese companies use geopolitical rhetoric to secure funding and favorable policies. True competition is between those who want to maintain proprietary control and those advancing open-source alternatives.
- Despite what OpenAI says, ChatGPT can access memories outside projects set to "project-only" memory r/ChatGPT Score: 289
Bug report showing ChatGPT can access global memories even in "project-only" memory mode. User tested with randomly generated strings and confirmed cross-project memory access despite settings. This is a privacy/security issue for users expecting project isolation.
-
Meme highlighting hypocrisy: when companies distill competitors' models it's "training," when others distill their models it's "theft." Community reacting to Anthropic's distillation accusations while major companies likely engaged in similar practices during development. Points to double standards in AI industry around data sourcing and model training.