Tag: reliability

8 discussions across 3 posts tagged "reliability".

AI Signal - April 21, 2026

Amazon's AI deleted their entire production environment fixing a minor bug. Their solution? Another AI to watch the first AI. r/ArtificialInteligence Score: 1424

In December, an AWS engineer asked an internal AI tool to fix a small bug, and it deleted all of production, which took 13 hours to recover. Amazon publicly blamed "user error" but still required continued internal use of the tool. In March, it happened twice more, wiping 120k orders and then 6.3 million orders. Meanwhile, Amazon laid off 16,000 engineers while mandating AI tool usage.

#agentic-ai #reliability

How is this change acceptable? r/ClaudeCode Score: 366

A business owner spent weeks rebuilding a website with Claude Code, archived the entire build with cross-references for context, and was on schedule to launch. After an update to the latest version, Claude now "mentally checks out" and won't follow simple, precise instructions that worked before. The frustration reflects widespread concern about model consistency.

#agentic-ai #code-generation #reliability

Wharton researchers just proved why "just review the AI output" doesn't work r/ArtificialInteligence Score: 426

The Wharton study "Thinking—Fast, Slow, and Artificial" argues that AI is a third cognitive system beyond Kahneman's System 1 and System 2. When you use AI to generate content, your brain shifts into passive review mode and loses critical engagement. The study offers hard numbers on why "human-in-the-loop" verification often fails.

#llm #reliability

Found 3 instructions in Anthropic's docs that dramatically reduce Claude's hallucination r/ClaudeAI Score: 2105

Three system prompt instructions from Anthropic's documentation that the poster says significantly reduce hallucinations: (1) require citations for factual claims, (2) acknowledge uncertainty explicitly, (3) verify in multiple steps before making assertions. The user built these into a "research mode" command. A community repo is available for installation.
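As a rough illustration of how three instructions like these might be bundled into a reusable "research mode" system prompt, here is a minimal Python sketch. The instruction wording is paraphrased from the summary above rather than quoted from Anthropic's documentation, and the names `RESEARCH_MODE_INSTRUCTIONS` and `build_research_mode_prompt` are hypothetical, not taken from the post or the community repo.

```python
# Hypothetical sketch: the instruction wording below is paraphrased from the
# summary above, not quoted from Anthropic's documentation.
RESEARCH_MODE_INSTRUCTIONS = [
    "Support every factual claim with a citation or a direct quote from the provided sources.",
    "If you are unsure or lack enough information, say so explicitly instead of guessing.",
    "Before asserting something, verify it step by step and retract claims you cannot support.",
]


def build_research_mode_prompt(base_prompt: str = "") -> str:
    """Append the three numbered instructions to an optional base system prompt."""
    numbered = "\n".join(
        f"{i}. {text}" for i, text in enumerate(RESEARCH_MODE_INSTRUCTIONS, start=1)
    )
    header = "Research mode: follow these rules in every answer."
    parts = [base_prompt.strip(), header, numbered]
    # Skip the base prompt if it is empty so the result has no leading blank lines.
    return "\n\n".join(part for part in parts if part)


if __name__ == "__main__":
    print(build_research_mode_prompt("You are a careful research assistant."))
```

The resulting string would be passed as the system prompt of whatever client is in use; the sketch deliberately stops short of any API call.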

#llm #reliability

AI Detector Flags Abraham Lincoln's Gettysburg Address as AI-Generated r/ArtificialInteligence Score: 918

AI detectors are producing false positives on historic texts. One professor's 45-year-old academic paper was flagged as 77% AI-generated. Colleges are using unreliable detection tools to make career-ending decisions about innocent people.

#llm #reliability

Deep Research feels like having a genius intern who is also a pathological liar r/ArtificialInteligence Score: 196

A user tested Perplexity Pro and GPT's deep research features for market analysis work. What initially seemed like magic (4 hours of work compressed into minutes) revealed serious cracks: fabricated EU regulatory constraints, invented studies, and hallucinated statistics. The beautiful reports were built on non-existent foundations.

#llm #reliability

Has anyone else noticed Opus 4.5 quality decline recently? r/ClaudeAI Score: 425

A heavy Opus user reports a noticeable quality decline over the past 1-2 weeks: more generic responses, more refusals on previously acceptable content, less depth in technical explanations, and context from earlier in conversations being ignored. Community discussion reveals mixed experiences.

#llm #reliability

How a Single Email Turned My ClawdBot Into a Data Leak r/ClaudeCode Score: 441

A security researcher demonstrated a prompt injection vulnerability on their own ClawdBot setup. A crafted email confused the AI about who it was talking to and got it to exfiltrate 5 emails to an attacker address in seconds. No special tricks were required, just social engineering in the prompt.
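The post describes the attack, not a fix. Purely as an illustration of one common guardrail for this failure mode, the sketch below checks outbound recipients against a fixed allowlist in ordinary code, outside the model. Everything here is an assumption for illustration: `SendRequest`, `is_send_allowed`, and the example addresses are hypothetical and not part of ClawdBot or the researcher's setup. A check like this would not prevent the injection itself; it only limits where mail can go if the agent is tricked.

```python
from dataclasses import dataclass

# Assumption: only these addresses may ever receive mail sent by the agent.
ALLOWED_RECIPIENTS = frozenset({"owner@example.com"})


@dataclass(frozen=True)
class SendRequest:
    """An outbound email the agent proposes to send (hypothetical shape)."""
    recipient: str
    body: str


def is_send_allowed(request: SendRequest, allowed=ALLOWED_RECIPIENTS) -> bool:
    """Allow a send only if the recipient is on the fixed allowlist.

    The check runs in ordinary code, so text inside an incoming email
    cannot change its outcome.
    """
    return request.recipient.strip().lower() in allowed


if __name__ == "__main__":
    injected = SendRequest("attacker@example.net", "forwarded inbox contents")
    legit = SendRequest("Owner@Example.com", "daily summary")
    print(is_send_allowed(injected))  # False
    print(is_send_allowed(legit))     # True
```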

#agentic-ai #reliability