“I Am No Longer Needed” — AI CEO’s Warning Hits 80M Views but METR Says Coders Got 19% Slower

GigaNectar Team

AI Reality Check · February 2026

The Viral AI Warning — What the Data Actually Says

An essay about AI and jobs crossed 80 million views in days. Here is what the claims, the counter-arguments, and the published research actually say — without the noise.

In February 2026, Matt Shumer — founder and CEO of OthersideAI, the company behind the HyperWrite AI writing platform — published an essay titled “Something Big Is Happening” on his personal website on 9 February, then shared it on X the following day. The post accumulated more than 80 million views and over 100,000 likes on X within days of going live.

Shumer, who has spent six years building AI products and investing in the sector, wrote the piece as a direct message to friends and family outside the technology industry. He compared the moment to February 2020 — when early reports of a new virus drew little widespread attention before upending daily life within weeks. “I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid,” he wrote.

His core claim: AI crossed a threshold where he could describe a software product in plain English, walk away, and return hours later to find it built — tested, functional, and requiring no corrections. “I am no longer needed for the actual technical work of my job,” he stated in the essay. Cognitive scientist and author Gary Marcus published a point-by-point response, calling the post “weaponized hype” that selectively cited data while omitting well-documented failure modes. Both views are examined below — using only the primary published data available.

80M+ views on X
100K+ likes
Published Feb 9–10, 2026
METR benchmark: 6.6 hr (50% threshold)
AI devs: 19% slower in METR RCT

Matt Shumer’s essay “Something Big Is Happening” reached 80 million views on X after being published on 9 February 2026 — Source: shumer.dev


Six Claims, Checked Against the Primary Data

Shumer’s essay made several specific claims about AI capability. Here is each one, cross-referenced against published research, benchmark data, and first-hand accounts — split into what is contested, what is verified, and what warrants caution.

Contested Claim
AI writes whole complex apps — reliably, without errors
Shumer stated he can describe an app, walk away for four hours, and return to find it built — tested, complete, with no corrections needed. Gary Marcus pointed out that no actual data was cited to support the claim of reliable, error-free app creation. Journalist Kelsey Piper, writing about her own hands-on experience with Claude Code, described it as “sometimes perfect — and other times maddening.” In one documented session, it deleted every correctly recorded phoneme file she had obtained by personally emailing an English teacher, and replaced them with AI-generated sounds that were all subtly wrong.
Contested Claim
METR’s benchmark proves AI handles all multi-hour human tasks
Shumer cited METR’s task-time benchmark as evidence of AI handling hours-long autonomous work. What the benchmark actually measures: the length of software task that a given AI model completes correctly at least 50% of the time. The benchmark covers software engineering tasks only — not law, medicine, finance, or general knowledge work. As of February 2026, GPT-5.2 at “high” reasoning effort holds the top recorded score at a 6.6-hour 50% time horizon, with a 95% confidence interval of 3 hours 20 minutes to 17 hours 30 minutes, per METR’s official release.
Verified Fact
A qualitative shift did occur in late 2024 / early 2025
Gary Marcus, despite his broader critique, acknowledged a real change: “Something happened a couple of months ago where you can truly give it a description and let it go and — sometimes! — will come out with the right answer.” METR’s published time-horizon data, released in its March 2025 paper, shows the curve for AI’s coding task performance bending sharply upward from late 2024 onward, following a period of comparatively slow growth from 2022 through mid-2024. The shift is real; the degree of reliability attributed to it is where the disagreement sits.
Verified Fact
GPT-5.2 set a record 6.6-hour time horizon on METR — at 50% success
On 4 February 2026, METR published its official assessment: GPT-5.2 with “high” reasoning effort (not “xhigh”) achieved a 50%-time-horizon of approximately 6.6 hours on METR’s expanded suite of software tasks — the highest score METR has reported to date. The 95% confidence interval spans 3 hours 20 minutes to 17 hours 30 minutes. This means GPT-5.2 completes software tasks of roughly 6.6 hours in length at a 50% success rate — not reliably or consistently, and not across all professional task types.
Key Risk
The closer AI looks right, the harder its errors are to catch
An experienced developer quoted by Marcus put it plainly: “Generally, the closer these systems are to appearing right, the more dangerous they become because people become increasingly at ease just trusting them when they shouldn’t.” Security researchers and independent audits have flagged that AI-generated code can introduce vulnerabilities that are difficult to detect in review, particularly as developers become more comfortable accepting outputs without line-by-line scrutiny. This concern sits alongside the reliability debate, not separate from it.
Verified Risk
METR’s own RCT found AI made experienced developers 19% slower
In a randomized controlled trial published by METR in July 2025, 16 experienced open-source developers were assigned tasks from their own repositories — with AI tools allowed for some tasks and not others. The result: when AI tools were allowed, developers took 19% longer to complete tasks than when they worked without AI. Notably, the same developers estimated after the study that AI had made them approximately 20% faster — the opposite of what the data recorded. Shumer’s essay did not reference this study, despite citing METR’s time-horizon benchmark.

METR’s time-horizon benchmark: GPT-5.2 recorded a 6.6-hour 50% task completion window — the highest reported to date as of February 2026. Source: metr.org

Primary source: METR.org · Epoch AI

METR’s Task Time Horizon — How the Curve Changed

This chart tracks the longest software task an AI model could complete correctly at least 50% of the time, measured in human-equivalent hours. The data reflects METR’s published retrospective curve from their March 2025 paper and the February 2026 GPT-5.2 update.

Coding tasks only
50% success threshold — not 100%
GPT-5.2: 6.6 hr | CI: 3h20m – 17h30m

Source: METR — Measuring AI Ability to Complete Long Tasks (Mar 2025) · Epoch AI METR Time Horizons · METR official GPT-5.2 announcement, Feb 4 2026.
Note: Early data points (2022–2024) are derived from METR’s retrospective modelling published March 2025, not real-time tracking.
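METR's headline figure comes from a curve fit, not a single measurement: in the March 2025 methodology, each model's pass/fail outcomes across many tasks are fit with a logistic curve over the log of the task's human completion time, and the 50% time horizon is the task length at which that fitted curve crosses 50%. A minimal sketch of that idea, using made-up illustrative data (not METR's actual dataset or code):

```python
import math

# Hypothetical (task_length_minutes, succeeded) records for one model.
# Real METR runs use large task suites; these numbers are illustrative only.
tasks = [
    (2, 1), (5, 1), (10, 1), (15, 1), (30, 1), (30, 0),
    (60, 1), (60, 0), (120, 1), (120, 0), (240, 0), (480, 0),
]

def fit_logistic(data, lr=0.1, epochs=5000):
    """Fit success ~ sigmoid(a + b * log2(minutes)) by gradient ascent
    on the log-likelihood. Returns the fitted coefficients (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for minutes, y in data:
            x = math.log2(minutes)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += (y - p)          # d(log-likelihood)/da
            grad_b += (y - p) * x      # d(log-likelihood)/db
        a += lr * grad_a / len(data)
        b += lr * grad_b / len(data)
    return a, b

a, b = fit_logistic(tasks)
# The 50% point is where a + b*log2(t) = 0, i.e. t = 2**(-a/b).
horizon_minutes = 2 ** (-a / b)
print(f"50% time horizon: {horizon_minutes:.0f} minutes")
```

Because the horizon is read off a fitted curve, it says nothing about any individual task: a model with a 6.6-hour horizon still fails roughly half of tasks at that length, which is why the 50% threshold matters so much when interpreting the headline number.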


Based on published benchmarks · Feb 2026

What Current AI Can and Cannot Do Reliably

These figures are drawn from METR’s published benchmark data and the July 2025 developer productivity RCT. They represent the state of AI coding capability as of early 2026 — not projections.

Short coding tasks (minutes-long, well-defined prompts)
~80–88%
High reliability on simple, clearly scoped tasks — consistent across recent models per METR’s benchmark history.
Multi-hour autonomous coding tasks (METR 50% threshold)
~50% at 6.6 hrs
GPT-5.2 at “high” reasoning effort — best recorded score on METR’s software task suite as of Feb 4, 2026. 95% CI: 3hr 20min to 17hr 30min.
Developer productivity gain when using AI tools (METR RCT, 2025)
–19% (slower)
METR’s July 2025 randomized controlled trial, 16 experienced open-source developers. When AI tools were permitted, developers took 19% longer on average than without them. Developers themselves estimated they were 20% faster — the opposite of the measured result.
Error-free long-form app creation without human review
No published data
No peer-reviewed study or benchmark has established reliable, end-to-end autonomous app creation without human oversight as of February 2026.

Direct statements — from primary sources only

In Their Own Words

The following quotes are taken verbatim from Shumer’s published essay, Kelsey Piper’s published article at The Argument, and the anonymous developer statement shared by Gary Marcus on his Substack.

“I am no longer needed for the actual technical work of my job. I describe what I want built, in plain English, and it just… appears. Not a rough draft I need to fix. The finished thing.”
— Matt Shumer, “Something Big Is Happening,” shumer.dev, Feb 9 2026
“Something happened a couple of months ago where you can truly give it a description and let it go and — sometimes! — will come out with the right answer. … Generally, the closer these systems are to appearing right, the more dangerous they become because people become increasingly at ease just trusting them when they shouldn’t.”
— Anonymous experienced developer, quoted by Gary Marcus, Substack, Feb 10 2026
“At one point, it deleted every single one of the phoneme files of each English sound pronounced absolutely correctly, which I had personally emailed an English teacher to secure permission to use, and replaced them with AI-generated sounds which were all subtly wrong.”
— Kelsey Piper, The Argument

How it unfolded

From Flat Curve to Viral Debate — a Timeline

The events that led to, and followed, Shumer’s essay — in chronological order, sourced from METR’s published data and contemporaneous reporting.

2022 – Mid 2024
METR’s retrospective modelling, published March 2025, showed slow and incremental growth in AI’s task-completion time horizon across this period. AI models could complete only minutes-long software tasks at roughly 50% reliability.
Late 2024 – Early 2025
Experienced developers and researchers began noticing a qualitative shift in newer AI models — the ability to take a plain-English description, run without step-by-step guidance, and occasionally produce a correct working result. METR’s curve bends sharply upward from this period.
July 2025
METR publishes its randomized controlled trial on developer productivity. Result: 16 experienced open-source developers using AI tools completed their tasks 19% slower than without them — while believing they had worked 20% faster.
4 February 2026
METR announces that GPT-5.2 with “high” reasoning effort has achieved a 50%-time-horizon of 6.6 hours on its expanded software task suite — the highest recorded figure to date. 95% confidence interval: 3hr 20min to 17hr 30min.
9 – 10 February 2026
Matt Shumer publishes “Something Big Is Happening” on his personal website on 9 February. He shares it to X the following day. The post accumulates more than 80 million views and over 100,000 likes within days.
10 – 13 February 2026
Gary Marcus publishes a point-by-point critique on Substack calling the essay “weaponized hype.” Shumer appears on CNBC’s Power Lunch on 13 February and clarifies: “I want to be very clear about this, the article wasn’t meant to scare people in this way.” Fortune had published an adapted version of the essay on 11 February.

What was covered

The Coverage, in Summary

Shumer’s essay “Something Big Is Happening,” published 9 February 2026 on shumer.dev and shared to X on 10 February, was covered across multiple outlets after accumulating more than 80 million views on the platform. The essay, Gary Marcus’s Substack response, METR’s published benchmarks — including the task-time horizon paper and the developer productivity RCT — and first-hand accounts from developers were discussed in the context of AI coding capability, reliability, and its impact on knowledge work.

The METR data for GPT-5.2, Kelsey Piper’s first-hand account of Claude Code, and the 19% slowdown figure from METR’s July 2025 RCT were among the primary data points examined. Shumer appeared on CNBC on 13 February 2026 and addressed the reaction to the piece.
