The Demise of Sora: Pathways for LLMs and AGI
Original article (in Chinese): “The Death of Sora: Pathways for LLMs and AGI” (“Sora之死，LLM与AGI路径”), March 25
Video AI model Sora has been taken down, along with its developer API, the video generation feature integrated into ChatGPT, and even the planned $1 billion joint venture with Disney. Many were caught off guard, never imagining that this video generation model, which had achieved a level of renown on a par with ChatGPT, would be handed a death sentence so swiftly.
When Sora was first unveiled, its photorealistic effects were nothing short of stunning. Its cinematic clips went viral across social media, sparking a sharing frenzy and fueling curiosity and hyperbole; people said it had the potential to upend Hollywood. In cost terms, AI video is a token sink: a significant portion of growth projections for the token economy, computing infrastructure, and GPU chip demand is predicated on the rise of AI video. Yet, in the end, Sora proved to be nothing more than an exorbitantly expensive AI showcase.
Sora’s Dual Failure: Financial and Strategic
Sora’s demise was not the result of a proactive technical abandonment, but rather a reactive contraction triggered by financial collapse. At the end of September 2025, Sora launched as a standalone app; within days, it topped the charts in major app stores, peaking at 3.3 million downloads. However, by February 2026, that figure had plummeted to just 1.1 million. Even more critically, in-app purchase revenue amounted to a mere $2.1 million, while OpenAI’s daily expenditure on AI video generation soared to $15 million—projecting a potential annual cost exceeding $5 billion. Bill Peebles, the head of the Sora project, publicly admitted that Sora’s current economic model is completely unsustainable.
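Taking the reported figures at face value, a back-of-the-envelope check shows the scale of the mismatch. This is a sketch using only the numbers quoted above; since the period covered by the revenue figure is not specified, it is compared directly against the daily burn rate rather than annualized:

```python
# Sanity check of the reported Sora economics, using only the figures above.
daily_cost = 15_000_000   # reported daily spend on AI video generation, USD
iap_revenue = 2_100_000   # reported in-app purchase revenue, USD

annualized_cost = daily_cost * 365
print(f"Annualized cost: ~${annualized_cost / 1e9:.2f}B")  # roughly $5.5B

# The entire reported revenue covers only a few hours of compute at that burn rate.
hours_covered = iap_revenue / daily_cost * 24
print(f"Revenue covers ~{hours_covered:.1f} hours of daily burn")
```

The annualized figure lands just where the article's "exceeding $5 billion" projection does, which is what makes the "completely unsustainable" verdict hard to argue with.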
Three fatal flaws have left the independent commercialization of video generation models with virtually no viable path forward: First, the computational power required is immense—a hurdle unlikely to be overcome in the short term. Second, the potential for “deepfakes” poses significant security and legal risks; governments worldwide are imposing strict regulations, with China, for instance, having already enacted specific measures mandating the labeling and authorized generation of AI video content. Third, unresolved copyright issues regarding training data remain a major stumbling block. This also led ByteDance to suspend the global launch of its own video model, Seedance 2.0. In a twist of irony, just one day before announcing Sora’s shutdown, OpenAI had released a set of safety protocols for video content, a somewhat helpless acknowledgment of the inevitable.
The true frontrunners in AI video, Google’s Veo 3 and ByteDance’s Seedance 2.0, are now left to vie for the consumer video crown. The prevailing paradigm embeds video capabilities inside multimodal models: Veo resides within Gemini, Seedance within Doubao, and Wan within Alibaba’s Qwen. With Sora’s failure, the experiment of developing AI video as a standalone product has effectively reached a provisional conclusion.
OpenAI CEO Sam Altman has informed employees that discontinuing Sora will free up resources to be allocated toward the next generation of AI models. OpenAI’s strategic pivot is now crystal clear: leveraging the Codex application in conjunction with ChatGPT and its browser interface to construct a “desktop super-app,” thereby refocusing its core efforts on enterprise and developer clients. Anthropic has already secured a significant head start in this arena.
Is OpenAI “Returning” to Language Models?
On the surface, Sora’s failure suggests that OpenAI is reverting to the LLMs where it started out. Yet while OpenAI pioneered the language-model path toward AGI, it is Anthropic, founded by defectors from OpenAI, that has remained steadfast on that path and may well have already paved it.
However, toward the end of last year, the AI community was briefly permeated by a wave of criticism directed at LLMs, which subsequently sparked a surge of enthusiasm for “World Models.” For years, Turing Award laureate Yann LeCun has consistently argued that LLMs, being based solely on the singular form of knowledge that is text, deceive us into believing they possess intelligence. In reality, their understanding of the physical world remains extremely superficial. He has characterized LLMs as merely an “exit ramp,” a “distraction,” or a “dead end” on the long road toward achieving human-level intelligence. His arguments hold considerable weight: although LLMs are capable of passing bar exams and solving mathematical equations, we have yet to develop a household robot that can rival the capabilities of a common cat; after all, language is merely a serialized representation of thought—a relatively low-dimensional and discrete conceptual space.
In late 2025, Yann LeCun departed Meta to establish AMI Labs, a venture dedicated to research on World Models, which has already secured funding at a valuation of 3 billion euros. During his keynote address at NVIDIA’s GTC conference, he explicitly stated: “Simply scaling up [models] will not enable us to achieve AGI.” However, Yann LeCun’s critique suffers from a critical blind spot: his target is the static, closed, and purely text-based predictive LLM—not the *agentic* LLM embedded within an action-perception-feedback loop. Fundamentally, these are two vastly different systems. The failure of Sora serves as concrete proof that multimodal expansion is not the key to solving the puzzle; conversely, the rise of coding agents is responding to his challenge in an entirely different manner.
Anthropic: Coding Represents the Embodiment of LLMs in the Symbolic World—A Step Toward AGI
Anthropic has consistently adhered to a language-model-centric approach; rather than engaging directly with images and video internally, it relies on calling upon external applications or skills to execute such tasks. Claude Code even adopted a retro style by utilizing a Command Line Interface (CLI). While outsiders once viewed this as a conservative stance, in retrospect, it appears to have been prescient.
On SWE-bench Verified (a benchmark evaluating real-world software engineering problems from GitHub), Claude Opus 4.5 achieved a score of 80.9%, surpassing both GPT-5.1 and Gemini 3 Pro. Yet, this represents more than just an engineering feat; it is underpinned by a profound cognitive logic:
Code constitutes a formalized world characterized by “ground truth” feedback—compilers do not lie; tests either pass or fail, providing unambiguous feedback rooted in reality; and the debugging process compels the model to undergo a complete cycle of perception, action, observation, and correction. This serves as a direct, albeit partial, rebuttal to Yann LeCun’s core accusation—that LLMs lack perceptual feedback and are devoid of anchors within the physical world. Through the utilization of tools and the execution of code, coding agents achieve a form of *embodiment* within the symbolic realm.
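This perception-action-observation-correction cycle can be sketched as a minimal Python loop. The sketch is illustrative, not any vendor's actual agent: `propose_fix` is a hypothetical stand-in for the model call, and the "world" is simply a subprocess running a candidate script whose exit code supplies the ground-truth feedback.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

def run_candidate(source: str) -> tuple[bool, str]:
    """Action + perception: actually execute the code and observe real feedback."""
    path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
    path.write_text(source)
    proc = subprocess.run([sys.executable, str(path)],
                          capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr  # the interpreter does not lie

def propose_fix(source: str, error: str) -> str:
    """Hypothetical stand-in for an LLM call that patches the reported bug."""
    return source.replace("range(1, n)", "range(1, n + 1)")

# A buggy candidate: meant to sum 1..n inclusive, but stops one short.
candidate = textwrap.dedent("""
    def total(n):
        return sum(range(1, n))
    assert total(4) == 10, f"got {total(4)}"
""")

ok = False
for _ in range(3):                              # bounded correction loop
    ok, stderr = run_candidate(candidate)       # act, then observe
    if ok:
        break
    candidate = propose_fix(candidate, stderr)  # correct, then retry
```

The design point is the loop itself: the agent's belief about the code is forced through an external check it cannot talk its way past, which is exactly the anchoring LeCun argues pure text prediction lacks.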
Claude Opus 4.5 also demonstrated an even more remarkable capability: the capacity to autonomously refine its own performance. It reached peak performance within just four iterations—a level of quality that other models failed to attain even after ten iterations—and exhibited the ability to learn from experience across different tasks, retaining insights and applying them in subsequent contexts. This is not merely a matter of scale; rather, it represents a novel capability that has *emerged* through the agentic framework, all while the underlying architecture remains unchanged.
Empirical data corroborates this accelerating trend: the length of tasks that autonomous AI agents can complete is currently doubling every 4 to 7 months (a pace that has recently accelerated to approximately every 4 months). At a 30-minute horizon, an agent can autonomously complete code snippets; at roughly 4.8 hours, it can refactor an entire software module; and at a horizon of multiple days, it can automate a complete code audit. This represents an emergent capability curve entirely distinct from the scaling laws observed in pure text-based models.
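Taken at face value, those numbers trace a simple exponential. A minimal sketch of the extrapolation follows; the 30-minute starting horizon and 4-month doubling period are the figures quoted above, while the assumption of clean, uninterrupted exponential growth is of course illustrative:

```python
# Illustrative extrapolation: horizon(t) = h0 * 2 ** (t / doubling_period).
h0_hours = 0.5           # ~30-minute autonomous task horizon today
doubling_months = 4      # the faster of the reported 4-to-7-month doubling times

def horizon(months_ahead: float) -> float:
    return h0_hours * 2 ** (months_ahead / doubling_months)

for months in (0, 12, 24):
    print(f"+{months:2d} months: ~{horizon(months):g} hours")
# Under these assumptions, a 30-minute horizon grows to ~4 hours in one year
# and ~32 hours (multi-day work) in two.
```

Even this toy curve shows why the doubling period matters more than the starting point: shifting from a 7-month to a 4-month doubling time compresses years of projected progress into months.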
OpenAI’s Realization and Reorganization
OpenAI, too, has come to a realization: coding is precisely the pathway to achieving AGI within the cognitive domain. It articulated the rationale behind the launch of Codex as follows: everything is governed by code; the more adept an agent becomes at reasoning about and generating code, the more capable it will be across all forms of technical and knowledge-intensive work.
Ethan Mollick, a professor at the Wharton School who studies AI and innovation, has observed a divergence across three distinct strategic paths: Anthropic focuses exclusively on language models; OpenAI constantly experiments with, and subsequently discards, various concepts; while Google attempts to do a little bit of everything. He remains unsure which of these endgames will ultimately prevail. However, reality has already provided an answer: currently, GPT, xAI, and China’s open-source models are all emulating Anthropic’s approach, placing increased emphasis on coding, intelligent agents, and enterprise services. The rivalry between Codex and Claude Code has emerged as the central battleground in the current AI race.
Prior to the discontinuation of Sora, OpenAI had already launched the Codex application and completed the pre-training of its next model, “Spud,” which is slated for release within a matter of weeks. OpenAI’s leadership structure has also undergone adjustments: Fidji Simo’s title has shifted from “CEO of Applications” to “CEO of AGI Deployment.” Furthermore, the Alignment and Safety teams will no longer report directly to Altman. All of this seems to suggest that OpenAI believes it has—from a technical standpoint—already achieved AGI; the Spud model is expected to serve as the definitive marker of this milestone—signifying an AI capable of generating economic value within the cognitive domain at a level equivalent to that of humans. Its next objective will be to realize AI capabilities within the physical world. Bill Peebles, who rose to sudden prominence thanks to his work on Sora, will now lead his team in pivoting toward robotics projects.
The core challenge of the present moment has shifted: it is no longer a question of what intelligent agents are capable of doing, but rather how humans can effectively command, supervise, and collaborate with them at scale. Key components of systems such as Nvidia’s Rubin CPX and Groq’s 3-series inference chips were originally designed with video generation in mind; consequently, OpenAI’s decision to scrap its video projects may also affect its projections of the computing infrastructure it will require.
Revisiting the Pathways for LLMs and AGI
The withdrawal of Sora and the launch of Codex collectively point toward a clearer theoretical landscape:
Yann LeCun was right on one count: pure autoregressive text prediction cannot lead to general intelligence capable of navigating the physical world. Sora’s failure also served as a market-level test, and refutation, of the strategy of reaching general intelligence through the brute-force scaling of multimodal models.
Anthropic was right on another count: LLMs should not be viewed as closed systems, but rather as a “cognitive substrate.” When embedded within an infrastructure that facilitates tool use, memory management, multi-agent collaboration, and code execution, the emergent capabilities of language models can transcend the limitations identified by LeCun. This breakthrough occurs not because the underlying architecture has changed, but because the structure of the information flow has been fundamentally altered.
Underlying this perspective is a specific technical-philosophical premise: Code—as the most formalized and verifiable expression of human cognition—serves as the critical bridgehead enabling LLMs to transcend the boundaries of mere perception. An AI capable of autonomously writing, testing, debugging, and optimizing tools is, in effect, demonstrating a form of causal understanding, even if that understanding manifests within a symbolic space rather than the physical world. Within circles focused on achieving AGI, a long-standing hypothesis is now finding preliminary validation: the entity that possesses the most advanced coding model will be the one to reach AGI the fastest.
The clamor surrounding the notion that AI video will “disrupt Hollywood” can, for the time being, be put to rest. Hugging Face co-founder Clément Delangue argues that simply discarding Sora would be a regrettable waste; far better, he suggests, to open-source the technology as a contribution to the broader AI video community. Many others, however, view Sora’s exit as a positive development: it consumed vast resources, and its primary output was a potent tool for generating “deepfakes”, a technology whose societal impact appears far more negative than positive.
This situation leaves all AI technology companies facing a fundamental, probing question: Is everything we create with AI, simply by virtue of being “state-of-the-art”, necessarily valuable and beneficial to society?


