Short Answer

Both the model and the market overwhelmingly agree that the "At least 1500 score" outcome is the most likely to resolve YES, with only minor residual uncertainty at higher thresholds.

1. Executive Verdict

  • Blackwell GPUs fully allocated through mid-2026, limiting hardware access.
  • Classical AI scaling shows diminishing returns for reasoning improvements.
  • AI safety teams at Anthropic and OpenAI hold formal veto power.
  • New models like GPT-5.3-Codex-Spark and Claude 5 drive capability growth.
  • Agentic AI, large context windows, and self-verification advance capabilities.
  • Autonomous self-replication is the leading expected AI capability milestone.

Who Wins and Why

Outcome Market Model Why
At least 1500 score 100% 99.9% Sustained progress in existing AI models and widespread deployment will ensure this score.
At least 1550 score 67% 65.5% Upcoming major model updates from leading labs will likely drive significant capability improvements.
At least 1575 score 48% 47% A new generation of foundation models featuring improved reasoning capabilities will elevate the score.
At least 1600 score 25% 23% Anticipated breakthroughs in AI agentic workflows could accelerate capability growth significantly.
At least 1700 score 12% 8.5% Multiple groundbreaking advancements across diverse AI domains must occur to reach this extreme score.

Current Context

AI capabilities are advancing rapidly across multiple domains. The International AI Safety Report 2026 highlights rapid advancements in General-Purpose AI (GPAI) for mathematics, coding, and autonomous operations, with some systems achieving gold-medal performance on International Mathematical Olympiad problems and exceeding PhD-level expert performance on science benchmarks [^], [^]. Recent major developments include OpenAI's launch of GPT-5.3-Codex-Spark on February 13, 2026, and Anthropic's valuation surge to $380 billion after raising $30 billion in funding [^], [^]. Cisco also announced "breakthrough innovations" for the AI era, including new Silicon One G300 switch silicon for AI clusters [^]. Computational power for AI has grown exponentially since 2012, with the largest training runs likely surpassing 10^26 FLOP in 2025 [^]. AI adoption is widespread: approximately 88% of people use AI features, over 72% of companies have integrated AI into at least one function, and 90% of employees report significant time savings [^], [^]. The global AI market is projected to reach around $800 billion in 2026, with an expected annual growth rate of 36.6% between 2024 and 2030 [^]. Benchmarks like Humanity's Last Exam and FrontierMath are actively used to track the performance of leading models such as GPT-5 and Claude 4.5 [^], [^].
Experts express significant concerns amidst rapid AI development and regulatory efforts. Yoshua Bengio, who led the International AI Safety Report 2026, emphasized the rapid advancements and emerging risks of GPAI [^]. Several researchers from OpenAI and Anthropic have resigned, publicly voicing strong concerns about ethical risks and existential threats posed by advanced AI models, including the potential for AI to create chemical weapons without human intervention [^]. Tech investor Jason Calacanis noted the unprecedented level of concern expressed by technologists [^]. US lawmakers, including Rep. Jay Obernolte and Sen. Elizabeth Warren, have debated AI's impact on white-collar jobs, stressing the need for federal regulatory frameworks [^]. Anticipated regulatory involvement in the AI space, particularly in asset management, has been highlighted by SEC Commissioner Hester Peirce, who stresses the importance of human oversight [^]. The Colorado Artificial Intelligence Act (CAIA) is set for implementation on June 30, 2026 [^]. While the EU AI Act entered into force in August 2024, most general provisions, including rules for high-risk AI systems, will apply from August 2, 2026 [^], [^], [^], [^]. Upcoming significant events before July include the NVIDIA GTC AI Conference in March and the Databricks Data + AI Summit in June [^], [^]. Concerns include job displacement, the malicious use of AI for cyberattacks and biological/chemical threats, and the ongoing need for clear federal and international regulatory frameworks [^], [^], [^].
Debates continue regarding AI's societal impact and market stability. Ethical implications and the potential for loss of control, including manipulative content and AI systems operating beyond human oversight, are key concerns [^]. The critical reliance of AI on high-quality data has led to a growing focus on "AI-ready data programs" [^]. The unauthorized use of free AI tools by employees, termed "Shadow AI," poses risks to data privacy and national security [^]. Ensuring trust in AI systems and establishing clear accountability for AI-driven decisions remain ongoing challenges, particularly with the rise of deepfakes and autonomous AI [^]. Discussions include a potential "AI bubble" and the possibility of a market correction in 2026 if enterprise AI implementations fail to meet expectations, a prediction echoed by Numa Dhamani [^], [^], [^]. Stanford faculty predict a shift towards "AI evaluation" over "AI evangelism" in 2026, advocating for rigor in assessing AI's performance, cost, and societal impact [^]. Other emerging concerns include the potential for AI use to lead to an atrophy of critical-thinking skills and geopolitical considerations around "AI sovereignty," where countries aim to control their own AI infrastructure [^].

2. Market Behavior & Price Dynamics

Historical Price (Probability)

(Chart: outcome probability over time)
The prediction market for "AI capability growth before July?" has exhibited a clear upward trend, moving from a starting probability of 28.0% to a current price of 48.0%. This overall bullish trajectory has been characterized by extreme volatility and sharp, news-driven price movements. The market established a price range between a low of 20.0% and a peak of 64.0%. The most significant event was a massive 41.0 percentage point spike on February 8, which set the market's all-time high. This was preceded by a period of rapid fluctuation in late January, including a 15.0pp spike followed immediately by a 16.0pp drop, demonstrating high market sensitivity to conflicting news cycles.
These price swings were directly correlated with specific external events. The initial downturn to 20.0% on January 22 was a reaction to news of new state-level AI regulations, which traders likely interpreted as a potential impediment to rapid development. This was quickly reversed by a spike to 46.0% on January 28, following OpenAI's announcement of GPT-5.2's advanced capabilities. However, optimism was tempered the next day, with the price falling to 30.0% on reports of emerging development bottlenecks. The market's most dramatic movement, the surge to 64.0% on February 8, was a direct response to high-profile statements from Elon Musk concerning a strategic merger and the future of AI compute, which the market priced as a major accelerant.
The total volume of 12,101 traded contracts suggests a reasonably active market with significant participant conviction behind major price moves. The peak of 64.0% has established a clear resistance level, as the price has since retreated, indicating that traders may believe the initial euphoria from the February 8 news was overextended. The 20.0% to 30.0% range appears to act as a support zone, where negative news was priced in. Currently trading at 48.0%, the market sentiment appears to be one of cautious optimism, having consolidated significantly below its peak. This suggests traders acknowledge the powerful catalysts for growth but remain wary of underlying regulatory and technical hurdles, pricing the outcome as highly uncertain.

3. Significant Price Movements

Notable price changes detected in the chart, along with research into what caused each movement.

📈 February 08, 2026: 41.0pp spike

Price increased from 23.0% to 64.0%

Outcome: At least 1575 score

What happened: The 41.0 percentage point spike in the "AI capability growth before July?" prediction market on February 8, 2026, was primarily driven by high-profile statements from Elon Musk regarding a strategic merger and the future of AI compute [^]. News reports that day detailed Musk's move to combine SpaceX and xAI ahead of a potential mega-IPO and his prediction that space could become the cheapest place to run AI within 36 months [^]. These widely reported announcements coincided directly with the price movement and signaled a radical, accelerated path for AI scaling and capability growth by addressing fundamental infrastructure challenges [^]. Musk's statements, along with announcements of massive AI infrastructure capital expenditures by tech giants like Google and Amazon on the same day, collectively contributed to strong bullish sentiment for AI advancement [^]. Social media was therefore a primary driver, amplified by traditional news coverage of these influential statements [^].

📉 January 29, 2026: 16.0pp drop

Price decreased from 46.0% to 30.0%

Outcome: At least 1575 score

What happened: The 16.0 percentage point drop in the "AI capability growth before July? At least 1575 score" prediction market on January 29, 2026 [^], was primarily driven by traditional news and analytical reports highlighting emerging bottlenecks in AI development [^]. The most significant factor was the publication of David Shapiro's Substack article, "Why AI is slowing down in 2026," which directly addressed a "curious gap" between AI predictions and reality, citing "physical, structural, and operational friction" hindering AI acceleration [^]. This coincided with a Bloomberg Television segment on the same day, discussing investor concerns over soaring AI spending not translating into commensurate revenue growth, exemplified by Microsoft's stock slide [^]. Social media was not the primary driver; prominent posts from key figures around this date generally continued bullish AI narratives or focused on other issues [^].

📈 January 28, 2026: 15.0pp spike

Price increased from 31.0% to 46.0%

Outcome: At least 1575 score

What happened: The 15.0 percentage point spike in the "AI capability growth before July?" prediction market on January 28, 2026, was primarily driven by a convergence of significant traditional news and announcements demonstrating tangible advancements in AI capabilities [^]. Notably, OpenAI's GPT-5.2, with its "Poetic" meta-system, reportedly achieved 75% on the ARC-AGI 2 benchmark, exceeding the human average by 15 points, and was preferred over human experts in nearly three-quarters of professional tasks [^]. Concurrently, Google DeepMind published AlphaGenome in Nature, an AI tool capable of predicting multiple layers of genetic regulation with single-base-pair resolution, and the first AI-created drug, ISM8969, received FDA clearance for human clinical trials [^]. These high-impact announcements, particularly those showcasing new AI performance benchmarks and real-world applications, directly contributed to the perceived increase in AI capability [^]. Social media activity likely served as a contributing accelerant, disseminating and amplifying these significant technological breakthroughs across various platforms [^].

📉 January 22, 2026: 8.0pp drop

Price decreased from 28.0% to 20.0%

Outcome: At least 1575 score

What happened: The 8.0 percentage point drop in the "AI capability growth before July?" prediction market on January 22, 2026, was primarily driven by traditional news and announcements rather than social media activity [^]. The most significant factor appears to be a new wave of state-level AI regulations that became effective on January 1, 2026, which introduced increased compliance burdens, risk frameworks, and restrictions on AI development and deployment in states like California and Texas [^]. Concurrently, on January 22, Anthropic, a prominent AI developer, announced a "new constitution" for its Claude models, emphasizing safety, ethics, and the model's ability to refuse unethical requests, which could be interpreted by the market as a move towards a more cautious, and potentially slower, pace of capability advancement [^]. While key figures like Elon Musk made bullish predictions about long-term AI surpassing human intelligence at the World Economic Forum on January 22, these statements typically foster optimism for AI growth, contradicting the observed price drop [^]. Therefore, social media was mostly noise or irrelevant to this specific price drop [^].

4. Market Data


Contract Snapshot

The market, titled "AI capability growth before July? Odds & Predictions 2026," is structured as a series of outcome buckets: each contract resolves YES if the tracked AI capability score reaches the stated threshold (e.g., at least 1575) before July 2026, and NO otherwise. The provided content does not identify the specific leaderboard or metric used for settlement. The key date for the condition is July 1, 2026, and no special settlement conditions are mentioned.

Available Contracts

Market options and current pricing

Outcome bucket Yes (price) No (price) Implied probability
At least 1500 score $1.00 $0.01 100%
At least 1525 score $0.87 $0.18 87%
At least 1550 score $0.67 $0.36 67%
At least 1575 score $0.48 $0.54 48%
At least 1600 score $0.25 $0.79 25%
At least 1625 score $0.14 $0.89 14%
At least 1650 score $0.10 $0.91 10%
At least 1675 score $0.11 $0.94 11%
At least 1700 score $0.12 $0.95 12%

Market Discussion

Debates surrounding "AI capability growth before July" primarily center on prediction markets that gauge the likelihood of AI models achieving specific performance benchmarks by July 1, 2026 [^]. These discussions, predominantly seen on platforms like Kalshi and Coinbase, involve trading on whether an AI model will reach certain scores on leaderboards such as the LMSYS leaderboard or Text Arena [^]. For example, markets track the probability of an AI model scoring at least 1525, 1550, 1575, or 1600 before the July deadline, reflecting varying levels of market confidence in rapid advancement [^].

5. What Are the Blackwell GPU Delivery Timelines for AI Labs in H1 2026?

Blackwell Allocation: Sold out through mid-2026 [^]
HBM Supply Status: Sold out for 2026, 55-60% Q1 2026 price increase [^]
Google DeepMind GB200 Order: Over 400,000 GB200 GPUs [^]
NVIDIA Blackwell GPUs are fully allocated through mid-2026 due to extreme demand. The NVIDIA Blackwell GPU family, including custom AI accelerators, is completely allocated or "sold out" through mid-2026, driven by intense demand from prominent AI research labs such as OpenAI, Google DeepMind, and Anthropic [^]. Critical limiting factors in the supply chain include TSMC's 3nm and 4NP wafer capacity, CoWoS advanced packaging, and HBM memory [^]. The HBM supply is entirely sold out for 2026, experiencing significant price surges of 55-60% in Q1 2026 [^]. Consequently, access to Blackwell GPUs for these labs is contingent upon the allocation priority their respective cloud partners have secured with both NVIDIA and TSMC.
Major AI labs anticipate substantial Blackwell GPU deliveries during H1 2026. During the first half of 2026, OpenAI is expected to undergo a continuous, high-velocity scaling phase for its "Stargate" project, with thousands of new functional GB200 GPUs becoming available monthly [^]. Google DeepMind, having placed orders for over 400,000 GB200 GPUs, is projected to receive substantial, ongoing deliveries, bringing a significant five-figure number of new GPUs online [^]. Anthropic, facilitated by AWS, is anticipated to see a step-function increase in Blackwell compute capacity in Q1 2026, as previously delayed "Project Rainier" capacity is delivered and installed [^]. The period from January to June 2026 is forecasted to be one of the most significant and concentrated hardware deployments in computing history, fundamentally expanding global capacity for advanced AI model training [^].

6. Are AI Scaling Laws Facing Diminishing Returns for Growth?

Reasoning Benchmark Gains: Below 5% for 10x compute increase (OpenAI 'Orion', February 2026) [^]
High-Quality Data Exhaustion: Projected between 2026-2028 [^]
Multi-Model Loss Reduction: Up to 43% loss reduction compared to single models [^]
Recent research indicates that classical AI scaling now yields minimal reasoning improvements. Classical AI scaling, primarily by increasing model size, data, and compute, is encountering significant diminishing returns, particularly for advanced reasoning benchmarks. Order-of-magnitude (10x) increases in training compute now yield improvements well below 5% on difficult reasoning tasks such as GPQA and FrontierMath. This trend, supported by research findings and internal model performance data like OpenAI's 'Orion', marks a fundamental departure from previously observed predictable logarithmic gains [^].
Resource limits drive AI labs to new strategies. Theoretical frameworks identify core constraints contributing to this slowdown, notably the projected exhaustion of high-quality training data between 2026-2028 and fundamental limits in compute efficiency [^]. These technical limitations have prompted strategic pivots by major AI laboratories, with prominent figures such as Ilya Sutskever declaring the end of the 'age of scaling laws' in November 2025. Consequently, labs like Google are exploring software agents, while OpenAI investigates brain-computer interfaces.
Alternative paradigms are crucial for future AI growth. For the 'AI capability growth before July 2026' prediction market, a significant breakthrough driven solely by classical scaling appears highly improbable. Instead, future progress is more likely to emerge from alternative scaling paradigms, such as multi-model collaboration, which demonstrates significant loss reduction (up to 43%) [^], or heterogeneous orchestration on edge devices, offering 2-5x gains in intelligence-per-watt [^]. These approaches represent a less predictable, innovation-driven path to continued capability growth.

7. Do AI Safety Teams at Anthropic and OpenAI Possess Veto Power?

Anthropic Veto System: 'Pause' mechanism, part of Responsible Scaling Policy [^]
OpenAI Veto System: 'Guardrail Committee' under Preparedness Framework [^]
Anthropic Veto Override: Requires 70% supermajority vote to override [^]
Both Anthropic and OpenAI implement formal safety veto mechanisms. Their internal AI safety and alignment teams have been formally empowered with 'red light' or veto authority, enabling them to delay or halt the public release of next-generation flagship models. Anthropic utilizes a 'Pause' mechanism as part of its Responsible Scaling Policy (RSP) [^], while OpenAI employs a 'Guardrail Committee' operating under its Preparedness Framework [^]. These mechanisms have already resulted in documented deployment delays for frontier models, demonstrating their real-world impact on development timelines.
Anthropic's 'Pause' mechanism prioritizes safety with a supermajority override. Designed as a default-to-safe system, it requires a 70% supermajority vote from a dedicated safety board to override a safety-triggered halt [^]. This stringent protocol led to a multi-month delay for Claude 4.1 in 2025 after the model exceeded predefined safety thresholds related to autonomous replication [^].
OpenAI's Guardrail Committee provides a pre-deployment safety gate. This committee operates on a 51% simple majority vote to determine model readiness [^]. This process resulted in a three-week delay for GPT-6.1 in Q4 2025, following the identification of critical alignment failures, particularly those concerning the model's persuasive capabilities [^][^].

8. Why Are Public AI Benchmarks No Longer Reliable Indicators?

Direct Contamination Efficacy: High for verbatim solutions, but conceptual contamination remains problematic (ARC Evals) [^]
Public Benchmark Saturation: Near-perfect scores on established benchmarks (e.g., 100% on MATH, high-80s on SWE-bench) [^]
Public vs. Private Performance Drop: 85% public score can fall to 35-50% on private suites [^]
Conceptual contamination challenges the integrity of public AI model evaluations. This phenomenon occurs when models learn patterns from extensive web data conceptually similar to benchmark problems, even if direct verbatim data leakage has largely been resolved. Consequently, the high scores often achieved by frontier models on established public benchmarks may signify a saturation of learned patterns rather than genuine novel problem-solving ability [^].
Significant performance discrepancies emerge between public and private model evaluations. A stark and consistent performance drop is observed when these high-scoring models are tested on private, real-world evaluation suites. For instance, a model achieving 85% on a public benchmark might see its performance fall dramatically to between 35% and 50% when deployed in a private suite designed to mirror complex enterprise environments [^]. This gap highlights that public benchmarks frequently do not adequately capture real-world complexity, environmental robustness, or crucial usability and integration challenges [^].
The performance gap drives enterprises into a post-benchmark evaluation era. This significant disparity between public and private performance has led enterprises to shift from seeking a single "best" model to designing "best fit" AI systems, often comprising multiple models. The focus has moved towards holistic, system-level evaluation against business-centric Key Performance Indicators (KPIs), such as First-Contact Resolution or Customer Satisfaction, rather than relying on isolated academic benchmark scores as the true measure of AI value and capability [^].

9. Which AI Capability Milestone Is More Likely Before July 2026?

RepliBench Success (Summer 2025): Over 60% of 86 subtasks [^]
RepliBench Success (2023): Less than 5% of tasks [^]
Controlled Replication Success (Qwen2.5): 90% in simplified environments [^]
A significant demonstration of autonomous self-replication is the leading AI milestone expected by July 2026. This capability involves an AI system independently executing the complete sequence of actions necessary to create a functional, and potentially improved, copy of itself on new infrastructure without human intervention [^]. Such an achievement, even when confined to a benchmark, would represent an unambiguous and profound qualitative leap in agentic capability [^].
Progress towards AI self-replication is rapidly accelerating, evidenced by recent benchmarks and academic demonstrations. The UK AISI’s RepliBench, for instance, saw frontier models like Google DeepMind’s Gemini 3 and Anthropic’s Claude 4.x achieve over 60% success on its 86 subtasks by summer 2025, a substantial increase from less than 5% in 2023 [^]. Furthermore, controlled academic demonstrations show models such as Qwen2.5-72B attaining 90% success in highly simplified environments [^]. This rapid, quantifiable progress within a self-contained digital domain positions self-replication as a more immediate frontier compared to milestones requiring external validation [^].
Autonomous scientific discovery faces inherent delays from external validation processes. Achieving a novel, publishable scientific breakthrough necessitates validation by external experts, a process inherently prolonged by human peer review and experimental replication [^]. In contrast, a breakthrough in self-replication, being discrete and measurable, is less dependent on protracted external validation cycles [^]. While AI labs are proceeding with extreme caution due to safety risks, full-chain success on benchmarks like RepliBench is considered highly probable and would signify a new level of autonomous capability [^].

10. What Could Change the Odds

Key Catalysts

Significant advancements in AI capabilities are anticipated through new model releases and technological breakthroughs. OpenAI's GPT-5.3-Codex-Spark, an ultra-fast coding model, was released on February 12, 2026 [^], with Anthropic's Claude 5 and Google's Gemini 3.0 expected in early 2026. Global competition is intensifying with new models from Alibaba Cloud and Zhipu. The industry is also seeing a proliferation of agentic AI, enabling autonomous task execution, along with breakthroughs in context windows and self-verification. True multimodal generative AI, AI in scientific discovery, physical AI, robotics, and Edge AI are also driving capability growth. Key industry events such as NVIDIA GTC [^] in March and The AI Summit London in June [^] are expected to showcase further advancements.
Conversely, several factors could hinder rapid AI capability growth. Increased regulatory scrutiny is a major concern, with the EU AI Act largely in effect [^] and India's revised IT Rules, mandating AI-generated content labeling, coming into force on February 20, 2026 [^]. The 'black box' problem and legal challenges to state laws also create uncertainty. Ethical concerns and public backlash are rising, evidenced by high-profile safety departures from major AI companies in February 2026 [^] and incidents like Grok's image controversy. Broader societal impacts, including job displacement, stock market volatility, environmental impact, and copyright disputes, continue to fuel apprehension and could lead to stricter controls or reduced investment.

Key Dates & Catalysts

  • Expiration: July 01, 2026
  • Closes: July 01, 2026

11. Decision-Flipping Events

  • Trigger: Release of next-generation frontier models, with Anthropic's Claude 5 and Google's Gemini 3.0 expected in early 2026 on the heels of OpenAI's GPT-5.3-Codex-Spark [^].
  • Trigger: Intensifying global competition, including new models from Alibaba Cloud and Zhipu.
  • Trigger: Proliferation of agentic AI enabling autonomous task execution, alongside breakthroughs in context windows and self-verification.

12. Historical Resolutions

Historical Resolutions: 6 markets in this series

Outcomes: 3 resolved YES, 3 resolved NO

Recent resolutions:

  • KXAISPIKE-26-1600: NO (Jan 01, 2026)
  • KXAISPIKE-26-1550: NO (Jan 01, 2026)
  • KXAISPIKE-26-1500: NO (Jan 01, 2026)
  • KXAISPIKE-26-1400: YES (Feb 18, 2025)
  • KXAISPIKE-26-1375: YES (Jan 24, 2025)