What will be the top AI model this week?
Short Answer
1. Executive Verdict
- An intense wave of competing releases, the "Great Model Rush," defines the current AI landscape.
- Claude Opus 4.6 set new benchmark highs immediately on release and features very large context windows.
- OpenAI launched GPT-5.3-Codex-Spark, strategically diversifying its hardware away from NVIDIA.
- LMArena Elo ratings reveal significant shifts in the AI competitive landscape.
- Claude Opus 4.6 holds an availability advantage over preview-status Gemini 3 Pro.
- Leaderboard updates on February 13-14 will be key market catalysts.
Who Wins and Why
| Outcome | Market probability | Model estimate | Why |
|---|---|---|---|
| claude-opus-4-6 | 4.0% | 2.9% | This model is not currently expected to emerge as the top AI model this week. |
| gemini-3-pro | 1.0% | 5.0% | This model is not currently expected to emerge as the top AI model this week. |
| claude-opus-4-6-thinking | 96.0% | 89.1% | This model is the overwhelming market favorite to be the top AI model this week. |
| gpt-5.1-high | 1.0% | 0.5% | This model is not currently expected to emerge as the top AI model this week. |
| grok-4.1-thinking | 1.0% | 0.5% | This model is not currently expected to emerge as the top AI model this week. |
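The gap between the two probability columns in the table above can be worked out directly. The sketch below assumes the first column is the market-implied probability and the second is the report's own model estimate; the source does not define either column, and the helper name `gap_pp` is hypothetical.

```python
# Probabilities (percent) copied from the table above.
# Assumption: first value = market-implied, second value = model estimate.
rows = {
    "claude-opus-4-6": (4.0, 2.9),
    "gemini-3-pro": (1.0, 5.0),
    "claude-opus-4-6-thinking": (96.0, 89.1),
    "gpt-5.1-high": (1.0, 0.5),
    "grok-4.1-thinking": (1.0, 0.5),
}


def gap_pp(market: float, model: float) -> float:
    """Model estimate minus market price, in percentage points."""
    return round(model - market, 1)


gaps = {name: gap_pp(market, model) for name, (market, model) in rows.items()}

# Print outcomes from most overpriced to most underpriced relative to the model.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:26s} {gap:+.1f} pp")
```

A negative gap means the market prices the outcome above the model estimate; a positive gap means the model estimate is higher than the market price.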
Current Context
2. Market Behavior & Price Dynamics
[Chart: Historical Price (Probability)]
3. Significant Price Movements
Notable price changes detected in the chart, along with research into what caused each movement.
📉 February 12, 2026: 17.0pp drop
Price decreased from 23.0% to 6.0%
Outcome: claude-opus-4-6
📉 February 10, 2026: 9.0pp drop
Price decreased from 20.0% to 11.0%
Outcome: claude-opus-4-6
📉 February 09, 2026: 74.0pp drop
Price decreased from 89.0% to 15.0%
Outcome: claude-opus-4-6
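The moves above are quoted in percentage points (pp), which is not the same as a relative percent change. As a minimal sketch using only the prices listed above (the function names are illustrative), both can be computed as follows:

```python
# (date, price before, price after) for claude-opus-4-6, in percent,
# copied from the movements listed above.
moves = [
    ("2026-02-12", 23.0, 6.0),
    ("2026-02-10", 20.0, 11.0),
    ("2026-02-09", 89.0, 15.0),
]


def drop_pp(before: float, after: float) -> float:
    """Absolute drop in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Drop as a fraction of the starting price."""
    return (before - after) / before


for date, before, after in moves:
    print(
        f"{date}: {drop_pp(before, after):.1f} pp, "
        f"{relative_drop(before, after):.0%} of the starting price"
    )
```

The distinction matters when comparing moves: a 17.0pp drop from 23.0% removes a larger share of the starting price than a 9.0pp drop from 20.0%.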
4. Market Data
Contract Snapshot
This market resolves to YES if a specific AI model is determined to be the "top AI model this week" and to NO if no such model is identified. It covers the current week; the source also references the year 2026. The provided content does not detail the criteria for determining the "top AI model" or any special settlement conditions.
Available Contracts
Market options and current pricing
| Outcome bucket | Yes (price) | No (price) | Implied probability |
|---|---|---|---|
| claude-opus-4-6-thinking | $0.96 | $0.05 | 96% |
| claude-opus-4-6 | $0.04 | $0.98 | 4% |
| ernie-5.0-0110 | $0.01 | $1.00 | 1% |
| gemini-3-pro | $0.01 | $1.00 | 1% |
| glm-4.6 | $0.01 | $1.00 | 1% |
| gpt-5.1-high | $0.01 | $1.00 | 1% |
| grok-4.1-thinking | $0.01 | $1.00 | 1% |
| mistral-large-3 | $0.01 | $1.00 | 1% |
| qwen3-max-preview | $0.01 | $1.00 | 1% |
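The Yes prices in the table above sum to more than $1.00, so the implied probabilities add up to more than 100%. A rough sketch of how that excess (often called the overround) can be measured and normalized away is shown below. It assumes the listed outcomes are mutually exclusive and cover every possible result, which the source does not confirm.

```python
# Yes prices (dollars) copied from the table above.
yes_prices = {
    "claude-opus-4-6-thinking": 0.96,
    "claude-opus-4-6": 0.04,
    "ernie-5.0-0110": 0.01,
    "gemini-3-pro": 0.01,
    "glm-4.6": 0.01,
    "gpt-5.1-high": 0.01,
    "grok-4.1-thinking": 0.01,
    "mistral-large-3": 0.01,
    "qwen3-max-preview": 0.01,
}

# Sum of raw implied probabilities; anything above 1.0 is the overround.
total = sum(yes_prices.values())
overround = total - 1.0

# Assumption: exactly one listed outcome resolves YES, so the raw prices
# can be rescaled to sum to 1.
normalized = {name: price / total for name, price in yes_prices.items()}

print(f"sum of Yes prices: {total:.2f} (overround {overround:.0%})")
for name, prob in normalized.items():
    print(f"{name:26s} {prob:.1%}")
```

Proportional rescaling is only one of several ways to remove the overround; it is used here because it needs no extra inputs.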
Market Discussion
Participants are debating the "top AI model this week" amid a crowded field of new releases and specialized advancements [^]. Prediction markets currently strongly favor Anthropic's `claude-opus-4-6-thinking` as the top-ranked AI model for the week ending February 14, 2026 [^]. This comes during an unprecedented "Model Rush" in February 2026, with major launches including Google's `Gemini 3 Pro GA`, OpenAI's `GPT-5.3`, xAI's `Grok 4.20`, and various Chinese models like `Qwen 3.5`. These launches have intensified competition and pushed AI capabilities in areas like agentic planning, real-time awareness, and specialized coding [^].
Beyond specific models, the debate extends to the efficacy of large, general-purpose models versus smaller, specialized AI tools, and to the societal impact of AI, particularly job displacement and ethical considerations [^]. Some discussions also anticipate innovations beyond current Large Language Models (LLMs), suggesting that LLMs are not the final form of AI technology [^].
5. Which AI Models Lead Preliminary Elo Ratings in 2026?
| Metric | Value |
|---|---|
| Claude Opus 4.6 Elo Rating | ~1490–1503 [^] |
| Gemini 3 Pro GA Elo Rating | ~1486–1492 [^] |
| GPT-5.2 Elo Rating (Incumbent) | ~1465–1473 [^] |
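To give a sense of how small these rating gaps are, the sketch below applies the standard Elo expected-score formula to the midpoint of each range in the table above. Two assumptions apply: the leaderboard scores are treated as if they sit on a conventional 400-point Elo scale, and the midpoint is used as a point estimate. Neither is stated in the source.

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a reported rating range (an assumption, not a source figure)."""
    return (low + high) / 2


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Rating ranges copied from the table above.
ratings = {
    "Claude Opus 4.6": midpoint(1490, 1503),
    "Gemini 3 Pro GA": midpoint(1486, 1492),
    "GPT-5.2": midpoint(1465, 1473),
}

leader = "Claude Opus 4.6"
for rival in ("Gemini 3 Pro GA", "GPT-5.2"):
    p = expected_score(ratings[leader], ratings[rival])
    print(f"{leader} vs {rival}: expected head-to-head score {p:.3f}")
```

Under these assumptions, a gap of a few rating points corresponds to an expected head-to-head score only slightly above 0.5, and the reported ranges for the top two models overlap.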
6. What Are the Key Adoption Barriers for Gemini 3 Pro and Claude Opus 4.6?
| Metric | Value |
|---|---|
| Gemini 3 Pro Status | Preview [^] |
| Claude Opus 4.6 Status | General Availability (GA) on February 5, 2026 [^] |
| Gemini 3 Pro Base Token Cost | Approximately 60% lower than Claude Opus 4.6 [^] |
7. What Critical Reasoning Failures Plague Claude Opus 4.6 and Gemini 3 Pro?
| Metric | Value |
|---|---|
| Claude Opus 4.6 Sabotage Hiding Success Rate | 18% [^] |
| Claude Opus 4.6 Injection Attack Success Rate | 50% [^] |
| Gemini 3 Deep Think ARC-AGI-2 Score | 45.1% [^] |
8. How Do Qwen 3.5 and GLM-5 Reshape the Open-Source LLM Landscape?
| Metric | Value |
|---|---|
| GLM-5 Open-Source Date | February 11, 2026 [^] |
| GLM-5 Parameters | 744 billion total / 40 billion active (MoE) [^] |
| GLM-5 Leaderboard Rank | #1 among open-weight models (Artificial Analysis) [^] |
9. Does LMSys Chatbot Arena Have a Data Cutoff for Markets?
| Metric | Value |
|---|---|
| Official Data Cutoff | Not officially defined by LMSys; platform operates continuously [^] |
| Leaderboard Update Frequency | Dynamic, near real-time or daily intervals [^] |
| Feb 13, 2026 Votes | Fully incorporated into Elo ratings before Feb 14 market resolution [^] |
10. What Could Change the Odds
Key Dates & Catalysts
- Expiration: February 14, 2026
- Closes: February 14, 2026
11. Decision-Flipping Events
- Trigger: Positive updates to the LM Arena Leaderboard on February 13-14, 2026 would be bullish, particularly if "Claude Opus 4.6 (thinking)" solidifies its lead or improves its ranking [^].
- Trigger: An unexpected performance leap by a new iteration of Gemini or GPT, or by an emerging competitor such as Liquid LFM 2.5, could also move the market, but the model would need to be integrated quickly and demonstrably outperform the current leaders on LM Arena [^].
- Trigger: A last-minute endorsement from a highly reputable AI research body or influential industry figure that positions a specific model as superior on LM Arena-relevant metrics could also have a significant impact [^].
- Trigger: Conversely, a decline in the LM Arena Leaderboard ranking of "Claude Opus 4.6 (thinking)" or any other leading model would be a bearish catalyst [^].
13. Historical Resolutions
Historical Resolutions: 50 markets in this series
Outcomes: 4 resolved YES, 46 resolved NO
Recent resolutions:
- KXTOPMODEL-26FEB07-QWEN3: NO (Feb 07, 2026)
- KXTOPMODEL-26FEB07-MIST: NO (Feb 07, 2026)
- KXTOPMODEL-26FEB07-GROK: NO (Feb 07, 2026)
- KXTOPMODEL-26FEB07-GPT5: NO (Feb 07, 2026)
- KXTOPMODEL-26FEB07-GPT: NO (Feb 07, 2026)