Top AI model this week?
Short Answer
1. Executive Verdict
- Alibaba Cloud's Qwen-3-Max achieved a record MMLU benchmark score.
- Major competitors such as DeepMind or Meta could release a new model before the market settles.
- New regulatory risks and semantic evasion exploits threaten the incumbent leader.
- Market resolution relies exclusively on the crowdsourced LM Arena Leaderboard.
Who Wins and Why
| Outcome | Market probability | Model probability | Why |
|---|---|---|---|
| claude-opus-4-6 | 88.0% | 83.3% | Maintains market lead despite new competitor MMLU scores and potential new model releases. |
| claude-opus-4-6-thinking | 14.0% | 12.9% | Maintains strong position despite new competitor MMLU scores and potential new model releases. |
| ernie-5.0-0110 | 1.0% | 0.5% | Faces strong competition from the market leader and risk of unannounced model releases. |
| dola-seed-2.0-preview | 1.0% | 0.5% | Faces strong competition from the market leader and risk of unannounced model releases. |
| gemini-3-pro | 1.0% | 0.5% | Faces strong competition from the market leader and risk of unannounced model releases. |
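The gap between the two probability columns can be made explicit with a little arithmetic. The following is a minimal sketch, assuming the second column is the market-implied probability and the third is the report's own model estimate; the helper name `gap_pp` is hypothetical and the figures are copied from the table above.

```python
# Market-implied vs. model-estimated probabilities (percent), from the table above.
rows = {
    "claude-opus-4-6": (88.0, 83.3),
    "claude-opus-4-6-thinking": (14.0, 12.9),
    "ernie-5.0-0110": (1.0, 0.5),
    "dola-seed-2.0-preview": (1.0, 0.5),
    "gemini-3-pro": (1.0, 0.5),
}


def gap_pp(market: float, model: float) -> float:
    """Model estimate minus market price, in percentage points (hypothetical helper)."""
    return round(model - market, 1)


gaps = {name: gap_pp(market, model) for name, (market, model) in rows.items()}

for name, gap in gaps.items():
    # A negative gap means the model estimate sits below the market price.
    print(f"{name:28s} {gap:+.1f} pp")
```

Under these assumptions every listed outcome has a negative gap, meaning the model estimate is below the market price in each row.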
2. Market Behavior & Price Dynamics
Historical Price (Probability)
3. Significant Price Movements
Notable price changes detected in the chart, along with research into what caused each movement.
📈 February 20, 2026: 16.0pp spike
Price increased from 73.0% to 89.0%
Outcome: claude-opus-4-6
📈 February 19, 2026: 62.0pp spike
Price increased from 10.0% to 72.0%
Outcome: claude-opus-4-6
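The "pp" figures above are percentage-point differences, not relative changes. A short sketch of the distinction, using only the before and after prices quoted above (the function names are illustrative, not part of any market API):

```python
def pp_change(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return round(after - before, 1)


def relative_multiple(before: float, after: float) -> float:
    """How many times larger the new price is than the old one."""
    return round(after / before, 2)


# (date, price before, price after) in percent, as listed above.
moves = [
    ("February 19, 2026", 10.0, 72.0),
    ("February 20, 2026", 73.0, 89.0),
]

changes = {date: pp_change(before, after) for date, before, after in moves}
multiples = {date: relative_multiple(before, after) for date, before, after in moves}

for date in changes:
    print(f"{date}: {changes[date]:+.1f} pp, x{multiples[date]} relative")
```

The February 19 move is the larger one on both measures: 62.0 pp in absolute terms and a 7.2x multiple of the starting price.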
4. Market Data
Contract Snapshot
The contract page does not spell out specific criteria for a YES or NO resolution. The market asks "Top AI model this week?" but does not define what constitutes the "Top AI model" or how it would be measured. The only key date mentioned is "2026," with no further deadlines or special settlement conditions specified.
Available Contracts
Market options and current pricing
| Outcome bucket | Yes (price) | No (price) | Implied probability |
|---|---|---|---|
| claude-opus-4-6 | $0.88 | $0.13 | 88% |
| claude-opus-4-6-thinking | $0.14 | $0.87 | 14% |
| dola-seed-2.0-preview | $0.01 | $1.00 | 1% |
| ernie-5.0-0110 | $0.01 | $1.00 | 1% |
| gemini-3-pro | $0.01 | $1.00 | 1% |
| glm-4.6 | $0.01 | $1.00 | 1% |
| gpt-5.1-high | $0.01 | $1.00 | 1% |
| grok-4.1-thinking | $0.01 | $1.00 | 1% |
| mistral-large-3 | $0.01 | $1.00 | 1% |
| qwen3-max-preview | $0.01 | $1.00 | 1% |
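The "Yes" prices in the table sum to more than $1.00, so the raw implied probabilities overstate each outcome slightly. The sketch below normalizes them, assuming exactly one outcome resolves YES and that the quoted prices are comparable (for example, all asks); neither assumption is stated on the page.

```python
# "Yes" prices in cents, copied from the Available Contracts table above.
yes_cents = {
    "claude-opus-4-6": 88,
    "claude-opus-4-6-thinking": 14,
    "dola-seed-2.0-preview": 1,
    "ernie-5.0-0110": 1,
    "gemini-3-pro": 1,
    "glm-4.6": 1,
    "gpt-5.1-high": 1,
    "grok-4.1-thinking": 1,
    "mistral-large-3": 1,
    "qwen3-max-preview": 1,
}

total_cents = sum(yes_cents.values())
# Excess over a fair book, in which the prices would sum to exactly 100 cents.
overround_cents = total_cents - 100

# Rescale so the probabilities sum to 1 (assumes mutually exclusive outcomes).
normalized = {name: cents / total_cents for name, cents in yes_cents.items()}

print(f"Sum of Yes prices: {total_cents} cents (overround {overround_cents} cents)")
for name, prob in normalized.items():
    print(f"{name:28s} {prob:.1%}")
```

With these inputs the prices sum to 110 cents, and the leader's normalized probability comes out at 80% rather than the raw 88%.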
Market Discussion
Discussions around the "Top AI model this week" in February 2026 largely center on the performance and capabilities of Anthropic's Claude Opus 4.6, Google's recently announced Gemini 3.1 Pro, and OpenAI's GPT-5.3-Codex, with benchmarks heavily scrutinizing their coding, reasoning, and multimodal abilities [^]. Beyond raw scores, debates are shifting towards how well these models fit specialized enterprise tasks, the emergence of more autonomous and cost-efficient "agentic" AI, and the critical need for robust governance as AI systems increasingly act as independent workers [^]. Prediction markets indicate strong sentiment for Claude Opus 4.6 as the leading AI model for the current week [^].
5. What AI Performance Metric Drove Qwen-3-Max Market Fluctuation?
| Qwen-3-Max MMLU Score | 95.7% (Alibaba Cloud, February 18, 2026 [^]) |
|---|---|
| Verified MMLU Score | 95.5% ± 0.2% (Stanford AI Lab, February 20, 2026 [^]) |
| Prediction Market Odds | Shifted from 10% to 89% (February 19-20, 2026) [^] |
6. What is the Likelihood of an AI Model Leapfrog Before February 21?
| LMArena Leader | Claude Opus 4.6 (Elo 1505) [^] |
|---|---|
| Intelligence Index Tier | Gemini 3.1 Pro Preview (Joint highest) [^] |
| Claude Opus Specialization | Leads in SWE-bench (64.8% to 80.8%) [^] |
7. What Risks Threaten the Leading AI Model's Top Status?
| EU AI Act HRAI Enforcement | August 2, 2026 [^] |
|---|---|
| AI Governance Market Projection | USD 20 billion in 2026 [^] |
| RLI Exploit Success Rate | Over 80% (CARC) [^] |
8. How Does Kalshi Determine The Top AI Model Rankings?
| Primary Resolution Source | LMSYS Org's LM Arena Leaderboard [^] |
|---|---|
| Key Resolution Metric | Rank (UB) on LM Arena [^] |
| Required Leaderboard Setting | 'Remove Style Control' toggle enabled [^] |
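To illustrate how a "Rank (UB)" style metric behaves, here is a minimal sketch. It assumes the definition commonly given for the leaderboard: a model's rank is one plus the number of models whose lower confidence bound exceeds that model's upper bound. That definition, the `Entry` type, and every score below are assumptions for illustration, not data from this market or from the live leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    lower: float  # lower bound of the score's confidence interval
    upper: float  # upper bound of the score's confidence interval


def rank_ub(entries: list[Entry]) -> dict[str, int]:
    """Rank = 1 + number of models that are statistically better (assumed definition)."""
    ranks = {}
    for entry in entries:
        better = sum(
            1
            for other in entries
            if other.name != entry.name and other.lower > entry.upper
        )
        ranks[entry.name] = 1 + better
    return ranks


# Made-up scores, chosen so that the top two intervals overlap.
board = [
    Entry("model-a", 1498, 1512),
    Entry("model-b", 1490, 1504),
    Entry("model-c", 1470, 1484),
]

ranks = rank_ub(board)
print(ranks)
```

In this made-up example the two leading models share rank 1 because their intervals overlap, while the third model is rank 3. How such ties would be broken for settlement is not stated in this report.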
9. When Do Weekly Rankings Become Functionally Locked for Resolution?
| Potential Resolution Delay | Up to two weeks post-period [^] |
|---|---|
| Hard Submission Deadline Example | January 20, 2026 (for Late-Breaking Science) [^] |
| Roster Lock Cooldown Example | February 21, 2026 (NACL Qualifier) [^] |
10. What Could Change the Odds
Key Dates & Catalysts
- Expiration: February 21, 2026
- Closes: February 21, 2026
11. Decision-Flipping Events
With settlement due on February 21, 2026, at 3:00 PM UTC, pre-scheduled events capable of drastically altering the "Top AI model this week?" prediction market are highly unlikely to occur or become publicly known in time. Major AI developments typically have longer lead times for announcement and public dissemination, so any significant market movement would hinge on unforeseen, impactful breaking news within the next 24-36 hours.
- Trigger: Potential bullish catalysts that could push a "YES" price higher include the unexpected release of a superior AI model from a major lab, a performance announcement with significantly improved benchmark results, or major adoption of a specific model by a key player.
13. Historical Resolutions
Historical Resolutions: 50 markets in this series
Outcomes: 4 resolved YES, 46 resolved NO
Recent resolutions:
- KXTOPMODEL-26FEB14-CLAUT: YES (Feb 14, 2026)
- KXTOPMODEL-26FEB14-QWEN: NO (Feb 14, 2026)
- KXTOPMODEL-26FEB14-MIST: NO (Feb 14, 2026)
- KXTOPMODEL-26FEB14-GROK: NO (Feb 14, 2026)
- KXTOPMODEL-26FEB14-GPT: NO (Feb 14, 2026)
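The counts and tickers above lend themselves to two small calculations: the historical YES rate across the series, and splitting a ticker into its parts. The sketch below assumes the ticker layout is `SERIES-YYMONDD-OUTCOME`, which is inferred from the examples listed here rather than from any official specification; `parse_ticker` is a hypothetical helper, and the month parsing relies on an English locale.

```python
from datetime import date, datetime


def parse_ticker(ticker: str) -> tuple[str, date, str]:
    """Split a ticker such as KXTOPMODEL-26FEB14-CLAUT (layout is an assumption)."""
    series, date_code, outcome = ticker.split("-")
    # "26FEB14" -> two-digit year, abbreviated month name, day of month.
    settled = datetime.strptime(date_code, "%y%b%d").date()
    return series, settled, outcome


# Resolution counts as stated in the summary above.
resolved_yes, resolved_no = 4, 46
yes_rate = resolved_yes / (resolved_yes + resolved_no)

series, settled, outcome = parse_ticker("KXTOPMODEL-26FEB14-CLAUT")
print(f"{series} / {settled.isoformat()} / {outcome}")
print(f"Historical YES rate: {yes_rate:.0%}")
```

The 8% YES rate mostly reflects how the series is structured, with many outcome contracts per week and few winners, so it says little about any single model's chances.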