
Meta Muse Spark: A New Frontier in Multimodal Reasoning
Meta's Superintelligence Labs has unveiled Muse Spark, its first public AI model and a significant departure from the Llama family. This natively multimodal reasoning model introduces multi-agent orchestration and demonstrates strong capabilities in health reasoning and visual understanding.
What is Muse Spark?
Muse Spark is a ground-up rebuild from Meta's new Superintelligence Labs, formed in mid-2025 with Alexandr Wang as chief AI officer. Unlike previous Llama models that stitched vision onto language backbones, Muse Spark was architected from the start to process text, images, and voice simultaneously.
Key features:
- Natively multimodal: Processes visual and textual data in the same latent space
- Tool-use support: Integrates with external tools and APIs
- Visual chain-of-thought: Generates annotations on complex visual inputs
- Contemplating mode: Multi-agent orchestration for enhanced reasoning
Performance Highlights
Muse Spark shows competitive performance across key benchmarks:
| Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Claude Opus 4.6 |
|---|---|---|---|---|
| HealthBench Hard | 42.8 | 40.1 | 20.6 | N/A |
| CharXiv Reasoning | 86.4 | 82.8 | 80.2 | N/A |
| MMMU-Pro | 80.5 | N/A | 82.4 | N/A |
| MedXpertQA | 78.4 | 77.1 | 81.3 | N/A |
| ARC-AGI-2 | 42.5 | 76.1 | 76.5 | N/A |
| SWE-Bench Verified | 77.4 | 57.7 | N/A | 80.8 |
Notable strengths:
- Health reasoning: Leads with 42.8 on HealthBench Hard
- Visual reasoning: Tops CharXiv Reasoning at 86.4
- Token efficiency: Achieves similar performance with ~50% fewer output tokens
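To make the comparison concrete, the short Python sketch below recomputes, from the table's figures, which model leads each benchmark and how far Muse Spark sits from that leader. The `SCORES` dictionary and the helper names `leader` and `gap_to_leader` are illustrative choices, not part of any Meta tooling; `None` stands in for the table's N/A entries.

```python
# Scores copied from the benchmark table above; None marks an N/A entry.
SCORES = {
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Gemini 3.1 Pro": 20.6, "Claude Opus 4.6": None},
    "CharXiv Reasoning": {"Muse Spark": 86.4, "GPT-5.4": 82.8, "Gemini 3.1 Pro": 80.2, "Claude Opus 4.6": None},
    "MMMU-Pro": {"Muse Spark": 80.5, "GPT-5.4": None, "Gemini 3.1 Pro": 82.4, "Claude Opus 4.6": None},
    "MedXpertQA": {"Muse Spark": 78.4, "GPT-5.4": 77.1, "Gemini 3.1 Pro": 81.3, "Claude Opus 4.6": None},
    "ARC-AGI-2": {"Muse Spark": 42.5, "GPT-5.4": 76.1, "Gemini 3.1 Pro": 76.5, "Claude Opus 4.6": None},
    "SWE-Bench Verified": {"Muse Spark": 77.4, "GPT-5.4": 57.7, "Gemini 3.1 Pro": None, "Claude Opus 4.6": 80.8},
}


def leader(row):
    """Return (model, score) for the highest reported score in one table row."""
    reported = {model: score for model, score in row.items() if score is not None}
    best = max(reported, key=reported.get)
    return best, reported[best]


def gap_to_leader(row, model="Muse Spark"):
    """Return the model's score minus the leading score (0.0 if it leads)."""
    _, top = leader(row)
    return round(row[model] - top, 1)


if __name__ == "__main__":
    for name, row in SCORES.items():
        best, top = leader(row)
        print(f"{name}: leader={best} ({top}), Muse Spark gap={gap_to_leader(row):+.1f}")
```

Run as a script, this prints one line per benchmark. On these figures Muse Spark leads only on HealthBench Hard and CharXiv Reasoning, and N/A cells are skipped rather than treated as zero.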
Contemplating Mode
Muse Spark's standout feature is Contemplating mode, which orchestrates multiple AI agents in parallel:
- Runs up to 16 agents simultaneously
- Each agent generates independent solutions
- Outputs are synthesized into a final response
- Achieved 58% on Humanity's Last Exam with tools
Because the agents run in parallel rather than in sequence, this approach aims to improve on single-agent deep reasoning while keeping latency comparable.
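The general pattern (fan out to parallel agents, then merge their answers) can be sketched in Python with `asyncio`. This is a minimal illustration under stated assumptions, not Meta's implementation: `run_agent` is a hypothetical stub standing in for a real model call, and the majority vote in `synthesize` is an assumed merge rule, since the article does not say how outputs are combined. Only the cap of 16 agents comes from the text above.

```python
import asyncio
from collections import Counter

MAX_AGENTS = 16  # upper bound stated in the article


async def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one agent; a real agent would call a model."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    # Deterministic toy behaviour: every fourth agent disagrees with the rest.
    return "B" if agent_id % 4 == 0 else "A"


def synthesize(answers: list[str]) -> str:
    """Assumed merge rule: return the most common answer (majority vote)."""
    return Counter(answers).most_common(1)[0][0]


async def contemplate(question: str, n_agents: int = MAX_AGENTS) -> str:
    """Run n_agents concurrently on the same question and merge their answers."""
    if not 1 <= n_agents <= MAX_AGENTS:
        raise ValueError(f"n_agents must be between 1 and {MAX_AGENTS}")
    # gather() schedules all agents concurrently and preserves their order.
    answers = await asyncio.gather(
        *(run_agent(i, question) for i in range(n_agents))
    )
    return synthesize(list(answers))


if __name__ == "__main__":
    print(asyncio.run(contemplate("Which option is correct?")))
```

Because the agents are awaited together, wall-clock time is governed by the slowest agent rather than the sum of all of them, which is why a parallel design can add agents without a proportional increase in latency.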
Architecture & Efficiency
Muse Spark was built over nine months on a completely rebuilt pretraining stack:
- 10x more compute-efficient than Llama 4 Maverick
- Improved data curation and model architecture
- Scaling law experiments guide efficient training
The efficiency gains make frontier-level capabilities more accessible for deployment.
Limitations
Meta acknowledges several gaps in Muse Spark:
- Coding: Scores 77.4 on SWE-Bench Verified vs. 80.8 for Claude Opus 4.6
- Abstract reasoning: Trails on ARC-AGI-2 (42.5 vs. 76.1 for GPT-5.4 and 76.5 for Gemini 3.1 Pro)
- Agentic tasks: Lower GDPval-AA scores
These gaps are expected to narrow as Meta continues development.
Availability
Muse Spark is available today:
- Meta AI app and meta.ai website
- Rolling out to WhatsApp, Instagram, Facebook, Messenger
- Private API preview for select partners
The Bottom Line
Muse Spark represents Meta's serious entry into frontier AI. Although the model is proprietary (unlike Llama), its strength in health reasoning and visual understanding, combined with its multi-agent Contemplating mode, makes it a compelling option for multimodal applications. The model's efficiency and Contemplating mode point toward a scalable path for personal superintelligence.
For teams building health-adjacent AI or visual analysis tools, or seeking efficient multimodal reasoning, Muse Spark warrants consideration. Meta acknowledges the model's weaknesses in coding, which suggests future iterations will address these gaps.