Meta Muse Spark: A New Frontier in Multimodal Reasoning

Meta's Superintelligence Labs has unveiled Muse Spark, its first public AI model and a significant departure from the Llama family. This natively multimodal reasoning model introduces multi-agent orchestration and demonstrates strong capabilities in health reasoning and visual understanding.

What is Muse Spark?

Muse Spark is a ground-up rebuild from Meta's new Superintelligence Labs, formed in mid-2025 with Alexandr Wang as chief AI officer. Unlike previous Llama models that stitched vision onto language backbones, Muse Spark was architected from the start to process text, images, and voice simultaneously.

Key features:

Natively multimodal: Processes visual and textual data in the same latent space
Tool-use support: Integrates with external tools and APIs
Visual chain-of-thought: Generates annotations on complex visual inputs
Contemplating mode: Multi-agent orchestration for enhanced reasoning
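Tool use generally means the model emits a structured request that the host application executes and then feeds back into the conversation. The sketch below shows that host-side dispatch step. It assumes a JSON call format with `tool` and `arguments` fields. That format, the `TOOLS` registry, and `dispatch_tool_call` are hypothetical, because the API is still a private preview and its tool-calling interface is not described here.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: names and functions are illustrative only,
# not part of any published Muse Spark API.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "word_count": lambda text: len(text.split()),
}


def dispatch_tool_call(raw_call: str) -> str:
    """Execute one model-emitted tool call and return a JSON result string.

    Assumes (hypothetically) that the model emits JSON such as
    {"tool": "add", "arguments": {"a": 1, "b": 2}}.
    """
    call = json.loads(raw_call)
    name = call["tool"]
    if name not in TOOLS:
        # Report unknown tools back to the model instead of crashing the host.
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    # The host would append this result to the conversation for the model's next turn.
    return json.dumps({"tool": name, "result": result})


print(dispatch_tool_call('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
```

Keeping execution on the host side means the application, not the model, decides which tools exist and what they are allowed to do.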

Performance Highlights

Muse Spark shows competitive performance across key benchmarks (N/A means no score is listed for that model):

Benchmark	Muse Spark	GPT-5.4	Gemini 3.1 Pro	Claude Opus 4.6
HealthBench Hard	42.8	40.1	20.6	N/A
CharXiv Reasoning	86.4	82.8	80.2	N/A
MMMU-Pro	80.5	N/A	82.4	N/A
MedXpertQA	78.4	77.1	81.3	N/A
ARC-AGI-2	42.5	76.1	76.5	N/A
SWE-Bench Verified	77.4	57.7	N/A	80.8

Notable strengths:

Health reasoning: Leads with 42.8 on HealthBench Hard
Visual reasoning: Tops CharXiv Reasoning at 86.4
Token efficiency: Achieves similar performance with ~50% fewer output tokens
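The strengths listed above and the gaps listed under Limitations both come down to one calculation: Muse Spark's score minus the best score reported for a rival on the same benchmark. The short script below performs that arithmetic on the table's figures and skips N/A cells. The `SCORES` dictionary and `margin_vs_best_rival` helper are illustrative names.

```python
from typing import Optional

# Scores copied from the table above; None marks an N/A cell.
SCORES: dict[str, dict[str, Optional[float]]] = {
    "HealthBench Hard": {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Gemini 3.1 Pro": 20.6, "Claude Opus 4.6": None},
    "CharXiv Reasoning": {"Muse Spark": 86.4, "GPT-5.4": 82.8, "Gemini 3.1 Pro": 80.2, "Claude Opus 4.6": None},
    "MMMU-Pro": {"Muse Spark": 80.5, "GPT-5.4": None, "Gemini 3.1 Pro": 82.4, "Claude Opus 4.6": None},
    "MedXpertQA": {"Muse Spark": 78.4, "GPT-5.4": 77.1, "Gemini 3.1 Pro": 81.3, "Claude Opus 4.6": None},
    "ARC-AGI-2": {"Muse Spark": 42.5, "GPT-5.4": 76.1, "Gemini 3.1 Pro": 76.5, "Claude Opus 4.6": None},
    "SWE-Bench Verified": {"Muse Spark": 77.4, "GPT-5.4": 57.7, "Gemini 3.1 Pro": None, "Claude Opus 4.6": 80.8},
}


def margin_vs_best_rival(benchmark: str, model: str = "Muse Spark") -> float:
    """Return the model's score minus the best reported rival score (positive = lead)."""
    row = SCORES[benchmark]
    rivals = [score for name, score in row.items() if name != model and score is not None]
    return round(row[model] - max(rivals), 1)


for bench in SCORES:
    print(f"{bench}: {margin_vs_best_rival(bench):+.1f}")
```

On these figures, Muse Spark leads on two of the six benchmarks: HealthBench Hard by 2.7 points and CharXiv Reasoning by 3.6 points. It trails the best reported rival on the other four, most sharply on ARC-AGI-2, where the gap is 34.0 points.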

Contemplating Mode

Muse Spark's standout feature is Contemplating mode, which orchestrates multiple AI agents in parallel:

Runs up to 16 agents simultaneously
Each agent generates independent solutions
Outputs are synthesized into a final response
Achieved 58% on Humanity's Last Exam with tools

This approach improves performance while keeping latency comparable to single-agent deep reasoning.
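The fan-out-and-synthesize pattern described above can be sketched with `asyncio`. The article does not say how Contemplating mode combines agent outputs, so this sketch substitutes majority voting as a simple synthesis rule. `run_agent`, `synthesize`, `contemplate`, and the toy agent behavior are all hypothetical. Because the agents run concurrently, total wall-clock time tracks the slowest agent rather than the sum of all agents, which is one plausible reason latency can stay close to a single-agent run.

```python
import asyncio
import random
from collections import Counter

MAX_AGENTS = 16  # the article's stated upper bound on parallel agents


async def run_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one reasoning agent; a real system would call a model."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated model latency
    # Toy behavior: most agents converge on one answer, every fourth one diverges.
    return "42" if agent_id % 4 else f"candidate-{agent_id}"


def synthesize(answers: list[str]) -> str:
    """Majority vote: a simple stand-in, not Meta's documented synthesis method."""
    return Counter(answers).most_common(1)[0][0]


async def contemplate(question: str, num_agents: int = MAX_AGENTS) -> str:
    num_agents = min(num_agents, MAX_AGENTS)
    # All agents run concurrently, so elapsed time is roughly the slowest agent's latency.
    answers = await asyncio.gather(*(run_agent(i, question) for i in range(num_agents)))
    return synthesize(list(answers))


print(asyncio.run(contemplate("What is 6 * 7?")))
```

A production synthesizer would more likely ask a model to reconcile the candidate solutions. Majority voting is used here only because it is short and easy to verify.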

Architecture & Efficiency

Built over nine months with a completely rebuilt pretraining stack:

10x more compute-efficient than Llama 4 Maverick
Improved data curation and model architecture
Scaling law experiments guide efficient training

The efficiency gains make frontier-level capabilities more accessible for deployment.
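Scaling-law experiments typically fit a power law to the losses of small training runs and then extrapolate it to larger compute budgets. The sketch below performs that fit with pure-Python least squares in log-log space. It then shows one common way to read a "10x more compute-efficient" claim: reaching the same loss with a tenth of the compute. The functional form `loss = a * C**(-b)`, the synthetic data, and the helper names are assumptions for illustration, since Meta's actual scaling-law setup is not described here.

```python
import math


def fit_power_law(compute: list[float], loss: list[float]) -> tuple[float, float]:
    """Fit loss ~= a * compute**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(value) for value in loss]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def compute_for_loss(a: float, b: float, target_loss: float) -> float:
    """Invert loss = a * C**(-b) to get the compute C needed to reach target_loss."""
    return (a / target_loss) ** (1 / b)


# Synthetic "small runs" generated from a = 20, b = 0.05 (toy values, not Meta's data).
compute = [1e18, 1e19, 1e20, 1e21]
loss = [20 * c ** -0.05 for c in compute]
a, b = fit_power_law(compute, loss)

# Hypothetical improved recipe: each budget performs like 10x that budget on the
# old curve, i.e. L_new(C) = L_old(10 * C), which gives a_new = a * 10**(-b).
a_new = a * 10 ** (-b)
ratio = compute_for_loss(a, b, 2.0) / compute_for_loss(a_new, b, 2.0)
print(f"fitted a={a:.2f}, b={b:.3f}; compute ratio at loss 2.0: {ratio:.1f}x")
```

Under this reading, the efficiency gain appears as a shift of the whole loss curve, so the compute ratio is the same at every target loss.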

Limitations

Muse Spark has acknowledged gaps:

Coding: Scores 77.4 on SWE-Bench Verified vs. Claude Opus 4.6's 80.8
Abstract reasoning: Trails on ARC-AGI-2 (42.5 vs. 76.1 for GPT-5.4 and 76.5 for Gemini 3.1 Pro)
Agentic tasks: Lower GDPval-AA scores

These gaps are expected to narrow as Meta continues development.

Availability

Muse Spark is available today:

Meta AI app and meta.ai website
Rolling out to WhatsApp, Instagram, Facebook, Messenger
Private API preview for select partners

The Bottom Line

Muse Spark represents Meta's serious entry into frontier AI. Unlike Llama, it is proprietary, but its strength in health reasoning and visual understanding, combined with its innovative multi-agent architecture, makes it a compelling option for multimodal applications. The model's efficiency and Contemplating mode point toward a scalable path to personal superintelligence.

For teams building health-adjacent AI or visual analysis tools, or seeking efficient multimodal reasoning, Muse Spark warrants consideration. Meta has acknowledged its weaknesses in coding, which suggests future iterations will aim to close that gap.
