Meta Muse Spark: A New Frontier in Multimodal Reasoning

Meta's Superintelligence Labs unveils Muse Spark, a natively multimodal reasoning model with multi-agent orchestration and strong performance in health and visual reasoning.

Meta's Superintelligence Labs has unveiled Muse Spark, its first public AI model and a significant departure from the Llama family. This natively multimodal reasoning model introduces multi-agent orchestration and demonstrates strong capabilities in health reasoning and visual understanding.

What is Muse Spark?

Muse Spark is a ground-up rebuild from Meta's new Superintelligence Labs, formed in mid-2025 with Alexandr Wang as chief AI officer. Unlike previous Llama models that stitched vision onto language backbones, Muse Spark was architected from the start to process text, images, and voice simultaneously.

Key features:

  • Natively multimodal: Processes visual and textual data in the same latent space
  • Tool-use support: Integrates with external tools and APIs
  • Visual chain-of-thought: Generates annotations on complex visual inputs
  • Contemplating mode: Multi-agent orchestration for enhanced reasoning

Performance Highlights

Muse Spark posts the top score on two of the six benchmarks below and trails the best reported competitor on the other four:

Benchmark          | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Claude Opus 4.6
HealthBench Hard   | 42.8       | 40.1    | 20.6           | N/A
CharXiv Reasoning  | 86.4       | 82.8    | 80.2           | N/A
MMMU-Pro           | 80.5       | N/A     | 82.4           | N/A
MedXpertQA         | 78.4       | 77.1    | 81.3           | N/A
ARC-AGI-2          | 42.5       | 76.1    | 76.5           | N/A
SWE-Bench Verified | 77.4       | 57.7    | N/A            | 80.8

Notable strengths:

  • Health reasoning: Leads with 42.8 on HealthBench Hard
  • Visual reasoning: Tops CharXiv Reasoning at 86.4
  • Token efficiency: Achieves similar performance with ~50% fewer output tokens
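
The leads and gaps quoted in this article can be checked directly from the table. The snippet below is a small sketch that copies the table's scores into a dictionary and computes, for each benchmark, the leading model and Muse Spark's margin over the best reported competitor. The names `SCORES`, `leader`, and `margin` are illustrative choices, not part of any Meta tooling, and N/A entries are treated as unreported rather than as zero.

```python
# Scores copied from the benchmark table; None marks an unreported (N/A) result.
SCORES = {
    "HealthBench Hard":   {"Muse Spark": 42.8, "GPT-5.4": 40.1, "Gemini 3.1 Pro": 20.6, "Claude Opus 4.6": None},
    "CharXiv Reasoning":  {"Muse Spark": 86.4, "GPT-5.4": 82.8, "Gemini 3.1 Pro": 80.2, "Claude Opus 4.6": None},
    "MMMU-Pro":           {"Muse Spark": 80.5, "GPT-5.4": None, "Gemini 3.1 Pro": 82.4, "Claude Opus 4.6": None},
    "MedXpertQA":         {"Muse Spark": 78.4, "GPT-5.4": 77.1, "Gemini 3.1 Pro": 81.3, "Claude Opus 4.6": None},
    "ARC-AGI-2":          {"Muse Spark": 42.5, "GPT-5.4": 76.1, "Gemini 3.1 Pro": 76.5, "Claude Opus 4.6": None},
    "SWE-Bench Verified": {"Muse Spark": 77.4, "GPT-5.4": 57.7, "Gemini 3.1 Pro": None, "Claude Opus 4.6": 80.8},
}


def leader(row):
    """Return (model, score) for the highest reported score in one benchmark row."""
    reported = {model: score for model, score in row.items() if score is not None}
    best = max(reported, key=reported.get)
    return best, reported[best]


def margin(row, model="Muse Spark"):
    """Return `model`'s score minus the best reported score among the other models."""
    others = [score for name, score in row.items() if name != model and score is not None]
    return round(row[model] - max(others), 1)


if __name__ == "__main__":
    for benchmark, row in SCORES.items():
        print(f"{benchmark}: leader={leader(row)[0]}, Muse Spark margin={margin(row):+.1f}")
```

A positive margin means Muse Spark leads the reported field on that benchmark; a negative margin is the distance to the best reported competitor. Because several cells are N/A, these margins only compare against models with published scores.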

Contemplating Mode

Muse Spark's standout feature is Contemplating mode, which orchestrates multiple AI agents in parallel:

  • Runs up to 16 agents simultaneously
  • Each agent generates independent solutions
  • Outputs are synthesized into a final response
  • Achieved 58% on Humanity's Last Exam with tools

Because the agents run in parallel rather than one after another, this approach delivers stronger results than single-agent deep reasoning at comparable latency.
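
The general pattern described here (fan out to parallel agents, collect independent answers, merge them) can be sketched in a few lines. This is not Meta's implementation: the article does not say how outputs are synthesized, so the sketch assumes a simple majority vote, and `contemplate`, `synthesize`, and `toy_agent` are hypothetical names standing in for real model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_AGENTS = 16  # upper bound on parallel agents stated in the article


def synthesize(candidates: List[str]) -> str:
    """Merge candidate answers by majority vote (an assumption, not Meta's method).

    Ties go to the answer that appeared first in the list.
    """
    return Counter(candidates).most_common(1)[0][0]


def contemplate(prompt: str, agent: Callable[[str, int], str], n_agents: int = MAX_AGENTS) -> str:
    """Run `n_agents` independent agents in parallel and synthesize their answers."""
    if not 1 <= n_agents <= MAX_AGENTS:
        raise ValueError(f"n_agents must be between 1 and {MAX_AGENTS}")
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        # Each agent receives the same prompt and a distinct seed, so answers can differ.
        candidates = list(pool.map(lambda seed: agent(prompt, seed), range(n_agents)))
    return synthesize(candidates)


def toy_agent(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a model call: every fourth agent answers wrongly."""
    return "41" if seed % 4 == 0 else "42"


if __name__ == "__main__":
    print(contemplate("What is 6 * 7?", toy_agent))
```

With the toy agent, 12 of the 16 agents agree, so the vote recovers the majority answer despite four wrong ones. In this sketch, wall-clock time is roughly that of the slowest agent plus the merge step, which is one way a parallel design can keep latency close to a single long reasoning run.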

Architecture & Efficiency

Muse Spark was built over nine months on a completely rebuilt pretraining stack:

  • 10x more compute-efficient than Llama 4 Maverick
  • Improved data curation and model architecture
  • Scaling law experiments guide efficient training

The efficiency gains make frontier-level capabilities more accessible for deployment.

Limitations

Meta acknowledges several gaps in Muse Spark:

  • Coding: Scores 77.4 on SWE-Bench Verified vs Claude's 80.8
  • Abstract reasoning: Trails on ARC-AGI-2 (42.5 vs 76.1 for GPT-5.4 and 76.5 for Gemini 3.1 Pro)
  • Agentic tasks: Lower GDPval-AA scores

These gaps are expected to narrow as Meta continues development.

Availability

Muse Spark is available today:

  • Meta AI app and meta.ai website
  • Rolling out to WhatsApp, Instagram, Facebook, Messenger
  • Private API preview for select partners

The Bottom Line

Muse Spark represents Meta's serious entry into frontier AI. Although the model is proprietary (unlike Llama), its strength in health reasoning and visual understanding, combined with its multi-agent Contemplating mode, makes it a compelling option for multimodal applications. The model's efficiency and Contemplating mode point toward a scalable path for personal superintelligence.

For teams building health-adjacent AI or visual analysis tools, or looking for efficient multimodal reasoning, Muse Spark warrants consideration. Meta acknowledges the model's weaknesses in coding, which suggests future iterations will target these gaps.