RESEARCHMicrosoft
Microsoft ships MDASH, a security system built from 100+ specialized AI agents (auditor, debater, prover) working as an ensemble. Found 16 new Windows vulnerabilities in one Patch Tuesday.
Single-agent thinking is over. Multi-model ensemble with role-specialized sub-agents is the new reference architecture.
The harness does the work. The model is one input. Auditor plus debater plus prover splits beat any single-model prompt-loop on hard problems. The pattern transfers from security to legal review, medical chart review, financial audit.
Expect XBOW, ZeroPath, and ProjectDiscovery to get acquired within 12 months. For PMs building high-stakes agentic products: stop optimizing prompt loops. Design the harness.
⚡ Why this matters
- Single-agent thinking is over.
- Multi-model agentic ensembles are the new reference architecture for high-stakes products.
- Microsoft just shipped the pattern at scale.
🔍 What happened
- May 12, 2026. Microsoft ships MDASH multi-model agentic scanning harness.
- Taesoo Kim (VP, Agentic Security) + Team Atlanta DARPA AIxCC pedigree.
- 100+ specialized agents (auditor, debater, prover) across frontier + distilled + counterpoint ensemble.
- 16 net-new Windows CVEs in May 12 Patch Tuesday.
- 4 Critical pre-auth RCEs (tcpip.sys SSRR UAF, ikeext.dll IKEv2 double-free, netlogon CLDAP, dnsapi UDP DNS heap OOB).
- 21/21 on StorageDrive private bench with 0 false positives.
- 96% recall on 5-year CLFS MSRC backlog. 100% on tcpip.sys.
- 88.45% CyberGym leaderboard top, ~5 points ahead.
💬 Smart takes
- Taesoo Kim: "The harness does the work, and the model is one input. Single-model harnesses undersold what models can do. Over-trusted single agents overshoot."
- Kim's framing: "Not which model does it use, but what does it do with the model, and what survives when the next model arrives."
- Skeptic: 16 CVEs is impressive, but MSRC normally ships 80-150 per Patch Tuesday. MDASH is a meaningful input, not the autonomous SOC story. 88.45% CyberGym lead over 83.1% next entry is smaller than absolute numbers suggest.
🧭 Where this goes
- "Multi-agent ensemble" becomes the standard pitch for high-stakes agentic products by Q3.
- Google, Anthropic, CrowdStrike, Palo Alto, Wiz ship competing ensemble harnesses within 6 months.
- Bug-bounty economics compress hard for routine vulnerability classes.
- First publicly-attributed agentic-system exploitation campaign lands within 12 months.
🎯 Implication
- For PMs building high-stakes agentic products: stop optimizing prompt-and-tool loops. Design the ensemble harness instead.
- Three concrete moves: split agent into role-specialized sub-agents (auditor, debater, prover); introduce adversarial-debate stages; build plugin extensibility for domain-specific context.
- For security and platform leaders: sign up for MDASH private preview. Architecture is more transferable than specific numbers.