
Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot

2026/04/06 14:00
6 min read

Microsoft has introduced Critique, a new multi-model deep research system inside Researcher, the deep research agent in Microsoft 365 Copilot, as part of a broader push to make Copilot feel more dependable for serious knowledge work instead of just fast drafting. 

According to Microsoft, Critique is designed for complex research tasks and works by splitting the job into two parts: one model handles planning, retrieval, synthesis, and drafting, while a second model reviews and refines the output before the final report is produced. Microsoft says the system uses models from frontier labs including OpenAI and Anthropic, and that it is available now through the company’s Frontier program. 

Reuters reported that in Critique’s current setup, OpenAI’s GPT generates the response and Anthropic’s Claude reviews it for accuracy and quality before the answer reaches the user. Microsoft has also said it wants this workflow to become bi-directional later on, allowing models to review each other in both directions. 

What Critique actually does inside Microsoft 365 Copilot

Microsoft’s own description makes it clear that Critique is not just a cosmetic feature or a new button slapped onto Copilot. It works inside Researcher in Microsoft 365 Copilot and is built for deeper tasks where getting it right matters just as much as getting it done fast. One model does the digging and drafts the report, while the second steps in like an editor, checking the facts, sharpening the structure, and helping turn the draft into a more reliable final piece.

Microsoft says the whole idea is to separate generation from evaluation, rather than asking one model to brainstorm, write, fact-check, and polish its own work all at once. That distinction matters because a lot of AI failure comes from exactly that one-model bottleneck. When a single system is asked to do everything, it can produce something that looks polished while quietly missing gaps, overreaching on claims, or leaning on weak evidence. 

Microsoft says Critique’s review layer is built around rubric-based evaluation, with attention to source reliability, report completeness, and strict evidence grounding. In plain English, the second model is there to ask whether the draft actually answered the question, whether the sourcing is solid, and whether the final narrative is supported instead of merely sounding confident. 
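The workflow Microsoft describes can be pictured as a simple generate-then-review loop. The sketch below is purely illustrative: the function names (`draft_report`, `review_draft`, `critique_pipeline`), the rubric dimensions, and the revision threshold are assumptions for the sake of the example, and the model calls are stubs, not Microsoft's actual API.

```python
# Hypothetical sketch of a generate-then-critique pipeline, loosely modeled
# on Microsoft's description. All names and the rubric are illustrative.

RUBRIC = ["source_reliability", "completeness", "evidence_grounding"]

def draft_report(task: str) -> str:
    """Stub for the drafting model (planning, retrieval, synthesis)."""
    return f"Draft report for: {task}"

def review_draft(draft: str) -> dict:
    """Stub for the reviewing model: scores the draft against the rubric."""
    return {criterion: 0.8 for criterion in RUBRIC}

def critique_pipeline(task: str, threshold: float = 0.7) -> dict:
    draft = draft_report(task)
    scores = review_draft(draft)
    # Flag the draft for revision if any rubric dimension scores too low.
    needs_revision = any(score < threshold for score in scores.values())
    return {"report": draft, "scores": scores, "revised": needs_revision}

result = critique_pipeline("EU AI regulation timeline")
```

The key design point the example mirrors is the separation of concerns: the reviewer never writes the report, it only scores and gates it, which is what keeps evaluation independent of generation.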

Microsoft is not pitching Critique as a side experiment

One of the more important details in Microsoft’s announcement is that Critique will be the default experience in Researcher when Auto is selected in the model picker. That signals the company sees this as more than an optional lab feature for power users. It is effectively treating multi-model review as the new baseline for deep research quality inside Microsoft 365 Copilot. That is a meaningful product choice, because it suggests Microsoft believes enterprise customers care less about raw response speed than they do about fewer hallucinations, stronger structure, and more confidence in the finished report. 

That also fits neatly into Microsoft’s broader messaging around Wave 3 of Microsoft 365 Copilot, where the company has been pushing the idea of Copilot as a “system for work” built on a multi-model advantage rather than on any single AI lab. In Microsoft’s framing, Copilot is meant to pull the best available intelligence from across the industry, grounded in work context through what it calls Work IQ and protected by enterprise data controls. Critique is one of the clearest examples yet of that strategy moving from marketing language into a visible product feature. 

The benchmark numbers are a big part of Microsoft’s sales pitch

Microsoft is not only saying Critique feels better. It is saying the system performed better on a formal benchmark. In its technical write-up, the company says it tested Critique on the DRACO benchmark, short for Deep Research Accuracy, Completeness, and Objectivity, which covers 100 complex research tasks across 10 domains. Microsoft says responses were judged across factual accuracy, breadth and depth of analysis, presentation quality, and citation quality, and that Critique outperformed the single-model version of Researcher across all four measures. 

The company highlighted the largest gains in breadth and depth of analysis, followed by presentation quality and factual accuracy. It also says the improvements were statistically significant, and that Researcher with Critique delivered a +7.0-point improvement in aggregated score, or +13.88%, over Perplexity Deep Research (Claude Opus 4.6 model), which Microsoft described as the best system reported in the benchmark paper.

Data | Source: Microsoft

That is an eye-catching claim, especially because the deep research race has become one of the most competitive fronts in enterprise AI. Research tools are no longer being judged only by whether they can gather information, but by whether they can assemble a report that feels decision-ready. 

Microsoft’s argument is that the review layer forces researchers to identify missing angles, tighten organization, challenge weak claims, and use citations more carefully. Whether customers experience those gains in real workflows will matter more than benchmark charts, but Microsoft is clearly trying to signal that this is a measurable quality jump rather than a vague model update. 

Council shows Microsoft is thinking beyond one “best answer”

Critique is not the only feature Microsoft introduced alongside this update. The company also launched Council, a multi-model comparison mode inside Researcher. Microsoft says Council runs Anthropic and OpenAI models simultaneously, allowing each to generate a full standalone report. A separate judge model then creates a distilled summary showing where the reports agree, where they diverge, and what each uniquely contributes. Microsoft Support describes this as Model Council, a mode that preserves both full reports and adds a comparison summary to help users decide which output is stronger or how to combine them. 
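The compare-and-judge flow described above can be sketched in a few lines. Everything here is hypothetical: the function names (`model_a`, `model_b`, `judge`) and the claim-set representation are assumptions chosen to illustrate the idea, not how Council is actually implemented.

```python
# Hypothetical sketch of Council's compare-and-judge flow: two models each
# produce a full report, then a judge distills agreement and divergence.
# Reports are represented as sets of claims purely for illustration.

def model_a(task: str) -> set:
    return {"claim_1", "claim_2", "claim_3"}   # stub for one model's findings

def model_b(task: str) -> set:
    return {"claim_2", "claim_3", "claim_4"}   # stub for the other model

def judge(report_a: set, report_b: set) -> dict:
    """Stub judge: where do the reports agree, diverge, uniquely contribute?"""
    return {
        "agree": report_a & report_b,    # claims both models support
        "only_a": report_a - report_b,   # unique contributions of model A
        "only_b": report_b - report_a,   # unique contributions of model B
    }

summary = judge(model_a("market outlook"), model_b("market outlook"))
```

Note that, as in Microsoft's description, both full reports survive; the judge only adds a comparison layer on top rather than collapsing them into a single answer.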

That is a very interesting signal about where enterprise AI may be heading. For a while, the industry behaved as if the goal was to find one model that could replace all the others. Microsoft’s latest move suggests the more realistic future may be one where companies do not trust any single model enough to make it the only voice in the room. 

The timing of Critique is not accidental. Microsoft has been under pressure to show that Microsoft 365 Copilot is becoming more useful, more differentiated, and more valuable as competition intensifies. 

Reuters tied the rollout of Critique and Council to Microsoft’s effort to improve Copilot adoption in a market where rivals including Google’s Gemini and Anthropic’s Claude products are pushing hard into workplace AI. Axios also noted that Microsoft’s multi-model strategy has another benefit: it shows the company is not locked into overdependence on OpenAI at a time when frontier model leadership can shift quickly. 

The post Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot appeared first on Metaverse Post.

