Windows servers were burning CPU in production. The cause was identified. The fix, adding folder exclusions to Microsoft Defender, was well understood in principle. But the specific configuration details were scattered across four vendors’ documentation. Getting them right mattered: a wrong path would leave the problem unsolved, and an overly broad exclusion could create a security gap.
The question wasn’t whether to fix it. The question was: how do you know what you find is actually correct?
The Deliverable
What came back from the research phase:
- 36 findings across the relevant vendor documentation
- 23 cited sources, each linked and traceable
- Specific folder paths, organised by agent
- Security tradeoffs explicitly named — what each exclusion permits and what risks it introduces
- Known attacker techniques cross-referenced against the proposed configuration
- Open questions flagged for follow-up, not buried
That’s not a summary. That’s a structured output you can act on, hand off, or file for audit.
The Quality Gate
Here’s the part that matters for anyone signing off on work like this: the first draft had errors.
Not vague errors. Specific, consequential errors. One vendor’s log path was wrong by a single folder name — the exclusion would have been configured, tested as “applied,” and would have done nothing. A security detail about a specific malware variant was attributed to the wrong technique.
A second agent — a Validator — ran through every source independently. Its only job was to check. Not summarise, not add findings, not recommend. Check. It found both errors before anything reached production.
Two agents, separated by role. The Investigator finds. The Validator checks. Neither does both. This is the same separation-of-concerns principle that keeps a developer from reviewing their own code.
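One way to picture that separation is a minimal sketch in which the Validator receives findings and a way to fetch sources, and returns nothing but check results. Everything here is an assumption made for illustration: the `Finding` and `CheckResult` types, the `validate` function, the `doc-1` source id, and the `ExampleAgent` paths are hypothetical, and the verbatim substring match is a deliberately naive stand-in for whatever checking the real Validator does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    claim: str      # the exact text being asserted, e.g. a folder path
    source_id: str  # identifier of the source the Investigator cited


@dataclass(frozen=True)
class CheckResult:
    finding: Finding
    confirmed: bool
    note: str


def validate(findings: list[Finding],
             fetch_source: Callable[[str], str]) -> list[CheckResult]:
    """Check each claim against its cited source. Never adds or edits findings."""
    results = []
    for finding in findings:
        text = fetch_source(finding.source_id)  # go back to the primary source
        confirmed = finding.claim in text       # naive check: verbatim match only
        note = "found verbatim in source" if confirmed else "not found in cited source"
        results.append(CheckResult(finding, confirmed, note))
    return results


# Hypothetical source text and findings, for illustration only.
sources = {"doc-1": r"Exclude C:\ProgramData\ExampleAgent\Logs from scanning."}
findings = [
    Finding(r"C:\ProgramData\ExampleAgent\Logs", "doc-1"),
    Finding(r"C:\ProgramData\ExampleAgent\Data\Logs", "doc-1"),  # one folder off
]
for result in validate(findings, lambda sid: sources.get(sid, "")):
    print(result.confirmed, result.finding.claim, "|", result.note)
```

The design point is in the signature: `validate` has no way to produce a new finding, so the checking role cannot drift into the finding role.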
What Made It Repeatable
A one-off investigation done carefully is useful. A system that runs every future investigation at the same quality bar is an asset.
The difference is documentation. Before the second investigation ran, the rules were written down:
- Every investigation starts with a scope gate — one clear question, stated in a single sentence, with explicit out-of-scope boundaries
- Validation is not optional, ever
- The answer goes first in every output — not after pages of context
- One source of truth per investigation; everything else is generated from it
This isn’t process for its own sake. Each rule exists because its absence produced a real problem.
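The first rule, the scope gate, could be sketched roughly as follows. The `Scope` type and `scope_gate` function are hypothetical names, the single-sentence test is a crude heuristic rather than real sentence parsing, and the example question is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    question: str
    out_of_scope: list[str] = field(default_factory=list)


def scope_gate(scope: Scope) -> list[str]:
    """Return the reasons the scope is rejected; an empty list means it passes."""
    problems = []
    q = scope.question.strip()
    # Heuristic for "one clear question in a single sentence":
    # exactly one '?', at the end, with no line breaks or sentence breaks.
    single_sentence = (
        q.endswith("?")
        and q.count("?") == 1
        and "\n" not in q
        and ". " not in q
    )
    if not single_sentence:
        problems.append("question must be a single sentence ending in '?'")
    if not scope.out_of_scope:
        problems.append("at least one explicit out-of-scope boundary is required")
    return problems


# Hypothetical usage: research only starts when the gate returns no problems.
scope = Scope(
    question="Which folders should be excluded from Defender scanning for each agent?",
    out_of_scope=["replacing Defender", "tuning the agents themselves"],
)
print(scope_gate(scope) or "scope accepted")
```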
What It Produced in One Afternoon
At the end of the build:
- A scope gate that forces clear questions before any research starts
- An Investigator that researches from live sources, not memory
- A Validator that fact-checks independently before anything ships
- A verification script that catches structural errors automatically
- Answer-first output — the decision is at the top, the evidence follows
- Open questions converted to concrete follow-up tasks
- Everything stored, findable, and reproducible
The system has no memory of being built in an afternoon. Every investigation it runs gets the same structure, the same validation, the same output format — regardless of who’s asking the question or what the topic is.
Research that arrives pre-validated, sourced, and structured is research you can act on. Research that arrives as a summary from someone’s memory is research you have to verify yourself, which means you’re either doing the work twice or not doing it at all.
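The article does not say what the verification script checks, so the following is only a guess at the kind of structural checks such a script might run, based on the rules listed earlier: the answer is present, every finding cites a known source, and every open question has a follow-up task. The report layout (keys such as `answer`, `findings`, `sources`, `open_questions`, `follow_up_task`) and the function name `verify_structure` are assumptions.

```python
def verify_structure(report: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the report passes."""
    errors = []

    # Answer-first: the report must actually contain an answer.
    if not str(report.get("answer", "")).strip():
        errors.append("missing answer")

    # Every finding must cite at least one source, and each cited id must exist.
    known_ids = {source["id"] for source in report.get("sources", [])}
    for i, finding in enumerate(report.get("findings", [])):
        cited = finding.get("sources", [])
        if not cited:
            errors.append(f"finding {i}: no cited source")
        for source_id in cited:
            if source_id not in known_ids:
                errors.append(f"finding {i}: unknown source {source_id!r}")

    # Open questions must be converted into concrete follow-up tasks.
    for i, question in enumerate(report.get("open_questions", [])):
        if not question.get("follow_up_task"):
            errors.append(f"open question {i}: no follow-up task")

    return errors


# Hypothetical report with two structural problems.
report = {
    "answer": "Add the listed folder exclusions per agent.",
    "sources": [{"id": "s1"}],
    "findings": [
        {"text": "Agent A log folder", "sources": ["s1"]},
        {"text": "Agent B log folder", "sources": ["s2"]},  # s2 is not listed
    ],
    "open_questions": [{"text": "Does the path differ by version?"}],
}
for error in verify_structure(report):
    print(error)
```

Checks like these catch only structural gaps. Whether a cited source actually supports the claim remains the Validator’s job.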
The Lesson That Applies Everywhere
The Validator caught errors that a human reviewer probably wouldn’t have caught — not because humans are careless, but because the Validator went back to primary sources for every claim, which is not how humans review under time pressure.
The scope gate prevents the most common research failure mode: wide-and-shallow investigations that cover everything and answer nothing.
The answer-first structure means the person reading it doesn’t have to excavate the finding from the context.
These are not AI-specific lessons. They’re good research practice. The system just enforces them mechanically, every time, without being asked.