System: MWMS
Document Type: Framework
Status: Draft For MCR
Authority: HeadOffice
Applies To: Research Brain, Data Brain, Affiliate Brain, Ads Brain, Content Brain, Experimentation Brain, HeadOffice Intelligence, AI Agent Operations, Future AI Employees
Primary Location: MCR
Future Operational Destination: mwmsbrain.site, mwmsheadofficebrain.site, Future AI Employee Dashboards
Parent Page: HeadOffice
Source Of Truth: MCR
Course Source: Matt Pocock AIhero Build DeepSearch In TypeScript
Absorption Status: Approved For Integration
Purpose
The purpose of this framework is to define how MWMS evaluates, monitors, improves, and governs Deep Search style AI Employees.
A Deep Search AI Employee is any AI workflow that researches external information, searches the web, inspects sources, crawls pages, evaluates evidence, produces recommendations, or creates decision ready intelligence for MWMS.
This framework ensures that Deep Search outputs are not judged by appearance, confidence, or “it sounds good” alone.
MWMS Deep Search outputs must be judged by measurable quality standards including factuality, relevance, source quality, freshness, cost control, latency, traceability, and business usefulness.
This framework also defines how observability, metadata, database activity, tool use, crawler activity, and evaluation results must be captured so HeadOffice can understand what happened, why it happened, whether it was useful, and whether the AI Employee should be trusted, improved, restricted, or escalated.
Scope
This framework applies to any MWMS system, Brain, AI Employee, workflow, dashboard, or automation that performs research, retrieval, evidence extraction, source analysis, or intelligence generation.
This includes:
- Research Brain search workflows
- Affiliate Brain offer research
- Ads Brain market and compliance research
- Content Brain topic and SEO research
- HeadOffice newsletter intelligence
- Data Brain source and signal validation
- Experimentation Brain test analysis
- AI Employee task execution
- Future client facing AI research tools
- Future AIBS research and reporting systems
- Future Deep Search agents
- Any system that uses search, crawling, scraping, source inspection, or external knowledge retrieval
This framework does not define exact implementation code, TypeScript architecture, Langfuse setup, Evalite setup, crawler packages, or vendor specific tooling.
Those belong to developer implementation notes.
This framework defines the MWMS operating standard.
Core Principle
A Deep Search AI Employee is only useful if its output is:
- Factually grounded
- Relevant to the task
- Based on inspectable sources
- Current enough for the decision being made
- Traceable from request to result
- Measurable against success criteria
- Cost controlled
- Fast enough for the use case
- Logged for review
- Improvable through Kaizen
The model response is not the system.
The full system includes:
- User request
- Brain or Employee assignment
- Prompt
- Model call
- Search query
- Tool call
- Source selection
- Crawler or scraper action
- Database read or write
- Observability trace
- Evaluation score
- Final output
- Human review
- Stored decision record
- Improvement loop
Definition Of Deep Search In MWMS
Deep Search is the structured process of moving beyond a shallow AI answer by combining search, source inspection, evidence extraction, synthesis, evaluation, and decision ready output.
A proper MWMS Deep Search workflow should include:
- Understanding the task
- Identifying whether the task is time sensitive
- Creating one or more search paths
- Retrieving possible sources
- Selecting sources worth inspecting
- Opening or crawling selected sources
- Extracting useful evidence
- Checking freshness and relevance
- Synthesising findings
- Producing a decision ready answer
- Logging the full process
- Evaluating the output against success criteria
- Routing improvements through Kaizen
A shallow AI answer is not Deep Search.
A search result summary is not Deep Search.
A scraped page alone is not Deep Search.
Deep Search requires source backed reasoning and measurable quality control.
Date Awareness Requirement
Any AI Employee working with external information must be date aware.
The AI Employee must understand:
- The current date
- Whether the user request depends on current information
- Whether the source may be outdated
- Whether the answer changes over time
- Whether the source has a publication date, update date, or no visible date
- Whether recency should affect confidence
- Whether older sources are still valid or should be treated as historical
Date awareness is mandatory for:
- Affiliate offer research
- Google Ads policy research
- Compliance checks
- Tool reviews
- AI platform updates
- Pricing checks
- Newsletter intelligence
- Market trend research
- Search engine or platform behaviour
- Product availability
- Legal, financial, medical, or policy related topics
- Current events
- Competitor monitoring
If freshness matters and the AI Employee cannot confirm current information, the output must state uncertainty and reduce confidence.
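One common way to make a workflow date aware is to pass the current date into the prompt on every run. The sketch below is illustrative only and is not part of the operating standard. The function name `buildDateAwarePrompt` and the instruction wording are assumptions made for this example.

```typescript
// Hypothetical helper: prepends the current date and recency instructions to a task.
// The instruction wording is an assumption, not MWMS-approved prompt text.
function buildDateAwarePrompt(task: string, now: Date): string {
  // toISOString() is always UTC, so the first 10 characters are YYYY-MM-DD in UTC.
  const today = now.toISOString().slice(0, 10);
  return [
    "Current date (UTC): " + today + ".",
    "If the task depends on current information, prefer recent sources,",
    "note the publication or update date of each source, and state",
    "uncertainty when recency cannot be confirmed.",
    "",
    "Task: " + task,
  ].join("\n");
}
```

Passing `now` as a parameter, rather than reading the clock inside the function, keeps the behaviour repeatable in evaluation runs.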
Source Freshness Rules
Each source used by a Deep Search workflow should be evaluated for freshness.
Freshness assessment should consider:
- Publication date
- Updated date
- Retrieval date
- Whether the page appears active
- Whether the topic is time sensitive
- Whether newer conflicting information may exist
- Whether the source is evergreen or unstable
- Whether the source is official, secondary, outdated, archived, promotional, or user generated
MWMS should treat source freshness differently depending on the task.
Stable Information
Stable information may use older sources if the concept does not change often.
Examples:
- General frameworks
- Historical facts
- Evergreen business principles
- Basic technical concepts
Moderately Changing Information
Moderately changing information should prefer newer sources.
Examples:
- Tool features
- Platform workflows
- Pricing pages
- SEO practices
- Marketing channel tactics
Highly Time Sensitive Information
Highly time sensitive information must use current sources wherever possible.
Examples:
- Policy changes
- Affiliate payouts
- Product availability
- Ads platform rules
- Compliance rules
- Current events
- Laws and regulations
- Software versions
- AI model capabilities
- Market trends
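The three freshness tiers above could be expressed as a simple age check. This is a minimal sketch under stated assumptions: the tier names mirror the framework, but the day limits in `MAX_AGE_DAYS`, the type names, and the function `assessFreshness` are hypothetical placeholders that MWMS would need to set for itself.

```typescript
type FreshnessTier = "stable" | "moderate" | "timeSensitive";
type FreshnessVerdict = "fresh" | "stale" | "undated";

interface DatedSource {
  url: string;
  publishedAt?: Date; // publication date, if visible on the page
  updatedAt?: Date;   // updated date, if visible on the page
  retrievedAt: Date;  // when the workflow fetched the page
}

// Assumed maximum source age per tier, in days. These numbers are placeholders.
const MAX_AGE_DAYS: Record<FreshnessTier, number> = {
  stable: Number.POSITIVE_INFINITY,
  moderate: 365,
  timeSensitive: 30,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function assessFreshness(source: DatedSource, tier: FreshnessTier, now: Date): FreshnessVerdict {
  // Prefer the updated date over the publication date when both exist.
  const reference = source.updatedAt ?? source.publishedAt;
  // No visible date: report that explicitly so the caller can reduce confidence.
  if (reference === undefined) return "undated";
  const ageDays = (now.getTime() - reference.getTime()) / MS_PER_DAY;
  return ageDays <= MAX_AGE_DAYS[tier] ? "fresh" : "stale";
}
```

Returning "undated" as its own verdict, rather than guessing, keeps the missing date visible to whoever reviews the output.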
Retrieval Quality Rule
Deep Search quality depends on retrieval quality before reasoning quality.
A weak retrieval layer produces weak answers, even if the model sounds confident.
The retrieval layer must be evaluated for:
- Search query quality
- Number of search attempts
- Diversity of sources
- Source reliability
- Source relevance
- Source freshness
- Page accessibility
- Extracted content quality
- Failure handling
- Duplicate source handling
- Evidence sufficiency
A Deep Search AI Employee should not rely only on search snippets when a decision requires deeper evidence.
Where appropriate, the system should inspect selected sources directly.
Crawler And Source Inspection Rules
When a crawler, scraper, browser tool, or source inspection tool is used, the AI Employee must treat the retrieved page content as evidence, not as automatic truth.
Crawler and scraper workflows should record:
- Source URL
- Source title
- Retrieval time
- Access status
- Extraction success or failure
- Extracted content summary
- Content length or completeness
- Whether the content appeared usable
- Whether important content may have been hidden, blocked, or missing
- Whether the source should be trusted
- Whether the source was used in the final answer
Crawler failures must not be hidden.
If source inspection fails, the AI Employee should do one of the following:
- Try another source
- Use the search result only with reduced confidence
- Escalate to human review
- State that evidence was insufficient
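The crawl record and the four fallback options above could be modelled as follows. This is a sketch, not a prescribed implementation. The field names, the `FallbackContext` inputs, and especially the order in which `chooseFallback` prefers the options are assumptions made for illustration. The framework lists the options but does not rank them.

```typescript
type CrawlFallback =
  | "tryAnotherSource"
  | "useSearchResultWithReducedConfidence"
  | "escalateToHumanReview"
  | "stateInsufficientEvidence";

// A subset of the recommended crawl record fields.
interface CrawlRecord {
  sourceUrl: string;
  sourceTitle: string;
  retrievedAt: string;          // ISO timestamp
  accessStatus: number;         // HTTP status code, or 0 if the request never completed
  extractionSucceeded: boolean;
  contentLength: number;
  usedInFinalAnswer: boolean;
}

// Hypothetical inputs used to pick a fallback.
interface FallbackContext {
  alternativeSourcesLeft: number;
  searchResultAvailable: boolean;
  highStakes: boolean;
}

// Returns null when extraction succeeded and no fallback is needed.
// The preference order below is an assumption.
function chooseFallback(record: CrawlRecord, ctx: FallbackContext): CrawlFallback | null {
  if (record.extractionSucceeded) return null;
  if (ctx.alternativeSourcesLeft > 0) return "tryAnotherSource";
  if (ctx.highStakes) return "escalateToHumanReview";
  if (ctx.searchResultAvailable) return "useSearchResultWithReducedConfidence";
  return "stateInsufficientEvidence";
}
```

Because the failed `CrawlRecord` is kept and the chosen fallback is an explicit value, the failure stays visible in the trace instead of being silently absorbed.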
Source Reliability Classification
MWMS Deep Search workflows should classify sources where possible.
Suggested source classes:
Official Source
Examples:
- Vendor documentation
- Platform policy pages
- Government pages
- Official product pages
- Company announcements
Default trust level: High, but still checked for bias and recency.
Expert Source
Examples:
- Industry analysis
- Recognised experts
- Specialist publications
- Technical blogs from credible practitioners
Default trust level: Medium to high.
Commercial Source
Examples:
- Affiliate sales pages
- Product reviews with monetisation
- Vendor comparison pages
- Agency landing pages
Default trust level: Medium to low unless corroborated.
User Generated Source
Examples:
- Forums
- Comments
- Social media posts
Default trust level: Signal only, not proof.
Unknown Or Low Trust Source
Examples:
- Scraped reposts
- Thin content sites
- Anonymous blogs
- Outdated pages
- AI generated content farms
Default trust level: Low.
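The source classes and their default trust levels can be captured in a lookup table. The sketch below copies the trust wording from the classes above. The helper `canStandAsEvidence` is hypothetical, and its rule for unknown or low trust sources (usable only when corroborated) is an assumption, since the framework only states "Low" for that class.

```typescript
type SourceClass = "official" | "expert" | "commercial" | "userGenerated" | "unknownOrLowTrust";

// Default trust levels, worded as in the classification above.
const DEFAULT_TRUST: Record<SourceClass, string> = {
  official: "High, but still checked for bias and recency",
  expert: "Medium to high",
  commercial: "Medium to low unless corroborated",
  userGenerated: "Signal only, not proof",
  unknownOrLowTrust: "Low",
};

// Hypothetical rule of thumb: can a source of this class back a claim in the final answer?
function canStandAsEvidence(sourceClass: SourceClass, corroborated: boolean): boolean {
  switch (sourceClass) {
    case "official":
    case "expert":
      return true;
    case "commercial":
    case "unknownOrLowTrust":
      // Assumption: lower trust classes need an independent corroborating source.
      return corroborated;
    case "userGenerated":
      // Signal only, not proof, regardless of corroboration.
      return false;
  }
}
```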
Observability Requirement
Every serious Deep Search AI Employee must be observable.
Observability means the system can answer:
- What was requested?
- Who requested it?
- Which Brain handled it?
- Which AI Employee handled it?
- Which model was used?
- What prompt was sent?
- What tools were called?
- What searches were performed?
- What sources were found?
- What sources were inspected?
- What database records were read?
- What database records were written?
- What failed?
- What cost was incurred?
- How long did it take?
- What confidence score was produced?
- What decision was made?
- Where was the final output stored?
- What should be improved next?
Observability must cover the full workflow, not just the model call.
Full Workflow Trace Standard
A Deep Search workflow should be traceable from beginning to end.
The ideal trace path is:
User request
→ Brain assignment
→ AI Employee assignment
→ task or thread ID
→ prompt
→ model call
→ search query
→ tool call
→ source result
→ crawler or scraper action
→ extracted evidence
→ database read
→ database write
→ evaluation score
→ final output
→ HeadOffice review
→ Kaizen improvement log
No major Deep Search action should exist without a parent task, thread, workflow run, offer, experiment, source record, or report.
No orphaned AI output.
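The "no orphaned output" rule can be checked mechanically if every trace step carries a parent reference. This is a minimal sketch. The `TraceStep` shape and the `findOrphanedSteps` function are assumptions for illustration, and a real trace store would likely look different.

```typescript
interface TraceStep {
  stepId: string;
  kind: string;            // for example "model call" or "search query"
  parentId: string | null; // ID of the task, thread, workflow run, or other parent record
}

// Returns every step whose parent is missing or does not match a known parent record.
function findOrphanedSteps(steps: TraceStep[], knownParentIds: ReadonlyArray<string>): TraceStep[] {
  return steps.filter(
    (step) => step.parentId === null || knownParentIds.indexOf(step.parentId) === -1
  );
}
```

An empty result means every step in the run links back to a parent record. Any returned steps are candidates for review.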
Required Observability Metadata
Deep Search traces should include operational metadata wherever possible.
Recommended metadata fields:
- Brain name
- AI Employee name
- workflow type
- task ID
- thread ID
- user or operator
- client or account if relevant
- source record ID
- offer ID if relevant
- experiment ID if relevant
- newsletter ID if relevant
- campaign ID if relevant
- priority
- urgency
- confidence
- model used
- tool used
- search provider used
- crawler or scraper used
- cost estimate
- latency
- success status
- failure reason
- retry count
- escalation status
- decision outcome
- final storage location
- review status
- Kaizen note
Technical logs without business metadata are not enough for MWMS.
HeadOffice must be able to understand the business meaning of the trace.
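A subset of the recommended metadata fields could be typed and checked as shown below. This is a sketch under assumptions: the property names are camelCase renderings of the fields above, and the choice of which fields count as required in `REQUIRED_FIELDS` is hypothetical. The framework recommends the fields "wherever possible" and does not mark any as mandatory.

```typescript
// A subset of the recommended metadata fields. All are optional at the type level.
interface TraceMetadata {
  brainName?: string;
  employeeName?: string;
  workflowType?: string;
  taskId?: string;
  threadId?: string;
  modelUsed?: string;
  costEstimateUsd?: number;
  latencyMs?: number;
  successStatus?: "success" | "failure";
  failureReason?: string;
  reviewStatus?: string;
  kaizenNote?: string;
}

// Assumed minimum set of business fields. MWMS would define its own.
const REQUIRED_FIELDS: Array<keyof TraceMetadata> = [
  "brainName",
  "employeeName",
  "workflowType",
  "taskId",
  "successStatus",
];

// Lists the required fields that are absent or empty on a trace.
function missingMetadataFields(meta: TraceMetadata): Array<keyof TraceMetadata> {
  return REQUIRED_FIELDS.filter((field) => meta[field] === undefined || meta[field] === "");
}
```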
Database Call Observability
Deep Search observability must include important database activity.
The system should log or trace:
- database reads
- database writes
- task updates
- queue status changes
- source record creation
- source record updates
- evidence record creation
- result storage
- duplicate detection
- failed inserts
- missing records
- permissions failures
- status transitions
- event log creation
This matters because many AI failures are not model failures.
They may be:
- missing data
- wrong record linkage
- duplicate writes
- broken task state
- failed source storage
- incorrect user ownership
- outdated cached records
- queue routing failure
A final answer should never be trusted if the supporting database workflow is broken.
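A simple review of logged database events can surface some of these failures before an answer is trusted. The sketch below is illustrative. The `DbTraceEvent` shape is an assumption, and treating a repeated successful write to the same table and record within one run as suspicious is an assumed heuristic, since a repeated write can also be a legitimate update.

```typescript
interface DbTraceEvent {
  operation: "read" | "write";
  table: string;
  recordId: string | null; // null when the record was missing or the insert failed
  ok: boolean;
  error?: string;
}

interface DbActivityReview {
  failedOperations: DbTraceEvent[];
  repeatedWrites: string[]; // "table:recordId" keys written more than once
  healthy: boolean;
}

function reviewDbActivity(events: DbTraceEvent[]): DbActivityReview {
  const failedOperations = events.filter((e) => !e.ok);
  // Count successful writes per table and record.
  const writeCounts: Record<string, number> = {};
  for (const e of events) {
    if (e.operation === "write" && e.ok && e.recordId !== null) {
      const key = e.table + ":" + e.recordId;
      writeCounts[key] = (writeCounts[key] ?? 0) + 1;
    }
  }
  const repeatedWrites = Object.keys(writeCounts).filter((key) => (writeCounts[key] ?? 0) > 1);
  return {
    failedOperations,
    repeatedWrites,
    healthy: failedOperations.length === 0 && repeatedWrites.length === 0,
  };
}
```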
Tool Call Observability
Every tool used by a Deep Search AI Employee should be visible and reviewable.
Tool traces should include:
- tool name
- tool purpose
- input arguments
- execution status
- output summary
- error message if failed
- latency
- retry count
- whether the output was used
- whether the output changed the final answer
- whether the tool was authorised for that AI Employee
This applies to:
- search tools
- crawler tools
- scraper tools
- browser tools
- database tools
- file tools
- email tools
- calendar tools
- analytics tools
- ad platform tools
- future MCP tools
- future WordPress or Supabase tools
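One way to make every tool call visible is to route calls through a wrapper that records a trace entry whether the call succeeds, fails, or is blocked. This is a minimal sketch under assumptions: `traceToolCall` and `ToolTraceEntry` are hypothetical names, the allow list is a plain array of tool names, and the wrapper is synchronous for brevity, whereas most real tools would be asynchronous.

```typescript
interface ToolTraceEntry {
  toolName: string;
  inputArguments: unknown;
  executionStatus: "success" | "failure";
  outputSummary: string | null;
  errorMessage: string | null;
  latencyMs: number;
  authorised: boolean;
}

// Runs a tool and always returns a trace entry alongside the result.
function traceToolCall<A, R>(
  toolName: string,
  args: A,
  run: (args: A) => R,
  allowedTools: ReadonlyArray<string>,
  summarise: (result: R) => string
): { result: R | null; trace: ToolTraceEntry } {
  const authorised = allowedTools.indexOf(toolName) !== -1;
  const base = { toolName, inputArguments: args, authorised };
  if (!authorised) {
    // Blocked calls are recorded, not silently dropped.
    const trace: ToolTraceEntry = {
      ...base,
      executionStatus: "failure",
      outputSummary: null,
      errorMessage: "tool not authorised for this AI Employee",
      latencyMs: 0,
    };
    return { result: null, trace };
  }
  const started = Date.now();
  try {
    const result = run(args);
    const trace: ToolTraceEntry = {
      ...base,
      executionStatus: "success",
      outputSummary: summarise(result),
      errorMessage: null,
      latencyMs: Date.now() - started,
    };
    return { result, trace };
  } catch (err) {
    const trace: ToolTraceEntry = {
      ...base,
      executionStatus: "failure",
      outputSummary: null,
      errorMessage: err instanceof Error ? err.message : String(err),
      latencyMs: Date.now() - started,
    };
    return { result: null, trace };
  }
}
```

Retry count and whether the output was used in the final answer are left out here. They would be added by whatever orchestrates the calls.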
Cost And Latency Tracking
Deep Search can become expensive if uncontrolled.
Each production level Deep Search AI Employee should track:
- cost per query
- cost per user
- cost per workflow run
- cost per source inspected
- cost per successful answer
- cost per failed answer
- total daily cost
- total monthly cost
- average response time
- slowest workflow paths
- tool latency
- database latency
- model latency
- crawler latency
Cost and latency are not only technical metrics.
They affect business viability, user trust, scaling, and product packaging.
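Several of the metrics above can be derived from per-run records. The sketch below is illustrative. The record shape is an assumption, and "cost per successful answer" is defined here as total spend divided by the number of successful runs, so failed runs raise it. Other definitions are possible.

```typescript
interface RunCostRecord {
  runId: string;
  userId: string;
  costUsd: number;
  latencyMs: number;
  succeeded: boolean;
  sourcesInspected: number;
}

interface CostSummary {
  totalCostUsd: number;
  costPerRun: number;
  costPerSuccessfulAnswer: number | null; // null when no run succeeded
  costPerSourceInspected: number | null;  // null when no source was inspected
  averageLatencyMs: number;
}

// Returns null for an empty input, since averages are undefined without runs.
function summariseCosts(runs: RunCostRecord[]): CostSummary | null {
  if (runs.length === 0) return null;
  const totalCostUsd = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const totalLatencyMs = runs.reduce((sum, r) => sum + r.latencyMs, 0);
  const totalSources = runs.reduce((sum, r) => sum + r.sourcesInspected, 0);
  const successfulRuns = runs.filter((r) => r.succeeded).length;
  return {
    totalCostUsd,
    costPerRun: totalCostUsd / runs.length,
    costPerSuccessfulAnswer: successfulRuns > 0 ? totalCostUsd / successfulRuns : null,
    costPerSourceInspected: totalSources > 0 ? totalCostUsd / totalSources : null,
    averageLatencyMs: totalLatencyMs / runs.length,
  };
}
```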
Evaluation Requirement
Every serious Deep Search AI Employee should have repeatable evaluation tests.
Manual judgement alone is not enough.
Evals should be used to test whether the AI Employee produces outputs that meet MWMS standards over time.
Evaluation should apply before:
- giving an AI Employee more autonomy
- using the Employee for high value decisions
- scaling client facing workflows
- changing models
- changing prompts
- changing tools
- changing search providers
- changing crawler behaviour
- changing source rules
- automating downstream actions
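A repeatable evaluation can be as simple as a fixed set of cases run against the AI Employee before and after a change. The sketch below does not depend on any evaluation library. The names `EvalCase` and `runEvalSuite` are hypothetical, and the employee is modelled as a synchronous function from input text to output text, which is a simplification.

```typescript
interface EvalCase {
  caseName: string;
  input: string;
  passes: (output: string) => boolean; // the success check for this case
}

interface EvalReport {
  failedCases: string[];
  passRate: number; // between 0 and 1; 0 when there are no cases
}

// Runs every case against the employee and reports which cases failed.
function runEvalSuite(cases: EvalCase[], employee: (input: string) => string): EvalReport {
  const failedCases = cases
    .filter((c) => !c.passes(employee(c.input)))
    .map((c) => c.caseName);
  const passRate = cases.length === 0 ? 0 : (cases.length - failedCases.length) / cases.length;
  return { failedCases, passRate };
}
```

Running the same suite before and after a prompt, model, or tool change gives a like-for-like comparison that manual judgement cannot.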
Success Criteria Requirement
Every Deep Search AI Employee must have success criteria.
Success criteria define what “good” means.
Without success criteria, MWMS cannot know whether an AI Employee is improving, drifting, wasting cost, or creating risk.
Deep Search success criteria should include:
- factual accuracy
- relevance
- source quality
- source visibility
- freshness
- completeness
- clarity
- actionability
- decision usefulness
- speed
- cost control
- error rate
- confidence calibration
- compliance safety
- repeatability
- human review usefulness
Base Success Criteria For Deep Search Outputs
A Deep Search output should be judged against the following base criteria.
1. Factual
The answer should be supported by evidence, not model confidence alone.
2. Relevant
The answer should directly address the task, decision, or question.
3. Sourced
The answer should make clear what evidence was used.
4. Up To Date
The answer should use information current enough for the decision.
5. Complete Enough
The answer should include enough information to support a decision, while avoiding unnecessary filler.
6. Clear
The answer should be understandable by the intended operator.
7. Actionable
The answer should help MWMS decide what to do next.
8. Cost Controlled
The workflow should not use excessive model, tool, or crawler cost for the value of the task.
9. Fast Enough
The response time should fit the use case.
10. Traceable
The process should be reviewable after the fact.
11. Safe
The output should avoid compliance, policy, legal, privacy, or reputational risk.
12. Improvable
The output should produce enough evidence for future Kaizen improvement.
Deep Search Evaluation Categories
MWMS should evaluate Deep Search AI Employees across five categories.
Category 1: Evidence Quality
Questions:
- Did the AI inspect enough sources?
- Were the sources reliable?
- Were the sources current?
- Were weak sources filtered out?
- Were conflicting sources identified?
- Was the final answer grounded?
Category 2: Reasoning Quality
Questions:
- Did the AI interpret the evidence correctly?
- Did it avoid overclaiming?
- Did it separate fact from inference?
- Did it acknowledge uncertainty?
- Did it reach a useful conclusion?
Category 3: Operational Quality
Questions:
- Did the workflow run without error?
- Were tool calls successful?
- Were database records saved correctly?
- Were outputs connected to the right task or thread?
- Was the process visible to HeadOffice?
Category 4: Business Quality
Questions:
- Did the answer help MWMS make a better decision?
- Did it save time?
- Did it identify risk?
- Did it reveal an opportunity?
- Did it support revenue, compliance, efficiency, or system improvement?
Category 5: Cost And Scaling Quality
Questions:
- Was the output worth the cost?
- Did the AI use too many searches?
- Did the crawler inspect too many pages?
- Did retries create waste?
- Can the workflow scale safely?
Suggested Scoring Model
Each Deep Search output may be scored from 1 to 5 across key criteria.
| Score | Meaning |
|---|---|
| 1 | Failed or unusable |
| 2 | Weak and needs human correction |
| 3 | Acceptable but not strong |
| 4 | Strong and useful |
| 5 | Excellent and reusable |
Suggested scoring fields:
- Factual accuracy
- Relevance
- Source quality
- Freshness
- Completeness
- Decision usefulness
- Traceability
- Cost efficiency
- Speed
- Safety
A score below 3 in factual accuracy, source quality, safety, or decision usefulness should trigger review.
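The review trigger above can be sketched as a small check over a score record. The type and function names are assumptions for illustration. The four critical fields and the threshold of 3 come from the rule stated above.

```typescript
type ScoreValue = 1 | 2 | 3 | 4 | 5;

interface DeepSearchScores {
  factualAccuracy: ScoreValue;
  relevance: ScoreValue;
  sourceQuality: ScoreValue;
  freshness: ScoreValue;
  completeness: ScoreValue;
  decisionUsefulness: ScoreValue;
  traceability: ScoreValue;
  costEfficiency: ScoreValue;
  speed: ScoreValue;
  safety: ScoreValue;
}

// Fields where a score below 3 should trigger review.
const CRITICAL_FIELDS: Array<keyof DeepSearchScores> = [
  "factualAccuracy",
  "sourceQuality",
  "safety",
  "decisionUsefulness",
];

// Returns the critical fields that scored below 3. An empty array means no trigger.
function fieldsTriggeringReview(scores: DeepSearchScores): Array<keyof DeepSearchScores> {
  return CRITICAL_FIELDS.filter((field) => scores[field] < 3);
}
```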
Failure Conditions
A Deep Search output should be treated as failed or review required if:
- No sources were used when sources were required
- Sources were outdated for a time sensitive topic
- The answer made unsupported claims
- The crawler failed but the answer acted as if it succeeded
- Search snippets were treated as full evidence
- The output did not answer the actual task
- The system could not trace tool calls
- The system could not trace database activity
- The answer used low trust sources without warning
- The cost was excessive for the task value
- The output created compliance or reputational risk
- The AI Employee showed high confidence with weak evidence
- The final answer could not be linked to a task, thread, source, or workflow record
Human Review Requirements
Human review is required when:
- The decision has financial risk
- The decision has compliance risk
- The output affects a campaign launch
- The output affects affiliate offer selection
- The output recommends budget changes
- The output recommends public claims
- The output uses weak or conflicting sources
- The output depends on current policy or regulation
- The AI Employee confidence is low
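- The evaluation score is below the review threshold (for example, below 3 in factual accuracy, source quality, safety, or decision usefulness)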
- The system detects missing trace data
Human review should record the outcome, for example:
- approved
- rejected
- needs more research
- park for later
- route to another Brain
- create task
- update framework
- add Kaizen note
HeadOffice Governance Role
HeadOffice owns this framework.
HeadOffice is responsible for:
- defining success criteria
- approving Deep Search AI Employees
- reviewing observability outputs
- monitoring cost and latency
- reviewing eval results
- identifying drift
- routing failures to the correct Brain
- deciding when an AI Employee can gain more autonomy
- deciding when an AI Employee must be restricted
- ensuring all Deep Search workflows support MWMS business goals
HeadOffice must not rely on polished AI answers without traceability.
A useful answer without evidence is not enough.
A confident answer without observability is not enough.
A fast answer that is wrong is not enough.
A cheap answer that creates risk is not enough.
Relationship To Other MWMS Standards
This framework supports and should align with:
- MWMS AI Agent Operations Core
- MWMS AI Tool Permission And Access Framework
- MWMS AI Agent Deployment Readiness Checklist
- MWMS AI Workflow Pipeline Standard
- MWMS AI Schema And Decision Ready Output Framework
- MWMS AI Output Validation Standard
- MWMS Agentic Reporting Standard
- MWMS Supabase Event Schema
- MWMS Brain Room Architecture
- HeadOffice Operational Intelligence Framework
- HeadOffice Newsletter Intelligence Operating Protocol
- Research Brain Source Evaluation Framework
- Data Brain Measurement Integrity Framework
- Experimentation Brain Canon
- MWMS Kaizen Continuous Improvement Loop
- MWMS System Change Log
This framework does not replace those standards.
It provides the quality and observability layer for Deep Search style AI work.
Routing Rules
Deep Search findings should route according to their business function.
| Finding Type | Primary Destination |
|---|---|
| Source quality issue | Research Brain |
| Data integrity issue | Data Brain |
| AI Employee failure | AI Agent Operations |
| Cost issue | Finance Brain or HeadOffice |
| Experiment insight | Experimentation Brain |
| Affiliate opportunity | Affiliate Brain |
| Ad or compliance issue | Ads Brain or Risk Brain |
| Newsletter signal | HeadOffice Intelligence |
| Framework improvement | MCR |
| UI visibility issue | relevant Brain site or plugin build |
| Repeated failure pattern | Kaizen Log and HeadOffice review |
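The routing table above maps directly onto a typed lookup. In this sketch the destination strings are copied verbatim from the table. The `FindingType` key names and the `routeFinding` function are hypothetical.

```typescript
type FindingType =
  | "sourceQualityIssue"
  | "dataIntegrityIssue"
  | "aiEmployeeFailure"
  | "costIssue"
  | "experimentInsight"
  | "affiliateOpportunity"
  | "adOrComplianceIssue"
  | "newsletterSignal"
  | "frameworkImprovement"
  | "uiVisibilityIssue"
  | "repeatedFailurePattern";

// Primary destinations, worded as in the routing table.
const PRIMARY_DESTINATION: Record<FindingType, string> = {
  sourceQualityIssue: "Research Brain",
  dataIntegrityIssue: "Data Brain",
  aiEmployeeFailure: "AI Agent Operations",
  costIssue: "Finance Brain or HeadOffice",
  experimentInsight: "Experimentation Brain",
  affiliateOpportunity: "Affiliate Brain",
  adOrComplianceIssue: "Ads Brain or Risk Brain",
  newsletterSignal: "HeadOffice Intelligence",
  frameworkImprovement: "MCR",
  uiVisibilityIssue: "relevant Brain site or plugin build",
  repeatedFailurePattern: "Kaizen Log and HeadOffice review",
};

function routeFinding(findingType: FindingType): string {
  return PRIMARY_DESTINATION[findingType];
}
```

Typing the keys as a closed union means a new finding type cannot be added without also giving it a destination.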
Kaizen Loop
Every Deep Search AI Employee should feed a Kaizen loop.
After meaningful runs, the system should record:
- What worked
- What failed
- What was unclear
- What sources were weak
- What cost too much
- What took too long
- What should be improved
- Whether prompts need refinement
- Whether tools need refinement
- Whether success criteria need updating
- Whether the AI Employee is ready for more autonomy
Kaizen loop:
Reflect
→ Reduce
→ Refine
→ Record
The goal is not only to judge individual outputs.
The goal is to improve the system over time.
Minimum Viable Implementation
The first version of this framework does not require a complex observability platform.
MWMS can begin with:
- task IDs
- thread IDs
- source records
- event logs
- model used
- tool used
- final output
- status
- confidence
- human review result
- Kaizen note
As the system matures, MWMS can add:
- full trace logging
- external observability tools
- model cost tracking
- crawler performance metrics
- database call traces
- automated evals
- AI Employee scorecards
- HeadOffice monitoring dashboards
Start simple.
Do not delay governance because tooling is not perfect.
Future System Enhancements
Future versions may include:
- MWMS AI Observability Metadata Standard
- MWMS Deep Search Source Record Schema
- MWMS AI Employee Eval Registry
- MWMS Deep Search Scorecard
- HeadOffice AI Trace Dashboard
- Research Brain Source Quality Dashboard
- Affiliate Brain Offer Evidence Trail
- Ads Brain Compliance Evidence Trail
- automated source freshness checking
- model comparison evals
- cost per AI Employee reporting
- confidence calibration reports
- failure pattern detection
- AI Employee promotion or restriction rules
Drift Protection
This framework is intended to prevent the following forms of drift:
- treating model output as truth
- trusting AI confidence without evidence
- relying only on search snippets
- ignoring source freshness
- hiding tool calls from operators
- ignoring database workflow failures
- judging AI Employees by vibes
- scaling AI Employees without evals
- allowing cost to grow invisibly
- creating orphaned outputs
- separating AI output from business usefulness
- using observability as technical logging only
- building Deep Search without HeadOffice oversight
If an AI Employee cannot be observed, evaluated, and reviewed, it should not be trusted with important decisions.
Architectural Intent
The architectural intent of this framework is to make Deep Search a governed MWMS capability rather than a loose AI feature.
MWMS is not building simple chatbots.
MWMS is building business aligned AI Employees that can research, reason, report, and improve.
For that to work, every serious AI Employee must be:
- observable
- measurable
- source aware
- date aware
- cost aware
- workflow aware
- reviewable
- improvable
This framework ensures Deep Search becomes part of the MWMS intelligence layer, not just another tool.
Change Log
v1.0 Initial Draft
Created the MWMS Deep Search Quality And Observability Framework based on absorbed insights from Matt Pocock AIhero Build DeepSearch In TypeScript.
Integrated principles from course blocks covering:
- date aware LLM behaviour
- crawler improvement
- database call observability
- Langfuse style tracing
- Evalite style repeatable evaluation
- Deep Search success criteria
- factuality
- relevance
- source quality
- freshness
- cost per query
- cost per user
- latency
- error rates
- business usefulness
Established this framework as the MWMS governance layer for evaluating, monitoring, and improving Deep Search style AI Employees.