System: MWMS
Document Type: Operating Framework
Authority Level: MCR Source Of Truth
Status: Active
Version: v1.2
Primary Location: MCR
Future Operational Destination: Research Brain, Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain
Parent Page: Research Brain
Owner: Martyn
Developer Boundary: Do Not Touch M’s Active Build Areas Unless Specifically Assigned
Source Of Truth: MCR
Last Reviewed: 2026-06-21
Source / Origin: AI Automations by Jack Commercialisation Block, Apify Masterclass, Lead Generation Systems, Productised AIOS Service Packaging, Case Study Pattern Library, RSS Extraction, Website Crawling, YouTube Transcript Capture, Email And Meeting Intelligence, Browser Capture, RAG Intake, Research Automation, Keyword Driven Lead Discovery, Contact Extraction, Scrape Or Reject Review Gates, Prospect Enrichment, And Downstream Outreach Preparation Material
MWMS Classification: Research Brain Operating Framework / Data Extraction Standard / Actor Infrastructure Framework / Source Intake Governance / Candidate Discovery And Permitted Use Standard / Structured Intelligence Pipeline
Primary Brain: Research Brain
Supporting Brains: Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain
Related Pages: Research Brain Canon, Data Brain Canon, MWMS Search Scrape Summarise Evidence Pipeline Standard, MWMS Source Visibility And Evidence Display Standard, MWMS Deep Search Quality And Observability Framework, MWMS Research Synthesis Documentation And Distribution Framework, MWMS Outbound Lead Enrichment And Cold Outreach Governance Framework, MWMS AIOS Lead Capture And Conversion Infrastructure Framework, MWMS AI Assisted Outreach And Sales Follow Up Automation Framework, MWMS Personalised Visual Sales Asset Production And Governance Framework, MWMS Client Context Isolation And Privacy Boundary Standard, MWMS AI Automation Security And Risk Checklist, MWMS AI Tool Permission And Access Framework
Purpose
The purpose of the MWMS Data Extraction And Actor Infrastructure Framework is to define how MWMS uses structured web data extraction, actor-based automation, scraping systems, APIs, feeds, transcripts, files, browser capture, enrichment workflows, and research pipelines to support better decisions across the MWMS ecosystem.
This framework exists because Research Brain and Data Brain must become stronger than manual searching.
MWMS cannot rely only on:
random Google searches
scattered course notes
manual copying
isolated spreadsheets
unverified summaries
single-source assumptions
unowned scraping workflows
tool-specific actor configurations
The framework defines how authorised source data becomes:
structured evidence
clean records
enriched intelligence
scored opportunities
reviewable candidates
permitted downstream inputs
dashboards
reports
Brain requests
commercial opportunities
client AIOS inputs
The core purpose is:
Turn authorised external and internal source data into structured, source-visible MWMS intelligence that can be searched, scored, compared, reviewed, routed, tested, and used by the correct Brain.
Core Doctrine
The MWMS doctrine is:
Data extraction is not valuable by itself.
Data extraction becomes valuable when it feeds a decision, dashboard, offer, campaign, experiment, report, client system, or governed knowledge base.
A scraper is not the asset.
An actor is not the asset.
An API is not the asset.
An RSS feed is not the asset.
A transcript is not the asset.
A crawler is not the asset.
A lead list is not the asset.
The asset is the structured intelligence produced from authorised source evidence.
MWMS should never build data extraction systems merely because they are technically possible.
Every extraction workflow must answer:
What decision will this support?
Which Brain needs the data?
What source is being captured?
Are we permitted to access and use it?
What fields are needed?
How fresh must the data be?
Is this one-time capture or continuous monitoring?
What action may the data trigger?
How will the raw evidence be preserved?
How will the data be cleaned?
How will changes be detected?
How will duplicates be handled?
How will the data be stored?
How will the data be scored?
What compliance risks exist?
What downstream use is permitted?
What human review is required?
What dashboard, report, or knowledge system will show the value?
What is the cost of maintaining the data flow?
Strategic Importance
The long-term strategic value of this framework is not scraping.
It is reliable external intelligence infrastructure.
That infrastructure can support:
market research
offer discovery
competitor intelligence
lead discovery
client intelligence
content research
affiliate research
PPL research
review mining
trend monitoring
price monitoring
product monitoring
local business intelligence
research evidence
sales relevance
AIOS opportunity discovery
The strongest extraction systems combine:
clear business purpose
authorised sources
stable actors
raw evidence preservation
structured schemas
normalisation
identity resolution
deduplication
enrichment
freshness control
scoring
human review
permitted-use decisions
Brain routing
observability
Core Definitions
A data extraction workflow is a repeatable process that collects information from an authorised external or internal source and converts it into structured data or source-linked evidence.
An actor is a reusable automation component that performs a defined extraction or processing job, such as:
scraping a website
collecting listings
extracting reviews
monitoring a page
retrieving a feed
capturing transcripts
normalising records
enriching a dataset
An actor infrastructure layer is the system that stores, runs, monitors, versions, and routes these extraction components.
A research pipeline is the full pathway from:
source selection
capture
raw evidence preservation
cleaning
normalisation
enrichment
scoring
candidate review
permitted-use decision
storage
comparison
dashboarding
Brain routing
A source intake pipeline is the governed path through which authorised information enters MWMS.
Change detection is the comparison of newly captured source data with previous source records to identify material differences.
Raw evidence is the original captured material before interpretation, summarisation, or cleansing changes its presentation.
A candidate record is an extracted entity that may be relevant to a downstream purpose but has not yet been accepted for that purpose.
A permitted-use decision is the explicit determination of whether an accepted record may be used for research, analysis, reporting, personalisation, sales preparation, outreach, client delivery, or another named activity.
MWMS Definition
The MWMS Data Extraction And Actor Infrastructure Framework is:
Research Brain and Data Brain’s standard for converting authorised web data, feeds, transcripts, files, emails, meetings, browser captures, marketplace signals, competitor intelligence, lead data, review data, product data, and market signals into structured, governed, reusable intelligence pipelines that support MWMS decisions, dashboards, offers, campaigns, knowledge systems, and client AIOS systems.
Scope
This framework applies to:
Research Brain market research
Data Brain structured intelligence
Affiliate Brain product research
PPL offer research
AIBS client research
competitor monitoring
offer intelligence
ad intelligence
review mining
lead enrichment
Google Maps-style business research
keyword-driven prospect discovery
search-result extraction
contact discovery
website contact extraction
social profile extraction
marketplace scraping
e-commerce product monitoring
real estate data extraction
social proof monitoring
price monitoring
ranking and visibility monitoring
AI tool monitoring
newsletter intelligence enrichment
case study extraction
client intelligence reports
content opportunity systems
data-backed dashboards
actor-based SaaS or micro-app infrastructure
RSS ingestion
website crawling
sitemap capture
YouTube transcript capture
uploaded file extraction
email intelligence intake
meeting transcript intake
support conversation capture
browser-selected text capture
form data intake
CRM imports
API feeds
webhook events
scheduled source monitoring
This framework applies whenever MWMS uses extraction, capture, crawling, feeds, transcripts, files, or scraping to support a business decision.
Core Principle
The core principle is:
Extract only what MWMS is authorised to access and can use, structure, verify, govern, retain, and act on.
A data extraction workflow should not create a pile of raw data.
It should create usable intelligence.
Usable intelligence means:
purpose-led
authorised
structured
cleaned
timestamped
source-linked
identity-linked where required
deduplicated
change-aware
scored where useful
human-reviewable
downstream-use controlled
routed to the right Brain
connected to a decision
displayed where useful
retained appropriately
deletable
not over-collected
The MWMS Data Extraction And Actor Infrastructure Model
Every extraction and intake system should be designed across seventeen layers:
Intelligence Need Layer
Source Authority And Permission Layer
Source Selection Layer
Capture Method Layer
Actor And Automation Layer
Raw Evidence Layer
Data Schema Layer
Cleaning And Normalisation Layer
Identity And Deduplication Layer
Enrichment Layer
Change Detection And Freshness Layer
Scoring And Classification Layer
Storage And Retention Layer
Dashboard And Report Layer
Brain Routing Layer
Governance And Compliance Layer
Candidate Discovery And Permitted Downstream Use Layer
- Intelligence Need Layer
The first step is not choosing a scraper.
The first step is identifying the intelligence need.
Intelligence Need Questions
Ask:
What are we trying to learn?
Which Brain needs the answer?
What decision depends on this?
Is this for affiliate, PPL, AIBS, content, ads, research, client work, sales, or HeadOffice?
Is this one-time research or recurring monitoring?
How fresh must the data be?
What fields are needed?
What evidence must be preserved?
What output is required?
What will happen if the data confirms the hypothesis?
What will happen if the data contradicts the hypothesis?
What downstream use may be requested?
Example Intelligence Needs
Find local businesses with poor follow-up signals.
Identify companies with weak website conversion paths.
Monitor competitor offers.
Track new product or pricing changes.
Find content gaps.
Extract review themes.
Build a qualified market map.
Identify potential AIOS clients.
Rule
No extraction should begin without a defined business question.
- Source Authority And Permission Layer
The system must determine whether MWMS may access and use the source.
Source Authority Questions
Ask:
Is the source public?
Is authentication required?
Do we have permission?
Do platform terms restrict extraction?
Does the source contain personal data?
Is the intended use different from the source’s original purpose?
Does the source contain restricted or sensitive information?
Does the client own or authorise the source?
Is the source licensed?
Can the data be retained?
Can it be reused?
Can it be used for outreach?
Permission Status
Approved
Approved With Conditions
Review Required
Restricted
Prohibited
Unknown
Rule
Technical accessibility does not equal permission.
- Source Selection Layer
Sources should be selected according to the intelligence need.
Possible sources include:
official websites
search results
directories
marketplaces
feeds
sitemaps
public profiles
review platforms
social pages
news sources
documents
transcripts
emails
meetings
client systems
APIs
files
browser captures
Source Selection Criteria
authority
relevance
coverage
freshness
stability
cost
permission
reliability
extractability
change frequency
Rule
Use the strongest practical sources, not merely the easiest one.
- Capture Method Layer
Possible capture methods include:
manual capture
API
feed
actor
scraper
crawler
browser capture
file upload
transcript retrieval
email intake
webhook
database export
The method should be selected according to:
source
volume
frequency
structure
cost
reliability
permission
maintenance burden
Rule
The capture method must fit the source and decision.
- Actor And Automation Layer
Each actor should perform a clear job.
Actor examples include:
search-result actor
website crawler
contact extractor
review scraper
feed reader
transcript retriever
file parser
price monitor
change detector
normaliser
enrichment actor
Actor Requirements
Actor name
owner
purpose
source
inputs
outputs
version
frequency
cost
permissions
failure handling
destination
last tested
Rule
Actors should be reusable, observable, and replaceable.
- Raw Evidence Layer
Raw evidence must be preserved where the decision may require audit, verification, or reprocessing.
Raw evidence may include:
source HTML
original JSON
feed item
file
transcript
screenshot
browser capture
API response
Raw Evidence Fields
Source ID:
Source URL:
Captured At:
Published At:
Updated At:
Capture Method:
Actor Version:
Raw Location:
Hash:
Access Status:
Rule
Cleaning must not erase the ability to inspect the original evidence.
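The Raw Evidence Fields above can be sketched as a record whose hash is taken from the untouched capture. This is a minimal sketch, not an MWMS implementation: the class name, function names, argument order, and the use of SHA-256 are assumptions, and the sample access status value is illustrative only.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RawEvidence:
    # Field names mirror the Raw Evidence Fields list; types are assumptions.
    source_id: str
    source_url: str
    captured_at: datetime
    capture_method: str
    actor_version: str
    raw_location: str
    content_hash: str
    access_status: str
    published_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None


def make_raw_evidence(source_id: str, source_url: str, raw_bytes: bytes,
                      capture_method: str, actor_version: str,
                      raw_location: str, access_status: str) -> RawEvidence:
    """Describe a capture; the hash is computed before any cleaning happens."""
    return RawEvidence(
        source_id=source_id,
        source_url=source_url,
        captured_at=datetime.now(timezone.utc),
        capture_method=capture_method,
        actor_version=actor_version,
        raw_location=raw_location,
        content_hash=hashlib.sha256(raw_bytes).hexdigest(),
        access_status=access_status,
    )


def matches_original(evidence: RawEvidence, raw_bytes: bytes) -> bool:
    """Check that stored raw material is still the material that was captured."""
    return hashlib.sha256(raw_bytes).hexdigest() == evidence.content_hash
```

Because the record is frozen and the hash covers the raw bytes, later cleaning can produce new records without altering the description of what was originally captured.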
- Data Schema Layer
Unstructured captured data must become structured before it can support MWMS decisions.
Schema fields depend on use case.
Example Fields For AIBS Lead Discovery
candidate_id
business_name
website
domain
industry
location
contact_name
contact_role
phone
social_profiles
review_rating
review_count
booking_link_present
chatbot_present
response_gap_signal
source_url
extracted_at
lead_score
risk_notes
candidate_status
permitted_use_status
Rule
Fields must have clear definitions and data types.
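One way to give the example AIBS lead discovery fields explicit data types is a typed record with a small validation step. The field names come from the list above. The types, the defaults, the choice of required fields, and the 0-5 rating range are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class LeadCandidate:
    # Fields assumed to be required for a usable candidate record.
    candidate_id: str
    business_name: str
    website: str
    domain: str
    source_url: str
    extracted_at: datetime
    # Optional fields; None means "not captured", not "confirmed absent".
    candidate_status: str = "New"
    permitted_use_status: Optional[str] = None
    industry: Optional[str] = None
    location: Optional[str] = None
    contact_name: Optional[str] = None
    contact_role: Optional[str] = None
    phone: Optional[str] = None
    social_profiles: List[str] = field(default_factory=list)
    review_rating: Optional[float] = None
    review_count: Optional[int] = None
    booking_link_present: Optional[bool] = None
    chatbot_present: Optional[bool] = None
    response_gap_signal: Optional[str] = None
    lead_score: Optional[float] = None
    risk_notes: str = ""


REQUIRED_TEXT_FIELDS = ("candidate_id", "business_name", "domain", "source_url")


def validation_problems(candidate: LeadCandidate) -> List[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_TEXT_FIELDS:
        if not getattr(candidate, name).strip():
            problems.append(f"{name} is empty")
    rating = candidate.review_rating
    if rating is not None and not 0 <= rating <= 5:  # assumed rating scale
        problems.append("review_rating is outside 0-5")
    if candidate.review_count is not None and candidate.review_count < 0:
        problems.append("review_count is negative")
    return problems
```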
- Cleaning And Normalisation Layer
Cleaning may include:
removing duplicates
normalising phone numbers
normalising URLs
standardising categories
cleaning names
removing irrelevant records
removing broken records
validating required fields
checking timestamps
checking source links
detecting incomplete records
separating text from HTML
removing spam results
language detection
speaker label normalisation
transcript formatting
date normalisation
Cleaning Status Values
Raw
Cleaning
Cleaned
Partially Cleaned
Rejected
Review Needed
Rule
MWMS must distinguish raw evidence from cleaned data.
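URL normalisation is one of the cleaning steps listed above, and it can be done while keeping the raw value beside the cleaned one. The sketch below uses only the Python standard library. The function names, the website_raw key, and the specific normalisation choices (default to https, drop the www. prefix, fragment, port, and trailing slash) are assumptions, not MWMS rules.

```python
from typing import Optional
from urllib.parse import urlsplit


def normalise_url(raw_url: str) -> Optional[str]:
    """Return a comparable form of the URL, or None when no host can be found."""
    candidate = raw_url.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate  # assumed default scheme
    parts = urlsplit(candidate)
    host = (parts.hostname or "").lower()  # hostname excludes any port
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{path}{query}"


def clean_record(raw_record: dict) -> dict:
    """Produce a cleaned copy; the raw website value is kept, not overwritten."""
    cleaned = dict(raw_record)
    cleaned["website_raw"] = raw_record["website"]
    cleaned["website"] = normalise_url(raw_record["website"])
    # Status values are taken from the Cleaning Status Values list.
    cleaned["cleaning_status"] = "Cleaned" if cleaned["website"] else "Review Needed"
    return cleaned
```

The input record is never mutated, so the raw and cleaned versions stay distinguishable.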
- Identity And Deduplication Layer
Extracted records must be matched to the correct entity.
Entity types may include:
business
person
client
product
offer
website
article
video
meeting
document
listing
campaign
source
Deduplication Methods
stable source ID
URL normalisation
content hash
file hash
email message ID
transcript ID
business identifier
domain
verified email
phone number
title plus publication date
source plus external record ID
Deduplication Questions
Is this record genuinely new?
Is it an updated version?
Is it a duplicate from another source?
Is it syndicated content?
Does it belong to an existing entity?
Does the new record replace the old record?
Rule
Duplicate data should not inflate evidence, lead counts, trend strength, source confidence, or outreach volume.
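Domain-based deduplication, one of the methods listed above, might look like the following. This is a sketch under stated assumptions: records are plain dictionaries with domain and source_url keys, the first record seen for a domain is kept, and the duplicate_sources key is a hypothetical name for the list of other sources that pointed at the same business.

```python
from typing import Dict, List


def domain_key(domain: str) -> str:
    """Comparable form of a domain: trimmed, lowercased, without 'www.'."""
    key = domain.strip().lower()
    return key[4:] if key.startswith("www.") else key


def deduplicate_by_domain(records: List[dict]) -> List[dict]:
    """Keep one record per domain and remember which other sources repeated it."""
    unique: Dict[str, dict] = {}
    for record in records:
        key = domain_key(record["domain"])
        if key not in unique:
            unique[key] = {**record, "domain": key, "duplicate_sources": []}
        else:
            # Recorded as provenance only; it does not add to the lead count.
            unique[key]["duplicate_sources"].append(record["source_url"])
    return list(unique.values())
```

Counting the returned list, not the input list, keeps lead counts and trend strength from being inflated by repeated appearances of the same business.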
- Enrichment Layer
Enrichment adds useful context.
Possible enrichment includes:
email finding
domain lookup
social profile lookup
business size estimate
industry classification
review sentiment
website technology detection
traffic estimate
ad activity detection
offer classification
buyer persona classification
contact role detection
location enrichment
AI summary
pain signal extraction
opportunity note
source confidence
content classification
Enrichment should improve actionability, not merely add data volume.
Enrichment Evidence Status
Verified Fact
Derived Value
Estimate
Inference
Classification
Unverified Enrichment
Rule
Enriched inference must not be stored as confirmed source fact.
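The Enrichment Evidence Status values can travel with every enriched value so that an inference is never read back as a fact. The sketch below is illustrative: the enum members mirror the status list above, while the EnrichedValue structure and the confirmed_facts helper are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class EvidenceStatus(Enum):
    VERIFIED_FACT = "Verified Fact"
    DERIVED_VALUE = "Derived Value"
    ESTIMATE = "Estimate"
    INFERENCE = "Inference"
    CLASSIFICATION = "Classification"
    UNVERIFIED_ENRICHMENT = "Unverified Enrichment"


@dataclass(frozen=True)
class EnrichedValue:
    field_name: str
    value: Any
    status: EvidenceStatus
    source: str  # where the value came from, for example a URL or a model name


def confirmed_facts(values: List[EnrichedValue]) -> List[EnrichedValue]:
    """Only values labelled Verified Fact may be presented as source fact."""
    return [v for v in values if v.status is EvidenceStatus.VERIFIED_FACT]
```

A report or handoff that needs confirmed facts filters on the status and cannot pick up an AI estimate by accident.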
- Change Detection And Freshness Layer
Recurring extraction should identify material changes.
Possible changes include:
price change
headline change
offer change
product availability change
policy update
new review
rating change
website redesign
new CTA
new testimonial
new job listing
new RSS item
updated article
changed product specification
new meeting commitment
new client information
Freshness Questions
When was the source published?
When was it last updated?
When was it captured?
When was it last verified?
How often should it be rechecked?
Is the record still current?
Has it been replaced?
Rule
Capture time, source publication time, source update time, and verification time must remain separate.
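A simple form of change detection compares a content hash of the new capture with the stored one and keeps the timestamps separate. In this sketch the dictionary keys, the whitespace-collapsing step, and the function names are assumptions. It detects that something changed but does not judge whether the change is material.

```python
import hashlib
from datetime import datetime
from typing import Tuple


def content_hash(text: str) -> str:
    """Hash the text after collapsing whitespace, so layout-only edits are ignored."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def detect_change(previous: dict, new_text: str, checked_at: datetime) -> Tuple[bool, dict]:
    """Compare a fresh capture with the stored record and return (changed, record)."""
    new_hash = content_hash(new_text)
    changed = new_hash != previous["content_hash"]
    updated = dict(previous)
    # Verification time always moves forward, whether or not content changed.
    updated["last_verified_at"] = checked_at
    if changed:
        # Capture time moves only when new content is actually stored.
        updated["content_hash"] = new_hash
        updated["captured_at"] = checked_at
    # Source publication and update times are not touched here; they come
    # from the source itself, not from the time of the check.
    return changed, updated
```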
- Scoring And Classification Layer
Data should be scored when decisions require ranking.
Possible scores include:
lead fit score
trophy client score
offer opportunity score
affiliate opportunity score
review weakness score
local SEO opportunity score
competitor threat score
content opportunity score
trend strength score
pricing gap score
pain signal score
buyer sophistication score
AIOS fit score
source confidence score
change materiality score
Example AIBS Lead Score
Pain Signal: 25
Ability To Pay: 20
Reachability: 15
AIOS Fit: 15
Review Or Reputation Gap: 10
Website Or Conversion Gap: 10
Compliance Risk: -5
Rule
Scoring must be explainable enough for human review.
Scoring must not convert weak inference into objective fact.
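The Example AIBS Lead Score can be computed so that every component stays visible to the reviewer. The weights below are taken from the example above. Treating each component as a reviewer rating between 0.0 and 1.0 multiplied by its weight is an assumption of this sketch, as are the key names and the function name.

```python
from typing import Dict

# Weights from the Example AIBS Lead Score; Compliance Risk is a deduction.
LEAD_SCORE_WEIGHTS = {
    "pain_signal": 25,
    "ability_to_pay": 20,
    "reachability": 15,
    "aios_fit": 15,
    "review_or_reputation_gap": 10,
    "website_or_conversion_gap": 10,
    "compliance_risk": -5,
}


def score_lead(ratings: Dict[str, float]) -> dict:
    """Return the total and the per-component contributions behind it."""
    components = {}
    for name, weight in LEAD_SCORE_WEIGHTS.items():
        rating = ratings.get(name, 0.0)  # a missing rating contributes nothing
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"{name} rating must be between 0.0 and 1.0")
        components[name] = weight * rating
    return {"total": sum(components.values()), "components": components}
```

Returning the components alongside the total lets a reviewer see which signals produced the number, which supports the explainability rule above.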
- Storage And Retention Layer
Possible destinations include:
Supabase
Google Sheets
Airtable
CRM
WordPress database
MCR page
vector memory
local CSV
dashboard database
client AIOS database
research archive
raw evidence archive
Storage Questions
Is this the source of truth?
Is this temporary?
Is this raw evidence?
Is this structured metrics data?
Is this research context?
Does it need semantic retrieval?
Does it need dashboarding?
Does it contain personal data?
Does it need client isolation?
How long should it be retained?
Who can access it?
Can it be deleted?
Retention Status
Active
Temporary
Historical
Stale
Replaced
Archived
Delete Requested
Deleted
Rule
The raw source and processed records must remain connected.
- Dashboard And Report Layer
Extracted data should become visible only when visibility helps a decision.
Possible outputs include:
opportunity dashboard
competitor dashboard
lead review queue
change-monitoring report
market intelligence report
client intelligence report
source health report
actor performance report
Dashboard Questions
What decision should the dashboard support?
Who reviews it?
What needs action?
What is stale?
What failed?
What changed?
What was accepted?
What was rejected?
What is permitted for downstream use?
Rule
Dashboards must support review and action.
- Brain Routing Layer
Extracted intelligence must be routed to the correct Brain.
Possible routes include:
Research Brain
Data Brain
AIBS Brain
Affiliate Brain
PPL Brain
Content Brain
Sales Brain
Experimentation Brain
HeadOffice Brain
Compliance Brain
Risk Brain
Routing should define:
destination
reason
record type
evidence
confidence
requested action
owner
Rule
Extraction does not create downstream authority.
The receiving Brain must apply its own governance.
- Governance And Compliance Layer
Governance should cover:
source authority
privacy
personal data
terms of service
client isolation
retention
deletion
sensitive data
regulated data
outreach use
recording consent
copyright
access control
Governance Review Outcomes
Approved
Approved With Conditions
Human Review Required
Restricted
Rejected
Rule
Data that is lawful or appropriate for research may not automatically be appropriate for outreach, personalisation, publication, or client delivery.
- Candidate Discovery And Permitted Downstream Use Layer
This layer governs the transition from extracted record to accepted operational candidate.
It is especially important for:
lead discovery
prospect research
local business lists
contact extraction
sales personalisation
client intelligence
personalised asset production
outbound campaign preparation
Candidate Discovery Path
Business Question
→ Search Term Or Source Definition
→ Initial Extraction
→ Candidate Record Creation
→ Relevance Review
→ Accept, Reject, Merge, Or Hold
→ Approved Enrichment
→ Identity Verification
→ Contact Confidence
→ Permitted-Use Review
→ Campaign Or Brain Readiness
→ Downstream Handoff
Candidate Status
New
Review Required
Accepted For Enrichment
Rejected
Duplicate
Merged
Hold
Enriched
Identity Verified
Permitted For Research
Permitted For Internal Analysis
Permitted For Personalisation Preparation
Permitted For Outreach Review
Restricted
Expired
Rule
An extracted search result is a candidate, not an approved prospect.
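The Candidate Discovery Path can be enforced as a set of allowed status transitions, so a record cannot jump from search result to outreach review. The status names come from the Candidate Status list above. The transition map is an assumption made for illustration and would need to be agreed before use.

```python
# Hypothetical transition map; only the status names come from the framework.
ALLOWED_TRANSITIONS = {
    "New": {"Review Required", "Duplicate"},
    "Review Required": {"Accepted For Enrichment", "Rejected", "Merged", "Hold", "Duplicate"},
    "Hold": {"Review Required", "Rejected", "Expired"},
    "Accepted For Enrichment": {"Enriched", "Restricted"},
    "Enriched": {"Identity Verified", "Restricted"},
    "Identity Verified": {
        "Permitted For Research",
        "Permitted For Internal Analysis",
        "Permitted For Personalisation Preparation",
        "Permitted For Outreach Review",
        "Restricted",
    },
}


def advance(current_status: str, target_status: str) -> str:
    """Return the new status, or raise if the move skips a required step."""
    if target_status not in ALLOWED_TRANSITIONS.get(current_status, set()):
        raise ValueError(f"Cannot move from {current_status!r} to {target_status!r}")
    return target_status
```

Statuses with no outgoing entry, such as Rejected or Merged, are terminal in this sketch, which also keeps rejected records from re-entering the path unnoticed.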
Scrape Or Reject Gate
Before deeper extraction or enrichment, the system may apply a scrape-or-reject decision.
Review criteria may include:
business relevance
target-market fit
location
industry
company type
obvious competitor status
existing client relationship
existing suppression
duplicate status
source quality
compliance risk
commercial value
Scrape Decision
Scrape
Reject
Hold
Merge
Escalate
Rule
A scrape decision authorises only the approved next extraction step.
It does not authorise outreach.
Contact Extraction Standard
Contact extraction may collect:
public business email
named business contact
phone
social profile
contact page
role
department
The system should record:
contact source
contact type
identity confidence
business versus personal status
verification status
capture date
permitted-use status
Contact Confidence
Verified
Probable
Unverified
Conflicting
Invalid
Rule
A found email address is not automatically a verified decision-maker or an outreach-ready contact.
One-Contact-Per-Company Rule
Where workflows require one primary contact, the selection logic should be explicit.
Possible priority:
verified relevant decision-maker
verified role-based contact
general business contact
contact form
no suitable contact
The system must not silently discard useful alternative contacts without preserving source evidence where retention is justified.
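The contact priority above can be made explicit in code, with the unselected contacts returned, not dropped. This is a sketch: the contact_type key and the function name are hypothetical, and contacts are assumed to be dictionaries already labelled with one of the priority categories.

```python
from typing import List, Optional, Tuple

# Priority order from the list above; "no suitable contact" is the None result.
CONTACT_PRIORITY = [
    "verified relevant decision-maker",
    "verified role-based contact",
    "general business contact",
    "contact form",
]


def select_primary_contact(contacts: List[dict]) -> Tuple[Optional[dict], List[dict]]:
    """Return (primary, alternatives); alternatives are preserved for review."""
    ranked = sorted(
        (c for c in contacts if c["contact_type"] in CONTACT_PRIORITY),
        key=lambda c: CONTACT_PRIORITY.index(c["contact_type"]),
    )
    if not ranked:
        return None, list(contacts)
    primary = ranked[0]
    alternatives = [c for c in contacts if c is not primary]
    return primary, alternatives
```

Python's sort is stable, so when two contacts share a priority level the one found first stays first. Whether the alternatives are retained remains a retention decision outside this function.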
Permitted Downstream Use
Possible permitted uses include:
Research Only
Internal Analysis
Market Mapping
Dashboarding
Client Intelligence
Personalisation Preparation
Human Outreach Review
Campaign Eligible
Publication
Restricted
Prohibited
Each permitted-use decision should record:
record ID
source authority
identity confidence
purpose
channel
reviewer
decision
conditions
expiry
Rule
Permission for one use does not create permission for every use.
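A permitted-use check can require an approved, unexpired decision for the exact use being requested. The sketch below assumes decisions are stored as dictionaries keyed by the fields listed above, that the purpose field holds one of the permitted-use names, and that Approved is the accepting decision value. The function name is hypothetical.

```python
from datetime import date
from typing import List


def is_use_permitted(decisions: List[dict], record_id: str,
                     requested_use: str, today: date) -> bool:
    """True only when an approved, unexpired decision names this exact use."""
    for decision in decisions:
        if (
            decision["record_id"] == record_id
            and decision["purpose"] == requested_use
            and decision["decision"] == "Approved"
            and (decision["expiry"] is None or decision["expiry"] >= today)
        ):
            return True
    # No matching decision means no permission; absence is never approval.
    return False
```

A record approved for Research Only therefore returns False when asked about Human Outreach Review, because each use needs its own decision.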
Downstream Handoff Record
Candidate ID:
Entity:
Business Question:
Source:
Raw Evidence:
Candidate Status:
Identity Status:
Enrichment Status:
Score:
Risk:
Permitted Use:
Destination Brain:
Requested Action:
Human Review:
Owner:
Expiry:
Lead Discovery And Outreach Boundary
Research Brain and Data Brain may:
discover
extract
clean
normalise
deduplicate
enrich
score
classify
prepare evidence
recommend a downstream route
They must not independently:
authorise cold outreach
send messages
generate deceptive personalisation
override suppression
decide legal compliance
publish personal data
The Outbound Lead Enrichment And Cold Outreach Governance Framework controls outreach readiness and delivery.
The Personalised Visual Sales Asset Production And Governance Framework controls personalised visual, likeness, logo, voice, and synthetic-media assets.
The AIOS Lead Capture And Conversion Infrastructure Framework controls lead and conversion progression after a legitimate response or lead event.
Rule
Extraction authority stops before communication authority.
Actor Registry Standard
MWMS should maintain an Actor Registry.
Actor Name:
Brain Owner:
Purpose:
Source:
Capture Method:
Input Fields:
Output Fields:
Raw Evidence Location:
Run Frequency:
Destination Table:
Dashboard Or Report:
Compliance Notes:
Cost:
Status:
Version:
Last Tested:
Failure Notes:
Rule
Actors should be registered before becoming operational dependencies.
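A registration check can run before an actor is scheduled. The sketch below assumes registry entries are dictionaries keyed by snake_case versions of the Actor Registry fields. Which fields are mandatory before operation is an assumption, and the function names are hypothetical.

```python
from typing import List

# Assumed minimum set of registry fields required before an actor may run.
REQUIRED_ACTOR_FIELDS = (
    "actor_name",
    "brain_owner",
    "purpose",
    "source",
    "capture_method",
    "raw_evidence_location",
    "destination_table",
    "version",
    "last_tested",
)


def registration_gaps(entry: dict) -> List[str]:
    """List the required registry fields that are missing or blank."""
    return [
        name for name in REQUIRED_ACTOR_FIELDS
        if not str(entry.get(name) or "").strip()
    ]


def may_become_operational(entry: dict) -> bool:
    """An actor with any registration gap should not become a dependency."""
    return not registration_gaps(entry)
```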
Source Pipeline Registry Standard
Pipeline Name:
Source Type:
Source Owner:
Permission Status:
Capture Method:
Actor Or Workflow:
Run Frequency:
Raw Evidence Location:
Processed Destination:
Deduplication Method:
Change Detection Rule:
Candidate Review Gate:
Permitted Use Rule:
Retention Rule:
Brain Destination:
Review Owner:
Status:
Rule
Recurring source intake must have an identifiable owner and maintenance status.
Data Extraction Request Template
Request Name:
Requesting Brain:
Business Question:
Decision Supported:
Source Or Sources:
Source Authority:
Data Needed:
Raw Evidence Needed: Yes / No
One-Time Or Recurring:
Watch Or Historical Retrieval:
Preferred Method:
Output Fields:
Destination:
Dashboard Needed: Yes / No
Scoring Needed: Yes / No
Candidate Review Needed: Yes / No
Change Detection Needed: Yes / No
Deduplication Method:
Permitted Downstream Use:
Retention Rule:
Compliance Risk:
Human Review Needed:
Action After Extraction:
Owner:
Due Date:
Extraction Output Template
Extraction Name:
Date:
Source Or Sources:
Source Authority:
Method Used:
Actor Or Workflow Used:
Raw Evidence Preserved:
Records Extracted:
Candidates Created:
Records Accepted:
Records Rejected:
Records Held:
Duplicates Detected:
Changes Detected:
Identity Verified:
Enrichment Completed:
Permitted Use Decisions:
Data Quality Notes:
Key Findings:
Top Opportunities:
Risks Or Compliance Notes:
Recommended Brain Routing:
Recommended Action:
Next Run Needed:
Retention Status:
Data Extraction Scorecard
Decision Value: 20
Data Availability: 15
Source Reliability: 10
Extraction Feasibility: 10
Repeat Use Potential: 10
Dashboard Or Reporting Value: 10
Commercial Value: 10
Compliance Risk: -10
Maintenance Burden: -5
MWMS Strategic Fit: 10
Interpretation
80 Or Higher
Strong extraction candidate.
65 To 79
Useful. Test carefully.
50 To 64
Use one-time or manual research first.
Below 50
Park or reject.
Rule
Do not build extraction infrastructure for low-value data.
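The scorecard and its interpretation bands can be applied consistently with a small function. The weights and bands are taken from the scorecard above. Rating each criterion between 0.0 and 1.0 and multiplying by its weight is an assumption of this sketch. Under that assumption the highest reachable total is 95, because the two negative weights can only subtract.

```python
from typing import Dict, Tuple

# Weights from the Data Extraction Scorecard; negative weights are deductions.
SCORECARD_WEIGHTS = {
    "decision_value": 20,
    "data_availability": 15,
    "source_reliability": 10,
    "extraction_feasibility": 10,
    "repeat_use_potential": 10,
    "dashboard_or_reporting_value": 10,
    "commercial_value": 10,
    "compliance_risk": -10,
    "maintenance_burden": -5,
    "mwms_strategic_fit": 10,
}


def interpret(total: float) -> str:
    """Map a total to the interpretation bands listed above."""
    if total >= 80:
        return "Strong extraction candidate."
    if total >= 65:
        return "Useful. Test carefully."
    if total >= 50:
        return "Use one-time or manual research first."
    return "Park or reject."


def score_request(ratings: Dict[str, float]) -> Tuple[float, str]:
    """Return (total, interpretation) for ratings between 0.0 and 1.0."""
    total = 0.0
    for name, weight in SCORECARD_WEIGHTS.items():
        rating = ratings.get(name, 0.0)  # unrated criteria contribute nothing
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"{name} rating must be between 0.0 and 1.0")
        total += weight * rating
    return total, interpret(total)
```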
Application To Research Brain
Research Brain is the primary owner of this framework.
Research Brain should:
define research questions
select sources
request extraction
interpret extracted data
create market intelligence
support avatar definition
validate niches
monitor competitors
detect trends
manage recurring source monitoring
review candidates
route intelligence
Research Brain Rule
Research Brain must convert extracted data into market understanding, not merely datasets.
Application To Data Brain
Data Brain owns:
schemas
tables
source records
data quality
field definitions
source tracking
identity matching
deduplication
change records
candidate states
permitted-use states
dashboard feeds
retention
deletion
actor registry
source pipeline registry
pipeline monitoring
Data Brain Rule
Data Brain must prevent raw extraction from becoming intelligence debt.
Application To AIBS Brain
AIBS may use this framework for:
prospect lists
trophy client scoring
local business opportunity research
review and reputation gaps
lead-capture AIOS candidates
AI audit targets
vertical AIOS research
competitor offers
client intelligence reports
AIBS Rule
AIBS should use extracted data to find better clients and build stronger AIOS packages.
Application To Sales Brain
Sales Brain may use accepted and permitted records for:
outreach relevance
recent observations
prospect prioritisation
human review
personalisation preparation
Sales Brain must apply:
contact verification
outreach governance
suppression checks
channel authority
message approval
Sales Rule
Extraction does not authorise contact.
Application To Affiliate Brain
Affiliate Brain may use data extraction for:
competitor affiliate pages
product angles
testimonials
sales claims
pricing
bonus stacks
content gaps
niche demand
ad angles
offer changes
Rule
Extracted intelligence should improve selection and testing, not support copying.
Application To PPL Brain
PPL Brain may use extraction for:
lead verticals
form flows
buyer categories
local demand
competitor lead-generation pages
compliance signals
offer economics
conversion friction
Rule
PPL extraction must respect lead-handling and compliance rules.
Application To Content Brain
Content Brain may mine:
reviews
comments
competitor blogs
YouTube titles
YouTube transcripts
questions
forums
search results
RSS feeds
social themes
customer pain language
Rule
Extracted content signals should become original MWMS content strategy.
Application To Automation Brain
Automation Brain may operate approved workflows.
It owns:
actor execution
schedules
retries
failure handling
status polling
logs
alerts
cost visibility
Automation Brain does not decide:
source authority
business purpose
permitted outreach
publication authority
Rule
Automation authority must remain narrower than governance authority.
Failure Modes
Failure Mode 1: Tool First Extraction
A scraper is selected before a business question exists.
Correction
Define the intelligence need first.
Failure Mode 2: Public Means Permitted
Accessible data is treated as unrestricted.
Correction
Apply source-authority and use-purpose review.
Failure Mode 3: Search Result Becomes Prospect
Every result is treated as outreach-ready.
Correction
Create a candidate record and review gate.
Failure Mode 4: Scrape Approval Becomes Outreach Approval
A record approved for enrichment is automatically contacted.
Correction
Separate extraction, permitted use, and communication authority.
Failure Mode 5: Found Email Becomes Decision-Maker
A generic or unverified email is treated as a named buyer.
Correction
Record contact type and confidence.
Failure Mode 6: Duplicate Companies Inflate Opportunity
The same business appears through multiple results.
Correction
Use domain and identity deduplication.
Failure Mode 7: AI Enrichment Becomes Fact
Inferred pain or company size is stored as verified.
Correction
Label enrichment status.
Failure Mode 8: No Raw Evidence
Cleaned records cannot be checked.
Correction
Preserve source-linked evidence.
Failure Mode 9: Candidate Rejection Is Lost
Rejected records reappear in later campaigns.
Correction
Persist rejection, duplicate, and suppression states.
Failure Mode 10: Downstream Use Is Undefined
Data moves into sales or personalisation without review.
Correction
Record permitted-use status.
Failure Mode 11: Stale Contacts Are Reused
Old contact data is treated as current.
Correction
Apply freshness and verification rules.
Failure Mode 12: One-Contact-Per-Company Selection Is Arbitrary
The first found email is used.
Correction
Define contact-priority logic.
Failure Mode 13: Permanent Over-Collection
Every field is retained indefinitely.
Correction
Apply minimisation and retention.
Failure Mode 14: Actor Failure Is Silent
The pipeline produces incomplete records without warning.
Correction
Log status, coverage, and failure.
Failure Mode 15: Score Hides Weak Evidence
A numerical score appears objective.
Correction
Preserve explainable components and confidence.
Failure Mode 16: Data Volume Is Mistaken For Value
The team celebrates thousands of records.
Correction
Measure accepted candidates, useful decisions, and commercial outcomes.
Drift Protection
This framework protects MWMS from:
scraping without purpose
unrestricted data collection
tool-as-architecture thinking
weak source authority
lost raw evidence
duplicate records
stale data
unverified enrichment
lead-count inflation
scrape-to-send automation
wrong-contact use
unclear downstream permission
unowned recurring pipelines
data hoarding
Drift Signals
Watch for:
“Scrape everything.”
“It is public.”
“We found an email.”
“Just send to all of them.”
“The actor returned a thousand records.”
“AI says they are a good fit.”
“We can clean it later.”
“Duplicates do not matter.”
“The source link is not needed.”
“We approved the scrape, so outreach is fine.”
“Use the first contact.”
“Keep all fields forever.”
Rule
When these signals appear, return to purpose, source authority, evidence, identity, deduplication, candidate review, permitted use, and Brain routing.
Strategic Summary
The durable value of actor infrastructure is not volume.
It is the ability to turn authorised sources into structured, reviewable, decision-ready intelligence.
The later lead-generation material strengthens this framework by making the transition from search result to operational candidate explicit.
The controlled pathway is:
Keyword Or Source
→ Extraction
→ Candidate
→ Review
→ Accept, Reject, Merge, Or Hold
→ Enrichment
→ Identity Verification
→ Permitted-Use Decision
→ Brain Handoff
This prevents extraction systems from silently becoming uncontrolled outreach systems.
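The controlled pathway above can be sketched as a small state machine in which each record may only move to an explicitly allowed next stage. This is a hedged illustration, not the framework's implementation: the `Stage` names and the `advance` helper are hypothetical, and treating rejected and merged records as terminal, with held records returning only to review, is an assumption made for the sketch.

```python
from enum import Enum


class Stage(Enum):
    SOURCE = "keyword_or_source"
    EXTRACTION = "extraction"
    CANDIDATE = "candidate"
    REVIEW = "review"
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MERGED = "merged"
    HELD = "held"
    ENRICHMENT = "enrichment"
    IDENTITY_VERIFICATION = "identity_verification"
    PERMITTED_USE_DECISION = "permitted_use_decision"
    BRAIN_HANDOFF = "brain_handoff"


# Allowed next stages. Assumption: rejected and merged records stop here,
# and a held record may only go back to review.
ALLOWED = {
    Stage.SOURCE: {Stage.EXTRACTION},
    Stage.EXTRACTION: {Stage.CANDIDATE},
    Stage.CANDIDATE: {Stage.REVIEW},
    Stage.REVIEW: {Stage.ACCEPTED, Stage.REJECTED, Stage.MERGED, Stage.HELD},
    Stage.ACCEPTED: {Stage.ENRICHMENT},
    Stage.HELD: {Stage.REVIEW},
    Stage.REJECTED: set(),
    Stage.MERGED: set(),
    Stage.ENRICHMENT: {Stage.IDENTITY_VERIFICATION},
    Stage.IDENTITY_VERIFICATION: {Stage.PERMITTED_USE_DECISION},
    Stage.PERMITTED_USE_DECISION: {Stage.BRAIN_HANDOFF},
    Stage.BRAIN_HANDOFF: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return target if the move is allowed; otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not permitted")
    return target
```

Because no stage before the permitted-use decision lists the handoff as a next step, a candidate cannot skip review, enrichment, or identity verification on its way to a Brain.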
Final Standard
Every data extraction and source-intake workflow must begin with a business question and a source-authority check, and must end with structured, source-visible, deduplicated, freshness-aware, reviewed, permitted-use-controlled, routed intelligence.
A valid extraction system must define:
intelligence need
source authority
source
capture method
actor or workflow
raw evidence rule
schema
cleaning rules
identity and deduplication rules
enrichment rules
change detection rules
freshness rules
scoring logic
candidate review
scrape-or-reject decision, where applicable
contact confidence
permitted downstream use
storage destination
retention and deletion rules
dashboard or report
Brain routing
compliance review
action after extraction
That is the MWMS Data Extraction And Actor Infrastructure Standard.
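One way to apply the list above is to check a pipeline definition for completeness before any extraction is approved. The following is a minimal sketch under stated assumptions: the snake_case field names and the `missing_definitions` helper are hypothetical, and the scrape-or-reject decision is treated as optional only because the standard marks it "where applicable".

```python
# Field names are illustrative snake_case versions of the items in the standard.
REQUIRED_FIELDS = [
    "intelligence_need", "source_authority", "source", "capture_method",
    "actor_or_workflow", "raw_evidence_rule", "schema", "cleaning_rules",
    "identity_and_deduplication_rules", "enrichment_rules",
    "change_detection_rules", "freshness_rules", "scoring_logic",
    "candidate_review", "contact_confidence", "permitted_downstream_use",
    "storage_destination", "retention_and_deletion_rules",
    "dashboard_or_report", "brain_routing", "compliance_review",
    "action_after_extraction",
]
# Listed "where applicable" in the standard, so not enforced here.
CONDITIONAL_FIELDS = ["scrape_or_reject_decision"]


def missing_definitions(definition: dict) -> list:
    """Return required fields that are absent or blank in a pipeline definition."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(definition.get(name) or "").strip()
    ]
```

A definition that returns a non-empty list would be sent back to its owner rather than run, which keeps extraction tied to a stated purpose and source authority.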
MWMS System Change Log
Version: v1.2
Date: 2026-06-21
Author: HeadOffice
Change
Updated the MWMS Data Extraction And Actor Infrastructure Framework from v1.1 to v1.2 using the later AI Automations by Jack material covering:
- keyword-driven business discovery
- search-result extraction
- human scrape-or-reject decisions
- contact-detail extraction
- website enrichment
- email and social-profile discovery
- candidate selection
- company deduplication
- contact-confidence review
- downstream outreach preparation
Expanded the existing sixteen-layer model into a seventeen-layer model.
Added:
- Candidate Discovery And Permitted Downstream Use Layer
Added standards covering:
- candidate records
- candidate status
- scrape-or-reject gates
- accepted-for-enrichment status
- contact extraction
- contact-confidence classification
- one-contact-per-company selection
- permitted downstream use
- research versus outreach authority
- candidate handoff records
- campaign-readiness boundaries
- persistent rejection and suppression states
Expanded the schema to include:
- candidate_status
- permitted_use_status
Expanded the Source Pipeline Registry with:
- Candidate Review Gate
- Permitted Use Rule
Expanded the Data Extraction Request Template with:
- Candidate Review Needed
- Permitted Downstream Use
Expanded the Extraction Output Template with:
- Candidates Created
- Records Held
- Identity Verified
- Enrichment Completed
- Permitted Use Decisions
Added explicit doctrine that:
- an extracted search result is a candidate, not an approved prospect
- a scrape decision authorises only the next extraction step
- a found email is not automatically a verified decision-maker
- permission for research does not create permission for outreach
- extraction authority stops before communication authority
Change Impact Declaration
This update materially strengthens the lead-discovery and downstream-use boundary without changing the framework’s primary ownership.
Research Brain remains responsible for:
- intelligence need
- source selection
- interpretation
- candidate relevance
- Brain routing
Data Brain remains responsible for:
- schemas
- source records
- identity
- deduplication
- candidate states
- permitted-use states
- storage
- retention
- deletion
Automation Brain may execute approved extraction and enrichment workflows but does not determine outreach authority.
Sales Brain and AIBS Brain may receive accepted candidate records but must apply their own communication, compliance, asset, and campaign governance.
The update does not authorise:
- unrestricted scraping
- automatic outreach
- automatic personalised asset delivery
- use of unverified contacts
- suppression overrides
- publication of personal data
Pages Created
- None
Pages Updated
- MWMS Data Extraction And Actor Infrastructure Framework updated from v1.1 to v1.2
Pages Deprecated
- None
Standalone Pages Not Created
The following standalone pages were not created because their durable intelligence is governed within this updated framework:
- MWMS Lead Scraping Framework
- MWMS Candidate Discovery Framework
- MWMS Scrape Or Reject Gate Standard
- MWMS Contact Extraction Framework
- MWMS Contact Confidence Standard
- MWMS Permitted Data Use Framework
- MWMS Lead List Preparation Framework
Registries Requiring Update
- MCR Page Registry
- Research Brain Page Registry
- Data Brain Page Registry where this framework is operationally referenced
- MCR Copy Map where the framework version is recorded
- MWMS Course Absorption Decision Registry
Canon Version Update Required
No immediate Research Brain Canon or Data Brain Canon version change is required unless either Canon directly records framework versions or contains candidate-use rules that conflict with v1.2.
The candidate review, scrape-or-reject, contact-confidence, and permitted-use controls should be included during the next scheduled Research Brain and Data Brain Canon alignment review.
Change Log Entry Required
Yes.
The v1.2 update must be recorded in:
- MWMS System Change Log
- MCR Page Registry change history where applicable
- Research Brain Page Registry change history where applicable
- Data Brain Page Registry change history where applicable
- MWMS Course Absorption Decision Registry
Strategic Absorption Result
The later AI Automations by Jack lead-generation material has been absorbed into the existing MWMS Data Extraction And Actor Infrastructure Framework.
The absorption preserves:
- keyword-driven discovery
- search-result extraction
- contact extraction
- enrichment
- human review
- candidate selection
- structured records
- Brain routing
The absorption rejects:
- every search result being treated as a prospect
- scrape approval being treated as outreach approval
- every found email being treated as a verified decision-maker
- data volume being treated as commercial value
- rejected records being forgotten and rediscovered
- downstream use occurring without an explicit permitted-use decision
The resulting v1.2 framework establishes that MWMS extraction pipelines must separate:
- discovery
- capture
- candidate review
- enrichment
- identity verification
- permitted use
- Brain handoff
- communication authority