MWMS Data Extraction And Actor Infrastructure Framework

System: MWMS
Document Type: Operating Framework
Authority Level: MCR Source Of Truth
Status: Draft For MCR
Version: v1.0
Primary Location: MCR
Future Operational Destination: Research Brain, Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain
Parent Page: Research Brain
Owner: Martyn
Developer Boundary: Do Not Touch M’s Active Build Areas Unless Specifically Assigned
Source Of Truth: MCR
Last Reviewed: 2026-06-04
Source / Origin: AI Automations by Jack — Commercialization Block / Apify Masterclass — How the 1% Are Building / Lead Generation Systems / Productized AIOS Service Packaging / Case Study Pattern Library
MWMS Classification: Research Brain Infrastructure Framework / Data Extraction Standard / Actor-Based Intelligence Pipeline / Market Monitoring Framework / Data-Driven AIOS Support System
Primary Brain: Research Brain
Supporting Brains: Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain, Product Brain

Related Pages: Research Brain Canon, Data Brain Canon, AIBS Brain Canon, MWMS Productized AIOS Service Packaging And Scope Control Framework, MWMS AIOS Lead Capture And Conversion Infrastructure Framework, MWMS High-Ticket AIOS Client Acquisition And Trophy Client Framework, MWMS AIBS Case Study Pattern Library And Offer Replication Framework, MWMS Business Brain Copilot Architecture Framework, MWMS Dashboard-First Client AIOS Offer Framework, MWMS Offer And Niche Selection Framework, MWMS Outbound Lead Enrichment And Cold Outreach Governance Framework, MWMS Client Intelligence Report Automation Framework, MWMS Market Driven Social Content Production Framework, MWMS Source Visibility And Evidence Display Standard, MWMS AI Tool Permission And Access Framework, MWMS AI Automation Security And Risk Checklist, HeadOffice Kaizen Continuous Improvement Loop

Source Evidence: This framework is derived primarily from the Apify Masterclass material, which showed Apify actors, scraping/data APIs, MCP-style actor discovery, actor-store infrastructure, agentic data extraction workflows, and examples where web data can power e-commerce intelligence, real estate intelligence, lead enrichment, and SaaS-style tools. It is also supported by the lead generation and productized AIOS material, where scraped/enriched data becomes useful only when connected to ICP clarity, dashboards, CRM workflows, offer intelligence, or commercial decision-making.

Purpose

The purpose of the MWMS Data Extraction And Actor Infrastructure Framework is to define how MWMS uses structured web data extraction, actor-based automation, scraping systems, APIs, enrichment workflows, and research pipelines to support better decisions across the MWMS ecosystem.

This framework exists because Research Brain and Data Brain must become stronger than manual searching.

MWMS cannot rely only on:

random Google searches
scattered course notes
manual competitor checking
one-off newsletter insights
unstructured browsing
screenshots
guesswork
memory
isolated chat analysis

As MWMS grows, it needs repeatable systems that can collect, structure, enrich, monitor, and route data from external sources.

This includes:

affiliate product research
competitor monitoring
offer intelligence
ad intelligence
marketplace research
lead list building
local business research
e-commerce intelligence
review mining
pricing monitoring
content opportunity research
niche validation
customer/avatar research
trend detection
case study extraction
client intelligence reports

The core purpose is:

Turn external web data into structured MWMS intelligence that can be searched, scored, routed, tested, and used by the right Brain.

Core Doctrine

The MWMS doctrine is:

Data extraction is not valuable by itself.
Data extraction becomes valuable when it feeds a decision, dashboard, offer, campaign, experiment, or client system.

A scraper is not the asset.

An actor is not the asset.

An API is not the asset.

The asset is the structured intelligence produced from the data.

MWMS should never build data extraction systems just because they are technically possible.

Every data extraction workflow must answer:

What decision will this support?
Which Brain needs the data?
What fields are needed?
How fresh does the data need to be?
What action will the data trigger?
How will the data be stored?
How will the data be scored?
What compliance risks exist?
What dashboard or report will show the value?
What is the cost of maintaining this data flow?

Strategic Importance

This framework is strategically important because Research Brain must become one of the strongest Brains in the MWMS ecosystem.

Research comes before:

avatar creation
offer selection
content strategy
ad strategy
AIBS packaging
PPL targeting
affiliate product evaluation
client acquisition
experiment design
market opportunity scoring

If research is weak, the downstream Brains can create polished work for the wrong market.

This has already been a concern in MWMS:

Without strong avatar, market, and evidence layers, Content Brain, Ads Brain, Affiliate Brain, PPL Brain, and AIBS Brain may act on assumptions instead of reality.

The Apify material is valuable because it reframes scraping and actors as infrastructure, not one-off tricks. Actors can become reusable data units that support apps, dashboards, intelligence reports, lead enrichment, market monitoring, competitor tracking, and SaaS-style systems.

For MWMS, the lesson is:

Research Brain needs reusable data infrastructure, not just manual research skill.

Definition

A data extraction workflow is a repeatable process that collects information from external or internal sources and converts it into structured data.

An actor is a reusable automation component that performs a defined extraction or processing job, such as scraping a website, collecting listings, extracting reviews, monitoring a page, or enriching a dataset.

An actor infrastructure layer is the system that stores, runs, monitors, and routes these extraction components.

A research pipeline is the full pathway from source selection to extraction, cleaning, enrichment, scoring, storage, dashboarding, and Brain routing.

MWMS Definition

The MWMS Data Extraction And Actor Infrastructure Framework is:

Research Brain and Data Brain’s standard for converting external web data, scraped information, marketplace signals, competitor intelligence, lead data, review data, product data, and market signals into structured, governed, reusable intelligence pipelines that support MWMS decisions, dashboards, offers, campaigns, and client AIOS systems.

Scope

This framework applies to:

Research Brain market research
Data Brain structured intelligence
Affiliate Brain product research
PPL offer research
AIBS client research
competitor monitoring
offer intelligence
ad intelligence
review mining
lead enrichment
Google Maps-style business research
marketplace scraping
e-commerce product monitoring
real estate data extraction
social proof monitoring
price monitoring
ranking/visibility monitoring
AI tool monitoring
newsletter intelligence enrichment
case study extraction
client intelligence reports
content opportunity systems
data-backed dashboards
actor-based SaaS or micro-app infrastructure

This framework applies whenever MWMS uses extraction or scraping to support a business decision.

Core Principle

The core principle is:

Extract only what MWMS can use, structure, verify, govern, and act on.

A data extraction workflow should not create a pile of raw data.

It should create usable intelligence.

Usable intelligence means:

structured
cleaned
timestamped
source-linked
scored where useful
routed to the right Brain
connected to a decision
displayed where useful
compliant for the intended use case
not over-collected

The MWMS Data Extraction And Actor Infrastructure Model

Every extraction system should be designed across twelve layers:

Intelligence Need Layer
Source Selection Layer
Extraction Method Layer
Actor / Automation Layer
Data Schema Layer
Cleaning And Normalisation Layer
Enrichment Layer
Scoring And Classification Layer
Storage Layer
Dashboard / Report Layer
Brain Routing Layer
Governance And Compliance Layer

1. Intelligence Need Layer

The first step is not choosing a scraper.

The first step is identifying the intelligence need.

Intelligence Need Questions

Ask:

What are we trying to learn?
Which Brain needs the answer?
What decision depends on this?
Is this for affiliate, PPL, AIBS, content, ads, research, or client work?
Is this one-time research or recurring monitoring?
How fresh must the data be?
What fields are needed?
What output is required?
What will happen if the data confirms the hypothesis?
What will happen if the data contradicts the hypothesis?

Example Intelligence Needs

Find local businesses with poor follow-up signals.
Monitor competitor offers in a niche.
Extract product pricing and positioning from e-commerce sites.
Identify affiliate offers with strong market demand.
Gather reviews to understand customer pain language.
Scrape job boards to detect demand for services.
Collect YouTube titles to identify content angles.
Monitor landing pages for offer changes.
Extract Google Maps business categories for outreach.
Build lead lists for AIBS acquisition.
Track AI tool categories for MWMS opportunities.

Rule

No extraction workflow should start without a defined intelligence need.

2. Source Selection Layer

The source must match the question.

Possible sources include:

Google Maps
business directories
review platforms
marketplaces
e-commerce sites
affiliate marketplaces
job boards
YouTube
LinkedIn where permitted
Reddit
competitor websites
landing pages
app stores
product directories
newsletters
public datasets
ad libraries
real estate sites
government/public registers
client-owned websites
client CRMs
internal MWMS records

Source Selection Questions

Ask:

Is this source public?
Is this source reliable?
Is this source allowed to be scraped or accessed?
Does the source contain the needed fields?
How often does it change?
Is the source stable?
Is there an official API?
Is there an actor already available?
Is manual research safer?
Is the data worth the extraction cost?

Rule

Use the safest, cleanest, most reliable source that answers the question.

3. Extraction Method Layer

Choose the correct extraction method.

Possible methods:

manual research
browser extraction
official API
Apify actor
custom actor
Firecrawl-style crawling
sitemap extraction
RSS/API feed
Google Sheet import
webhook
CSV upload
CRM export
scraping script
AI-assisted extraction
screenshot/visual extraction where necessary
MCP-connected actor discovery

The Apify material is useful because it shows that actors can act as reusable extraction units and that prebuilt actors can sometimes avoid custom scraping work.

Method Selection Questions

Ask:

Is there an official API?
Is there a prebuilt actor?
Is custom scraping needed?
Is manual extraction enough for now?
Is this a one-time job or recurring job?
What is the cost?
What is the maintenance burden?
What happens if the site changes?
Is the data sensitive?
Is scraping allowed?

Rule

Do not custom-build extraction when a safer, cheaper, reliable method exists.

4. Actor / Automation Layer

Actors should be treated as reusable infrastructure.

An actor should have a clear job.

Actor Types

Possible actor types:

business directory extractor
Google Maps business extractor
review extractor
e-commerce product extractor
competitor page monitor
landing page scraper
job board scraper
social profile extractor
YouTube metadata extractor
real estate listing extractor
price monitor
affiliate offer monitor
ad library extractor
contact enrichment actor
content angle extractor
market trend actor

Actor Definition Fields

Every actor should define:

Actor Name:
Purpose:
Source:
Input Required:
Output Fields:
Run Frequency:
Owner:
Destination:
Cost:
Risk Level:
Failure Mode:
Compliance Notes:
Last Tested:

Rule

An actor must have a business purpose, not just a technical purpose.
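
The Actor Definition Fields in this section could be held as a typed record so that an incomplete definition is caught before the actor runs. The following is a minimal sketch in Python, not a prescribed implementation. The class name `ActorDefinition`, the helper `missing_fields`, and the example values are hypothetical and are not part of any Apify API.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ActorDefinition:
    """One record per actor, mirroring the Actor Definition Fields list."""
    actor_name: str = ""
    purpose: str = ""            # the business purpose, not only the technical job
    source: str = ""
    input_required: str = ""
    output_fields: str = ""
    run_frequency: str = ""
    owner: str = ""
    destination: str = ""
    cost: str = ""
    risk_level: str = ""
    failure_mode: str = ""
    compliance_notes: str = ""
    last_tested: str = ""


def missing_fields(actor: ActorDefinition) -> List[str]:
    """Return the names of definition fields that are still blank."""
    return [f.name for f in fields(actor) if not getattr(actor, f.name).strip()]


# Hypothetical draft: only three of the thirteen fields are filled in.
draft = ActorDefinition(
    actor_name="review_extractor",
    purpose="Collect customer pain language for Content Brain",
    source="review platform (example)",
)
gaps = missing_fields(draft)
print(gaps)
```

A draft with any remaining gaps would stay out of operation until the owner completes it.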

5. Data Schema Layer

Extraction output must be structured.

A schema defines what fields are captured.

Schema Questions

Ask:

What fields are required?
What fields are optional?
What fields support scoring?
What fields support dashboarding?
What fields support Brain routing?
What fields are risky or sensitive?
What fields need source URLs?
What fields need timestamps?
What fields need deduplication?
What fields need manual review?

Example Fields For Business Lead Extraction

business_id
business_name
category
website
phone
email
location
review_count
average_rating
recent_review_date
website_status
booking_link_present
chatbot_present
response_gap_signal
source_url
extracted_at
lead_score
risk_notes

Example Fields For Competitor Offer Extraction

competitor_id
competitor_name
website
offer_name
headline
price
cta
proof_elements
guarantee
upsells
lead_magnet
ad_angle
landing_page_url
last_seen_at
change_detected
mwms_opportunity_note

Rule

Unstructured scraped data must become structured before it can support MWMS decisions.
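
As an illustration of the schema idea, the sketch below checks a business lead record against a small set of required fields before it is accepted. It is a hedged example only. The split between required and optional fields, the function name `validate_lead`, and the sample records are assumptions made for illustration. The field names come from the business lead example list above.

```python
from typing import Any, Dict, List

# Assumption: which fields are required is a per-workflow decision.
# This particular split is illustrative only.
REQUIRED_LEAD_FIELDS = ("business_id", "business_name", "source_url", "extracted_at")


def validate_lead(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems: List[str] = []
    for name in REQUIRED_LEAD_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field: {name}")
    review_count = record.get("review_count")
    if review_count is not None and (
        not isinstance(review_count, int) or review_count < 0
    ):
        problems.append("review_count must be a non-negative integer")
    return problems


# Hypothetical sample records.
good = {
    "business_id": "b-001",
    "business_name": "Example Dental",
    "source_url": "https://example.com/listing/1",
    "extracted_at": "2026-06-04T09:00:00Z",
    "review_count": 42,
}
bad = {"business_name": "No Source Ltd", "review_count": -3}

print(validate_lead(good))
print(validate_lead(bad))
```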

6. Cleaning And Normalisation Layer

Raw extracted data is rarely clean.

Cleaning may include:

removing duplicates
normalising phone numbers
normalising URLs
standardising categories
cleaning names
removing irrelevant records
removing broken records
validating required fields
checking timestamps
checking source links
detecting incomplete records
separating text from HTML
removing spam results
language detection

Rule

Dirty data creates bad decisions.
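
Three of the cleaning steps listed above (normalising URLs, normalising phone numbers, removing duplicates) can be sketched with the Python standard library. This is a simplified illustration under stated assumptions. It treats the website URL as the deduplication key, drops query strings, and does not convert http to https. The function names are hypothetical.

```python
import re
from typing import Dict, List
from urllib.parse import urlsplit


def normalise_url(url: str) -> str:
    """Lower-case the host and drop 'www.', the query string, and a trailing slash."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: bare domains default to https
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{host}{path}"


def normalise_phone(phone: str) -> str:
    """Keep digits only, preserving a leading '+' if one was present."""
    phone = phone.strip()
    digits = re.sub(r"\D", "", phone)
    return "+" + digits if phone.startswith("+") else digits


def dedupe_by_website(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalised website URL."""
    seen = set()
    kept = []
    for record in records:
        key = normalise_url(record["website"])
        if key in seen:
            continue
        seen.add(key)
        kept.append({**record, "website": key})
    return kept


# Hypothetical raw rows that point at the same business.
raw = [
    {"business_name": "Example Co", "website": "https://www.example.com/"},
    {"business_name": "Example Co (duplicate)", "website": "https://example.com"},
]
cleaned = dedupe_by_website(raw)
print(cleaned)
```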

7. Enrichment Layer

Enrichment adds useful context.

Possible enrichment:

email finding
domain lookup
social profile lookup
business size estimate
industry classification
review sentiment
website technology detection
traffic estimate
ad activity detection
offer classification
buyer persona classification
contact role detection
location enrichment
AI summary
pain signal extraction
opportunity note

The lead generation material supports enrichment as part of the acquisition workflow, where raw scraped lists become more useful after ICP filtering, contact discovery, personalisation, and follow-up structure.

Rule

Enrichment should improve actionability, not just add noise.

8. Scoring And Classification Layer

Data should be scored when decisions require ranking.

Scoring Examples

Possible scores:

lead fit score
trophy client score
offer opportunity score
affiliate opportunity score
review weakness score
local SEO opportunity score
competitor threat score
content opportunity score
trend strength score
pricing gap score
pain signal score
buyer sophistication score
AIOS fit score

Example AIBS Lead Score

Pain Signal: 25
Ability To Pay: 20
Reachability: 15
AIOS Fit: 15
Review / Reputation Gap: 10
Website / Conversion Gap: 10
Compliance Risk: -5

Rule

Scoring must be explainable enough for human review.
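
One way to keep a score explainable is to return the per-criterion breakdown alongside the total. The sketch below uses the point values from the Example AIBS Lead Score. It assumes each criterion is rated from 0.0 to 1.0 by a reviewer or an upstream step, and that Compliance Risk acts as a deduction; both are assumptions made for illustration, as are the function and key names. With these weights the maximum total is 95.

```python
from typing import Dict, Tuple

# Point values taken from the Example AIBS Lead Score.
WEIGHTS = {
    "pain_signal": 25,
    "ability_to_pay": 20,
    "reachability": 15,
    "aios_fit": 15,
    "review_reputation_gap": 10,
    "website_conversion_gap": 10,
}
COMPLIANCE_RISK_PENALTY = 5


def _clamp(value: float) -> float:
    """Restrict a rating to the 0.0-1.0 range."""
    return min(max(value, 0.0), 1.0)


def score_lead(
    ratings: Dict[str, float], compliance_risk: float
) -> Tuple[float, Dict[str, float]]:
    """Return (total, breakdown) so a human can see where the points came from."""
    breakdown = {
        name: weight * _clamp(ratings.get(name, 0.0))
        for name, weight in WEIGHTS.items()
    }
    breakdown["compliance_risk"] = -COMPLIANCE_RISK_PENALTY * _clamp(compliance_risk)
    return sum(breakdown.values()), breakdown


# Hypothetical lead: strong pain signal, partial reachability, some compliance risk.
total, parts = score_lead(
    {"pain_signal": 1.0, "ability_to_pay": 0.5, "reachability": 0.5},
    compliance_risk=1.0,
)
print(total, parts)
```

Storing the breakdown with the record lets a reviewer challenge an individual rating instead of the whole score.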

9. Storage Layer

Data needs the correct destination.

Possible destinations:

Supabase
Google Sheets
Airtable
CRM
WordPress database
MCR page
vector memory
local CSV
dashboard database
client AIOS database
research archive

Storage Questions

Ask:

Is this the source of truth?
Is this temporary?
Is this structured metrics data?
Is this research context?
Does it need retrieval?
Does it need dashboarding?
Does it contain personal data?
Does it need client isolation?
How long should it be retained?
Who can access it?

Rule

Metrics and structured records belong in databases.
Canonical rules belong in MCR.
Long-form research archives may belong in retrieval systems.

10. Dashboard / Report Layer

Extracted data should usually be made visible in a dashboard or report.

Possible outputs:

lead list dashboard
competitor change dashboard
affiliate offer intelligence report
local business opportunity map
review mining report
content opportunity dashboard
market trend report
product pricing dashboard
client intelligence report
weekly research digest
offer comparison table
experiment hypothesis board

Dashboard Questions

Ask:

Who needs to see this?
What decision must be made?
What score matters?
What changed since last run?
What should be acted on?
What should be ignored?
What should be routed?
What should be tested?

Rule

Dashboards must support decisions, not just display scraped data.

11. Brain Routing Layer

Extracted intelligence must route to the correct Brain.

Routing Examples

Route to Research Brain for:

market research
avatar research
competitor research
trend detection
niche validation

Route to Data Brain for:

schema
storage
scoring
dashboarding
data quality

Route to AIBS Brain for:

client lead opportunities
AIOS package ideas
business process signals
local business opportunities

Route to Affiliate Brain for:

product opportunity
ClickBank/vendor intelligence
competitor affiliate pages
ad angles
offer market demand

Route to PPL Brain for:

lead buyer categories
local demand
offer verticals
form/conversion patterns

Route to Content Brain for:

content topics
customer pain language
competitor content gaps
YouTube/article opportunities

Route to Experimentation Brain when:

hypothesis created
market test needed
offer test needed
acquisition test needed

Route to Compliance Brain for:

scraping risk
personal data risk
platform policy risk
regulated sector risk

Rule

Extracted data is not absorbed until it is routed.
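
The routing examples in this section could be expressed as a simple lookup table, so that anything without a destination is surfaced instead of silently dropped. The sketch below is illustrative only. It covers a subset of the signal types listed above, and the names `ROUTING_TABLE` and `route` are hypothetical.

```python
from typing import Optional

# A subset of the routing examples, keyed by lower-cased signal type.
ROUTING_TABLE = {
    "market research": "Research Brain",
    "trend detection": "Research Brain",
    "niche validation": "Research Brain",
    "schema": "Data Brain",
    "data quality": "Data Brain",
    "client lead opportunities": "AIBS Brain",
    "product opportunity": "Affiliate Brain",
    "lead buyer categories": "PPL Brain",
    "customer pain language": "Content Brain",
    "hypothesis created": "Experimentation Brain",
    "scraping risk": "Compliance Brain",
    "personal data risk": "Compliance Brain",
}


def route(signal_type: str) -> Optional[str]:
    """Return the destination Brain, or None when no route is defined."""
    return ROUTING_TABLE.get(signal_type.strip().lower())


# Hypothetical batch of findings from one extraction run.
findings = ["Trend detection", "personal data risk", "unclassified observation"]
unrouted = [f for f in findings if route(f) is None]
print(unrouted)
```

Under this sketch, an item that returns None has not yet been routed and so does not count as absorbed.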

12. Governance And Compliance Layer

Data extraction can create risk.

Governance must be included.

Risk Areas

website terms
platform rules
personal data
email scraping
cold outreach compliance
copyright
sensitive data
regulated industries
scraping frequency
server load
data retention
client data isolation
hallucinated enrichment
inaccurate scoring
outdated extracted data
use of data beyond permitted context

Compliance Questions

Ask:

Is this public data?
Are we allowed to access it this way?
Does this collect personal data?
Is contact data being used for outreach?
Is opt-out/suppression needed?
Is the data stored securely?
Is the source timestamped?
Is the data being sold, republished, or only used internally?
Is this for a client?
Does jurisdiction matter?

Rule

The ability to extract data does not mean MWMS should extract or use it.

Standard Data Extraction Pipeline

The standard pipeline is:

Define intelligence need.
Identify source.
Select extraction method.
Choose or build actor.
Define schema.
Run test extraction.
Clean and normalise data.
Enrich only where useful.
Score/classify records.
Store in correct system.
Display in dashboard/report where useful.
Route to relevant Brain.
Review compliance and risk.
Decide action.
Schedule repeat extraction if needed.
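
The middle of the pipeline above (clean, enrich, score) can be pictured as an ordered list of stage functions, with a record count logged after each stage. The following is a minimal sketch assuming each stage takes and returns a list of records. The stage functions shown are placeholders, not real cleaning or scoring logic, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(
    records: List[Record], stages: List[Tuple[str, Stage]]
) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Apply each stage in order and log how many records survive it."""
    audit: List[Tuple[str, int]] = []
    for name, stage in stages:
        records = stage(records)
        audit.append((name, len(records)))
    return records, audit


def drop_without_source(records: List[Record]) -> List[Record]:
    """Placeholder cleaning stage: reject records that have no source_url."""
    return [r for r in records if r.get("source_url")]


def add_placeholder_score(records: List[Record]) -> List[Record]:
    """Placeholder scoring stage: attach a score of 0 to every record."""
    return [{**r, "score": 0} for r in records]


# Hypothetical test extraction of three records, one without a source.
raw = [
    {"name": "A", "source_url": "https://example.com/a"},
    {"name": "B", "source_url": ""},
    {"name": "C", "source_url": "https://example.com/c"},
]
final, audit = run_pipeline(
    raw, [("clean", drop_without_source), ("score", add_placeholder_score)]
)
print(audit)
```

The audit trail gives the accepted and rejected counts for the extraction output.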

Actor Selection Rule

Before building a custom actor, check:

Is manual research enough?
Is there an official API?
Is there a trusted existing actor?
Is there a simpler scraping method?
Is the data worth recurring extraction?
Is the source stable?
Is compliance risk acceptable?
Does the output justify maintenance?

Rule

Custom actors should be built only when the data value justifies the maintenance burden.
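
The checklist above can be read as a decision order in which a custom actor is the last resort. The sketch below encodes one possible ordering. The precedence (compliance first, then manual research, official API, existing actor, simpler scraping, and only then a custom actor) is an assumption drawn from the order of the questions, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SelectionChecks:
    """Answers to the Actor Selection Rule questions."""
    manual_research_enough: bool
    official_api_available: bool
    trusted_actor_available: bool
    simpler_scraping_method: bool
    worth_recurring_extraction: bool
    source_stable: bool
    compliance_risk_acceptable: bool
    output_justifies_maintenance: bool


def recommend_method(c: SelectionChecks) -> str:
    """Return the least costly method that the answers allow."""
    if not c.compliance_risk_acceptable:
        return "do not extract"
    if c.manual_research_enough:
        return "manual research"
    if c.official_api_available:
        return "official API"
    if c.trusted_actor_available:
        return "existing actor"
    if c.simpler_scraping_method:
        return "simpler scraping method"
    if c.worth_recurring_extraction and c.source_stable and c.output_justifies_maintenance:
        return "custom actor"
    return "park"


# Hypothetical case: no API or existing actor, but the data justifies upkeep.
case = SelectionChecks(
    manual_research_enough=False,
    official_api_available=False,
    trusted_actor_available=False,
    simpler_scraping_method=False,
    worth_recurring_extraction=True,
    source_stable=True,
    compliance_risk_acceptable=True,
    output_justifies_maintenance=True,
)
print(recommend_method(case))
```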

One-Time vs Recurring Extraction Rule

Not all extraction needs automation.

One-Time Extraction

Use for:

single research task
initial market scan
quick validation
small dataset
one-off client audit
course absorption enrichment
simple competitor snapshot

Recurring Extraction

Use for:

competitor monitoring
price monitoring
offer tracking
lead pipeline generation
review monitoring
content trend tracking
affiliate opportunity tracking
client intelligence reporting
dashboard updates

Rule

Do not create recurring infrastructure for one-time curiosity.

Data Quality Standard

Every extraction workflow should include data quality checks.

Quality Checks

Check:

duplicate rate
missing fields
invalid URLs
invalid emails
old records
wrong category
irrelevant results
source mismatch
broken extraction
language mismatch
hallucinated enrichment
incomplete scrape
timestamp missing

Rule

If the data quality is poor, do not route it into decision-making.
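
Two of the quality checks above, duplicate rate and missing fields, can be computed per run and used as a gate before routing. The sketch below is a hedged example. The threshold defaults (10% duplicates, 20% incomplete records) are placeholders chosen for illustration, not MWMS standards, and the function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def quality_report(
    records: List[Dict[str, Any]], required_fields: Sequence[str], key_field: str
) -> Dict[str, float]:
    """Compute duplicate and incomplete-record rates for one extraction run."""
    total = len(records)
    if total == 0:
        return {"total": 0, "duplicate_rate": 0.0, "missing_rate": 0.0}
    unique_keys = {r.get(key_field) for r in records}
    duplicates = total - len(unique_keys)
    incomplete = sum(
        1 for r in records if any(not r.get(f) for f in required_fields)
    )
    return {
        "total": total,
        "duplicate_rate": duplicates / total,
        "missing_rate": incomplete / total,
    }


def passes_quality_gate(
    report: Dict[str, float],
    max_duplicate_rate: float = 0.10,  # placeholder threshold
    max_missing_rate: float = 0.20,    # placeholder threshold
) -> bool:
    """Return True only when the run is non-empty and within both thresholds."""
    return (
        report["total"] > 0
        and report["duplicate_rate"] <= max_duplicate_rate
        and report["missing_rate"] <= max_missing_rate
    )


# Hypothetical run: one duplicate key and one record missing its timestamp.
run = [
    {"business_id": "a", "source_url": "https://example.com/a", "extracted_at": "2026-06-04"},
    {"business_id": "a", "source_url": "https://example.com/a", "extracted_at": "2026-06-04"},
    {"business_id": "b", "source_url": "https://example.com/b", "extracted_at": ""},
    {"business_id": "c", "source_url": "https://example.com/c", "extracted_at": "2026-06-04"},
]
report = quality_report(run, ["source_url", "extracted_at"], key_field="business_id")
print(report, passes_quality_gate(report))
```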

Source Visibility Standard

Every important extracted record should include source visibility.

This connects directly to the MWMS Source Visibility And Evidence Display Standard.

Source Fields

source_url
source_name
extracted_at
actor_name
actor_run_id
source_type
confidence_level
last_verified_at

Rule

MWMS must be able to trace extracted intelligence back to its source.
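
The source fields above could travel with each record as a small provenance object, which also makes it possible to flag records that have not been verified recently. This is a sketch only. The class name `SourceInfo`, the helper `is_stale`, the sample values, and the idea of a maximum age are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SourceInfo:
    """Provenance for one extracted record, mirroring the Source Fields list."""
    source_url: str
    source_name: str
    extracted_at: datetime
    actor_name: str
    actor_run_id: str
    source_type: str
    confidence_level: str
    last_verified_at: Optional[datetime] = None


def is_stale(info: SourceInfo, now: datetime, max_age: timedelta) -> bool:
    """A record is stale when its latest check is older than max_age."""
    reference = info.last_verified_at or info.extracted_at
    return now - reference > max_age


# Hypothetical record extracted three days before the check.
info = SourceInfo(
    source_url="https://example.com/offer",
    source_name="Example competitor site",
    extracted_at=datetime(2026, 6, 1, tzinfo=timezone.utc),
    actor_name="competitor_page_monitor",
    actor_run_id="run-0001",
    source_type="landing page",
    confidence_level="medium",
)
now = datetime(2026, 6, 4, tzinfo=timezone.utc)
print(is_stale(info, now, max_age=timedelta(days=2)))
```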

Data Extraction Use Cases For MWMS

Use Case 1: Affiliate Product Intelligence

Research Brain / Affiliate Brain may extract:

vendor pages
affiliate pages
competitor review pages
pricing
claims
proof elements
VSL angle
ad angles
testimonials
refund/risk signals
seasonal demand
content gaps

Output

Affiliate Product Intelligence report.

Use Case 2: PPL Offer Research

Research Brain / PPL Brain may extract:

lead buyer categories
form flows
landing pages
vertical demand
local markets
competitor CPL offers
compliance notes
conversion pathway elements

Output

PPL offer opportunity map.

Use Case 3: AIBS Client Lead Research

Research Brain / AIBS Brain may extract:

local business categories
review gaps
website quality
booking link presence
chatbot presence
missed-call signals where inferable
CRM/tech stack hints
business size
contact details where appropriate
AIOS fit score

Output

High-ticket AIOS prospect list.

Use Case 4: Competitor Intelligence

Research Brain may extract:

competitor offer pages
pricing changes
new CTAs
lead magnets
guarantees
testimonials
case studies
blog topics
funnel changes

Output

Competitor change dashboard.

Use Case 5: Content Opportunity Mining

Content Brain may use extracted data to identify:

repeated customer questions
review pain language
competitor content gaps
YouTube title patterns
popular post themes
unanswered objections
niche terminology

Output

Content opportunity dashboard.

Use Case 6: Client Intelligence Reports

AIBS / Data Brain may create client reports using extracted data:

competitor changes
review insights
local market signals
content gaps
offer opportunities
lead opportunities
search visibility issues
customer sentiment themes

Output

Monthly client intelligence report.

Use Case 7: Real Estate / Property Intelligence

The Apify material included an example of scraped real-estate/MLS-style data being used to support investor offer generation, with the automation helping analyse opportunities and create faster offer workflows.

MWMS may later use this pattern for:

property lead research
investor intelligence
offer workflow support
data-backed deal dashboards

Output

Property opportunity dashboard or AIOS.

Actor Registry Standard

MWMS should eventually maintain an Actor Registry.

Registry Fields

Actor Name:
Brain Owner:
Purpose:
Source:
Input Fields:
Output Fields:
Run Frequency:
Destination Table:
Dashboard / Report:
Compliance Notes:
Cost:
Status: Active / Paused / Deprecated / Experimental
Last Tested:
Failure Notes:

Rule

Actors should be registered before they become operational dependencies.
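
A registry can enforce this rule by refusing to schedule any actor that is not registered and Active. The sketch below is a minimal in-memory illustration using the four statuses from the Registry Fields. The class names and methods are hypothetical, new registrations defaulting to Experimental is an assumption, and a real registry would presumably live in whichever storage destination Data Brain selects.

```python
from enum import Enum
from typing import Dict


class ActorStatus(Enum):
    ACTIVE = "Active"
    PAUSED = "Paused"
    DEPRECATED = "Deprecated"
    EXPERIMENTAL = "Experimental"


class ActorRegistry:
    """In-memory stand-in for an Actor Registry table."""

    def __init__(self) -> None:
        self._entries: Dict[str, ActorStatus] = {}

    def register(
        self, actor_name: str, status: ActorStatus = ActorStatus.EXPERIMENTAL
    ) -> None:
        """Add an actor; new actors start as Experimental unless stated otherwise."""
        self._entries[actor_name] = status

    def set_status(self, actor_name: str, status: ActorStatus) -> None:
        """Change the status of an actor that is already registered."""
        if actor_name not in self._entries:
            raise KeyError(f"{actor_name} is not registered")
        self._entries[actor_name] = status

    def can_schedule(self, actor_name: str) -> bool:
        """Only registered, Active actors may become operational dependencies."""
        return self._entries.get(actor_name) is ActorStatus.ACTIVE


# Hypothetical actor moving from Experimental to Active.
registry = ActorRegistry()
registry.register("price_monitor")
before = registry.can_schedule("price_monitor")
registry.set_status("price_monitor", ActorStatus.ACTIVE)
after = registry.can_schedule("price_monitor")
print(before, after)
```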

Data Extraction Request Template

Use this template when asking Research Brain or Data Brain to create or evaluate an extraction workflow.

Request Name:
Requesting Brain:
Business Question:
Decision Supported:
Source(s):
Data Needed:
One-Time Or Recurring:
Preferred Method: Manual / API / Actor / Scraper / Unknown
Output Fields:
Destination:
Dashboard Needed: Yes / No
Scoring Needed: Yes / No
Compliance Risk: Low / Medium / High
Human Review Needed: Yes / No
Action After Extraction:
Owner:
Due Date:
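
Because the template is plain "Field: value" text, a filled-in request can be parsed and checked for blanks before it is accepted. The sketch below is illustrative. Which fields must be answered before work starts is an assumption (`MUST_ANSWER`), and the function names and sample request are hypothetical.

```python
from typing import Dict, List

# Assumption: these fields must be answered before a request is accepted.
MUST_ANSWER = ("Requesting Brain", "Business Question", "Decision Supported", "Owner")


def parse_template(text: str) -> Dict[str, str]:
    """Split each 'Field: value' line on the first colon only."""
    answers: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        answers[key.strip()] = value.strip()
    return answers


def unanswered(answers: Dict[str, str]) -> List[str]:
    """Return the must-answer fields that are missing or blank."""
    return [name for name in MUST_ANSWER if not answers.get(name)]


# Hypothetical request with the decision and owner left blank.
request_text = """\
Request Name: Competitor offer snapshot
Requesting Brain: Affiliate Brain
Business Question: Which competitor offers changed price this month?
Decision Supported:
Source(s): https://example.com/offers
Owner:
"""
answers = parse_template(request_text)
print(unanswered(answers))
```

Splitting on the first colon only keeps URLs in a value intact.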

Extraction Output Template

Every completed extraction should output:

Extraction Name:
Date:
Source(s):
Method Used:
Actor / Workflow Used:
Records Extracted:
Records Accepted:
Records Rejected:
Data Quality Notes:
Key Findings:
Top Opportunities:
Risks / Compliance Notes:
Recommended Brain Routing:
Recommended Action:
Next Run Needed: Yes / No

Data Extraction Scorecard

Score extraction opportunities before building.

Interpretation

80+: Strong extraction candidate
65–79: Useful; test carefully
50–64: One-time/manual research first
Below 50: Park or reject

Rule

Do not build extraction infrastructure for low-value data.
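
The interpretation bands can be applied mechanically once a total score exists. The sketch below maps a score to its band. It does not define how the score is produced, and the function name is hypothetical.

```python
def interpret_score(score: float) -> str:
    """Map a scorecard total to the interpretation bands in this section."""
    if score >= 80:
        return "Strong extraction candidate"
    if score >= 65:
        return "Useful; test carefully"
    if score >= 50:
        return "One-time/manual research first"
    return "Park or reject"


# Hypothetical totals for three candidate workflows.
for candidate, total in [("price monitor", 82), ("forum scan", 58), ("scrape everything", 30)]:
    print(candidate, "->", interpret_score(total))
```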

Application To Research Brain

Research Brain is the primary owner of this framework.

Research Brain should use it to:

define research questions
select sources
request extraction
interpret extracted data
create market intelligence
support avatar definition
validate niches
monitor competitors
detect trends
route intelligence to other Brains

Research Brain Rule

Research Brain must convert extracted data into market understanding, not just datasets.

Application To Data Brain

Data Brain owns structure and storage.

Data Brain should define:

schemas
tables
data quality rules
field definitions
source tracking
deduplication rules
dashboard feeds
retention rules
actor registry
data pipeline monitoring

Data Brain Rule

Data Brain must prevent raw extraction from becoming messy intelligence debt.

Application To AIBS Brain

AIBS Brain uses extraction for client opportunities.

AIBS can use this framework to support:

prospect lists
trophy client scoring
local business opportunity research
review/reputation gaps
lead capture AIOS candidates
AI audit targets
vertical AIOS research
competitor offers
client intelligence reports

AIBS Rule

AIBS should use extracted data to find better clients and build better AIOS packages.

Application To Affiliate Brain

Affiliate Brain uses extraction to support product and market research.

Affiliate Brain may use data extraction for:

competitor affiliate pages
product angles
testimonials
sales claims
pricing
bonus stacks
content gaps
niche demand
ad angles
offer changes

Affiliate Rule

Affiliate Brain should use extraction to improve offer selection and angle testing, not copy competitors blindly.

Application To PPL Brain

PPL Brain uses extraction to understand lead markets.

PPL Brain may use extraction for:

lead verticals
form flows
buyer categories
local market demand
competitor lead gen pages
compliance signals
offer economics
conversion friction

PPL Rule

PPL extraction must respect compliance and lead handling rules.

Application To Content Brain

Content Brain uses extraction for content opportunities.

Content Brain may mine:

reviews
comments
competitor blogs
YouTube titles
questions
forums
search results
social themes
customer pain language

Content Rule

Extracted content signals should become original MWMS content strategy, not copied content.

Application To Sales Brain

Sales Brain uses extraction for outreach relevance.

Sales Brain may use extracted data to generate:

recent observations
specific outreach angles
buyer pain notes
proof opportunities
industry trends
lead scoring
account research

Sales Rule

Extracted data should improve relevance, not create spam.

Application To Experimentation Brain

Experimentation Brain uses extracted data to create test hypotheses.

Examples:

test this niche
test this offer angle
test this lead source
test this content topic
test this AIBS package
test this outreach message
test this avatar

Experimentation Rule

Extraction should feed experiments, not replace experiments.

Application To Compliance Brain And Risk Brain

Compliance Brain and Risk Brain review extraction workflows.

They should check:

scraping legality/terms
personal data
cold outreach use
data storage
consent
platform policies
sensitive industries
client data boundaries
data reuse rules
deletion requirements

Compliance Rule

If extracted data will be used for outreach or client work, compliance review is required.

Application To Automation Brain

Automation Brain may help run extraction pipelines.

Automation Brain should define:

triggers
schedules
actor runs
webhooks
retries
failure alerts
cost controls
logs
handoffs
dashboard updates

Automation Rule

Recurring extraction must be monitored like a system, not treated as a one-off script.
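
Retries, logs, and failure alerts from the list above can be combined in a small wrapper around any extraction job. The following is a sketch assuming exponential backoff and standard-library logging. The function name `run_with_retries`, the default attempt count, and the use of an error-level log line as the alert are illustrative assumptions. The `sleep` parameter is injectable so the wrapper can be exercised without real delays.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("extraction")


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run job, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            logger.warning("attempt %d of %d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                # Stand-in for a failure alert (email, chat message, dashboard flag).
                logger.error("extraction job failed after %d attempts", max_attempts)
                raise
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical flaky job that succeeds on its third call.
calls = {"count": 0}


def flaky_job() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays = []
result = run_with_retries(flaky_job, sleep=delays.append)
print(result, delays)
```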

Drift Protection

This framework protects MWMS from:

scraping without purpose
collecting data that is never used
actor/tool chasing
creating data mess
ignoring source visibility
using outdated extracted data
feeding bad data into decisions
violating platform or privacy rules
building recurring pipelines for one-time needs
overcomplicating Research Brain
making M build extraction systems without a decision need
mistaking data volume for intelligence
using lead data without compliance review
copying competitor content instead of extracting insight
using Apify/actors as the strategy instead of infrastructure

Drift Signals

Watch for:

“Let’s scrape everything”
no defined research question
no target Brain
no data schema
no source URL field
no timestamp
no data quality check
no compliance review
no dashboard/report
no action after extraction
no actor owner
no maintenance plan
no cost estimate
no routing decision
no source-of-truth decision
raw data dumped into chat
scraped data used as fact without review

Rule

If extracted data does not support a decision, it is probably noise.

Implementation Boundary

This page is an architecture and operating framework.

It does not authorise immediate development of:

Apify actors
scraper scripts
Supabase tables
dashboards
actor registry
automated pipelines
MCR integrations
client-facing data systems

Before any implementation, HeadOffice must create a specific scoped brief with:

exact business question
exact source
exact extraction method
exact output fields
exact destination
exact owner
exact compliance notes
exact test run
exact cost/risk
exact stop condition

Rule

No extraction infrastructure should be built without a scoped intelligence need.

Deferred Update / Parking Lot Section

This framework requires later updates to the following pages.

Later Update: Research Brain Canon

Add:

actor-based research pipelines
extraction request template
data-to-decision routing
source visibility rules
recurring market monitoring role

Later Update: Data Brain Canon

Add:

Actor Registry
extracted data schemas
source/timestamp requirements
data quality rules
retention and client isolation rules

Later Update: MWMS Outbound Lead Enrichment And Cold Outreach Governance Framework

Add:

extraction-to-enrichment workflow
suppression rules
lead source compliance
data freshness requirements
outreach source evidence fields

Later Update: MWMS Client Intelligence Report Automation Framework

Add:

recurring actor-based data collection
competitor/review/content/market feeds
automated source-linked monthly reports

Future Employee Ideas

Data Extraction Architect
Actor Infrastructure Manager
Research Pipeline Analyst
Market Data Quality Auditor
Competitor Intelligence Extractor

Strategic Summary

This framework captures the Apify/actor/data extraction lesson as a Research Brain and Data Brain infrastructure standard.

The key lesson is:

Web data extraction becomes powerful when it is connected to MWMS decisions, dashboards, offers, experiments, and client systems.

MWMS should not chase scraping tools.

MWMS should build disciplined data pipelines that answer business questions.

The framework supports:

stronger Research Brain
stronger Data Brain
better affiliate intelligence
better PPL research
stronger AIBS prospecting
client intelligence reports
competitor monitoring
content opportunity mining
dashboard-first decision systems

The long-term MWMS opportunity is to turn external data into structured intelligence that improves every downstream Brain.

Final Standard

The MWMS final standard is:

Every data extraction workflow must begin with a business question and end with structured, source-visible, routed intelligence.

A valid extraction system must define:

intelligence need
source
extraction method
actor/workflow
schema
cleaning rules
enrichment rules
scoring logic
storage destination
dashboard/report
Brain routing
compliance review
action after extraction

That is the MWMS Data Extraction And Actor Infrastructure standard.

Change Log

Version: v1.0

Date: 2026-06-04
Author: MWMS HeadOffice

Change:

Created the MWMS Data Extraction And Actor Infrastructure Framework from the AI Automations by Jack commercialization block, especially the Apify Masterclass and supporting lead generation / productized AIOS lessons.

Captured the useful strategic lessons from:

Apify actors
actor-store infrastructure
MCP-style actor discovery
actor-as-API logic
actor-as-SaaS-backend logic
real estate / MLS-style extraction example
e-commerce intelligence examples
lead generation enrichment
competitor intelligence
market monitoring
Research Brain / Data Brain pipeline needs

Defined the MWMS Data Extraction And Actor Infrastructure Model with twelve layers:

Intelligence Need Layer
Source Selection Layer
Extraction Method Layer
Actor / Automation Layer
Data Schema Layer
Cleaning And Normalisation Layer
Enrichment Layer
Scoring And Classification Layer
Storage Layer
Dashboard / Report Layer
Brain Routing Layer
Governance And Compliance Layer

Added key operating sections:

Standard Data Extraction Pipeline
Actor Selection Rule
One-Time vs Recurring Extraction Rule
Data Quality Standard
Source Visibility Standard
Data Extraction Use Cases For MWMS
Actor Registry Standard
Data Extraction Request Template
Extraction Output Template
Data Extraction Scorecard
Implementation Boundary
Deferred Update / Parking Lot Section

Mapped the framework across:

Research Brain
Data Brain
AIBS Brain
Affiliate Brain
PPL Brain
Content Brain
Sales Brain
Experimentation Brain
Compliance Brain
Risk Brain
Automation Brain

Purpose of creation:

To establish a formal MWMS standard for using web data extraction, actor infrastructure, scraping workflows, APIs, enrichment systems, and recurring research pipelines as governed intelligence infrastructure that supports better market research, offer decisions, client acquisition, affiliate research, PPL research, competitor monitoring, content strategy, dashboards, and client AIOS systems.

END — MWMS DATA EXTRACTION AND ACTOR INFRASTRUCTURE FRAMEWORK v1.0