MWMS Data Extraction And Actor Infrastructure Framework

System: MWMS
Document Type: Operating Framework
Authority Level: MCR Source Of Truth
Status: Draft For MCR
Version: v1.0
Primary Location: MCR
Future Operational Destination: Research Brain, Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain
Parent Page: Research Brain
Owner: Martyn
Developer Boundary: Do Not Touch M’s Active Build Areas Unless Specifically Assigned
Source Of Truth: MCR
Last Reviewed: 2026-06-04
Source / Origin: AI Automations by Jack — Commercialization Block / Apify Masterclass — How the 1% Are Building / Lead Generation Systems / Productized AIOS Service Packaging / Case Study Pattern Library
MWMS Classification: Research Brain Infrastructure Framework / Data Extraction Standard / Actor-Based Intelligence Pipeline / Market Monitoring Framework / Data-Driven AIOS Support System
Primary Brain: Research Brain
Supporting Brains: Data Brain, AIBS Brain, Affiliate Brain, PPL Brain, Content Brain, Sales Brain, Experimentation Brain, HeadOffice Brain, Compliance Brain, Risk Brain, Automation Brain, Product Brain

Related Pages: Research Brain Canon, Data Brain Canon, AIBS Brain Canon, MWMS Productized AIOS Service Packaging And Scope Control Framework, MWMS AIOS Lead Capture And Conversion Infrastructure Framework, MWMS High-Ticket AIOS Client Acquisition And Trophy Client Framework, MWMS AIBS Case Study Pattern Library And Offer Replication Framework, MWMS Business Brain Copilot Architecture Framework, MWMS Dashboard-First Client AIOS Offer Framework, MWMS Offer And Niche Selection Framework, MWMS Outbound Lead Enrichment And Cold Outreach Governance Framework, MWMS Client Intelligence Report Automation Framework, MWMS Market Driven Social Content Production Framework, MWMS Source Visibility And Evidence Display Standard, MWMS AI Tool Permission And Access Framework, MWMS AI Automation Security And Risk Checklist, HeadOffice Kaizen Continuous Improvement Loop

Source Evidence: This framework is derived primarily from the Apify Masterclass material, which showed Apify actors, scraping/data APIs, MCP-style actor discovery, actor-store infrastructure, agentic data extraction workflows, and examples where web data can power e-commerce intelligence, real estate intelligence, lead enrichment, and SaaS-style tools. It is also supported by the lead generation and productized AIOS material, where scraped/enriched data becomes useful only when connected to ICP clarity, dashboards, CRM workflows, offer intelligence, or commercial decision-making.


Purpose

The purpose of the MWMS Data Extraction And Actor Infrastructure Framework is to define how MWMS uses structured web data extraction, actor-based automation, scraping systems, APIs, enrichment workflows, and research pipelines to support better decisions across the MWMS ecosystem.

This framework exists because Research Brain and Data Brain need to go beyond what manual searching can deliver.

MWMS cannot rely only on:

  • random Google searches
  • scattered course notes
  • manual competitor checking
  • one-off newsletter insights
  • unstructured browsing
  • screenshots
  • guesswork
  • memory
  • isolated chat analysis

As MWMS grows, it needs repeatable systems that can collect, structure, enrich, monitor, and route data from external sources.

This includes:

  • affiliate product research
  • competitor monitoring
  • offer intelligence
  • ad intelligence
  • marketplace research
  • lead list building
  • local business research
  • e-commerce intelligence
  • review mining
  • pricing monitoring
  • content opportunity research
  • niche validation
  • customer/avatar research
  • trend detection
  • case study extraction
  • client intelligence reports

The core purpose is:

Turn external web data into structured MWMS intelligence that can be searched, scored, routed, tested, and used by the right Brain.


Core Doctrine

The MWMS doctrine is:

Data extraction is not valuable by itself.
Data extraction becomes valuable when it feeds a decision, dashboard, offer, campaign, experiment, or client system.

A scraper is not the asset.

An actor is not the asset.

An API is not the asset.

The asset is the structured intelligence produced from the data.

MWMS should never build data extraction systems just because they are technically possible.

Every data extraction workflow must answer:

  • What decision will this support?
  • Which Brain needs the data?
  • What fields are needed?
  • How fresh does the data need to be?
  • What action will the data trigger?
  • How will the data be stored?
  • How will the data be scored?
  • What compliance risks exist?
  • What dashboard or report will show the value?
  • What is the cost of maintaining this data flow?

Strategic Importance

This framework is strategically important because Research Brain must become one of the strongest Brains in the MWMS ecosystem.

Research comes before:

  • avatar creation
  • offer selection
  • content strategy
  • ad strategy
  • AIBS packaging
  • PPL targeting
  • affiliate product evaluation
  • client acquisition
  • experiment design
  • market opportunity scoring

If research is weak, the downstream Brains can create polished work for the wrong market.

This has already been a concern in MWMS:

Without strong avatar, market, and evidence layers, Content Brain, Ads Brain, Affiliate Brain, PPL Brain, and AIBS Brain may act on assumptions instead of reality.

The Apify material is valuable because it reframes scraping and actors as infrastructure, not one-off tricks. Actors can become reusable data units that support apps, dashboards, intelligence reports, lead enrichment, market monitoring, competitor tracking, and SaaS-style systems.

For MWMS, the lesson is:

Research Brain needs reusable data infrastructure, not just manual research skill.


Definition

A data extraction workflow is a repeatable process that collects information from external or internal sources and converts it into structured data.

An actor is a reusable automation component that performs a defined extraction or processing job, such as scraping a website, collecting listings, extracting reviews, monitoring a page, or enriching a dataset.

An actor infrastructure layer is the system that stores, runs, monitors, and routes these extraction components.

A research pipeline is the full pathway from source selection to extraction, cleaning, enrichment, scoring, storage, dashboarding, and Brain routing.

MWMS Definition

The MWMS Data Extraction And Actor Infrastructure Framework is:

Research Brain and Data Brain’s standard for converting external web data, scraped information, marketplace signals, competitor intelligence, lead data, review data, product data, and market signals into structured, governed, reusable intelligence pipelines that support MWMS decisions, dashboards, offers, campaigns, and client AIOS systems.


Scope

This framework applies to:

  • Research Brain market research
  • Data Brain structured intelligence
  • Affiliate Brain product research
  • PPL offer research
  • AIBS client research
  • competitor monitoring
  • offer intelligence
  • ad intelligence
  • review mining
  • lead enrichment
  • Google Maps-style business research
  • marketplace scraping
  • e-commerce product monitoring
  • real estate data extraction
  • social proof monitoring
  • price monitoring
  • ranking/visibility monitoring
  • AI tool monitoring
  • newsletter intelligence enrichment
  • case study extraction
  • client intelligence reports
  • content opportunity systems
  • data-backed dashboards
  • actor-based SaaS or micro-app infrastructure

This framework applies whenever MWMS uses extraction or scraping to support a business decision.


Core Principle

The core principle is:

Extract only what MWMS can use, structure, verify, govern, and act on.

A data extraction workflow should not create a pile of raw data.

It should create usable intelligence.

Usable intelligence means:

  • structured
  • cleaned
  • timestamped
  • source-linked
  • scored where useful
  • routed to the right Brain
  • connected to a decision
  • displayed where useful
  • compliant enough for the use case
  • not over-collected

The MWMS Data Extraction And Actor Infrastructure Model

Every extraction system should be designed across twelve layers:

  1. Intelligence Need Layer
  2. Source Selection Layer
  3. Extraction Method Layer
  4. Actor / Automation Layer
  5. Data Schema Layer
  6. Cleaning And Normalisation Layer
  7. Enrichment Layer
  8. Scoring And Classification Layer
  9. Storage Layer
  10. Dashboard / Report Layer
  11. Brain Routing Layer
  12. Governance And Compliance Layer

1. Intelligence Need Layer

The first step is not choosing a scraper.

The first step is identifying the intelligence need.

Intelligence Need Questions

Ask:

  • What are we trying to learn?
  • Which Brain needs the answer?
  • What decision depends on this?
  • Is this for affiliate, PPL, AIBS, content, ads, research, or client work?
  • Is this one-time research or recurring monitoring?
  • How fresh must the data be?
  • What fields are needed?
  • What output is required?
  • What will happen if the data confirms the hypothesis?
  • What will happen if the data contradicts the hypothesis?

Example Intelligence Needs

  • Find local businesses with poor follow-up signals.
  • Monitor competitor offers in a niche.
  • Extract product pricing and positioning from e-commerce sites.
  • Identify affiliate offers with strong market demand.
  • Gather reviews to understand customer pain language.
  • Scrape job boards to detect demand for services.
  • Collect YouTube titles to identify content angles.
  • Monitor landing pages for offer changes.
  • Extract Google Maps business categories for outreach.
  • Build lead lists for AIBS acquisition.
  • Track AI tool categories for MWMS opportunities.

Rule

No extraction workflow should start without a defined intelligence need.


2. Source Selection Layer

The source must match the question.

Possible sources include:

  • Google Maps
  • business directories
  • review platforms
  • marketplaces
  • e-commerce sites
  • affiliate marketplaces
  • job boards
  • YouTube
  • LinkedIn where permitted
  • Reddit
  • competitor websites
  • landing pages
  • app stores
  • product directories
  • newsletters
  • public datasets
  • ad libraries
  • real estate sites
  • government/public registers
  • client-owned websites
  • client CRMs
  • internal MWMS records

Source Selection Questions

Ask:

  • Is this source public?
  • Is this source reliable?
  • Is this source allowed to be scraped or accessed?
  • Does the source contain the needed fields?
  • How often does it change?
  • Is the source stable?
  • Is there an official API?
  • Is there an actor already available?
  • Is manual research safer?
  • Is the data worth the extraction cost?

Rule

Use the safest, cleanest, most reliable source that answers the question.


3. Extraction Method Layer

Choose the correct extraction method.

Possible methods:

  • manual research
  • browser extraction
  • official API
  • Apify actor
  • custom actor
  • Firecrawl-style crawling
  • sitemap extraction
  • RSS/API feed
  • Google Sheet import
  • webhook
  • CSV upload
  • CRM export
  • scraping script
  • AI-assisted extraction
  • screenshot/visual extraction where necessary
  • MCP-connected actor discovery

The Apify material is useful because it shows that actors can act as reusable extraction units and that prebuilt actors can sometimes avoid custom scraping work.

Method Selection Questions

Ask:

  • Is there an official API?
  • Is there a prebuilt actor?
  • Is custom scraping needed?
  • Is manual extraction enough for now?
  • Is this a one-time job or recurring job?
  • What is the cost?
  • What is the maintenance burden?
  • What happens if the site changes?
  • Is the data sensitive?
  • Is scraping allowed?

Rule

Do not custom-build extraction when a safer, cheaper, reliable method exists.


4. Actor / Automation Layer

Actors should be treated as reusable infrastructure.

An actor should have a clear job.

Actor Types

Possible actor types:

  • business directory extractor
  • Google Maps business extractor
  • review extractor
  • e-commerce product extractor
  • competitor page monitor
  • landing page scraper
  • job board scraper
  • social profile extractor
  • YouTube metadata extractor
  • real estate listing extractor
  • price monitor
  • affiliate offer monitor
  • ad library extractor
  • contact enrichment actor
  • content angle extractor
  • market trend actor

Actor Definition Fields

Every actor should define:

Actor Name:
Purpose:
Source:
Input Required:
Output Fields:
Run Frequency:
Owner:
Destination:
Cost:
Risk Level:
Failure Mode:
Compliance Notes:
Last Tested:
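
The field list above maps naturally onto a typed record. The following is a minimal Python sketch, assuming one record per actor. The class name `ActorDefinition`, the snake_case attribute names, the example risk levels, and the `missing_fields` helper are illustrative assumptions, not part of any MWMS or Apify API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActorDefinition:
    """One record per actor, mirroring the Actor Definition Fields."""
    actor_name: str
    purpose: str              # the business purpose, not only the technical job
    source: str
    input_required: list[str]
    output_fields: list[str]
    run_frequency: str        # e.g. "one-time" or "weekly" (assumed values)
    owner: str
    destination: str
    cost: str
    risk_level: str           # assumed values: "low", "medium", "high"
    failure_mode: str
    compliance_notes: str
    last_tested: Optional[date] = None

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty or unset."""
        return [name for name, value in vars(self).items() if value in ("", [], None)]
```

A definition that still reports `purpose`, `owner`, or `compliance_notes` as missing is a signal that the actor has a technical job but not yet a documented business purpose.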

Rule

An actor must have a business purpose, not just a technical purpose.


5. Data Schema Layer

Extraction output must be structured.

A schema defines what fields are captured.

Schema Questions

Ask:

  • What fields are required?
  • What fields are optional?
  • What fields support scoring?
  • What fields support dashboarding?
  • What fields support Brain routing?
  • What fields are risky or sensitive?
  • What fields need source URLs?
  • What fields need timestamps?
  • What fields need deduplication?
  • What fields need manual review?

Example Fields For Business Lead Extraction

business_id
business_name
category
website
phone
email
location
review_count
average_rating
recent_review_date
website_status
booking_link_present
chatbot_present
response_gap_signal
source_url
extracted_at
lead_score
risk_notes
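
A schema is only useful if records are checked against it. The sketch below validates a business lead record against the field list above. Which fields count as required versus optional is an assumption made for illustration, as are the function name `validate_lead` and the choice of ISO 8601 text for `extracted_at`.

```python
from datetime import datetime

# Assumption: these five fields are treated as required; the framework does not say which are.
REQUIRED_LEAD_FIELDS = {"business_id", "business_name", "category", "source_url", "extracted_at"}
OPTIONAL_LEAD_FIELDS = {
    "website", "phone", "email", "location", "review_count", "average_rating",
    "recent_review_date", "website_status", "booking_link_present",
    "chatbot_present", "response_gap_signal", "lead_score", "risk_notes",
}


def validate_lead(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in sorted(REQUIRED_LEAD_FIELDS):
        if not record.get(name):
            problems.append(f"missing required field: {name}")
    unknown = set(record) - REQUIRED_LEAD_FIELDS - OPTIONAL_LEAD_FIELDS
    for name in sorted(unknown):
        problems.append(f"unknown field: {name}")
    extracted_at = record.get("extracted_at")
    if extracted_at:
        try:
            datetime.fromisoformat(extracted_at)
        except (TypeError, ValueError):
            problems.append("extracted_at is not an ISO 8601 timestamp")
    return problems
```

Rejecting unknown fields is a deliberate choice here: it keeps over-collected data from slipping into storage unnoticed.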

Example Fields For Competitor Offer Extraction

competitor_id
competitor_name
website
offer_name
headline
price
CTA
proof_elements
guarantee
upsells
lead_magnet
ad_angle
landing_page_url
last_seen_at
change_detected
MWMS_opportunity_note
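
The `last_seen_at` and `change_detected` fields imply a comparison between two snapshots of the same offer. A minimal sketch of that comparison follows. The choice of tracked fields, the function name `detect_change`, and the extra `changed_fields` key are assumptions for illustration only.

```python
# Assumption: only these fields are compared between runs.
TRACKED_FIELDS = ("offer_name", "headline", "price", "guarantee")


def detect_change(previous: dict, current: dict, seen_at: str) -> dict:
    """Compare two snapshots of one competitor offer and flag differences."""
    changed = [name for name in TRACKED_FIELDS if previous.get(name) != current.get(name)]
    return {
        **current,
        "last_seen_at": seen_at,
        "change_detected": bool(changed),
        "changed_fields": changed,  # extra field, not in the list above
    }
```

Recording which fields changed, rather than only that something changed, keeps the result explainable when it reaches a dashboard or report.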

Rule

Unstructured scraped data must become structured before it can support MWMS decisions.


6. Cleaning And Normalisation Layer

Raw extracted data is rarely clean.

Cleaning may include:

  • removing duplicates
  • normalising phone numbers
  • normalising URLs
  • standardising categories
  • cleaning names
  • removing irrelevant records
  • removing broken records
  • validating required fields
  • checking timestamps
  • checking source links
  • detecting incomplete records
  • separating text from HTML
  • removing spam results
  • language detection
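
Three of the cleaning steps above (normalising URLs, normalising phone numbers, removing duplicates) can be sketched with the Python standard library. The function names, the default to `https` for bare domains, and the use of `website` as the deduplication key are assumptions, not MWMS rules.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: bare domains default to https
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def normalise_phone(phone: str) -> str:
    """Keep digits and a leading '+'; country-code handling is out of scope."""
    digits = re.sub(r"\D", "", phone)
    return "+" + digits if phone.strip().startswith("+") else digits


def deduplicate(records: list[dict], key: str = "website") -> list[dict]:
    """Keep the first record per normalised URL; drop records with no value for the key."""
    seen, kept = set(), []
    for record in records:
        value = record.get(key)
        if not value:
            continue
        value = normalise_url(value)
        if value not in seen:
            seen.add(value)
            kept.append({**record, key: value})
    return kept
```

Normalising before deduplicating matters: `example.com/` and `https://EXAMPLE.com` only collapse into one record once both are in the same form.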

Rule

Dirty data creates bad decisions.


7. Enrichment Layer

Enrichment adds useful context.

Possible enrichment:

  • email finding
  • domain lookup
  • social profile lookup
  • business size estimate
  • industry classification
  • review sentiment
  • website technology detection
  • traffic estimate
  • ad activity detection
  • offer classification
  • buyer persona classification
  • contact role detection
  • location enrichment
  • AI summary
  • pain signal extraction
  • opportunity note

The lead generation material supports enrichment as part of the acquisition workflow, where raw scraped lists become more useful after ICP filtering, contact discovery, personalisation, and follow-up structure.

Rule

Enrichment should improve actionability, not just add noise.


8. Scoring And Classification Layer

Data should be scored when decisions require ranking.

Scoring Examples

Possible scores:

  • lead fit score
  • trophy client score
  • offer opportunity score
  • affiliate opportunity score
  • review weakness score
  • local SEO opportunity score
  • competitor threat score
  • content opportunity score
  • trend strength score
  • pricing gap score
  • pain signal score
  • buyer sophistication score
  • AIOS fit score

Example AIBS Lead Score

Pain Signal: 25
Ability To Pay: 20
Reachability: 15
AIOS Fit: 15
Review / Reputation Gap: 10
Website / Conversion Gap: 10
Compliance Risk: -5
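
One way to read the example above is as maximum points per criterion, with Compliance Risk as a deduction. Under that reading the positive weights sum to 95. The sketch below assumes each criterion is rated from 0.0 to 1.0 by a reviewer or upstream step; that rating scale, the dictionary keys, and the function name `score_lead` are assumptions.

```python
LEAD_SCORE_WEIGHTS = {
    "pain_signal": 25,
    "ability_to_pay": 20,
    "reachability": 15,
    "aios_fit": 15,
    "review_reputation_gap": 10,
    "website_conversion_gap": 10,
    "compliance_risk": -5,  # deduction: a higher rating lowers the score
}


def score_lead(ratings: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and a per-criterion breakdown for human review."""
    breakdown = {}
    for name, weight in LEAD_SCORE_WEIGHTS.items():
        rating = ratings.get(name, 0.0)  # unrated criteria contribute nothing
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1, got {rating}")
        breakdown[name] = weight * rating
    return sum(breakdown.values()), breakdown
```

Returning the breakdown alongside the total is what keeps the score explainable: a reviewer can see which criteria produced the number.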

Rule

Scoring must be explainable enough for human review.


9. Storage Layer

Data needs the correct destination.

Possible destinations:

  • Supabase
  • Google Sheets
  • Airtable
  • CRM
  • WordPress database
  • MCR page
  • vector memory
  • local CSV
  • dashboard database
  • client AIOS database
  • research archive

Storage Questions

Ask:

  • Is this source-of-truth?
  • Is this temporary?
  • Is this structured metrics data?
  • Is this research context?
  • Does it need retrieval?
  • Does it need dashboarding?
  • Does it contain personal data?
  • Does it need client isolation?
  • How long should it be retained?
  • Who can access it?

Rule

Metrics and structured records belong in databases.
Canonical rules belong in MCR.
Long-form research archives may belong in retrieval systems.


10. Dashboard / Report Layer

Extracted data should usually be made visible in a dashboard or report.

Possible outputs:

  • lead list dashboard
  • competitor change dashboard
  • affiliate offer intelligence report
  • local business opportunity map
  • review mining report
  • content opportunity dashboard
  • market trend report
  • product pricing dashboard
  • client intelligence report
  • weekly research digest
  • offer comparison table
  • experiment hypothesis board

Dashboard Questions

Ask:

  • Who needs to see this?
  • What decision must be made?
  • What score matters?
  • What changed since last run?
  • What should be acted on?
  • What should be ignored?
  • What should be routed?
  • What should be tested?

Rule

Dashboards must support decisions, not just display scraped data.


11. Brain Routing Layer

Extracted intelligence must route to the correct Brain.

Routing Examples

Route to Research Brain when the intelligence concerns:

  • market research
  • avatar research
  • competitor research
  • trend detection
  • niche validation

Route to Data Brain when the intelligence concerns:

  • schema
  • storage
  • scoring
  • dashboarding
  • data quality

Route to AIBS Brain when the intelligence concerns:

  • client lead opportunities
  • AIOS package ideas
  • business process signals
  • local business opportunities

Route to Affiliate Brain when the intelligence concerns:

  • product opportunity
  • ClickBank/vendor intelligence
  • competitor affiliate pages
  • ad angles
  • offer market demand

Route to PPL Brain when the intelligence concerns:

  • lead buyer categories
  • local demand
  • offer verticals
  • form/conversion patterns

Route to Content Brain when the intelligence concerns:

  • content topics
  • customer pain language
  • competitor content gaps
  • YouTube/article opportunities

Route to Experimentation Brain when:

  • hypothesis created
  • market test needed
  • offer test needed
  • acquisition test needed

Route to Compliance Brain when the intelligence concerns:

  • scraping risk
  • personal data risk
  • platform policy risk
  • regulated sector risk
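
The routing examples above amount to a lookup table from topic to Brain. The sketch below encodes them as data, assuming each extracted record carries a list of topic tags. The tagging scheme and the function name `route` are assumptions; the topics are copied from the lists above.

```python
BRAIN_ROUTES = {
    "Research Brain": ["market research", "avatar research", "competitor research",
                       "trend detection", "niche validation"],
    "Data Brain": ["schema", "storage", "scoring", "dashboarding", "data quality"],
    "AIBS Brain": ["client lead opportunities", "AIOS package ideas",
                   "business process signals", "local business opportunities"],
    "Affiliate Brain": ["product opportunity", "ClickBank/vendor intelligence",
                        "competitor affiliate pages", "ad angles", "offer market demand"],
    "PPL Brain": ["lead buyer categories", "local demand", "offer verticals",
                  "form/conversion patterns"],
    "Content Brain": ["content topics", "customer pain language",
                      "competitor content gaps", "YouTube/article opportunities"],
    "Experimentation Brain": ["hypothesis created", "market test needed",
                              "offer test needed", "acquisition test needed"],
    "Compliance Brain": ["scraping risk", "personal data risk",
                         "platform policy risk", "regulated sector risk"],
}


def route(tags: list[str]) -> list[str]:
    """Return every Brain whose topics overlap the record's tags, in table order."""
    wanted = {tag.lower() for tag in tags}
    return [
        brain for brain, topics in BRAIN_ROUTES.items()
        if wanted & {topic.lower() for topic in topics}
    ]
```

An empty result means the record matched no Brain, which under this framework means it has not yet been absorbed.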

Rule

Extracted data is not absorbed until it is routed.


12. Governance And Compliance Layer

Data extraction can create risk.

Governance must be included.

Risk Areas

  • website terms
  • platform rules
  • personal data
  • email scraping
  • cold outreach compliance
  • copyright
  • sensitive data
  • regulated industries
  • scraping frequency
  • server load
  • data retention
  • client data isolation
  • hallucinated enrichment
  • inaccurate scoring
  • outdated extracted data
  • use of data beyond permitted context

Compliance Questions

Ask:

  • Is this public data?
  • Are we allowed to access it this way?
  • Does this collect personal data?
  • Is contact data being used for outreach?
  • Is opt-out/suppression needed?
  • Is the data stored securely?
  • Is the source timestamped?
  • Is the data being sold, republished, or only used internally?
  • Is this for a client?
  • Does jurisdiction matter?

Rule

The ability to extract data does not mean MWMS should extract or use it.


Standard Data Extraction Pipeline

The standard pipeline is:

  1. Define intelligence need.
  2. Identify source.
  3. Select extraction method.
  4. Choose or build actor.
  5. Define schema.
  6. Run test extraction.
  7. Clean and normalise data.
  8. Enrich only where useful.
  9. Score/classify records.
  10. Store in correct system.
  11. Display in dashboard/report where useful.
  12. Route to relevant Brain.
  13. Review compliance and risk.
  14. Decide action.
  15. Schedule repeat extraction if needed.
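
The ordering of the pipeline can be enforced in code by running named stages in sequence and stopping at the first failure. This is a minimal sketch with only two of the fifteen steps implemented; the runner `run_pipeline`, the stage functions, and the context keys are all hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_pipeline(context: dict, stages: list[tuple[str, Stage]]) -> dict:
    """Run stages in order, recording each completed step; stop on the first failure."""
    context = {**context, "completed": []}
    for name, stage in stages:
        try:
            context = stage(context)
        except Exception as error:
            context["failed_at"] = name
            context["error"] = str(error)
            break
        context["completed"].append(name)
    return context


def require_need(context: dict) -> dict:
    """Step 1: refuse to continue without a defined intelligence need."""
    if not context.get("intelligence_need"):
        raise ValueError("no intelligence need defined")
    return context


def clean(context: dict) -> dict:
    """Step 7 (simplified): drop records that cannot be traced to a source."""
    records = [r for r in context.get("records", []) if r.get("source_url")]
    return {**context, "records": records}


STAGES = [("define intelligence need", require_need), ("clean and normalise", clean)]
```

Because the need check runs first, a request with no stated intelligence need fails before any extraction or cleaning work is attempted.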

Actor Selection Rule

Before building a custom actor, check:

  1. Is manual research enough?
  2. Is there an official API?
  3. Is there a trusted existing actor?
  4. Is there a simpler scraping method?
  5. Is the data worth recurring extraction?
  6. Is the source stable?
  7. Is compliance risk acceptable?
  8. Does the output justify maintenance?

Rule

Custom actors should be built only when the data value justifies the maintenance burden.


One-Time vs Recurring Extraction Rule

Not all extraction needs automation.

One-Time Extraction

Use for:

  • single research task
  • initial market scan
  • quick validation
  • small dataset
  • one-off client audit
  • course absorption enrichment
  • simple competitor snapshot

Recurring Extraction

Use for:

  • competitor monitoring
  • price monitoring
  • offer tracking
  • lead pipeline generation
  • review monitoring
  • content trend tracking
  • affiliate opportunity tracking
  • client intelligence reporting
  • dashboard updates

Rule

Do not create recurring infrastructure for one-time curiosity.


Data Quality Standard

Every extraction workflow should include data quality checks.

Quality Checks

Check:

  • duplicate rate
  • missing fields
  • invalid URLs
  • invalid emails
  • old records
  • wrong category
  • irrelevant results
  • source mismatch
  • broken extraction
  • language mismatch
  • hallucinated enrichment
  • incomplete scrape
  • timestamp missing
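
Several of the checks above can be expressed as rates over a batch of records. The sketch below covers duplicate rate, missing fields, and old or untimestamped records. The function name `quality_report`, the 30-day staleness default, and the use of `source_url` as the duplicate key are assumptions; any pass/fail thresholds would need to be set per workflow.

```python
from datetime import datetime, timedelta


def quality_report(records, required_fields, now, max_age_days=30):
    """Summarise duplicate, missing-field, and stale-record rates for a batch."""
    total = len(records)
    seen_urls = set()
    duplicates = missing = stale = 0
    for record in records:
        url = record.get("source_url")
        if url is not None and url in seen_urls:
            duplicates += 1
        seen_urls.add(url)
        if any(not record.get(name) for name in required_fields):
            missing += 1
        extracted_at = record.get("extracted_at")  # expected to be a datetime
        if extracted_at is None or now - extracted_at > timedelta(days=max_age_days):
            stale += 1

    def rate(count):
        return count / total if total else 0.0

    return {
        "total": total,
        "duplicate_rate": rate(duplicates),
        "missing_field_rate": rate(missing),
        "stale_rate": rate(stale),
    }
```

A record with no timestamp is counted as stale here, on the reasoning that data of unknown age cannot be treated as fresh.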

Rule

If the data quality is poor, do not route it into decision-making.


Source Visibility Standard

Every important extracted record should include source visibility.

This connects directly to the MWMS Source Visibility And Evidence Display Standard.

Source Fields

source_url
source_name
extracted_at
actor_name
actor_run_id
source_type
confidence_level
last_verified_at
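
The source fields above can be stamped onto each record at extraction time. The sketch below shows one way to do that, assuming UTC ISO 8601 timestamps. The helper names `attach_source` and `is_traceable`, the default confidence level of "unverified", and the choice of which fields make a record traceable are assumptions.

```python
from datetime import datetime, timezone


def attach_source(record, *, source_url, source_name, actor_name, actor_run_id,
                  source_type, confidence_level="unverified", extracted_at=None):
    """Return a copy of the record with the source visibility fields filled in."""
    stamp = (extracted_at or datetime.now(timezone.utc)).isoformat()
    return {
        **record,
        "source_url": source_url,
        "source_name": source_name,
        "extracted_at": stamp,
        "actor_name": actor_name,
        "actor_run_id": actor_run_id,
        "source_type": source_type,
        "confidence_level": confidence_level,
        "last_verified_at": None,  # set later, when a human or check verifies the record
    }


def is_traceable(record) -> bool:
    """True when the record can be traced back to a source and an actor run."""
    needed = ("source_url", "extracted_at", "actor_name", "actor_run_id")
    return all(record.get(name) for name in needed)
```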

Rule

MWMS must be able to trace extracted intelligence back to its source.


Data Extraction Use Cases For MWMS


Use Case 1: Affiliate Product Intelligence

Research Brain / Affiliate Brain may extract:

  • vendor pages
  • affiliate pages
  • competitor review pages
  • pricing
  • claims
  • proof elements
  • VSL angle
  • ad angles
  • testimonials
  • refund/risk signals
  • seasonal demand
  • content gaps

Output

Affiliate Product Intelligence report.


Use Case 2: PPL Offer Research

Research Brain / PPL Brain may extract:

  • lead buyer categories
  • form flows
  • landing pages
  • vertical demand
  • local markets
  • competitor CPL offers
  • compliance notes
  • conversion pathway elements

Output

PPL offer opportunity map.


Use Case 3: AIBS Client Lead Research

Research Brain / AIBS Brain may extract:

  • local business categories
  • review gaps
  • website quality
  • booking link presence
  • chatbot presence
  • missed-call signals where inferable
  • CRM/tech stack hints
  • business size
  • contact details where appropriate
  • AIOS fit score

Output

High-ticket AIOS prospect list.


Use Case 4: Competitor Intelligence

Research Brain may extract:

  • competitor offer pages
  • pricing changes
  • new CTAs
  • lead magnets
  • guarantees
  • testimonials
  • case studies
  • blog topics
  • funnel changes

Output

Competitor change dashboard.


Use Case 5: Content Opportunity Mining

Content Brain may use extracted data to identify:

  • repeated customer questions
  • review pain language
  • competitor content gaps
  • YouTube title patterns
  • popular post themes
  • unanswered objections
  • niche terminology

Output

Content opportunity dashboard.


Use Case 6: Client Intelligence Reports

AIBS / Data Brain may create client reports using extracted data:

  • competitor changes
  • review insights
  • local market signals
  • content gaps
  • offer opportunities
  • lead opportunities
  • search visibility issues
  • customer sentiment themes

Output

Monthly client intelligence report.


Use Case 7: Real Estate / Property Intelligence

The Apify material included an example of scraped real-estate/MLS-style data being used to support investor offer generation, with the automation helping analyse opportunities and create faster offer workflows.

MWMS may later use this pattern for:

  • property lead research
  • investor intelligence
  • offer workflow support
  • data-backed deal dashboards

Output

Property opportunity dashboard or AIOS.


Actor Registry Standard

MWMS should eventually maintain an Actor Registry.

Registry Fields

Actor Name:
Brain Owner:
Purpose:
Source:
Input Fields:
Output Fields:
Run Frequency:
Destination Table:
Dashboard / Report:
Compliance Notes:
Cost:
Status: Active / Paused / Deprecated / Experimental
Last Tested:
Failure Notes:
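
A registry can be as simple as a keyed collection with a status check. The sketch below covers a subset of the fields above and uses the four status values from the Status line. The class names, the in-memory storage, the default of Experimental for new entries, and the `can_depend_on` check are assumptions; a real registry would live in whatever store Data Brain selects.

```python
from enum import Enum


class ActorStatus(Enum):
    ACTIVE = "Active"
    PAUSED = "Paused"
    DEPRECATED = "Deprecated"
    EXPERIMENTAL = "Experimental"


class ActorRegistry:
    """In-memory registry keyed by actor name (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, brain_owner, purpose, status=ActorStatus.EXPERIMENTAL):
        if name in self._entries:
            raise ValueError(f"actor already registered: {name}")
        if not brain_owner or not purpose:
            raise ValueError("brain_owner and purpose are required")
        self._entries[name] = {
            "brain_owner": brain_owner,
            "purpose": purpose,
            "status": status,
        }

    def set_status(self, name, status):
        self._entries[name]["status"] = status

    def can_depend_on(self, name) -> bool:
        """Only registered, active actors may become operational dependencies."""
        entry = self._entries.get(name)
        return entry is not None and entry["status"] is ActorStatus.ACTIVE
```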

Rule

Actors should be registered before they become operational dependencies.


Data Extraction Request Template

Use this template when asking Research/Data Brain to create or evaluate an extraction workflow.

Request Name:
Requesting Brain:
Business Question:
Decision Supported:
Source(s):
Data Needed:
One-Time Or Recurring:
Preferred Method: Manual / API / Actor / Scraper / Unknown
Output Fields:
Destination:
Dashboard Needed: Yes / No
Scoring Needed: Yes / No
Compliance Risk: Low / Medium / High
Human Review Needed: Yes / No
Action After Extraction:
Owner:
Due Date:


Extraction Output Template

Every completed extraction should output:

Extraction Name:
Date:
Source(s):
Method Used:
Actor / Workflow Used:
Records Extracted:
Records Accepted:
Records Rejected:
Data Quality Notes:
Key Findings:
Top Opportunities:
Risks / Compliance Notes:
Recommended Brain Routing:
Recommended Action:
Next Run Needed: Yes / No


Data Extraction Scorecard

Score extraction opportunities before building.

Score Categories

Decision Value: 20
Data Availability: 15
Source Reliability: 10
Extraction Feasibility: 10
Repeat Use Potential: 10
Dashboard / Reporting Value: 10
Commercial Value: 10
Compliance Risk: -10
Maintenance Burden: -5
MWMS Strategic Fit: 10

Interpretation

80+: Strong extraction candidate
65–79: Useful; test carefully
50–64: One-time/manual research first
Below 50: Park or reject
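
The scorecard arithmetic can be sketched as follows, assuming each category is rated from 0.0 to 1.0 and multiplied by its listed value, with the two negative categories acting as deductions. Read this way, the highest possible score is 95. The rating scale, dictionary keys, and function names are assumptions; the weights and bands are taken from the lists above.

```python
SCORECARD_WEIGHTS = {
    "decision_value": 20,
    "data_availability": 15,
    "source_reliability": 10,
    "extraction_feasibility": 10,
    "repeat_use_potential": 10,
    "dashboard_reporting_value": 10,
    "commercial_value": 10,
    "compliance_risk": -10,     # deduction
    "maintenance_burden": -5,   # deduction
    "mwms_strategic_fit": 10,
}


def interpret(score: float) -> str:
    """Map a score to the interpretation bands."""
    if score >= 80:
        return "Strong extraction candidate"
    if score >= 65:
        return "Useful; test carefully"
    if score >= 50:
        return "One-time/manual research first"
    return "Park or reject"


def score_opportunity(ratings: dict[str, float]) -> tuple[float, str]:
    """Weight each 0-1 rating and return the total with its interpretation."""
    score = sum(weight * ratings.get(name, 0.0) for name, weight in SCORECARD_WEIGHTS.items())
    return score, interpret(score)
```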

Rule

Do not build extraction infrastructure for low-value data.


Application To Research Brain

Research Brain is the primary owner of this framework.

Research Brain should use it to:

  • define research questions
  • select sources
  • request extraction
  • interpret extracted data
  • create market intelligence
  • support avatar definition
  • validate niches
  • monitor competitors
  • detect trends
  • route intelligence to other Brains

Research Brain Rule

Research Brain must convert extracted data into market understanding, not just datasets.


Application To Data Brain

Data Brain owns structure and storage.

Data Brain should define:

  • schemas
  • tables
  • data quality rules
  • field definitions
  • source tracking
  • deduplication rules
  • dashboard feeds
  • retention rules
  • actor registry
  • data pipeline monitoring

Data Brain Rule

Data Brain must prevent raw extraction from becoming messy intelligence debt.


Application To AIBS Brain

AIBS Brain uses extraction for client opportunities.

AIBS can use this framework to support:

  • prospect lists
  • trophy client scoring
  • local business opportunity research
  • review/reputation gaps
  • lead capture AIOS candidates
  • AI audit targets
  • vertical AIOS research
  • competitor offers
  • client intelligence reports

AIBS Rule

AIBS should use extracted data to find better clients and build better AIOS packages.


Application To Affiliate Brain

Affiliate Brain uses extraction to support product and market research.

Affiliate Brain may use data extraction for:

  • competitor affiliate pages
  • product angles
  • testimonials
  • sales claims
  • pricing
  • bonus stacks
  • content gaps
  • niche demand
  • ad angles
  • offer changes

Affiliate Rule

Affiliate Brain should use extraction to improve offer selection and angle testing, not copy competitors blindly.


Application To PPL Brain

PPL Brain uses extraction to understand lead markets.

PPL Brain may use extraction for:

  • lead verticals
  • form flows
  • buyer categories
  • local market demand
  • competitor lead gen pages
  • compliance signals
  • offer economics
  • conversion friction

PPL Rule

PPL extraction must respect compliance and lead handling rules.


Application To Content Brain

Content Brain uses extraction for content opportunities.

Content Brain may mine:

  • reviews
  • comments
  • competitor blogs
  • YouTube titles
  • questions
  • forums
  • search results
  • social themes
  • customer pain language

Content Rule

Extracted content signals should become original MWMS content strategy, not copied content.


Application To Sales Brain

Sales Brain uses extraction for outreach relevance.

Sales Brain may use extracted data to generate:

  • recent observations
  • specific outreach angles
  • buyer pain notes
  • proof opportunities
  • industry trends
  • lead scoring
  • account research

Sales Rule

Extracted data should improve relevance, not create spam.


Application To Experimentation Brain

Experimentation Brain uses extracted data to create test hypotheses.

Examples:

  • test this niche
  • test this offer angle
  • test this lead source
  • test this content topic
  • test this AIBS package
  • test this outreach message
  • test this avatar

Experimentation Rule

Extraction should feed experiments, not replace experiments.


Application To Compliance And Risk Brain

Compliance Brain and Risk Brain review extraction workflows.

They should check:

  • scraping legality/terms
  • personal data
  • cold outreach use
  • data storage
  • consent
  • platform policies
  • sensitive industries
  • client data boundaries
  • data reuse rules
  • deletion requirements

Compliance Rule

If extracted data will be used for outreach or client work, compliance review is required.


Application To Automation Brain

Automation Brain may help run extraction pipelines.

Automation Brain should define:

  • triggers
  • schedules
  • actor runs
  • webhooks
  • retries
  • failure alerts
  • cost controls
  • logs
  • handoffs
  • dashboard updates
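
Retries and failure alerts from the list above can be combined in one small wrapper. The sketch below retries a job with exponential backoff and raises an alert only after the final attempt fails. The function name `run_with_retries`, the default of three attempts, the doubling delay, and the use of `print` as the alert channel are all assumptions.

```python
import time


def run_with_retries(job, *, attempts=3, base_delay=1.0, sleep=time.sleep, alert=print):
    """Run job(); on failure wait base_delay, then double it, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except Exception as error:
            if attempt == attempts:
                alert(f"extraction failed after {attempts} attempts: {error}")
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` and `alert` in as parameters keeps the wrapper testable and lets the alert go to whatever channel the pipeline owner monitors.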

Automation Rule

Recurring extraction must be monitored like a system, not treated as a one-off script.


Drift Protection

This framework protects MWMS from:

  • scraping without purpose
  • collecting data that is never used
  • actor/tool chasing
  • creating data mess
  • ignoring source visibility
  • using outdated extracted data
  • feeding bad data into decisions
  • violating platform or privacy rules
  • building recurring pipelines for one-time needs
  • overcomplicating Research Brain
  • making M build extraction systems without a decision need
  • mistaking data volume for intelligence
  • using lead data without compliance review
  • copying competitor content instead of extracting insight
  • using Apify/actors as the strategy instead of infrastructure

Drift Signals

Watch for:

  • “Let’s scrape everything”
  • no defined research question
  • no target Brain
  • no data schema
  • no source URL field
  • no timestamp
  • no data quality check
  • no compliance review
  • no dashboard/report
  • no action after extraction
  • no actor owner
  • no maintenance plan
  • no cost estimate
  • no routing decision
  • no source-of-truth decision
  • raw data dumped into chat
  • scraped data used as fact without review

Rule

If extracted data does not support a decision, it is probably noise.


Implementation Boundary

This page is an architecture and operating framework.

It does not authorise immediate development of:

  • Apify actors
  • scraper scripts
  • Supabase tables
  • dashboards
  • actor registry
  • automated pipelines
  • MCR integrations
  • client-facing data systems

Before any implementation, HeadOffice must create a specific scoped brief with:

  • exact business question
  • exact source
  • exact extraction method
  • exact output fields
  • exact destination
  • exact owner
  • exact compliance notes
  • exact test run
  • exact cost/risk
  • exact stop condition

Rule

No extraction infrastructure should be built without a scoped intelligence need.
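
For illustration only, the ten brief items listed above could be held in a small record that refuses to count as complete while any item is blank. The class name `ScopedBrief` and its field names are hypothetical. The sketch shows the shape of the gate and is not itself an authorised build.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ScopedBrief:
    """Hypothetical record of the ten items a HeadOffice brief must state."""
    business_question: str = ""
    source: str = ""
    extraction_method: str = ""
    output_fields: str = ""
    destination: str = ""
    owner: str = ""
    compliance_notes: str = ""
    test_run: str = ""
    cost_risk: str = ""
    stop_condition: str = ""

    def missing(self) -> List[str]:
        # A field counts as missing when it is empty or only whitespace.
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        return not self.missing()


# Usage: a brief with only two items filled in is not ready for a build.
draft = ScopedBrief(business_question="Which local competitors changed pricing?",
                    source="public pricing pages")
print(draft.is_complete())
print(draft.missing())
```

Listing the missing items by name, rather than returning only a yes or no, tells the requester exactly what to add before the brief goes back to HeadOffice.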


Deferred Update / Parking Lot Section

This framework creates the following deferred updates to other MWMS pages.

Later Update: Research Brain Canon

Add:

  • actor-based research pipelines
  • extraction request template
  • data-to-decision routing
  • source visibility rules
  • recurring market monitoring role

Later Update: Data Brain Canon

Add:

  • Actor Registry
  • extracted data schemas
  • source/timestamp requirements
  • data quality rules
  • retention and client isolation rules

Later Update: MWMS Outbound Lead Enrichment And Cold Outreach Governance Framework

Add:

  • extraction-to-enrichment workflow
  • suppression rules
  • lead source compliance
  • data freshness requirements
  • outreach source evidence fields

Later Update: MWMS Client Intelligence Report Automation Framework

Add:

  • recurring actor-based data collection
  • competitor/review/content/market feeds
  • automated source-linked monthly reports

Future Employee Ideas

  • Data Extraction Architect
  • Actor Infrastructure Manager
  • Research Pipeline Analyst
  • Market Data Quality Auditor
  • Competitor Intelligence Extractor

Strategic Summary

This framework captures the Apify/actor/data extraction lesson as a Research Brain and Data Brain infrastructure standard.

The key lesson is:

Web data extraction becomes powerful when it is connected to MWMS decisions, dashboards, offers, experiments, and client systems.

MWMS should not chase scraping tools.

MWMS should build disciplined data pipelines that answer business questions.

The framework supports:

  • stronger Research Brain
  • stronger Data Brain
  • better Affiliate Intelligence
  • better PPL research
  • stronger AIBS prospecting
  • client intelligence reports
  • competitor monitoring
  • content opportunity mining
  • dashboard-first decision systems

The long-term MWMS opportunity is to turn external data into structured intelligence that improves every downstream Brain.


Final Standard

The MWMS final standard is:

Every data extraction workflow must begin with a business question and end with structured, source-visible, routed intelligence.

A valid extraction system must define:

  • intelligence need
  • source
  • extraction method
  • actor/workflow
  • schema
  • cleaning rules
  • enrichment rules
  • scoring logic
  • storage destination
  • dashboard/report
  • Brain routing
  • compliance review
  • action after extraction

That is the MWMS Data Extraction And Actor Infrastructure standard.


Change Log

Version: v1.0
Date: 2026-06-04
Author: MWMS HeadOffice

Change:

Created the MWMS Data Extraction And Actor Infrastructure Framework from the AI Automations by Jack commercialization block, especially the Apify Masterclass and supporting lead generation / productized AIOS lessons.

Captured the useful strategic lessons from:

  • Apify actors
  • actor-store infrastructure
  • MCP-style actor discovery
  • actor-as-API logic
  • actor-as-SaaS-backend logic
  • real estate / MLS-style extraction example
  • e-commerce intelligence examples
  • lead generation enrichment
  • competitor intelligence
  • market monitoring
  • Research Brain / Data Brain pipeline needs

Defined the MWMS Data Extraction And Actor Infrastructure Model with twelve layers:

  1. Intelligence Need Layer
  2. Source Selection Layer
  3. Extraction Method Layer
  4. Actor / Automation Layer
  5. Data Schema Layer
  6. Cleaning And Normalisation Layer
  7. Enrichment Layer
  8. Scoring And Classification Layer
  9. Storage Layer
  10. Dashboard / Report Layer
  11. Brain Routing Layer
  12. Governance And Compliance Layer

Added key operating sections:

  • Standard Data Extraction Pipeline
  • Actor Selection Rule
  • One-Time vs Recurring Extraction Rule
  • Data Quality Standard
  • Source Visibility Standard
  • Data Extraction Use Cases For MWMS
  • Actor Registry Standard
  • Data Extraction Request Template
  • Extraction Output Template
  • Data Extraction Scorecard
  • Implementation Boundary
  • Deferred Update / Parking Lot Section

Mapped the framework across:

  • Research Brain
  • Data Brain
  • AIBS Brain
  • Affiliate Brain
  • PPL Brain
  • Content Brain
  • Sales Brain
  • Experimentation Brain
  • Compliance Brain
  • Risk Brain
  • Automation Brain

Purpose of creation:

To establish a formal MWMS standard for using web data extraction, actor infrastructure, scraping workflows, APIs, enrichment systems, and recurring research pipelines as governed intelligence infrastructure that supports better market research, offer decisions, client acquisition, affiliate research, PPL research, competitor monitoring, content strategy, dashboards, and client AIOS systems.

END — MWMS DATA EXTRACTION AND ACTOR INFRASTRUCTURE FRAMEWORK v1.0