Experimentation Brain Experiment Validity Framework

Document Type: Framework
Status: Canon
Authority: HeadOffice
Applies To: Experimentation Brain, Data Brain, Affiliate Brain, Ads Brain, Conversion Brain, Finance Brain, Research Brain, HeadOffice
Parent: Experimentation Brain Canon
Version: v1.0
Last Reviewed: 2026-05-07


Purpose

The Experiment Validity Framework defines how MWMS protects experimentation systems from unreliable conclusions, contaminated evidence, invalid interpretation, and structurally flawed testing conditions.

This framework ensures MWMS understands that experimentation quality is determined not only by:

  • test execution
  • statistical calculations
  • traffic allocation

but also by:

  • validity integrity
  • isolation quality
  • measurement reliability
  • environmental stability
  • behavioral consistency
  • interpretation discipline

The framework governs how MWMS determines whether a test result can be trusted operationally.


Core Principle

A statistically significant result is not automatically a valid result.


Definition

Experiment validity is the degree to which an experiment accurately measures the true impact of a controlled change without contamination, distortion, or misleading interpretation.


Structural Role

This framework connects:

Experimentation Brain
→ experimentation governance systems

Data Brain
→ measurement integrity systems

Affiliate Brain
→ offer testing reliability

Ads Brain
→ campaign and creative testing validity

Conversion Brain
→ funnel experiment isolation

Finance Brain
→ scaling risk evaluation

Research Brain
→ evidence interpretation quality

HeadOffice
→ governance oversight


Validity Reality

Many experiments fail not because:

  • statistics are incorrect

but because:

  • conditions are contaminated
  • variables are uncontrolled
  • measurements are unstable
  • external influences distort results
  • interpretation exceeds evidence quality

Rule

Reliable experimentation requires valid experimental conditions.


Primary Validity Categories


Internal Validity

Measures whether the observed outcome was truly caused by the tested change.


Threat Examples

  • multiple simultaneous changes
  • uncontrolled traffic shifts
  • platform algorithm changes
  • campaign interference
  • implementation errors

Rule

Tests must isolate the intended variable.


External Validity

Measures whether results generalize beyond the test environment.


Threat Examples

  • overly narrow audiences
  • seasonal anomalies
  • temporary platform conditions
  • artificial traffic environments

Rule

Results obtained under narrow or temporary conditions may not hold at scale.


Measurement Validity

Measures whether the metric accurately reflects the intended business outcome.


Examples

Weak metric:

  • CTR alone

Stronger metric:

  • profit-adjusted conversion quality

Rule

Not all metrics represent meaningful business value.
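
Illustrative sketch

The contrast between a weak and a stronger metric can be shown with a small calculation. The sketch below is illustrative only: the class VariantStats, the use of profit per impression as a stand-in for profit-adjusted conversion quality, and all numbers are assumptions made for the example, not MWMS definitions or real data.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant totals for one test period."""
    impressions: int
    clicks: int
    revenue: float
    cost: float

    @property
    def ctr(self) -> float:
        # Click-through rate: clicks divided by impressions.
        return self.clicks / self.impressions

    @property
    def profit_per_impression(self) -> float:
        # Assumed stand-in for a profit-adjusted quality metric.
        return (self.revenue - self.cost) / self.impressions


# Made-up numbers: variant B attracts more clicks but earns less.
variant_a = VariantStats(impressions=10_000, clicks=300, revenue=1500.0, cost=600.0)
variant_b = VariantStats(impressions=10_000, clicks=500, revenue=1000.0, cost=1000.0)

winner_by_ctr = "B" if variant_b.ctr > variant_a.ctr else "A"
winner_by_profit = (
    "B" if variant_b.profit_per_impression > variant_a.profit_per_impression else "A"
)
print(winner_by_ctr, winner_by_profit)
```

With these example figures the two metrics pick different winners, which is the situation the rule above warns about.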


Behavioral Validity

Measures whether user behavior reflects real-world behavior.


Threat Examples

  • accidental clicks
  • bot traffic
  • incentivized behavior
  • low-intent traffic environments

Rule

Artificial behavior weakens predictive reliability.


Environmental Stability Layer

Experiments require reasonably stable conditions.


Threat Examples

  • major traffic source changes
  • large budget changes
  • seasonality spikes
  • offer changes mid-test
  • tracking failures

Rule

Unstable environments make experiment results unreliable.


Concurrent Testing Layer

Simultaneous experiments may contaminate results.


Examples

  • overlapping audiences
  • multiple landing page changes
  • simultaneous offer modifications
  • creative overlap

Rule

Interaction effects between concurrent experiments must be controlled.
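
Illustrative sketch

One way to make overlapping audiences visible is to measure how many users two concurrent experiments share. The function below is a minimal sketch, assuming each experiment can export a set of user identifiers. The function name and the choice to normalize by the smaller audience are assumptions for illustration, not an MWMS standard.

```python
def overlap_fraction(audience_a: set[str], audience_b: set[str]) -> float:
    """Share of the smaller audience that also appears in the other one."""
    if not audience_a or not audience_b:
        return 0.0
    shared = audience_a & audience_b
    return len(shared) / min(len(audience_a), len(audience_b))


# Hypothetical user ids enrolled in two experiments running at the same time.
landing_page_test = {"u1", "u2", "u3", "u4"}
offer_test = {"u3", "u4", "u5"}

print(overlap_fraction(landing_page_test, offer_test))
```

A high fraction does not prove contamination, but it marks a pair of experiments whose results may interact and deserve review.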


Traffic Integrity Layer

Traffic quality strongly influences validity.


Examples

  • inconsistent audience targeting
  • changing acquisition quality
  • mixed intent levels
  • geographic instability

Rule

Traffic instability weakens result reliability.


Tracking Integrity Layer

Measurement systems must remain stable during experiments.


Threat Examples

  • broken tracking
  • attribution drift
  • event duplication
  • missing conversions
  • pixel failures

Rule

Invalid tracking creates invalid conclusions.
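
Illustrative sketch

Two of the threats above, event duplication and missing conversions, can be checked mechanically. The sketch below assumes each tracked event carries a unique event_id field and that a backend order count is available for comparison. Both assumptions, and the function names, are hypothetical.

```python
from collections import Counter


def find_duplicate_events(events: list[dict]) -> dict[str, int]:
    """Return event ids that were recorded more than once, with their counts."""
    counts = Counter(event["event_id"] for event in events)
    return {event_id: n for event_id, n in counts.items() if n > 1}


def conversion_gap(tracked_conversions: int, backend_orders: int) -> float:
    """Share of backend orders that tracking did not record."""
    if backend_orders == 0:
        return 0.0
    return max(backend_orders - tracked_conversions, 0) / backend_orders


# Made-up event log in which one purchase fired twice.
events = [
    {"event_id": "e1", "type": "purchase"},
    {"event_id": "e2", "type": "purchase"},
    {"event_id": "e2", "type": "purchase"},
]

print(find_duplicate_events(events))
print(conversion_gap(tracked_conversions=90, backend_orders=100))
```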


Randomization Layer

Traffic assignment should minimize allocation bias.


Examples

  • random split testing
  • balanced traffic distribution
  • controlled segmentation

Rule

Biased allocation weakens experiment integrity.
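
Illustrative sketch

A common way to implement a random split is to hash a stable user identifier together with the experiment identifier, so that assignment is effectively random across users but repeatable for any one user. This is a sketch under that assumption. The function name, the identifier format, and the 50/50 default are illustrative, and the framework does not prescribe this method.

```python
import hashlib


def assign_variant(user_id: str, experiment_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical users; the same user always lands in the same group.
assignments = [assign_variant(f"user-{i}", "exp-001") for i in range(10_000)]
treatment_rate = assignments.count("treatment") / len(assignments)
print(round(treatment_rate, 3))
```

Including the experiment identifier in the hash keeps assignments in one experiment independent of assignments in another.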


Sample Representativeness Layer

Test participants should reasonably reflect intended operational audiences.


Threat Examples

  • abnormal audience concentration
  • platform anomalies
  • temporary user behavior spikes

Rule

Representative samples improve scalability confidence.


Novelty Effect Layer

Users may react temporarily to change novelty.


Examples

  • temporary curiosity spikes
  • short-term engagement inflation
  • temporary CTR increases

Rule

Short-term excitement may not persist long term.
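
Illustrative sketch

A simple screen for novelty effects compares the lift seen in the first days of a test with the lift seen afterwards. The sketch below assumes a list of daily lift values is available. The three-day early window, the 0.5 decay ratio, and the sample data are arbitrary assumptions for illustration.

```python
def novelty_suspected(
    daily_lift: list[float], early_days: int = 3, decay_ratio: float = 0.5
) -> bool:
    """Flag a test whose later lift falls well below its early lift."""
    if len(daily_lift) <= early_days:
        return False  # Not enough data to compare early and late periods.
    early = daily_lift[:early_days]
    late = daily_lift[early_days:]
    early_mean = sum(early) / len(early)
    late_mean = sum(late) / len(late)
    return early_mean > 0 and late_mean < decay_ratio * early_mean


# Made-up daily lift values: strong at launch, then fading.
fading = [0.12, 0.10, 0.09, 0.04, 0.03, 0.03, 0.02, 0.03, 0.02, 0.03]
steady = [0.05] * 10

print(novelty_suspected(fading), novelty_suspected(steady))
```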


Regression To The Mean Layer

Extreme performance often normalizes over time.


Examples

  • sudden spikes
  • unusually poor periods
  • unstable short-term lifts

Rule

Short-term extremes require cautious interpretation.
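
Illustrative sketch

Regression to the mean can be demonstrated with a small simulation. In the sketch below every simulated campaign has the same true conversion rate, so any campaign that looks like a top performer in the first period is only lucky. All parameters (campaign count, visitors, rate, seed) are made-up values chosen for the demonstration.

```python
import random


def simulate_regression_to_mean(
    n_campaigns: int = 1000,
    visitors: int = 200,
    true_rate: float = 0.05,
    top_share: float = 0.1,
    seed: int = 7,
) -> tuple[float, float]:
    """Return the top group's mean observed rate in period 1 and in period 2."""
    rng = random.Random(seed)

    def observed_rate() -> float:
        # Each visitor converts independently with the same true rate.
        return sum(rng.random() < true_rate for _ in range(visitors)) / visitors

    period_1 = [observed_rate() for _ in range(n_campaigns)]
    period_2 = [observed_rate() for _ in range(n_campaigns)]

    # Select the campaigns that looked best in period 1.
    top_n = max(1, int(n_campaigns * top_share))
    top = sorted(range(n_campaigns), key=lambda i: period_1[i], reverse=True)[:top_n]

    mean_1 = sum(period_1[i] for i in top) / top_n
    mean_2 = sum(period_2[i] for i in top) / top_n
    return mean_1, mean_2


first_period, second_period = simulate_regression_to_mean()
print(first_period, second_period)
```

The selected group looks strong in the first period and drifts back toward the true rate in the second, even though nothing about the campaigns changed.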


Statistical Significance Layer

Significance alone does not guarantee practical value.


Examples

  • statistically significant but commercially meaningless lift
  • tiny improvements with no operational value

Rule

Business impact must be assessed alongside statistical significance.
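
Illustrative sketch

The gap between statistical and practical significance can be shown with a two-proportion z-test combined with a minimum practical lift. This is a sketch, not a prescribed MWMS method: the pooled z-test, the 0.05 alpha, the half-point minimum lift, and the traffic numbers are all assumptions chosen for the example.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # No variation at all, so no evidence of a difference.
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def evaluate(conv_a, n_a, conv_b, n_b, min_practical_lift, alpha=0.05) -> dict:
    """Report statistical and practical significance separately."""
    lift = conv_b / n_b - conv_a / n_a
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    return {
        "absolute_lift": lift,
        "p_value": p_value,
        "statistically_significant": p_value < alpha,
        "practically_meaningful": lift >= min_practical_lift,
    }


# Made-up example: 5.0% vs 5.2% conversion on 100,000 visitors each,
# with an assumed minimum worthwhile lift of 0.5 percentage points.
result = evaluate(5000, 100_000, 5200, 100_000, min_practical_lift=0.005)
print(result)
```

In this example the difference passes the significance threshold but falls short of the assumed minimum lift, so it would not justify action on its own.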


Interpretation Discipline Layer

Conclusions must remain proportional to evidence quality.


Examples

Weak conclusion:

  • “This proves the strategy always works.”

Stronger conclusion:

  • “This test suggests improvement under current conditions.”

Rule

Evidence strength should govern conclusion strength.


Scaling Validity Layer

Scaling decisions require stronger validity confidence than exploratory tests.


Examples

  • high-budget campaign scaling
  • offer expansion
  • automation rollout

Rule

Scaling increases the cost of invalid conclusions.


Pre-Test Governance Layer

Before launch, experiments should define:

  • variable isolation
  • success metrics
  • tracking validation
  • segmentation rules
  • stopping conditions
  • scaling criteria

Rule

Validity planning must occur before experimentation begins.
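
Illustrative sketch

The six pre-launch definitions above can be captured as a structured plan that blocks launch while any item is empty. The sketch below assumes free-text fields and a hypothetical PreTestPlan class. Field names mirror the list above, and the sample values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class PreTestPlan:
    """Hypothetical record of what must be defined before launch."""
    variable_isolation: str = ""
    success_metrics: str = ""
    tracking_validation: str = ""
    segmentation_rules: str = ""
    stopping_conditions: str = ""
    scaling_criteria: str = ""


def missing_items(plan: PreTestPlan) -> list[str]:
    """List the plan fields that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


def ready_to_launch(plan: PreTestPlan) -> bool:
    return not missing_items(plan)


# Invented draft plan with two items still undefined.
draft = PreTestPlan(
    variable_isolation="Headline copy only",
    success_metrics="Profit per visitor",
    tracking_validation="Test purchase confirmed in reporting",
    segmentation_rules="New visitors, all geographies",
)

print(missing_items(draft), ready_to_launch(draft))
```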


Measurement Layer

MWMS should monitor:

  • traffic stability
  • tracking consistency
  • experiment contamination risk
  • audience overlap
  • measurement drift
  • variance levels
  • implementation integrity

Rule

Experiment validity must remain measurable.
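
Illustrative sketch

One measurable signal of traffic or implementation problems is a sample ratio mismatch, where observed group sizes differ from the intended split by more than chance would explain. The sketch below uses a chi-square test with one degree of freedom. The check itself, the 0.001 threshold, and the counts are assumptions for illustration and are not named in this framework.

```python
import math


def srm_p_value(
    control_n: int, treatment_n: int, expected_treatment_share: float = 0.5
) -> float:
    """P-value that the observed split matches the intended split."""
    total = control_n + treatment_n
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (treatment_n - expected_treatment) ** 2 / expected_treatment
        + (control_n - expected_control) ** 2 / expected_control
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def srm_detected(control_n: int, treatment_n: int, threshold: float = 0.001) -> bool:
    return srm_p_value(control_n, treatment_n) < threshold


# Made-up counts for an intended 50/50 split.
print(srm_detected(5000, 5400))  # large imbalance
print(srm_detected(5000, 5050))  # small imbalance
```

A detected mismatch does not say what went wrong, only that assignment or tracking should be investigated before the result is interpreted.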


Cross Brain Integration

Experimentation Brain
→ owns experiment validity governance

Data Brain
→ validates measurement integrity

Affiliate Brain
→ applies validity to offer testing

Ads Brain
→ governs campaign experiment reliability

Conversion Brain
→ protects funnel testing isolation

Finance Brain
→ evaluates scaling risk exposure

Research Brain
→ governs evidence interpretation quality

HeadOffice
→ governance and oversight


Failure Modes Prevented

This framework is designed to prevent:

  • invalid scaling decisions
  • contaminated experiments
  • uncontrolled testing conditions
  • misleading significance interpretation
  • unstable optimization systems
  • false confidence from weak evidence

Drift Protection

The system must prevent:

  • simultaneous uncontrolled changes
  • invalid tracking environments
  • interpretation inflation
  • traffic contamination
  • measurement instability
  • novelty-driven overreaction
  • weak scaling governance

Architectural Intent

This framework transforms MWMS experimentation thinking from:

→ result chasing systems

into:

→ governed evidence integrity systems

It ensures MWMS develops:

  • reliable experimentation environments
  • trustworthy scaling logic
  • controlled optimization systems
  • defensible business decisions
  • stable experimentation governance

Final Rule

If the experiment environment is invalid:

→ the result cannot be trusted.


Change Log

Version: v1.0

Date: 2026-05-07
Author: HeadOffice

Change:
Created Experiment Validity Framework defining experiment contamination controls, validity governance, environmental stability systems, measurement reliability logic, and evidence interpretation discipline.


Change Impact Declaration

Pages Created:
Experimentation Brain Experiment Validity Framework

Pages Updated:
None

Pages Deprecated:
None

Registries Requiring Update:
MWMS Architecture Registry
Experimentation Brain Page Registry

Canon Version Update Required:
No

Change Log Entry Required:
Yes


END EXPERIMENTATION BRAIN EXPERIMENT VALIDITY FRAMEWORK v1.0