Document Type: Framework
Status: Canon
Authority: HeadOffice
Applies To: Experimentation Brain, Data Brain, Affiliate Brain, Ads Brain, Conversion Brain, Finance Brain, Research Brain, HeadOffice
Parent: Experimentation Brain Canon
Version: v1.0
Last Reviewed: 2026-05-07
Purpose
The Experiment Validity Framework defines how MWMS protects experimentation systems from unreliable conclusions, contaminated evidence, invalid interpretation, and structurally flawed testing conditions.
This framework ensures MWMS understands that experimentation quality is determined not only by:
- test execution
- statistical calculations
- traffic allocation
but also by:
- validity integrity
- isolation quality
- measurement reliability
- environmental stability
- behavioral consistency
- interpretation discipline
The framework governs how MWMS determines whether a test result can be trusted operationally.
Core Principle
A statistically significant result is not automatically a valid result.
Definition
Experiment validity is the degree to which an experiment accurately measures the true impact of a controlled change without contamination, distortion, or misleading interpretation.
Structural Role
This framework connects:
Experimentation Brain
→ experimentation governance systems
Data Brain
→ measurement integrity systems
Affiliate Brain
→ offer testing reliability
Ads Brain
→ campaign and creative testing validity
Conversion Brain
→ funnel experiment isolation
Finance Brain
→ scaling risk evaluation
Research Brain
→ evidence interpretation quality
HeadOffice
→ governance oversight
Validity Reality
Many experiments fail not because the statistics are incorrect, but because:
- conditions are contaminated
- variables are uncontrolled
- measurements are unstable
- external influences distort results
- interpretation exceeds evidence quality
Rule
Reliable experimentation requires valid experimental conditions.
Primary Validity Categories
Internal Validity
Measures whether the observed outcome was truly caused by the tested change.
Threat Examples
- multiple simultaneous changes
- uncontrolled traffic shifts
- platform algorithm changes
- campaign interference
- implementation errors
Rule
Tests must isolate the intended variable.
External Validity
Measures whether results generalize beyond the test environment.
Threat Examples
- overly narrow audiences
- seasonal anomalies
- temporary platform conditions
- artificial traffic environments
Rule
Results obtained under temporary conditions may not hold when scaled.
Measurement Validity
Measures whether the metric accurately reflects the intended business outcome.
Examples
Weak metric:
- CTR alone
Stronger metric:
- profit-adjusted conversion quality
Rule
Not all metrics represent meaningful business value.
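As a rough illustration of the weak-versus-stronger metric contrast, the sketch below compares two hypothetical variants on CTR and on profit per click. The VariantStats fields, the use of profit per click as a stand-in for "profit-adjusted conversion quality", and every number shown are assumptions made for illustration, not MWMS definitions.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant totals; field names are illustrative."""
    impressions: int
    clicks: int
    revenue: float
    ad_cost: float


def ctr(v: VariantStats) -> float:
    """Click-through rate: clicks per impression."""
    return v.clicks / v.impressions


def profit_per_click(v: VariantStats) -> float:
    """One possible profit-adjusted metric (assumes clicks > 0)."""
    return (v.revenue - v.ad_cost) / v.clicks


# Illustrative numbers only.
variant_a = VariantStats(impressions=10_000, clicks=200, revenue=1000.0, ad_cost=400.0)
variant_b = VariantStats(impressions=10_000, clicks=300, revenue=900.0, ad_cost=600.0)

for name, v in (("A", variant_a), ("B", variant_b)):
    print(name, f"CTR={ctr(v):.3f}", f"profit/click={profit_per_click(v):.2f}")
```

In this made-up case variant B wins on CTR while variant A wins on profit per click, which is the kind of disagreement a CTR-only readout would hide.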
Behavioral Validity
Measures whether user behavior reflects real-world behavior.
Threat Examples
- accidental clicks
- bot traffic
- incentivized behavior
- low-intent traffic environments
Rule
Artificial behavior weakens predictive reliability.
Environmental Stability Layer
Experiments require reasonably stable conditions.
Threat Examples
- major traffic source changes
- large budget changes
- seasonality spikes
- offer changes mid-test
- tracking failures
Rule
Unstable environments make experiment results unreliable.
Concurrent Testing Layer
Simultaneous experiments may contaminate results.
Examples
- overlapping audiences
- multiple landing page changes
- simultaneous offer modifications
- creative overlap
Rule
Interaction effects must remain controlled.
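One way to make overlapping audiences visible is to measure how many users appear in two concurrent experiments. The sketch below is a minimal example under assumed inputs: the function name audience_overlap, the user ID sets, and the choice to normalize by the smaller audience are all hypothetical, and any acceptable overlap level would need to be set by the owning Brain.

```python
def audience_overlap(exp_a_users: set, exp_b_users: set) -> float:
    """Share of the smaller audience that is also exposed to the other experiment."""
    if not exp_a_users or not exp_b_users:
        return 0.0
    shared = exp_a_users & exp_b_users
    return len(shared) / min(len(exp_a_users), len(exp_b_users))


# Illustrative user IDs only.
landing_page_test = {"u1", "u2", "u3", "u4"}
offer_test = {"u3", "u4", "u5"}

overlap = audience_overlap(landing_page_test, offer_test)
print(f"overlap = {overlap:.2f}")
```

A high value suggests the two tests cannot be read independently, because many users experienced both changes at once.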
Traffic Integrity Layer
Traffic quality strongly influences validity.
Examples
- inconsistent audience targeting
- changing acquisition quality
- mixed intent levels
- geographic instability
Rule
Traffic instability weakens result reliability.
Tracking Integrity Layer
Measurement systems must remain stable during experiments.
Threat Examples
- broken tracking
- attribution drift
- event duplication
- missing conversions
- pixel failures
Rule
Invalid tracking creates invalid conclusions.
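Two of the threats listed here, event duplication and missing conversions, lend themselves to simple automated checks. The sketch below assumes events arrive as dictionaries carrying an "event_id" key and that a backend order count is available for comparison; both assumptions, and the function names, are hypothetical.

```python
from collections import Counter


def duplicate_event_ids(events: list) -> list:
    """Return event IDs that were recorded more than once, sorted."""
    counts = Counter(e["event_id"] for e in events)
    return sorted(eid for eid, n in counts.items() if n > 1)


def conversion_gap(tracked: int, backend: int) -> float:
    """Fraction of backend conversions missing from tracking.

    A negative value means tracking reported more than the backend.
    """
    if backend == 0:
        return 0.0
    return (backend - tracked) / backend


# Illustrative data only.
events = [
    {"event_id": "a", "type": "purchase"},
    {"event_id": "b", "type": "purchase"},
    {"event_id": "a", "type": "purchase"},  # duplicate firing
]
print(duplicate_event_ids(events))
print(conversion_gap(tracked=90, backend=100))
```

Running checks like these before and during a test gives an early signal that the measurement system has shifted underneath the experiment.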
Randomization Layer
Traffic assignment should minimize allocation bias.
Examples
- random split testing
- balanced traffic distribution
- controlled segmentation
Rule
Biased allocation weakens experiment integrity.
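A common way to get a random-looking but repeatable split is to hash the user ID together with the experiment name. The sketch below is one such approach, not a description of how MWMS assigns traffic; the function names, the variant labels, and the experiment name "headline_test" are assumptions.

```python
import hashlib
from collections import Counter


def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("control", "treatment")) -> str:
    """Deterministically map a user to a variant using a SHA-256 hash.

    Including the experiment name keeps assignments independent across tests.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


def split_counts(user_ids, experiment: str) -> Counter:
    """Count how many users land in each variant, to inspect balance."""
    return Counter(assign_variant(u, experiment) for u in user_ids)


users = [f"user-{i}" for i in range(10_000)]
print(split_counts(users, "headline_test"))
```

Because the assignment depends only on the IDs, the same user always sees the same variant, and the resulting counts can be inspected for a roughly balanced distribution.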
Sample Representativeness Layer
Test participants should reasonably reflect intended operational audiences.
Threat Examples
- abnormal audience concentration
- platform anomalies
- temporary user behavior spikes
Rule
Representative samples increase confidence that results will hold at scale.
Novelty Effect Layer
Users may react temporarily to the novelty of a change.
Examples
- temporary curiosity spikes
- short-term engagement inflation
- temporary CTR increases
Rule
Short-term excitement may not persist long term.
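One simple way to look for a novelty effect is to compare the lift in the first days of a test with the lift in the last days. The sketch below assumes a list of daily lift values is already available; the window size, the decay ratio of 0.5, and the function name are illustrative choices rather than MWMS thresholds.

```python
from statistics import mean


def novelty_check(daily_lift: list, window: int = 3, decay_ratio: float = 0.5):
    """Compare early and late average lift.

    Returns (early_mean, late_mean, suspected), where suspected is True when
    the late lift has fallen below decay_ratio times a positive early lift.
    """
    if len(daily_lift) < 2 * window:
        raise ValueError("need at least two full windows of daily data")
    early = mean(daily_lift[:window])
    late = mean(daily_lift[-window:])
    suspected = early > 0 and late < decay_ratio * early
    return early, late, suspected


# Illustrative series only: a lift that fades after launch.
fading = [0.12, 0.10, 0.08, 0.04, 0.03, 0.02, 0.02, 0.02]
print(novelty_check(fading))
```

A flagged result does not prove a novelty effect; it only indicates that the early lift should not be extrapolated without a longer observation period.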
Regression To The Mean Layer
Extreme performance often normalizes over time.
Examples
- sudden spikes
- unusually poor periods
- unstable short-term lifts
Rule
Short-term extremes require cautious interpretation.
Statistical Significance Layer
Significance alone does not guarantee practical value.
Examples
- statistically significant but commercially meaningless lift
- tiny improvements with no operational value
Rule
Business impact must be weighed alongside statistical significance.
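The gap between statistical and practical significance can be shown with a standard two-proportion z-test combined with a minimum meaningful lift. The sketch below uses only the Python standard library; the alpha of 0.05, the minimum absolute lift of 0.005, and the sample numbers are assumptions chosen for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (absolute lift of B over A, two-sided p-value) using a pooled z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return p_b - p_a, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_b - p_a, p_value


def verdict(conv_a, n_a, conv_b, n_b, alpha=0.05, min_abs_lift=0.005) -> str:
    """Classify a result by both significance and an assumed practical threshold."""
    lift, p = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    if p >= alpha:
        return "not significant"
    if abs(lift) < min_abs_lift:
        return "significant but below practical threshold"
    return "significant and practically meaningful"


# Illustrative: 10.0% vs 10.1% conversion on one million users per arm.
print(verdict(100_000, 1_000_000, 101_000, 1_000_000))
```

With very large samples, a lift of a tenth of a percentage point can clear the significance bar while still falling short of any threshold that would justify operational change.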
Interpretation Discipline Layer
Conclusions must remain proportional to evidence quality.
Examples
Weak conclusion:
- “This proves the strategy always works.”
Stronger conclusion:
- “This test suggests improvement under current conditions.”
Rule
Evidence strength should govern conclusion strength.
Scaling Validity Layer
Scaling decisions require stronger validity confidence than exploratory tests.
Examples
- high-budget campaign scaling
- offer expansion
- automation rollout
Rule
Scaling increases the cost of invalid conclusions.
Pre-Test Governance Layer
Before launch, experiments should define:
- variable isolation
- success metrics
- tracking validation
- segmentation rules
- stopping conditions
- scaling criteria
Rule
Validity planning must occur before experimentation begins.
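The pre-launch items listed above can be captured as a simple checklist record that blocks launch until every item is filled in. The sketch below is one possible shape; the PreTestPlan class, its field names (taken from the list above), and the example values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class PreTestPlan:
    """One text entry per pre-launch item; empty means not yet defined."""
    variable_isolation: str = ""
    success_metrics: str = ""
    tracking_validation: str = ""
    segmentation_rules: str = ""
    stopping_conditions: str = ""
    scaling_criteria: str = ""


def missing_items(plan: PreTestPlan) -> list:
    """Return the names of items that are still blank, in declaration order."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


def ready_to_launch(plan: PreTestPlan) -> bool:
    return not missing_items(plan)


# Illustrative plan with only two items defined.
plan = PreTestPlan(
    variable_isolation="Only the headline changes",
    success_metrics="Profit per click",
)
print(missing_items(plan))
print(ready_to_launch(plan))
```

Keeping the plan as structured data makes it straightforward to audit later whether validity planning actually preceded the test.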
Measurement Layer
MWMS should monitor:
- traffic stability
- tracking consistency
- experiment contamination risk
- audience overlap
- measurement drift
- variance levels
- implementation integrity
Rule
Experiment validity must remain measurable.
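To make an item such as traffic stability measurable, one option is the coefficient of variation of daily traffic during the test window. The sketch below is a minimal example; the function name, the 0.2 threshold, and the sample series are assumptions and would need calibration against real MWMS traffic.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values: list) -> float:
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(values) / m


def traffic_is_stable(daily_visits: list, max_cv: float = 0.2) -> bool:
    """Flag traffic as stable when day-to-day variation stays under max_cv."""
    return coefficient_of_variation(daily_visits) <= max_cv


# Illustrative daily visit counts only.
steady = [1000, 1020, 980, 1010]
volatile = [1000, 400, 1900, 700]
print(traffic_is_stable(steady), traffic_is_stable(volatile))
```

The same pattern can be applied to other monitored quantities, such as daily conversion counts, as a first-pass indicator of measurement drift.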
Cross-Brain Integration
Experimentation Brain
→ owns experiment validity governance
Data Brain
→ validates measurement integrity
Affiliate Brain
→ applies validity to offer testing
Ads Brain
→ governs campaign experiment reliability
Conversion Brain
→ protects funnel testing isolation
Finance Brain
→ evaluates scaling risk exposure
Research Brain
→ governs evidence interpretation quality
HeadOffice
→ governance and oversight
Failure Modes Prevented
This framework prevents:
- invalid scaling decisions
- contaminated experiments
- uncontrolled testing conditions
- misleading significance interpretation
- unstable optimization systems
- false confidence from weak evidence
Drift Protection
The system must prevent:
- simultaneous uncontrolled changes
- invalid tracking environments
- interpretation inflation
- traffic contamination
- measurement instability
- novelty-driven overreaction
- weak scaling governance
Architectural Intent
This framework transforms MWMS experimentation thinking from:
→ result-chasing systems
into:
→ governed evidence integrity systems
It ensures MWMS develops:
- reliable experimentation environments
- trustworthy scaling logic
- controlled optimization systems
- defensible business decisions
- stable experimentation governance
Final Rule
If the experiment environment is invalid:
→ the result cannot be trusted.
Change Log
Version: v1.0
Date: 2026-05-07
Author: HeadOffice
Change:
Created Experiment Validity Framework defining experiment contamination controls, validity governance, environmental stability systems, measurement reliability logic, and evidence interpretation discipline.
Change Impact Declaration
Pages Created:
Experimentation Brain Experiment Validity Framework
Pages Updated:
None
Pages Deprecated:
None
Registries Requiring Update:
MWMS Architecture Registry
Experimentation Brain Page Registry
Canon Version Update Required:
No
Change Log Entry Required:
Yes