Document Type: Framework
Status: Canon
Authority: HeadOffice
Applies To: Experimentation Brain, Data Brain, Affiliate Brain, Ads Brain, Conversion Brain, Finance Brain, Research Brain, HeadOffice
Parent: Experimentation Brain Canon
Version: v1.0
Last Reviewed: 2026-05-07
Purpose
The Statistical Power Framework defines how MWMS governs statistical power: the probability that an experiment detects a meaningful effect when a true improvement exists.
This framework ensures MWMS understands that weak experimentation systems may fail not because improvements are absent, but because:
- evidence volume is insufficient
- variance is excessive
- measurement is unstable
- detectable effects are too small
- experimentation environments are underpowered
The framework governs how MWMS builds sufficiently sensitive experimentation systems while balancing operational cost and decision reliability.
Core Principle
An experiment cannot reliably detect what it does not have enough power to observe.
Definition
Statistical power is the probability that an experimentation system will correctly detect a meaningful effect when that effect genuinely exists.
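In conventional hypothesis-testing notation, the definition can be written as below. The symbols are standard but are not defined elsewhere in this framework: H_0 is the hypothesis of no effect, H_1 is the hypothesis that the meaningful effect exists, and beta is the false negative probability.

```latex
\text{Power} = P(\text{reject } H_0 \mid H_1 \text{ is true}) = 1 - \beta
```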
Structural Role
This framework connects:
Experimentation Brain
→ experimentation sensitivity governance
Data Brain
→ variance and evidence reliability systems
Affiliate Brain
→ offer validation systems
Ads Brain
→ creative and campaign testing governance
Conversion Brain
→ optimization detection reliability
Finance Brain
→ resource allocation and experimentation efficiency
Research Brain
→ interpretation discipline systems
HeadOffice
→ governance and operational oversight
Power Reality
Many tests recorded as failures are not true negative outcomes. They are the product of:
- underpowered tests
- low-evidence environments
- variance-heavy systems
- poorly structured experiments
Rule
Failure to detect an improvement is not, on its own, proof that no improvement exists.
Core Components Of Statistical Power
Statistical power depends on:
- sample size
- effect size
- variance level
- measurement quality
- significance threshold
Rule
Power reflects overall experimentation sensitivity.
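As a rough illustration of how these components combine, the sketch below approximates power for a two-sided comparison of two conversion rates using the normal approximation. The function name, the example rates, and the choice of a two-proportion z-test are assumptions made for illustration, not an MWMS standard. Measurement quality is not modeled directly; in this sketch it would only show up through the rates and variance that tracking actually records.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two conversion rates.

    Hypothetical helper for illustration; uses the normal approximation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # significance threshold
    # Standard error of the rate difference: shrinks with sample size,
    # grows with the variance of each arm.
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_variant * (1 - p_variant) / n_per_arm
    )
    shift = abs(p_variant - p_control) / se  # effect size in standard errors
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Example values are assumptions: 5% baseline rate, 6% variant rate.
    for n in (1_000, 5_000, 20_000):
        print(n, round(two_proportion_power(0.05, 0.06, n), 3))
```

When the two rates are equal, the function returns approximately alpha, which is the false positive rate rather than meaningful power.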
Sample Size Layer
Larger evidence volume improves detection capability.
Examples
- more impressions
- more clicks
- more conversions
- longer observation periods
Rule
Small samples reduce detection reliability.
Effect Size Layer
Larger improvements are easier to detect reliably.
Examples
Easy to detect:
- major conversion improvements
Hard to detect:
- tiny optimization lifts
Rule
Small effects require stronger experimentation sensitivity.
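The relationship between effect size and sample size can be sketched with the standard normal-approximation formula for comparing two conversion rates. The function name, the 5% baseline rate, and the 80% power target are illustrative assumptions, not values set by this framework.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm to detect p_control -> p_variant.

    Hypothetical helper; two-sided test, normal approximation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


if __name__ == "__main__":
    baseline = 0.05  # assumed baseline conversion rate
    for lift in (0.02, 0.01, 0.005):
        n = required_n_per_arm(baseline, baseline + lift)
        print(f"absolute lift {lift:.3f}: about {n} visitors per arm")
```

Under these assumptions, halving the target lift roughly quadruples the required sample, which is why tiny optimization lifts are hard to detect.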
Variance Layer
High variance weakens detection reliability.
Examples
- fluctuating ROAS
- unstable traffic quality
- inconsistent conversion behavior
Rule
Noise reduces statistical power.
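For a continuous metric such as ROAS, the cost of noise can be sketched with the usual two-sample normal approximation, in which the required sample grows with the square of the standard deviation. The function name and the example figures are assumptions for illustration only.

```python
from math import ceil
from statistics import NormalDist


def required_n_continuous(std_dev, min_difference, alpha=0.05, power=0.80):
    """Approximate observations per arm to detect a mean difference.

    Hypothetical helper; assumes equal variance in both arms and a
    two-sided test under the normal approximation.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return ceil(2 * (z_sum * std_dev / min_difference) ** 2)


if __name__ == "__main__":
    # Assumed example: detect a 0.1 change in ROAS at two noise levels.
    for std_dev in (1.0, 2.0):
        print(std_dev, required_n_continuous(std_dev, 0.1))
```

In this sketch, doubling the standard deviation of the metric roughly quadruples the observations needed for the same detectable difference.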
Measurement Integrity Layer
Reliable tracking improves experimentation sensitivity.
Examples
- accurate attribution
- stable event collection
- consistent conversion recording
Rule
Weak measurement systems reduce detection quality.
Threshold Layer
Stricter evidence thresholds require greater experimentation power.
Examples
High confidence requirements:
- larger sample needs
Lower confidence requirements:
- reduced evidence burden, with a higher risk of false positives
Rule
Confidence rigor influences required sensitivity.
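One way to see the cost of a stricter threshold is to compare the sample-size factor at different significance levels while holding power and effect size fixed. The helper below is a hypothetical sketch under the normal approximation; the 0.05 baseline is a common convention, not an MWMS requirement.

```python
from statistics import NormalDist


def evidence_multiplier(alpha, power=0.80, baseline_alpha=0.05):
    """Relative sample size needed at `alpha` versus `baseline_alpha`.

    Hypothetical helper; two-sided test, same power and effect size.
    """
    std_normal = NormalDist()

    def factor(threshold):
        # Sample size is proportional to (z_alpha + z_beta) squared.
        return (std_normal.inv_cdf(1 - threshold / 2) + std_normal.inv_cdf(power)) ** 2

    return factor(alpha) / factor(baseline_alpha)


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, round(evidence_multiplier(alpha), 2))
```

Under these assumptions, tightening the threshold from 0.05 to 0.01 raises the required sample by roughly half.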
Underpowered Experiment Layer
Underpowered systems frequently produce:
- inconclusive results
- false negatives
- unstable interpretation
- weak optimization reliability
Rule
Low power weakens learning efficiency.
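The false negative problem can be made concrete with a small simulation: a real improvement is built into the variant, and the sketch counts how often a pooled two-proportion z-test fails to detect it. All names, rates, and sample sizes are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_false_negative_rate(p_control, p_variant, n_per_arm,
                                  trials=200, alpha=0.05, seed=0):
    """Share of simulated tests that miss a true difference in rates.

    Hypothetical helper; each trial runs a pooled two-proportion z-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(trials):
        conv_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        conv_v = sum(rng.random() < p_variant for _ in range(n_per_arm))
        pooled = (conv_c + conv_v) / (2 * n_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        # With zero observed variance there is nothing to test; count as a miss.
        z_stat = (conv_v - conv_c) / n_per_arm / se if se > 0 else 0.0
        if abs(z_stat) < z_crit:
            misses += 1
    return misses / trials


if __name__ == "__main__":
    # Assumed scenario: true rates of 10% (control) and 13% (variant).
    for n in (300, 3_000):
        print(n, simulated_false_negative_rate(0.10, 0.13, n))
```

In this setup the smaller sample is expected to miss the real improvement far more often than the larger one, even though the improvement is identical in both cases.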
Resource Efficiency Layer
Higher power requires greater:
- traffic
- time
- budget
- operational patience
Rule
Raising power increases operational cost.
Exploratory Testing Layer
Exploratory environments may tolerate lower power conditions.
Examples
- creative ideation
- directional learning
- early market exploration
Rule
Exploration and scaling require different sensitivity standards.
Scaling Validation Layer
Scaling decisions require stronger statistical power than exploratory testing.
Examples
- major budget increases
- infrastructure dependency
- automation rollout
- market expansion
Rule
Scaling requires stronger detection confidence.
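The difference between exploratory and scaling standards could be encoded as a simple policy check. The sketch below is hypothetical: the framework does not specify numeric targets, so the 0.60 and 0.90 values, the dictionary name, and the function name are placeholders to be replaced by whatever MWMS adopts.

```python
# Hypothetical targets for illustration only; the framework sets no numbers.
POWER_TARGETS = {
    "exploratory": 0.60,  # directional learning tolerates lower power
    "scaling": 0.90,      # scaling decisions need stronger detection confidence
}


def meets_power_target(achieved_power, decision_type):
    """Return True when achieved power meets the target for the decision type."""
    if decision_type not in POWER_TARGETS:
        raise ValueError(f"unknown decision type: {decision_type!r}")
    return achieved_power >= POWER_TARGETS[decision_type]


if __name__ == "__main__":
    for decision_type in POWER_TARGETS:
        print(decision_type, meets_power_target(0.70, decision_type))
```

Keeping the targets in one table makes the gap between the two standards explicit and easy to review.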
Minimum Detectable Effect Layer
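The minimum detectable effect (MDE) is the smallest true effect an experiment can reliably detect at its target power and significance threshold. For a fixed power and threshold, a smaller MDE requires a larger sample.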
Examples
- detecting small profitability lifts requires stronger sensitivity
- detecting large conversion shifts requires less sensitivity
Rule
Smaller targets require more evidence strength.
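Given a fixed sample, the smallest detectable lift can be approximated by inverting the sample-size formula. The sketch below uses the baseline variance for both arms, which is a simplifying assumption, and the function name and example figures are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_lift(baseline_rate, n_per_arm, alpha=0.05, power=0.80):
    """Approximate smallest absolute lift in conversion rate that is detectable.

    Hypothetical helper; two-sided test, normal approximation, and the
    baseline variance is assumed for both arms.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return z_sum * sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)


if __name__ == "__main__":
    # Assumed 5% baseline conversion rate.
    for n in (1_000, 4_000, 16_000):
        print(n, round(minimum_detectable_lift(0.05, n), 4))
```

Each fourfold increase in sample size halves the detectable lift in this approximation, so small targets become expensive quickly.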
Concurrent Experimentation Layer
Running multiple tests simultaneously on shared traffic splits the evidence volume available to each test.
Examples
- excessive creative variants
- overlapping audience tests
- fragmented traffic allocation
Rule
Fragmentation weakens experimentation power.
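The dilution effect is simple arithmetic: with fixed daily traffic, every extra arm lengthens the time each arm needs to reach its required sample. The helper name and the traffic figures below are assumptions for illustration, and the sketch assumes traffic is split evenly.

```python
from math import ceil


def days_to_reach_sample(required_n_per_arm, daily_visitors, num_arms):
    """Days needed for every arm to reach its required sample.

    Hypothetical helper; assumes traffic is split evenly across arms.
    """
    visitors_per_arm_per_day = daily_visitors / num_arms
    return ceil(required_n_per_arm / visitors_per_arm_per_day)


if __name__ == "__main__":
    # Assumed example: 8,000 visitors needed per arm, 2,000 visitors per day.
    for arms in (2, 4, 8):
        print(arms, "arms:", days_to_reach_sample(8_000, 2_000, arms), "days")
```

With these example figures, moving from 2 arms to 8 arms stretches the test from 8 days to 32 days for the same per-arm evidence.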
Time Horizon Layer
Longer observation periods may improve detection reliability.
Examples
- seasonal stabilization
- delayed conversion tracking
- repeat purchase evaluation
Rule
Some effects require extended observation windows.
AI Governance Layer
AI Employees should:
- identify underpowered conditions
- classify detection limitations
- flag weak evidence environments
- recommend evidence expansion when required
Rule
AI systems must remain sensitivity-aware.
Reporting Layer
Experiment reports should communicate:
- evidence sufficiency
- variance conditions
- power limitations
- detectable effect assumptions
- confidence implications
Rule
Sensitivity limitations should remain operationally visible.
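One possible shape for such a report is a small record with one field group per reporting item. Everything in the sketch below (class name, field names, and formatting) is a hypothetical illustration rather than an MWMS schema.

```python
from dataclasses import dataclass


@dataclass
class PowerReport:
    """Hypothetical report record; field names are illustrative assumptions."""
    sample_per_arm: int
    required_per_arm: int
    metric_std_dev: float
    achieved_power: float
    minimum_detectable_effect: float
    alpha: float

    def summary_lines(self):
        """Render one line per item the Reporting Layer lists."""
        sufficiency = (
            "sufficient" if self.sample_per_arm >= self.required_per_arm
            else "insufficient"
        )
        return [
            f"Evidence sufficiency: {self.sample_per_arm} of "
            f"{self.required_per_arm} per arm ({sufficiency})",
            f"Variance conditions: metric standard deviation {self.metric_std_dev:.3f}",
            f"Power limitations: achieved power {self.achieved_power:.0%}",
            f"Detectable effect assumptions: minimum detectable effect "
            f"{self.minimum_detectable_effect:.4f}",
            f"Confidence implications: significance threshold {self.alpha}",
        ]


if __name__ == "__main__":
    # Example values are invented for illustration.
    report = PowerReport(3_000, 8_155, 0.218, 0.41, 0.0173, 0.05)
    print("\n".join(report.summary_lines()))
```

Carrying these fields with every result keeps sensitivity limitations attached to the conclusion instead of buried in analysis notes.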
Decision Governance Layer
Weakly powered experiments may require:
- extended testing
- additional traffic
- reduced confidence claims
- broader validation
Rule
Weak sensitivity should slow irreversible decisions.
Measurement Layer
MWMS should monitor:
- evidence sufficiency
- variance exposure
- detectable effect capability
- false negative frequency
- confidence stability
- experimentation efficiency
Rule
Experimentation sensitivity must remain measurable.
Cross Brain Integration
Experimentation Brain
→ owns statistical power governance
Data Brain
→ governs variance and measurement reliability
Affiliate Brain
→ validates offer evidence sufficiency
Ads Brain
→ governs creative testing sensitivity
Conversion Brain
→ evaluates optimization detectability
Finance Brain
→ governs experimentation efficiency and resource allocation
Research Brain
→ governs interpretation discipline
HeadOffice
→ governance and oversight
Failure Modes Prevented
This framework prevents:
- underpowered experimentation
- false negative interpretation
- weak learning systems
- unstable optimization decisions
- fragmented evidence environments
- unreliable scaling governance
Drift Protection
The system must prevent:
- low-evidence scaling
- ignoring variance exposure
- weak sensitivity environments
- fragmented traffic allocation
- false certainty from weak detection systems
- AI overconfidence in underpowered environments
Architectural Intent
This framework transforms MWMS experimentation thinking from:
→ surface-level testing systems
into:
→ governed evidence sensitivity systems
It ensures MWMS develops:
- scalable experimentation reliability
- sensitivity-aware optimization systems
- evidence-efficient testing architectures
- disciplined detection governance
- long-term learning stability
Final Rule
If experimentation systems lack sufficient power:
→ meaningful improvements may remain invisible.
Change Log
Version: v1.0
Date: 2026-05-07
Author: HeadOffice
Change:
Created Statistical Power Framework defining experimentation sensitivity governance, evidence sufficiency systems, variance-aware detection logic, and scalable learning reliability architecture.
Change Impact Declaration
Pages Created:
Experimentation Brain Statistical Power Framework
Pages Updated:
None
Pages Deprecated:
None
Registries Requiring Update:
MWMS Architecture Registry
Experimentation Brain Page Registry
Canon Version Update Required:
No
Change Log Entry Required:
Yes