Document Type: Framework
Status: Canon
Authority: HeadOffice
Applies To: Experimentation Brain, Data Brain, Affiliate Brain, Ads Brain, Conversion Brain, Finance Brain, Research Brain, HeadOffice
Parent: Experimentation Brain Canon
Version: v1.0
Last Reviewed: 2026-05-07

Purpose

The Statistical Power Framework defines how MWMS governs the probability that an experiment detects a meaningful effect when a true improvement exists.

This framework ensures MWMS understands that weak experimentation systems may fail not because improvements are absent, but because:

evidence volume is insufficient
variance is excessive
measurement is unstable
detectable effects are too small
experimentation environments are underpowered

The framework governs how MWMS builds sufficiently sensitive experimentation systems while balancing operational cost and decision reliability.

Core Principle

An experiment cannot reliably detect what it does not have enough power to observe.

Definition

Statistical power is the probability that an experiment correctly detects a meaningful effect when that effect genuinely exists. Formally, power equals 1 − β, where β is the false negative (Type II error) rate.

Structural Role

This framework connects:

Experimentation Brain
→ experimentation sensitivity governance

Data Brain
→ variance and evidence reliability systems

Affiliate Brain
→ offer validation systems

Ads Brain
→ creative and campaign testing governance

Conversion Brain
→ optimization detection reliability

Finance Brain
→ resource allocation and experimentation efficiency

Research Brain
→ interpretation discipline systems

HeadOffice
→ governance and operational oversight

Power Reality

Many failed tests are actually:

underpowered tests
low-evidence environments
variance-heavy systems
poorly structured experiments

rather than true negative outcomes.

Rule

Failure to detect an improvement is not proof that no improvement exists.

Core Components Of Statistical Power

Statistical power depends on:

sample size
effect size
variance level
measurement quality
significance threshold

Rule

Power reflects overall experimentation sensitivity.
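
A minimal sketch of how these components combine, assuming a two-sided two-proportion z-test under the normal approximation. The function name, its parameters, and the 5% and 6% conversion rates are illustrative assumptions, not part of this framework. Measurement quality has no term of its own here; it typically shows up as added variance or a diluted effect.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation with unpooled variance; all names are hypothetical.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # significance threshold
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    std_err = sqrt(variance / n_per_arm)             # variance and sample size
    shift = abs(p_variant - p_control) / std_err     # effect size in SE units
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Same 5% -> 6% lift at two evidence volumes (illustrative numbers).
low_volume = two_proportion_power(0.05, 0.06, n_per_arm=1_000)
high_volume = two_proportion_power(0.05, 0.06, n_per_arm=8_155)
```

Under these assumptions the larger test reaches roughly 80% power, while the smaller one misses the same lift most of the time.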

Sample Size Layer

Larger evidence volume improves detection capability.

Examples

more impressions
more clicks
more conversions
longer observation periods

Rule

Small samples reduce detection reliability.

Effect Size Layer

Larger improvements are easier to detect reliably.

Examples

Easy to detect:

major conversion improvements

Hard to detect:

tiny optimization lifts

Rule

Small effects require stronger experimentation sensitivity.
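
The relationship between effect size and required evidence can be sketched as follows, assuming a two-sided two-proportion z-test, a 5% significance threshold, and 80% target power. The function name and the example conversion rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided two-proportion z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    # Required n grows with the inverse square of the absolute lift.
    return ceil((z_alpha + z_power) ** 2 * variance / (p_variant - p_control) ** 2)


major_lift = required_n_per_arm(0.05, 0.10)     # 5% -> 10%
tiny_lift = required_n_per_arm(0.05, 0.055)     # 5% -> 5.5%
```

Because the lift enters as an inverse square, shrinking the target lift by a factor of ten raises the required sample by far more than ten.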

Variance Layer

High variance weakens detection reliability.

Examples

fluctuating ROAS
unstable traffic quality
inconsistent conversion behavior

Rule

Noise reduces statistical power.
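
As a rough illustration of the cost of noise, the sketch below estimates the per-arm sample needed to detect a fixed shift in a mean (for example, a per-visitor revenue metric) at two noise levels. It assumes equal variance in both arms and a two-sided z-test; the function name and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_for_mean_shift(std_dev, min_shift, alpha=0.05, power=0.80):
    """Approximate per-arm sample size to detect a shift in a mean.

    Assumes equal variance in both arms and a two-sided z-test.
    """
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return ceil(2 * (z_total * std_dev / min_shift) ** 2)


stable = required_n_for_mean_shift(std_dev=1.0, min_shift=0.2)
noisy = required_n_for_mean_shift(std_dev=2.0, min_shift=0.2)
```

Doubling the standard deviation roughly quadruples the sample needed for the same detection reliability.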

Measurement Integrity Layer

Reliable tracking improves experimentation sensitivity.

Examples

accurate attribution
stable event collection
consistent conversion recording

Rule

Weak measurement systems reduce detection quality.

Threshold Layer

Stricter significance thresholds reduce power at a fixed sample size, so holding power constant requires more evidence.

Examples

High confidence requirements:

larger sample needs

Lower confidence requirements:

reduced evidence burden

Rule

Confidence rigor influences required sensitivity.
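
One way to see the evidence burden of a stricter threshold is to compare the term that scales required sample size, (z_alpha + z_power) squared, across significance levels. This is a sketch assuming a two-sided z-test at 80% power; the function name and the choice of alpha values are illustrative.

```python
from statistics import NormalDist


def evidence_factor(alpha, power=0.80):
    """Return (z_{1-alpha/2} + z_{power})^2, which scales required sample size."""
    std_normal = NormalDist()
    return (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) ** 2


baseline = evidence_factor(alpha=0.05)
# Sample size needed at each threshold, relative to the alpha = 0.05 case.
relative_need = {
    alpha: evidence_factor(alpha) / baseline
    for alpha in (0.10, 0.05, 0.01)
}
```

Under these assumptions, moving from a 5% to a 1% threshold needs about half again as much evidence, while relaxing to 10% needs about a fifth less.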

Underpowered Experiment Layer

Underpowered systems frequently produce:

inconclusive results
false negatives
unstable interpretation
weak optimization reliability

Rule

Low power weakens learning efficiency.
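
The false negative problem can be illustrated with a small simulation. The sketch below repeatedly runs an underpowered test on a lift that genuinely exists and counts how often the test fails to reach significance. The conversion rates, sample size, trial count, and seed are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_miss_rate(p_control, p_variant, n_per_arm, trials=300,
                       alpha=0.05, seed=7):
    """Fraction of simulated tests that are not significant despite a real lift."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(trials):
        conv_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        conv_v = sum(rng.random() < p_variant for _ in range(n_per_arm))
        pooled = (conv_c + conv_v) / (2 * n_per_arm)
        std_err = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        # With zero pooled variance there is no evidence either way.
        z_stat = 0.0 if std_err == 0 else (conv_v - conv_c) / n_per_arm / std_err
        if abs(z_stat) < z_crit:
            misses += 1
    return misses / trials


# A true 5% -> 6% lift tested with only 500 visitors per arm.
miss_rate = simulate_miss_rate(0.05, 0.06, n_per_arm=500)
```

In this setup the large majority of simulated tests come back non-significant even though the improvement is real in every one of them.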

Resource Efficiency Layer

Higher power requires greater:

traffic
time
budget
operational patience

Rule

Higher power increases operational cost.

Exploratory Testing Layer

Exploratory environments may tolerate lower power conditions.

Examples

creative ideation
directional learning
early market exploration

Rule

Exploration and scaling require different sensitivity standards.

Scaling Validation Layer

Scaling decisions require stronger statistical power than exploratory testing.

Examples

major budget increases
infrastructure dependency
automation rollout
market expansion

Rule

Scaling requires stronger detection confidence.

Minimum Detectable Effect Layer

The minimum detectable effect (MDE) is the smallest true effect an experiment can reliably detect at a given sample size, significance threshold, and target power.

Examples

detecting small profitability lifts requires stronger sensitivity
detecting large conversion shifts requires less sensitivity

Rule

Smaller targets require more evidence strength.
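
A minimal sketch of the inverse calculation: given the traffic available, what is the smallest absolute lift the test can detect? It assumes a two-sided two-proportion z-test and, as a simplification, that both arms share the baseline variance. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(p_baseline, n_per_arm, alpha=0.05, power=0.80):
    """Approximate absolute MDE for a two-sided two-proportion z-test.

    Simplification: both arms are assumed to share the baseline variance.
    """
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return z_total * sqrt(2 * p_baseline * (1 - p_baseline) / n_per_arm)


mde_small_test = minimum_detectable_effect(0.05, n_per_arm=2_000)
mde_large_test = minimum_detectable_effect(0.05, n_per_arm=32_000)
```

The MDE shrinks with the square root of sample size, so sixteen times the traffic buys a detectable effect four times smaller.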

Concurrent Experimentation Layer

Multiple simultaneous tests split the available traffic, so each variant receives less evidence.

Examples

excessive creative variants
overlapping audience tests
fragmented traffic allocation

Rule

Fragmentation weakens experimentation power.
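
The effect of fragmentation can be sketched by holding total traffic fixed and varying the number of variants. The example assumes an even traffic split across control and variants, a two-sided two-proportion z-test, and a Bonferroni correction across comparisons. The correction is an assumption of this sketch; the framework does not prescribe one. All names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_per_comparison(total_visitors, n_variants, p_control, p_variant,
                         alpha=0.05):
    """Approximate power of each variant-vs-control comparison.

    Traffic is split evenly across control plus n_variants arms, and alpha
    is Bonferroni-adjusted across the n_variants comparisons (an assumption).
    """
    std_normal = NormalDist()
    n_per_arm = total_visitors // (n_variants + 1)
    z_crit = std_normal.inv_cdf(1 - (alpha / n_variants) / 2)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    shift = abs(p_variant - p_control) / sqrt(variance / n_per_arm)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# 20,000 visitors in total, looking for a 5% -> 6% lift.
one_variant = power_per_comparison(20_000, 1, 0.05, 0.06)
five_variants = power_per_comparison(20_000, 5, 0.05, 0.06)
```

With the same total traffic, the single-variant test is well powered while each comparison in the five-variant test would miss the same lift most of the time.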

Time Horizon Layer

Longer observation periods may improve detection reliability.

Examples

seasonal stabilization
delayed conversion tracking
repeat purchase evaluation

Rule

Some effects require extended observation windows.
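
A planning sketch for translating a sample-size target into an observation window. Rounding up to whole weeks and adding a fixed conversion lag are assumptions of this example, as are the function name and the traffic figures.

```python
from math import ceil


def planned_duration_days(n_per_arm, n_arms, daily_visitors,
                          conversion_lag_days=0):
    """Days needed to reach the target sample, rounded up to whole weeks.

    Whole weeks keep every weekday equally represented; conversion_lag_days
    is extra time for delayed conversions to be recorded.
    """
    collection_days = ceil(n_per_arm * n_arms / daily_visitors)
    full_weeks = ceil(collection_days / 7)
    return full_weeks * 7 + conversion_lag_days


duration = planned_duration_days(8_155, n_arms=2, daily_visitors=1_500,
                                 conversion_lag_days=3)
```

Here 11 days of raw collection become a 17-day window once the test is extended to two full weeks and three days are allowed for late conversions.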

AI Governance Layer

AI Employees should:

identify underpowered conditions
classify detection limitations
flag weak evidence environments
recommend evidence expansion when required

Rule

AI systems must remain sensitivity-aware.

Reporting Layer

Experiment reports should communicate:

evidence sufficiency
variance conditions
power limitations
detectable effect assumptions
confidence implications

Rule

Sensitivity limitations should remain operationally visible.

Decision Governance Layer

Weakly powered experiments may require:

extended testing
additional traffic
reduced confidence claims
broader validation

Rule

Weak sensitivity should slow irreversible decisions.
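
One possible shape for such a gate is sketched below. The minimum power values for exploratory and scaling decisions, the class and function names, and the returned action strings are all hypothetical; the framework states only that scaling requires stronger sensitivity than exploration.

```python
from dataclasses import dataclass

# Hypothetical minimum power per decision type; not defined by the framework.
MIN_POWER = {"exploratory": 0.50, "scaling": 0.80}


@dataclass
class PowerAssessment:
    decision_type: str      # "exploratory" or "scaling"
    estimated_power: float  # from a prior power calculation
    significant: bool       # whether the test crossed its threshold


def recommend(assessment: PowerAssessment) -> str:
    """Map an experiment's power and outcome to a governance action."""
    adequately_powered = (
        assessment.estimated_power >= MIN_POWER[assessment.decision_type]
    )
    if assessment.significant and adequately_powered:
        return "proceed"
    if assessment.significant:
        # A win from a weakly powered test warrants caution before committing.
        return "reduce confidence claims and validate further"
    if not adequately_powered:
        return "inconclusive: extend testing or add traffic"
    return "no effect detected at the planned sensitivity"


action = recommend(PowerAssessment("scaling", estimated_power=0.35,
                                   significant=False))
```

The key design choice is that a non-significant result from an underpowered test is reported as inconclusive rather than as a negative outcome.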

Measurement Layer

MWMS should monitor:

evidence sufficiency
variance exposure
detectable effect capability
false negative frequency
confidence stability
experimentation efficiency

Rule

Experimentation sensitivity must remain measurable.

Cross Brain Integration

Experimentation Brain
→ owns statistical power governance

Data Brain
→ governs variance and measurement reliability

Affiliate Brain
→ validates offer evidence sufficiency

Ads Brain
→ governs creative testing sensitivity

Conversion Brain
→ evaluates optimization detectability

Finance Brain
→ governs experimentation efficiency and resource allocation

Research Brain
→ governs interpretation discipline

HeadOffice
→ governance and oversight

Failure Modes Prevented

This framework prevents:

underpowered experimentation
false negative interpretation
weak learning systems
unstable optimization decisions
fragmented evidence environments
unreliable scaling governance

Drift Protection

The system must prevent:

low-evidence scaling
ignoring variance exposure
weak sensitivity environments
fragmented traffic allocation
false certainty from weak detection systems
AI overconfidence in underpowered environments

Architectural Intent

This framework transforms MWMS experimentation thinking from:

→ surface-level testing systems

into:

→ governed evidence sensitivity systems

It ensures MWMS develops:

scalable experimentation reliability
sensitivity-aware optimization systems
evidence-efficient testing architectures
disciplined detection governance
long-term learning stability

Final Rule

If experimentation systems lack sufficient power:

→ meaningful improvements may remain invisible.

Change Log

Version: v1.0

Date: 2026-05-07
Author: HeadOffice

Change:
Created Statistical Power Framework defining experimentation sensitivity governance, evidence sufficiency systems, variance-aware detection logic, and scalable learning reliability architecture.

Change Impact Declaration

Pages Created:
Experimentation Brain Statistical Power Framework

Pages Updated:
None

Pages Deprecated:
None

Registries Requiring Update:
MWMS Architecture Registry
Experimentation Brain Page Registry

Canon Version Update Required:
No

Change Log Entry Required:
Yes

END EXPERIMENTATION BRAIN STATISTICAL POWER FRAMEWORK v1.0