MWMS Prompt Architecture And Automation Output Reliability Framework

System: MWMS
Document Type: Operating Framework
Authority Level: MCR Source Of Truth
Status: Draft For MCR
Version: v1.0
Primary Location: MCR
Future Operational Destination: Prompting Framework, HeadOffice Brain, Automation Brain, AIBS Brain, Content Brain, Ads Brain, Research Brain, Data Brain, Experimentation Brain, AI Employee Canon, Compliance Brain, Risk Brain
Parent Page: Prompting Framework
Owner: Martyn
Developer Boundary: Do Not Touch M’s Active Build Areas Unless Specifically Assigned
Source Of Truth: MCR
Last Reviewed: 2026-06-08
Source / Origin: AI Automations by Jack Master Prompting And Prompt System Design Block / Master Prompting w Devin Part 1 / Master Prompting w Devin Part 2
MWMS Classification: Prompt Architecture Framework / Automation Output Reliability Standard / AI Employee Prompt Governance / Prompt Chain Design System / Prompt Quality Control Framework
Primary Brain: MWMS Prompting Framework
Supporting Brains: HeadOffice Brain, Automation Brain, AIBS Brain, Content Brain, Ads Brain, Research Brain, Data Brain, Experimentation Brain, AI Employee Canon, Compliance Brain, Risk Brain

Related Pages: MWMS Prompting Framework, MWMS AI Employee Evaluation Scorecard Standard, MWMS AI Observability Metadata Standard, MWMS AI Work Session Persistence Standard, MWMS Agent Loop Control Framework, MWMS Next Action Picker Standard, MWMS AI Usage And Cost Visibility Standard, MWMS Source Visibility And Evidence Display Standard, MWMS Buyer First Authority Content And Channel Growth Framework, MWMS AIBS Business Diagnostic And Opportunity Discovery Framework, MWMS AIOS Lead Capture And Conversion Infrastructure Framework, HeadOffice Kaizen Continuous Improvement Loop

Source Evidence: This framework is derived from the Master Prompting w Devin Part 1 and Part 2 training inside AI Automations by Jack. The lessons covered conversational prompting versus one-shot prompting, atomic and compound prompts, prompt deconstruction, prompt stacking, tell-and-show examples, the import method, chain-of-thought/planning methods, anti-keyword staining, prompt chaining, model selection, output quality testing, iteration, and prompt engineering as a system-building skill for reliable AI automations.

Purpose

The purpose of the MWMS Prompt Architecture And Automation Output Reliability Framework is to define how MWMS designs, tests, improves, and governs prompts used inside AI Employees, automations, content systems, research systems, client diagnostics, and operational workflows.

This framework exists because prompts inside automations are not casual chat messages.

A casual prompt may work once.

An automation prompt must work repeatedly.

A casual prompt can be adjusted manually.

An automation prompt must behave predictably without constant human correction.

A casual prompt can be exploratory.

An automation prompt must be structured, tested, versioned, and reliable.

MWMS must therefore treat prompts as system assets.

The core purpose is:

To make every important MWMS prompt reusable, testable, observable, reliable, cost-aware, and suitable for automation.

Core Doctrine

The MWMS doctrine is:

A prompt used inside an automation is not a conversation. It is system architecture.

MWMS should not rely on vague prompts such as:

“Write this better.”
“Summarize this.”
“Make a good post.”
“Analyze this file.”
“Create a script.”
“Find the best ideas.”
“Give me a report.”
“Act like an expert.”
“Make it sound professional.”

Those may work in a chat, but they are weak inside repeatable systems.

For automations, MWMS needs prompts that define:

role
task
context
input variables
output format
quality standards
examples
constraints
failure handling
model choice
cost boundaries
expected structure
evaluation method
version history

The prompt must be designed to reduce guessing.

The less the AI has to guess, the more reliable the automation becomes.

Strategic Importance

This framework is strategically important because MWMS is building a system of Brains and AI Employees.

Those AI Employees will depend on prompts.

If prompts are weak, the AI Employees will be weak.

If prompts are inconsistent, the outputs will be inconsistent.

If prompts are not tested, errors will enter the system.

If prompt chains are badly designed, automations will become unreliable.

If prompts are too expensive, scaling becomes costly.

If prompts are too vague, M and Martyn will waste time correcting outputs.

Prompt quality affects:

AI Employee performance
content output quality
automation reliability
research quality
client diagnostic quality
AIBS recommendations
ad creative quality
newsletter intelligence extraction
course absorption quality
data classification
report generation
sales and outreach workflows
client-facing deliverables
system trust

This framework therefore becomes a core infrastructure layer for the whole MWMS ecosystem.

The strategic lesson is:

Prompt quality is system quality.

Definition

Prompt architecture is the structured design of a prompt or prompt chain so it reliably performs a defined task inside a repeatable workflow.

Prompt asset is a tested, reusable, documented, versioned prompt that improves workflow performance.

Prompt liability is an informal or poorly structured prompt that requires constant rewriting, manual correction, or inconsistent human interpretation.

Atomic prompt is a small prompt designed for a narrow task such as formatting, classification, extraction, sentiment tagging, or simple rewriting.

Compound prompt is a larger structured prompt that combines multiple guidelines, context sections, examples, formatting instructions, and task logic to produce a more nuanced output.

Prompt chain is a sequence where the output of one prompt becomes the input or context for another prompt.

MWMS Definition

The MWMS Prompt Architecture And Automation Output Reliability Framework is:

Prompting Framework’s standard for designing, testing, chaining, versioning, and governing prompts so MWMS AI Employees and automations produce consistent, high-quality, cost-aware, and reliable outputs.

Scope

This framework applies to:

AI Employee prompts
automation prompts
Make.com prompts
n8n prompts
OpenAI API prompts
Claude prompts
Gemini prompts
content generation prompts
research prompts
classification prompts
extraction prompts
newsletter analysis prompts
course absorption prompts
sales prompts
cold email prompts
LinkedIn prompts
ad creative prompts
YouTube script prompts
landing page prompts
AIBS diagnostic prompts
client report prompts
dashboard insight prompts
data cleaning prompts
prompt chains
model selection
prompt cost control
prompt testing
prompt versioning
prompt observability
future Prompt Vault systems

This framework applies whenever MWMS creates a prompt that may be reused or automated.

Core Principle

The core principle is:

Build prompts like reusable systems, not disposable messages.

A prompt should not be considered complete just because it produces one good output.

It should be considered complete only when it can repeatedly produce the right output under realistic input variation.

Rule

A prompt is not reliable until it has been tested against multiple realistic inputs.

The MWMS Prompt Architecture And Automation Output Reliability Model

Every important prompt system should be designed across twelve layers:

Prompt Purpose Layer
Prompt Type Layer
Input And Variable Layer
Context And Knowledge Layer
Guideline And Constraint Layer
Example And Tell And Show Layer
Deconstruction And Chain Layer
Output Format Layer
Model Selection Layer
Testing And Iteration Layer
Cost Latency And Scale Layer
Observability And Governance Layer

1. Prompt Purpose Layer

Every prompt must have a clear job.

A prompt should not exist because “AI can do it.”

It should exist because MWMS needs a specific task performed.

Prompt Purpose Questions

Ask:

What is this prompt supposed to do?
What workflow does it support?
Which Brain or AI Employee uses it?
What business outcome does it support?
What input will it receive?
What output must it produce?
Who or what consumes the output?
What happens if the output is wrong?
How often will this prompt run?
Is this exploratory or production-grade?
Is this prompt temporary or reusable?
Does this need human review?

Prompt Purpose Examples

A prompt may exist to:

classify a lead
extract course insights
summarize a newsletter
score an offer
write a YouTube hook
analyze a competitor page
generate a sales email
create an AIBS diagnostic report section
turn transcript content into a content brief
identify compliance risk
route a task to a Brain
extract structured data from a messy file
create a buyer question map
generate a client opportunity score

Rule

If the prompt purpose is vague, the output will be vague.

2. Prompt Type Layer

MWMS must choose the right prompt type for the task.

Not every task needs a large prompt.

Not every task should be broken into many prompts.

Prompt Type 1: Conversational Prompt

Used for:

exploration
brainstorming
early thinking
clarification
manual coaching
one-off analysis
interactive development

Weakness:

inconsistent
hard to automate
depends on human steering
poor as a reusable asset

Prompt Type 2: One Shot Prompt

Used for:

repeatable automations
standard tasks
structured outputs
system workflows
AI Employee operations
API calls

Strength:

reusable
testable
can be versioned
supports automation

Prompt Type 3: Atomic Prompt

Used for:

formatting
classification
tagging
small extraction
simple transformation
routing
binary decisions
sentiment detection

Examples:

“Classify this lead as qualified, unqualified, or needs review.”
“Extract the company name.”
“Add line breaks for mobile readability.”
“Return only JSON.”

Prompt Type 4: Compound Prompt

Used for:

complex writing
diagnostic reports
content generation
sales page analysis
structured research
multi-criteria reasoning
nuanced output control

Examples:

course absorption framework prompt
AIBS diagnostic report prompt
authority content generation prompt
LinkedIn Sales Navigator query parser
ad creative analysis prompt

Rule

Use the smallest prompt that reliably performs the task, and no smaller.

3. Input And Variable Layer

A reliable prompt separates fixed instructions from dynamic inputs.

Fixed instructions should define the task.

Dynamic inputs should contain the changing data.

Common Dynamic Variables

Variables may include:

source text
transcript
buyer avatar
product name
offer details
client name
industry
target market
hook
outline
prior output
examples
tone
content type
platform
desired format
data record
CRM fields
campaign name
previous result
task metadata

Variable Design Questions

Ask:

What changes each run?
What stays the same?
Which variables are required?
Which variables are optional?
What happens if a variable is missing?
Does the model know where the input starts and ends?
Does the prompt separate instructions from data?
Can this be safely used in an automation?
Is the variable name clear to a developer or future AI Employee?

Variable Rule

Dynamic input must be clearly separated from prompt instructions.
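A minimal Python sketch of the Variable Rule, assuming a hypothetical lead-classification prompt: the fixed instructions live in one template, per-run data is substituted between explicit tags, a missing required variable fails loudly, and an optional variable falls back to a stated default. The template wording, the variable names, and the <lead> tags are illustrative assumptions, not an MWMS standard.

```python
from string import Template

# Hypothetical fixed instructions; only the $-variables change per run.
# The <lead> tags tell the model exactly where the data starts and ends.
INSTRUCTIONS = Template(
    "Classify the lead described between the <lead> tags as "
    "qualified, unqualified, or needs_review.\n"
    "Industry focus: $industry\n"
    "<lead>\n$lead_text\n</lead>"
)

REQUIRED = {"lead_text"}
OPTIONAL_DEFAULTS = {"industry": "not specified"}


def build_prompt(variables: dict) -> str:
    """Fill the fixed template with per-run data, failing loudly on gaps."""
    missing = REQUIRED - variables.keys()
    if missing:
        raise ValueError(f"missing required variables: {sorted(missing)}")
    values = {**OPTIONAL_DEFAULTS, **variables}
    # Values are inserted literally; substitute() raises KeyError if the
    # template references a name that was never supplied.
    return INSTRUCTIONS.substitute(values)
```

Raising on a missing required variable is a deliberate choice: an automation that silently sends an empty input may still get a confident-looking answer back. A production version would also need to reject or escape input that contains the closing tag itself.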

4. Context And Knowledge Layer

AI needs context to reduce guessing.

The model is a pattern recognition and prediction system.

If MWMS does not provide the right patterns, the model will use generic patterns from training data.

That can create weak, generic, or misleading outputs.

Context Types

Provide context such as:

business context
buyer context
offer context
Brain context
task context
source context
industry context
platform context
audience context
prior decisions
examples
definitions
frameworks
constraints
known risks

Import Method

The import method means bringing specialist knowledge into the prompt.

This may come from:

course material
internal SOPs
expert interviews
past winning examples
client documents
platform rules
research notes
sales call notes
content swipe files
proven frameworks
product documentation
market research
MWMS Canon pages
MCR pages

Import Method Rule

When public model knowledge is too generic, MWMS must import specialist knowledge.

5. Guideline And Constraint Layer

Prompts need clear rules.

Guidelines tell the AI what to do.

Constraints tell the AI what not to do.

Older models frequently struggled with negative instructions, but stronger modern models can often follow both positive and negative rules.

MWMS should still prefer clear positive instructions and use negative constraints where necessary.

Guideline Types

Use:

style guidelines
formatting guidelines
reasoning guidelines
evidence guidelines
output guidelines
tone guidelines
audience guidelines
compliance guidelines
exclusion guidelines
quality guidelines
workflow guidelines

Constraint Examples

Do not:

invent facts
include unsupported claims
use hype
add unverified statistics
mention banned product details
create legal or medical certainty
output outside the requested format
include irrelevant commentary
use platform-risk wording
expose private data
change the title format

Rule

Guidelines and constraints should remove ambiguity before the model creates the output.

6. Example And Tell And Show Layer

The tell and show method is one of the most important prompt quality controls.

Do not only tell the AI what to do.

Show it what good looks like.

Tell And Show Structure

Use:

Explain the rule.
Show a good example.
Show a bad example if useful.
Explain why the good example is better.
Ask the model to follow the pattern.

Example Types

Examples may include:

ideal output
bad output
before/after rewrite
correct JSON format
desired paragraph style
classification examples
hook examples
CTA examples
email examples
report sections
analysis examples
tone examples
formatting examples

Example Quality Questions

Ask:

Does this example reflect the output we actually want?
Is the example current?
Is the example relevant to this task?
Does the example include the desired structure?
Does it show the correct tone?
Does it show what not to do?
Is the example too generic?
Is the example legally or compliance safe?

Rule

When output style or structure matters, include examples.

7. Deconstruction And Chain Layer

Complex tasks should often be broken into smaller prompts.

This is the deconstruction method.

Instead of asking AI to complete a complex task in one pass, MWMS should split the task into staged outputs.

Deconstruction Examples

For a YouTube script:

Analyze source material.
Extract buyer pain.
Generate hook options.
Select strongest hook.
Create outline.
Write opening.
Write body sections.
Write CTA.
Review compliance.
Final polish.

For a course absorption block:

Identify source themes.
Extract valuable frameworks.
Compare against MWMS existing knowledge.
Decide absorb / merge / park / ignore.
Generate page candidates.
Draft full page.
Create registry entry.
Park deferred updates.

For an AIBS diagnostic:

Read intake.
Identify business context.
Map leakage categories.
Score opportunities.
Assess AI readiness.
Recommend first project.
Draft diagnostic report.
Draft proposal path.

Prompt Chaining

Prompt chaining means the output of one prompt becomes the input to another.

Use chaining when:

quality improves through stages
the task needs deep focus
the output is too complex for one prompt
each stage needs separate evaluation
different models may suit different stages
human approval is needed between steps

Rule

Break complex tasks into prompt chains when one prompt cannot reliably produce high-quality output.
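The chaining idea can be sketched in a few lines of Python. This is an illustration under assumptions: each step is modelled as a plain function from text to text, and the three step functions are hypothetical stand-ins for the first stages of the YouTube script example, where each would really wrap one model call. Keeping every intermediate output is what makes separate evaluation and human approval between steps possible.

```python
from typing import Callable, Dict, List, Tuple

# A step takes the previous output and returns new text. In production each
# step would wrap one model call; here they are hypothetical stubs.
Step = Callable[[str], str]


def run_chain(source: str, steps: List[Tuple[str, Step]]) -> Dict[str, str]:
    """Run steps in order, feeding each output into the next step.

    Returns every intermediate output keyed by step name so each stage
    can be evaluated (or reviewed by a human) on its own.
    """
    outputs: Dict[str, str] = {}
    current = source
    for name, step in steps:
        current = step(current)
        outputs[name] = current
    return outputs


def extract_pain(text: str) -> str:
    # Stand-in for "Extract buyer pain": keep the first sentence.
    return f"PAIN: {text.split('.')[0]}"


def generate_hooks(pain: str) -> str:
    # Stand-in for "Generate hook options".
    return f"HOOKS for [{pain}]"


def pick_hook(hooks: str) -> str:
    # Stand-in for "Select strongest hook".
    return hooks.replace("HOOKS", "SELECTED HOOK", 1)
```

Usage: `run_chain("Leads go cold. Follow-up is manual.", [("pain", extract_pain), ("hooks", generate_hooks), ("hook", pick_hook)])` returns all three stage outputs, so a failing stage can be located without rerunning the whole chain.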

8. Output Format Layer

Automation prompts need predictable output.

The output format must be clearly defined.

If the next system expects JSON, the prompt must output JSON.

If the next system expects a report, the prompt must output the right report structure.

If the next system expects classification, the prompt must output only the allowed labels.

Output Format Types

Use:

plain text
markdown
JSON
table
bullet list
scored result
label only
sectioned report
summary block
email format
script format
page format
CSV-like structure
WordPress-ready page output

Output Format Questions

Ask:

Who or what uses this output next?
Does the output need to be parsed by software?
Does it need to be copied into WordPress?
Does it need to be read by a human?
Does it need exact headings?
Does it need a fixed schema?
Does it need to avoid extra commentary?
Does it need error handling?
Does it need a confidence field?
Does it need source references?

Rule

The prompt must define the output format as tightly as the workflow requires.
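When the next system parses JSON, the output format should be checked in code as well as requested in the prompt. The sketch below assumes a hypothetical lead-scoring step whose reply must be a single JSON object with exactly three fields; the field names and types are illustrative. It uses only the standard library `json` module and rejects anything else instead of passing it downstream.

```python
import json

# Hypothetical schema for a lead-scoring step: field name -> allowed type(s).
EXPECTED_FIELDS = {"label": str, "confidence": (int, float), "reason": str}


def parse_model_json(raw: str) -> dict:
    """Parse a model reply that must be exactly one JSON object.

    Raises ValueError instead of passing malformed output downstream.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Catches replies wrapped in commentary such as "Sure! Here is...".
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"wrong type for {name}")
    extra = set(data) - set(EXPECTED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return data
```

A ValueError here maps directly onto the "format drift" and "invalid JSON" failure modes, so it can be logged and counted rather than discovered later in a broken downstream step.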

9. Model Selection Layer

Different models perform differently on different tasks.

MWMS should not assume the newest or most expensive model is always best.

Some tasks need the strongest reasoning model.

Some tasks need fast low-cost classification.

Some tasks need long context.

Some tasks need writing quality.

Some tasks need strict formatting.

Some tasks need low latency.

Model Selection Criteria

Choose based on:

task complexity
context length
output length
instruction-following
formatting reliability
cost
latency
creativity needed
reasoning needed
classification consistency
language support
privacy requirements
tool compatibility
API availability

Model Testing Questions

Ask:

Which model gives the most consistent output?
Which model follows format best?
Which model handles the context length?
Which model is affordable at scale?
Which model is fast enough?
Which model fails least often?
Which model handles the language best?
Which model works best for this specific prompt?

Rule

Model choice must be tested against the use case, not assumed.

10. Testing And Iteration Layer

Prompt engineering is iterative.

A prompt should be improved through testing, not guesswork.

Prompt Testing Process

Use:

Define the expected output.
Create test inputs.
Run the prompt.
Review the output.
Identify failure patterns.
Adjust prompt structure.
Add examples where needed.
Adjust constraints.
Test another model if needed.
Repeat until reliable enough.

Prompt Testing Inputs

Test against:

ideal input
messy input
short input
long input
ambiguous input
missing data
conflicting data
edge cases
high-risk examples
real production samples
past failure examples

Scientific Method Standard

Prompt improvement should follow:

hypothesis
test
observation
adjustment
retest
record

Rule

Do not deploy an important automation prompt after one successful test.
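A small harness makes the testing process concrete: run one prompt over a fixed set of labelled test inputs and record which cases fail. In this sketch the test inputs, the allowed labels, and `fake_classifier` (a stand-in for a real model call) are all hypothetical; the stub deliberately fails on missing data so the report shows what a failure pattern looks like.

```python
from typing import Callable, List, Tuple

# Hypothetical test set drawn from the input types listed above.
TEST_CASES: List[Tuple[str, str]] = [
    ("ideal", "Owner of a 12-person agency, budget approved, wants a call."),
    ("messy", "hi   pls call?? budget maybe idk"),
    ("short", "call me"),
    ("missing data", ""),
]

ALLOWED = {"qualified", "unqualified", "needs_review"}


def evaluate(prompt_fn: Callable[[str], str], cases, check) -> dict:
    """Run one prompt over every test input and report which cases failed."""
    failures = [name for name, text in cases if not check(prompt_fn(text))]
    return {
        "passed": len(cases) - len(failures),
        "total": len(cases),
        "failures": failures,
    }


def fake_classifier(text: str) -> str:
    # Stand-in for a model call; deliberately breaks format on empty input.
    if not text:
        return "I cannot classify an empty lead."
    return "qualified" if "budget approved" in text else "needs_review"
```

Here `evaluate(fake_classifier, TEST_CASES, lambda out: out in ALLOWED)` passes three of four cases and names "missing data" as the failure, which is the observation that the next adjustment and retest should target.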

11. Cost Latency And Scale Layer

Prompt quality must be balanced against cost and speed.

A prompt that works well once may become too expensive at scale.

A prompt chain that produces excellent output may be too slow for a real-time workflow.

MWMS must decide the right balance.

Cost Factors

Costs may increase with:

long prompts
large examples
long context
chain-of-thought/planning outputs
multiple prompt steps
expensive models
repeated context in each step
large output length
retries
failed outputs

Latency Factors

Latency may increase with:

large context
multi-step chains
slow models
long output
tool calls
external API calls
validation steps
human approval stages

Quality Versus Cost Questions

Ask:

How often will this prompt run?
How much does each run cost?
What does a failed output cost?
Is this output client-facing?
Is this output revenue-related?
Is this output high-risk?
Can a cheaper model do the task?
Can context be reduced?
Can prompts be stacked safely?
Should prompts be deconstructed for quality?
Is speed more important than depth?

Rule

High-value outputs can justify higher prompt cost. Low-value repetitive outputs need cost discipline.
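The quality-versus-cost questions are easier to answer with a rough estimate. The sketch below assumes per-token pricing quoted per million input and output tokens, which is a common pricing shape but not a specific vendor's rates; every number in the usage example is made up for illustration. Retries are modelled as a simple multiplier, reflecting the point above that failed outputs also cost money.

```python
def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    runs_per_month: int,
    price_in_per_million: float,
    price_out_per_million: float,
    retry_rate: float = 0.0,
) -> float:
    """Estimate monthly spend for one prompt step.

    Prices are per million tokens; retry_rate is the fraction of runs
    that must be repeated because the output failed validation.
    """
    per_run = (
        input_tokens * price_in_per_million
        + output_tokens * price_out_per_million
    ) / 1_000_000
    return per_run * runs_per_month * (1 + retry_rate)
```

With 2,000 input tokens, 500 output tokens, and 10,000 runs a month, hypothetical prices of 3.0 and 15.0 per million give about 135 per month, and a 10% retry rate raises that to about 148.5. The same volume at hypothetical prices of 0.15 and 0.60 comes to about 6, which is the kind of gap that justifies asking whether a cheaper model can do the task.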

12. Observability And Governance Layer

MWMS must track prompt performance.

A prompt hidden inside an automation should not become invisible.

Important prompts need metadata, logging, versioning, and review.

Prompt Metadata Fields

Track:

Prompt Name:
Prompt Version:
Brain / Employee:
Workflow:
Prompt Type:
Model Used:
Input Variables:
Output Format:
Test Status:
Average Cost:
Average Latency:
Failure Modes:
Last Reviewed:
Owner:
Change Notes:

Observability Questions

Ask:

Which prompt generated this output?
Which version was used?
Which model was used?
What input was passed?
How much did it cost?
How long did it take?
Did it pass validation?
Did it fail formatting?
Was human review required?
Was the output accepted or corrected?
What changed since the last version?

Rule

A production prompt should be traceable.
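One way to make every run traceable is to write a small structured record per execution that answers the observability questions above. The record shape below is a sketch under assumptions: the field names are hypothetical, and `input_ref` stores a pointer to where the input is kept rather than the raw input, to avoid copying private data into logs.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PromptRunRecord:
    """One logged prompt execution (field names are illustrative)."""
    prompt_name: str
    prompt_version: str
    model: str
    input_ref: str              # pointer to the stored input, not the raw data
    cost_usd: float
    latency_ms: int
    passed_validation: bool
    human_reviewed: bool = False
    accepted: Optional[bool] = None   # None until a reviewer decides
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        """Serialise the record for whatever log store is in use."""
        return json.dumps(asdict(self))
```

Because the version and model travel with every output, "which version was used?" and "what changed since the last version?" become queries over the log rather than guesswork.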

Prompt Asset Standard

A prompt becomes an MWMS prompt asset only when it has:

clear purpose
defined owner
defined Brain or Employee
stable prompt text
input variables
output format
quality criteria
examples where needed
test inputs
model selection notes
version number
cost/latency awareness
failure handling
review date

Rule

Prompt assets should be stored, versioned, and reused.
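The Prompt Asset Standard can be checked mechanically before a prompt is promoted. In this sketch the snake_case field names are hypothetical renderings of the list above; "examples where needed" is left out because it is conditional and needs a human judgement rather than a presence check.

```python
from typing import List

# Hypothetical field names for the Prompt Asset Standard above.
REQUIRED_ASSET_FIELDS = [
    "purpose", "owner", "brain_or_employee", "prompt_text",
    "input_variables", "output_format", "quality_criteria",
    "test_inputs", "model_notes", "version", "cost_latency_notes",
    "failure_handling", "review_date",
]


def missing_asset_fields(prompt_record: dict) -> List[str]:
    """Return the fields that still block promotion to prompt asset.

    Empty strings and empty lists count as missing.
    """
    return [name for name in REQUIRED_ASSET_FIELDS if not prompt_record.get(name)]
```

A draft with only a purpose, owner, prompt text, and version would return the other nine fields, which is a concrete to-do list for turning a prompt liability into an asset.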

Prompt Liability Warning

A prompt becomes a liability when it:

is rewritten every time
lives only in chat history
has no version
has no owner
has no test examples
creates inconsistent outputs
requires manual fixing
mixes instructions and data poorly
lacks output format
uses vague wording
cannot be audited
cannot be reused
creates cost without visibility

Rule

Prompt liabilities must be converted into prompt assets or removed from production workflows.

Atomic Prompt Standard

Use atomic prompts for narrow tasks.

Good Atomic Prompt Uses

Use for:

classification
formatting
line breaks
tag assignment
sentiment label
yes/no decision
extracting one field
routing a task
simple rewrite
compliance flag
deduplication check

Atomic Prompt Requirements

An atomic prompt should define:

allowed output labels
exact output format
examples if classification matters
what to do if uncertain
no extra commentary rule

Rule

Atomic prompts should be small, clear, and easy to validate.
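The five atomic prompt requirements can be shown together: the prompt text states the allowed labels, the exact format, the uncertainty rule, and the no-commentary rule, and a validator enforces them on the way out. The prompt wording and the choice of `needs_review` as the fallback are illustrative assumptions.

```python
# Hypothetical atomic prompt: allowed labels, exact format, an
# uncertainty rule, and a no-commentary rule, as required above.
ATOMIC_PROMPT = (
    "Classify the lead below. Reply with exactly one of: "
    "qualified, unqualified, needs_review. "
    "If the information is insufficient, reply needs_review. "
    "Do not add any other words.\n\nLead:\n{lead}"
)

ALLOWED_LABELS = {"qualified", "unqualified", "needs_review"}
FALLBACK_LABEL = "needs_review"


def validate_label(raw_output: str) -> str:
    """Accept only an allowed label; route anything else to human review."""
    label = raw_output.strip().lower()
    return label if label in ALLOWED_LABELS else FALLBACK_LABEL
```

The validator is intentionally strict: "This lead looks qualified." is routed to review rather than guessed at, which keeps the output easy to validate at the cost of some extra review volume.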

Compound Prompt Standard

Use compound prompts for complex tasks.

Good Compound Prompt Uses

Use for:

structured reports
content creation
sales page analysis
diagnostic output
course absorption
research synthesis
competitor analysis
prompt-to-query conversion
detailed rewrite
strategy creation
multi-factor scoring

Compound Prompt Requirements

A compound prompt should include:

identity
task
context
input variables
guidelines
examples
output format
scoring rules if needed
constraints
failure instructions

Rule

Compound prompts should be structured in clear sections.

Deconstruction Method Standard

Use deconstruction when a task is too complex for one prompt.

Deconstruction Steps

Identify the full task.
Break it into smaller thinking steps.
Decide which steps need separate prompts.
Decide which outputs feed the next step.
Add validation or human review where needed.
Test each step separately.
Test the full chain.
Record failure points.

Deconstruction Rule

If one prompt produces generic or inconsistent output, break the task into stages.

Stacking Method Standard

Use stacking when multiple related instructions or prompt sections can safely live inside one prompt.

Stacking can reduce:

duplicated context
repeated API cost
latency
post-processing complexity
unnecessary prompt calls

But stacking can reduce quality if the prompt becomes overloaded.

Stacking Questions

Ask:

Can one prompt reliably handle this?
Does stacking reduce cost?
Does stacking reduce latency?
Does stacking reduce output quality?
Does stacking make the prompt harder to debug?
Does each task still get enough attention?
Would deconstruction produce better quality?

Rule

Stack only when output quality remains stable.
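The cost case for stacking is mostly about repeated context. This sketch counts input tokens for several related tasks that share the same context, assuming separate calls resend the full shared context each time; the token counts in the usage note are made-up examples.

```python
from typing import List


def context_tokens_sent(shared_context: int, task_tokens: List[int], stacked: bool) -> int:
    """Count input tokens for several related tasks over the same context.

    Separate calls resend the shared context once per task; a stacked
    prompt sends it once and lists every task in the same request.
    """
    if stacked:
        return shared_context + sum(task_tokens)
    return sum(shared_context + t for t in task_tokens)
```

With a 3,000-token shared context and three tasks of 100, 150, and 120 tokens, separate calls send 9,370 input tokens and a stacked call sends 3,370. That saving only counts if each task still gets enough attention, which is why the rule ties stacking to stable output quality.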

Tell And Show Method Standard

Use tell and show when output style, structure, or classification accuracy matters.

Tell And Show Template

Instruction:
Describe the rule.

Good Example:
Show the desired output.

Why It Works:
Explain the pattern.

Bad Example:
Show what to avoid if useful.

Task:
Ask the model to apply the pattern.

Rule

When the model keeps missing the target, add better examples before adding more vague instructions.
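The Tell And Show Template can be assembled from parts so every prompt that uses it keeps the same section order. This is a sketch: the function name is hypothetical, and the bad example is optional, matching "Show what to avoid if useful".

```python
from typing import Optional


def tell_and_show(
    instruction: str,
    good_example: str,
    why_it_works: str,
    task: str,
    bad_example: Optional[str] = None,
) -> str:
    """Assemble a prompt in the Tell And Show order used above."""
    sections = [
        ("Instruction", instruction),
        ("Good Example", good_example),
        ("Why It Works", why_it_works),
    ]
    if bad_example:
        sections.append(("Bad Example", bad_example))
    sections.append(("Task", task))
    # Each section is a "Title:" line followed by its body, blank-line separated.
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)
```

Building the template in one place means a better example can be swapped in without touching the instruction or task text, which suits the rule of improving examples before adding more instructions.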

Import Method Standard

Use the import method when generic model knowledge is not enough.

Import Sources

Import from:

MWMS Canon pages
MCR pages
course notes
expert interviews
SOPs
client documents
winning ads
winning content
proven sales emails
industry rules
platform documentation
research reports
audience language
customer reviews
competitor examples

Import Rule

The better the imported knowledge, the better the prompt can perform.

Planning Method Standard

The planning method asks the model to think through the task before creating the final output.

This is useful for:

analysis
classification
content planning
report generation
diagnostic scoring
research synthesis
opportunity discovery

Planning Output Caution

Planning can improve quality but increase cost and output length.

For production systems, MWMS may need to:

keep planning internal
parse only the final answer
use shorter planning steps
use a cheaper model for planning
suppress unnecessary reasoning in final output

Rule

Use planning when quality matters more than minimal token cost.
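"Parse only the final answer" can be as simple as asking the model to end with a fixed marker and keeping only what follows it. The marker text is a hypothetical convention, not a feature of any particular model or API; a reply without the marker is treated as a failed output rather than guessed at.

```python
# Hypothetical marker the prompt asks the model to emit after its planning.
FINAL_MARKER = "FINAL ANSWER:"


def extract_final_answer(model_output: str) -> str:
    """Keep only the text after the last final-answer marker."""
    head, sep, tail = model_output.rpartition(FINAL_MARKER)
    if not sep:
        raise ValueError("final-answer marker missing; treat as failed output")
    return tail.strip()
```

The planning text still costs tokens, but it never reaches the next workflow step, so downstream parsing stays as strict as it would be without planning.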

Anti Keyword Staining Standard

Some common words can bias the model toward weak generic outputs.

For example, words such as:

tweet
post
headline
blog
article
caption
sales email
motivational
viral

may cause the model to imitate low-quality public training data.

Anti Keyword Staining Method

Instead of relying on generic labels, describe the actual output.

Examples:

Instead of:

“Write a tweet.”

Use:

“Write a concise short-form piece of copy designed to create curiosity and one clear takeaway.”

Instead of:

“Write a headline.”

Use:

“Write a single-sentence attention hook that names the pain and implies a specific benefit.”

Instead of:

“Write a blog post.”

Use:

“Write a structured answer-first guide for a problem-aware buyer.”

Rule

Use task-specific language when generic content labels produce generic outputs.
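A simple check can flag the generic labels listed above before a prompt goes into review. This sketch matches whole words case-insensitively with the standard `re` module; it is a prompt for human judgement, not a hard block, since a word like "post" can appear legitimately (for example in "post-purchase").

```python
import re
from typing import List

# Generic labels from the list above; extend as new failure patterns appear.
STAINING_TERMS = [
    "tweet", "post", "headline", "blog", "article",
    "caption", "sales email", "motivational", "viral",
]


def find_staining_terms(prompt: str) -> List[str]:
    """Return generic content labels present in a prompt, as whole words."""
    found = []
    for term in STAINING_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", prompt, flags=re.IGNORECASE):
            found.append(term)
    return found
```

For "Write a blog post with a viral headline." this returns `["post", "headline", "blog", "viral"]`, while the rewritten "structured answer-first guide" phrasing returns nothing.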

Prompt Chain Standard

Every prompt chain should define:

Chain Name:
Workflow:
Brain / Employee:
Step 1 Prompt:
Step 1 Output:
Step 2 Prompt:
Step 2 Output:
Step 3 Prompt:
Step 3 Output:
Human Review Point:
Validation Rules:
Failure Handling:
Final Output:

Prompt Chain Rule

Each step in a prompt chain should have a clear reason to exist.
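Part of the Prompt Chain Standard can be checked before the chain ever runs: every step should state a reason to exist, and every input a step needs should already have been produced by an earlier step or supplied at the start. The `ChainStep` structure below is a hypothetical encoding of those fields, using the research flow as sample data.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ChainStep:
    name: str
    reason: str          # why this step exists (Prompt Chain Rule)
    needs: List[str]     # outputs this step reads
    produces: str        # output name this step writes


def check_chain(initial_inputs: Set[str], steps: List[ChainStep]) -> List[str]:
    """Return wiring problems in step order, before the chain is run."""
    problems = []
    available = set(initial_inputs)
    for step in steps:
        if not step.reason.strip():
            problems.append(f"{step.name}: no stated reason to exist")
        for need in step.needs:
            if need not in available:
                problems.append(f"{step.name}: needs '{need}' before it is produced")
        available.add(step.produces)
    return problems
```

For a research chain whose credibility step has no stated reason and whose recommendation step reads a "relevance" output nobody produces, `check_chain` reports both problems; an empty list means the wiring is at least consistent.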

Model Testing Standard

Before deploying a prompt, test multiple models where appropriate.

Model Testing Template

Prompt Name:
Task:
Test Input:
Model Tested:
Output Quality Score:
Formatting Score:
Consistency Score:
Cost:
Latency:
Failure Notes:
Decision: Use / Reject / Retest

Rule

The best model is the model that performs best for the specific task, not the newest model by default.
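The Model Testing Template can be turned into a comparable record and a decision. The 0 to 10 score scale and the thresholds (every score at least 7, latency within 5 seconds for Use; any score below 5 for Reject; otherwise Retest) are assumptions for illustration, not MWMS standards, and the model names are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelTestResult:
    """One row of the Model Testing Template above."""
    model: str
    quality: float       # Output Quality Score, 0-10 (scale is an assumption)
    formatting: float    # Formatting Score, 0-10
    consistency: float   # Consistency Score, 0-10
    cost_per_run: float
    latency_ms: int


def decide(result: ModelTestResult, min_score: float = 7.0,
           max_latency_ms: int = 5000) -> str:
    """Map one test record to Use / Reject / Retest using assumed thresholds."""
    weakest = min(result.quality, result.formatting, result.consistency)
    if weakest >= min_score and result.latency_ms <= max_latency_ms:
        return "Use"
    if weakest < min_score - 2:
        return "Reject"
    return "Retest"


def cheapest_usable(results: List[ModelTestResult]) -> Optional[str]:
    """Among models rated Use, pick the lowest cost per run."""
    usable = [r for r in results if decide(r) == "Use"]
    if not usable:
        return None
    return min(usable, key=lambda r: r.cost_per_run).model
```

Scoring on the weakest dimension means a model with excellent writing but unreliable formatting cannot pass. When both a large and a small model pass, the cheaper one is chosen, which reflects the rule that "best" means best for the task, not newest or most expensive.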

Prompt Iteration Log

Every important prompt should keep an iteration log.

Iteration Log Template

Prompt Name:
Version:
Date:
Change Made:
Reason For Change:
Test Inputs Used:
Result:
Failure Fixed:
New Failure Created:
Decision: Keep / Revert / Retest
Owner:

Rule

Prompt improvements should be recorded so MWMS does not lose learning.

Prompt Quality Scorecard

Score important prompts out of 100.

Interpretation

85–100: Production ready
70–84: Good; monitor and improve
55–69: Usable with human review
40–54: Needs rewrite before automation
Below 40: Do not deploy

Rule

A prompt should pass the scorecard before becoming part of an important automation.
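The interpretation bands translate directly into a lookup. This sketch covers only the interpretation step; how the 100 points are allocated across scoring dimensions is not defined here, so the function assumes a total score has already been produced.

```python
def scorecard_band(score: float) -> str:
    """Map a 0-100 prompt score to the interpretation bands above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 85:
        return "Production ready"
    if score >= 70:
        return "Good; monitor and improve"
    if score >= 55:
        return "Usable with human review"
    if score >= 40:
        return "Needs rewrite before automation"
    return "Do not deploy"
```

Checking from the top band down means a fractional score such as 84.5 falls into "Good; monitor and improve" rather than slipping between the printed ranges.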

Automation Prompt Readiness Checklist

Before a prompt is used in automation, confirm:

Purpose

task is clear
workflow is clear
owner is clear
Brain / Employee is clear

Input

variables are defined
required inputs are clear
missing input handling exists
source boundaries are clear

Instructions

guidelines are specific
constraints are clear
examples are included where needed
output format is defined

Testing

multiple test inputs used
edge cases tested
output quality reviewed
model choice tested
cost and latency checked

Governance

prompt version recorded
failure modes documented
human review point defined if needed
observability fields defined
change log started

Rule

No important automation prompt should go live without readiness review.

Content Prompt Flow Standard

Content prompts should usually be chained, not written in one pass.

Example Content Flow

Source content collection
Proven content analysis
Audience and psychographic extraction
Hook generation
Hook selection
Outline generation
Body section generation
Persuasion layer
CTA generation
Editing and compliance review
Final output

Rule

For content systems, first validate the process manually, then convert the proven process into prompts.

AIBS Diagnostic Prompt Flow Standard

AIBS prompts should support diagnostic-first thinking.

Example AIBS Diagnostic Flow

Parse client intake.
Identify business model.
Identify stated problem.
Identify possible deeper problems.
Map leakage categories.
Review data readiness.
Score AI readiness.
Score opportunities.
Recommend first project.
Draft diagnostic report.
Draft proposal summary.

Rule

AIBS prompts should diagnose before recommending AI implementation.

Research Prompt Flow Standard

Research prompts should separate extraction, interpretation, and recommendation.

Example Research Flow

Extract facts.
Identify source type.
Identify claims.
Score credibility.
Summarize key findings.
Identify business relevance.
Route to Brain.
Recommend action.

Rule

Do not mix raw extraction and strategic recommendation unless the prompt is tested for both.

Compliance Prompt Flow Standard

Compliance prompts should be conservative.

Compliance Prompt Requirements

Compliance prompts should:

identify claims
classify risk
identify missing proof
flag sensitive categories
avoid legal certainty
recommend human review where needed
preserve source evidence
output clear risk labels

Rule

Compliance prompts should flag risk, not pretend to replace professional legal review.

Prompt Failure Modes

Common prompt failures include:

generic output
format drift
hallucinated facts
missing sections
inconsistent scoring
wrong tone
overlong output
too-short output
ignoring constraints
mixing examples into output
poor parsing
invalid JSON
weak classification
too much creativity
not enough specificity
output not suitable for next workflow step

Rule

Failure modes should be recorded and used to improve prompt versions.

Prompt Debugging Checklist

When a prompt fails, ask:

Was the purpose clear?
Was the input clear?
Was the context enough?
Were examples provided?
Was the output format too vague?
Was the task too complex for one prompt?
Should the task be deconstructed?
Was the prompt overloaded?
Should parts be stacked or separated?
Was the model wrong for the task?
Was the temperature too high?
Was there conflicting instruction?
Did generic keywords bias the output?
Was specialist knowledge missing?
Was the prompt tested on enough examples?

Rule

When the model fails, first inspect the prompt architecture before blaming the model.

Prompt Governance Roles

Prompt Owner

Responsible for prompt purpose, quality, and updates.

Prompt User

Uses the prompt inside a workflow.

Prompt Reviewer

Tests output quality and failure modes.

Compliance Reviewer

Reviews sensitive prompts for risk.

Data Reviewer

Checks inputs and outputs where structured data is involved.

HeadOffice

Approves important prompt standards and prevents prompt chaos.

Rule

Important prompts need ownership.

Application To Prompting Framework

Prompting Framework owns this standard.

Prompting Framework should use it to:

structure reusable prompts
define prompt assets
prevent prompt liabilities
standardize prompt chains
guide model testing
improve output reliability

Prompting Framework Rule

Prompting Framework must make prompt quality repeatable.

Application To AI Employee Canon

AI Employee Canon should use this framework to define prompt requirements for every AI Employee.

Each AI Employee should have:

role prompt
task prompts
output templates
failure handling
evaluation criteria
prompt version history
model choice notes

AI Employee Rule

An AI Employee is only as reliable as its prompt architecture.

Application To Automation Brain

Automation Brain should use this framework before deploying prompt-based automations.

Automation Brain should check:

prompt type
chain structure
input variables
output parsing
model cost
latency
failure handling
human review points

Automation Brain Rule

Automation Brain must not automate unstable prompts.

Application To AIBS Brain

AIBS Brain should use this framework for client diagnostics, reports, and AIOS workflows.

AIBS prompts must:

diagnose before recommending
use client context safely
respect privacy boundaries
output structured reports
support opportunity scoring
avoid unsupported claims

AIBS Rule

AIBS prompt systems must be reliable enough for client-facing work.

Application To Content Brain

Content Brain should use prompt chains for high-quality content production.

Content prompts should use:

proven content analysis
imported specialist knowledge
examples
deconstruction
hook generation
outline generation
persuasive layer
compliance review
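
The content prompt flow above can be sketched as a simple chain in which each step's output becomes the next step's input. This is a minimal illustration under stated assumptions: `run_chain`, the step names, the template wording, and the `model` callable are all hypothetical, and a stub model stands in for any real API call.

```python
from typing import Callable

# Hypothetical model interface: takes a prompt string, returns text.
ModelFn = Callable[[str], str]

# Each step is a name plus a template that receives the previous step's
# output as {previous}. Step names and wording are illustrative only.
CONTENT_CHAIN: list[tuple[str, str]] = [
    ("analysis", "Analyze why this proven content works:\n{previous}"),
    ("hooks", "Write 5 hooks based on this analysis:\n{previous}"),
    ("outline", "Build an outline using the strongest hook:\n{previous}"),
    ("persuasion", "Add a persuasive layer to this outline:\n{previous}"),
    ("compliance", "Flag unsupported or risky claims in:\n{previous}"),
]


def run_chain(source: str, model: ModelFn) -> dict[str, str]:
    """Run each step in order, feeding one step's output into the next.

    Returns every intermediate output so each step can be reviewed.
    """
    outputs: dict[str, str] = {}
    previous = source
    for name, template in CONTENT_CHAIN:
        previous = model(template.format(previous=previous))
        outputs[name] = previous
    return outputs


def echo_model(prompt: str) -> str:
    """Stub model for testing the chain wiring without an API call."""
    return f"[{len(prompt)} chars in]"


result = run_chain("Example high-performing post text", echo_model)
print(list(result))  # → ['analysis', 'hooks', 'outline', 'persuasion', 'compliance']
```

Returning every intermediate output, rather than only the final text, keeps each chain step reviewable and gives observability logging something to attach to.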

Content Brain Rule

Content Brain should not rely on one-pass generic content prompts for important assets.

Application To Ads Brain

Ads Brain should use prompt architecture for ad creative generation and analysis.

Ads prompts should define:

platform
buyer
awareness level
hook type
compliance constraints
output format
variation count
testing hypothesis

Ads Brain Rule

Ads prompts should create testable creative assets, not random ad copy.

Application To Research Brain

Research Brain should use structured prompts for extraction, synthesis, and recommendation.

Research prompts should:

separate fact extraction from interpretation
preserve source context
classify business relevance
route insights to the correct Brain
identify uncertainty

Research Brain Rule

Research prompts must protect evidence quality.

Application To Data Brain

Data Brain should use this framework for structured extraction, classification, and metadata creation.

Data prompts should:

output parseable structure
define field names
handle missing data
avoid invented fields
record confidence where needed
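
As a sketch of what parseable, auditable output can mean in practice, the check below validates a JSON extraction output against an assumed field contract. `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, `ALLOWED_FIELDS`, and `validate_data_output` are hypothetical names. The design choice shown is that missing data appears as an explicit `null`, while any unlisted key is treated as an invented field.

```python
import json
from typing import Optional

# Hypothetical field contract for one extraction prompt; names are illustrative.
REQUIRED_FIELDS = {"company_name", "industry", "employee_count"}
OPTIONAL_FIELDS = {"confidence"}
ALLOWED_FIELDS = REQUIRED_FIELDS | OPTIONAL_FIELDS


def validate_data_output(raw: str) -> tuple[Optional[dict], list[str]]:
    """Parse a model's JSON output and list any contract violations.

    Missing data must be an explicit null (key present, value null), and
    any key outside ALLOWED_FIELDS is reported as an invented field.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["output is not a JSON object"]

    keys = set(data)
    errors: list[str] = []
    missing = REQUIRED_FIELDS - keys
    invented = keys - ALLOWED_FIELDS
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if invented:
        errors.append(f"invented fields: {sorted(invented)}")

    # Confidence is optional, but when present it must be a number in [0, 1].
    confidence = data.get("confidence")
    if confidence is not None:
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0 <= confidence <= 1:
            errors.append("confidence must be a number between 0 and 1")
    return data, errors


data, errors = validate_data_output(
    '{"company_name": "Acme", "industry": null, "ceo": "J. Doe"}'
)
print(errors)
# → ["missing fields: ['employee_count']", "invented fields: ['ceo']"]
```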

Data Brain Rule

Data prompts should produce structured, auditable outputs.

Application To Experimentation Brain

Experimentation Brain should test prompt performance like any other system experiment.

Experimentation should test:

model choice
prompt version
example count
output format
chain design
cost
latency
accuracy
consistency

Experimentation Brain Rule

Prompt changes should be treated as experiments when they affect important outputs.
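
A minimal sketch of comparing prompt versions on the same test inputs, assuming each run is logged with its output, a pass/fail judgement, cost, and latency. `TestRun` and `summarize` are hypothetical names. The consistency measure used here (the share of test inputs whose repeated runs produced identical output) is one possible definition, not an MWMS standard.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TestRun:
    """One execution of one prompt version on one test input (hypothetical shape)."""
    prompt_version: str
    input_id: str
    output: str
    correct: bool
    cost_usd: float
    latency_s: float


def summarize(runs: list[TestRun], version: str) -> dict[str, float]:
    """Summarize accuracy, consistency, cost, and latency for one prompt version."""
    selected = [r for r in runs if r.prompt_version == version]
    if not selected:
        raise ValueError(f"no runs recorded for {version}")

    # Group distinct outputs by test input to measure run-to-run consistency.
    outputs_by_input: dict[str, set[str]] = {}
    for r in selected:
        outputs_by_input.setdefault(r.input_id, set()).add(r.output)

    return {
        "accuracy": sum(r.correct for r in selected) / len(selected),
        "consistency": sum(len(o) == 1 for o in outputs_by_input.values()) / len(outputs_by_input),
        "mean_cost_usd": mean(r.cost_usd for r in selected),
        "mean_latency_s": mean(r.latency_s for r in selected),
    }


runs = [
    TestRun("v1", "case-1", "Qualified", True, 0.002, 1.0),
    TestRun("v1", "case-1", "Not qualified", False, 0.002, 2.0),
    TestRun("v2", "case-1", "Qualified", True, 0.003, 0.5),
    TestRun("v2", "case-1", "Qualified", True, 0.003, 1.5),
]
for version in ("v1", "v2"):
    print(version, summarize(runs, version))
```

In this toy data, v2 costs more per call but is both more accurate and more consistent, which is the kind of trade-off the experiment should make visible rather than hide.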

Application To Compliance Brain And Risk Brain

Compliance Brain and Risk Brain should review prompts that affect sensitive outputs.

Review prompts for:

claims
privacy
regulated topics
financial assumptions
health claims
legal claims
affiliate claims
client data use
AI processing risk
hallucination risk

Compliance Rule

Sensitive prompts need compliance-aware constraints and review.

Application To HeadOffice Brain

HeadOffice governs prompt quality across MWMS.

HeadOffice should ask:

is this prompt reusable?
is it tested?
is it versioned?
is it observable?
is it too expensive?
is it reliable enough?
does it create risk?
does it support MWMS strategy?
does it protect M from unnecessary rework?

HeadOffice Rule

HeadOffice must prevent MWMS from building on unstable prompt foundations.

Deferred Update And Parking Lot Section

This page requires later updates to the following related pages.

Later Update 1: MWMS Prompting Framework

Add:

prompt assets versus prompt liabilities
one-shot prompt standard for automations
prompt deconstruction and chaining
tell and show examples
import method
anti-keyword staining
model selection testing
prompt iteration logs

Later Update 2: MWMS AI Employee Evaluation Scorecard Standard

Add:

prompt reliability score
output consistency score
model fit score
prompt iteration count
example coverage score
formatting reliability score
failure-case testing
prompt chain quality

Later Update 3: MWMS AI Observability Metadata Standard

Add:

prompt version
prompt chain step
model used
input token estimate
output token estimate
cost
latency
retry count
validation status
human review flag
revision history
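
One way to capture the metadata above is a single structured record per prompt run, written as one JSON line to an append-only log. The class name, the field types, and the units (`cost_usd`, `latency_ms`) are assumptions made for illustration, and `"example-model"` is a placeholder rather than a real model name.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PromptRunMetadata:
    """Hypothetical per-run observability record mirroring the list above."""
    prompt_version: str
    prompt_chain_step: str
    model_used: str
    input_token_estimate: int
    output_token_estimate: int
    cost_usd: float
    latency_ms: int
    retry_count: int = 0
    validation_status: str = "pending"  # e.g. "passed", "failed", "pending"
    human_review_flag: bool = False
    revision_history: list[str] = field(default_factory=list)

    def to_log_line(self) -> str:
        """Serialize as one JSON line for an append-only observability log."""
        return json.dumps(asdict(self), sort_keys=True)


record = PromptRunMetadata(
    prompt_version="aibs-diagnostic-v3",
    prompt_chain_step="opportunity_scoring",
    model_used="example-model",  # placeholder, not a real model name
    input_token_estimate=1800,
    output_token_estimate=450,
    cost_usd=0.0041,
    latency_ms=2300,
    validation_status="passed",
)
print(record.to_log_line())
```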

Later Update 4: MWMS AI Usage And Cost Visibility Standard

Add:

prompt chain cost tracking
high-volume prompt review
context duplication warning
stacking versus deconstruction cost comparison
model cost comparison
cost per successful output
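
Cost per successful output divides total spend, including failed and retried calls, by the number of outputs that passed validation. For example, 10 calls at $0.02 each with 8 passing gives $0.20 / 8 = $0.025 per usable output, which is higher than the $0.02 per-call price. A minimal sketch, with a hypothetical function name:

```python
def cost_per_successful_output(run_costs: list[float], passed: list[bool]) -> float:
    """Total spend across all runs divided by the number of runs that passed.

    Failed and retried runs still count toward spend, so an unreliable prompt
    shows a higher cost per successful output than its per-call price.
    """
    if len(run_costs) != len(passed):
        raise ValueError("run_costs and passed must have the same length")
    successes = sum(passed)
    if successes == 0:
        return float("inf")  # nothing usable was produced
    return sum(run_costs) / successes


# 10 calls at $0.02 each, 8 passed validation: roughly $0.025 per usable output.
print(cost_per_successful_output([0.02] * 10, [True] * 8 + [False] * 2))
```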

Later Update 5: MWMS Buyer First Authority Content And Channel Growth Framework

Add:

proven content deconstruction
scrape/analyze/repurpose workflow
example-led content prompting
caption/video/script prompt flows
validate manually before automating
output quality review against real performance data

Later Update 6: MWMS AIBS Business Diagnostic And Opportunity Discovery Framework

Add:

diagnostic prompt chain
client intake parsing prompt
opportunity scoring prompt
AI readiness prompt
report generation prompt
privacy-aware prompt constraints

Future Employee Ideas

Prompt Architecture Auditor
Prompt Chain Designer
Prompt Quality Evaluator
Prompt Cost And Latency Analyst
Specialist Knowledge Injector
Prompt Observability Steward
AI Employee Prompt Reviewer
Prompt Failure Mode Analyst
Model Selection Tester
Prompt Asset Librarian

Drift Protection

This framework protects MWMS from:

treating prompts as casual chat messages
building automations on vague prompts
relying on one good output as proof
ignoring prompt failures
not versioning prompts
not testing models
not tracking cost
not tracking latency
creating prompt chains with no structure
putting too much into one prompt
splitting prompts unnecessarily
using generic training data when specialist knowledge is needed
forgetting examples
creating outputs that cannot be parsed
making AI Employees unreliable
forcing M to fix prompt-driven mistakes manually
deploying client-facing prompts before testing
losing prompt knowledge inside chat history

Drift Signals

Watch for:

“Just ask ChatGPT to do it.”
“The prompt worked once, so it is ready.”
“We can fix it manually later.”
“No need to version it.”
“The model should know what I mean.”
“The prompt is long, so it must be good.”
“The prompt is short, so it must be efficient.”
“We do not need examples.”
“We do not know which model is being used.”
“We do not know what this prompt costs.”
“The automation output changes every time.”
“The output format keeps breaking.”
“The prompt lives only in a chat thread.”
“The AI Employee is unreliable but nobody knows why.”

Rule

If the prompt cannot be tested, versioned, and explained, it is not ready for serious automation.

Strategic Summary

This framework captures the most useful lessons from the Master Prompting w Devin block.

The key lesson is:

Prompt engineering for automations is not about clever wording. It is about designing reliable prompt systems.

MWMS should treat prompts as assets that can be:

built
tested
improved
chained
versioned
scored
logged
monitored
reused
governed

The block showed that reliable AI automation depends on:

one-shot prompts for repeatability
atomic prompts for small tasks
compound prompts for complex tasks
deconstruction for quality
stacking for efficiency
tell and show examples
imported specialist knowledge
planning before output
anti-keyword staining where generic terms hurt quality
chaining outputs across steps
model testing
daily practice and iteration

For MWMS, this strengthens every Brain.

It improves AI Employees.

It improves course absorption.

It improves content production.

It improves AIBS diagnostics.

It improves automation reliability.

It improves observability and cost control.

The most important system-level standard is:

Every important AI automation must have prompt architecture, not just a prompt.

Final Standard

The MWMS final standard is:

Every important MWMS prompt used in an AI Employee, automation, report, content system, diagnostic system, research system, or client-facing workflow must be designed as a reusable prompt asset with clear purpose, structured inputs, imported context where needed, specific guidelines, examples, defined output format, model testing, iteration history, cost and latency awareness, failure handling, and observability metadata.

A valid MWMS prompt asset must define:

prompt name
purpose
Brain / Employee
workflow
prompt type
input variables
context
guidelines
examples
output format
model
test inputs
quality criteria
cost / latency notes
failure modes
version
owner
last reviewed date

That is the MWMS Prompt Architecture And Automation Output Reliability standard.
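
The required elements of a prompt asset can be expressed as a record with a completeness check, so an incomplete asset is caught before deployment. This is a sketch only: `PromptAsset` and `missing_elements` are hypothetical names, and the field types are assumptions about how MWMS might store prompt assets.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PromptAsset:
    """Hypothetical record with one field per required prompt asset element."""
    prompt_name: str = ""
    purpose: str = ""
    brain_or_employee: str = ""
    workflow: str = ""
    prompt_type: str = ""  # e.g. "atomic" or "compound"
    input_variables: list[str] = field(default_factory=list)
    context: str = ""
    guidelines: str = ""
    examples: list[str] = field(default_factory=list)
    output_format: str = ""
    model: str = ""
    test_inputs: list[str] = field(default_factory=list)
    quality_criteria: list[str] = field(default_factory=list)
    cost_latency_notes: str = ""
    failure_modes: list[str] = field(default_factory=list)
    version: str = ""
    owner: str = ""
    last_reviewed: str = ""  # ISO date, e.g. "2026-06-08"

    def missing_elements(self) -> list[str]:
        """Return the names of required elements that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = PromptAsset(
    prompt_name="Client Intake Parser",
    purpose="Parse client intake forms into structured fields",
    version="v0.1",
    owner="Martyn",
)
print(len(draft.missing_elements()))  # → 14
print(draft.missing_elements()[:3])   # → ['brain_or_employee', 'workflow', 'prompt_type']
```

A draft with any missing element would not yet meet the standard above, however good its sample outputs look.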

Change Log

Version: v1.0

Date: 2026-06-08
Author: HeadOffice

Change:
Created the MWMS Prompt Architecture And Automation Output Reliability Framework from the AI Automations by Jack Master Prompting And Prompt System Design Block.

Captured the strongest lessons from:

Master Prompting w Devin Part 1
Master Prompting w Devin Part 2

Defined the MWMS Prompt Architecture And Automation Output Reliability Model with twelve layers:

Prompt Purpose Layer
Prompt Type Layer
Input And Variable Layer
Context And Knowledge Layer
Guideline And Constraint Layer
Example And Tell And Show Layer
Deconstruction And Chain Layer
Output Format Layer
Model Selection Layer
Testing And Iteration Layer
Cost Latency And Scale Layer
Observability And Governance Layer

Added key operating sections:

Prompt Asset Standard
Prompt Liability Warning
Atomic Prompt Standard
Compound Prompt Standard
Deconstruction Method Standard
Stacking Method Standard
Tell And Show Method Standard
Import Method Standard
Planning Method Standard
Anti Keyword Staining Standard
Prompt Chain Standard
Model Testing Standard
Prompt Iteration Log
Prompt Quality Scorecard
Automation Prompt Readiness Checklist
Content Prompt Flow Standard
AIBS Diagnostic Prompt Flow Standard
Research Prompt Flow Standard
Compliance Prompt Flow Standard
Prompt Failure Modes
Prompt Debugging Checklist
Prompt Governance Roles
Deferred Update And Parking Lot Section

Mapped the framework across:

Prompting Framework
AI Employee Canon
Automation Brain
AIBS Brain
Content Brain
Ads Brain
Research Brain
Data Brain
Experimentation Brain
Compliance Brain
Risk Brain
HeadOffice Brain

Purpose of creation:
To establish a formal MWMS standard for designing, testing, chaining, versioning, and governing prompt systems so MWMS AI Employees and automations produce reliable, consistent, cost-aware, observable, and high-quality outputs.

END — MWMS PROMPT ARCHITECTURE AND AUTOMATION OUTPUT RELIABILITY FRAMEWORK v1.0