MWMS Prompt Architecture And Automation Output Reliability Framework

System: MWMS
Document Type: Operating Framework
Authority Level: MCR Source Of Truth
Status: Draft For MCR
Version: v1.0
Primary Location: MCR
Future Operational Destination: Prompting Framework, HeadOffice Brain, Automation Brain, AIBS Brain, Content Brain, Ads Brain, Research Brain, Data Brain, Experimentation Brain, AI Employee Canon, Compliance Brain, Risk Brain
Parent Page: Prompting Framework
Owner: Martyn
Developer Boundary: Do Not Touch M’s Active Build Areas Unless Specifically Assigned
Source Of Truth: MCR
Last Reviewed: 2026-06-08
Source / Origin: AI Automations by Jack Master Prompting And Prompt System Design Block / Master Prompting w Devin Part 1 / Master Prompting w Devin Part 2
MWMS Classification: Prompt Architecture Framework / Automation Output Reliability Standard / AI Employee Prompt Governance / Prompt Chain Design System / Prompt Quality Control Framework
Primary Brain: MWMS Prompting Framework
Supporting Brains: HeadOffice Brain, Automation Brain, AIBS Brain, Content Brain, Ads Brain, Research Brain, Data Brain, Experimentation Brain, AI Employee Canon, Compliance Brain, Risk Brain

Related Pages: MWMS Prompting Framework, MWMS AI Employee Evaluation Scorecard Standard, MWMS AI Observability Metadata Standard, MWMS AI Work Session Persistence Standard, MWMS Agent Loop Control Framework, MWMS Next Action Picker Standard, MWMS AI Usage And Cost Visibility Standard, MWMS Source Visibility And Evidence Display Standard, MWMS Buyer First Authority Content And Channel Growth Framework, MWMS AIBS Business Diagnostic And Opportunity Discovery Framework, MWMS AIOS Lead Capture And Conversion Infrastructure Framework, HeadOffice Kaizen Continuous Improvement Loop

Source Evidence: This framework is derived from the Master Prompting w Devin Part 1 and Part 2 training inside AI Automations by Jack. The lessons covered conversational prompting versus one-shot prompting, atomic and compound prompts, prompt deconstruction, prompt stacking, tell-and-show examples, the import method, chain-of-thought/planning methods, anti-keyword staining, prompt chaining, model selection, output quality testing, iteration, and prompt engineering as a system-building skill for reliable AI automations.


Purpose

The purpose of the MWMS Prompt Architecture And Automation Output Reliability Framework is to define how MWMS designs, tests, improves, and governs prompts used inside AI Employees, automations, content systems, research systems, client diagnostics, and operational workflows.

This framework exists because prompts inside automations are not casual chat messages.

A casual prompt may work once.

An automation prompt must work repeatedly.

A casual prompt can be adjusted manually.

An automation prompt must behave predictably without constant human correction.

A casual prompt can be exploratory.

An automation prompt must be structured, tested, versioned, and reliable.

MWMS must therefore treat prompts as system assets.

The core purpose is:

To make every important MWMS prompt reusable, testable, observable, reliable, cost-aware, and suitable for automation.


Core Doctrine

The MWMS doctrine is:

A prompt used inside an automation is not a conversation. It is system architecture.

MWMS should not rely on vague prompts such as:

  • “Write this better.”
  • “Summarize this.”
  • “Make a good post.”
  • “Analyze this file.”
  • “Create a script.”
  • “Find the best ideas.”
  • “Give me a report.”
  • “Act like an expert.”
  • “Make it sound professional.”

Those may work in a chat, but they are weak inside repeatable systems.

For automations, MWMS needs prompts that define:

  • role
  • task
  • context
  • input variables
  • output format
  • quality standards
  • examples
  • constraints
  • failure handling
  • model choice
  • cost boundaries
  • expected structure
  • evaluation method
  • version history

The prompt must be designed to reduce guessing.

The less the AI has to guess, the more reliable the automation becomes.


Strategic Importance

This framework is strategically important because MWMS is building a system of Brains and AI Employees.

Those AI Employees will depend on prompts.

If prompts are weak, the AI Employees will be weak.

If prompts are inconsistent, the outputs will be inconsistent.

If prompts are not tested, errors will enter the system.

If prompt chains are badly designed, automations will become unreliable.

If prompts are too expensive, scaling becomes costly.

If prompts are too vague, M and Martyn will waste time correcting outputs.

Prompt quality affects:

  • AI Employee performance
  • content output quality
  • automation reliability
  • research quality
  • client diagnostic quality
  • AIBS recommendations
  • ad creative quality
  • newsletter intelligence extraction
  • course absorption quality
  • data classification
  • report generation
  • sales and outreach workflows
  • client-facing deliverables
  • system trust

This framework therefore becomes a core infrastructure layer for the whole MWMS ecosystem.

The strategic lesson is:

Prompt quality is system quality.


Definition

Prompt architecture is the structured design of a prompt or prompt chain so it reliably performs a defined task inside a repeatable workflow.

A prompt asset is a tested, reusable, documented, versioned prompt that improves workflow performance.

A prompt liability is an informal or poorly structured prompt that requires constant rewriting or manual correction, or that people interpret inconsistently.

An atomic prompt is a small prompt designed for a narrow task such as formatting, classification, extraction, sentiment tagging, or simple rewriting.

A compound prompt is a larger structured prompt that combines multiple guidelines, context sections, examples, formatting instructions, and task logic to produce a more nuanced output.

A prompt chain is a sequence in which the output of one prompt becomes the input or context for another prompt.

MWMS Definition

The MWMS Prompt Architecture And Automation Output Reliability Framework is:

Prompting Framework’s standard for designing, testing, chaining, versioning, and governing prompts so MWMS AI Employees and automations produce consistent, high-quality, cost-aware, and reliable outputs.


Scope

This framework applies to:

  • AI Employee prompts
  • automation prompts
  • Make.com prompts
  • n8n prompts
  • OpenAI API prompts
  • Claude prompts
  • Gemini prompts
  • content generation prompts
  • research prompts
  • classification prompts
  • extraction prompts
  • newsletter analysis prompts
  • course absorption prompts
  • sales prompts
  • cold email prompts
  • LinkedIn prompts
  • ad creative prompts
  • YouTube script prompts
  • landing page prompts
  • AIBS diagnostic prompts
  • client report prompts
  • dashboard insight prompts
  • data cleaning prompts
  • prompt chains
  • model selection
  • prompt cost control
  • prompt testing
  • prompt versioning
  • prompt observability
  • future Prompt Vault systems

This framework applies whenever MWMS creates a prompt that may be reused or automated.


Core Principle

The core principle is:

Build prompts like reusable systems, not disposable messages.

A prompt should not be considered complete just because it produces one good output.

It should be considered complete only when it can repeatedly produce the right output under realistic input variation.

Rule

A prompt is not reliable until it has been tested against multiple realistic inputs.


The MWMS Prompt Architecture And Automation Output Reliability Model

Every important prompt system should be designed across twelve layers:

  1. Prompt Purpose Layer
  2. Prompt Type Layer
  3. Input And Variable Layer
  4. Context And Knowledge Layer
  5. Guideline And Constraint Layer
  6. Example And Tell And Show Layer
  7. Deconstruction And Chain Layer
  8. Output Format Layer
  9. Model Selection Layer
  10. Testing And Iteration Layer
  11. Cost Latency And Scale Layer
  12. Observability And Governance Layer

1. Prompt Purpose Layer

Every prompt must have a clear job.

A prompt should not exist because “AI can do it.”

It should exist because MWMS needs a specific task performed.

Prompt Purpose Questions

Ask:

  • What is this prompt supposed to do?
  • What workflow does it support?
  • Which Brain or AI Employee uses it?
  • What business outcome does it support?
  • What input will it receive?
  • What output must it produce?
  • Who or what consumes the output?
  • What happens if the output is wrong?
  • How often will this prompt run?
  • Is this exploratory or production-grade?
  • Is this prompt temporary or reusable?
  • Does this need human review?

Prompt Purpose Examples

A prompt may exist to:

  • classify a lead
  • extract course insights
  • summarize a newsletter
  • score an offer
  • write a YouTube hook
  • analyze a competitor page
  • generate a sales email
  • create an AIBS diagnostic report section
  • turn transcript content into a content brief
  • identify compliance risk
  • route a task to a Brain
  • extract structured data from a messy file
  • create a buyer question map
  • generate a client opportunity score

Rule

If the prompt purpose is vague, the output will be vague.


2. Prompt Type Layer

MWMS must choose the right prompt type for the task.

Not every task needs a large prompt.

Not every task should be broken into many prompts.

Prompt Type 1: Conversational Prompt

Used for:

  • exploration
  • brainstorming
  • early thinking
  • clarification
  • manual coaching
  • one-off analysis
  • interactive development

Weakness:

  • inconsistent
  • hard to automate
  • depends on human steering
  • poor as a reusable asset

Prompt Type 2: One Shot Prompt

Used for:

  • repeatable automations
  • standard tasks
  • structured outputs
  • system workflows
  • AI Employee operations
  • API calls

Strength:

  • reusable
  • testable
  • can be versioned
  • supports automation

Prompt Type 3: Atomic Prompt

Used for:

  • formatting
  • classification
  • tagging
  • small extraction
  • simple transformation
  • routing
  • binary decisions
  • sentiment detection

Examples:

  • “Classify this lead as qualified, unqualified, or needs review.”
  • “Extract the company name.”
  • “Add line breaks for mobile readability.”
  • “Return only JSON.”

Prompt Type 4: Compound Prompt

Used for:

  • complex writing
  • diagnostic reports
  • content generation
  • sales page analysis
  • structured research
  • multi-criteria reasoning
  • nuanced output control

Examples:

  • course absorption framework prompt
  • AIBS diagnostic report prompt
  • authority content generation prompt
  • LinkedIn Sales Navigator query parser
  • ad creative analysis prompt

Rule

Use the smallest prompt that reliably performs the task. Do not shrink it past the point where reliability drops.


3. Input And Variable Layer

A reliable prompt separates fixed instructions from dynamic inputs.

Fixed instructions should define the task.

Dynamic inputs should contain the changing data.

Common Dynamic Variables

Variables may include:

  • source text
  • transcript
  • buyer avatar
  • product name
  • offer details
  • client name
  • industry
  • target market
  • hook
  • outline
  • prior output
  • examples
  • tone
  • content type
  • platform
  • desired format
  • data record
  • CRM fields
  • campaign name
  • previous result
  • task metadata

Variable Design Questions

Ask:

  • What changes each run?
  • What stays the same?
  • Which variables are required?
  • Which variables are optional?
  • What happens if a variable is missing?
  • Does the model know where the input starts and ends?
  • Does the prompt separate instructions from data?
  • Can this be safely used in an automation?
  • Is the variable name clear to a developer or future AI Employee?

Variable Rule

Dynamic input must be clearly separated from prompt instructions.
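
The separation of fixed instructions from dynamic input can be sketched in Python as follows. This is a minimal illustration, not a prescribed MWMS implementation: the names INSTRUCTIONS, REQUIRED_VARS, and build_prompt, the variable names, and the angle-bracket delimiters are all assumptions chosen for the example.

```python
# Hypothetical sketch: fixed instructions live in a constant, and dynamic data
# is wrapped in explicit delimiters so the model can see where input starts and ends.
INSTRUCTIONS = (
    "Classify the lead described inside the <input> tags as "
    "qualified, unqualified, or needs review. Return only the label."
)
REQUIRED_VARS = ("company_name", "lead_notes")  # assumed variable names


def build_prompt(variables: dict[str, str]) -> str:
    """Combine fixed instructions with clearly delimited dynamic input."""
    missing = [name for name in REQUIRED_VARS if not variables.get(name, "").strip()]
    if missing:
        # Fail before calling a model rather than letting it guess.
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    data_block = "\n".join(
        f"<{name}>{variables[name].strip()}</{name}>" for name in REQUIRED_VARS
    )
    return f"{INSTRUCTIONS}\n\n<input>\n{data_block}\n</input>"
```

Raising an error on a missing required variable is one possible answer to "What happens if a variable is missing?"; a workflow could equally route the record to human review.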


4. Context And Knowledge Layer

AI needs context to reduce guessing.

The model is a pattern recognition and prediction system.

If MWMS does not provide the right patterns, the model will use generic patterns from training data.

That can create weak, generic, or misleading outputs.

Context Types

Provide context such as:

  • business context
  • buyer context
  • offer context
  • Brain context
  • task context
  • source context
  • industry context
  • platform context
  • audience context
  • prior decisions
  • examples
  • definitions
  • frameworks
  • constraints
  • known risks

Import Method

The import method means bringing specialist knowledge into the prompt.

This may come from:

  • course material
  • internal SOPs
  • expert interviews
  • past winning examples
  • client documents
  • platform rules
  • research notes
  • sales call notes
  • content swipe files
  • proven frameworks
  • product documentation
  • market research
  • MWMS Canon pages
  • MCR pages

Import Method Rule

When public model knowledge is too generic, MWMS must import specialist knowledge.


5. Guideline And Constraint Layer

Prompts need clear rules.

Guidelines tell the AI what to do.

Constraints tell the AI what not to do.

Older models often struggled with negative instructions, but stronger modern models can often follow both positive and negative rules.

MWMS should still prefer clear positive instructions and use negative constraints where necessary.

Guideline Types

Use:

  • style guidelines
  • formatting guidelines
  • reasoning guidelines
  • evidence guidelines
  • output guidelines
  • tone guidelines
  • audience guidelines
  • compliance guidelines
  • exclusion guidelines
  • quality guidelines
  • workflow guidelines

Constraint Examples

Do not:

  • invent facts
  • include unsupported claims
  • use hype
  • add unverified statistics
  • mention banned product details
  • create legal or medical certainty
  • output outside the requested format
  • include irrelevant commentary
  • use platform-risk wording
  • expose private data
  • change the title format

Rule

Guidelines and constraints should remove ambiguity before the model creates the output.


6. Example And Tell And Show Layer

The tell and show method is one of the most important prompt quality controls.

Do not only tell the AI what to do.

Show it what good looks like.

Tell And Show Structure

Use:

  1. Explain the rule.
  2. Show a good example.
  3. Show a bad example if useful.
  4. Explain why the good example is better.
  5. Ask the model to follow the pattern.

Example Types

Examples may include:

  • ideal output
  • bad output
  • before/after rewrite
  • correct JSON format
  • desired paragraph style
  • classification examples
  • hook examples
  • CTA examples
  • email examples
  • report sections
  • analysis examples
  • tone examples
  • formatting examples

Example Quality Questions

Ask:

  • Does this example reflect the output we actually want?
  • Is the example current?
  • Is the example relevant to this task?
  • Does the example include the desired structure?
  • Does it show the correct tone?
  • Does it show what not to do?
  • Is the example too generic?
  • Is the example safe from a legal and compliance standpoint?

Rule

When output style or structure matters, include examples.


7. Deconstruction And Chain Layer

Complex tasks should often be broken into smaller prompts.

This is the deconstruction method.

Instead of asking AI to complete a complex task in one pass, MWMS should split the task into staged outputs.

Deconstruction Examples

For a YouTube script:

  1. Analyze source material.
  2. Extract buyer pain.
  3. Generate hook options.
  4. Select strongest hook.
  5. Create outline.
  6. Write opening.
  7. Write body sections.
  8. Write CTA.
  9. Review compliance.
  10. Final polish.

For a course absorption block:

  1. Identify source themes.
  2. Extract valuable frameworks.
  3. Compare against MWMS existing knowledge.
  4. Decide absorb / merge / park / ignore.
  5. Generate page candidates.
  6. Draft full page.
  7. Create registry entry.
  8. Park deferred updates.

For an AIBS diagnostic:

  1. Read intake.
  2. Identify business context.
  3. Map leakage categories.
  4. Score opportunities.
  5. Assess AI readiness.
  6. Recommend first project.
  7. Draft diagnostic report.
  8. Draft proposal path.

Prompt Chaining

Prompt chaining means the output of one prompt becomes the input to another.

Use chaining when:

  • quality improves through stages
  • the task needs deep focus
  • the output is too complex for one prompt
  • each stage needs separate evaluation
  • different models may suit different stages
  • human approval is needed between steps

Rule

Break complex tasks into prompt chains when one prompt cannot reliably produce high-quality output.
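
A prompt chain of this kind can be sketched in Python as below. The sketch assumes each step is a text template containing the placeholder {input}, and it represents the model call as a plain function so no specific vendor API is implied. The names ModelFn and run_chain are hypothetical.

```python
from typing import Callable

# A model call is represented as a plain function from prompt text to output text.
# In a real workflow this would wrap an API client; here it is a stand-in.
ModelFn = Callable[[str], str]


def run_chain(
    steps: list[tuple[str, str]], source: str, model: ModelFn
) -> dict[str, str]:
    """Run each (step_name, template) in order, feeding the prior output forward."""
    outputs: dict[str, str] = {}
    previous = source
    for name, template in steps:
        # str.replace is used so braces inside the source text cannot break formatting.
        prompt = template.replace("{input}", previous)
        result = model(prompt).strip()
        if not result:
            # Stop at the failing step so the failure point can be recorded.
            raise RuntimeError(f"Step '{name}' returned empty output")
        outputs[name] = result
        previous = result
    return outputs
```

Keeping every intermediate output in the returned dictionary lets each stage be evaluated separately, and a human approval point could be added between any two steps.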


8. Output Format Layer

Automation prompts need predictable output.

The output format must be clearly defined.

If the next system expects JSON, the prompt must output JSON.

If the next system expects a report, the prompt must output the right report structure.

If the next system expects classification, the prompt must output only the allowed labels.

Output Format Types

Use:

  • plain text
  • markdown
  • JSON
  • table
  • bullet list
  • scored result
  • label only
  • sectioned report
  • summary block
  • email format
  • script format
  • page format
  • CSV-like structure
  • WordPress-ready page output

Output Format Questions

Ask:

  • Who or what uses this output next?
  • Does the output need to be parsed by software?
  • Does it need to be copied into WordPress?
  • Does it need to be read by a human?
  • Does it need exact headings?
  • Does it need a fixed schema?
  • Does it need to avoid extra commentary?
  • Does it need error handling?
  • Does it need a confidence field?
  • Does it need source references?

Rule

The prompt must define the output format as tightly as the workflow requires.
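
When the next system expects JSON, the output can be checked before it moves on. The following Python sketch shows one way to do that, assuming a hypothetical schema with exactly two fields, label and confidence. The schema, the allowed labels, and the function name parse_lead_output are illustrative assumptions.

```python
import json

ALLOWED_LABELS = {"qualified", "unqualified", "needs review"}  # assumed label set
EXPECTED_FIELDS = {"label", "confidence"}  # assumed schema for this example


def parse_lead_output(raw: str) -> dict:
    """Parse model output and reject anything outside the agreed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Extra commentary around the JSON ends up here as a format failure.
        raise ValueError(f"Output is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("Output must be a JSON object")
    if set(data) != EXPECTED_FIELDS:
        raise ValueError(f"Unexpected fields: {sorted(data)}")
    label = data["label"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"Label not allowed: {label!r}")
    confidence = data["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("Confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("Confidence must be between 0 and 1")
    return data
```

A rejected output can then trigger a retry, a fallback label, or human review, depending on how tightly the workflow requires the format.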


9. Model Selection Layer

Different models perform differently on different tasks.

MWMS should not assume the newest or most expensive model is always best.

Some tasks need the strongest reasoning model.

Some tasks need fast low-cost classification.

Some tasks need long context.

Some tasks need writing quality.

Some tasks need strict formatting.

Some tasks need low latency.

Model Selection Criteria

Choose based on:

  • task complexity
  • context length
  • output length
  • instruction-following
  • formatting reliability
  • cost
  • latency
  • creativity needed
  • reasoning needed
  • classification consistency
  • language support
  • privacy requirements
  • tool compatibility
  • API availability

Model Testing Questions

Ask:

  • Which model gives the most consistent output?
  • Which model follows format best?
  • Which model handles the context length?
  • Which model is affordable at scale?
  • Which model is fast enough?
  • Which model fails least often?
  • Which model handles the language best?
  • Which model works best for this specific prompt?

Rule

Model choice must be tested against the use case, not assumed.


10. Testing And Iteration Layer

Prompt engineering is iterative.

A prompt should be improved through testing, not guesswork.

Prompt Testing Process

Use:

  1. Define the expected output.
  2. Create test inputs.
  3. Run the prompt.
  4. Review the output.
  5. Identify failure patterns.
  6. Adjust prompt structure.
  7. Add examples where needed.
  8. Adjust constraints.
  9. Test another model if needed.
  10. Repeat until reliable enough.

Prompt Testing Inputs

Test against:

  • ideal input
  • messy input
  • short input
  • long input
  • ambiguous input
  • missing data
  • conflicting data
  • edge cases
  • high-risk examples
  • real production samples
  • past failure examples

Scientific Method Standard

Prompt improvement should follow:

  • hypothesis
  • test
  • observation
  • adjustment
  • retest
  • record

Rule

Do not deploy an important automation prompt after one successful test.
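
The testing process above can be supported by a small harness that runs one prompt against several named test inputs and reports which cases fail. This is a sketch under stated assumptions: run_prompt stands in for whatever function calls the model, is_valid stands in for the workflow's own validation, and the name run_test_suite is hypothetical.

```python
from typing import Callable


def run_test_suite(
    run_prompt: Callable[[str], str],
    cases: dict[str, str],
    is_valid: Callable[[str], bool],
) -> dict:
    """Run every test input and record which named cases fail validation."""
    failures = []
    for case_name, test_input in cases.items():
        try:
            passed = is_valid(run_prompt(test_input))
        except Exception:
            passed = False  # a crash counts as a failure pattern to investigate
        if not passed:
            failures.append(case_name)
    total = len(cases)
    passed_count = total - len(failures)
    return {
        "total": total,
        "passed": passed_count,
        "pass_rate": passed_count / total if total else 0.0,
        "failures": failures,
    }
```

Naming the cases after the input categories listed above (ideal, messy, missing data, and so on) makes the failure list readable as a set of failure patterns rather than raw outputs.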


11. Cost Latency And Scale Layer

Prompt quality must be balanced against cost and speed.

A prompt that works well once may become too expensive at scale.

A prompt chain that produces excellent output may be too slow for a real-time workflow.

MWMS must decide the right balance.

Cost Factors

Costs may increase with:

  • long prompts
  • large examples
  • long context
  • chain-of-thought/planning outputs
  • multiple prompt steps
  • expensive models
  • repeated context in each step
  • large output length
  • retries
  • failed outputs

Latency Factors

Latency may increase with:

  • large context
  • multi-step chains
  • slow models
  • long output
  • tool calls
  • external API calls
  • validation steps
  • human approval stages

Quality Versus Cost Questions

Ask:

  • How often will this prompt run?
  • How much does each run cost?
  • What does a failed output cost?
  • Is this output client-facing?
  • Is this output revenue-related?
  • Is this output high-risk?
  • Can a cheaper model do the task?
  • Can context be reduced?
  • Can prompts be stacked safely?
  • Should prompts be deconstructed for quality?
  • Is speed more important than depth?

Rule

High-value outputs can justify higher prompt cost. Low-value repetitive outputs need cost discipline.
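
A rough monthly cost estimate helps answer "How much does each run cost?" before a prompt scales. The Python sketch below assumes token-based pricing quoted per million tokens and treats retries as a simple proportional overhead. The function name and every price passed to it are placeholders, not real vendor rates.

```python
def estimate_monthly_cost(
    runs_per_month: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    retry_rate: float = 0.0,
) -> float:
    """Estimate monthly spend for one prompt under assumed per-token pricing."""
    if retry_rate < 0:
        raise ValueError("retry_rate cannot be negative")
    # Cost of a single run: input and output tokens are usually priced separately.
    per_run = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    # Retries and failed outputs are modeled as extra runs at the same cost.
    return runs_per_month * per_run * (1 + retry_rate)
```

For example, with placeholder prices of 1.00 and 4.00 per million tokens, 10,000 runs at 2,000 input tokens and 500 output tokens come to about 40 per month before retries. A chain multiplies this figure by the number of steps, which is why repeated context in each step appears in the cost factors above.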


12. Observability And Governance Layer

MWMS must track prompt performance.

A prompt hidden inside an automation should not become invisible.

Important prompts need metadata, logging, versioning, and review.

Prompt Metadata Fields

Track:

Prompt Name:
Prompt Version:
Brain / Employee:
Workflow:
Prompt Type:
Model Used:
Input Variables:
Output Format:
Test Status:
Average Cost:
Average Latency:
Failure Modes:
Last Reviewed:
Owner:
Change Notes:

Observability Questions

Ask:

  • Which prompt generated this output?
  • Which version was used?
  • Which model was used?
  • What input was passed?
  • How much did it cost?
  • How long did it take?
  • Did it pass validation?
  • Did it fail formatting?
  • Was human review required?
  • Was the output accepted or corrected?
  • What changed since the last version?

Rule

A production prompt should be traceable.
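
Traceability can be sketched as one structured log record per prompt run. The Python example below covers a subset of the metadata fields and observability questions above. The class name PromptRunRecord, the field names, and the JSON-lines log format are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PromptRunRecord:
    """One traceable prompt execution (hypothetical field names)."""
    prompt_name: str
    prompt_version: str
    model_used: str
    input_text: str
    cost_usd: float
    latency_seconds: float
    passed_validation: bool
    human_review_required: bool


def to_log_line(record: PromptRunRecord, logged_at: Optional[datetime] = None) -> str:
    """Serialize one run as a single JSON line so it can be traced later."""
    stamp = (logged_at or datetime.now(timezone.utc)).isoformat()
    return json.dumps({"logged_at": stamp, **asdict(record)}, sort_keys=True)
```

Writing one line per run keeps the log easy to append from an automation and easy to filter by prompt name or version afterwards. Where inputs contain private data, the input_text field may need to be hashed or omitted.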


Prompt Asset Standard

A prompt becomes an MWMS prompt asset only when it has:

  • clear purpose
  • defined owner
  • defined Brain or Employee
  • stable prompt text
  • input variables
  • output format
  • quality criteria
  • examples where needed
  • test inputs
  • model selection notes
  • version number
  • cost/latency awareness
  • failure handling
  • review date

Rule

Prompt assets should be stored, versioned, and reused.


Prompt Liability Warning

A prompt becomes a liability when it:

  • is rewritten every time
  • lives only in chat history
  • has no version
  • has no owner
  • has no test examples
  • creates inconsistent outputs
  • requires manual fixing
  • mixes instructions and data without clear separation
  • lacks output format
  • uses vague wording
  • cannot be audited
  • cannot be reused
  • creates cost without visibility

Rule

Prompt liabilities must be converted into prompt assets or removed from production workflows.


Atomic Prompt Standard

Use atomic prompts for narrow tasks.

Good Atomic Prompt Uses

Use for:

  • classification
  • formatting
  • line breaks
  • tag assignment
  • sentiment label
  • yes/no decision
  • extracting one field
  • routing a task
  • simple rewrite
  • compliance flag
  • deduplication check

Atomic Prompt Requirements

An atomic prompt should define:

  • allowed output labels
  • exact output format
  • examples if classification matters
  • what to do if uncertain
  • no extra commentary rule

Rule

Atomic prompts should be small, clear, and easy to validate.
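
Because an atomic prompt returns a label from a fixed set, its output is easy to validate in code. The sketch below, in Python, normalizes the raw output and falls back to a review label when the model returns anything else. The label set, the fallback choice, and the name normalize_label are assumptions for this example.

```python
ALLOWED = ("qualified", "unqualified", "needs review")
FALLBACK = "needs review"  # assumed safe default when the output is unusable


def normalize_label(raw: str) -> str:
    """Return an allowed label, or the fallback if the output drifted."""
    # Tolerate harmless noise: surrounding whitespace, quotes, a trailing period, case.
    cleaned = raw.strip().strip(".\"'").lower()
    return cleaned if cleaned in ALLOWED else FALLBACK
```

Anything longer than a bare label, such as a full sentence, is treated as a violation of the no extra commentary rule and routed to the fallback instead of being guessed at.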


Compound Prompt Standard

Use compound prompts for complex tasks.

Good Compound Prompt Uses

Use for:

  • structured reports
  • content creation
  • sales page analysis
  • diagnostic output
  • course absorption
  • research synthesis
  • competitor analysis
  • prompt-to-query conversion
  • detailed rewrite
  • strategy creation
  • multi-factor scoring

Compound Prompt Requirements

A compound prompt should include:

  • identity
  • task
  • context
  • input variables
  • guidelines
  • examples
  • output format
  • scoring rules if needed
  • constraints
  • failure instructions

Rule

Compound prompts should be structured in clear sections.


Deconstruction Method Standard

Use deconstruction when a task is too complex for one prompt.

Deconstruction Steps

  1. Identify the full task.
  2. Break it into smaller thinking steps.
  3. Decide which steps need separate prompts.
  4. Decide which outputs feed the next step.
  5. Add validation or human review where needed.
  6. Test each step separately.
  7. Test the full chain.
  8. Record failure points.

Deconstruction Rule

If one prompt produces generic or inconsistent output, break the task into stages.


Stacking Method Standard

Use stacking when multiple related instructions or prompt sections can safely live inside one prompt.

Stacking can reduce:

  • duplicated context
  • repeated API cost
  • latency
  • post-processing complexity
  • unnecessary prompt calls

But stacking can reduce quality if the prompt becomes overloaded.

Stacking Questions

Ask:

  • Can one prompt reliably handle this?
  • Does stacking reduce cost?
  • Does stacking reduce latency?
  • Does stacking reduce output quality?
  • Does stacking make the prompt harder to debug?
  • Does each task still get enough attention?
  • Would deconstruction produce better quality?

Rule

Stack only when output quality remains stable.


Tell And Show Method Standard

Use tell and show when output style, structure, or classification accuracy matters.

Tell And Show Template

Instruction:
Describe the rule.

Good Example:
Show the desired output.

Why It Works:
Explain the pattern.

Bad Example:
Show what to avoid if useful.

Task:
Ask the model to apply the pattern.

Rule

When the model keeps missing the target, add better examples before adding more vague instructions.
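
The template above can be assembled programmatically so that every tell-and-show block has the same section order. This Python sketch is one possible helper; the function name build_tell_and_show and its parameters are hypothetical.

```python
from typing import Optional


def build_tell_and_show(
    instruction: str,
    good_example: str,
    why_it_works: str,
    task: str,
    bad_example: Optional[str] = None,
) -> str:
    """Assemble a tell-and-show block in the template's section order."""
    sections = [
        ("Instruction", instruction),
        ("Good Example", good_example),
        ("Why It Works", why_it_works),
    ]
    if bad_example:
        # The bad example is optional, matching "if useful" in the template.
        sections.append(("Bad Example", bad_example))
    sections.append(("Task", task))
    return "\n\n".join(f"{title}:\n{body.strip()}" for title, body in sections)
```

Keeping examples as separate arguments makes it simple to swap in a better example during iteration without touching the rest of the prompt.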


Import Method Standard

Use the import method when generic model knowledge is not enough.

Import Sources

Import from:

  • MWMS Canon pages
  • MCR pages
  • course notes
  • expert interviews
  • SOPs
  • client documents
  • winning ads
  • winning content
  • proven sales emails
  • industry rules
  • platform documentation
  • research reports
  • audience language
  • customer reviews
  • competitor examples

Import Rule

The better the imported knowledge, the better the prompt can perform.


Planning Method Standard

The planning method asks the model to think through the task before creating the final output.

This is useful for:

  • analysis
  • classification
  • content planning
  • report generation
  • diagnostic scoring
  • research synthesis
  • opportunity discovery

Planning Output Caution

Planning can improve quality but increase cost and output length.

For production systems, MWMS may need to:

  • keep planning internal
  • parse only the final answer
  • use shorter planning steps
  • use a cheaper model for planning
  • suppress unnecessary reasoning in final output

Rule

Use planning when quality matters more than minimal token cost.
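
One way to parse only the final answer is to instruct the model to write its plan first and then place the result after an agreed marker. The Python sketch below assumes a marker of FINAL ANSWER: which is a convention invented for this example, as is the function name extract_final_answer.

```python
FINAL_MARKER = "FINAL ANSWER:"  # assumed convention set in the prompt instructions


def extract_final_answer(raw: str) -> str:
    """Discard the planning text and return only what follows the last marker."""
    # rpartition splits on the last occurrence, in case the plan mentions the marker.
    _, found, tail = raw.rpartition(FINAL_MARKER)
    if not found:
        raise ValueError("No final answer marker found in output")
    answer = tail.strip()
    if not answer:
        raise ValueError("Final answer is empty")
    return answer
```

The planning text is still generated and still costs tokens; this approach only keeps it out of the output that the next workflow step consumes.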


Anti Keyword Staining Standard

Some common words can bias the model toward weak generic outputs.

For example, words such as:

  • tweet
  • post
  • headline
  • blog
  • article
  • caption
  • sales email
  • motivational
  • viral

may cause the model to imitate low-quality public training data.

Anti Keyword Staining Method

Instead of relying on generic labels, describe the actual output.

Examples:

Instead of:

  • “Write a tweet.”

Use:

  • “Write a concise short-form piece of copy designed to create curiosity and one clear takeaway.”

Instead of:

  • “Write a headline.”

Use:

  • “Write a single-sentence attention hook that names the pain and implies a specific benefit.”

Instead of:

  • “Write a blog post.”

Use:

  • “Write a structured answer-first guide for a problem-aware buyer.”

Rule

Use task-specific language when generic content labels produce generic outputs.
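
A simple check can flag generic labels in a prompt draft so the author can decide whether to describe the output instead. The Python sketch below is a heuristic only: it uses the example words listed above, matches whole words with an optional plural s, and cannot tell whether a flagged word is actually harming output. The name find_stained_keywords is hypothetical.

```python
import re

# The example labels listed in this standard; extend or trim per workflow.
STAINED_KEYWORDS = (
    "tweet", "post", "headline", "blog", "article",
    "caption", "sales email", "motivational", "viral",
)


def find_stained_keywords(prompt_text: str) -> list[str]:
    """Return the generic labels that appear as whole words in the prompt."""
    lowered = prompt_text.lower()
    return [
        keyword
        for keyword in STAINED_KEYWORDS
        # \b keeps "post" from matching inside words such as "postpone".
        if re.search(rf"\b{re.escape(keyword)}s?\b", lowered)
    ]
```

A non-empty result is a prompt for review, not an automatic failure, since some workflows legitimately need to name the content type.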


Prompt Chain Standard

Every prompt chain should define:

Chain Name:
Workflow:
Brain / Employee:
Step 1 Prompt:
Step 1 Output:
Step 2 Prompt:
Step 2 Output:
Step 3 Prompt:
Step 3 Output:
Human Review Point:
Validation Rules:
Failure Handling:
Final Output:

Prompt Chain Rule

Each step in a prompt chain should have a clear reason to exist.


Model Testing Standard

Before deploying a prompt, test multiple models where appropriate.

Model Testing Template

Prompt Name:
Task:
Test Input:
Model Tested:
Output Quality Score:
Formatting Score:
Consistency Score:
Cost:
Latency:
Failure Notes:
Decision: Use / Reject / Retest

Rule

The best model is the model that performs best for the specific task, not the newest model by default.


Prompt Iteration Log

Every important prompt should keep an iteration log.

Iteration Log Template

Prompt Name:
Version:
Date:
Change Made:
Reason For Change:
Test Inputs Used:
Result:
Failure Fixed:
New Failure Created:
Decision: Keep / Revert / Retest
Owner:

Rule

Prompt improvements should be recorded so MWMS does not lose learning.


Prompt Quality Scorecard

Score important prompts out of 100.

Score Categories

Purpose Clarity: 10
Input Clarity: 10
Context Quality: 10
Guideline Strength: 10
Example Quality: 10
Output Format Reliability: 10
Model Fit: 10
Testing Coverage: 10
Cost / Latency Fit: 10
Observability / Versioning: 10

Interpretation

85–100: Production ready
70–84: Good; monitor and improve
55–69: Usable with human review
40–54: Needs rewrite before automation
Below 40: Do not deploy

Rule

A prompt should pass the scorecard before becoming part of an important automation.
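
The scorecard arithmetic can be sketched in Python as follows. The category names and interpretation bands are taken from the scorecard above; the function name score_prompt and the choice to reject incomplete scorecards are assumptions.

```python
CATEGORIES = (
    "Purpose Clarity", "Input Clarity", "Context Quality", "Guideline Strength",
    "Example Quality", "Output Format Reliability", "Model Fit",
    "Testing Coverage", "Cost / Latency Fit", "Observability / Versioning",
)
# (minimum total, interpretation), checked from the highest band down.
BANDS = (
    (85, "Production ready"),
    (70, "Good; monitor and improve"),
    (55, "Usable with human review"),
    (40, "Needs rewrite before automation"),
)


def score_prompt(scores: dict[str, int]) -> tuple[int, str]:
    """Total the ten category scores (0 to 10 each) and return the band."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("Scores must cover exactly the ten scorecard categories")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    total = sum(scores.values())
    for minimum, interpretation in BANDS:
        if total >= minimum:
            return total, interpretation
    return total, "Do not deploy"
```

Requiring all ten categories prevents a prompt from appearing production ready simply because weak areas were left unscored.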


Automation Prompt Readiness Checklist

Before a prompt is used in automation, confirm:

Purpose

  • task is clear
  • workflow is clear
  • owner is clear
  • Brain / Employee is clear

Input

  • variables are defined
  • required inputs are clear
  • missing input handling exists
  • source boundaries are clear

Instructions

  • guidelines are specific
  • constraints are clear
  • examples are included where needed
  • output format is defined

Testing

  • multiple test inputs used
  • edge cases tested
  • output quality reviewed
  • model choice tested
  • cost and latency checked

Governance

  • prompt version recorded
  • failure modes documented
  • human review point defined if needed
  • observability fields defined
  • change log started

Rule

No important automation prompt should go live without readiness review.


Content Prompt Flow Standard

Content prompts should usually be chained, not written in one pass.

Example Content Flow

  1. Source content collection
  2. Proven content analysis
  3. Audience and psychographic extraction
  4. Hook generation
  5. Hook selection
  6. Outline generation
  7. Body section generation
  8. Persuasion layer
  9. CTA generation
  10. Editing and compliance review
  11. Final output

Rule

For content systems, first validate the process manually, then convert the proven process into prompts.


AIBS Diagnostic Prompt Flow Standard

AIBS prompts should support diagnostic-first thinking.

Example AIBS Diagnostic Flow

  1. Parse client intake.
  2. Identify business model.
  3. Identify stated problem.
  4. Identify possible deeper problems.
  5. Map leakage categories.
  6. Review data readiness.
  7. Score AI readiness.
  8. Score opportunities.
  9. Recommend first project.
  10. Draft diagnostic report.
  11. Draft proposal summary.

Rule

AIBS prompts should diagnose before recommending AI implementation.


Research Prompt Flow Standard

Research prompts should separate extraction, interpretation, and recommendation.

Example Research Flow

  1. Extract facts.
  2. Identify source type.
  3. Identify claims.
  4. Score credibility.
  5. Summarize key findings.
  6. Identify business relevance.
  7. Route to Brain.
  8. Recommend action.

Rule

Do not mix raw extraction and strategic recommendation in one prompt unless that prompt has been tested for both.


Compliance Prompt Flow Standard

Compliance prompts should be conservative.

Compliance Prompt Requirements

Compliance prompts should:

  • identify claims
  • classify risk
  • identify missing proof
  • flag sensitive categories
  • avoid legal certainty
  • recommend human review where needed
  • preserve source evidence
  • output clear risk labels

Rule

Compliance prompts should flag risk, not pretend to replace professional legal review.
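
One way to keep compliance output conservative is to apply a fixed labeling rule to what the prompt extracts, rather than letting the model decide the final label freely. The sketch below is an assumption-heavy illustration: the three risk labels, the `ClaimReview` fields, and the escalation rule are hypothetical and would need to be set by the Compliance Reviewer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskLabel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class ClaimReview:
    claim: str
    source_evidence: Optional[str]     # quoted proof, or None when missing
    sensitive_category: Optional[str]  # for example "health", or None


def label_claim(review: ClaimReview) -> tuple[RiskLabel, bool]:
    """Return (risk label, needs_human_review) using conservative defaults."""
    missing_proof = not review.source_evidence
    sensitive = review.sensitive_category is not None
    if sensitive and missing_proof:
        return RiskLabel.HIGH, True
    if sensitive or missing_proof:
        # Any single concern is enough to route the claim to a human.
        return RiskLabel.MEDIUM, True
    return RiskLabel.LOW, False
```

The rule only flags and routes. It makes no statement about legal acceptability, which stays with professional review.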


Prompt Failure Modes

Common prompt failures include:

  • generic output
  • format drift
  • hallucinated facts
  • missing sections
  • inconsistent scoring
  • wrong tone
  • overlong output
  • too-short output
  • ignoring constraints
  • mixing examples into output
  • poor parsing
  • invalid JSON
  • weak classification
  • too much creativity
  • not enough specificity
  • output not suitable for next workflow step

Rule

Failure modes should be recorded and used to improve prompt versions.
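
Several of these failure modes can be detected automatically before an output reaches the next workflow step. The following is a minimal sketch, assuming the prompt is expected to return a JSON object. The function name, the `required_keys` parameter, and the character limit are hypothetical. Modes such as wrong tone or hallucinated facts are not covered and would still need human or model-based review.

```python
import json


def detect_failure_modes(
    raw_output: str,
    required_keys: set[str],
    max_chars: int = 4000,
) -> list[str]:
    """Return the names of the failure modes found in one model output."""
    failures = []
    if len(raw_output) > max_chars:
        failures.append("overlong output")
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        failures.append("invalid JSON")
        return failures
    if not isinstance(parsed, dict):
        # Assumption: valid JSON of the wrong shape is counted as format drift.
        failures.append("format drift")
        return failures
    if required_keys - set(parsed):
        failures.append("missing sections")
    return failures
```

The returned list can be written to the prompt iteration log so that recurring failures point to the prompt version that needs revision.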


Prompt Debugging Checklist

When a prompt fails, ask:

  • Was the purpose clear?
  • Was the input clear?
  • Was the context enough?
  • Were examples provided?
  • Was the output format too vague?
  • Was the task too complex for one prompt?
  • Should the task be deconstructed?
  • Was the prompt overloaded?
  • Should parts be stacked or separated?
  • Was the model wrong for the task?
  • Was the temperature too high?
  • Were there conflicting instructions?
  • Did generic keywords bias the output?
  • Was specialist knowledge missing?
  • Was the prompt tested on enough examples?

Rule

When the model fails, first inspect the prompt architecture before blaming the model.


Prompt Governance Roles

Prompt Owner

Responsible for prompt purpose, quality, and updates.

Prompt User

Uses the prompt inside a workflow.

Prompt Reviewer

Tests output quality and failure modes.

Compliance Reviewer

Reviews sensitive prompts for risk.

Data Reviewer

Checks inputs and outputs where structured data is involved.

HeadOffice

Approves important prompt standards and prevents prompt chaos.

Rule

Important prompts need ownership.


Application To Prompting Framework

Prompting Framework owns this standard.

Prompting Framework should use it to:

  • structure reusable prompts
  • define prompt assets
  • prevent prompt liabilities
  • standardize prompt chains
  • guide model testing
  • improve output reliability

Prompting Framework Rule

Prompting Framework must make prompt quality repeatable.


Application To AI Employee Canon

AI Employee Canon should use this framework to define prompt requirements for every AI Employee.

Each AI Employee should have:

  • role prompt
  • task prompts
  • output templates
  • failure handling
  • evaluation criteria
  • prompt version history
  • model choice notes

AI Employee Rule

An AI Employee is only as reliable as its prompt architecture.


Application To Automation Brain

Automation Brain should use this framework before deploying prompt-based automations.

Automation Brain should check:

  • prompt type
  • chain structure
  • input variables
  • output parsing
  • model cost
  • latency
  • failure handling
  • human review points

Automation Brain Rule

Automation Brain must not automate unstable prompts.


Application To AIBS Brain

AIBS Brain should use this framework for client diagnostics, reports, and AIOS workflows.

AIBS prompts must:

  • diagnose before recommending
  • use client context safely
  • respect privacy boundaries
  • output structured reports
  • support opportunity scoring
  • avoid unsupported claims

AIBS Rule

AIBS prompt systems must be reliable enough for client-facing work.


Application To Content Brain

Content Brain should use prompt chains for high-quality content production.

Content prompts should use:

  • proven content analysis
  • imported specialist knowledge
  • examples
  • deconstruction
  • hook generation
  • outline generation
  • persuasion layer
  • compliance review

Content Brain Rule

Content Brain should not rely on one-pass generic content prompts for important assets.


Application To Ads Brain

Ads Brain should use prompt architecture for ad creative generation and analysis.

Ads prompts should define:

  • platform
  • buyer
  • awareness level
  • hook type
  • compliance constraints
  • output format
  • variation count
  • testing hypothesis

Ads Brain Rule

Ads prompts should create testable creative assets, not random ad copy.


Application To Research Brain

Research Brain should use structured prompts for extraction, synthesis, and recommendation.

Research prompts should:

  • separate fact extraction from interpretation
  • preserve source context
  • classify business relevance
  • route insights to the correct Brain
  • identify uncertainty

Research Brain Rule

Research prompts must protect evidence quality.


Application To Data Brain

Data Brain should use this framework for structured extraction, classification, and metadata creation.

Data prompts should:

  • output parseable structure
  • define field names
  • handle missing data
  • avoid invented fields
  • record confidence where needed

Data Brain Rule

Data prompts should produce structured, auditable outputs.
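
The requirements above can be checked after each extraction by normalizing the model's output against a fixed field list. This is a sketch under stated assumptions: `EXPECTED_FIELDS` and its field names are hypothetical examples, and the model output is assumed to have been parsed into a dictionary already.

```python
# Hypothetical field list for one extraction prompt.
EXPECTED_FIELDS = ("company_name", "industry", "employee_count", "confidence")


def normalize_record(raw: dict) -> tuple[dict, list[str]]:
    """Keep only expected fields, mark missing ones as None, and list issues."""
    issues = []
    # Fields the model invented are dropped and reported, never stored.
    for name in sorted(set(raw) - set(EXPECTED_FIELDS)):
        issues.append(f"invented field dropped: {name}")
    # Missing data is made explicit instead of being silently omitted.
    record = {name: raw.get(name) for name in EXPECTED_FIELDS}
    for name, value in record.items():
        if value is None:
            issues.append(f"missing field: {name}")
    return record, issues
```

Returning the issue list alongside the record keeps the output auditable: the stored record always has the same shape, and every deviation is visible.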


Application To Experimentation Brain

Experimentation Brain should test prompt performance like any other system experiment.

Experimentation should test:

  • model choice
  • prompt version
  • example count
  • output format
  • chain design
  • cost
  • latency
  • accuracy
  • consistency

Experimentation Brain Rule

Prompt changes should be treated as experiments when they affect important outputs.
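
Consistency is one of the easier items on this list to quantify. As a hedged example, the sketch below measures it as the share of repeated runs that return the most common output. This metric is an assumption and suits short classification outputs. Free-text outputs would need a different comparison. The run data is invented for illustration.

```python
from collections import Counter


def consistency_rate(outputs: list[str]) -> float:
    """Share of runs that produced the single most common output."""
    if not outputs:
        raise ValueError("at least one output is required")
    # Ignore case and surrounding whitespace so trivial differences do not count.
    normalized = [text.strip().lower() for text in outputs]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Hypothetical classification labels from five runs of each prompt version.
runs_v1 = ["Lead", "lead", "Customer", "Lead", "Spam"]
runs_v2 = ["Lead", "Lead", "Lead", "Lead", "Customer"]

rate_v1 = consistency_rate(runs_v1)  # 3 of 5 runs agree
rate_v2 = consistency_rate(runs_v2)  # 4 of 5 runs agree
```

Comparing the two rates on the same test inputs turns a prompt change into a measurable experiment rather than a judgment call.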


Application To Compliance Brain And Risk Brain

Compliance Brain and Risk Brain should review prompts that affect sensitive outputs.

Review prompts for:

  • claims
  • privacy
  • regulated topics
  • financial assumptions
  • health claims
  • legal claims
  • affiliate claims
  • client data use
  • AI processing risk
  • hallucination risk

Compliance Rule

Sensitive prompts need compliance-aware constraints and review.


Application To HeadOffice Brain

HeadOffice governs prompt quality across MWMS.

HeadOffice should ask:

  • is this prompt reusable?
  • is it tested?
  • is it versioned?
  • is it observable?
  • is it too expensive?
  • is it reliable enough?
  • does it create risk?
  • does it support MWMS strategy?
  • does it protect M from unnecessary rework?

HeadOffice Rule

HeadOffice must prevent MWMS from building on unstable prompt foundations.


Deferred Update And Parking Lot Section

This page creates the following later update needs for related pages.

Later Update 1: MWMS Prompting Framework

Add:

  • prompt assets versus prompt liabilities
  • one-shot prompt standard for automations
  • prompt deconstruction and chaining
  • tell and show examples
  • import method
  • anti-keyword staining
  • model selection testing
  • prompt iteration logs

Later Update 2: MWMS AI Employee Evaluation Scorecard Standard

Add:

  • prompt reliability score
  • output consistency score
  • model fit score
  • prompt iteration count
  • example coverage score
  • formatting reliability score
  • failure-case testing
  • prompt chain quality

Later Update 3: MWMS AI Observability Metadata Standard

Add:

  • prompt version
  • prompt chain step
  • model used
  • input token estimate
  • output token estimate
  • cost
  • latency
  • retry count
  • validation status
  • human review flag
  • revision history
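
As an illustration of what these fields could look like once captured, the sketch below models one prompt run as a record and serializes it as a single JSON line. The class name, field types, units (US dollars, seconds), and default values are assumptions, not a defined MWMS schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PromptRunRecord:
    prompt_version: str
    chain_step: str
    model_used: str
    input_token_estimate: int
    output_token_estimate: int
    cost_usd: float             # assumed unit
    latency_seconds: float      # assumed unit
    retry_count: int = 0
    validation_status: str = "not_validated"
    human_review_flag: bool = False
    revision_history: list[str] = field(default_factory=list)


def to_log_line(record: PromptRunRecord) -> str:
    """Serialize one run as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical values for one run.
record = PromptRunRecord(
    prompt_version="v1.2",
    chain_step="outline",
    model_used="example-model",
    input_token_estimate=850,
    output_token_estimate=300,
    cost_usd=0.004,
    latency_seconds=2.1,
)
log_line = to_log_line(record)
```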

Later Update 4: MWMS AI Usage And Cost Visibility Standard

Add:

  • prompt chain cost tracking
  • high-volume prompt review
  • context duplication warning
  • stacking versus deconstruction cost comparison
  • model cost comparison
  • cost per successful output
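
Cost per successful output can be read as total spend across all runs, including failed runs and retries, divided by the number of outputs that passed validation. That definition is an assumption made for this sketch, and the figures are invented.

```python
def cost_per_successful_output(runs: list[tuple[float, bool]]) -> float:
    """runs holds (cost, passed_validation) pairs; failed runs still cost money."""
    total_cost = sum(cost for cost, _ in runs)
    successes = sum(1 for _, passed in runs if passed)
    if successes == 0:
        raise ValueError("no successful outputs to divide by")
    return total_cost / successes


# Hypothetical: four runs at 0.02 each, one of which failed validation.
runs = [(0.02, True), (0.02, False), (0.02, True), (0.02, True)]
unit_cost = cost_per_successful_output(runs)  # 0.08 total spread over 3 successes
```

Under this definition an unreliable prompt looks more expensive than its per-call price suggests, which is the effect the metric is meant to expose.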

Later Update 5: MWMS Buyer First Authority Content And Channel Growth Framework

Add:

  • proven content deconstruction
  • scrape/analyze/repurpose workflow
  • example-led content prompting
  • caption/video/script prompt flows
  • validate manually before automation
  • output quality review against real performance data

Later Update 6: MWMS AIBS Business Diagnostic And Opportunity Discovery Framework

Add:

  • diagnostic prompt chain
  • client intake parsing prompt
  • opportunity scoring prompt
  • AI readiness prompt
  • report generation prompt
  • privacy-aware prompt constraints

Future Employee Ideas

  • Prompt Architecture Auditor
  • Prompt Chain Designer
  • Prompt Quality Evaluator
  • Prompt Cost And Latency Analyst
  • Specialist Knowledge Injector
  • Prompt Observability Steward
  • AI Employee Prompt Reviewer
  • Prompt Failure Mode Analyst
  • Model Selection Tester
  • Prompt Asset Librarian

Drift Protection

This framework protects MWMS from:

  • treating prompts as casual chat messages
  • building automations on vague prompts
  • relying on one good output as proof of reliability
  • ignoring prompt failures
  • not versioning prompts
  • not testing models
  • not tracking cost
  • not tracking latency
  • creating prompt chains with no structure
  • putting too much into one prompt
  • splitting prompts unnecessarily
  • relying on the model's generic training data when specialist knowledge is needed
  • forgetting examples
  • creating outputs that cannot be parsed
  • making AI Employees unreliable
  • forcing M to fix prompt-driven mistakes manually
  • deploying client-facing prompts before testing
  • losing prompt knowledge inside chat history

Drift Signals

Watch for:

  • “Just ask ChatGPT to do it.”
  • “The prompt worked once, so it is ready.”
  • “We can fix it manually later.”
  • “No need to version it.”
  • “The model should know what I mean.”
  • “The prompt is long, so it must be good.”
  • “The prompt is short, so it must be efficient.”
  • “We do not need examples.”
  • “We do not know which model is being used.”
  • “We do not know what this prompt costs.”
  • “The automation output changes every time.”
  • “The output format keeps breaking.”
  • “The prompt lives only in a chat thread.”
  • “The AI Employee is unreliable but nobody knows why.”

Rule

If the prompt cannot be tested, versioned, and explained, it is not ready for serious automation.


Strategic Summary

This framework captures the most useful lessons from the Master Prompting w Devin block.

The key lesson is:

Prompt engineering for automations is not about clever wording. It is about designing reliable prompt systems.

MWMS should treat prompts as assets that can be:

  • built
  • tested
  • improved
  • chained
  • versioned
  • scored
  • logged
  • monitored
  • reused
  • governed

The block showed that powerful AI automation depends on:

  • one-shot prompts for repeatability
  • atomic prompts for small tasks
  • compound prompts for complex tasks
  • deconstruction for quality
  • stacking for efficiency
  • tell and show examples
  • imported specialist knowledge
  • planning before output
  • anti-keyword staining where generic terms hurt quality
  • chaining outputs across steps
  • model testing
  • daily practice and iteration

For MWMS, this strengthens every Brain.

It improves AI Employees.

It improves course absorption.

It improves content production.

It improves AIBS diagnostics.

It improves automation reliability.

It improves observability and cost control.

The most important system-level standard is:

Every important AI automation must have prompt architecture, not just a prompt.


Final Standard

The MWMS final standard is:

Every important MWMS prompt used in an AI Employee, automation, report, content system, diagnostic system, research system, or client-facing workflow must be designed as a reusable prompt asset with clear purpose, structured inputs, imported context where needed, specific guidelines, examples, defined output format, model testing, iteration history, cost and latency awareness, failure handling, and observability metadata.

A valid MWMS prompt asset must define:

  • prompt name
  • purpose
  • Brain / Employee
  • workflow
  • prompt type
  • input variables
  • context
  • guidelines
  • examples
  • output format
  • model
  • test inputs
  • quality criteria
  • cost / latency notes
  • failure modes
  • version
  • owner
  • last reviewed date

That is the MWMS Prompt Architecture And Automation Output Reliability standard.


Change Log

Version: v1.0

Date: 2026-06-08
Author: HeadOffice

Change:
Created the MWMS Prompt Architecture And Automation Output Reliability Framework from the AI Automations by Jack Master Prompting And Prompt System Design Block.

Captured the strongest lessons from:

  • Master Prompting w Devin Part 1
  • Master Prompting w Devin Part 2

Defined the MWMS Prompt Architecture And Automation Output Reliability Model with twelve layers:

  1. Prompt Purpose Layer
  2. Prompt Type Layer
  3. Input And Variable Layer
  4. Context And Knowledge Layer
  5. Guideline And Constraint Layer
  6. Example And Tell And Show Layer
  7. Deconstruction And Chain Layer
  8. Output Format Layer
  9. Model Selection Layer
  10. Testing And Iteration Layer
  11. Cost Latency And Scale Layer
  12. Observability And Governance Layer

Added key operating sections:

  • Prompt Asset Standard
  • Prompt Liability Warning
  • Atomic Prompt Standard
  • Compound Prompt Standard
  • Deconstruction Method Standard
  • Stacking Method Standard
  • Tell And Show Method Standard
  • Import Method Standard
  • Planning Method Standard
  • Anti Keyword Staining Standard
  • Prompt Chain Standard
  • Model Testing Standard
  • Prompt Iteration Log
  • Prompt Quality Scorecard
  • Automation Prompt Readiness Checklist
  • Content Prompt Flow Standard
  • AIBS Diagnostic Prompt Flow Standard
  • Research Prompt Flow Standard
  • Compliance Prompt Flow Standard
  • Prompt Failure Modes
  • Prompt Debugging Checklist
  • Prompt Governance Roles
  • Deferred Update And Parking Lot Section

Mapped the framework across:

  • Prompting Framework
  • AI Employee Canon
  • Automation Brain
  • AIBS Brain
  • Content Brain
  • Ads Brain
  • Research Brain
  • Data Brain
  • Experimentation Brain
  • Compliance Brain
  • Risk Brain
  • HeadOffice Brain

Purpose of creation:
To establish a formal MWMS standard for designing, testing, chaining, versioning, and governing prompt systems so MWMS AI Employees and automations produce reliable, consistent, cost-aware, observable, and high-quality outputs.

END — MWMS PROMPT ARCHITECTURE AND AUTOMATION OUTPUT RELIABILITY FRAMEWORK v1.0