跳转至

CloudAudioAI Simplified Agent Architecture - Overall Design (v1.1)

Executive Summary

This design document outlines a pragmatic, minimal transformation of CloudAudioAI's prompt system to improve maintainability and accuracy. Version 1.1 simplifies the approach further by keeping all prompt files under site paths initially, with a clear migration path to franchise-level sharing when scale demands it. The core improvements remain: simple prompt modularization (6 text files), dynamic staff registry, and Bedrock tool-based validation without requiring schema.json files.

Design Philosophy

"Make the minimum changes that unlock maximum future flexibility"

  • Reuse existing S3 structure and Lambda functions
  • Add only essential new components
  • Keep backward compatibility
  • Defer complexity until proven necessary

Current State Analysis

What We Keep (No Changes)

  • S3 Configuration Pipeline: Current prompt/schema/mapping structure works well
  • Multi-Tenant Support: Existing franchise/site routing is sufficient
  • Processing Pipeline: TranscribeProcessor → TranscriptionResultProcessor flow unchanged
  • DynamoDB Tables: call-events and call-analysis remain as-is

What Needs Improvement (Targeted Changes)

  • Monolithic Prompt: Split into 4-5 files for easier maintenance
  • Static Staff Lists: Move to per-tenant JSON file
  • No Structured Validation: Add Bedrock tool-calling for guaranteed schema compliance
  • No Customer Context: Add basic memory for returning customers (future enhancement)

Simplified v1.0 Architecture

What We're Building (3 Changes Only)

1. Simple Prompt Splitting (All Under Site Path)

Keep all files under site path for simplicity (migrate to franchise level later):

s3://{bucket}/{franchise}/{siteId}/prompts/
├── prompt-v5.txt                    # Keep as fallback
└── v6/                              # New split version
    ├── 01-franchise-core.txt        # Role + task + processing steps
    ├── 02-knowledge.txt             # Sales + cancellation + freeze
    ├── 03-coaching.txt              # All coaching scenarios
    ├── 04-staff-list.txt            # Site-specific staff names
    ├── 05-studio-output-instructions.txt  # Output requirements
    └── 06-custom-requirements.txt  # Local business context

Note: No schema.json needed - Pydantic models define the schema

Simple Loader Logic (20 lines):

# In TranscriptionResultProcessor, minimal change:
def load_prompt(franchise, site_id, version="v6"):
    if version == "v6":
        # Just concatenate the 6 files in order - all from same site path
        base_path = f"{franchise}/{site_id}/prompts/v6"
        files = ["01-franchise-core", "02-knowledge", "03-coaching",
                "04-staff-list", "05-studio-output-instructions",
                "06-custom-requirements"]
        prompt = ""
        for filename in files:
            prompt += s3.get_object(f"{base_path}/{filename}.txt")
        return prompt
    else:
        # Fallback to monolithic prompt
        return s3.get_object(f"{franchise}/{site_id}/prompts/prompt-{version}.txt")

2. Dynamic Staff Registry (Integrated in Prompt File)

Option A (Recommended for v1.0): Include staff directly in 04-staff-list.txt

s3://{bucket}/{franchise}/{siteId}/prompts/v6/04-staff-list.txt

Content:
KNOWN STAFF MEMBERS:
- Drew (Sales Manager)
- Nadia (Head Coach)
- Jess (Front Desk Lead)
- Holly (Assistant Manager)
- Ryanne (Sales Associate)

STAFF NAME CLARIFICATIONS:
- "Ryan" and "Ryanne" are different people
- "Jessica" is the same person as "Jess"

Option B (Future v1.1): Separate JSON for programmatic updates

s3://{bucket}/{franchise}/{siteId}/staff/staff-registry.json
{
  "known_staff": ["Drew", "Nadia", "Jess", "Holly", "Ryanne"],
  "aliases": {"Ryan": "Ryanne", "Jessica": "Jess"},
  "updated": "2025-01-15"
}

For v1.0, use Option A (simpler, no JSON parsing needed)

3. Bedrock Tool-Based Validation (Single Tool for v1.0)

Use ONE Bedrock tool with existing Pydantic schema:

Existing models.py - No changes needed, already complete Existing bedrock_call.py - Works as-is with one tool

Important: v1.0 uses single tool with concatenated prompts (single pass) Future v2.0 will use multiple tools with different schemas per industry

Integration in Lambda:

# Replace current Bedrock text invocation with tool call
from models import CallAnalysis
from bedrock_call import ask_claude_for_analysis

# Old way (remove):
# response = bedrock.invoke_model(prompt=prompt)

# New way (add):
result = ask_claude_for_analysis(transcript, context=memory_context)
# Result is already validated by Pydantic!

Simple Retry Logic:

try:
    result = ask_claude_for_analysis(transcript)
except ValidationError as e:
    # One retry with error feedback
    result = ask_claude_for_analysis(transcript, validation_errors=str(e))
    if still_fails:
        # Log to DLQ for manual review
        send_to_dlq(transcript, errors=e)

Implementation Plan

Data Flow (Minimal Changes)

Current Flow (Unchanged):
Webhook → SQS → TranscribeProcessor → S3 → TranscriptionResultProcessor → DynamoDB

What Changes in TranscriptionResultProcessor:
1. Load split prompt files instead of monolithic
2. Load staff registry JSON
3. Call Bedrock with tool instead of text
4. (Future) Add memory lookup/save

Simple Implementation Steps

Week 1: Prompt Splitting & Testing

  1. Split prompt-v5.txt into 6 files (including staff list as text)
  2. Upload all 6 to S3 under {franchise}/{siteId}/prompts/v6/
  3. Test concatenation produces valid prompt
  4. Deploy with feature flag (PROMPT_VERSION=v6)

Week 2: Tool Integration

  1. Copy existing models.py and bedrock_call.py to Lambda
  2. Replace Bedrock text call with tool call
  3. Test on recorded transcripts
  4. Verify Pydantic validation works (no schema.json needed)

Week 3: Production Rollout

  1. Enable for orange-theory/5736520
  2. Monitor validation success rate
  3. Enable for orange-theory/Totowa
  4. Roll out to future franchises as onboarded

Benefits of Simplified Approach

Immediate Wins: - 30% faster prompt updates - Edit specific sections without searching 800 lines - Simple implementation - Only ~20 lines of code changes - 100% schema compliance - Tool validation prevents malformed JSON - No schema.json complexity - Pydantic models ARE the schema - Easier debugging - Validation errors clearly show what failed

Future Ready: - All files under site path now, easy to migrate to franchise level later - Tool-based approach enables future multi-tool scenarios - Text-based staff list can evolve to JSON/database later - Foundation for memory system (when needed)

Success Metrics (v1.0)

Must-Have Metrics

  • Tool Validation Success: >95% first-attempt (currently ~85% with text)
  • Processing Time: No regression from current 25s average
  • Staff Identification: >90% accuracy (currently ~75%)
  • Zero Downtime: Seamless rollout with fallback

Nice-to-Have (Future)

  • Memory context hit rate (when implemented)
  • Module reuse percentage (when more franchises added)
  • Prompt update frequency (measure maintenance improvement)

Risk Mitigation

Minimal Risks for v1.0

  1. Tool Compatibility: Already tested with DeepSeek R1 and Claude
  2. Backward Compatibility: Keep prompt-v5.txt as instant fallback
  3. Performance: Concatenating 6 files adds <100ms overhead
  4. Simplicity: All files in one path, no complex resolution logic
  5. No JSON parsing: Staff list is plain text in prompt file

Future Enhancement Path

v1.0 → v1.1 (Memory Addition)

When ready for memory (not blocking v1.0):

# Simple DynamoDB table: call-context-memory
PK: customer_phone_number
SK: telephonySessionId
Attributes: {last_outcome, follow_up_flag, customer_type}

# Two functions only:
get_context(phone)  returns last 3 interactions
save_context(phone, session_id, outcome)  stores new interaction

v1.0 → v1.5 (Batch Updates for Scaling - Better Approach)

Keep all files at site level for isolation, use Lambda for batch updates:

# All files stay at site level for testing isolation
s3://bucket/orange-theory/{siteId}/prompts/v6/
├── 01-franchise-core.txt    # Updated via batch Lambda
├── 02-knowledge.txt         # Updated via batch Lambda
├── 03-coaching.txt          # Updated via batch Lambda
├── 04-staff-list.txt        # Site-specific manual
├── 05-studio-output.txt     # Site-specific manual
└── 06-custom-requirements.txt # Site-specific manual

# Batch update tool for franchise-wide changes
def update_all_sites(franchise, updates):
    # Test on one site first
    test_on_pilot_site()
    validate_results()

    # Gradual rollout with monitoring
    for batch in site_batches:
        update_batch()
        monitor_for_errors()
        if errors:
            rollback()

# Benefits:
# - Test changes safely on one site
# - Damage control if something breaks
# - No complex dual-path loading logic
# - Simple rollback to previous version

v1.5 → v2.0 (Multi-Industry Support)

Leveraging existing client-configurations DynamoDB table:

# Existing table already has everything needed!
client-configurations:
  - client_id: "orange-theory-5736520"   "wellsfargo-nyc-001"
  - franchise: "orange-theory"           "wellsfargo"
  - config_versions: {prompt: "v6"}      {prompt: "banking-v1"}
  - industry: "fitness"                  "banking" (NEW FIELD)

# Industry-specific Pydantic models (not prompts)
models/fitness/CallAnalysis.py    Gym-specific fields
models/banking/CallAnalysis.py    Banking-specific fields
models/healthcare/CallAnalysis.py  Medical-specific fields

No new infrastructure needed - just configuration!

v2.0 → v3.0 (Advanced Features - Multiple Tools Architecture)

PREFERRED FUTURE DESIGN: Multiple Bedrock tools with different schemas

# Each industry/call type gets its own tool with unique schema
tools = [
    {"toolSpec": fitness_sales_tool},      # Schema: intro_type, membership_offered, etc.
    {"toolSpec": fitness_retention_tool},  # Schema: cancellation_reason, save_successful, etc.
    {"toolSpec": banking_service_tool}     # Schema: account_number, transaction_type, etc.
]
# AI selects ONE tool based on transcript content
# Returns only that tool's specific fields

Why Multiple Tools (Not Multi-Turn): - Each tool has completely different fields - AI intelligently selects appropriate tool - Clean separation between industries - Single pass remains deterministic

Architectural Principle: Deterministic Core Pipeline

Core Design Decision

The transcription → AI analysis pipeline remains a standalone, deterministic workflow that: - Processes every call identically - Has no external dependencies beyond AWS services - Never changes its fundamental flow - Provides consistent, reliable results to clients

Future Agent Architecture: Separate Orchestrator

When implementing the 7 AI agent foundations (Memory, Tools, Control, Feedback, etc.), they will live in a separate AgentOrchestrator Lambda that:

Core Pipeline (Unchanged):
Webhook → TranscribeProcessor → TranscriptionResultProcessor → DynamoDB
         [Deterministic, reliable, simple]

Agent Layer (New, Optional):
User Query → API Gateway → AgentOrchestrator
                    [Memory | Tools | Control | Feedback]
                    Can read DynamoDB results from core
                    Can trigger additional analyses
                    Can manage conversation state

This separation means: - Core pipeline never risks breaking from new features - Agent capabilities can evolve independently - Clients get stable service while we innovate - Easy to disable agent features without affecting core

Why This Design is Future-Proof

  1. S3 Structure: Adding more files/folders doesn't break existing
  2. Tool Interface: Single tool today, orchestrator can add more tomorrow
  3. Staff Registry: JSON today, DynamoDB tomorrow - same interface
  4. Clean Separation: Core pipeline stable, agent layer flexible

Decision Log

What We Explicitly Defer to Future Versions

  • Franchise-level sharing: Keep all files under site until 10+ sites
  • Module manifests: Over-engineering for 6 files
  • Composition engine: Simple concatenation sufficient
  • Multiple tools: One tool handles all current needs
  • Staff JSON registry: Text list in prompt file is simpler
  • Memory aggregation: Start with raw storage, analyze patterns first
  • A/B testing: Manual rollout sufficient for 2 sites
  • New Lambda functions: Modify existing ones minimally
  • schema.json files: Pydantic models provide the schema

Why This Approach Works

  • 80/20 Rule: 80% of benefits with 20% of complexity
  • Proven Components: Reusing working code reduces risk
  • Incremental: Each change can be tested independently
  • Reversible: Can rollback any component without affecting others