
CloudAudioAI Architecture Evolution Diagrams

%% This document visualizes the evolution from the current monolithic system to v1.0 with Pydantic validation, and then to the future AgentOrchestrator architecture %%

1. Current System Architecture (Production Today)

graph TB
    subgraph "Telephony Providers"
        RC[RingCentral]
        TW[Twilio]
    end

    subgraph "AWS Infrastructure - Current State"
        subgraph "Entry & Queue"
            API[API Gateway<br/>**Webhook Handler**]
            SQS[SQS Queue<br/>**5min delay**]
            DLQ[Dead Letter Queue]
        end

        subgraph "Processing Pipeline"
            TP[TranscribeProcessor<br/>Lambda]
            TRP[TranscriptionResultProcessor<br/>Lambda]
        end

        subgraph "Storage"
            S3[(S3 Bucket<br/>Recordings/Transcripts/Analysis)]
            DB[(DynamoDB<br/>call-events & call-analysis)]
        end

        subgraph "Configuration Files"
            CF[**MONOLITHIC**<br/>prompt-v5.txt<br/>800+ lines<br/>Static staff list]
        end

        subgraph "AI Services"
            TR[AWS Transcribe]
            BR[AWS Bedrock<br/>**Text Generation Mode**<br/>❌ No validation]
        end

        subgraph "Frontend"
            BFF[BFF API Lambda]
            VUE[Vue3 Dashboard]
        end
    end

    RC -->|Webhook| API
    TW -->|Webhook| API
    API -->|Queue| SQS
    SQS -->|Process| TP
    SQS -.->|Failed| DLQ

    TP -->|Download audio| S3
    TP -->|Transcribe| TR
    TR -->|Store transcript| S3

    S3 -->|S3 Event| TRP
    CF -->|Load prompt| TRP
    TRP -->|Generate analysis| BR
    BR -->|**Unstructured JSON**| TRP
    TRP -->|Store results| S3
    TRP -->|Update status| DB

    DB -->|Query| BFF
    BFF -->|API| VUE

    style BR fill:#ff9999
    style CF fill:#ffcc99

Current System Characteristics

  • Monolithic Prompt: Single 800+ line prompt file
  • Static Staff List: Hardcoded in prompt
  • Unstructured Output: Bedrock returns JSON as text, prone to malformation
  • No Validation: ~15% of responses have missing/invalid fields
  • Manual Updates: Requires code deployment for staff changes
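
To make the "Unstructured Output" and "No Validation" points concrete, here is a minimal sketch of what consuming a text-mode response involves. The required field names and the `check_text_response` helper are assumptions for illustration only; the real analysis schema is not shown in this document.

```python
import json

# Hypothetical required fields; the real analysis schema is not shown in this document.
REQUIRED_FIELDS = ("call_type", "staff_name", "follow_up_needed")


def check_text_response(raw: str) -> list[str]:
    """Return a list of problems found in a text-mode model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Truncated or otherwise malformed JSON fails here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    # Well-formed JSON can still be missing fields the dashboard expects.
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]


good = '{"call_type": "sales", "staff_name": "Alex", "follow_up_needed": "yes"}'
truncated = '{"call_type": "sales", "staff_name": "Al'
incomplete = '{"call_type": "sales"}'

print(check_text_response(good))        # no problems
print(check_text_response(truncated))   # one "not valid JSON" problem
print(check_text_response(incomplete))  # two "missing field" problems
```

In text generation mode nothing upstream performs these checks, so every consumer of the stored analysis has to defend against both failure types.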

2. New v1.0 Architecture (With Pydantic Validation)

graph TB
    subgraph "Telephony Providers"
        RC[RingCentral]
        TW[Twilio]
    end

    subgraph "AWS Infrastructure - v1.0 Changes"
        subgraph "Entry & Queue - **Unchanged**"
            API[API Gateway<br/>Webhook Handler]
            SQS[SQS Queue<br/>5min delay]
            DLQ[Dead Letter Queue]
        end

        subgraph "Processing Pipeline - **Enhanced**"
            TP[TranscribeProcessor<br/>Lambda - **Unchanged**]
            TRP[TranscriptionResultProcessor<br/>**+ Pydantic Models**<br/>**+ Tool Calling**]
        end

        subgraph "Storage - **Unchanged**"
            S3[(S3 Bucket)]
            DB[(DynamoDB)]
        end

        subgraph "Configuration - **Modularized (All Site-Level)**"
            subgraph "{franchise}/{siteId}/prompts/v6/"
                P1[01-franchise-core.txt<br/>Role & Task]
                P2[02-knowledge.txt<br/>Sales/Cancel]
                P3[03-coaching.txt<br/>Scenarios]
                P4[04-staff-list.txt<br/>**Staff Names (Text)**]
                P5[05-studio-output.txt<br/>**Taxonomy & Format**]
                P6[06-custom-requirements.txt<br/>Local Context]
            end
        end

        subgraph "AI Services - **Tool Mode**"
            TR[AWS Transcribe]
            BR[AWS Bedrock<br/>**Tool Calling Mode**<br/>✅ With Pydantic Schema]
        end

        subgraph "Validation Layer - **NEW**"
            PY[**Pydantic CallAnalysis Model**<br/>Enforced Schema<br/>Type Validation]
        end

        subgraph "Frontend - **Unchanged**"
            BFF[BFF API Lambda]
            VUE[Vue3 Dashboard]
        end
    end

    RC -->|Webhook| API
    TW -->|Webhook| API
    API -->|Queue| SQS
    SQS -->|Process| TP
    SQS -.->|Failed + Retry| DLQ

    TP -->|Download audio| S3
    TP -->|Transcribe| TR
    TR -->|Store transcript| S3

    S3 -->|S3 Event| TRP
    P1 & P2 & P3 & P4 & P5 & P6 -->|**Concatenate All 6**| TRP
    P4 -->|**Staff in Text File**| TRP
    TRP -->|**Tool Request**| BR
    BR -->|**Structured Output**| PY
    PY -->|**Validated**| TRP
    TRP -.->|**Retry on Error**| BR
    TRP -->|Store results| S3
    TRP -->|Update status| DB

    DB -->|Query| BFF
    BFF -->|API| VUE

    style BR fill:#99ff99
    style PY fill:#99ccff
    style P4 fill:#ffffcc

v1.0 Improvements

  • Split Prompts: 6 modular text files for easier maintenance
  • Staff in Prompt File: Plain text list in 04-staff-list.txt, no JSON parsing
  • Tool-Based Validation: Bedrock returns its analysis as a tool call whose schema comes from the Pydantic model
  • >95% Valid Output: Pydantic validation with self-correction retry
  • No schema.json: Pydantic models define all validation rules
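
The validate-then-retry loop described above could look roughly like the following sketch. It assumes Pydantic v2. The fields on `CallAnalysis` are an illustrative subset rather than the real model, and `invoke_tool` is a hypothetical stand-in for the Bedrock tool call.

```python
from typing import Callable, Literal

from pydantic import BaseModel, Field, ValidationError


class CallAnalysis(BaseModel):
    """Illustrative subset of fields; the real CallAnalysis model is not shown here."""

    call_type: Literal["sales", "cancel", "service", "other"]
    staff_name: str = Field(min_length=1)
    follow_up_needed: Literal["yes", "no"]
    summary: str = Field(min_length=1)


def analyze_with_retry(
    invoke_tool: Callable[[str], dict],
    prompt: str,
    max_attempts: int = 3,
) -> CallAnalysis:
    """Call the model, validate its tool input, and feed errors back on failure."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        payload = invoke_tool(current_prompt)
        try:
            return CallAnalysis.model_validate(payload)
        except ValidationError as exc:
            if attempt == max_attempts:
                raise
            # Self-correction: tell the model exactly which fields failed.
            current_prompt = f"{prompt}\n\nYour previous output was invalid:\n{exc}"
    raise ValueError("max_attempts must be at least 1")


def fake_invoke(prompt_text: str) -> dict:
    """Stand-in for Bedrock: invalid on the first try, valid once it sees the error."""
    payload = {
        "call_type": "sales",
        "staff_name": "Alex",
        "follow_up_needed": "maybe",  # not an allowed value
        "summary": "Intro class booked.",
    }
    if "previous output was invalid" in prompt_text:
        payload["follow_up_needed"] = "yes"
    return payload


result = analyze_with_retry(fake_invoke, "Analyze this transcript.")
print(result.follow_up_needed)
```

Because the model class is the single source of validation rules, the same class can also produce the JSON Schema sent with the tool definition, which is what makes a separate schema.json unnecessary.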

3. Future Architecture with AgentOrchestrator (v2.0+)

graph TB
    subgraph "Core Pipeline - Deterministic & Unchanged"
        subgraph "Telephony"
            RC[RingCentral]
            TW[Twilio]
        end

        API[API Gateway]
        SQS[SQS Queue]
        TP[TranscribeProcessor]
        TRP[TranscriptionResultProcessor<br/>with Tool Validation]
        S3[(S3 Bucket)]
        DB[(DynamoDB<br/>call-events<br/>call-analysis)]
    end

    subgraph "New Agent Layer - Flexible & Innovative"
        subgraph "Orchestrator Components"
            AO[AgentOrchestrator Lambda<br/>Command Router]

            subgraph "Seven Foundations"
                MEM[Memory System<br/>Customer Context]
                TOOLS[Tool Catalog<br/>Multiple Schemas]
                CTRL[Control Flow<br/>Decision Engine]
                FDBK[Feedback Loop<br/>Human Input]
                INTEL[Intelligence<br/>Prompt Library]
                VAL[Validation<br/>Multi-Schema]
                REC[Recovery<br/>Error Handling]
            end
        end

        subgraph "New Tools"
            FUT[Follow-Up Checklist Tool]
            DIG[Staff Digest Tool]
            PERF[Performance Report Tool]
            CHAT[Chat Interface Tool]
            AGG[Analytics Aggregation Tool]
        end

        subgraph "New Storage"
            CTX[(Context Memory DB<br/>Customer History)]
            TASK[(Task Queue<br/>Follow-ups)]
            SR2[(Staff Registry DB)]
            CC2[(Client Config DB)]
        end

        UAPI[User API Gateway<br/>Chat/Query Interface]
        USR[Users/Managers]
    end

    RC & TW --> API
    API --> SQS --> TP --> TRP --> DB

    USR -->|Query| UAPI
    UAPI --> AO

    AO <--> MEM
    AO <--> TOOLS
    AO <--> CTRL
    AO <--> FDBK

    AO -->|Execute| FUT & DIG & PERF & CHAT & AGG
    FUT & DIG & PERF -->|Return JSON| AO

    DB -.->|Read Only| AO
    AO <-->|get/save context| CTX
    AO -->|Queue tasks| TASK
    AO -.->|Read staff/aliases| SR2
    AO -.->|Read industry/config| CC2
    AO -->|Store outputs| S3

    style AO fill:#ff99ff
    style MEM fill:#99ffcc
    style TOOLS fill:#99ffcc

Future Architecture Benefits

  • Separation of Concerns: Core pipeline remains stable while agent layer innovates
  • No Risk to Core: Agent features can fail without affecting call processing
  • Flexible Tools: Different tools for different industries/use cases
  • Memory System: Track customer interactions across calls
  • Human Feedback: Incorporate manager corrections into future analyses

4. Detailed Tool Flow in Future Architecture

sequenceDiagram
    participant U as User/Manager
    participant API as API Gateway
    participant AO as AgentOrchestrator
    participant DB as DynamoDB
    participant CONFIG as Client Config
    participant SR as Staff Registry
    participant MEM as Memory System
    participant BR as Bedrock AI
    participant S3 as S3 Storage

    U->>API: "Generate follow-up checklist"
    API->>AO: Route Command

    Note over AO: Load tenant configuration
    AO->>CONFIG: Get industry/config
    CONFIG-->>AO: Return industry type

    AO->>SR: Get staff list/aliases
    SR-->>AO: Staff data

    AO->>DB: Query call-analysis<br/>(last 24h, follow_up_needed=yes)
    DB-->>AO: Return matching records

    AO->>MEM: Get customer context
    MEM-->>AO: Previous interactions

    Note over AO: Select tool based on industry
    AO->>BR: Execute tool with<br/>Industry-specific prompt<br/>+ Pydantic schema
    BR-->>AO: Validated JSON output

    Note over AO: Orchestrator persists output
    AO->>S3: Store checklist JSON
    AO->>MEM: Save interaction context

    AO-->>API: Return checklist
    API-->>U: Display results
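
The query and context steps in the sequence above can be sketched with in-memory data. In production the records would come from a DynamoDB query and the context from the Memory System; here both are plain Python structures. The `call_time` and `customer_id` field names and the `build_checklist_inputs` helper are assumptions, while `follow_up_needed` and the 24-hour window come from the diagram.

```python
from datetime import datetime, timedelta, timezone


def build_checklist_inputs(records, customer_context, now=None, window_hours=24):
    """Select recent calls that need follow-up and attach prior-interaction context."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    selected = []
    for record in records:
        call_time = datetime.fromisoformat(record["call_time"])
        if call_time >= cutoff and record["follow_up_needed"] == "yes":
            selected.append({
                **record,
                # Customers with no history get an empty list rather than an error.
                "previous_interactions": customer_context.get(record["customer_id"], []),
            })
    return selected


now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"customer_id": "c1", "call_time": "2024-05-02T09:00:00+00:00", "follow_up_needed": "yes"},
    {"customer_id": "c2", "call_time": "2024-05-02T10:00:00+00:00", "follow_up_needed": "no"},
    {"customer_id": "c3", "call_time": "2024-04-30T09:00:00+00:00", "follow_up_needed": "yes"},
]
context = {"c1": ["Asked about pricing on a previous call"]}

inputs = build_checklist_inputs(records, context, now=now)
print([item["customer_id"] for item in inputs])  # ['c1']
```

Only the filtered, context-enriched records would then be passed to Bedrock together with the industry-specific prompt and schema.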

5. Multi-Industry Tool Architecture (v2.0+)

graph LR
    subgraph "Industry Detection"
        CC[Client Config<br/>DynamoDB]
        IND{Industry<br/>Router}
    end

    subgraph "Fitness Tools"
        FT1[Sales Analysis Tool<br/>intro_type, package_offered]
        FT2[Retention Tool<br/>cancel_reason, save_attempt]
        FT3[Service Tool<br/>issue_type, resolved]
    end

    subgraph "Banking Tools"
        BT1[Account Service Tool<br/>account_type, transaction]
        BT2[Fraud Detection Tool<br/>risk_level, flags]
        BT3[Loan Application Tool<br/>loan_type, status]
    end

    subgraph "Healthcare Tools"
        HT1[Appointment Tool<br/>specialty, time_slot]
        HT2[Insurance Tool<br/>coverage, authorization]
        HT3[Prescription Tool<br/>medication, refills]
    end

    CC -->|franchise: orange-theory| IND
    CC -->|franchise: wellsfargo| IND
    CC -->|franchise: mercy-health| IND

    IND -->|fitness| FT1 & FT2 & FT3
    IND -->|banking| BT1 & BT2 & BT3
    IND -->|healthcare| HT1 & HT2 & HT3

    style FT1 fill:#ffcc99
    style BT1 fill:#99ccff
    style HT1 fill:#ccff99

Multi-Industry Benefits

  • Clean Separation: Each industry has unique fields and tools
  • No Cross-Pollution: Banking fields don't appear in fitness analyses
  • AI Tool Selection: Bedrock chooses among the tools offered for the tenant's industry
  • Single Pass: Still deterministic, not multi-turn conversations
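
A minimal sketch of the routing step follows, assuming the franchise-to-industry mapping is a simple lookup. The franchise names, industries, and field lists follow the diagram. The tool identifiers and the `tools_for_franchise` helper are hypothetical.

```python
# Franchise-to-industry mapping as it might be stored in Client Config.
CLIENT_CONFIG = {
    "orange-theory": "fitness",
    "wellsfargo": "banking",
    "mercy-health": "healthcare",
}

# Tool identifiers are hypothetical; the field lists follow the diagram.
TOOLS_BY_INDUSTRY = {
    "fitness": {
        "sales_analysis": ["intro_type", "package_offered"],
        "retention": ["cancel_reason", "save_attempt"],
        "service": ["issue_type", "resolved"],
    },
    "banking": {
        "account_service": ["account_type", "transaction"],
        "fraud_detection": ["risk_level", "flags"],
        "loan_application": ["loan_type", "status"],
    },
    "healthcare": {
        "appointment": ["specialty", "time_slot"],
        "insurance": ["coverage", "authorization"],
        "prescription": ["medication", "refills"],
    },
}


def tools_for_franchise(franchise: str) -> dict[str, list[str]]:
    """Return only the tools (and their fields) for the franchise's industry."""
    industry = CLIENT_CONFIG.get(franchise)
    if industry is None:
        raise KeyError(f"no client config for franchise {franchise!r}")
    return TOOLS_BY_INDUSTRY[industry]


print(sorted(tools_for_franchise("orange-theory")))
```

Because the model is only ever offered the tools returned for its tenant's industry, a banking field such as `risk_level` cannot appear in a fitness analysis.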

6. Configuration Evolution Path

graph TD
    subgraph "Current (Monolithic)"
        M1[prompt-v5.txt<br/>800 lines<br/>All-in-one]
    end

    subgraph "v1.0 (Split)"
        S1[01-franchise-core.txt]
        S2[02-knowledge.txt]
        S3[03-coaching.txt]
        S4[05-studio-output.txt]
        SR1[04-staff-list.txt]
    end

    subgraph "v1.1 (Dynamic)"
        D1[Staff Registry<br/>DynamoDB Table]
        D2[Memory Context<br/>DynamoDB Table]
    end

    subgraph "v2.0 (Industry Templates)"
        T1[fitness/templates/]
        T2[banking/templates/]
        T3[healthcare/templates/]
    end

    subgraph "v3.0 (Marketplace)"
        MP[Cross-Franchise<br/>Module Marketplace<br/>Shared Best Practices]
    end

    M1 -->|Split| S1 & S2 & S3 & S4
    M1 -->|Extract| SR1
    SR1 -->|Migrate| D1
    S1 -->|Add Memory| D2
    S1 & S2 & S3 & S4 -->|Templatize| T1 & T2 & T3
    T1 & T2 & T3 -->|Share| MP

    style M1 fill:#ffcccc
    style MP fill:#ccffcc

Key Architecture Decisions

1. Why Keep Core Pipeline Deterministic

  • Reliability: Every call processed identically
  • Predictability: No surprises in production
  • Performance: Consistent 25-second processing time
  • Debugging: Clear, linear flow

2. Why Separate AgentOrchestrator

  • Innovation Without Risk: New features don't affect core
  • Flexible Deployment: Can disable agent features instantly
  • Different SLAs: Core = 99.9% uptime, Agent = experimental
  • Cost Control: Agent features are optional add-ons
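
In the diagrams this isolation comes from running the orchestrator as its own Lambda. The same idea can be sketched at function level: an optional feature is gated by a flag and its failures are contained. The `run_agent_feature` wrapper, the feature functions, and the flag set are all hypothetical.

```python
import logging

logger = logging.getLogger("agent_layer")


def run_agent_feature(feature, enabled_features, *args, **kwargs):
    """Run an optional agent feature without letting its failure reach the caller."""
    name = feature.__name__
    if name not in enabled_features:
        # Removing a name from the flag set disables the feature immediately.
        return {"status": "disabled", "feature": name}
    try:
        result = feature(*args, **kwargs)
    except Exception:
        logger.exception("agent feature %s failed", name)
        return {"status": "failed", "feature": name}
    return {"status": "ok", "feature": name, "result": result}


def staff_digest(calls):
    return f"{len(calls)} calls summarized"


def performance_report(calls):
    raise RuntimeError("report backend unavailable")


enabled = {"staff_digest", "performance_report"}
calls = ["call-1", "call-2"]

print(run_agent_feature(staff_digest, enabled, calls)["status"])        # ok
print(run_agent_feature(performance_report, enabled, calls)["status"])  # failed
print(run_agent_feature(staff_digest, set(), calls)["status"])          # disabled
```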

3. Why Tool-Based Approach

  • Schema Enforcement: Output is validated against a declared schema before it is stored
  • Industry Separation: Different tools for different verticals
  • AI Intelligence: Let AI choose the appropriate tool
  • Future Proof: Easy to add new tools without changing architecture
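
On the request side, a tool is described to the model as a name, a description, and a JSON Schema. The sketch below builds such a definition as a plain dictionary. It assumes the Bedrock Converse API's `toolConfig` layout, which this document does not confirm is the API in use. The tool name, the schema fields, and the `build_tool_config` helper are hypothetical.

```python
import json


def build_tool_config(name: str, description: str, input_schema: dict) -> dict:
    """Build a toolConfig dict that asks the model to answer via one named tool."""
    return {
        "tools": [
            {
                "toolSpec": {
                    "name": name,
                    "description": description,
                    "inputSchema": {"json": input_schema},
                }
            }
        ],
        # Request this specific tool, for models that support a forced tool choice.
        "toolChoice": {"tool": {"name": name}},
    }


# Hand-written JSON Schema with hypothetical fields.
schema = {
    "type": "object",
    "properties": {
        "call_type": {"type": "string", "enum": ["sales", "cancel", "service", "other"]},
        "follow_up_needed": {"type": "string", "enum": ["yes", "no"]},
    },
    "required": ["call_type", "follow_up_needed"],
}

config = build_tool_config(
    "record_call_analysis",
    "Record the structured analysis of one call.",
    schema,
)
print(json.dumps(config["toolChoice"]))
```

Adding a new tool then means adding another entry of this shape, which is why the approach extends without architectural changes.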

Summary

The architecture evolution from current state → v1.0 → future maintains clear principles:

Core Design Principles

  • Core pipeline stays deterministic (reliable call processing)
  • Innovation happens in separate layer (AgentOrchestrator)
  • Tools provide structure (Pydantic validation)
  • Configuration drives behavior (no code changes for business rules)
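
As an illustration of configuration-driven behavior, the v1.0 prompt could be assembled from the six site-level files as sketched below, so a staff change is a text-file edit rather than a code change. The file names come from the v1.0 diagram. The `load_prompt` helper, the `site-001` identifier, and the use of a local directory (rather than wherever the files actually live) are assumptions.

```python
import tempfile
from pathlib import Path

# File names as listed in the v1.0 diagram, in concatenation order.
PROMPT_FILES = (
    "01-franchise-core.txt",
    "02-knowledge.txt",
    "03-coaching.txt",
    "04-staff-list.txt",
    "05-studio-output.txt",
    "06-custom-requirements.txt",
)


def load_prompt(prompt_dir: Path) -> str:
    """Concatenate the six site-level prompt files in their numbered order."""
    parts = []
    for name in PROMPT_FILES:
        path = prompt_dir / name
        if not path.is_file():
            # Fail loudly instead of sending an incomplete prompt to the model.
            raise FileNotFoundError(f"missing prompt module: {path}")
        parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)


# Demo with a throwaway directory laid out as {franchise}/{siteId}/prompts/v6/.
with tempfile.TemporaryDirectory() as tmp:
    site_dir = Path(tmp) / "franchise" / "site-001" / "prompts" / "v6"
    site_dir.mkdir(parents=True)
    for name in PROMPT_FILES:
        (site_dir / name).write_text(f"## {name}\n", encoding="utf-8")
    # Updating staff means editing this one file; no deployment is involved.
    (site_dir / "04-staff-list.txt").write_text("Staff: Alex, Sam\n", encoding="utf-8")
    prompt = load_prompt(site_dir)

print(prompt)
```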

Key Architectural Decisions Visualized

| Aspect | Current | v1.0 | Future (v2.0+) |
|---|---|---|---|
| Prompt Structure | Monolithic, 800+ lines | 6 split text files | Industry templates |
| Staff Management | Hardcoded in prompt | Text file in prompt | DynamoDB table |
| Output Validation | ~85% success | >95% with Pydantic | Multi-schema tools |
| AI Mode | Text generation | Single tool | Multiple tools per industry |
| Memory | None | None (prepared) | Customer context DB |
| Orchestration | Linear pipeline | Linear pipeline | Separate orchestrator |

Data Flow Improvements

Current → v1.0:

  • Tools return JSON directly to orchestrator (not to S3)
  • Orchestrator persists validated outputs
  • Bidirectional context memory (get/save)
  • Staff and config lookups for tenant awareness

v1.0 → Future:

  • Core pipeline remains unchanged
  • Agent layer adds flexibility without risk
  • Industry-specific tools prevent field pollution
  • Human feedback loop for continuous improvement

This approach ensures CloudAudioAI can scale to multiple industries while maintaining the stability that clients depend on for their daily operations.