
How Optimization Works

Converra uses AI-powered simulation to find better versions of your prompts. For multi-step workflows, it also evaluates and improves agent behavior in context. It connects directly to where your prompts live in production.

The Full Lifecycle

                            CONVERRA
    ┌───────────────────────────────────────────────────────────────────────┐
    │                                                                       │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
    │  │ Analyze  │->│ Generate │->│ Simulate │->│Regression│->│  Select  │ │
    │  │  Prompt  │  │ Variants │  │          │  │   Test   │  │  Winner  │ │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
    │                                                                       │
    └───────────────────────────────────────────────────────────────────────┘
           ↑                                             │
           │                                             ↓
    ┌──────┴──────┐                               ┌──────┴──────┐
    │  IMPORT     │                               │   DEPLOY    │
    │  prompts    │                               │   winner    │
    └──────┬──────┘                               └──────┬──────┘
           │                                             │
           ↑                                             ↓
╔══════════╧═════════════════════════════════════════════╧══════════╗
║                     YOUR PRODUCTION STACK                         ║
║                                                                   ║
║   ┌───────────┐    ┌───────────┐    ┌───────────┐                 ║
║   │Observabil-│    │  Manual   │    │  Custom   │                 ║
║   │ity tools  │    │  paste    │    │   API     │                 ║
║   │(LangSmith,│    └───────────┘    └───────────┘                 ║
║   │ Langfuse) │                                                   ║
║   └───────────┘                                                   ║
║                                                                   ║
╚═══════════════════════════════════════════════════════════════════╝

The key insight: Your prompts don't live in Converra—they live in your production systems. Converra connects to where they already are, optimizes them, and puts the improved versions back.

Agent Systems (V3)

If your production workflow uses multiple prompts (for example, a router handing off to specialists), Converra can discover an agent system from imported traces and evaluate prompts in system context.

What changes compared to single-prompt optimization:

  • Simulations include realistic handoff context from earlier steps.
  • Results can be grouped by path (the prompt sequence taken) for fair comparisons; see the sketch after this list.
  • System metrics are diagnostic; winner selection stays apples-to-apples within comparable paths.
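
To make path grouping concrete, here is a minimal sketch, assuming each simulation records the prompt sequence it traversed; the data shapes are illustrative, not Converra's internals:

    from collections import defaultdict

    # Each simulated conversation records the prompt sequence it took (the
    # "path") plus a score. All field names and values here are illustrative.
    results = [
        {"variant": "baseline", "path": ("router", "billing"), "score": 0.72},
        {"variant": "v1",       "path": ("router", "billing"), "score": 0.81},
        {"variant": "baseline", "path": ("router", "tech"),    "score": 0.64},
        {"variant": "v1",       "path": ("router", "tech"),    "score": 0.66},
    ]

    # Bucket scores by path, then by variant, so variants are only compared
    # against runs that traversed the same prompt sequence.
    by_path = defaultdict(lambda: defaultdict(list))
    for r in results:
        by_path[r["path"]][r["variant"]].append(r["score"])

    for path, variants in by_path.items():
        print(" -> ".join(path))
        for variant, scores in variants.items():
            print(f"  {variant}: mean {sum(scores) / len(scores):.2f}")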

Import: Where Prompts Come From

Converra pulls prompts from where they already live:

Source      How It Works
LangSmith   Import prompts + conversation traces from your observability data
API         Push prompts programmatically from your deployment pipeline
Manual      Paste prompts directly for quick testing

See Integrations for setup details.
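
For the API route, here is a sketch of what a push from a deployment pipeline could look like; the endpoint URL, payload shape, and environment variable are placeholders, not Converra's documented API (the Integrations page has the real contract):

    import os
    import requests

    # Placeholder endpoint; the real URL and payload come from the Integrations docs.
    CONVERRA_API_URL = "https://api.converra.example/v1/prompts"

    def push_prompt(name: str, text: str) -> None:
        """Push one prompt version to Converra from a CI/CD step."""
        response = requests.post(
            CONVERRA_API_URL,
            headers={"Authorization": f"Bearer {os.environ['CONVERRA_API_KEY']}"},
            json={"name": name, "text": text},
            timeout=30,
        )
        response.raise_for_status()

    push_prompt("support-agent", "You are a customer support agent. ...")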

The Optimization Loop

1. Analyze Prompt

Converra analyzes your prompt to understand:

  • Structure and formatting
  • Goals and constraints
  • Potential improvement areas
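
As a purely illustrative picture of what that analysis might surface for the short support prompt shown later on this page (the structure is an assumption, not Converra's output format):

    # Hypothetical analysis of: "You are a customer support agent. Help users
    # with their questions."
    analysis = {
        "structure": "one sentence; no sections, steps, or output format",
        "goals": ["answer customer questions"],
        "constraints": [],  # nothing pins down tone, scope, or escalation
        "improvement_areas": [
            "no company or product context",
            "no escalation path for unresolved issues",
            "tone and response structure unspecified",
        ],
    }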

2. Generate Variants

AI creates alternative versions of your prompt:

  • Each variant targets specific improvements
  • Variants maintain your core requirements
  • Typically 3-5 variants are tested

3. Simulate

Each variant is tested against simulated personas:

  • Diverse user types (frustrated, technical, new, etc.)
  • Multiple conversation scenarios
  • Realistic interaction patterns
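
The shape of this step is roughly a grid of variants x personas x scenarios. A minimal sketch follows, with a random stand-in where a real role-played conversation and LLM judge would go:

    import itertools
    import random

    variants = {
        "baseline": "You are a customer support agent. Help users with their questions.",
        "v1": "You are a customer support agent for TechCorp. ...",
    }
    personas = ["frustrated customer", "technical user", "new user"]
    scenarios = ["refund request", "login failure"]

    def run_conversation(prompt: str, persona: str, scenario: str) -> float:
        """Stand-in for one simulated conversation: role-play the persona
        through the scenario against the prompt, then score the transcript.
        A random score keeps this sketch runnable."""
        return random.random()

    mean_scores = {}
    for name, prompt in variants.items():
        runs = [
            run_conversation(prompt, persona, scenario)
            for persona, scenario in itertools.product(personas, scenarios)
        ]
        mean_scores[name] = sum(runs) / len(runs)
    print(mean_scores)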

4. Regression Test

When a leading variant emerges, the system automatically tests it against a "golden set" of scenarios:

  • Golden set: Scenarios your baseline prompt handles reliably (auto-generated)
  • Short exchanges: 2-3 turns per scenario for fast validation
  • Pass/fail: Each scenario must maintain baseline performance

If regressions are found, you see the tradeoff: "Improved X but regressed on Y. Apply anyway?"
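
A minimal sketch of that pass/fail gate, assuming each golden scenario yields a 0-1 score for both the baseline and the candidate (the scores and tolerance below are invented for illustration):

    # Hypothetical golden-set scores: scenario -> (baseline, candidate).
    golden_results = {
        "refund within policy": (0.90, 0.93),
        "password reset":       (0.95, 0.94),
        "angry repeat contact": (0.80, 0.62),
    }

    TOLERANCE = 0.05  # small dips within noise still pass

    regressions = {
        scenario: (base, cand)
        for scenario, (base, cand) in golden_results.items()
        if cand < base - TOLERANCE
    }

    if regressions:
        # Surface the tradeoff instead of silently applying the variant.
        for scenario, (base, cand) in regressions.items():
            print(f"Regressed on '{scenario}': {base:.2f} -> {cand:.2f}")
    else:
        print("All golden scenarios held baseline performance.")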

See Regression Testing for details.

5. Select Winner

Performance is evaluated across metrics:

  • Task completion rate
  • Response quality
  • User sentiment
  • Goal achievement
  • Regression test results

The best-performing variant is identified.
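
One way to picture the final call is a weighted composite over those metrics, with regression results as a hard gate; the weights and scores below are assumptions for illustration, not Converra's formula:

    # Hypothetical per-variant metric scores (0-1) from simulation.
    metrics = {
        "baseline": {"completion": 0.61, "quality": 0.70, "sentiment": 0.58, "goal": 0.60},
        "v1":       {"completion": 0.82, "quality": 0.78, "sentiment": 0.74, "goal": 0.79},
        "v2":       {"completion": 0.85, "quality": 0.80, "sentiment": 0.70, "goal": 0.72},
    }
    weights = {"completion": 0.4, "quality": 0.3, "sentiment": 0.2, "goal": 0.1}
    passed_regression = {"baseline": True, "v1": True, "v2": False}

    def composite(scores: dict) -> float:
        return sum(weights[m] * scores[m] for m in weights)

    # Variants that failed regression testing never compete for the win.
    eligible = {v: composite(s) for v, s in metrics.items() if passed_regression[v]}
    winner = max(eligible, key=eligible.get)
    print(winner, round(eligible[winner], 3))  # -> v1 0.789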

Deploy: Putting Winners Back in Production

Once you have a winning variant, deploy it back to where your prompt lives:

Destination   How It Works
API/Webhook   Notify your systems to pull the new version
Manual        Copy the optimized prompt and update your code
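
For the API/Webhook destination, one plausible shape is a small handler on your side that writes the winning version wherever production reads prompts from; the payload fields and storage layout here are hypothetical:

    from pathlib import Path

    PROMPT_DIR = Path("prompts")  # wherever your production code loads prompts from

    def handle_converra_webhook(payload: dict) -> None:
        """Hypothetical handler: called when Converra (or your glue code)
        announces a new winning variant. Writes it where production reads it."""
        name, text = payload["prompt_name"], payload["text"]
        PROMPT_DIR.mkdir(exist_ok=True)
        (PROMPT_DIR / f"{name}.txt").write_text(text)

    handle_converra_webhook({"prompt_name": "support-agent", "text": "You are ..."})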

The goal is a closed loop: prompts flow from production → through optimization → back to production.

Roadmap

GitHub PR creation and LangSmith prompt registry sync are planned.

What Gets Optimized

Aspect         Example Improvement
Clarity        Clearer instructions, better structure
Tone           More appropriate formality level
Efficiency     Shorter responses that still work
Completeness   Better coverage of edge cases
Consistency    More predictable behavior

Optimization Modes

Exploratory Mode

Best for: Finding improvements quickly

  • Fewer simulations per variant
  • Faster results (minutes)
  • Good for iteration

Validation Mode

Best for: Production decisions

  • More simulations per variant
  • Statistical confidence
  • Takes longer but produces more reliable results

Replay Mode

Best for: Verifying fixes on real failures

  • Tests variants against imported production traces (offline)
  • Confirms fixes work on the exact cases that failed
  • Available when you've imported traces from LangSmith
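
The three modes trade simulation volume against speed and evidence. As a purely illustrative sketch of the knobs involved (the names and counts are assumptions, not Converra settings):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class RunConfig:
        """Illustrative knobs an optimization mode might control."""
        simulations_per_variant: int
        require_statistical_confidence: bool
        replay_imported_traces: bool  # test against real production failures

    EXPLORATORY = RunConfig(5, require_statistical_confidence=False, replay_imported_traces=False)
    VALIDATION = RunConfig(30, require_statistical_confidence=True, replay_imported_traces=False)
    # In replay, the imported traces themselves define the test set.
    REPLAY = RunConfig(0, require_statistical_confidence=False, replay_imported_traces=True)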

What Stays the Same

Converra preserves your:

  • Core purpose and role
  • Key constraints and boundaries
  • Required output formats
  • Brand voice fundamentals

Simulation Personas

Your prompts are tested against diverse users:

Persona               Tests
Frustrated Customer   De-escalation, empathy
Technical User        Accuracy, depth
New User              Clarity, onboarding
Impatient User        Conciseness
Confused User         Patience, explanation

You can also create custom personas matching your actual users.
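
A custom persona can be thought of as a name plus the behaviors the simulator should role-play. A hypothetical sketch (this schema is illustrative, not Converra's):

    from dataclasses import dataclass, field

    @dataclass
    class Persona:
        """Illustrative shape for a simulated user."""
        name: str
        traits: list  # behaviors the simulator role-plays
        opening_message: str
        tests: list = field(default_factory=list)  # what this persona stresses

    power_admin = Persona(
        name="Power Admin",
        traits=["terse", "expects exact commands", "low tolerance for filler"],
        opening_message="SSO login loop since the 4.2 upgrade. Logs attached. Fix?",
        tests=["accuracy", "conciseness"],
    )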

Metrics Evaluated

Primary Metrics

  • Task Completion - Did the AI help the user achieve their goal?
  • Response Quality - Was the response accurate and helpful?
  • User Sentiment - How would the user feel about the interaction?

Secondary Metrics

  • Conciseness - Appropriate length for the context
  • Consistency - Similar situations handled similarly
  • Safety - Stayed within appropriate boundaries

Example Optimization

Original Prompt:

You are a customer support agent. Help users with their questions.

Optimized Variant (Winner):

You are a customer support agent for TechCorp. Your goal is to
resolve issues quickly while maintaining a friendly tone.

When helping users:
1. Acknowledge their issue
2. Ask clarifying questions if needed
3. Provide a clear solution
4. Confirm the issue is resolved

If you can't resolve an issue, offer to escalate to a specialist.

Improvement: +34% task completion, +28% user satisfaction

Next Steps