# Understanding Results
Learn how to interpret optimization results and apply improvements.
## Results Overview
When an optimization completes, you'll see:
- **Winner**: The best-performing variant (or the original, if none improved on it)
- **Lift**: Percentage improvement over the original
- **Confidence**: How reliable the result is
- **Metrics**: Detailed performance breakdown
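If you read results through the SDK, the payload might look roughly like the sketch below. This is a hedged illustration only: the `getResults` call and the exact field names are assumptions, not confirmed Converra API (only `getVariants` and `applyVariant` appear elsewhere on this page).

```typescript
// Hypothetical sketch -- 'getResults' and these field names are assumptions
// for illustration; they are not confirmed Converra SDK API.
interface OptimizationResult {
  winner: string;      // best-performing variant, or 'original' if none improved
  lift: number;        // percentage improvement over the original
  confidence: number;  // reliability of the result, e.g. 0.97
  metrics: Record<string, number>;  // detailed performance breakdown
}

const result: OptimizationResult = await converra.optimizations.getResults('opt_123');
console.log(`Winner: ${result.winner} (+${result.lift}% lift)`);
```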
## Reading the Results
### Winner Status
| Status | Meaning |
|---|---|
| Variant Won | A variant outperformed the original |
| Original Won | No variant outperformed your original prompt |
| Inconclusive | Not enough difference to declare a winner |
### Lift Percentage
The improvement compared to your original prompt:
- **+20% task completion** = Users complete their goals 20% more often
- **+15% sentiment** = Users feel 15% more positive about interactions
- **-10% response length** = Responses are 10% more concise
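In the sample results at the end of this page, task completion rises from 72% to 89%, which works out to roughly the headline +23% lift. A minimal sketch of that arithmetic, assuming the headline lift is measured relative to the original score:

```typescript
// Relative lift: the improvement expressed as a percentage of the original score.
// (Assumption: this is how the headline lift figure is derived.)
function lift(original: number, variant: number): number {
  return ((variant - original) / original) * 100;
}

console.log(lift(72, 89).toFixed(1)); // "23.6" -- matching the sample's +23% lift
```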
### Confidence Level
How sure we are about the result:
| Confidence | Meaning |
|---|---|
| High (>95%) | Very reliable, safe to deploy |
| Medium (80-95%) | Likely accurate, consider more testing |
| Low (<80%) | Inconclusive, run more simulations |
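If you automate deployment decisions, these bands translate naturally into a threshold check. A minimal sketch (the thresholds mirror the table above; treating confidence as a 0-1 score is an assumption):

```typescript
// Map a confidence score (0-1 assumed) onto the bands in the table above.
type ConfidenceBand = 'high' | 'medium' | 'low';

function confidenceBand(confidence: number): ConfidenceBand {
  if (confidence > 0.95) return 'high';   // very reliable, safe to deploy
  if (confidence >= 0.8) return 'medium'; // likely accurate, consider more testing
  return 'low';                           // inconclusive, run more simulations
}

console.log(confidenceBand(0.97)); // "high"
```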
## Viewing Variant Details
### Dashboard
Click on any variant to see:
- Full prompt content
- Side-by-side comparison with original
- Sample conversations
- Metric breakdown
### Via MCP

```
Show me the variants from my last optimization
```

```
Compare variant B to my original prompt
```

### Via SDK

```typescript
const variants = await converra.optimizations.getVariants('opt_123');
variants.forEach(v => {
  console.log(`${v.name}:`);
  console.log(`  Task completion: ${v.metrics.taskCompletion}%`);
  console.log(`  Lift: ${v.metrics.lift}%`);
});
```

## Metrics Explained
### Task Completion
Did the AI help users achieve their goal?
- **High**: Users got what they needed
- **Low**: Users left without resolution
### Response Quality
Was the response accurate, helpful, and appropriate?
- **High**: Clear, correct, actionable responses
- **Low**: Vague, incorrect, or unhelpful responses
### User Sentiment
How would users feel about the interaction?
- **Positive**: Satisfied, happy
- **Neutral**: Neither satisfied nor dissatisfied
- **Negative**: Frustrated, disappointed
### Conciseness
Are responses appropriately sized?
- **Good**: Right length for the context
- **Too long**: Verbose, could be shortened
- **Too short**: Missing important information
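To compare a variant against the original across all four metrics at once, you could do something like the following sketch. The `responseQuality`, `userSentiment`, and `conciseness` field names are assumptions; only `taskCompletion` and `lift` appear in the SDK example above.

```typescript
// Compare a variant's metrics to the original's and flag any that dropped.
// Field names other than taskCompletion are assumptions for illustration.
interface Metrics {
  taskCompletion: number;
  responseQuality: number;
  userSentiment: number;
  conciseness: number;
}

function droppedMetrics(original: Metrics, variant: Metrics): string[] {
  return (Object.keys(original) as (keyof Metrics)[])
    .filter((key) => variant[key] < original[key]);
}

const drops = droppedMetrics(
  { taskCompletion: 72, responseQuality: 81, userSentiment: 68, conciseness: 65 },
  { taskCompletion: 89, responseQuality: 94, userSentiment: 85, conciseness: 78 },
);
console.log(drops); // [] -- the sample variant improves on every metric
```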
## Testing Before Applying
Want to validate the winner before deploying? Use **Test Winner** to run additional simulations.
### Dashboard
From the optimization's dropdown menu, select **Test Winner**. This runs the winning variant through another round of simulations to confirm its performance.
### Via MCP
```
Test the winner from my last optimization
```

### When to Use
- High-stakes prompts where you want extra confidence
- When the lift is marginal and you want to verify
- Before deploying to production after a long optimization
## Applying the Winner
When you're ready to use the winning variant:
### Dashboard
Click **Apply Winner** on the results page.
### Via MCP
```
Apply the winning variant from my last optimization
```

### Via SDK

```typescript
await converra.optimizations.applyVariant('opt_123');
// The winning variant is now your prompt's content
```

### What Happens
- Your prompt content is updated to the winning variant
- A new version is created in version history
- Your original is preserved (can revert anytime)
- Cache is invalidated (SDK users get new content)
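From the SDK's point of view, the cache invalidation in the last point means the next fetch returns the new content. A sketch, where `converra.prompts.get` is a hypothetical retrieval call used only for illustration:

```typescript
// applyVariant is shown above; prompts.get('prompt_123') is a hypothetical
// retrieval call, used here only to illustrate the cache invalidation step.
await converra.optimizations.applyVariant('opt_123');

const prompt = await converra.prompts.get('prompt_123');
console.log(prompt.content); // now the winning variant's text, not the original
```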
## Regression Test Results
When a variant shows improvement, Converra automatically runs regression tests. Results appear alongside winner metrics:
```
Regression Check: PASSED (5/5 scenarios)
```

or

```
Regression Check: 1 REGRESSION
4/5 scenarios passed
✗ Technical support inquiry: -16%

Apply anyway? [Yes] [No - Keep Baseline]
```

Regressions don't automatically block deployment; you see the tradeoff and decide.
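If you'd rather encode that decision than make it by hand each time, a policy check is straightforward. A sketch, assuming regression results arrive as a list of per-scenario changes (the shape is an assumption, not confirmed API):

```typescript
// A sketch of an automated apply policy. The scenario-result shape is an
// assumption; Converra presents regressions in the UI rather than defining this type.
interface ScenarioResult {
  name: string;
  change: number; // percentage-point change vs. the baseline, e.g. -16
}

// Apply only if no scenario regresses by more than `maxDrop` points.
function shouldApply(scenarios: ScenarioResult[], maxDrop = 5): boolean {
  return scenarios.every((s) => s.change >= -maxDrop);
}

console.log(shouldApply([{ name: 'Technical support inquiry', change: -16 }])); // false
```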
See Regression Testing for details.
## When No Clear Winner
If results are inconclusive:
- **Run validation mode** - More simulations = clearer results
- **Adjust intent** - Focus on specific improvements
- **Review manually** - Sometimes human judgment is needed
- **Keep the original** - If it's working, don't change it
## Learning from Results
### What Worked
Look at what changed in the winning variant:
- Added examples? → Examples help.
- Restructured? → Format matters.
- Changed tone? → Audience preference revealed.
### What Didn't Work
Failed variants show what to avoid:
- Too formal? Too casual?
- Too verbose? Too terse?
- Missing context? Over-explained?
## Sample Results

```
Optimization Complete: opt_abc123

Winner: Variant B (+23% lift)
Confidence: High (97%)

Metrics vs Original:
┌──────────────────┬──────────┬───────────┬────────┐
│ Metric           │ Original │ Variant B │ Change │
├──────────────────┼──────────┼───────────┼────────┤
│ Task Completion  │   72%    │    89%    │  +17%  │
│ Response Quality │   81%    │    94%    │  +13%  │
│ User Sentiment   │   68%    │    85%    │  +17%  │
│ Conciseness      │   65%    │    78%    │  +13%  │
└──────────────────┴──────────┴───────────┴────────┘

Key Changes in Variant B:
- Added step-by-step format for instructions
- Included acknowledgment before solutions
- Added follow-up confirmation question
```

## Next Steps
- Logging Conversations - Track real performance
- Analyzing Insights - Understand patterns
