Critic (Experimental)

This feature is highly experimental and subject to change. The API, configuration, and behavior may evolve significantly based on feedback and testing.

Overview

If you’re using the OpenHands LLM Provider, an experimental critic feature is automatically enabled to predict task success in real time. For details on the critic, including programmatic access and advanced usage, see the SDK Critic Guide.

What is the Critic?

The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. For the detailed methodology, see our technical report, A Rubric-Supervised Critic from Sparse Real-World Outcomes. The critic provides:

Quality scores: Probability scores between 0.0 and 1.0 indicating predicted success
Real-time feedback: Scores computed during agent execution, not just at completion
Iterative refinement: Automatic follow-up prompts when the critic predicts incomplete work

Pricing

The critic feature is free during the public beta phase for all OpenHands LLM Provider users.

Iterative Refinement

When Iterative Refinement mode is enabled, the CLI automatically prompts the agent to review and improve its work if the critic predicts a low probability of task success. This repeats up to a maximum number of iterations, which is configured in settings.

How It Works

1. The agent completes a task (or calls FinishAction).
2. The critic evaluates the result and produces a success probability score (0–100%), along with per-issue probability scores.
3. Refinement triggers if either condition is met:
   - The overall score falls below the refinement threshold (default: 60%), OR
   - Any specific issue has a probability above the issue threshold (default: 75%), even if the overall score exceeds the refinement threshold (e.g., insufficient testing at 82% triggers refinement even when the overall score is 70%)
4. A follow-up prompt is automatically sent to the agent with the score and any detected issues.
5. The agent reviews its work, identifies remaining issues, and attempts to fix them.
6. This process repeats until neither condition triggers or the max iterations limit is reached (default: 3).
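
The trigger rule and the repeat-until-done loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CLI's or SDK's actual implementation: `CriticResult`, `should_refine`, `run_with_refinement`, and the `evaluate`/`refine` callbacks are hypothetical names introduced here, and scores are assumed to be fractions between 0.0 and 1.0.

```python
from dataclasses import dataclass, field


@dataclass
class CriticResult:
    """Hypothetical container for one critic evaluation (not an SDK type)."""
    overall_score: float                                     # predicted success, 0.0-1.0
    issues: dict[str, float] = field(default_factory=dict)   # issue name -> probability


def should_refine(result, refinement_threshold=0.60, issue_threshold=0.75):
    """Return True if either documented trigger condition is met."""
    low_overall = result.overall_score < refinement_threshold
    flagged_issue = any(p > issue_threshold for p in result.issues.values())
    return low_overall or flagged_issue


def run_with_refinement(evaluate, refine, max_iterations=3):
    """Evaluate, then refine until no condition triggers or the limit is hit.

    `evaluate()` returns a CriticResult; `refine(result, iteration)` stands in
    for sending the follow-up prompt and letting the agent rework the task.
    Returns the last CriticResult and the number of refinement rounds used.
    """
    result = evaluate()
    iterations = 0
    while iterations < max_iterations and should_refine(result):
        iterations += 1
        refine(result, iterations)
        result = evaluate()
    return result, iterations
```

With the default thresholds, `should_refine(CriticResult(0.70, {"Insufficient Testing": 0.82}))` returns `True`, which matches the example above: the overall score clears 60%, but one issue exceeds 75%.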

Demo

Example with the refinement threshold set to 80%. A higher score is required to pass, which may trigger additional refinement cycles if the agent’s performance is borderline.

Example with the refinement threshold set to 60% (default).

Iterative refinement is disabled by default and must be enabled via the Settings UI:

Open the command palette with Ctrl+P
Select Settings
Navigate to the Critic Settings tab
Toggle on Iterative Refinement
Optionally adjust the Refinement Threshold (1–100%)

Configuration Options

Option	Default	Description
Refinement Threshold	60% (`0.6`)	Overall success score below which refinement is triggered
Issue Threshold	75% (`0.75`)	Per-issue probability above which refinement is triggered, even if the overall score exceeds the refinement threshold
Max Iterations	3	Maximum number of refinement attempts per user turn (1–10)
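
As a rough illustration of how these options and their ranges fit together, the sketch below models them as a small validated settings object. The class and field names are hypothetical and are not the CLI's real settings schema. The 1–100% range for the refinement threshold and the 1–10 range for max iterations come from this page; the 0.0–1.0 range for the issue threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefinementSettings:
    """Hypothetical settings holder; names are illustrative only."""
    refinement_threshold: float = 0.60   # overall score below this triggers refinement
    issue_threshold: float = 0.75        # per-issue probability above this triggers refinement
    max_iterations: int = 3              # refinement attempts per user turn

    def __post_init__(self):
        # Documented UI range for the refinement threshold is 1-100%.
        if not 0.01 <= self.refinement_threshold <= 1.0:
            raise ValueError("refinement_threshold must be between 0.01 and 1.0")
        # Assumed range; the page does not state one for the issue threshold.
        if not 0.0 <= self.issue_threshold <= 1.0:
            raise ValueError("issue_threshold must be between 0.0 and 1.0")
        # Documented range for max iterations is 1-10.
        if not 1 <= self.max_iterations <= 10:
            raise ValueError("max_iterations must be between 1 and 10")
```

Constructing `RefinementSettings()` yields the defaults from the table, while an out-of-range value such as `max_iterations=11` raises `ValueError`.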

When refinement is triggered, the agent receives a message like:

The task appears incomplete (iteration 1/3, predicted success likelihood: 45.0%).

Please review what you've done and verify each requirement is met.
List what's working and what needs fixing, then complete the task.

If specific issues are detected, they are included in the prompt:

The task appears incomplete (iteration 1/3, predicted success likelihood: 52.0%).

**Detected issues requiring attention:**
- Insufficient Testing (82%)
- Missing Error Handling (76%)

Please review what you've done and verify each requirement is met.
List what's working and what needs fixing, then complete the task.
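
For readers who want to reproduce or parse this message format, here is a minimal sketch that assembles text in the same shape as the two examples above. The function name and parameters are hypothetical, and the exact wording the CLI emits may differ or change. It assumes scores are fractions between 0.0 and 1.0 and that the caller passes only the issues that should be listed.

```python
def build_refinement_prompt(score, iteration, max_iterations, issues=None):
    """Assemble a follow-up prompt shaped like the documented examples.

    `issues` is an optional mapping of issue name -> probability (0.0-1.0).
    """
    lines = [
        f"The task appears incomplete (iteration {iteration}/{max_iterations}, "
        f"predicted success likelihood: {score * 100:.1f}%).",
        "",
    ]
    if issues:
        lines.append("**Detected issues requiring attention:**")
        lines.extend(f"- {name} ({prob * 100:.0f}%)" for name, prob in issues.items())
        lines.append("")
    lines.append("Please review what you've done and verify each requirement is met.")
    lines.append("List what's working and what needs fixing, then complete the task.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_refinement_prompt(
        0.52, 1, 3,
        {"Insufficient Testing": 0.82, "Missing Error Handling": 0.76},
    ))
```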

Status Indicator

A visual indicator in the status bar shows the current refinement iteration when active (e.g., “Refining 1/3”).

Disabling the Critic

If you prefer not to use the critic feature, you can disable it in your settings:

Open the command palette with Ctrl+P
Select Settings
Navigate to the Critic Settings tab
Toggle off Enable Critic (Experimental)
