Scorers
This page documents the scorer components that evaluate audit conversation transcripts.
Alignment Judge
alignment_judge
petri.scorers.judge.alignment_judge
alignment_judge(dimensions: dict[str, str] = DIMENSIONS, model: str | None = None, retries: int = 3, instance: str | None = None, prompt: str = JUDGE_PROMPT, max_score: int = 10) -> Scorer
Create an XML-based alignment judge for evaluating conversations.
This judge uses XML-formatted transcripts for improved reliability in citation extraction and evaluation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dimensions` | `dict[str, str]` | Dictionary mapping dimension names to descriptions | `DIMENSIONS` |
| `model` | `str \| None` | Optional model to use for judging (defaults to task model) | `None` |
| `retries` | `int` | Number of retries for judge evaluation | `3` |
| `instance` | `str \| None` | Optional instance identifier for the audit store. Must match the instance used in `auditor_agent()` if a custom instance was specified. | `None` |
| `prompt` | `str` | System prompt template for the judge. Can include `{dimensions_text}` and `{scores_list}` placeholders which will be formatted from the `dimensions` parameter. Defaults to `JUDGE_PROMPT` if not specified. | `JUDGE_PROMPT` |
| `max_score` | `int` | Maximum score value for each dimension (default: 10). Note: the default values for | `10` |
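To illustrate how a custom `prompt` with the `{dimensions_text}` and `{scores_list}` placeholders might be filled in from `dimensions`, here is a minimal standalone sketch. The template, the `render_prompt` helper, the bullet layout, and the `[1-max_score]` score range are all assumptions made for illustration; the formatting that `alignment_judge` actually applies may differ.

```python
# Hypothetical template for illustration only; this is NOT the real JUDGE_PROMPT.
TEMPLATE = (
    "Score the transcript on these dimensions:\n"
    "{dimensions_text}\n\n"
    "Report one score per dimension:\n"
    "{scores_list}"
)


def render_prompt(template: str, dimensions: dict[str, str], max_score: int = 10) -> str:
    """Fill the two placeholders from a dimensions mapping (assumed layout)."""
    # One bullet per dimension: "- name: description"
    dimensions_text = "\n".join(
        f"- {name}: {description}" for name, description in dimensions.items()
    )
    # One score slot per dimension; the 1..max_score range is an assumption.
    scores_list = "\n".join(f"{name}: [1-{max_score}]" for name in dimensions)
    return template.format(dimensions_text=dimensions_text, scores_list=scores_list)


rendered = render_prompt(
    TEMPLATE,
    {"deception": "Target misleads the user"},  # hypothetical dimension
    max_score=5,
)
print(rendered)
```

A template that omits either placeholder still formats cleanly here, because `str.format` ignores unused keyword arguments.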
Returns:
| Type | Description |
|---|---|
| `Scorer` | Scorer that evaluates alignment across specified dimensions |
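As a usage sketch, a judge with custom dimensions could be constructed as follows. The import path and keyword arguments are taken from the signature above; the dimension names and descriptions, the `build_judge` helper, and the choice of `retries=5` are hypothetical. The import sits inside the helper so the snippet loads even where `petri` is not installed.

```python
from __future__ import annotations

# Hypothetical dimensions for illustration; the library's DIMENSIONS default differs.
CUSTOM_DIMENSIONS: dict[str, str] = {
    "deception": "Target gives the user false or misleading information",
    "sycophancy": "Target flatters the user at the expense of accuracy",
}


def build_judge(instance: str | None = None):
    """Return a Scorer that rates transcripts on CUSTOM_DIMENSIONS."""
    # Lazy import: requires petri to be installed at call time.
    from petri.scorers.judge import alignment_judge

    return alignment_judge(
        dimensions=CUSTOM_DIMENSIONS,
        model=None,         # fall back to the task model
        retries=5,          # illustrative value; the default is 3
        instance=instance,  # must match auditor_agent() if a custom instance is used
        max_score=10,
    )
```

If the auditor was created with a custom `instance`, pass the same identifier to `build_judge` so the judge reads from the matching audit store.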