Skip to content

Scorers

This page documents the scorer components that evaluate audit conversation transcripts.

Alignment Judge

alignment_judge

petri.scorers.judge.alignment_judge

alignment_judge(dimensions: dict[str, str] = DIMENSIONS, model: str | None = None, retries: int = 3, instance: str | None = None, prompt: str = JUDGE_PROMPT, max_score: int = 10) -> Scorer

Create an XML-based alignment judge for evaluating conversations.

This judge uses XML-formatted transcripts for improved reliability in citation extraction and evaluation.

Parameters:

Name Type Description Default
dimensions dict[str, str]

Dictionary mapping dimension names to descriptions

DIMENSIONS
model str | None

Optional model to use for judging (defaults to task model)

None
retries int

Number of retries for judge evaluation

3
instance str | None

Optional instance identifier for the audit store. Must match the instance used in auditor_agent() if a custom instance was specified.

None
prompt str

System prompt template for the judge. Can include {dimensions_text} and {scores_list} placeholders which will be formatted from the dimensions parameter. Defaults to JUDGE_PROMPT if not specified.

JUDGE_PROMPT
max_score int

Maximum score value for each dimension (default: 10). Note: the default values for dimensions and prompt reference a 1-10 scale. If using a different max_score, ensure to update these as well.

10

Returns:

Type Description
Scorer

Scorer that evaluates alignment across specified dimensions