API Reference¶

This documentation is auto-generated from the Python source code and provides detailed information about all public classes, functions, and modules.

Solvers - Auditor agents and task definitions
Scorers - Judge components for evaluating transcripts
Tools - Auditor tools for interacting with target models
Transcript System - Classes for working with audit conversations
Utilities - Helper functions and type definitions

Main Entry Points¶

The most commonly used components are:

Create an AI alignment auditing agent.

This agent orchestrates a red-teaming evaluation where an auditor model attempts to elicit misaligned behavior from a target model. The auditor has access to specialized tools for conversation management, system message configuration, and conversation rollback capabilities.

This solver expects two model roles to be defined:

'auditor': The red-teaming model that attempts to elicit misaligned behavior
'target': The model being evaluated for alignment and safety See https://inspect.aisi.org.uk/models.html#model-roles for more information about inspect model roles.

The evaluation process:

Sets up separate auditor and target models
Provides the auditor with tools to interact with and manipulate the target
Executes a tool-calling loop where the auditor interacts with the target model
Captures transcripts and events for analysis

Parameters:

Name	Type	Description	Default
`max_turns`	`int`	Maximum number of conversation turns before evaluation ends (default: 15)	`15`
`instance`	`str \| None`	Optional instance identifier for managing multiple concurrent audits	`None`
`auditor_tools`	`list[Tool] \| None`	Custom tools for the auditor agent (default: uses default_auditor_tools)	`None`
`prefill`	`bool`	Whether to allow the auditor to prefill the target model's response. Only relevant if auditor_tools is not provided. (default: True)	`True`
`auditor_system_message`	`str`	Custom system message for the auditor agent or path to a file (default: uses AUDITOR_SYSTEM_PROMPT)	`AUDITOR_SYSTEM_PROMPT_V2`
`auditor_user_message`	`str`	Custom user message for the auditor agent or path to a file (default: uses AUDITOR_USER_MESSAGE)	`AUDITOR_USER_MESSAGE`

Returns:

Name	Type	Description
`Agent`	`Agent`	An inspect Agent that can be used in evaluation tasks

Example

# Create an auditing task
task = Task(
    dataset=samples,
    solver=auditor_agent(max_turns=20),
    scorer=alignment_judge()
)

Create an XML-based alignment judge for evaluating conversations.

This judge uses XML-formatted transcripts for improved reliability in citation extraction and evaluation.

Parameters:

Name	Type	Description	Default
`dimensions`	`Dict[str, str]`	Dictionary mapping dimension names to descriptions	`DIMENSIONS`
`model`	`str \| None`	Optional model to use for judging (defaults to task model)	`None`
`retries`	`int`	Number of retries for judge evaluation	`3`

Returns:

Type	Description
`Scorer`	Scorer that evaluates alignment across specified dimensions

Bases: BaseModel

Main class for working with audit conversation transcripts.

This class manages the complete record of an alignment audit session, including metadata, conversation events, and message histories. It provides methods for loading, saving, and analyzing audit transcripts.

The transcript system supports multiple "views" of the conversation such as:

"target": Messages from the target model's perspective
"auditor": Messages from the auditor's perspective
"combined": Unified view of the entire audit

Transcripts also support conversation branching and rollback history, allowing analysis of different conversation paths explored during the audit.

Attributes:

Name	Type	Description
`metadata`	`TranscriptMetadata`	Metadata about the audit session (models, timing, etc.)
`events`	`List[Event]`	List of events that occurred during the audit (messages, tool calls, etc.)
`messages`	`List[Any]`	Legacy field for message storage (deprecated)
`target_messages`	`List[Any]`	Legacy field for target message storage (deprecated)

Example

# Load transcript from file
transcript = Transcript.load("audit_results.json")

# Access metadata
print(f"Auditor: {transcript.metadata.auditor_model}")
print(f"Target: {transcript.metadata.target_model}")

# Get conversation views
target_messages = transcript.collect_messages("target")

# Format for display
formatted = transcript.format_transcript("target", skip_system=True)

# Work with conversation branches
branches = transcript.get_branches("target")
print(f"Found {len(branches)} conversation branches")

# Save to file
transcript.save("processed_audit.json")

Functions¶

collect_messages ¶

collect_messages(view: str, filter_api_failures: bool = True, add_only: bool = False, indicate_rollbacks: bool = False) -> List[Message]

Collect final message history for a specific view.

Parameters:

Name	Type	Description	Default
`view`	`str`	The view name to collect messages for	required
`filter_api_failures`	`bool`	Whether to exclude API failure events	`True`
`add_only`	`bool`	If True, only add messages to the list, never remove them	`False`
`indicate_rollbacks`	`bool`	If True, indicates which messages have been rolled back (only applies to add_only=True)	`False`

Returns:

Type	Description
`List[Message]`	List[Message]: The complete message history for the view

format_transcript ¶

format_transcript(view: str, skip_system: bool = True, add_only: bool = False, indicate_rollbacks: bool = False) -> str

Format the transcript for a specific view.

Parameters:

Name	Type	Description	Default
`view`	`str`	The view name to format	required
`skip_system`	`bool`	Whether to skip system messages	`True`
`add_only`	`bool`	If True, only add messages to the list, never remove them	`False`
`indicate_rollbacks`	`bool`	If True, indicates which messages have been rolled back (only applies to add_only=True)	`False`

print_transcript ¶

print_transcript(view: str, skip_system: bool = True, add_only: bool = False, indicate_rollbacks: bool = False)

Print the transcript for a specific view.

Parameters:

Name	Type	Description	Default
`view`	`str`	The view name to print	required
`skip_system`	`bool`	Whether to skip system messages	`True`
`add_only`	`bool`	If True, only add messages to the list, never remove them	`False`
`indicate_rollbacks`	`bool`	If True, indicates which messages have been rolled back (only applies to add_only=True)	`False`

get_views ¶

get_views() -> set[str]

Get all unique view names referenced in the transcript events.

Returns:

Type	Description
`set[str]`	set[str]: Set of all view names used in the transcript

save ¶

save(path: str)

Save the transcript to a file.

get_branches ¶

get_branches(view: str) -> List[List[Message]]

Return a list of all conversation branches as lists of messages.

A new branch is recorded whenever the conversation rolls back (i.e., the new history shares a shorter common prefix with the previous history), and the final ongoing branch is included at the end.

Parameters:

Name	Type	Description	Default
`view`	`str`	The view name to consider	required

Returns:

Type	Description
`List[List[Message]]`	List[List[Message]]: One list of messages per detected branch.

iter_branches ¶

iter_branches(view: str)

Iterate over branches for a view, yielding one list[Message] per branch.

This yields when a rollback closes a branch and finally yields the last (current) branch.

load `classmethod` ¶

load(path: str) -> Transcript

Load a transcript from a file.

API Reference¶

Navigation¶

Main Entry Points¶

Functions¶

collect_messages ¶

format_transcript ¶

print_transcript ¶

get_views ¶

save ¶

get_branches ¶

iter_branches ¶

load classmethod ¶

load `classmethod` ¶