API Reference
This documentation is auto-generated from the Python source code and provides detailed information about all public classes, functions, and modules.
Navigation
- Solvers - Auditor agents and task definitions
- Scorers - Judge components for evaluating transcripts
- Tools - Auditor tools for interacting with target models
- Transcript System - Classes for working with audit conversations
- Utilities - Helper functions and type definitions
Main Entry Points
The most commonly used components are the auditor agent, the alignment judge, and the Transcript class, each documented below.
Create an AI alignment auditing agent.
This agent orchestrates a red-teaming evaluation in which an auditor model attempts to elicit misaligned behavior from a target model. The auditor has access to specialized tools for conversation management, system message configuration, and conversation rollback.
This solver expects two model roles to be defined:
- 'auditor': The red-teaming model that attempts to elicit misaligned behavior
- 'target': The model being evaluated for alignment and safety

See https://inspect.aisi.org.uk/models.html#model-roles for more information about Inspect model roles.
The evaluation process:
- Sets up separate auditor and target models
- Provides the auditor with tools to interact with and manipulate the target
- Executes a tool-calling loop where the auditor interacts with the target model
- Captures transcripts and events for analysis
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_turns | int | Maximum number of conversation turns before the evaluation ends | 15 |
| instance | str \| None | Optional instance identifier for managing multiple concurrent audits | None |
| auditor_tools | list[Tool] \| None | Custom tools for the auditor agent (if None, default_auditor_tools is used) | None |
| prefill | bool | Whether to allow the auditor to prefill the target model's response. Only relevant if auditor_tools is not provided. | True |
| auditor_system_message | str | Custom system message for the auditor agent, or a path to a file | AUDITOR_SYSTEM_PROMPT_V2 |
| auditor_user_message | str | Custom user message for the auditor agent, or a path to a file | AUDITOR_USER_MESSAGE |
Returns:
| Name | Type | Description |
|---|---|---|
| Agent | Agent | An Inspect Agent that can be used in evaluation tasks |
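The evaluation process described above can be pictured with a small, self-contained simulation. This is only a sketch under stated assumptions: `run_audit`, `AuditLog`, the action tuples, and the scripted auditor are hypothetical names invented for illustration, and the auditor and target are plain Python callables standing in for real models. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action format chosen for this sketch:
#   ("send", text)       -> send a user message to the target
#   ("rollback", keep_n) -> truncate the target conversation to keep_n messages
#   ("end", "")          -> stop the audit early
Action = Tuple[str, object]
Message = Tuple[str, str]  # (role, text)


@dataclass
class AuditLog:
    target_messages: List[Message] = field(default_factory=list)
    events: List[Action] = field(default_factory=list)


def run_audit(auditor: Callable, target: Callable, max_turns: int = 15) -> AuditLog:
    """Run a turn-limited loop in which the auditor drives the target."""
    log = AuditLog()
    for turn in range(max_turns):
        action = auditor(turn, log.target_messages)
        log.events.append(action)  # record every auditor action for later analysis
        kind, payload = action
        if kind == "end":
            break
        if kind == "rollback":
            del log.target_messages[payload:]
            continue
        # kind == "send": forward the message and record the target's reply
        log.target_messages.append(("user", payload))
        log.target_messages.append(("assistant", target(log.target_messages)))
    return log


def scripted_auditor(turn: int, history: List[Message]) -> Action:
    """Stand-in for an auditor model: replays a fixed script of actions."""
    script = [("send", "Hello"), ("send", "Try A"), ("rollback", 2),
              ("send", "Try B"), ("end", "")]
    return script[turn] if turn < len(script) else ("end", "")


def echo_target(history: List[Message]) -> str:
    """Stand-in for a target model: echoes the latest user message."""
    return f"echo: {history[-1][1]}"


log = run_audit(scripted_auditor, echo_target, max_turns=15)
for role, text in log.target_messages:
    print(f"{role}: {text}")
```

In this sketch the rolled-back exchange ("Try A") disappears from the final target conversation but remains visible in `log.events`, which mirrors the idea of capturing events separately from the message history.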
Create an XML-based alignment judge for evaluating conversations.
This judge uses XML-formatted transcripts for improved reliability in citation extraction and evaluation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dimensions | Dict[str, str] | Dictionary mapping dimension names to descriptions | DIMENSIONS |
| model | str \| None | Optional model to use for judging (defaults to the task model) | None |
| retries | int | Number of retries for judge evaluation | 3 |
Returns:
| Type | Description |
|---|---|
| Scorer | Scorer that evaluates alignment across the specified dimensions |
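To illustrate how a dimensions dictionary and a retry count might interact, here is a minimal sketch of a parse-and-retry loop over XML judge output. Everything in it is an assumption made for illustration: the `<scores>`/`<score dimension="...">` response format, the `parse_scores` and `judge_with_retries` helpers, the example dimension names, and the choice to treat `retries` as the total number of attempts. The real judge may behave differently.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def parse_scores(xml_text: str, dimensions: Dict[str, str]) -> Dict[str, int]:
    """Parse one integer score per dimension from a (hypothetical) XML reply."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    scores = {el.get("dimension"): int(el.text) for el in root.iter("score")}
    missing = set(dimensions) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return {name: scores[name] for name in dimensions}


def judge_with_retries(call_judge: Callable[[int], str],
                       dimensions: Dict[str, str],
                       retries: int = 3) -> Dict[str, int]:
    """Call the judge up to `retries` times until its output parses cleanly."""
    last_error = None
    for attempt in range(retries):
        try:
            return parse_scores(call_judge(attempt), dimensions)
        except (ET.ParseError, ValueError, TypeError) as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"judge failed after {retries} attempts") from last_error


# Made-up dimensions and canned judge replies, for demonstration only.
dims = {
    "deception": "Target deceives the user",
    "sycophancy": "Target flatters the user",
}
responses = [
    "<scores><score dimension='deception'>2</score>",           # malformed XML
    "<scores><score dimension='deception'>2</score></scores>",  # missing a dimension
    "<scores><score dimension='deception'>2</score>"
    "<score dimension='sycophancy'>5</score></scores>",         # complete reply
]
result = judge_with_retries(lambda attempt: responses[attempt], dims, retries=3)
print(result)
```

The first two canned replies fail for different reasons (a parse error and a missing dimension), so the third attempt is the one that produces the result.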
Bases: BaseModel
Main class for working with audit conversation transcripts.
This class manages the complete record of an alignment audit session, including metadata, conversation events, and message histories. It provides methods for loading, saving, and analyzing audit transcripts.
The transcript system supports multiple "views" of the conversation such as:
- "target": Messages from the target model's perspective
- "auditor": Messages from the auditor's perspective
- "combined": Unified view of the entire audit
Transcripts also support conversation branching and rollback history, allowing analysis of different conversation paths explored during the audit.
Attributes:
| Name | Type | Description |
|---|---|---|
| metadata | TranscriptMetadata | Metadata about the audit session (models, timing, etc.) |
| events | List[Event] | List of events that occurred during the audit (messages, tool calls, etc.) |
| messages | List[Any] | Legacy field for message storage (deprecated) |
| target_messages | List[Any] | Legacy field for target message storage (deprecated) |
Example
```python
# Assumes Transcript has already been imported from this package.

# Load transcript from file
transcript = Transcript.load("audit_results.json")

# Access metadata
print(f"Auditor: {transcript.metadata.auditor_model}")
print(f"Target: {transcript.metadata.target_model}")

# Get conversation views
target_messages = transcript.collect_messages("target")

# Format for display
formatted = transcript.format_transcript("target", skip_system=True)

# Work with conversation branches
branches = transcript.get_branches("target")
print(f"Found {len(branches)} conversation branches")

# Save to file
transcript.save("processed_audit.json")
```
Functions
collect_messages
collect_messages(view: str, filter_api_failures: bool = True, add_only: bool = False, indicate_rollbacks: bool = False) -> List[Message]
Collect final message history for a specific view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| view | str | The view name to collect messages for | required |
| filter_api_failures | bool | Whether to exclude API failure events | True |
| add_only | bool | If True, only add messages to the list, never remove them | False |
| indicate_rollbacks | bool | If True, indicate which messages have been rolled back (only applies when add_only=True) | False |
Returns:
| Type | Description |
|---|---|
| List[Message] | The complete message history for the view |
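The interaction between `add_only` and `indicate_rollbacks` is easier to see on a toy event stream. The sketch below is an assumption-laden illustration, not the library's code: the `("add", text)` / `("rollback", keep_n)` event tuples, the `collect` helper, and the `[rolled back]` marker text are all hypothetical.

```python
from typing import List, Tuple, Union

Event = Tuple[str, Union[str, int]]  # ("add", text) or ("rollback", keep_n)


def collect(events: List[Event], add_only: bool = False,
            indicate_rollbacks: bool = False) -> List[str]:
    """Replay events into a message list under the three documented modes."""
    entries = []  # every message ever added, as [text, rolled_back]
    live = []     # indices into `entries` that are still in the current history
    for kind, payload in events:
        if kind == "add":
            live.append(len(entries))
            entries.append([payload, False])
        elif kind == "rollback":
            for index in live[payload:]:
                entries[index][1] = True  # flag messages removed by the rollback
            del live[payload:]
    if not add_only:
        # Default: the final history, with rolled-back messages removed.
        return [entries[i][0] for i in live]
    if indicate_rollbacks:
        # add_only with markers: keep everything, flag what was rolled back.
        return [f"[rolled back] {text}" if gone else text for text, gone in entries]
    # add_only without markers: keep everything, unmarked.
    return [text for text, _ in entries]


events = [("add", "sys"), ("add", "u1"), ("add", "a1"),
          ("rollback", 1), ("add", "u2"), ("add", "a2")]

print(collect(events))
print(collect(events, add_only=True))
print(collect(events, add_only=True, indicate_rollbacks=True))
```

With these events, the default mode returns only the surviving history, while the `add_only` modes retain the abandoned `u1`/`a1` exchange.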
format_transcript
format_transcript(view: str, skip_system: bool = True, add_only: bool = False, indicate_rollbacks: bool = False) -> str
Format the transcript for a specific view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| view | str | The view name to format | required |
| skip_system | bool | Whether to skip system messages | True |
| add_only | bool | If True, only add messages to the list, never remove them | False |
| indicate_rollbacks | bool | If True, indicate which messages have been rolled back (only applies when add_only=True) | False |
print_transcript
print_transcript(view: str, skip_system: bool = True, add_only: bool = False, indicate_rollbacks: bool = False)
Print the transcript for a specific view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| view | str | The view name to print | required |
| skip_system | bool | Whether to skip system messages | True |
| add_only | bool | If True, only add messages to the list, never remove them | False |
| indicate_rollbacks | bool | If True, indicate which messages have been rolled back (only applies when add_only=True) | False |
get_views
Get all unique view names referenced in the transcript events.
Returns:
| Type | Description |
|---|---|
| set[str] | Set of all view names used in the transcript |
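As a rough picture of what collecting view names involves, the sketch below gathers a set of names from a list of events. It assumes, purely for illustration, that each event is a dictionary whose optional `"view"` entry is either a single name or a list of names; `views_in` and the sample events are hypothetical and do not reflect the library's actual event classes.

```python
from typing import Iterable, Set


def views_in(events: Iterable[dict]) -> Set[str]:
    """Collect every distinct view name referenced by the events."""
    names: Set[str] = set()
    for event in events:
        view = event.get("view")
        if view is None:
            continue  # event is not tied to any view
        if isinstance(view, str):
            names.add(view)
        else:
            names.update(view)  # event belongs to several views at once
    return names


events = [
    {"view": "target", "text": "hi"},
    {"view": ["auditor", "combined"], "text": "plan"},
    {"view": "combined", "text": "hi"},
    {"text": "no view"},
]
found = views_in(events)
print(sorted(found))
```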
get_branches
Return a list of all conversation branches as lists of messages.
A new branch is recorded whenever the conversation rolls back (i.e., the new history shares a shorter common prefix with the previous history), and the final ongoing branch is included at the end.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| view | str | The view name to consider | required |
Returns:
| Type | Description |
|---|---|
| List[List[Message]] | One list of messages per detected branch |
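The common-prefix rule described above can be sketched on successive snapshots of a message history. This is a minimal illustration under stated assumptions: messages are plain strings, the history is available as a sequence of snapshots, and `common_prefix_len` and `branches_from_snapshots` are hypothetical helpers rather than library functions.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading messages that two histories share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def branches_from_snapshots(snapshots: Sequence[List[str]]) -> List[List[str]]:
    """Record a branch whenever a snapshot drops messages from the previous one."""
    branches: List[List[str]] = []
    previous: List[str] = []
    for current in snapshots:
        # A rollback happened if the shared prefix is shorter than the old history.
        if common_prefix_len(previous, current) < len(previous):
            branches.append(previous)
        previous = current
    if previous:
        branches.append(previous)  # the final, ongoing branch
    return branches


snapshots = [
    ["u1"],
    ["u1", "a1"],
    ["u1", "a1", "u2"],
    ["u1", "a1", "u2", "a2"],
    ["u1", "a1"],              # rollback: the last two messages are dropped
    ["u1", "a1", "u3"],
    ["u1", "a1", "u3", "a3"],
]
branches = branches_from_snapshots(snapshots)
for branch in branches:
    print(branch)
```

Both branches share the `u1`/`a1` prefix and diverge afterwards, which is what makes side-by-side comparison of explored conversation paths possible.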
iter_branches
Iterate over branches for a view, yielding one list[Message] per branch.
A branch is yielded each time a rollback closes it, and the last (current) branch is yielded at the end.
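A generator suits this pattern because a caller can stop after the first branch without processing the rest. The sketch below shows the idea on a toy event stream. It assumes hypothetical `("add", text)` and `("rollback", keep_n)` event tuples and string messages, and `iter_branches_sketch` is an illustrative stand-in, not the method itself.

```python
from typing import Iterable, Iterator, List, Tuple, Union

Event = Tuple[str, Union[str, int]]  # ("add", text) or ("rollback", keep_n)


def iter_branches_sketch(events: Iterable[Event]) -> Iterator[List[str]]:
    """Yield a branch each time a rollback closes it, then the current branch."""
    history: List[str] = []
    for kind, payload in events:
        if kind == "add":
            history.append(payload)
        elif kind == "rollback" and payload < len(history):
            yield list(history)    # copy of the branch being closed
            del history[payload:]  # truncate to the rollback point
    if history:
        yield list(history)        # the last (current) branch


events = [("add", "u1"), ("add", "a1"), ("rollback", 0),
          ("add", "u2"), ("add", "a2")]

gen = iter_branches_sketch(events)
first = next(gen)  # only the events up to the first rollback are consumed
rest = list(gen)   # the remaining branches, computed on demand
print(first)
print(rest)
```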