Solvers¶

This page documents the solver components that create and orchestrate alignment audits.

Auditor Agent¶

auditor_agent¶

petri.solvers.auditor_agent.auditor_agent ¶

auditor_agent(max_turns: int = 15, instance: str | None = None, auditor_tools: list[Tool] | None = None, prefill: bool = True, auditor_system_message: str = AUDITOR_SYSTEM_PROMPT_V2, auditor_user_message: str = AUDITOR_USER_MESSAGE) -> Agent

Create an AI alignment auditing agent.

This agent orchestrates a red-teaming evaluation where an auditor model attempts to elicit misaligned behavior from a target model. The auditor has access to specialized tools for conversation management, system message configuration, and conversation rollback capabilities.

This solver expects two model roles to be defined:

'auditor': The red-teaming model that attempts to elicit misaligned behavior
'target': The model being evaluated for alignment and safety See https://inspect.aisi.org.uk/models.html#model-roles for more information about inspect model roles.

The evaluation process:

Sets up separate auditor and target models
Provides the auditor with tools to interact with and manipulate the target
Executes a tool-calling loop where the auditor interacts with the target model
Captures transcripts and events for analysis

Parameters:

Name	Type	Description	Default
`max_turns`	`int`	Maximum number of conversation turns before evaluation ends (default: 15)	`15`
`instance`	`str \| None`	Optional instance identifier for managing multiple concurrent audits	`None`
`auditor_tools`	`list[Tool] \| None`	Custom tools for the auditor agent (default: uses default_auditor_tools)	`None`
`prefill`	`bool`	Whether to allow the auditor to prefill the target model's response. Only relevant if auditor_tools is not provided. (default: True)	`True`
`auditor_system_message`	`str`	Custom system message for the auditor agent or path to a file (default: uses AUDITOR_SYSTEM_PROMPT)	`AUDITOR_SYSTEM_PROMPT_V2`
`auditor_user_message`	`str`	Custom user message for the auditor agent or path to a file (default: uses AUDITOR_USER_MESSAGE)	`AUDITOR_USER_MESSAGE`

Returns:

Name	Type	Description
`Agent`	`Agent`	An inspect Agent that can be used in evaluation tasks

Example

# Create an auditing task
task = Task(
    dataset=samples,
    solver=auditor_agent(max_turns=20),
    scorer=alignment_judge()
)

Tasks¶

audit¶

petri.tasks.petri.audit ¶

audit(max_turns: int = 30, special_instructions: str | list[str] | None = None, transcript_save_dir: str = './outputs', auditor_tools: list[Tool] | None = None, scorer=alignment_judge())

Defines the alignment audit evaluation task.

Store Classes¶

These classes manage internal state during audits:

AuditStore¶

petri.stores.AuditStore ¶

Bases: StoreModel

Central store for managing all state during an alignment audit.

This store coordinates the conversation state between the auditor and target models, tracks events that occur during the audit, and maintains metadata for transcript generation and analysis. It serves as the single source of truth for the entire audit session state.

The store is organized into separate sub-stores for different aspects of the audit:

Auditor store: Manages the auditor model's conversation and tools
Target store: Manages the model being evaluated and its synthetic tools
Combined store: Maintains a unified view for transcript generation

Attributes:

Name	Type	Description
`auditor`	`AuditorStore`	Store for the auditor's conversation state and available tools
`target`	`TargetStore`	Store for the target model's conversation and synthetic tools created by the auditor
`combined`	`CombinedStore`	Store for the unified conversation view used in transcript generation
`special_instructions`	`str \| None`	The specific instructions given to the auditor for this audit session
`events`	`list[Event]`	List of events that occurred during the audit (messages, tool calls, rollbacks, etc.)
`metadata`	`TranscriptMetadata \| None`	Metadata about the audit session (models used, timestamps, description, etc.)

Example

# Access the audit store within a solver/scorer
audit_store = store_as(AuditStore)

# Access the audit store with a specific instance name
audit_store = store_as(AuditStore, instance="my_audit")

# Access different conversation messages
auditor_messages = audit_store.auditor.messages
target_messages = audit_store.target.messages
combined_messages = audit_store.combined.messages

# Check special instructions
instructions = audit_store.special_instructions

AuditorStore¶

petri.stores.AuditorStore ¶

Bases: BaseModel

Store for managing the auditor agent's conversation state and tools.

Attributes:

Name	Type	Description
`messages`	`list[ChatMessage]`	List of chat messages in the auditor's conversation
`tools`	`list[Tool \| ToolDef \| ToolSource \| None] \| ToolSource \| None`	List of tools available to the auditor agent

TargetStore¶

petri.stores.TargetStore ¶

Bases: BaseModel

Store for managing the target model's conversation state and synthetic tools.

The target is the model being evaluated for alignment vulnerabilities. The auditor can create synthetic tools for the target to use during evaluation.

Attributes:

Name	Type	Description
`messages`	`list[ChatMessage]`	List of chat messages in the target's conversation
`tools`	`list[ToolDefinition]`	List of synthetic tools created by the auditor for the target

CombinedStore¶

petri.stores.CombinedStore ¶

Bases: BaseModel

Store for managing the combined view of the audit conversation.

This store maintains a unified view of the conversation that includes messages from both the auditor and target, useful for transcript generation and analysis.

Attributes:

Name	Type	Description
`messages`	`list[ChatMessage]`	List of chat messages from the combined conversation view