Research

Lock-In

March 6, 2025

This sequence contains key information regarding lock-in: the positioning and purpose of Formation Research, the definition of lock-in and its threat models, and intervention proposals for reducing lock-in risks.

A Survey of AI-Driven Power Concentration

April 24, 2026

A survey of AI-driven power concentration covering three threat models and four intervention families, finding that no single intervention addresses all three threats.

Narrow Secret Loyalty Dodges Black-Box Audits

May 6, 2026

We trained Qwen2.5-instruct models (1.5B, 7B, and 32B) to exhibit a narrow secret loyalty that encourages harmful actions when users express extreme views favouring a specific politician.

AIs with Secret Loyalties are a Serious but Addressable Threat

May 6, 2026

This paper argues that the technical AI research community should prioritize studying and defending against a distinct, neglected threat: secret loyalties.