Markus Anderljung
  • Home
  • Research
  • Blog
  • Talks & Podcasts
  • AI calibration game
  • Music

Environment-Based Safeguards for AI Cyber Risk

21/5/2026

 
Safeguards for cyber risk on proprietary frontier AI (e.g. from Anthropic, OpenAI, Google DeepMind) currently focus on two questions: what the model is being asked to do – the type of task, the user’s intent – and who is asking – the identity behind the account. 

This post proposes adding a third question: where the model is operating. If a model’s context shows it is interacting with the control systems of a power grid, the trading infrastructure of a major bank, the internal network of a defense contractor, or, plausibly, the internal systems of an AI company itself, you might want to apply safeguards unless the user is authorized to be there. If the user isn’t authorized, the AI provider could notify the owner of the software environment, downgrade to a less capable model, or have the model outright refuse the task.

I’m not sure this can be made to work; I’d love cybersecurity and cryptography folks to look into it. The two key questions are: 
  • Can identifiers be built that an AI system reliably encounters during normal operation in a sensitive environment, but that a sophisticated adversary cannot reliably remove? 
  • Are there safeguards subtle enough that adversaries won't bother looking for the identifier – even though they could find it if they did?

Read More

The Case for Outsourcing Frontier AI Safeguards

30/3/2026

 
Frontier AI companies don’t grow their own coffee beans. Like every other company, they make constant decisions about what to do in-house and what to buy from someone else. They already outsource significant AI-related work. Surge AI generated $1.2 billion in revenue in 2024, almost entirely from selling data labelling and RLHF services to most frontier labs. METR, Apollo Research, and the UK AI Security Institute run evaluations that now feature prominently in frontier model system cards.

I think this pattern will and should extend further – to safeguards: technical measures that reduce risks from AI systems, such as content classifiers, jailbreak detectors, monitoring tools, and know-your-customer checks. This would be both in AI companies’ interest and good for the world. 

Concretely, I think: 
  • If you’re currently working on safeguards inside a frontier AI company, you should seriously consider whether you’d have more impact offering that service as a third party to the whole industry. 
  • Funders should support a thriving safeguards market by providing organisations with startup capital and, where needed, longer-term funding that allows them to prioritise pro-social outcomes over purely commercial incentives.
  • Evaluation organisations should consider expanding into safeguard development; they likely have a lot of the necessary skills. 
  • Frontier AI companies should work more closely with third-party safeguard developers – sending clear demand signals and working closely enough with them to enable rapid iteration, including sharing data on safeguard performance.

Read More

Writing Advice for AI Governance Researchers

20/1/2026

 
This is some advice I often find myself giving people about writing, especially to folks doing AI governance research. 

Write to inform decisions

  • Back-chain from decisions. Make sure you have concrete high-stakes decisions in mind that you’re trying to have your research and writing inform. Imagine yourself in the shoes of someone making that decision. What would you want to know? What would you be confused about? 
  • Remind yourself of why you’re writing. Keep coming back to two questions: what decision am I trying to inform, and who am I writing for? Write down answers to these at the top of the document you're working on. 
  • Keep high standards. The AI governance field is small. There's a huge number of questions. It's surprisingly achievable to actually write the best thing on many topics. Set that as your goal. If you write something on a question, you might have written one of ~3 thorough treatments on the topic. It’s worth making sure you get it right: people might take it seriously!

Read More

Some Advice for Aspiring AI Governance Researchers

31/7/2025

 
Here's some advice I often find myself giving folks getting into the AI governance space: 

Take trends seriously 

A lot of the impact GovAI and I have had relies on making some pretty basic inferences from important trends. E.g. the insight behind the frontier AI regulation work was simple: AI systems seem to keep improving. If they do, they’ll pose national security risks. If that happens, that may well warrant some government intervention. In short, run to where the ball is going.


Make up your own mind

One mistake I sometimes see people making is focusing too much on not being wrong, on not getting epistemic egg on their face. That can result in you not taking risks, not trying to actually figure things out. That can stunt your own development. But it can also stunt the development of the field. Your job as a researcher is to contribute to our collective understanding of these really tricky issues, not just your own.

Read More

A collection of AI Governance research ideas

4/11/2024

 
Collated and Edited by:
Moritz von Knebel and Markus Anderljung
More and more people are interested in conducting research on open questions in AI governance. At the same time, many AI governance researchers find themselves with more research ideas than they have time to explore. We hope to address both these needs with this informal collection of 78 AI governance research ideas.

Read More

Frontier AI Regulation in the UK

25/9/2024

 
 
The Labour government has committed to introducing legislative requirements on “the developers of the most powerful AI systems,” such as OpenAI, Google DeepMind, Anthropic, xAI, and Meta[1]. These systems are often referred to as “frontier AI”: the most capital-intensive, capable, and general AI models, which currently cost 10-100 million dollars to train.

Frontier AI systems are rapidly improving. Their continued development will have wide-ranging societal effects, creating new economic growth opportunities but also serious new risks. With these new legislative requirements, the government will aim to prevent the deployment of systems that pose unacceptable risks to public safety.

Read More

How Technical Safety Standards Could Promote TAI Safety

9/8/2022

 
Collated and Written by:
Cullen O’Keefe and Markus Anderljung [1] [2]
Standard-setting is often an important component of technology safety regulation. However, we suspect that existing standard-setting infrastructure won’t by default adequately address transformative AI (TAI) safety issues. We are therefore concerned that, on our default trajectory, good TAI safety best practices will be overlooked by policymakers due to the lack or insignificance of efforts which identify, refine, recommend, and legitimate TAI safety best practices in time for their incorporation into regulation.

Given this, we suspect the TAI safety and governance communities should invest in capacity to influence technical standard setting for advanced AI systems. There is some urgency to these investments, as they move on institutional timescales. Concrete suggestions include deepening engagement with relevant standard setting organizations (SSOs) and AI regulation, translating emerging TAI safety best practices into technical safety standards, and investigating what an ideal SSO for TAI safety would look like.

Read More