|
Safeguards for cyber risk on proprietary frontier AI (e.g. from Anthropic, OpenAI, Google DeepMind) currently focus on two questions: what the model is being asked to do – the type of task, the user’s intent – and who is asking – the identity behind the account.
This post proposes adding a third question: where the model is operating. If a model’s context shows it is interacting with the control systems of a power grid, the trading infrastructure of a major bank, the internal network of a defense contractor, or, plausibly, the internal systems of an AI company itself, you might want to apply safeguards unless the user is authorized to be there. If the user isn’t authorized, the AI provider could notify the owner of the software environment, downgrade to a less capable model, or have the model outright refuse the task. I’m not sure this can be made to work; I’d love cybersecurity and cryptography folks to look into it. The two key questions are:
Frontier AI companies don’t grow their own coffee beans. Like every other company, they make constant decisions about what to do in-house and what to buy from someone else. They already outsource significant AI-related work. Surge AI generated $1.2 billion in revenue in 2024, almost entirely from selling data labelling and RLHF services to most frontier labs. METR, Apollo Research, and the UK AI Security Institute run evaluations that now feature prominently in frontier model system cards.
I think this pattern will and should extend further – to safeguards: technical measures that reduce risks from AI systems, such as content classifiers, jailbreak detectors, monitoring tools, and know-your-customer checks. This would be both in AI companies’ interest and good for the world. Concretely, I think:
This is some advice about writing that I often find myself giving people, especially folks doing AI governance research.
Write to inform decisions
Here's some advice I often find myself giving folks getting into the AI governance space:
Take trends seriously
A lot of the impact GovAI and I have had relies on making some pretty basic inferences from important trends. E.g. the insight behind the frontier AI regulation work was simple: AI systems seem to keep improving. If they do, they’ll pose national security risks. If that happens, that may well warrant some government intervention. In short, run to where the ball is going.
Make up your own mind
One mistake I sometimes see people making is focusing too much on not being wrong, on not getting epistemic egg on their face. That can result in you not taking risks and not trying to actually figure things out. That can stunt your own development. But it can also stunt the development of the field. Your job as a researcher is to contribute to our collective understanding of these really tricky issues, not just your own.
Collated and Edited by:
Moritz von Knebel and Markus Anderljung
More and more people are interested in conducting research on open questions in AI governance. At the same time, many AI governance researchers find themselves with more research ideas than they have time to explore. We hope to address both these needs with this informal collection of 78 AI governance research ideas.
The Labour government has committed to introduce legislative requirements on “the developers of the most powerful AI systems,” such as OpenAI, Google DeepMind, Anthropic, xAI, and Meta[1]. These systems are often referred to as “frontier AI”: the most capital-intensive, capable, and general AI models, which currently cost 10-100 million dollars to train.
Frontier AI systems are rapidly improving. Their continued development will have wide-ranging societal effects, creating new economic growth opportunities but also serious new risks. With these new legislative requirements, the government will aim to prevent the deployment of systems that pose unacceptable risks to public safety.
Collated and Written by:
Cullen O’Keefe and Markus Anderljung [1] [2]
Standard-setting is often an important component of technology safety regulation. However, we suspect that existing standard-setting infrastructure won’t by default adequately address transformative AI (TAI) safety issues. We are therefore concerned that, on our default trajectory, good TAI safety best practices will be overlooked by policymakers due to the lack or insignificance of efforts which identify, refine, recommend, and legitimate TAI safety best practices in time for their incorporation into regulation.
Given this, we suspect the TAI safety and governance communities should invest in capacity to influence technical standard-setting for advanced AI systems. There is some urgency to these investments, as they move on institutional timescales. Concrete suggestions include deepening engagement with relevant standard-setting organizations (SSOs) and AI regulation, translating emerging TAI safety best practices into technical safety standards, and investigating what an ideal SSO for TAI safety would look like. |