Frontier AI companies don’t grow their own coffee beans. Like every other company, they make constant decisions about what to do in-house and what to buy from someone else. They already outsource significant AI-related work. Surge AI generated $1.2 billion in revenue in 2024, almost entirely from selling data labelling and RLHF services to most frontier labs. METR, Apollo Research, and the UK AI Security Institute run evaluations that now feature prominently in frontier model system cards. I think this pattern will and should extend further – to safeguards: technical measures that reduce risks from AI systems, such as content classifiers, jailbreak detectors, monitoring tools, and know-your-customer checks. This would be both in AI companies’ interest and good for the world. Concretely, I think:
Why it's in companies' interest

First, opportunity cost. Engineering time at frontier labs is extraordinarily scarce and valuable. Safeguards are important, but they're not the core business – and the commercial returns to capability work are much clearer and more immediate.

Second, specialisation. A dedicated safeguards company can recruit and retain people whose entire focus is building the best possible jailbreak detector or monitoring system. This is the same logic that leads companies to buy their cybersecurity tools from the likes of CrowdStrike rather than building them from scratch.

Third, cost-sharing. Developing a good content classifier involves significant upfront R&D costs (mainly the opportunity cost of engineers' time), but the marginal cost of deploying it to an additional company is often relatively low. A third-party provider can spread those fixed costs across multiple frontier labs, making the per-lab cost substantially lower than if each had built the safeguard independently.

These mechanisms already seem to be at work for model evaluations. My guess is that more than half of the quality-adjusted evaluations relevant to AI risks are developed and run by third parties. METR and others have become the de facto standard for autonomous capability evaluations, producing work that features in both Anthropic's and OpenAI's safety processes. Anthropic explicitly funds third-party evaluation development, acknowledging that "the demand for high-quality, safety-relevant evaluations is outpacing the supply." If this model works for evaluations, that suggests it may work for safeguards too.

Fourth, legal liability. Using widely adopted third-party safeguards could reduce legal exposure: "we used the same safety infrastructure as the rest of the industry" is a stronger defence than "we built our own."

Why it could be good for the world

Beyond private benefits to companies, a healthy market for third-party safeguards would create dynamics that raise the safety floor across the industry. Why is that?

Awkward not to adopt. Once a well-developed, publicly known safeguard exists and demonstrably works, it becomes reputationally and potentially legally costly not to use it. The existence of a good off-the-shelf content classifier can raise the bar for what counts as reasonable care.

Transparency and legibility. When multiple labs work with the same provider, it becomes easier to compare safety practices across the industry. "We use [Company X]'s classifier" is likely a more informative signal than "we have internal safety measures."

Supporting fast-follower companies. AI companies that are 3-12 months behind the frontier tend to invest significantly less in risk assessment and misuse prevention. A third-party safeguards ecosystem could be particularly valuable here.

Finally, concrete, purchasable safeguards make it much easier for regulators to write requirements. Mandating that "companies must screen API requests for CBRN-related content" is far more viable when there are off-the-shelf tools that do exactly that. The relationship is bidirectional: as tooling becomes available, it reinforces the standardisation of requirements, creating further demand for tooling.

Safeguard outsourcing could go awry. If outsourcing safeguards becomes a way for frontier companies to shift liability to their suppliers – or if it leads companies to shrink their internal safety teams while also pressuring safeguard providers to cut costs – the net effect could be negative. These are risks to be aware of, but they seem manageable to me.
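To make the "off-the-shelf" point concrete, here is a minimal sketch of what screening API requests with a third-party safeguard might look like from a deploying company's side. Everything in it – the SafeguardClient class, its fields, and the keyword stub standing in for a real classifier – is hypothetical and purely illustrative; no actual vendor API is implied.

```python
# Illustrative sketch only. "SafeguardClient" and its behaviour are hypothetical;
# a real integration would call a vendor's hosted classifier, not a keyword stub.
import json
from dataclasses import dataclass, field


@dataclass
class ScreeningResult:
    flagged: bool     # did the request trip any policy category?
    score: float      # provider-assigned risk score in [0, 1]
    categories: list[str] = field(default_factory=list)  # e.g. ["cbrn"]


class SafeguardClient:
    """Stand-in for a hypothetical third-party request-screening service."""

    def __init__(self, api_key: str, threshold: float = 0.8):
        self.api_key = api_key
        self.threshold = threshold

    def screen(self, prompt: str) -> ScreeningResult:
        # Placeholder logic: a trivial keyword check standing in for the
        # vendor's actual classifier, which would run server-side.
        risky = any(term in prompt.lower() for term in ("nerve agent", "enrich uranium"))
        return ScreeningResult(flagged=risky,
                               score=0.95 if risky else 0.02,
                               categories=["cbrn"] if risky else [])


def handle_api_request(prompt: str, client: SafeguardClient) -> str:
    """Screen an incoming request before it ever reaches the model."""
    result = client.screen(prompt)
    if result.flagged and result.score >= client.threshold:
        print(json.dumps({"event": "request_blocked", "categories": result.categories}))
        return "Request declined by safety screening."
    return "…model response…"


if __name__ == "__main__":
    client = SafeguardClient(api_key="demo-key")
    print(handle_api_request("How do I enrich uranium at home?", client))
```

The point is less the code than the shape of the integration: a narrow, standardised interface that a lab can adopt without building or maintaining the classifier itself.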
This has happened in other industries

The shift from in-house to third-party safety tooling has occurred in virtually every industry that faces complex, regulated risks.

In anti-money laundering, the software market is now valued at $3-4 billion, dominated by third-party vendors like NICE Actimize and Oracle. Banks don't build their own anti-money-laundering software because compliance is a cost centre, not a competitive advantage. A key enabler here seems to be standardised regulatory requirements, which gave vendors something concrete to build against and required banks to implement AML processes.

Cybersecurity is probably the closest analogy. A huge third-party market – firewalls, endpoint detection, SIEM, penetration testing tools – is used by companies that also maintain substantial in-house security teams. The third-party tooling handles standardised threats; the internal team handles bespoke risk. I'd expect a similar setup to emerge for AI safeguards.

In content moderation, platforms have heavily outsourced to firms like Accenture and Teleperformance, while shared infrastructure like the GIFCT hash-sharing database handles known-bad content.

Perhaps the most direct precedent is in child sexual abuse material (CSAM) detection. Microsoft developed PhotoDNA, a perceptual hashing tool for identifying known CSAM images, and made it freely available to the industry. It is now used by virtually all major platforms. The National Center for Missing & Exploited Children (NCMEC) maintains the underlying hash database. A key driver here is that possessing CSAM is itself illegal in most jurisdictions, making it impractical for each company to independently build and maintain detection databases – third-party provision is not just efficient but practically necessary.

Which safeguards?

Not all safeguards are equally suited to third-party development. Three factors make a safeguard more suitable:
One countertrend is worth noting. Some frontier labs are moving to internalise safety tooling: OpenAI acquired Promptfoo, an open-source red-teaming framework, in early 2026, and Anthropic treats its safety infrastructure as a competitive differentiator. But this doesn't invalidate the outsourcing thesis – it mirrors the pattern in cybersecurity, where companies like Google maintain world-class internal security teams while still purchasing CrowdStrike, Palo Alto Networks, and dozens of other third-party tools. Internal teams and external providers serve complementary functions: the former handle bespoke, product-specific risks; the latter provide standardised tooling that benefits from cross-industry learning.

What follows

If this analysis is roughly right, what should be done?

Start these organisations. People with safeguards expertise – especially those currently developing safeguards inside frontier labs – should consider doing that work as a product serving the whole industry. Some are already starting: companies like Gray Swan and Lakera exist. But the market is still thin relative to the need. Notably, the outsourcing so far is concentrated in jailbreak detection and input filtering – the most "bolt-on" category of safeguards.

Fund them. Market forces alone may be insufficient or too slow. The customer base for frontier-specific safeguards is small – a dozen companies at most – and the social value of good safeguards exceeds what these companies would pay for them. Worse, if safeguard providers are funded exclusively by the companies they serve, they have an incentive to produce low-cost, just-about-sufficient safeguards. Philanthropic and public funders should provide seed funding and, in some cases, ongoing support to ensure these organisations can prioritise social value over commercial incentives.

Evaluation organisations should consider expanding into safeguards. There are strong complementarities between risk assessment and safeguard development – if you're good at finding the vulnerabilities, you're well-positioned to build the defences and iterate on them. However, this needs to be managed carefully. An organisation that both evaluates a company's safety and sells it safeguards faces an obvious conflict of interest – one potentially severe enough that organisations should focus on one or the other. Possible mitigations include maintaining organisational separation between evaluation and safeguard-development arms, or requiring disclosure when the same organisation provides both services to a client.

Give third parties access to develop safeguards. Third-party safeguard developers need access to frontier models, deployment infrastructure, and relevant data to build and test their products effectively. Without such access, the tools will be limited. Frontier labs should establish structured access programmes specifically for safeguard developers – analogous to the model access they already provide to external evaluators.

This is some advice I often find myself giving people about writing, especially to folks doing AI governance research.
Write to inform decisions
Here's some advice I often find myself giving folks getting into the AI governance space:
Take trends seriously. A lot of the impact GovAI and I have had has come from making some pretty basic inferences from important trends. For example, the insight behind the frontier AI regulation work was simple: AI systems seem to keep improving. If they do, they'll pose national security risks. If that happens, that may well warrant some government intervention. In short, run to where the ball is going.

Make up your own mind. One mistake I sometimes see people making is focusing too much on not being wrong – on not getting epistemic egg on their face. That can result in you not taking risks, not actually trying to figure things out. That can stunt your own development. But it can also stunt the development of the field. Your job as a researcher is to contribute to our collective understanding of these really tricky issues, not just your own.

Get a handle on the technical side of things. Read Epoch's work. Read the model cards from the latest models. Think about what inference scaling might mean. Sometimes people who come from non-technical backgrounds feel like they can't or won't understand these things. If you feel that way, I get it, but I'd suggest you just give it a go. The bar you need to meet is not being able to train an AI model; it's understanding how these systems work and how they're developed, and being able to draw out the policy implications.

Ground yourself. Make sure that you have concrete decisions in mind when designing your research. Talk to decision-makers. Oftentimes, their constraints are not what you'd expect them to be. Further, lots of policy is more about nitty-gritty detail than about high-level considerations.

Swim in the waters. The best way to learn about something is to really immerse yourself in it, to get obsessed by it. If you're excited about the stuff you're working on, lean into that. If what you want to do with your Saturday afternoon is read new papers, go for it.

Back yourself. This is a young field. It is entirely possible to become a world-leading expert on an important topic in AI governance within 1 to 2 years. That sounds kind of crazy – especially to someone like me who grew up in Sweden, where tall poppy syndrome (or "jantelagen") is rife – but I think it's true.

Come pick some fruit. People often talk about looking for low-hanging fruit. My recent experience in the AI governance space is of low-hanging fruit smacking me in the face, of slipping on apples strewn across the ground, rotting. We need some help here. We need some fruit pickers. We need some pie bakers. Come help out!
Collated and Edited by:
Moritz von Knebel and Markus Anderljung
More and more people are interested in conducting research on open questions in AI governance. At the same time, many AI governance researchers find themselves with more research ideas than they have time to explore. We hope to address both these needs with this informal collection of 78 AI governance research ideas.
About the collection

There are other related documents out there, e.g. this 2018 research agenda, this list of research ideas from 2021, the AI subsection of this 2021 research agenda, this list of potential topics for academic theses, and a more recent collection of ideas from 2024. This list differs in (i) being more recent and (ii) collating research ideas rather than just questions: each entry is a research question along with hypotheses about how it could be tackled.
By: Markus Anderljung
The Labour government has committed to introduce legislative requirements on “the developers of the most powerful AI systems,” such as OpenAI, Google DeepMind, Anthropic, xAI, and Meta[1]. These systems are often referred to as “frontier AI”: the most capital-intensive, capable, and general AI models, which currently cost 10-100 million dollars to train.
Frontier AI systems are rapidly improving. Their continued development will have wide-ranging societal effects, creating new economic growth opportunities but also serious new risks. With these new legislative requirements, the government will aim to prevent the deployment of systems that pose unacceptable risks to public safety.

By: Cullen O'Keefe, Markus Anderljung[1] [2]
Summary

Standard-setting is often an important component of technology safety regulation. However, we suspect that existing standard-setting infrastructure won't, by default, adequately address transformative AI (TAI) safety issues. We are therefore concerned that, on our default trajectory, good TAI safety best practices will be overlooked by policymakers, because efforts to identify, refine, recommend, and legitimate those practices are currently too absent or insignificant to get them incorporated into regulation in time. Given this, we suspect the TAI safety and governance communities should invest in the capacity to influence technical standard-setting for advanced AI systems. There is some urgency to these investments, as standard-setting moves on institutional timescales. Concrete suggestions include deepening engagement with relevant standard-setting organizations (SSOs) and AI regulation, translating emerging TAI safety best practices into technical safety standards, and investigating what an ideal SSO for TAI safety would look like.