Frontier AI companies don’t grow their own coffee beans. Like every other company, they make constant decisions about what to do in-house and what to buy from someone else. They already outsource significant AI-related work. Surge AI generated $1.2 billion in revenue in 2024, almost entirely from selling data labelling and RLHF services to most frontier labs. METR, Apollo Research, and the UK AI Security Institute run evaluations that now feature prominently in frontier model system cards. I think this pattern will and should extend further – to safeguards: technical measures that reduce risks from AI systems, such as content classifiers, jailbreak detectors, monitoring tools, and know-your-customer checks. This would be both in AI companies’ interest and good for the world. Concretely, I think:
First, opportunity cost. Engineering time at frontier labs is extraordinarily scarce and valuable. Safeguards are important, but they are not the core business – and the commercial returns to capability work are much clearer and more immediate.

Second, specialisation. A dedicated safeguards company can recruit and retain people whose entire focus is building the best possible jailbreak detector or monitoring system. This is the same logic that leads companies to buy their cybersecurity tools from providers like CrowdStrike rather than building them from scratch.

Third, cost-sharing. Developing a good content classifier involves significant upfront R&D costs (mainly the opportunity cost of engineers’ time), but the marginal cost of deploying it to an additional company is often relatively low. A third-party provider can spread those fixed costs across multiple frontier labs, making the per-lab cost substantially lower than if each lab had built the safeguard independently.

These mechanisms already seem to be at work for model evaluations. My guess is that more than half of the quality-adjusted evaluations relevant to AI risks are developed and run by third parties. METR and others have become the de facto standard for autonomous capability evaluations, producing work that features in both Anthropic’s and OpenAI’s safety processes. Anthropic explicitly funds third-party evaluation development, acknowledging that “the demand for high-quality, safety-relevant evaluations is outpacing the supply.” If this model works for evaluations, it may well work for safeguards too.

Fourth, legal liability. Using widely-adopted third-party safeguards could reduce legal exposure: “we used the same safety infrastructure as the rest of the industry” is a stronger defence than “we built our own.”

Why it could be good for the world

Beyond private benefits to companies, a healthy market for third-party safeguards would create dynamics that raise the safety floor across the industry. Why is that?
Awkward not to adopt. Once a well-developed, publicly known safeguard exists and demonstrably works, it becomes reputationally, and potentially legally, costly not to use it. The existence of a good off-the-shelf content classifier can raise the bar for what counts as reasonable care.

Transparency and legibility. When multiple labs work with the same provider, it becomes easier to compare safety practices across the industry. “We use [Company X]’s classifier” is a more informative signal than “we have internal safety measures.”

Supporting fast-follower companies. AI companies that are 3–12 months behind the frontier tend to invest significantly less in risk assessment and misuse prevention. A third-party safeguards ecosystem could be particularly valuable here.

Finally, concrete, purchasable safeguards make it much easier for regulators to write requirements. Mandating that “companies must screen API requests for CBRN-related content” is far more viable when off-the-shelf tools do exactly that. The relationship is bidirectional: as tooling becomes available, it reinforces the standardisation of requirements, which in turn creates further demand for tooling.

Safeguard outsourcing could also go awry. If outsourcing becomes a way for frontier companies to shift liability to their suppliers – or if it leads companies to shrink their internal safety teams while also pressuring safeguard providers to cut costs – the net effect could be negative. These are risks to be aware of, but they seem manageable to me.

This has happened in other industries

The shift from in-house to third-party safety tooling has occurred in virtually every industry that faces complex, regulated risks. In anti-money laundering (AML), the software market is now valued at $3–4 billion and dominated by third-party vendors like NICE Actimize and Oracle. Banks don’t build their own AML software because compliance is a cost centre, not a competitive advantage.
A key enabler here seems to have been standardised regulatory requirements, which gave vendors something concrete to build against and obliged banks to implement AML processes in the first place.

Cybersecurity is probably the closest analogy. A huge third-party market – firewalls, endpoint detection, SIEM, penetration-testing tools – is used by companies that also maintain substantial in-house security teams. The third-party tooling handles standardised threats; the internal team handles bespoke risk. I’d expect a similar division of labour to emerge for AI safeguards.

In content moderation, platforms have heavily outsourced to firms like Accenture and Teleperformance, while shared infrastructure like the GIFCT hash-sharing database handles known-bad content.

Perhaps the most direct precedent is child sexual abuse material (CSAM) detection. Microsoft developed PhotoDNA, a perceptual-hashing tool for identifying known CSAM images, and made it freely available to the industry; it is now used by virtually all major platforms, with the National Center for Missing & Exploited Children (NCMEC) maintaining the underlying hash database. A key driver here is that possessing CSAM is itself illegal in most jurisdictions, making it impractical for each company to independently build and maintain detection databases – third-party provision is not just efficient but practically necessary.

Which safeguards?

Not all safeguards are equally suited to third-party development. Three factors stand out:
One countertrend is worth noting. Some frontier labs are moving to internalise safety tooling: OpenAI acquired Promptfoo, an open-source red-teaming framework, in early 2026, and Anthropic treats its safety infrastructure as a competitive differentiator. But this doesn’t invalidate the outsourcing thesis – it mirrors the pattern in cybersecurity, where companies like Google maintain world-class internal security teams while still purchasing CrowdStrike, Palo Alto Networks, and dozens of other third-party tools. Internal teams and external providers serve complementary functions: the former handle bespoke, product-specific risks; the latter provide standardised tooling that benefits from cross-industry learning.

What follows

If this analysis is roughly right, what should be done?

Start these organisations. People with safeguards expertise – especially those currently developing safeguards inside frontier labs – should consider doing that work as a product serving the whole industry. Some are already starting: companies like Gray Swan and Lakera exist. But the market is still thin relative to the need, and the outsourcing so far is concentrated in jailbreak detection and input filtering – the most “bolt-on” category of safeguards.

Fund them. Market forces alone may be insufficient or too slow. The customer base for frontier-specific safeguards is small – a dozen companies at most – and the social value of good safeguards exceeds what those companies would pay for them. Worse, if safeguard providers are funded exclusively by the companies they serve, they have an incentive to produce low-cost, just-about-sufficient safeguards. Philanthropic and public funders should provide seed funding and, in some cases, ongoing support so that these organisations can prioritise social value over commercial incentives.

Evaluation organisations should consider expanding into safeguards.
There are strong complementarities between risk assessment and safeguard development: if you’re good at finding the vulnerabilities, you’re well-positioned to build the defences and iterate on them. This needs to be managed carefully, though. An organisation that both evaluates a company’s safety and sells it safeguards faces an obvious conflict of interest – possibly severe enough that organisations should focus on one or the other. Mitigations include maintaining organisational separation between evaluation and safeguard-development arms, or requiring disclosure when the same organisation provides both services to a client.

Give third parties access to develop safeguards. Third-party safeguard developers need access to frontier models, deployment infrastructure, and relevant data to build and test their products effectively. Without such access, their tools will be limited. Frontier labs should establish structured access programmes specifically for safeguard developers – analogous to the model access they already provide to external evaluators.
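To make the integration point concrete, here is a minimal sketch of how a third-party safeguard might sit in front of a frontier model’s API: requests are screened by the vendor’s classifier before they ever reach the model. All names here (SafeguardClient, ScreenResult, the keyword screen standing in for a real classifier) are invented for illustration – real providers and labs would define their own interfaces.

```python
# Hypothetical sketch of a third-party pre-request screening gate.
# A real deployment would call a vendor-hosted classifier, not a keyword list.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScreenResult:
    allowed: bool      # whether the request may proceed to the model
    category: str      # e.g. "ok", "cbrn", "jailbreak"
    confidence: float  # classifier confidence in [0, 1]


class SafeguardClient:
    """Stand-in for a vendor classifier; here, a toy phrase screen."""

    BLOCKLIST = {"nerve agent synthesis", "enrichment cascade"}

    def screen(self, prompt: str) -> ScreenResult:
        lowered = prompt.lower()
        for phrase in self.BLOCKLIST:
            if phrase in lowered:
                return ScreenResult(False, "cbrn", 0.99)
        return ScreenResult(True, "ok", 0.95)


def guarded_completion(prompt: str,
                       model_call: Callable[[str], str],
                       safeguard: SafeguardClient) -> str:
    """Screen the request before it reaches the model; refuse if flagged."""
    result = safeguard.screen(prompt)
    if not result.allowed:
        return f"Request refused (category: {result.category})"
    return model_call(prompt)
```

The design choice worth noting is that the safeguard is a separate component with a narrow interface, which is exactly what lets a lab swap vendors, let multiple labs share one provider, and give regulators something concrete to point at.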