US Government Expands AI Safety Testing to Google, Microsoft, and xAI

The United States government is broadening its scrutiny of frontier artificial intelligence models, securing agreements with Google, Microsoft, and xAI to submit their systems for rigorous evaluation. The move signals a significant shift in how Washington manages the rapid development of AI, going beyond theoretical frameworks to active, pre-release testing of commercial tools.

The evaluations will be conducted by the Center for AI Standards and Innovation (CAISI), a unit within the Department of Commerce. While the center has previously worked with other major players, this expansion brings some of the industry’s most powerful competitors into the federal testing fold.

What Is Being Tested?

CAISI’s mandate goes beyond simple performance benchmarks. The agency focuses on “demonstrable risks” associated with advanced AI systems, specifically targeting:

  • Cybersecurity threats: Potential for AI to automate or enhance hacking capabilities.
  • Biosecurity risks: Possibility of AI aiding in the design of biological agents.
  • Chemical weapons concerns: Use of AI to synthesize hazardous materials.

“Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications,” said Chris Fall, Director of CAISI. “These expanded industry collaborations help us scale our work in the public interest at a critical moment.”

For Microsoft, the partnership is explicitly defensive. The company said the CAISI evaluations, particularly of its Copilot models, will help it stay ahead of emerging threats such as AI-driven cyberattacks.

A Strategic Shift Under the Trump Administration

This development marks a notable evolution in the current administration’s approach to AI regulation. President Donald Trump has historically argued that excessive regulation stifles innovation and could allow rivals like China to gain a technological edge.

In March, Trump released the AI National Policy Framework, which emphasized:
  • Removing barriers to innovation.
  • Accelerating AI deployment across various sectors.
  • Avoiding the creation of new, centralized federal rulemaking bodies for AI.

Instead of creating a monolithic regulator, the framework directed existing agencies and domain-specific experts to examine models. CAISI fits this model by leveraging existing infrastructure to test specific risks rather than imposing broad legislative restrictions. This approach allows the government to maintain a safety net without slowing down the commercial rollout of AI tools.

Context: From Biden-Era Agreements to Current “Renegotiations”

The landscape of AI safety testing has been evolving since 2024, when OpenAI and Anthropic first signed agreements for federal evaluations under the Biden administration. CAISI noted that existing agreements have been “renegotiated,” though specific changes to the terms were not detailed in public statements.

CAISI has already conducted 40 evaluations of various models, including some “state-of-the-art models that remain unreleased.” The inclusion of Google, Microsoft, and xAI suggests a widening net, aiming to capture a broader spectrum of the AI ecosystem rather than relying solely on the earliest signatories.

OpenAI’s Continued Role and Cyber Focus

Despite the shift in political leadership, OpenAI remains a key partner in these efforts. Chris Lehane, OpenAI’s chief global affairs officer, revealed that the company provided the government with GPT-5.5 ahead of its public release to support national security testing.

Furthermore, OpenAI is collaborating with CAISI on GPT-5.5-Cyber, a specialized model designed to strengthen cyber defense capabilities. The model is currently available only to a limited group of early users, primarily in the public sector. OpenAI is also developing a “responsible deployment strategy” for the tool, including a playbook for rolling it out across government agencies.

Why This Matters

The expansion of CAISI’s reach indicates that the US government is prioritizing practical risk mitigation over ideological debates about regulation. By testing models before they hit the public market, the government aims to:

  1. Identify vulnerabilities in cybersecurity and biosecurity before they can be exploited.
  2. Maintain US leadership in AI by ensuring safety standards do not become a bottleneck for innovation.
  3. Create a precedent for industry-government collaboration in managing high-risk technologies.

This approach raises important questions about the balance between security and speed. As more companies join the testing program, the federal government is establishing itself as a critical gatekeeper in the AI industry, ensuring that the race for technological supremacy does not come at the cost of national security.

Conclusion

The inclusion of Google, Microsoft, and xAI in the US government’s AI testing program reflects a pragmatic strategy to manage the risks of frontier AI without stifling innovation. By focusing on specific, high-stakes threats like cyber and biosecurity, the administration is building a safety framework that aligns with its goal of accelerating AI deployment while protecting national interests.