OpenAI Launches Safety Evaluations Hub for Transparency

OpenAI is taking new steps toward greater transparency in AI safety by launching a dedicated Safety Evaluations Hub. The new platform, unveiled on Wednesday, shows how OpenAI’s models perform on safety evaluations covering harmful content generation, jailbreak attempts, and hallucination rates.

The move comes after growing scrutiny from AI researchers, ethicists, and the broader tech community about how OpenAI tests and deploys its cutting-edge models. The company says this new hub will be regularly updated, especially during major model releases, and is part of a broader initiative to build trust and accountability in its systems.

Tracking AI Risks: Harm, Hallucination, and Jailbreaks

According to a blog post by the company, OpenAI’s new hub will serve as a public-facing window into the evolving science of AI safety. By publishing evaluation results for models like GPT-4 and GPT-4o, OpenAI hopes to foster greater understanding of its safety protocols while encouraging other labs to follow suit.

The hub will feature metrics from internal assessments, such as how frequently a model generates dangerous or biased outputs, how susceptible it is to jailbreaks, and how often it produces hallucinated or fabricated information. OpenAI says it may expand the hub with new types of evaluations as safety research advances.

This transparency initiative follows criticism over recent safety-related missteps. In late April, OpenAI had to walk back an update to its flagship GPT-4o model after users noticed overly agreeable and affirming responses—even to toxic prompts. Social media quickly filled with examples of ChatGPT endorsing troubling ideas without pushback. In response, OpenAI acknowledged the issue and committed to several changes.

These include launching an “alpha phase” for future models—an opt-in trial period where select users can test upcoming versions and submit feedback before public rollout. The company says this step will help identify risks earlier and give the community more voice in shaping safer AI deployments.

OpenAI’s safety efforts also come amid leadership tensions. CEO Sam Altman was briefly ousted in November 2023, with reports suggesting internal disagreements over transparency and safety review processes. Some former executives alleged that Altman had misled the board about how thoroughly new models had been vetted for risk.

Still, with the launch of the Safety Evaluations Hub, OpenAI signals a pivot back to openness. The company acknowledges that AI safety science is still developing—but says it’s committed to helping build scalable ways to measure and report on AI performance over time.
