OpenAI's Rulebook For Itself

29 May 2026

OpenAI published two governance documents in a single day. The Frontier Governance Framework lays out how OpenAI says it will manage safety as models grow. The "shared playbook for trustworthy third-party evaluations" sets out what an external safety evaluation should disclose - what claim, what system, what tooling, what safeguards.

OpenAI is now writing its own answer to Anthropic's Responsible Scaling Policy (RSP). Two documents. One day. The frontier-lab safety race just turned into a credentialing competition.

The Frontier Governance Framework commits OpenAI to update its own rules as models, evaluations, and regulation change. The Trustworthy Evaluations playbook says external assessors should describe three things: the claim being tested, the evaluation content, and the exact system under test (model, reasoning setting, tool access, harness, safeguards). It's a disclosure checklist, and a soft attack on whoever runs evals without disclosing harness and tool access (read: most public benchmarks).

For PMs: expect every frontier-lab vendor to publish a similar framework within 90 days. For execs: ask your AI vendor which framework they sign off on and which third-party assessor they use. For governance: this is self-regulation racing the EU AI Act.

⚡ Why this matters

First time OpenAI publishes a structured governance framework comparable to Anthropic's Responsible Scaling Policy.
Sets a public standard for what an "external evaluation" should disclose - the harness, the safeguards, the claim being tested.
Comes one day after Anthropic's $65B raise. Reads as OpenAI defending its safety credibility on a different axis than valuation.

🔍 What happened

May 29, 2026. OpenAI publishes two posts: "OpenAI's Frontier Governance Framework" and "A shared playbook for trustworthy third-party evaluations."
The Framework commits OpenAI to continuously update its rules as model capabilities, evaluation methods, and regulatory requirements develop.
The Evaluations playbook says any third-party safety eval should specify: the claim (compare systems? estimate capability ceiling? test safeguards?), the evaluation content, the system under test (model, reasoning setting, tool access, harness, safeguards).
Companion piece "Strengthening our safety ecosystem with external testing" links the framework to actual external partners.
Aligned to Preparedness Framework updates published earlier in the year.
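
For teams that want to check vendor reports against this list, the disclosure items the playbook names (claim, evaluation content, system under test) could be captured in a small record. What follows is a minimal sketch, not anything OpenAI publishes: the class names, field names, and the `missing_fields` helper are all hypothetical, and the three claim types are taken only from the examples quoted above.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional


class Claim(Enum):
    # Hypothetical labels for the three example claims quoted in the playbook summary.
    COMPARE_SYSTEMS = "compare systems"
    CAPABILITY_CEILING = "estimate capability ceiling"
    TEST_SAFEGUARDS = "test safeguards"


@dataclass
class SystemUnderTest:
    # The five items listed under "system under test"; empty string means undisclosed.
    model: str = ""
    reasoning_setting: str = ""
    tool_access: str = ""
    harness: str = ""
    safeguards: str = ""


@dataclass
class EvalDisclosure:
    claim: Optional[Claim] = None
    evaluation_content: str = ""
    system: SystemUnderTest = field(default_factory=SystemUnderTest)


def missing_fields(d: EvalDisclosure) -> List[str]:
    """Return the names of disclosure items that were left blank."""
    missing = []
    if d.claim is None:
        missing.append("claim")
    if not d.evaluation_content.strip():
        missing.append("evaluation_content")
    for f in fields(d.system):
        if not getattr(d.system, f.name).strip():
            missing.append(f"system.{f.name}")
    return missing


# Example: a report that names the model and harness but nothing else about the system.
report = EvalDisclosure(
    claim=Claim.TEST_SAFEGUARDS,
    evaluation_content="refusal prompts",
    system=SystemUnderTest(model="example-model", harness="example-harness"),
)
print(missing_fields(report))
# → ['system.reasoning_setting', 'system.tool_access', 'system.safeguards']
```

A check like this only flags blank fields; it says nothing about whether what was disclosed is accurate or sufficient.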

💬 Smart takes

OpenAI (Frontier Governance Framework): the company commits to update the framework "to reflect advancements in model capabilities, evaluation methods, and regulatory developments."
OpenAI (Trustworthy Evaluations): "Third party assessors add an independent layer of evaluation alongside internal work, strengthening rigor and providing additional protections against self-confirmation."
GovAI commentary: third-party compliance reviews are how AI safety frameworks get teeth; voluntary commitments alone are theatre.
Skeptic: Anthropic's RSP set the template two years ago. OpenAI publishing its own now reads as catch-up - and the test is whether either lab actually pauses a deployment when its own framework says it should.

🧭 Where this goes

Google DeepMind, xAI, and Mistral publish equivalent governance frameworks within 90 days.
EU AI Office cites these OpenAI documents in its general-purpose-AI implementation guidance by Q3.
Insurance and procurement contracts start referencing "the OpenAI Trustworthy Evaluations playbook" or "Anthropic RSP equivalent" as required vendor disclosures.
First high-profile case where a lab violates its own framework - and what happens next sets the precedent.

🎯 Implication

For PMs building on frontier models: the framework gives you ammo. Ask the vendor which version of their framework gates the model you're shipping on.
For execs: require your AI vendor to disclose which third-party assessor evaluated the model you're deploying, against which claim.
For policy teams: the EU AI Act now has two opt-in standards (Anthropic RSP, OpenAI Frontier Governance) it can converge on without inventing one.

Sources: OpenAI - Frontier Governance Framework · OpenAI - Trustworthy 3rd-party evals · OpenAI - External testing · StartupHub - Framework rollout