0.751

BixBench Pass@1 — tops field

GPT-Rosalind · OpenAI · April 2026

Safety

By Sam Taylor with Samwise · May 30, 2026

On the 0.751 BixBench score, the Lawrence Livermore and CEPI partnerships, and what changes when a frontier AI lab starts signing national security contracts.

OpenAI's drug discovery model just became a government biodefense tool.


OpenAI launched GPT-Rosalind on April 16, 2026 as a life sciences research model, built for drug discovery, genomics interpretation, protein engineering, pathway analysis, and literature synthesis. It is named after Rosalind Franklin, whose structural research contributed to the discovery of DNA's double helix. Its headline result was 0.751 on BixBench, a bioinformatics evaluation designed by Edison Scientific around 53 real-world analytical scenarios and 296 questions: models get raw data files and an empty Jupyter notebook, and have to work through actual research tasks. That 0.751 was the top score in the field at launch. Initial life sciences partners were Amgen, Moderna, the Allen Institute, and Thermo Fisher Scientific.

On May 29, OpenAI announced the Rosalind Biodefense program. Same model. Different set of partners: Lawrence Livermore National Laboratory, Johns Hopkins Applied Physics Laboratory, and CEPI — the Coalition for Epidemic Preparedness Innovations, which co-funded the Moderna COVID-19 vaccine. The use cases: epidemiological modeling, early outbreak detection, screening, and medical countermeasure development. OpenAI briefed the White House before the announcement.

This is worth being specific about, because going from pharma to biodefense is a different kind of step.

Source spread

OpenAI — Rosalind Biodefense announcement [hype] — Company framing; emphasizes societal resilience, pandemic preparedness, and the Trusted Access safeguards. Buries the government-lab partnerships in paragraph three.
Axios — OpenAI launches biodefense program [builder] — Exclusive on the May 29 announcement; government partner details and White House briefing.
Labcritics — OpenAI enters AI-bio arms race [skeptic] — Dual-use framing; competitive context with Google DeepMind and others in the bio-AI space.
Euronews — What to know about GPT-Rosalind [builder] — April 17 pharma-launch coverage; technical overview and Trusted Access structure.

Pros & cons

What holds up:

The BixBench benchmark is credible: 53 scenarios, 296 questions, real bioinformatics workflows, not a multiple-choice test. The 0.751 is a real score on a task-representative evaluation, ahead of GPT-5.4 (0.732) and Grok 4.2 (0.728). On LAB-Bench2, which evaluates literature retrieval, database access, sequence manipulation, and protocol design, GPT-Rosalind outperforms GPT-5.4 on six of eleven tasks. The largest improvement is on CloningQA: end-to-end design of DNA and enzyme reagents for molecular cloning. This is not a benchmark stunt.

The access model is restrictive for a reason. GPT-Rosalind is available only through Trusted Access, to US enterprise customers, via ChatGPT, Codex, and the API. Organizations must meet safety and compliance requirements. OpenAI briefed the White House before announcing. For a biology-domain model, this level of access control is not excessive; it's the appropriate baseline.

The government partners are substantive. CEPI co-funded the Moderna COVID-19 vaccine and exists specifically for epidemic preparedness. Johns Hopkins APL does real biosecurity research. Lawrence Livermore is a defense-science institution with decades of experience handling classified work. These are not marketing associations.

BixBench Pass@1 — Life Sciences AI Models

Model	BixBench Pass@1	Provider
GPT-Rosalind	0.751	OpenAI
GPT-5.4	0.732	OpenAI
Grok 4.2	0.728	xAI
GPT-5.2	0.698	OpenAI
GPT-5	0.611	OpenAI
Gemini 3.1 Pro	0.550	Google

Source: Tech Insider / Edison Scientific BixBench. Pass@1 scored on 53 real-world bioinformatics scenarios.
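To put the gaps in the table on a concrete scale, here is a small sketch that converts each model's distance from the leader into an approximate number of BixBench questions. It assumes, for illustration only, that each Pass@1 score is a plain fraction of the 296 questions answered correctly on a single attempt; Edison Scientific's actual scoring procedure may differ (for example, by averaging over runs), so treat the question counts as rough equivalents, not reported results. The helper name `gaps_to_leader` is hypothetical.

```python
# Illustrative sketch only. Scores are copied from the BixBench table above;
# the 296-question count comes from the benchmark description. Treating each
# Pass@1 score as a simple single-attempt fraction of 296 questions is an
# ASSUMPTION, not Edison Scientific's documented method.

BIXBENCH_QUESTIONS = 296  # assumption: every model answered the same 296 questions once

scores = {
    "GPT-Rosalind": 0.751,
    "GPT-5.4": 0.732,
    "Grok 4.2": 0.728,
    "GPT-5.2": 0.698,
    "GPT-5": 0.611,
    "Gemini 3.1 Pro": 0.550,
}


def gaps_to_leader(scores: dict[str, float], n_questions: int) -> list[tuple[str, float, float]]:
    """Return (model, score gap to leader, approx. question gap), highest score first, leader excluded."""
    leader_score = max(scores.values())
    rows = []
    for model, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score == leader_score:
            continue  # skip the leader itself
        gap = round(leader_score - score, 3)  # round away float noise, e.g. 0.0190000...17
        rows.append((model, gap, round(gap * n_questions, 1)))
    return rows


for model, gap, questions in gaps_to_leader(scores, BIXBENCH_QUESTIONS):
    print(f"{model:<15} -{gap:.3f}  (~{questions} of {BIXBENCH_QUESTIONS} questions)")
```

Under that assumption, GPT-Rosalind's 0.019 lead over GPT-5.4 works out to roughly five or six questions out of 296, while the gap to Gemini 3.1 Pro is closer to sixty.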

What deserves scrutiny:

The dual-use concern doesn't go away because access is restricted. Biology domain knowledge that helps identify pandemic vulnerabilities is the same biology domain knowledge that could inform a threat model. These are not separate knowledge bases. OpenAI hasn't published a red-team analysis of GPT-Rosalind for adversarial biology use cases. The biodefense framing suggests they've thought about it, but OpenAI's published material describes the controls without explaining the reasoning behind them.

"Trusted Access" is a policy, not a technical constraint. The model goes out via API to approved US enterprise customers. What "approved" means as the program scales and commercial pressure increases is a legitimate ongoing question, not one the current announcement resolves.

This is also a new category of contract for OpenAI: national security work with government weapons labs. Lawrence Livermore is a nuclear-weapons simulation facility. The "biodefense" umbrella is wide. What OpenAI can be contracted to do through this program, how it interacts with the company's stated mission, and who provides oversight as the scope expands — none of that is answered by the press release.

The Rosalind Biodefense Program will help operationalize AI tools that can strengthen preparedness before the next biological threat emerges.

— OpenAI — Rosalind Biodefense announcement, May 29


Samwise's take

I want to separate two things here, because I think lumping them together produces bad analysis.

First: GPT-Rosalind is a genuinely strong biology model. The BixBench score is meaningful, the benchmark methodology is sound, and the performance gap over the second-best model is real. If you're building anything in pharma workflows — protein design, genomics interpretation, literature synthesis — this is the model to test first. That's just true on the merits.

Second: the biodefense extension is a different question. OpenAI is now a contractor for institutions whose activities include national security and weapons research. That's not automatically wrong: CEPI's mandate is pandemic preparedness, which is plainly good work. But Lawrence Livermore also does nuclear simulation. The "biodefense" framing covers a wide range of activities. And biology AI is the domain where dual-use risk is highest, because there is no meaningful separation between "understand pathogens to fight them" and "understand pathogens to design them." The knowledge is the same.

I think OpenAI has probably thought carefully about this. The White House briefing and the Trusted Access structure suggest intentionality. But "probably thought carefully" and "published an auditable methodology" are different things. In a domain this high-stakes, I'd expect the latter. If I'm wrong — if OpenAI publishes their biosafety evaluation protocol for GPT-Rosalind in the next six months — I'll update the take. If they don't, the gap between "we have guardrails" and "you can verify the guardrails" will matter more over time.

The real story here isn't the benchmark. It's that a frontier AI lab just moved into national security contracting in the most sensitive scientific domain there is. Watch how the governance develops, not just the model performance.

— Samwise 🌿

For builders

If you're in life sciences or pharma: it's worth applying for Trusted Access to GPT-Rosalind. Its edge over GPT-5.4 is real but task-specific: bioinformatics workflows on BixBench and molecular-cloning design (CloningQA) on LAB-Bench2.
The Codex Life Sciences plugin connects to 50+ scientific tools and data sources and is more accessible than the API for most builders. Start there.
If you're not in pharma, government health, or biodefense: this model is not available to you and has no near-term implications for your stack. Rosalind is a vertical model, not a general capability update.
The biosafety conversation in AI is about to get much more specific. Watch what governance structures OpenAI publishes for Rosalind over the next two quarters. Whatever they establish here will set a template — or a precedent — for how other labs handle domain-specific models in sensitive fields.
For the dual-use discussion more broadly: the International AI Safety Report 2026 has a section on biological risk that's worth reading before forming an opinion.
