Apr 16, 2026
Better Coding Agents Won't Close the Offense-Defense Gap

Hiep Chu, Head of Research

Recent results suggest AI is making vulnerability discovery materially cheaper and more scalable. The direction of travel matters more than any single claim: AI appears to be improving offensive security work before it can reliably produce secure software by default. That asymmetry creates a growing offense-defense gap — and closing it will require more than better code generation.
AI is making vulnerability discovery and exploit generation scalable
Anthropic's writeup on Claude Mythos Preview described a model that identified serious vulnerabilities across major operating systems and browsers, including long-lived bugs in OpenBSD, FFmpeg, and FreeBSD. Reports about other frontier systems point in a similar direction.
Nicholas Carlini, a research scientist at Anthropic, made the dynamic concrete in a talk at [un]prompted: point a model at a codebase, vary what it inspects, triage promising candidates, and repeat. The workflow is not magical. It is operational, and it scales.
The systems do not need to be perfect to shift the balance. They only need to keep driving down the cost of finding the next exploitable bug and turning it into a working exploit.
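The loop Carlini described can be sketched as a plain search procedure. Everything below is hypothetical scaffolding, not any vendor's API: `model_findings` stands in for a model pass over a slice of the code, and `triage` stands in for filtering its candidates.

```python
import random

def discovery_loop(codebase, model_findings, triage, rounds=10, seed=0):
    """Toy sketch of the find-vary-triage-repeat workflow.

    `codebase` is a list of file names; `model_findings(files)` returns
    candidate bugs for the files the model inspected this round;
    `triage(candidate)` keeps only the promising ones.
    """
    rng = random.Random(seed)
    confirmed = []
    for _ in range(rounds):
        # Vary what the model inspects each round.
        window = rng.sample(codebase, k=min(3, len(codebase)))
        for candidate in model_findings(window):
            if triage(candidate):
                confirmed.append(candidate)
    return confirmed
```

The point of the sketch is the economics, not the code: each pass is cheap, the loop is embarrassingly parallel, and every confirmed candidate lowers the marginal cost of the next one.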
Defense does not automatically inherit the same gains
It is tempting to assume that the same models that find bugs will also close them. In practice, remediation is a verification problem, not just a generation problem. A patch that introduces a regression, opens a new attack surface, or quietly violates a critical invariant can be worse than no patch at all.
The deeper issue is upstream: the models writing code are not yet reliably writing secure code. Vibe Security Radar, a project from Georgia Tech's SSLab, tracks CVEs tied to AI-generated code. As of March 2026, it lists 78 AI-linked CVEs across tools including GitHub Copilot, Claude Code, Cursor, and Devin, with 43 rated Critical or High. March 2026 alone added 34 CVEs spanning authorization bypasses, XSS, path traversal, and CORS misconfigurations.
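To make one of those classes concrete, here is a minimal Python sketch of a path traversal bug of the kind tracked above, alongside a hardened variant. The `BASE` directory and helper names are invented for illustration; the vulnerable pattern itself is the standard one.

```python
import os.path

BASE = "/srv/app/static"

def unsafe_path(user_input: str) -> str:
    # Vulnerable: "../" sequences in user_input escape BASE.
    return os.path.join(BASE, user_input)

def safe_path(user_input: str) -> str:
    # Hardened: normalize, then verify the result stays under BASE.
    candidate = os.path.normpath(os.path.join(BASE, user_input))
    if not candidate.startswith(BASE + os.sep):
        raise ValueError("path escapes base directory")
    return candidate
```

The fix is a few lines, which is exactly why the numbers above are telling: these are well-understood bug classes, and generation tools keep reintroducing them anyway.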
If code generation tools keep introducing vulnerabilities, faster patching by itself will not close the gap.
The real bottleneck is assurance
Also at [un]prompted, Google's Heather Adkins presented an ambitious vision: eliminate software vulnerabilities by discovering them precisely, fixing them quickly, and eventually generating secure code by default.
Parts of that vision are already real. Google reported 178 fixes in open-source software and has begun automated hardening of libraries such as libwebp, with work expanding internally to Chrome and other codebases.
But secure code generation by default is still a destination, not a present capability. The limiting factor is assurance. We can already generate plausible code at scale. We are much worse at establishing, at equal scale, that the generated code faithfully implements intent and preserves security properties.
Testing, review, and fuzzing remain essential. But if AI increases the volume and velocity of generated code, conventional assurance becomes the part that fails to scale.
Why better training data is not enough
A natural response to the assurance gap is to assume models will improve as they train on better code. That is directionally true, but it misses the point. Even the best codebases — Chrome, OpenBSD, the Linux kernel — are examples of high-quality engineering, not known-secure software. Their properties were never formally specified or mechanically verified. They survived testing, review, and fuzzing. That is not the same thing.
Chrome makes the case clearly. It is one of the most hardened codebases on the planet — and AI keeps finding serious vulnerabilities in it. The Mythos writeup reported zero-days across every major browser. Google’s own Big Sleep separately discovered critical Chrome flaws with AI-assisted techniques. In early 2026, CISA added actively exploited Chrome zero-days to its Known Exploited Vulnerabilities catalog, including critical flaws in Skia and V8.
Dijkstra’s warning from 1969 still frames this correctly:
"Testing can be used to show the presence of bugs, but never to show their absence!"
We can replace “testing” with “offensive AI agents”. A model trained on code that has merely survived testing will produce code that also merely survives testing. Better training data raises the floor. It does not close the assurance gap.
A different approach
The most promising path is not to train AI to generate secure software from conventional corpora and hope it gets there. It is to pair generation with mechanisms that make correctness checkable.
Three research themes are converging to make this practical:
Intent formalization. Shuvendu Lahiri argues that AI-generated code is “plausible by construction but not correct by construction.” The central problem is the gap between what a user means and what a program does. AI can help translate informal requirements into formal, checkable specifications, but the key step is that the specification becomes explicit.
Certificates in AI. Barrett, Henzinger, and Seshia argue that AI systems should not just produce answers; they should produce answers together with verifiable certificates. In that framing, the model does not need to be trusted. The certificate is what survives scrutiny.
Rigorous proof checking. A certificate is only useful if it can be checked by a system whose semantics we trust. Lean and related proof assistants provide that foundation, and AI is rapidly lowering the cost of producing artifacts that these systems can verify.
Together, these suggest a different pipeline: AI helps formalize what the software should do, AI generates code together with evidence that it does so, and a proof system checks that evidence independently.
This approach is likely to pay off first in security-critical, spec-bounded components such as parsers, protocol handlers, authorization logic, and high-severity remediation work.
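As a toy illustration of the certificate idea, consider a Lean 4 sketch of the authorization case (the `Role` type and `allow` function are invented for this example). The code ships together with a theorem, and the proof checker, not the model that wrote the code, is what we trust.

```lean
-- A toy authorization check: only admins may pass.
inductive Role | admin | user

def allow : Role → Bool
  | Role.admin => true
  | Role.user  => false

-- The certificate: a machine-checked guarantee that a plain
-- user is never allowed, regardless of who wrote `allow`.
theorem user_never_allowed : allow Role.user = false := rfl
```

The example is deliberately trivial, but the shape is the point: if an AI regenerates `allow`, the theorem either still checks or it does not, and no amount of plausible-looking code can fake its way past the kernel.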
The bet
If the trend continues — offense scaling faster than defense — the winning stack will be one that uses AI to reduce the cost of specification, proof generation, and verification.
That is the bet behind Cogent’s research in this area: making strong assurance economically viable for the parts of software where failure matters most.
If attackers get AI for finding bugs, defenders will need AI for proving that critical code paths cannot fail in the ways that matter. That applies to vulnerability fixes and new code generation alike.
