More Vulnerabilities, Not More Security

AI has changed the economics of vulnerability discovery.

Hiep Chu

Head of Research

AI has changed the economics of vulnerability discovery. Many companies have built their post-Mythos programs around AI-enabled Static Application Security Testing (SAST), which scans code at scale, connects evidence across files, reasons through potential exploit paths, and turns those paths into candidate findings that existing tools often miss.

That is a meaningful advance, but discovery is not the same as defensive reliability. The harder questions are how findings translate into better security and what these AI-enabled SAST systems still miss. False positives can overwhelm security engineers and become counterproductive, while false negatives stay hidden until an attacker finds the path the tool didn’t. AI security claims should therefore be judged less by the number of vulnerabilities surfaced and more by the evidence behind the workflow: what was searched, what was validated, what was filtered, and where uncertainty remains.

The cost of false positives

False positives have always been part of static analysis because many security tools are tuned to avoid missing possible vulnerabilities. They may flag code paths or patterns that look unsafe even when exploitability has not been proven, which can produce unrealistic findings, theoretical exploit paths, or alerts about coding practices that are not actual vulnerabilities. For a post-Mythos program built around AI-enabled SAST, the output should be treated as a stream of claims to triage, not as proof that the code is secure. AI does not make the false-positive problem disappear; if anything, a model can make weak findings sound more convincing, which burns engineering time, slows triage, and erodes trust when teams are repeatedly asked to investigate issues they cannot reproduce.

The usual answer is exploitability validation. A useful system should move beyond “this code might be vulnerable” toward evidence that the issue is real, reachable, and reproducible in an environment close enough to production to matter. That is the right direction, but it is also where the easy story about faster discovery starts to break down: validation depends on realistic test conditions, and teams do not just need more findings, they need confidence in which findings deserve action.
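One way to picture this is a triage rule that looks only at validation evidence and ignores how persuasive the write-up sounds. The sketch below is illustrative rather than a description of any particular product: the `Finding` fields and the three actions are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FIX = "fix"
    INVESTIGATE = "investigate"
    DEPRIORITIZE = "deprioritize"


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields; a real tool's output schema will differ.
    title: str
    reachable: bool         # is the flagged code reachable from an exposed entry point?
    reproduced: bool        # was the issue reproduced in a test environment?
    env_matches_prod: bool  # was that environment close enough to production to matter?


def triage(finding: Finding) -> Action:
    """Map validation evidence to an action; the finding's prose plays no part."""
    if finding.reachable and finding.reproduced and finding.env_matches_prod:
        return Action.FIX
    if finding.reachable or finding.reproduced:
        # Partial evidence: worth a human look, not an automatic fix ticket.
        return Action.INVESTIGATE
    return Action.DEPRIORITIZE
```

The point of the design is that a finding with no reachability or reproduction evidence lands in the lowest tier no matter how confident the model's explanation reads.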

False negatives are the blind spot that matters most

False positives are visible. False negatives are different: they are the vulnerabilities a system fails to surface at all. That is the harder problem for defense, because a tool can find real vulnerabilities and still leave the path that matters open.

That is why successful discovery should not be treated as a security claim. “The system found several serious vulnerabilities” proves useful capability, but it does not prove defensive reliability. The stronger question is whether it searched the right surfaces, followed the right attack paths, and found the kinds of issues a motivated attacker would prioritize.

Some misses come from bounded resources. Every agent has finite context, time, compute, and search budget, so it has to choose where to look and when to stop. Better systems can reduce that risk by concentrating effort on exposed code, high-value assets, and paths with attacker leverage, but they cannot make the constraint disappear.
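A minimal sketch of that trade-off, assuming each target carries rough exposure, asset-value, and cost estimates: a greedy planner ranks targets by expected value per unit of budget and records what it had to skip. The `Target` fields, the scoring formula, and `plan_search` are assumptions made for illustration, not a description of how any real agent allocates effort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    # Hypothetical scoring inputs; real systems would derive these differently.
    name: str
    exposure: float     # 0..1, how reachable the code is from outside
    asset_value: float  # 0..1, how much an attacker gains by compromising it
    cost: int           # estimated budget units to analyze; must be > 0


def plan_search(targets, budget):
    """Greedily pick targets by (exposure * asset_value) per unit cost.

    Returns (searched, skipped) name lists. The skipped list is the part
    that matters for assurance: it records what was never examined.
    """
    ranked = sorted(
        targets,
        key=lambda t: (t.exposure * t.asset_value) / t.cost,
        reverse=True,
    )
    searched, skipped = [], []
    remaining = budget
    for target in ranked:
        if target.cost <= remaining:
            searched.append(target.name)
            remaining -= target.cost
        else:
            skipped.append(target.name)
    return searched, skipped


targets = [
    Target("public-api", exposure=0.9, asset_value=0.9, cost=4),
    Target("admin-console", exposure=0.5, asset_value=1.0, cost=5),
    Target("batch-job", exposure=0.1, asset_value=0.3, cost=2),
]
searched, skipped = plan_search(targets, budget=6)
```

With a budget of 6, this plan covers `public-api` and `batch-job` but skips the high-value `admin-console`, which is exactly the kind of gap a clean final report would not reveal unless the skipped list is reported alongside the findings.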

Other misses come from unknown unknowns: new techniques, surprising dependency behavior, or cross-system interactions that do not fit an existing template. Threat intelligence can shrink the window once a technique is known, but it cannot prove a system found every important path before an attacker does. That is why false negatives should anchor the conversation when teams consume the output of these tools: the real risk is not the noisy finding teams can see, but the exploitable path they never knew was there.

Assurance has to ask what the system missed

The question for consumers should not stop at “what did the system find?” A finding proves that the system can surface some real issues, but it does not prove that the system searched the right places, prioritized the right attack paths, or covered the risk that matters most. The stronger assurance question is: what did it miss, and what evidence would reveal those misses?

That shifts the burden from impressive anecdotes to disciplined transparency. Providers should show how the system was tested, what benchmarks were used, what kinds of code and vulnerability classes were in scope, and how findings were validated or filtered before reaching the consumer. They should also report false-positive and false-negative behavior with enough context to make the numbers meaningful, rather than presenting a clean final report as proof of broad coverage.
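As a rough sketch of what reporting both error types could look like, suppose a provider runs the system against a controlled test in which the set of seeded vulnerabilities is known. The function and identifiers below are hypothetical, and the calculation assumes the test codebase contains no vulnerabilities other than the seeded ones, so every unseeded report counts as a false positive.

```python
def detection_report(seeded: set, reported: set) -> dict:
    """Compare reported findings against a known answer key.

    Assumes `seeded` is the complete set of real issues in the test
    codebase, which only holds for controlled tests.
    """
    true_pos = seeded & reported
    false_pos = reported - seeded
    false_neg = seeded - reported
    return {
        # Share of reported findings that were real.
        "precision": len(true_pos) / len(reported) if reported else 0.0,
        # Share of real issues that were found.
        "recall": len(true_pos) / len(seeded) if seeded else 0.0,
        "false_positives": sorted(false_pos),
        "missed": sorted(false_neg),
    }


report = detection_report(
    seeded={"VULN-A", "VULN-B", "VULN-C", "VULN-D"},
    reported={"VULN-A", "VULN-B", "NOISE-X"},
)
```

Here the system reports two real issues and one spurious one, so precision is 2/3 while recall is only 0.5. The `missed` list (`VULN-C`, `VULN-D`) is the context that makes the headline numbers meaningful.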

Benchmarks can help, but they need careful handling. Public benchmarks are useful signals, not production guarantees. They rarely capture enterprise codebases with custom frameworks, unusual service boundaries, legacy assumptions, and undocumented business logic. They can also be contaminated or overfit: frontier models may have seen public tasks, vulnerable code samples, writeups, or solutions during training, and agents tuned against a fixed test set can learn the test instead of the underlying skill.

A better standard is enough evidence for consumers to distinguish a capable discovery engine from a dependable security workflow: what was searched, what was validated, what was missed in controlled tests, and where uncertainty remains. Without that evidence, a few impressive discoveries can easily be mistaken for broad defensive coverage.

The next standard is evidence, not confidence

AI vulnerability discovery is entering a more useful phase, and finding volume or novelty is becoming the wrong proxy for trust. The stronger systems will be able to explain the workflow behind their results: where they searched, how they validated findings, what they filtered out, and where the limits of the analysis lie.

That standard does not require perfect knowledge of every missed vulnerability, especially in enterprise codebases without a complete answer key. It does require a more honest claim about uncertainty. After Mythos, AI-generated security results need enough supporting evidence for defenders to know when to trust them, when to challenge them, and where the blind spots still are.
