
4 Phases, 357 Crashes, 2 Bugs: What an AFL++ Campaign Actually Looks Like

Fuzzing found 357 crashes. Only 2 mattered. Here’s what AFL++ actually uncovers, and why most results are noise.
  • Posted on: Mar 31, 2026
  • By Vinay Kumar Rasala
  • Read time: 7 mins
  • Last updated on: Mar 31, 2026

Reality check: why fuzzing crash counts are misleading

357 crash files. 2 real bug sites.

That’s the outcome of this AFL++ campaign after roughly 8.5 billion executions across multiple harnesses, binaries, and phases.

At first glance, everything looked like success. Crashes were increasing steadily. New inputs were being generated every few seconds. Coverage appeared to improve over time. From a surface-level perspective, the campaign looked productive.

Then triage began.

What initially appeared to be hundreds of distinct failures quickly collapsed into a much smaller set of root causes. Most crash files were not unique bugs. They were different execution paths converging on the same underlying issue.

This is a pattern anyone who has run fuzzing at scale will recognize.

Fuzzers are extremely good at generating volume. They are far less effective at producing clarity.

  • Fuzzing generates duplicate crashes because AFL++ explores execution paths, not unique bugs.
  • Crash count reflects exploration effort, not the number of vulnerabilities.

The difficulty in fuzzing is not in triggering failures. It is in understanding what those failures actually represent.

Key takeaways

Fuzzing generates scale, but not clarity.

  • AFL++ can produce thousands of crashes, but most map back to a small number of root causes
  • High crash volume reflects path exploration, not the number of unique vulnerabilities
  • The real challenge in fuzzing is not discovery—it is triage and validation
  • Structured campaigns (coverage → throughput → depth) are required to uncover meaningful issues

The value of fuzzing is not in how many crashes you collect. It’s in how effectively you reduce them to actionable bugs.

What does a real AFL++ fuzzing campaign look like?

This campaign was not a single run. It was structured deliberately across four phases, each designed to answer a specific question:

  • Are we hitting real code paths?
  • Are we expanding coverage meaningfully?
  • Are we generating enough execution volume?
  • Are we reaching deeper, failure-prone logic?

Each phase introduced a controlled change in harness design, execution model, or input strategy, to move from validation to coverage to throughput to depth.

Across all phases, the campaign accumulated billions of executions, thousands of generated inputs, and over a thousand crash files.

After systematic triage, this was reduced to just two unique crash sites.

That collapse, from thousands of signals to a handful of actionable findings, is the actual output of a well-run fuzzing pipeline.

Campaign structure and outcomes

 

| Phase | Objective | Method | Execution Setup | Data Produced | Key Findings | Why It Mattered |
| --- | --- | --- | --- | --- | --- | --- |
| Phase 1: CLI Validation (bsdtar) | Validate toolchain and seed effectiveness | Direct CLI fuzzing using the bsdtar binary | Multi-instance AFL++ with dictionary, CmpLog, and sanitizer witnesses | 64 queue inputs → minimized to 42 | Seeds exercised real parsing paths; toolchain stable | Established a reliable starting point and initial corpus |
| Phase 2: API Harness (Coverage Phase) | Expand reachable code paths | Custom harnesses for archive_read_* and archive_write_* APIs | Persistent mode harnesses, shared memory input, and multi-instance fuzzing | Corpus grew from 42 → 1,059 inputs; ~1.4M executions | Significant coverage expansion, no crashes | Built a high-quality corpus and mapped parser behavior |
| Phase 3: Throughput Phase (write_fast) | Maximize execution rate on known paths | Reduced API surface per iteration to increase speed | Persistent mode, optimized harness, 5 instances, CmpLog + ASAN witness | ~5.2 billion executions; ~270 crashes | High crash volume but mostly duplicates; coverage plateaued | Demonstrated that throughput increases volume, not necessarily new bugs |
| Phase 4: Comprehensive Harness (Depth Phase) | Explore deeper, complex structures | Extended harness to include metadata traversal (ACLs, xattrs, sparse entries) | Persistent mode with reduced loop size, higher timeout, and no memory cap | ~3.3 billion executions; 896 crashes; 2,414 hangs | New crash patterns from deeper parser logic; sparse-entry bugs identified | Revealed failure modes not reachable through earlier phases |
| Final: Triage & Deduplication | Identify unique bugs | ASAN repro + CASR clustering | Aggregated crash sets across all phases | ~1,166 → 357 → 2 unique bugs | Two null dereferences in archive_entry_sparse.c | Converted fuzzing noise into actionable findings |

What changed across phases (and why it mattered)

Each phase wasn’t just more fuzzing. It was a controlled shift in strategy.

In the early stages, the focus was on coverage: ensuring inputs reached meaningful code and expanding the corpus.

Once coverage stabilized, the focus shifted to throughput, maximizing executions per second, and stressing already discovered paths.

This exposed a limitation: increasing speed did not increase discovery; it increased duplication.

The final phase addressed this by shifting toward depth: targeting complex, state-heavy structures and exercising code paths that were previously unreachable.

This is where new bug classes emerged.

Phase evolution: coverage vs throughput vs depth

 

| Dimension | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
| --- | --- | --- | --- | --- |
| Focus | Validation | Coverage | Throughput | Depth |
| Execution model | CLI | Persistent | Optimized persistent | Heavy persistent |
| Corpus growth | Low | High | Stable | Moderate |
| Throughput | Low | Medium | Very high | High |
| Crash volume | None | None | High | Very high |
| Unique findings | None | None | Low | High |

Why fuzzing generates hundreds of crashes but few real bugs

To understand this behavior, it’s important to look at how AFL++ operates in practice.

AFL++ is designed to maximize coverage and execution path discovery. When a crash condition is found, the fuzzer continues mutating inputs around that condition, producing multiple variations that reach the same failure point.

This leads to:

  • Multiple inputs triggering the same bug
  • Different execution paths converging on identical faults
  • Duplication across parallel fuzzing instances

The result is a large number of crash files representing a very small number of underlying issues. Raw crash counts reflect exploration, not unique vulnerabilities.
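As an illustration, here is a minimal Python sketch of that collapse. The crash IDs and line numbers are invented for the example (only the file name, archive_entry_sparse.c, comes from this campaign): keying crash files by their faulting source location, as reported by a sanitizer, shrinks the count immediately.

```python
from collections import Counter

# Hypothetical crash records: (crash file name, faulting source location
# from a sanitizer report). IDs and line numbers are illustrative only.
crashes = [
    ("id:000001", "archive_entry_sparse.c:87"),
    ("id:000002", "archive_entry_sparse.c:87"),
    ("id:000003", "archive_entry_sparse.c:142"),
    ("id:000004", "archive_entry_sparse.c:87"),
    ("id:000005", "archive_entry_sparse.c:142"),
]

# Keying by fault site collapses many crash files into few root causes.
sites = Counter(site for _, site in crashes)
print(f"{len(crashes)} crash files -> {len(sites)} fault sites")
for site, count in sites.most_common():
    print(f"  {site}: {count} duplicate inputs")
```

The same grouping applied to this campaign's output is what turns hundreds of files into a two-line bug report.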

Crash volume vs actual bugs

 

| Metric | What it represents |
| --- | --- |
| Crash files | Execution paths triggering failure |
| Unique crashes (AFL++) | Coverage-based uniqueness |
| CASR clusters | Stack-level uniqueness |
| Root causes | Actual bugs |

Why libarchive is a high-value fuzzing target

libarchive is a parsing engine for multiple archive formats, including tar, zip, cpio, ISO, and RAR. These formats are inherently attacker-controlled, making them ideal candidates for fuzzing.

Any system that processes archives, whether through file uploads, CI pipelines, or package ingestion, relies on libraries like libarchive. This places them directly in the path of untrusted input.

The combination of complex parsing logic and real-world exposure makes libarchive a high-signal fuzzing target.

Why libarchive works well for fuzzing

 

| Property | Impact |
| --- | --- |
| Multiple formats | Broader attack surface |
| Complex parsing logic | Higher bug density |
| Attacker-controlled input | Real-world exploitability |
| Clean API | Easier harness design |

The build matrix: balancing throughput and detection

The effectiveness of fuzzing is heavily influenced by how the target is built.

Using a single binary forces a tradeoff between speed and visibility. This campaign avoided that by using a build matrix in which each binary served a specific purpose.

Native builds maximized throughput, while sanitizer builds (ASAN, MSAN, UBSAN) provided visibility into memory and correctness issues. CmpLog enabled deeper exploration by solving comparison barriers.
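A build matrix like this can be driven from a small script. In the sketch below, AFL_USE_ASAN, AFL_USE_MSAN, AFL_USE_UBSAN, and AFL_LLVM_CMPLOG are documented AFL++ environment variables; the build-directory names and make invocation are illustrative assumptions, not this campaign's actual build script.

```python
# Sketch of a build matrix: one environment per binary role.
# The env-var names are real AFL++ knobs; paths/targets are hypothetical.
BUILDS = {
    "native": {},                        # plain afl-clang-fast build, max speed
    "asan":   {"AFL_USE_ASAN": "1"},     # memory-error witness
    "msan":   {"AFL_USE_MSAN": "1"},     # uninitialized-memory witness
    "ubsan":  {"AFL_USE_UBSAN": "1"},    # undefined-behavior witness
    "cmplog": {"AFL_LLVM_CMPLOG": "1"},  # comparison logging for deeper paths
}

def build_command(name: str, env: dict) -> str:
    """Render a shell command line for one matrix entry."""
    env_prefix = " ".join(f"{k}={v}" for k, v in env.items())
    parts = [env_prefix, "CC=afl-clang-fast", f"make -C build-{name}"]
    return " ".join(p for p in parts if p)

for name, env in BUILDS.items():
    print(build_command(name, env))
```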

Build matrix roles

 

| Binary type | Purpose |
| --- | --- |
| Native (LTO) | High-speed fuzzing |
| ASAN | Memory error detection |
| MSAN | Uninitialized memory detection |
| UBSAN | Undefined behavior detection |
| CmpLog | Deeper path exploration |

Throughput vs detection: why both matter

Sanitizers improve detection but reduce execution speed. Running all fuzzing instances with sanitizers enabled limits overall coverage.

This campaign separated concerns:

  • Native binaries handled execution
  • ASAN acted as a validation layer

This allowed high throughput without sacrificing detection capability.
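The separation amounts to a replay step: fuzz on the native binary, then keep only crashes that reproduce under the sanitizer build. This Python sketch abstracts the ASAN invocation behind a callable so it stays runnable; `validate_crashes`, `fake_runner`, and the inputs are all hypothetical.

```python
from typing import Callable, Iterable

def validate_crashes(crash_inputs: Iterable[bytes],
                     run_under_asan: Callable[[bytes], bool]) -> list[bytes]:
    """Replay each crash input and keep only the ones that reproduce.
    `run_under_asan` returns True when the sanitizer binary reports an
    error; in a real pipeline it would wrap a subprocess call to the
    ASAN-instrumented build."""
    return [inp for inp in crash_inputs if run_under_asan(inp)]

# Illustrative stand-in: pretend inputs containing a marker reproduce,
# and the rest were non-reproducible noise from the native binary.
fake_runner = lambda data: b"SPARSE" in data
crashes = [b"SPARSE:1", b"noise", b"SPARSE:2"]
print(validate_crashes(crashes, fake_runner))
```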

 

Balancing speed and visibility

 

| Approach | Result |
| --- | --- |
| Native only | Fast but limited visibility |
| ASAN only | Accurate but slow |
| Hybrid | Balanced |

Persistent mode: scaling execution efficiently

The most significant performance gain in this campaign came from switching to persistent mode.

Instead of launching a new process for each input, the harness processes multiple inputs within a single execution loop. This removes process creation overhead and dramatically increases execution speed.

In practice, this resulted in:

  • 5x to 20x improvements in throughput
  • More efficient CPU utilization
  • Higher mutation rates per second

This shift is critical for moving from exploratory fuzzing to high-volume testing.
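A back-of-the-envelope model shows where the gain comes from. The costs below are illustrative, not measured: once per-process startup is amortized over thousands of in-process iterations, throughput rises by roughly the ratio of startup cost to per-input cost.

```python
def executions_per_second(per_exec_cost_ms: float, startup_cost_ms: float,
                          iterations_per_process: int) -> float:
    """Amortize process-startup cost over the iterations run in one process.
    Fork-per-input is the special case iterations_per_process == 1."""
    total_ms = startup_cost_ms + per_exec_cost_ms * iterations_per_process
    return iterations_per_process / (total_ms / 1000.0)

# Illustrative numbers: 1 ms to spin up a process, 0.05 ms per parsed input.
fork_rate = executions_per_second(0.05, 1.0, 1)           # new process per input
persistent_rate = executions_per_second(0.05, 1.0, 10_000)  # persistent loop
print(f"fork: {fork_rate:.0f}/s, persistent: {persistent_rate:.0f}/s, "
      f"speedup: {persistent_rate / fork_rate:.1f}x")
```

With these assumed costs the model lands in the same 5x-20x range the campaign observed; the exact factor depends on how expensive one parse is relative to process startup.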

 

Persistent mode impact

 

| Mode | Execution model | Performance |
| --- | --- | --- |
| Fork-per-input | New process per input | Low |
| Persistent | Loop-based execution | High |

Corpus strategy and fuzzing efficiency

Fuzzing does not begin from zero. It begins from a set of seed inputs.

In this campaign, the initial corpus consisted of 29 archive samples representing different formats. Over time, this corpus expanded significantly through AFL++’s queue.

However, not all inputs are equally valuable.

Tools like afl-cmin help reduce redundancy by removing inputs that do not contribute new coverage. This ensures that the fuzzer operates on a high-quality dataset.

Dictionaries further accelerate discovery by injecting format-specific tokens into mutations. Without them, AFL++ must rely on random chance to discover format boundaries.
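Conceptually, coverage-based minimization is a set-cover problem. This Python sketch shows the greedy idea behind tools like afl-cmin; the seed names and integer edge IDs are invented, and the real tool works on AFL++'s instrumentation bitmaps rather than Python sets.

```python
def minimize_corpus(corpus: dict[str, set[int]]) -> list[str]:
    """Greedy approximation of coverage-based minimization: repeatedly keep
    the input that adds the most unseen edges, stopping when nothing left
    contributes new coverage."""
    covered: set[int] = set()
    kept: list[str] = []
    remaining = dict(corpus)
    while remaining:
        name, edges = max(remaining.items(), key=lambda kv: len(kv[1] - covered))
        if not edges - covered:
            break  # every remaining input is redundant
        kept.append(name)
        covered |= edges
        del remaining[name]
    return kept

# Hypothetical seeds mapped to the coverage edges they hit.
seeds = {
    "tar_basic":  {1, 2, 3},
    "tar_sparse": {1, 2, 3, 4},  # strictly supersedes tar_basic
    "zip_basic":  {5, 6},
}
print(minimize_corpus(seeds))
```

Note how `tar_basic` is dropped entirely: it reaches nothing that `tar_sparse` does not already cover, which is exactly the redundancy minimization removes.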

Corpus evolution

 

| Stage | Input count | Role |
| --- | --- | --- |
| Initial seeds | 29 | Starting point |
| After minimization | 42 | Efficient corpus |
| After Phase 2 | 1,059 | Expanded coverage |
| Final corpus | 36,310 | Full exploration |

Crash triage: how 357 crashes became 2 bugs

After all phases were complete, the campaign produced approximately 1,166 crash files across multiple instances.

At this stage, raw output is not useful. The goal is to determine how many unique issues exist.

The triage pipeline consisted of:

  • Replaying crashes with ASAN
  • Deduplicating based on reproducibility
  • Clustering using CASR

CASR groups crashes by stack trace similarity, providing a more accurate measure of uniqueness than coverage-based heuristics.
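A simplified Python sketch of that idea: treat the top few stack frames as a cluster key. Real CASR does considerably more (frame filtering, similarity scoring), and the frame names below are hypothetical.

```python
from collections import defaultdict

def cluster_by_frames(stacks: dict[str, list[str]], depth: int = 3):
    """Group crashes whose top `depth` stack frames match -- a stripped-down
    version of the stack-trace clustering a tool like CASR performs."""
    clusters = defaultdict(list)
    for crash_id, frames in stacks.items():
        clusters[tuple(frames[:depth])].append(crash_id)
    return clusters

# Hypothetical stack traces; frame names are illustrative only.
stacks = {
    "id:0001": ["sparse_read", "header_common", "read_next_header"],
    "id:0002": ["sparse_read", "header_common", "read_next_header"],
    "id:0003": ["xattr_parse", "header_common", "read_next_header"],
}
clusters = cluster_by_frames(stacks)
print(f"{len(stacks)} crashes -> {len(clusters)} clusters")
```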

The triage funnel

 

| Stage | Count | Meaning |
| --- | --- | --- |
| Raw crashes | ~1,166 | Overcounted |
| Reproducible | 357 | Valid inputs |
| Unique bugs | 2 | Root causes |

“Fuzzing doesn’t fail because it finds too few crashes. It fails when teams mistake crash volume for actual risk.”

— Abhinav Vasisth, Head of Security, Appknox

Why raw crash counts are misleading

Crash counts are often used as a proxy for success in fuzzing campaigns. This is a mistake.

A high crash count indicates:

  • high mutation activity
  • broad path exploration

It does not indicate:

  • number of unique bugs
  • exploitability
  • real-world impact

This campaign demonstrates that even hundreds of crashes can map to a very small number of root causes.

Where fuzzing fits, and where it doesn’t

Fuzzing is highly effective at identifying failure points in software. It excels at uncovering parsing issues and memory safety bugs.

However, it does not answer:

  • whether a crash is exploitable
  • how it behaves in production
  • whether it represents real-world risk

It shows where systems break, but not how that breakage translates into impact.

Final takeaway: fuzzing is a pipeline, not an outcome

This campaign did not succeed because it generated a large number of crashes; it succeeded because it followed a structured pipeline that turned high-volume execution into low-noise insight.

Across four phases, the work moved deliberately from validation to coverage to throughput to depth. Each phase addressed a different limitation, and together they created a complete picture of system behavior.

If the campaign had stopped at throughput, the results would have been misleading. Only by extending into deeper structures and performing disciplined triage did meaningful findings emerge.

Two bugs, hidden behind hundreds of duplicate signals.

This is the reality of fuzzing at scale.

Crash generation is not the outcome. It is the starting point.

What matters is how effectively those signals are reduced into actionable insights, and how those insights are validated in real-world conditions.

Frequently Asked Questions

 

What does fuzzing actually find?

Fuzzing identifies inputs that cause a program to behave unexpectedly, including crashes, hangs, and edge-case failures. However, these results often represent multiple paths to the same underlying issue rather than distinct bugs.

Why does fuzzing generate so many duplicate crashes?

Fuzzers like AFL++ are designed to explore execution paths. When a crash condition is discovered, the fuzzer continues mutating inputs around that condition, producing multiple variations that trigger the same root cause.

Why is fuzzing triage difficult?

Fuzzing produces high volumes of crash data without context. Multiple inputs can trigger the same bug through different paths, making it difficult to distinguish unique vulnerabilities from duplicates without systematic triage.

What is crash deduplication in fuzzing?

Crash deduplication is the process of grouping crash inputs based on shared root causes. Tools like CASR use stack trace similarity to cluster crashes, helping teams identify unique bugs instead of counting path-level variations.

Why is crash count misleading in fuzzing?

Crash count reflects how many inputs triggered failures, not how many unique bugs exist. A single vulnerability can produce hundreds of crash files, especially in parallel fuzzing environments.

What happens after fuzzing finds crashes?

After fuzzing identifies crashes, teams must:

  • reproduce them reliably
  • deduplicate similar cases
  • analyze root causes
  • assess exploitability

This process determines which findings are meaningful and worth fixing.

Does fuzzing find exploitable vulnerabilities?

Not always. Fuzzing identifies failure points, but it does not determine exploitability or real-world impact. Additional analysis is required to understand whether a crash represents a security risk.

When should you stop fuzzing?

Fuzzing typically reaches diminishing returns when:

  • Coverage stabilizes
  • New crashes are mostly duplicates
  • No new code paths are being explored

At this point, further effort should shift toward triage and analysis.

What can fuzzing not detect?

Fuzzing is limited in detecting:

  • Logic flaws
  • Authentication issues
  • Authorization bypasses
  • Complex multi-step vulnerabilities

It is most effective for uncovering memory safety issues and parsing errors.