

8.5 Billion Executions. 2 Real Bugs. Here’s Why.

Learn how to run AFL++ fuzzing at scale, design harnesses, optimize coverage vs throughput, and reduce thousands of crashes into real vulnerabilities using CASR.
  • Posted on: Apr 23, 2026
  • By Vinay Kumar Rasala
  • Read time: 5 mins
  • Last updated on: Apr 23, 2026

AFL++ at Scale: Why crash volume doesn’t equal vulnerabilities

 

357 crash files. 2 actual bugs.

That is not a failure of fuzzing. It is a failure of interpretation.

In a recent AFL++ fuzzing campaign targeting libarchive, we ran approximately 8.5 billion executions across all fuzzing phases, generated over a thousand crash files, and ultimately reduced them to two unique crash sites through structured crash triage and deduplication.

This blog is a practical, engineering-first guide to that process:

  • How to design a multi-phase fuzzing workflow
  • How to build the right AFL++ instrumentation matrix
  • How to optimize coverage versus throughput
  • How to implement fuzzing crash triage at scale
  • How to move from crash volume to real vulnerabilities

If your fuzzing pipeline stops at crash counts, you are not measuring security. You are measuring noise.

Why AFL++ fuzzing produces high crash counts (and why they mislead)

Modern fuzzers like AFL++ are extremely good at generating output. That output, however, is not directly equivalent to vulnerabilities.

  • A single bug can be triggered via hundreds of input paths
  • Each path is logged as a separate crash
  • Parallel fuzzing instances amplify duplication

This is why:

  • Crash count does not equal vulnerability count
  • Coverage does not equal risk depth
  • Execution speed does not equal meaningful discovery

If you do not implement fuzzing deduplication and root-cause clustering, your results will always be inflated.

Target selection: Why libarchive works for fuzzing

libarchive is an ideal fuzzing target because:

  • It parses attacker-controlled archive inputs
  • It supports multiple formats such as tar, zip, cpio, ISO, and RAR
  • It is written in C with complex parsing logic
  • It is widely deployed in production systems

This creates a realistic attack surface where malformed inputs can trigger memory safety issues, null dereferences, parser inconsistencies, and denial-of-service conditions.

AFL++ build matrix: The most common fuzzing mistake

Before running AFL++, the most critical step is building the correct binaries.

Minimum viable AFL++ setup:

Binary         Purpose
Native (LTO)   Maximum throughput
ASAN           Memory error detection
CmpLog         Unlocks comparison-based paths
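Building that matrix means three separate compiles. A sketch, assuming an autotools-style build; the `bsdtar` artifact path and output names are illustrative:

```shell
# Native LTO build: maximum throughput, collision-free edge coverage
make clean; CC=afl-clang-lto ./configure
make -j"$(nproc)" && cp bsdtar target_native

# ASAN build: memory-error detection (AFL_USE_ASAN is read at compile time)
make clean; AFL_USE_ASAN=1 CC=afl-clang-lto ./configure
AFL_USE_ASAN=1 make -j"$(nproc)" && cp bsdtar target_asan

# CmpLog build: records comparison operands for afl-fuzz -c
make clean; AFL_LLVM_CMPLOG=1 CC=afl-clang-lto ./configure
AFL_LLVM_CMPLOG=1 make -j"$(nproc)" && cp bsdtar target_cmplog
```

The CmpLog binary is never fuzzed directly; it is handed to afl-fuzz via `-c`, while the native binary goes after `--` on the fast instances.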

Why LTO matters

LTO instrumentation provides full-program visibility, collision-free edge coverage, and automatic dictionary extraction.

Why not run only ASAN

ASAN introduces approximately two times the runtime overhead. Running all instances with ASAN reduces total executions and limits discovery.

Correct pattern

afl-fuzz -M main -i corpus -o afl_out -- ./target_native
afl-fuzz -S asan01 -i corpus -o afl_out -- ./target_asan

The native binary delivers speed. The ASAN binary validates memory safety.

Why CmpLog is critical

Many formats rely on strict comparisons and magic bytes. CmpLog allows AFL++ to observe runtime comparisons, extract operand values, and inject them into mutations. This significantly improves path discovery in structured formats.
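Wiring CmpLog in is one flag on one or two instances. A hedged sketch reusing the binaries from the build matrix (instance and directory names are illustrative):

```shell
# One secondary observes comparisons via the CmpLog binary (-c);
# -l 2 raises the CmpLog level to also solve transformed comparisons.
afl-fuzz -S cmplog01 -i afl_inp -o afl_out \
  -c ./target_cmplog -l 2 -- ./target_native
```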

Phase 1: CLI validation

Start simple.

Instead of building harnesses immediately, fuzz the CLI target:

afl-fuzz -i afl_inp -o afl_out/ -t 1000 -M FUZZ01_LTO \
-- ./lto_build/bin/bsdtar -xf @@ -C /tmp/out_bsdtar

 

Goal

 
  • Validate the toolchain
  • Ensure seeds hit real code paths
  • Build the initial corpus

Outcome

 
  • Seeds: grew from 29 to 64, then minimized to 42
  • Runtime: approximately 7 hours
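The seed reduction above maps to AFL++'s bundled minimizers. A sketch with the same CLI target (paths and the sample seed name are illustrative):

```shell
# Keep only seeds that contribute unique coverage
afl-cmin -i afl_inp -o afl_inp_min \
  -- ./lto_build/bin/bsdtar -xf @@ -C /tmp/out_bsdtar

# Optionally shrink individual seeds further
afl-tmin -i afl_inp_min/seed.tar -o seed.min.tar \
  -- ./lto_build/bin/bsdtar -xf @@ -C /tmp/out_bsdtar
```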

Key takeaway

Do not skip validation. Broken setups scale poorly.

Phase 2: Persistent mode in AFL++ (eliminating fork overhead)

The most important performance improvement in AFL++ is persistent mode.

Why it matters

 
  • Eliminates fork and execution overhead
  • Uses shared memory input
  • Improves throughput by five to twenty times

Minimal persistent loop

/* buf points at __AFL_FUZZ_TESTCASE_BUF, set once before the loop */
while (__AFL_LOOP(10000)) {
    int len = __AFL_FUZZ_TESTCASE_LEN;   /* shared-memory input length */
    struct archive *a = archive_read_new();
    archive_read_open_memory(a, buf, len);
    archive_read_next_header(a, &entry);
    archive_read_free(a);                /* free parser state every iteration */
}

 

Engineering rules

 
  • Always free parser state at the end of each iteration
  • Cap loop iterations (the 10,000 above) to limit state drift
  • Call __AFL_INIT() after expensive one-time setup, so the forkserver starts past it

Results

 
  • Approximately 394 executions per second per instance
  • Corpus: 42 to 1,059 inputs
  • Crashes: 0

 

Parallelization strategy: Avoiding redundant mutation work

Once persistent mode removes execution overhead, the next bottleneck is how effectively multiple fuzzing instances explore the input space.

The key secondary-side choice is the power schedule (-p), and the rule is simple: don’t give every secondary the same one.

If all instances run identical schedules, they quickly converge on similar mutations, leading to redundant work and poor CPU utilization. Mixed schedules ensure each instance explores a different region of the search space.

 

Recommended schedule distribution

 
  • -p explore → pushes toward new, unexplored coverage paths
  • -p exploit → focuses on inputs already near interesting states
  • -p rare → prioritizes rarely-hit edges, effective for corner-case discovery

This diversity ensures that parallel fuzzers are complementary rather than duplicative.

MOpt integration (targeted, not universal)

One secondary instance should run with -L 0 to enable MOpt (Mutation Operator Optimization).

MOpt uses a particle swarm optimization model to:

  • Track which mutation operators produce new coverage
  • Dynamically adjust mutation probabilities toward effective strategies

It performs best as a single adaptive instance within a heterogeneous setup, not as a replacement for all fuzzers.
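Putting the schedule mix and the single MOpt instance together, a launch script might look like this. Instance names, counts, and directories are illustrative:

```shell
# Main instance: stable anchor on the fast native binary
afl-fuzz -M main -i afl_inp -o afl_out -- ./target_native &

# Secondaries with deliberately different power schedules
afl-fuzz -S s01 -p explore -i afl_inp -o afl_out -- ./target_native &
afl-fuzz -S s02 -p exploit -i afl_inp -o afl_out -- ./target_native &
afl-fuzz -S s03 -p rare    -i afl_inp -o afl_out -- ./target_native &

# One adaptive MOpt secondary (-L 0) in the heterogeneous mix
afl-fuzz -S s04 -L 0 -i afl_inp -o afl_out -- ./target_native &

# One ASAN secondary validates memory safety at lower speed
afl-fuzz -S asan01 -i afl_inp -o afl_out -- ./target_asan &
wait
```

All instances share one `-o` directory, so they synchronize interesting inputs with each other automatically.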

Key takeaway

Persistent mode unlocks throughput.

The parallel strategy determines whether that throughput translates into meaningful coverage growth or wasted cycles.

This phase builds coverage, not bugs.

Phase 3: Throughput optimization in AFL++ (maximizing exec/sec)

Once coverage stabilizes, shift the objective.

Strategy change

 
  • Reduce per-iteration work
  • Focus on high-probability crash paths
  • Reuse the Phase 2 corpus

Results

 
  • 5.2 billion executions
  • Approximately 7,400 executions per second
  • 270 crashes

Insight

Coverage plateaued while crash volume increased. This indicates duplication rather than new bug discovery.
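Plateaus like this are visible in AFL++'s bundled monitoring tools; directory names follow the earlier commands:

```shell
# One-line summary per instance: exec/sec, corpus size, pending, crashes
afl-whatsup -s afl_out

# Per-instance graphs of edges found and exec/sec over time
afl-plot afl_out/main plot_out
```

When the edges-over-time curve flattens while crashes keep accumulating, you are duplicating, not discovering.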

Key takeaway

More speed increases crash volume, not necessarily the number of new vulnerabilities.

Phase 4: Expanding fuzzing coverage to deeper parser surfaces

To find new bugs, you must change the surface being tested.

New surfaces explored

 
  • ACL iterators
  • Sparse region traversal
  • Metadata structures
  • Nested object graphs

Why this matters

These areas introduce pointer-linked structures, count mismatches, and deep iteration paths.

Malformed inputs here trigger:

  • Null dereferences
  • Unbounded iteration
  • Arithmetic inconsistencies in linked regions

Execution strategy: Power schedule diversification

Expanding the surface alone is insufficient. Without a diversified mutation strategy, parallel fuzzers converge on similar paths and waste cycles.

Phase 4 introduces explicit power-schedule separation across secondaries, ensuring that each instance explores a distinct region of the input space.

Instance strategy

  • -p explore → drives discovery of new coverage paths
  • -p exploit → intensifies mutations near known interesting inputs
  • -L 0 (MOpt) → dynamically optimizes mutation operators based on observed effectiveness
  • -l 2 → raises the CmpLog level to include transformed comparisons, improving constraint solvability (laf-intel, which splits multi-byte comparisons into byte-wise checks, is a separate compile-time option)
  • -c (CmpLog) → captures runtime comparisons to guide input mutation
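Concretely, the comparison-solving instances from the list above could be launched like this. Binary names follow the earlier build matrix; the laf-intel build directory is hypothetical:

```shell
# CmpLog secondary: runtime comparison capture, level-2 transformations
afl-fuzz -S cmp01 -p explore -c ./target_cmplog -l 2 \
  -i afl_inp -o afl_out -- ./target_native &

# laf-intel requires its own instrumented build (compile-time option):
#   make clean; AFL_LLVM_LAF_ALL=1 CC=afl-clang-lto ./configure
#   AFL_LLVM_LAF_ALL=1 make && cp bsdtar target_laf
afl-fuzz -S laf01 -p rare -i afl_inp -o afl_out -- ./target_laf &
wait
```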

Why this matters

Same harness + same schedule = redundant work

Same harness + different schedules = parallel exploration

Each instance:

  • Mutates inputs differently
  • Prioritizes different execution paths
  • Contributes non-overlapping coverage

This is what allows deeper surfaces to actually introduce new bugs, rather than duplicating earlier crash classes.

Results

 
  • Coverage increased from approximately 8.6k to 9.7k edges
  • Crashes: 896
  • Hangs: 2,414

Interpretation

New crash classes emerged from deeper parser logic, not increased iteration volume.

The sharp increase in hangs reflects:

  • Traversal of nested iterator paths
  • Quadratic behavior in malformed structures

Critically, these were new execution regions, not extensions of earlier write-path bugs.

Key takeaway

New bugs come from:

New surfaces × Diverse mutation strategies

Not more iterations.

Surface expansion without schedule diversity produces redundancy, not discovery.

 

Corpus evolution across the fuzzing workflow

Stage           Files
Initial seeds   29
Post CLI        42
Post Phase 2    1,059
Post Phase 4    6,779
Final merged    36,310

Raw corpus growth is exponential. Unique coverage is not.

Crash funnel: From noise to signal


This funnel represents the most important concept in fuzzing at scale. Large volumes of crashes collapse into a very small number of real issues.

Crash triage: From 1,166 crashes to 2 bugs

Fuzzing produces crashing inputs. It does not directly produce vulnerabilities.

Triage pipeline

 
  1. Reproduce crashes using ASAN
  2. Deduplicate AFL++ outputs
  3. Generate CASR reports
  4. Cluster by stack trace similarity

Results

Stage                Count
Raw crashes          ~1,166
Reproducible         357
Unique crash sites   2

Root cause

Both bugs were located in:

archive_entry_sparse.c

Operator insight

If a fuzzing campaign produces hundreds of crashes, more than ninety percent are typically duplicates.

Key takeaway

Crash triage is where fuzzing becomes engineering.

Minimal AFL++ setup

To replicate this workflow, start with:

  • One native binary with LTO
  • One ASAN binary
  • Persistent mode harness
  • CmpLog enabled
  • One master and two secondary instances

Expand only after stability is confirmed.

Common mistakes in fuzzing pipelines

 
  • Running only ASAN builds
  • Skipping persistent mode
  • Ignoring CmpLog
  • Treating crash count as vulnerability count
  • Not implementing a triage pipeline

What this means for modern AppSec pipelines

This is not just a fuzzing problem. It reflects a broader failure in application security pipelines:

  • Too much output
  • Too little interpretation
  • Weak connection to real risk

Security tools identify where systems break. Engineering determines what actually matters.

This is the shift toward execution-aware security:

  • Focus on runtime behavior
  • Collapse duplicate findings
  • Prioritize root causes

From noise to signal

Fuzzing produces noise. Engineering produces signal.

The difference is everything.

A well-run fuzzing workflow should:

  • Generate large volumes of data
  • Collapse that data aggressively
  • Produce a small, actionable set of bugs

If your pipeline ends at 357 crashes, it is incomplete.
If it ends at 2 root causes, it is useful.

FAQs

 

What is AFL++ fuzzing?

AFL++ is a coverage-guided fuzzing framework used to discover vulnerabilities by mutating inputs and observing program behavior.

Why do fuzzers generate many crashes?

Fuzzers generate many crashes because the same bug can be triggered through multiple execution paths.

How do you deduplicate fuzzing crashes?

You can deduplicate fuzzing crashes by using clustering tools such as CASR, which group crashes based on stack traces.

What is persistent mode in AFL++?

Persistent mode in AFL++ runs the target in a loop inside a single process, avoiding repeated process restarts and dramatically improving throughput.

Why is the crash count misleading?

The crash count is misleading because it reflects detection frequency rather than unique vulnerabilities.