
Gemini XSS Vulnerability: When AI Executes Malicious Code

Explore how a zero-click XSS vulnerability in Google Gemini’s code canvas let AI-generated output execute automatically, why it happened, and what it means for application security.
  • Posted on: Mar 31, 2026
  • By Jeel Patel
  • Read time: 3 mins
  • Last updated on: Mar 31, 2026

Artificial intelligence is no longer just generating text. It generates and executes code in real time.

With tools like Google Gemini, features such as code canvases and live previews are turning AI systems into interactive execution environments. This shift introduces a new and rapidly growing category of risk: AI security vulnerabilities tied to real-time code execution.

In recent Appknox research, our security team uncovered a Gemini XSS vulnerability, a critical example of how AI code execution vulnerabilities can emerge when generated output is treated as trusted.

If you or your teams are already using AI code generators in development workflows, this risk is not theoretical. It is already present in your environment.

Recent industry data suggests AI-generated code can contain up to 2.7x more vulnerabilities than human-written code, reinforcing how quickly this risk is scaling across development environments.

Key takeaways

AI code execution is changing how application security works and where it breaks.

  • AI tools like Google Gemini don’t just generate code. They render and execute it instantly, often without validation.
  • This creates a new class of risk: zero-click code execution, where malicious payloads run without user interaction.
  • The Gemini XSS vulnerability shows how input validation gaps, auto-execution, and weak sandboxing can combine to create a real exploit.
  • Unlike traditional XSS, the risk arises when system-generated output is treated as trusted input.
  • Most AppSec tools are not designed for this shift as they focus on code before release, not behavior during execution.

The key shift is this: you’re no longer just securing code. You’re securing what gets executed automatically.

What teams think is happening vs what actually happens in AI workflows

As AI coding tools like Gemini become part of everyday development workflows, most teams approach them with a familiar mental model.

They assume:

  • AI-generated code is similar to developer-written code
  • Outputs will be reviewed before being used
  • Execution happens in controlled environments

This assumption made sense in traditional development.

But in practice, AI systems don’t follow that model.

With features like code canvases and live previews, AI tools are no longer just generating code, but are rendering and executing it immediately. The gap between generation and execution has effectively disappeared.

This creates a subtle but critical shift: AI-generated output is no longer just code. It is untrusted input that can execute in real time.

That distinction is where risk begins. When output is treated as trusted and execution happens automatically, validation is no longer a checkpoint; it becomes an afterthought.

And in many cases, it doesn’t happen at all.

The new risk: AI-generated code that executes automatically

One of the most important questions security teams are asking today: Can AI tools execute malicious code?

Yes. And increasingly, they already do. Gemini’s code canvas allows users to:

  • Generate HTML, CSS, and JavaScript
  • Edit it dynamically
  • Instantly render output in a preview panel

This creates a powerful experience. But it also introduces a high-risk condition: the system is executing untrusted, AI-generated code in real time. This is how AI code execution vulnerabilities start.

Most organizations are not prepared for this shift. Traditional security models assume:

  • Code is reviewed before execution
  • Execution environments are controlled

AI breaks both assumptions.

What vulnerability was discovered in Gemini?

During testing, we identified a cross-site scripting (XSS) vulnerability in Google Gemini that enabled zero-click AI code execution.

At its core, the issue was simple but critical:

  • Malicious HTML and JavaScript could be injected
  • The system automatically rendered the code
  • Execution occurred without user interaction

Unlike traditional XSS, this vulnerability was driven by automatic execution within the AI workflow, making exploitation more reliable and less dependent on user behavior.

How this Gemini XSS vulnerability works

The execution flow highlights where control breaks down:

  1. A user inputs or pastes code into Gemini’s canvas.
  2. The platform automatically triggers rendering. No explicit run required.
  3. Input validation partially fails, but rendering continues.
  4. A crafted payload bypasses sanitization.
  5. The browser executes the payload inside the preview environment.
  6. The attacker triggers actions such as redirection or code execution.

The core issue is that AI-generated output moves directly from input to execution, without meaningful validation or control.
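The broken step in this flow can be sketched in a few lines. Everything below is illustrative: the function names and the blocklist check are our own assumptions, not Gemini’s actual implementation. The point is the anti-pattern where a failed validation check is recorded but never gates rendering:

```python
def validate_js(code: str) -> bool:
    # Naive blocklist check standing in for the real validator.
    return "<script" not in code.lower()

def auto_render(code: str) -> list[str]:
    """Simulates an auto-rendering preview where validation does not gate execution."""
    events = []
    if not validate_js(code):
        events.append("validation_failed")  # noted, but not enforced
    events.append("rendered")               # preview proceeds regardless
    return events

# A script tag fails validation yet still reaches the preview pane.
print(auto_render("<script>alert(1)</script>"))  # ['validation_failed', 'rendered']
```

A safer pipeline makes rendering conditional on validation succeeding, rather than running the two steps independently.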

Why this exploit worked

At a surface level, this appears to be a standard input validation issue. In practice, it is a mismatch between how different layers interpret and enforce security controls.

The exploit succeeded due to a combination of factors:

1. Parsing context shift using SVG

When content is wrapped inside an <svg> tag, the browser switches to a different parsing mode. Many sanitization mechanisms are designed for standard HTML and fail to properly analyze nested or XML-like structures.

This allowed the payload to bypass filters that were not recursively validating SVG content.
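A toy example of this class of filter gap, assuming a hypothetical regex-based sanitizer rather than Gemini’s actual logic: a rule that strips `<script>` elements leaves an SVG element with an event handler untouched, even though a browser will fire the handler as soon as the element renders.

```python
import re

def strip_scripts(markup: str) -> str:
    # Naive sanitizer: removes <script> elements and nothing else.
    return re.sub(r"<script\b[^>]*>.*?</script>", "", markup,
                  flags=re.IGNORECASE | re.DOTALL)

# The SVG payload carries no <script> tag, so it passes through unchanged;
# in a browser, onload executes on render.
payload = '<svg onload="alert(document.domain)"></svg>'
assert strip_scripts(payload) == payload
```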

2. Alternative execution vector via xlink:href

Instead of using common <script src=...> patterns, the payload leveraged legacy attributes like xlink:href, which are valid within SVG contexts but often overlooked by sanitization logic.

As a result, the system failed to recognize this as an executable vector.
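A sketch of why this vector works, using a hypothetical filter that only recognizes the standard `href` attribute: the same `javascript:` URL carried by the namespaced `xlink:href` attribute never matches the pattern, so it is not flagged as executable.

```python
import re

def flags_javascript_url(markup: str) -> bool:
    # Hypothetical filter: only matches a plain href attribute preceded by
    # whitespace, so namespaced variants like xlink:href are invisible to it.
    return bool(re.search(r'\shref\s*=\s*["\']?javascript:', markup, re.IGNORECASE))

standard = '<a href="javascript:alert(1)">x</a>'
legacy = '<svg><a xlink:href="javascript:alert(1)"><text>x</text></a></svg>'

assert flags_javascript_url(standard)      # caught by the filter
assert not flags_javascript_url(legacy)    # xlink:href slips through
```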

3. Entity-based obfuscation

Encoded values such as data&colon; were used to evade detection. The sanitization layer evaluated the encoded string, while the browser decoded it during execution.

This created a gap between what was validated and what was executed.
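This gap is easy to reproduce with Python’s standard library, which decodes the same HTML5 named entities browsers do. The fragment below is a hypothetical payload for illustration, not the one used in the research:

```python
import html

# Hypothetical attribute fragment: the URL scheme is hidden behind an entity.
raw = 'href="data&colon;text/html,<script>alert(1)</script>"'

# A filter scanning the raw string never sees the literal "data:" scheme...
assert "data:" not in raw

# ...but the browser decodes &colon; to ":" before resolving the URL.
assert "data:text/html" in html.unescape(raw)
```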

4. Over-permissive sandbox configuration

The preview environment relied on iframe-based isolation, but sandbox restrictions were not strict enough. Permissions like pop-up handling allowed the payload to escape intended boundaries using functions like window.open().

The outcome was not a single failure, but a chain of small gaps across parsing, validation, and sandboxing.

This is a common pattern in modern vulnerabilities where risk emerges when multiple layers enforce security inconsistently.

What broke in the security model

To understand the broader implications, it’s important to look at what the system was expected to do versus what actually happened.

Expected secure design:

  • Strict input sanitization
  • Fully restricted sandbox execution
  • Explicit control over when code runs

What actually happened:

  • Sanitization failed for non-standard payloads (SVG, encoded inputs)
  • Code execution was triggered automatically
  • Sandbox controls allowed limited escape paths

This created a zero-click execution path within a trusted environment.

How this differs from traditional XSS vulnerabilities

At first glance, this vulnerability resembles a typical cross-site scripting issue. But in practice, it introduces a fundamentally different risk model.

In traditional XSS:

  • Execution usually requires user interaction (clicking a link, loading a page)
  • The attack depends on injecting payloads into application inputs
  • The application renders user-controlled data unsafely

In AI-driven environments like Gemini:

  • Code is generated by the system itself, not directly supplied by an attacker
  • Execution can happen automatically, without user intent
  • The boundary between input and output becomes blurred

This creates a new class of vulnerability: zero-click execution through system-generated content.

The risk is not just injection but also the automatic execution of untrusted output within trusted workflows.

This is where AI changes the threat model. The system is no longer just processing input. It is actively generating and executing it.

The exploit: from injection to AI code execution

The exploit leveraged a mismatch in system behavior:

  • JavaScript validation failed
  • HTML rendering still proceeded

This gap enabled a payload that:

  • Bypassed sanitization using SVG parsing
  • Used legacy attributes like xlink:href
  • Triggered execution via encoded payloads

The result was a working AI code-execution vulnerability within Gemini. This is not just XSS: it is the AI workflow executing malicious code.

Why this matters: security risks in AI code generators

This vulnerability is not an isolated issue. It reflects a broader shift in how AI systems behave. AI tools are no longer passive. They generate code, render it, and execute it. That fundamentally changes the risk model.

At the same time, AI output is often treated as trusted. In reality, it is an untrusted input. This gap is at the core of many AI security vulnerabilities.

Automatic execution makes this worse. When code is rendered and executed without friction, exploitation becomes significantly easier and more reliable.

Why this is not unique to Gemini

While this vulnerability was identified in Gemini, the underlying pattern is not specific to a single platform.

Any system that:

  • Generates code dynamically
  • Renders it in real time
  • Executes it without strict validation

is exposed to similar risks.

This includes:

  • AI-powered code editors and assistants
  • Browser-based development environments
  • Low-code and no-code platforms with preview features

In enterprise environments, these tools are increasingly integrated into development workflows. That means the attack surface is not limited to one application. It extends across the entire toolchain.

The vulnerability is not in Gemini alone. It is in how modern systems are designed to prioritize speed and usability over controlled execution.

Real-world enterprise attack scenario

Consider a realistic enterprise scenario:

  • A developer copies a UI snippet generated via Gemini
  • The snippet contains a hidden payload
  • The code is pasted into a tool or environment
  • The system auto-renders it
  • A malicious script executes instantly

This could result in:

  • Credential phishing
  • Session hijacking
  • Malicious redirects

If your organization is integrating AI into development workflows, this is a direct enterprise risk.

In a recent supply chain attack, a widely used LLM tool (LiteLLM) was compromised, exposing hundreds of thousands of systems and enabling credential theft, API key exfiltration, and infrastructure access.

Most security programs today do not see how AI-generated code behaves at runtime.

Why AI-generated code is a security risk

AI-generated code poses risks due to how it is created and used.

Scale

AI can generate large volumes of code instantly. That makes it difficult to review and validate every output.

Opacity

Developers do not always fully validate what AI produces. The output is often treated as trusted, even when it should not be.

Execution

Modern tools render and execute code immediately. When generation and execution happen together, the window for validation disappears.

Together, these create a high-risk environment for code-execution vulnerabilities.

Why traditional AppSec models fail in AI-driven environments

Most application security programs are designed around a predictable lifecycle.

Code is written, reviewed, tested, and then executed. Security tools are aligned to this flow, focusing on identifying vulnerabilities before release.

AI disrupts this model at multiple levels.

In AI-driven environments:

  • Code is generated dynamically
  • Validation is inconsistent or skipped
  • Execution happens instantly, often without explicit intent

This breaks the traditional checkpoints where security controls are applied.

What used to be validated during development or testing now moves directly into execution.

The question is no longer just what gets built. It’s what gets executed without validation.

This is where many existing tools fall short.

They are designed to analyze static code or pre-release artifacts. But vulnerabilities like the Gemini XSS issue don’t originate from static code. They emerge from execution behavior in dynamic environments.

In practice, this creates a visibility gap:

  • Teams cannot see how AI-generated code behaves at runtime
  • Security controls are applied too late in the workflow
  • Risk accumulates in places that were never designed to be monitored

This is why AI-driven development is not just a tooling shift. It is a fundamental change in the application security model.

Reframing the problem: from code security to execution control

Most security discussions around AI focus on the code itself.

  • Is the generated code secure?
  • Does it follow best practices?
  • Can it be trusted?

But as this vulnerability demonstrates, that’s not the right starting point.

The more relevant question is not whether the code is secure but rather, “What happens when this code executes automatically?”

Because in AI-driven workflows:

  • Code is generated without full visibility
  • Execution can happen instantly
  • Validation is often incomplete or bypassed

This shifts the problem from static analysis to execution control.

You are no longer just securing code. You are securing how code behaves when it is generated, rendered, and executed in real time.

And this is where the Gemini vulnerability becomes more than a one-off issue.

It is a signal of a broader pattern: AI systems are collapsing the boundaries between input, processing, and execution.

When those boundaries disappear, traditional assumptions about trust, validation, and control no longer apply.

How to secure against AI security vulnerabilities

Securing AI-driven systems requires a shift in how applications are tested and controlled.

Treat AI output as untrusted

AI-generated output should be validated the same way as external input.

Strengthen sanitization

Execution vectors need to be removed, including encoded and obfuscated inputs.

Lock down execution environments

Restrict popups, scripts, and any paths that allow sandbox escape.

Eliminate automatic execution

Introduce explicit run controls so code is not executed without validation.
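The first two recommendations point toward allowlist-based sanitization. The sketch below is our own minimal example built on Python’s standard-library `html.parser`, not a production sanitizer: because nothing renders unless it is explicitly allowed, SVG elements, namespaced attributes like `xlink:href`, and entity-encoded URLs are rejected by default rather than by enumeration.

```python
import html
from html.parser import HTMLParser

# Tags permitted in previews; everything else is dropped.
ALLOWED_TAGS = {"p", "b", "i", "ul", "li", "code", "pre"}

class AllowlistSanitizer(HTMLParser):
    """Keeps only allowlisted tags and discards all attributes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped entirely

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data))  # re-escape text content

def sanitize(untrusted: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(untrusted)
    parser.close()
    return "".join(parser.out)

print(sanitize('<p>ok</p><svg onload="alert(1)"></svg>'))  # -> <p>ok</p>
```

In production, a maintained sanitizer library is preferable to hand-rolled parsing; the value of the allowlist approach is that new execution vectors are blocked without having to be individually enumerated.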

AI security is now an application security problem

This Gemini vulnerability is not a one-off.

AI is now part of how applications are built and executed. The same risks apply, but in a more dynamic and less controlled environment.

The attack surface is growing. Code is being generated and executed automatically. And traditional trust boundaries are breaking.

Recent issues in frameworks like LangChain show how this plays out, with risks such as data exfiltration, API key leakage, and unintended access to system resources.

What this means for application security

Most traditional application security tools are not designed for this new reality. They are built around the assumption that code is written, reviewed, and tested before execution. As a result, they focus on source code scanning, periodic assessments, and compliance-driven checks.

That model breaks down in AI-driven environments. In tools like Gemini, code is generated dynamically, rendered instantly, and executed in real time. The vulnerability uncovered here is not a flaw in static code. It is a failure in how execution behavior is handled.

This is why many traditional approaches will miss this entire class of AI code execution vulnerabilities.

What needs to be tested is not just code, but how the application behaves at runtime, including how untrusted inputs are processed and how execution environments are controlled.

This is where platforms focused on runtime application behavior, like Appknox, become relevant. They help identify vulnerabilities such as XSS, API exposure, and business-logic risks that AI systems increasingly encounter.

If your developers are using AI coding tools today, you likely have unvalidated code executing in environments your security team does not fully control.

AI is changing how software is built and executed. It is also changing how it is attacked.

This Gemini vulnerability is not just a bug. It is a signal. You are not just securing code. You are securing how that code executes.

If AI systems are executing untrusted code in real time, the next question is not just how to detect vulnerabilities, but how to evaluate whether your current AppSec approach can handle this shift.

Check out: Appknox’s Mobile AppSec Evaluation Guide

FAQs

 

What is a Gemini XSS vulnerability?

A Gemini XSS vulnerability is a type of AI security vulnerability where malicious code executes automatically within an AI-generated preview environment due to insufficient input validation and execution controls.

What vulnerability was discovered in Gemini?

A Gemini XSS vulnerability allowed malicious code to execute automatically due to insufficient validation and sandbox restrictions.

How does this XSS vulnerability work?

It exploits gaps in sanitization and automatic rendering, allowing malicious payloads to execute during preview.

Can AI tools execute malicious code?

Yes. AI tools that render generated code without strict validation can introduce AI code execution vulnerabilities.

Why is AI-generated code a security risk?

Because it is often treated as trusted output, while in reality, it is untrusted input that can execute in real environments.

What are the security risks in AI code generators?

AI code generators introduce several security risks:

  • Code execution vulnerabilities when the generated code is run without validation
  • Cross-site scripting through unsafe rendering of AI-generated output
  • API exposure due to insecure integrations and token handling
  • Trust issues when AI-generated code is treated as safe by default
  • Automated exploitation due to zero-click execution in preview environments

Can Google Gemini execute malicious code?

Yes. AI tools like Gemini can execute malicious code when the generated output is rendered and executed without strict validation or sandbox restrictions. This creates a class of vulnerabilities where untrusted code runs automatically within preview environments.

What is an AI code execution vulnerability?

An AI code execution vulnerability occurs when AI-generated output is treated as trusted and executed without proper validation. This allows malicious payloads to run within the application environment, often without user interaction.

What is a zero-click XSS vulnerability in AI systems?

A zero-click XSS vulnerability allows malicious code to execute automatically without requiring user interaction. In AI systems, this can happen when generated code is rendered instantly in preview environments.

How can teams prevent AI code execution vulnerabilities?

Prevention requires shifting from static validation to execution control. This includes:

  • strict input and output validation
  • sandbox hardening
  • limiting automatic execution
  • monitoring runtime behavior

The focus moves from detecting vulnerabilities to controlling how code behaves in real environments.