BLOG
Artificial intelligence is no longer just generating text. It generates and executes code in real time.
With tools like Google Gemini, features such as code canvases and live previews are turning AI systems into interactive execution environments. This shift introduces a new and rapidly growing category of risk: AI security vulnerabilities tied to real-time code execution.
In recent Appknox research, our security team uncovered a Gemini XSS vulnerability, a critical example of how AI code execution vulnerabilities can emerge when generated output is treated as trusted.
If you or your teams are already using AI code generators in development workflows, this risk is not theoretical. It is already present in your environment.
Recent industry data suggests AI-generated code can contain up to 2.7x more vulnerabilities than human-written code, reinforcing how quickly this risk is scaling across development environments.
AI code execution is changing how application security works and where it breaks.
The key shift is this: you’re no longer just securing code. You’re securing what gets executed automatically.
As AI coding tools like Gemini become part of everyday development workflows, most teams approach them with a familiar mental model.
They assume:
- AI-generated code is just text until a developer chooses to run it
- Review and validation happen before anything executes
This assumption made sense in traditional development.
But in practice, AI systems don’t follow that model.
With features like code canvases and live previews, AI tools are no longer just generating code, but are rendering and executing it immediately. The gap between generation and execution has effectively disappeared.
This creates a subtle but critical shift: AI-generated output is no longer just code. It is untrusted input that can execute in real time.
That distinction is where risk begins: when output is treated as trusted and execution happens automatically, validation is no longer a checkpoint but an afterthought.
And in many cases, it doesn’t happen at all.
One of the most important questions security teams are asking today: Can AI tools execute malicious code?
Yes. And increasingly, they already do. Gemini’s code canvas allows users to:
- Generate code from natural-language prompts
- Render that code in a live preview
- Execute it immediately, inside the same workflow
This creates a powerful experience. But it also introduces a high-risk condition: the system is executing untrusted, AI-generated code in real time. This is how AI code execution vulnerabilities start.
Most organizations are not prepared for this shift. Traditional security models assume:
- Code is reviewed before it runs
- Execution happens only in controlled, tested environments
AI breaks both assumptions.
During testing, we identified a Google Gemini vulnerability classified as a Gemini XSS vulnerability, enabling zero-click AI code execution.
At its core, the issue was simple but critical: AI-generated output containing a malicious payload was rendered and executed automatically in the preview environment, with no meaningful validation and no user interaction.
Unlike traditional XSS, this vulnerability was driven by automatic execution within the AI workflow, making exploitation more reliable and less dependent on user behavior.
The execution flow highlights where control breaks down:
1. The user submits a prompt
2. Gemini generates output containing the payload
3. The canvas renders the output in a live preview
4. The payload executes automatically, with no user action
The core issue is that AI-generated output moves directly from input to execution, without meaningful validation or control.
At a surface level, this appears to be a standard input validation issue. In practice, it is a mismatch between how different layers interpret and enforce security controls.
The exploit succeeded due to a combination of factors:
1. SVG parsing context switch

When content is wrapped inside an <svg> tag, the browser switches to a different parsing mode. Many sanitization mechanisms are designed for standard HTML and fail to properly analyze nested or XML-like structures.

This allowed the payload to bypass filters that were not recursively validating SVG content.
2. Legacy SVG attributes

Instead of using common <script src=...> patterns, the payload leveraged legacy attributes like xlink:href, which are valid within SVG contexts but often overlooked by sanitization logic.

As a result, the system failed to recognize this as an executable vector.
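This filtering gap can be sketched with a toy sanitizer. The function and payloads below are hypothetical illustrations of the general pattern, not Gemini's actual filtering logic:

```python
import re

# Toy sanitizer that only strips literal <script> tags, the way many
# HTML-centric filters do. Illustrative sketch only; NOT Gemini's
# actual sanitization logic.
def naive_sanitize(markup: str) -> str:
    return re.sub(r"<script\b[^>]*>.*?</script\s*>", "", markup,
                  flags=re.IGNORECASE | re.DOTALL)

plain = "<script>alert(1)</script>"
svg = ('<svg><use xlink:href='
       '"data:image/svg+xml;base64,PHN2ZyBvbmxvYWQ9YWxlcnQoMSk+"/></svg>')

print(naive_sanitize(plain))  # "" -- the obvious vector is stripped
print(naive_sanitize(svg))    # the SVG vector passes through untouched
```

The filter handles the case it was written for and nothing else: the `xlink:href` payload never matches a `<script>` pattern, so it reaches the renderer intact.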
3. Encoding mismatches

Payloads delivered through schemes such as data: were encoded to evade detection. The sanitization layer evaluated the encoded string, while the browser decoded it during execution.

This created a gap between what was validated and what was executed.
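The validate-encoded, execute-decoded gap fits in a few lines. This Python sketch is hypothetical: the filter and payload stand in for the general pattern, not for Gemini's real components:

```python
from urllib.parse import unquote

# A filter that inspects the raw string, while the browser effectively
# percent-decodes the value before acting on it. Hypothetical sketch.
def looks_dangerous(value: str) -> bool:
    return value.lower().startswith(("javascript:", "data:"))

encoded = "java%73cript%3Aalert(1)"   # %73 = 's', %3A = ':'

print(looks_dangerous(encoded))            # False: the filter is fooled
print(looks_dangerous(unquote(encoded)))   # True: what the browser sees
```

Robust filters normalize first (decoding entities and percent-encoding, recursively) before matching, or better, allowlist safe schemes instead of blocklisting dangerous ones.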
4. Weak sandbox restrictions

The preview environment relied on iframe-based isolation, but sandbox restrictions were not strict enough. Permissions such as pop-up handling allowed the payload to escape intended boundaries using functions like window.open().
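One way to reason about this last gap is to audit which sandbox tokens an embedding grants. The token names below are standard HTML iframe sandbox keywords; the audit function and its risk policy are an illustrative sketch, not a complete model:

```python
# Known-risky combinations of iframe sandbox tokens. The token names
# are standard HTML; treating these particular combinations as "risky"
# is an illustrative policy assumption:
#  - allow-scripts + allow-same-origin lets framed content tamper with
#    its own sandboxing
#  - allow-popups opens a window.open() escape path like the one
#    described above
RISKY_COMBOS = [
    {"allow-scripts", "allow-same-origin"},
    {"allow-popups"},
]

def audit_sandbox(sandbox_attr: str) -> list[str]:
    """Return the risky token combinations granted by a sandbox attribute."""
    granted = set(sandbox_attr.split())
    return [" + ".join(sorted(combo))
            for combo in RISKY_COMBOS if combo <= granted]

print(audit_sandbox("allow-scripts allow-popups"))
# ['allow-popups']
```

A preview that needs scripts but grants popups or same-origin access has already given a payload its way out.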
The outcome was not a single failure, but a chain of small gaps across parsing, validation, and sandboxing.
This is a common pattern in modern vulnerabilities where risk emerges when multiple layers enforce security inconsistently.
To understand the broader implications, it’s important to look at what the system was expected to do versus what actually happened.
Expected secure design:
- AI output is sanitized before rendering
- The preview runs in a tightly restricted sandbox
- Nothing executes without an explicit user action

What actually happened:
- The payload bypassed sanitization via SVG context and encoding
- Sandbox permissions allowed the payload to escape its boundary
- The code executed automatically, with zero user interaction
This created a zero-click execution path within a trusted environment.
At first glance, this vulnerability resembles a typical cross-site scripting issue. But in practice, it introduces a fundamentally different risk model.
In traditional XSS:
- An attacker must inject a payload through user-controlled input
- Exploitation usually depends on a victim loading a page or clicking something

In AI-driven environments like Gemini:
- The system itself generates the payload as part of its output
- Rendering and execution happen automatically, with no user action required
This creates a new class of vulnerability: zero-click execution through system-generated content.
The risk is not just injection but also the automatic execution of untrusted output within trusted workflows.
This is where AI changes the threat model. The system is no longer just processing input. It is actively generating and executing it.
The exploit leveraged a mismatch in system behavior:
- The sanitization layer validated the encoded, static form of the output
- The browser decoded and executed a different, live form of it

This gap enabled a payload that:
- Passed sanitization checks
- Escaped the preview sandbox
- Executed automatically, without any user interaction
The result is a working AI code execution vulnerability within Gemini. This is not just XSS: it is the AI workflow itself executing malicious code.
This vulnerability is not an isolated issue. It reflects a broader shift in how AI systems behave. AI tools are no longer passive. They generate code, render it, and execute it. That fundamentally changes the risk model.
At the same time, AI output is often treated as trusted. In reality, it is an untrusted input. This gap is at the core of many AI security vulnerabilities.
Automatic execution makes this worse. When code is rendered and executed without friction, exploitation becomes significantly easier and more reliable.
While this vulnerability was identified in Gemini, the underlying pattern is not specific to a single platform.
Any system that:
- Generates code from user prompts
- Renders that output automatically
- Executes it in a live preview or canvas

is exposed to similar risks.

This includes:
- AI coding assistants and copilots
- Code canvas and live-preview features
- AI-powered low-code and no-code builders
In enterprise environments, these tools are increasingly integrated into development workflows. That means the attack surface is not limited to one application. It extends across the entire toolchain.
The vulnerability is not in Gemini alone. It is in how modern systems are designed to prioritize speed and usability over controlled execution.
Consider a realistic enterprise scenario: a developer asks an AI tool to generate a UI component, and the live preview renders the output instantly. If that output carries a malicious payload, it executes inside the developer’s authenticated session.

This could result in:
- Theft of session tokens or credentials
- Exfiltration of proprietary code or internal data
- A foothold for lateral movement across the toolchain
If your organization is integrating AI into development workflows, this is a direct enterprise risk.
In a recent supply chain attack, a widely used LLM tool (LiteLLM) was compromised, exposing hundreds of thousands of systems and enabling credential theft, API key exfiltration, and infrastructure access.
Most security programs today do not see how AI-generated code behaves at runtime.
AI-generated code poses risks due to how it is created and used.
AI can generate large volumes of code instantly. That makes it difficult to review and validate every output.
Developers do not always fully validate what AI produces. The output is often treated as trusted, even when it should not be.
Modern tools render and execute code immediately. When generation and execution happen together, the window for validation disappears.
Together, these create a high-risk environment for code-execution vulnerabilities.
Most application security programs are designed around a predictable lifecycle.
Code is written, reviewed, tested, and then executed. Security tools are aligned to this flow, focusing on identifying vulnerabilities before release.
AI disrupts this model at multiple levels.
In AI-driven environments:
- Code is generated on demand, not written in advance
- Rendering and execution happen the moment code is produced
- There is no natural review or testing checkpoint in between
This breaks the traditional checkpoints where security controls are applied.
What used to be validated during development or testing now moves directly into execution.
The question is no longer just what gets built. It’s what gets executed without validation.
This is where many existing tools fall short.
They are designed to analyze static code or pre-release artifacts. But vulnerabilities like the Gemini XSS issue don’t originate from static code. They emerge from execution behavior in dynamic environments.
In practice, this creates a visibility gap:
- Static analysis never sees code that is generated at runtime
- Pre-release testing never covers previews that execute instantly
- Security teams have no record of what actually ran
This is why AI-driven development is not just a tooling shift. It is a fundamental change in the application security model.
Most security discussions around AI focus on the code itself.
But as this vulnerability demonstrates, that’s not the right starting point.
The more relevant question is not whether the code is secure, but what happens when that code executes automatically.
Because in AI-driven workflows:
- Code is generated, rendered, and executed in a single step
- Output that looks safe as text can still execute as a payload
- No human judgment sits between generation and execution
This shifts the problem from static analysis to execution control.
You are no longer just securing code. You are securing how code behaves when it is generated, rendered, and executed in real time.
And this is where the Gemini vulnerability becomes more than a one-off issue.
It is a signal of a broader pattern: AI systems are collapsing the boundaries between input, processing, and execution.
When those boundaries disappear, traditional assumptions about trust, validation, and control no longer apply.
Securing AI-driven systems requires a shift in how applications are tested and controlled:
- Treat AI-generated output as untrusted input and validate it the same way as external input.
- Remove execution vectors, including encoded and obfuscated inputs, with recursive sanitization.
- Harden sandboxes: restrict popups, scripts, and any paths that allow sandbox escape.
- Introduce explicit run controls so code is not executed without validation.
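As a minimal sketch of that last control, an explicit run gate puts a validation step between generation and execution. Everything here is hypothetical: a production gate would use a robust, recursive sanitizer and an allowlist, not this toy blocklist.

```python
import re

# Toy blocklist standing in for a real sanitizer. A production gate
# should allowlist safe constructs rather than pattern-match bad ones.
BLOCKED_PATTERNS = [r"<script\b", r"javascript:", r"xlink:href", r"\bdata:"]

def validate_generated(code: str) -> bool:
    lowered = code.lower()
    return not any(re.search(p, lowered) for p in BLOCKED_PATTERNS)

def render_preview(code: str) -> str:
    # Explicit run control: nothing executes unless validation passes.
    if not validate_generated(code):
        return "blocked: payload-like content detected"
    return "rendered in restricted sandbox"

print(render_preview('<svg><a xlink:href="javascript:alert(1)">x</a></svg>'))
print(render_preview("<p>Hello</p>"))
```

The design point is the gate itself: generated output is never handed straight to the renderer, so the generation-to-execution gap described earlier gets a checkpoint back.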
This Gemini vulnerability is not a one-off.
AI is now part of how applications are built and executed. The same risks apply, but in a more dynamic and less controlled environment.
The attack surface is growing. Code is being generated and executed automatically. And traditional trust boundaries are breaking.
Recent issues in frameworks like LangChain show how this plays out, with risks such as data exfiltration, API key leakage, and unintended access to system resources.
Most traditional application security tools are not designed for this new reality. They are built around the assumption that code is written, reviewed, and tested before execution. As a result, they focus on source code scanning, periodic assessments, and compliance-driven checks.
That model breaks down in AI-driven environments. In tools like Gemini, code is generated dynamically, rendered instantly, and executed in real time. The vulnerability uncovered here is not a flaw in static code. It is a failure in how execution behavior is handled.
This is why many traditional approaches will miss this entire class of AI code execution vulnerabilities.
What needs to be tested is not just code, but how the application behaves at runtime, including how untrusted inputs are processed and how execution environments are controlled.
This is where platforms focused on runtime application behavior, like Appknox, become relevant. They help identify vulnerabilities such as XSS, API exposure, and business-logic risks that AI systems increasingly encounter.
If your developers are using AI coding tools today, you likely have unvalidated code executing in environments your security team does not fully control.
AI is changing how software is built and executed. It is also changing how it is attacked.
This Gemini vulnerability is not just a bug. It is a signal. You are not just securing code. You are securing how that code executes.
If AI systems are executing untrusted code in real time, the next question is not just how to detect vulnerabilities, but how to evaluate whether your current AppSec approach can handle this shift.
Check out: Appknox’s Mobile AppSec Evaluation Guide
What is a Gemini XSS vulnerability?
A Gemini XSS vulnerability is a type of AI security vulnerability in which malicious code executes automatically within an AI-generated preview environment due to insufficient input validation and execution controls.

How does the Gemini XSS vulnerability work?
It exploits gaps in sanitization and automatic rendering, allowing malicious payloads to execute during preview without user interaction.

Can other AI tools be affected?
Yes. AI tools that render generated code without strict validation can introduce AI code execution vulnerabilities.

Why is AI-generated code risky?
Because it is often treated as trusted output, while in reality it is untrusted input that can execute in real environments.

What security risks do AI code generators introduce?
AI code generators introduce several security risks:
- High volumes of code that are difficult to review and validate
- Output that developers treat as trusted without verification
- Immediate rendering and execution that removes the validation window

Can AI tools execute malicious code?
Yes. AI tools like Gemini can execute malicious code when the generated output is rendered and executed without strict validation or sandbox restrictions. This creates a class of vulnerabilities where untrusted code runs automatically within preview environments.

What is an AI code execution vulnerability?
An AI code execution vulnerability occurs when AI-generated output is treated as trusted and executed without proper validation. This allows malicious payloads to run within the application environment, often without user interaction.

What is a zero-click XSS vulnerability?
A zero-click XSS vulnerability allows malicious code to execute automatically without requiring user interaction. In AI systems, this can happen when generated code is rendered instantly in preview environments.

How can AI code execution vulnerabilities be prevented?
Prevention requires shifting from static validation to execution control. This includes:
- Treating AI-generated output as untrusted input
- Sanitizing recursively, including SVG and encoded content
- Enforcing strict sandbox restrictions on previews
- Gating execution behind explicit validation controls

The focus moves from detecting vulnerabilities to controlling how code behaves in real environments.