News Daily Nation Digital News & Media Platform

collapse
Home / Daily News Analysis / One keypress is all it takes to compromise four AI coding tools

One keypress is all it takes to compromise four AI coding tools

May 08, 2026  Twila Rosenbaum  24 views
One keypress is all it takes to compromise four AI coding tools

Developers routinely clone unfamiliar repositories: open-source projects, code from teammates, sample code from tutorials, or libraries recommended on forums. The conventional wisdom is to inspect contents before running them. AI coding assistants that operate from the command line have inherited this convention, but a new study from Adversa AI—dubbed TrustFall—reveals where that convention fails catastrophically.

The research covers four agentic coding tools: Claude Code from Anthropic, Gemini CLI from Google, Cursor CLI, and GitHub's Copilot CLI. Each tool reads configuration files that ship inside a project and starts helper programs that those files point to. Crucially, each tool asks for permission with a single dialog box that, in most cases, defaults to 'yes'.

The result is that a malicious repository can compromise a developer's machine the moment they open it in one of these tools and press Enter on the trust prompt. No suspicious tool calls, no anomalous behavior from the AI—just a config file, a default-yes dialog, and a process running with the developer's full permissions.

How the project file can execute code

The mechanism exploited by the researchers is called MCP, or Model Context Protocol. MCP allows an AI assistant to interact with external helper programs: a database connector, a linter, a custom search tool, and others. This feature is designed to be useful, but the catch lies in how helpers are defined. They are specified inside the project itself, in a file that the repository ships. When the assistant starts up in that folder, it automatically starts those helpers.

A helper program runs with the same privileges as any program on the computer. It can read SSH keys, cloud credentials, shell history, source code from other projects on the same machine, and open network connections to attacker-controlled servers. This happens before the AI has performed any reasoning or code generation.

The attack requires only two small JSON files. One defines a helper with an innocent-sounding name like 'linter' and a one-line script that fetches a payload from the internet and executes it. The other file tells the assistant to auto-approve that helper without further prompts. The repository can look almost empty, containing virtually no actual code.

What the trust dialog says

In Claude Code version 2.1 and later, the prompt reads: 'Quick safety check: Is this a project you created or one you trust?' The default option is 'Yes, I trust this folder.' Earlier versions of the same dialog explicitly warned that the project could execute code through MCP and offered a third choice: trust the folder with MCP disabled. That option was removed in later releases.

Gemini CLI lists the helpers by name in its prompt, giving careful readers something to inspect. Cursor CLI mentions MCP in general terms. Copilot CLI shows a generic trust prompt with no MCP reference at all. Every one of these tools defaults to trust, making it extremely easy for an inadvertently cloned malicious repository to bypass user scrutiny.

'They all have different approaches to configs and trust. But Cursor and Copilot / VS Code agent mode are clear analogs. Both read project-scoped MCP configuration. We tested it, and it's the same behavior but with different user approval messages,' said Alex Polyakov, CTO of Adversa AI.

The CI variant: no dialog required

When Claude Code runs on a continuous integration (CI) server, for example through the official GitHub Action published by Anthropic, it operates in headless mode. There is no terminal, so the trust dialog never appears. A pull request from an external contributor can ship a malicious MCP configuration file, and the moment the pipeline runs against that branch, the helper program starts and can access any credentials available to the CI runner: deploy keys, signing certificates, cloud tokens, and environment variables. Adversa AI published a working demonstration that exfiltrates the runner's environment variables to a collector URL.

This CI variant dramatically amplifies the risk because automation pipelines often have broad access to production resources. A single compromised CI job can lead to supply-chain attacks, data exfiltration, or lateral movement into cloud environments.

Enterprise mitigation and its adoption gap

Claude Code supports a 'Managed scope' for settings, pushed centrally by IT and locked from local override. According to Rony Utevsky, the Adversa AI researcher who led the work, 'Managed scope cannot be overridden by any other scope.' An organization that configures it can disable project-scoped MCP auto-approval across every developer machine in one shot, effectively neutering the attack vector at the enterprise level.

However, Polyakov noted that this option is rarely used. 'We haven't seen that managed scope secure configuration often; rather, we've seen the opposite. And it's not that obvious to understand all configuration nuances, especially for vibe coders.' The term 'vibe coders' refers to developers who rely heavily on AI assistants without deeply understanding the underlying configuration systems. This lack of awareness creates a dangerous blind spot where enterprise security policies exist but are not enforced because administrators themselves may not be fully versed in the risks.

The CI variant is especially concerning for enterprises because GitHub Actions and similar CI systems often run with elevated permissions across multiple repositories. A malicious pull request targeting a CI pipeline could compromise not just the runner but also propagate to other systems through credentials stored in environment variables or secret managers.

Anthropic's official position

Anthropic reviewed the TrustFall report and declined to treat it as a vulnerability. Under the company's threat model, accepting 'Yes, I trust this folder' counts as explicit consent to everything the project ships, including MCP definitions. Execution after that point is considered the boundary working as intended.

Adversa AI does not contest where the boundary sits legally or by design. The disagreement centers on whether the dialog informs developers sufficiently about what they are agreeing to. The researchers argue that many developers, especially less experienced ones, do not understand that MCP can execute arbitrary code—and that the removal of the option to trust without MCP reduces user control.

The risks extend beyond individual developers. In large organizations, a single compromised repository can cascade through code reviews, CI/CD pipelines, and dependency trees. Supply-chain attacks have become a dominant threat in software security, and the integration of AI coding assistants introduces a new vector that bypasses traditional source-code analysis. A malicious repository that looks benign—containing only configuration files—can slip through review processes that focus on actual code changes.

Developers and security teams need to update their practices accordingly. Before opening any unfamiliar repository in agentic coding tools, they should manually inspect MCP configuration files. Enterprises should centrally enforce MCP auto-approval settings using managed scope features where available, and consider running AI coding tools in sandboxed environments or containers with restricted network access. The CI variant requires special attention: pull requests from external contributors should not be allowed to run with default trust in headless mode, and CI pipelines should use isolated credentials with minimal permissions.

The broader industry implication is that the ergonomics of trust dialogs matter significantly. When the default is 'yes' and the warning text is vague, security is sacrificed for convenience. Future versions of these tools should consider making trust an explicit opt-in with clear explanation of MCP capabilities, restoring the option to trust without executing project-defined helpers, and providing non-headless fallbacks for CI environments.


Source: Help Net Security News


Share:

Your experience on this site will be improved by allowing cookies Cookie Policy