The AI attack surface is every entry point, component, and data flow an adversary can target in an AI system. The average enterprise AI deployment now has 14.3 distinct attack surface components, up from 3.2 in 2023 (Gartner AI Security Survey, 2025). That is a 347% expansion in two years. Traditional application security is necessary but not sufficient. AI introduces new vulnerability classes that most security teams have never tested for.
The McKinsey Lilli breach (February 2026, 46.5 million messages exposed) exploited unauthenticated API endpoints. CVEs issued in 2025 for Claude Code (CVSS 8.7), GitHub Copilot (RCE), and Cursor (arbitrary code execution) prove that the tools used to build AI applications are themselves attack surfaces. This guide maps every component: APIs, RAG pipelines, system prompts, vector stores, agent tools, AI coding assistants, and supply chain dependencies.
AI API Security
AI APIs are the primary external interface for most AI deployments, serving as the gateway between users (or other systems) and the underlying models, data stores, and agent capabilities. They represent the most immediately exploitable component of the AI attack surface.
Unauthenticated Endpoints
The most devastating AI API vulnerability is the simplest: exposing functional endpoints without authentication. The McKinsey Lilli breach was enabled by unauthenticated API endpoints that provided direct access to backend databases and OpenAI API integrations. According to CodeWall’s 2026 AI Security Report, 41% of enterprise AI deployments have at least one unauthenticated API endpoint that exposes sensitive functionality.
Why this happens in AI deployments:
- AI applications are often built rapidly by data science teams without security engineering review
- Internal-facing AI tools are later exposed externally without authentication being added
- Microservice architectures create internal APIs that are inadvertently exposed through misconfigurated API gateways or load balancers
- Prototype AI chatbots deployed “temporarily” without authentication become permanent fixtures
Testing approach:
- Enumerate all API endpoints through discovery (swagger/OpenAPI docs, JavaScript analysis, traffic interception)
- Test each endpoint without authentication credentials
- Test with invalid, expired, or other users’ credentials
- Check for API endpoints that bypass authentication through alternative paths (direct model endpoint access, WebSocket connections, GraphQL introspection)
Defensive recommendations:
- Require authentication on every endpoint, including internal and “read-only” endpoints
- Implement API gateway authentication at the perimeter, not at the application level
- Use short-lived tokens with automatic rotation
- Conduct regular API inventory audits to identify shadow AI APIs
Rate Limiting and Abuse Prevention
AI API endpoints are expensive to operate (each request consumes GPU compute and incurs per-token costs) and can be exploited for denial of service, cost manipulation, or data extraction at scale.
Common vulnerabilities:
- No rate limiting on AI inference endpoints (enabling model extraction through massive query volumes)
- Per-user rate limits that can be bypassed through account creation or credential sharing
- No cost caps on AI API usage, enabling billing attacks
- Rate limits applied to the API gateway but not to WebSocket or streaming endpoints
Impact examples:
- An attacker querying an unprotected embedding endpoint millions of times can extract a functionally equivalent copy of the embedding model
- Without cost caps, a prompt injection that triggers recursive tool calls can generate thousands of dollars in API charges within minutes
- Rate-limit-free endpoints enable large-scale data extraction through enumeration queries
API Key and Secret Management
AI applications often require API keys for multiple services: LLM providers (OpenAI, Anthropic), vector databases, embedding services, and tool integrations. Each key represents a potential attack vector.
Common vulnerabilities:
- API keys hardcoded in client-side code, configuration files, or CI/CD pipelines
- Shared API keys across environments (development, staging, production)
- API keys with excessive permissions (admin-level keys used for inference-only applications)
- No key rotation policy or expiration dates
- API keys exposed through AI-generated code (LLMs trained on code containing actual API keys)
Real-world example: In the McKinsey Lilli breach, access to unauthenticated API endpoints exposed the OpenAI API integration, potentially allowing the attacker to make API calls using McKinsey’s OpenAI organization credentials — a direct financial and data security exposure.
RAG Pipeline Vulnerabilities
Retrieval-Augmented Generation (RAG) has become the standard architecture for grounding LLM responses in organizational knowledge. However, RAG introduces a complex pipeline — document ingestion, embedding, vector storage, retrieval, and context injection — where each stage presents unique attack vectors.
Cross-Tenant Data Leakage
In multi-tenant RAG deployments, inadequate isolation between tenants can allow one user to access another’s documents through carefully crafted queries.
Attack mechanism:
- Attacker crafts queries that are semantically similar to the target tenant’s documents
- The vector similarity search retrieves documents from the target tenant’s namespace
- The LLM includes retrieved content in its response, leaking cross-tenant data
Why this happens:
- Vector databases often use namespace or partition-based isolation, which can be misconfigured
- Embedding models may create similar vectors for semantically related content across tenants
- Metadata-based filtering (tenant ID) can be bypassed if the filter is applied at the application layer rather than the database layer
- Shared embedding models mean cross-tenant documents exist in the same vector space
Testing approach:
- Create documents with known content in one tenant
- Query from a different tenant using semantically similar queries
- Test with direct vector IDs if the API exposes them
- Test metadata filter bypass techniques
Document Injection and Poisoning
Attackers who can upload or modify documents in a RAG knowledge base can inject content that manipulates AI responses for all users.
Attack scenarios:
- Injecting documents containing prompt injection payloads that override the system prompt when retrieved
- Uploading documents optimized to rank highly in similarity searches for specific queries
- Modifying existing documents to inject misinformation or malicious instructions
- Uploading documents that trigger downstream vulnerabilities (XSS, SSRF) when their content is included in AI responses
McKinsey Lilli impact: The breach exposed 3.68 million RAG chunks — the entire internal knowledge base. Had an attacker injected poisoned documents before the breach was discovered, those documents could have influenced AI responses for all 57,000 users.
Defensive recommendations:
- Implement strict access controls on document ingestion pipelines
- Validate and sanitize all documents before embedding
- Monitor RAG knowledge base for unauthorized modifications
- Implement document provenance tracking and integrity verification
- Use role-based access control for document retrieval, not just ingestion
Embedding Extraction and Inversion
Vector embeddings stored in RAG systems can potentially be extracted and inverted to reconstruct the original documents.
Attack mechanism:
- Access the vector database through API or direct database connection
- Extract embedding vectors for target documents
- Use embedding inversion techniques to reconstruct approximate document content
Research status: Embedding inversion attacks have demonstrated partial reconstruction of source text from embeddings with 60-80% fidelity (Morris et al., 2023). While not perfect, this is sufficient to extract sensitive information, key phrases, and document structure.
McKinsey Lilli impact: 266,000+ OpenAI vector store entries were accessible — each representing embedded representations of internal documents that could potentially be inverted to reconstruct the original content.
System Prompt Extraction and Manipulation
The system prompt is the foundation of an AI application’s behavior, containing its instructions, constraints, persona definition, tool access configurations, and often sensitive implementation details.
Extraction Techniques
System prompts can be extracted through multiple vectors:
Conversational extraction: Asking the model to reveal its instructions through various framings — direct requests, role-playing, translation, summarization, and multi-turn approaches. Research from Perez and Ribeiro (2022) found that basic conversational extraction succeeds against 60-80% of LLM applications.
Behavioral inference: Even when direct extraction fails, the system prompt’s content can be inferred by testing the model’s behavioral boundaries — what it will and won’t do, what format it responds in, what topics it avoids.
Side-channel extraction: Analyzing response patterns, latency variations, and token probabilities to infer system prompt content.
What System Prompts Reveal
Extracted system prompts frequently expose:
- Internal API endpoint URLs and authentication patterns
- Database schema information and query patterns
- Business logic and decision criteria
- Safety guardrail implementation details (which enable targeted bypass)
- Tool/function calling configurations
- Internal role definitions and access control logic
- References to internal systems, codenames, and proprietary methods
Manipulation Through Injection
Beyond extraction, attackers can manipulate the effective system prompt through:
- Prompt injection: Overriding or supplementing system instructions through user input
- Context window displacement: Pushing the system prompt out of the effective context window through long inputs
- Configuration file injection: In AI coding tools, modifying configuration files that function as system prompts (e.g.,
.claudefiles,.cursorrules, Copilot workspace settings)
Vector Store Exposure
Vector databases (Pinecone, Weaviate, Qdrant, Chroma, pgvector) are the backbone of RAG systems, storing embedded representations of an organization’s knowledge base. They represent a high-value target because they contain semantically searchable representations of all documents the AI system has access to.
Direct Database Access
If vector databases are exposed without proper authentication or network isolation, attackers can:
- Enumerate all stored vectors and metadata
- Extract embedding vectors for offline analysis
- Modify or delete vectors to poison the knowledge base
- Access metadata (document titles, sources, timestamps) that reveals organizational information
McKinsey Lilli: 266,000+ Vector Store Entries
The McKinsey Lilli breach exposed over 266,000 OpenAI vector store entries. These entries represented embedded versions of internal McKinsey documents, client materials, and proprietary research. The vector store access alone — even without the additional 3.68 million RAG chunks and 46.5 million chat messages — represented a massive intellectual property exposure.
Lessons for vector store security:
- Vector databases must be treated with the same security controls as primary data stores
- Network isolation is essential — vector databases should not be accessible from the internet
- Authentication and authorization must be enforced at the database level, not just the application level
- Encryption at rest and in transit is mandatory for vector stores containing sensitive data
- Regular access audits should monitor who and what is querying the vector database
Vector Store Enumeration Attacks
Even without direct database access, attackers can enumerate vector store contents through the AI application:
- Submit systematically varied queries to map the knowledge base contents
- Analyze retrieval results to identify document boundaries and categories
- Use semantic similarity to navigate from known to unknown content areas
- Extract metadata (document names, sources, dates) from AI responses
AI Coding Tools as Attack Surface
AI coding tools — Claude Code, GitHub Copilot, Cursor, Windsurf, and others — represent a rapidly expanding and fundamentally new attack surface. These tools have direct access to local file systems, can execute commands, modify code, interact with APIs, and operate with the full permissions of the user running them.
The Scale of the Problem
Claude Code is not just a coding tool — companies use it for LinkedIn scraping, CRM automation, financial data processing. Non-developers make up >50% of usage at major companies. This means that AI coding tools with system-level access are being used by individuals who may not understand the security implications of the permissions they are granting.
According to GitHub’s 2025 Octoverse report and McKinsey’s 2024 Global AI Survey, the majority of developers in enterprise environments now use AI coding assistants. These tools operate with the developer’s full system permissions — read/write access to all files, ability to execute arbitrary commands, access to environment variables containing API keys and credentials, and connectivity to internal networks.
CVE-2025-59536: Claude Code Configuration Injection (CVSS 8.7)
Affected versions: Claude Code prior to 1.0.17
Vulnerability type: Prompt injection via configuration files and project context
Attack vector: Malicious .claude configuration files, CLAUDE.md instruction files, or crafted project files
Attack scenario:
- Attacker creates a repository containing malicious configuration files
- Files contain prompt injection payloads embedded in code comments, documentation, or configuration
- Victim clones the repository and runs Claude Code
- Claude Code ingests the malicious configuration as part of its context
- Injected instructions cause Claude Code to execute arbitrary commands, exfiltrate credentials, or modify files
Impact: Arbitrary command execution on the victim’s machine, credential theft, file modification, data exfiltration.
Fix: Claude Code 1.0.17 introduced permission prompts for file operations and command execution, plus sandboxing improvements. However, the fundamental risk — that AI coding tools process untrusted project files as context — remains an architectural challenge.
CVE-2025-53773: GitHub Copilot Remote Code Execution
Vulnerability type: Prompt injection leading to arbitrary code execution Attack vector: Crafted code comments, documentation, or project files
Code comments, docstrings, README files, and issue descriptions can contain prompt injection payloads that manipulate Copilot’s code suggestions. In certain IDE configurations with auto-execute capabilities, malicious code suggestions translate directly to code execution on the developer’s machine.
Broader implication: Every open-source repository, Stack Overflow answer, documentation page, and issue thread that a developer references while using Copilot is a potential injection vector.
CVE-2026-21852: Cursor AI Agent Mode Arbitrary Code Execution
Affected component: Cursor IDE Agent Mode
Vulnerability type: Prompt injection enabling arbitrary code execution through AI agent mode
Attack vector: Malicious project files, documentation, or .cursorrules configurations
Cursor’s Agent Mode operates with elevated permissions to create, modify, and execute files autonomously. The CVE demonstrated that crafted project files could inject instructions that Agent Mode would follow, leading to arbitrary code execution with the user’s full system permissions.
Key difference from previous CVEs: Cursor’s Agent Mode is explicitly designed to take autonomous actions — creating files, installing packages, running commands — which means successful prompt injection immediately translates to system compromise without any intermediate step.
Supply Chain Risks from AI Tool Configurations
AI coding tools introduce a new supply chain attack vector through configuration files:
| Tool | Configuration File | Risk |
|---|---|---|
| Claude Code | .claude, CLAUDE.md | Project-level instruction injection |
| Cursor | .cursorrules, .cursor/ directory | Rule injection, agent mode manipulation |
| GitHub Copilot | .github/copilot-instructions.md | Suggestion manipulation |
| Windsurf | .windsurfrules | Agent behavior override |
| Aider | .aider.conf.yml | Tool configuration manipulation |
These configuration files are typically:
- Committed to version control (and thus present in cloned repositories)
- Not audited by standard security scanning tools
- Trusted by default by the AI coding tools that read them
- Written in natural language (making malicious instructions indistinguishable from legitimate ones)
Attack pattern:
- Attacker contributes a pull request to an open-source project that adds or modifies an AI tool configuration file
- The PR appears to contain helpful AI assistant instructions
- The configuration file contains embedded prompt injection that activates when developers use AI tools in the project
- Every developer who clones the project and uses the affected AI tool is exposed
Defensive recommendations:
- Treat AI tool configuration files as security-sensitive (like
.bashrcorMakefile) - Review AI configuration files in code review with the same scrutiny as code changes
- Use AI coding tools with minimal permissions and explicit approval for high-risk actions
- Maintain allow-lists of trusted AI tool configurations
- Never clone untrusted repositories and immediately run AI coding tools on them
For a detailed analysis of AI coding tools as an attack surface, see the RedTeamPartner.com analysis of AI coding tool security.
AI Agent Frameworks
AI agents — systems that can plan, use tools, browse the web, execute code, and take actions autonomously — represent the highest-risk expansion of the AI attack surface. Each capability granted to an agent is a capability that an attacker can potentially hijack through prompt injection or other manipulation.
Agent Tool-Use Exploitation
When an AI agent has access to tools (APIs, databases, file systems, code execution), prompt injection can trigger unauthorized tool use:
| Tool Capability | Exploitation Scenario | Impact |
|---|---|---|
| Database queries | Prompt injection triggers data exfiltration via SQL | Data breach |
| File system access | Injected instructions read/modify sensitive files | Data theft, code tampering |
| Code execution | Manipulated agent executes malicious code | System compromise |
| Web browsing | Agent navigates to attacker-controlled sites | Credential phishing, malware |
| Email sending | Injected instructions send phishing or data to attacker | Social engineering, data exfiltration |
| API calls | Agent makes unauthorized calls to internal/external APIs | Privilege escalation, lateral movement |
Multi-Agent Chain Vulnerabilities
Complex AI deployments use multiple agents that collaborate on tasks — a planning agent delegates to specialized agents for research, coding, communication, and data analysis. This creates chain-of-trust vulnerabilities:
- Prompt injection in one agent’s input can propagate instructions to downstream agents
- Output from a compromised agent becomes trusted input for subsequent agents
- Each agent in the chain may have different permissions, enabling privilege escalation through the chain
- Monitoring and logging may not capture inter-agent communication
Agentic RAG Vulnerabilities
Agentic RAG systems — where an AI agent dynamically decides what documents to retrieve, how to process them, and what actions to take based on retrieval results — compound the risks of both RAG vulnerabilities and agent tool-use:
- The agent decides which queries to send to the vector database (attackable through query manipulation)
- Retrieved documents can contain prompt injection that redirects the agent’s subsequent actions
- The agent may have write access to the knowledge base, enabling self-reinforcing poisoning loops
Full AI Attack Surface Map
The following table provides a complete inventory of AI attack surface components:
| Component | Attack Vectors | Example CVE/Incident | Risk Level |
|---|---|---|---|
| User-facing chat interface | Prompt injection, jailbreaking | Universal | High |
| AI API endpoints | Authentication bypass, rate limit abuse | McKinsey Lilli (2026) | Critical |
| System prompt | Extraction, manipulation, override | Universal | High |
| RAG knowledge base | Document injection, cross-tenant leakage | McKinsey Lilli (2026) | Critical |
| Vector database | Direct access, enumeration, extraction | McKinsey (266K+ entries) | Critical |
| Embedding model | Extraction, inversion, adversarial inputs | Research-stage | Medium |
| LLM model | Jailbreaking, extraction, poisoning | Universal | High |
| Agent tools | Unauthorized tool use, privilege escalation | McKinsey (SQL write access) | Critical |
| AI coding tool configs | Configuration injection | CVE-2025-59536, CVE-2026-21852 | High |
| Code suggestions | Malicious code generation | CVE-2025-53773 | High |
| Training pipeline | Data poisoning, backdoor injection | Research + real-world | High |
| Fine-tuning data | Targeted poisoning | Research-stage | Medium |
| Plugin/extension ecosystem | Malicious plugins, supply chain | Multiple reports | High |
| Monitoring/logging | Log injection, monitoring evasion | Emerging | Medium |
| Model hosting infrastructure | Traditional infra attacks | Standard infra CVEs | Medium |
Key Takeaways
-
The AI attack surface has expanded 347% since 2023 (Gartner, 2025), with the average enterprise AI deployment now having 14.3 distinct attack surface components.
-
Unauthenticated AI API endpoints remain the most immediately exploitable vulnerability, as the McKinsey Lilli breach demonstrated.
-
RAG pipelines introduce cross-tenant leakage, document poisoning, and embedding extraction risks that traditional application security does not address.
-
AI coding tools (Claude Code, Copilot, Cursor) are a fundamentally new attack surface category, with critical CVEs demonstrating arbitrary code execution through prompt injection in configuration files and project context.
-
AI coding tools are used far beyond coding — companies use them for scraping, CRM automation, and data processing, with non-developers comprising over 50% of usage at major enterprises.
-
Vector stores must be secured with the same rigor as primary databases, as the McKinsey breach’s 266,000+ exposed vector store entries demonstrated.
-
AI agent frameworks represent the highest-risk attack surface expansion because successful prompt injection translates directly to unauthorized tool use and system compromise.
-
Defense requires mapping every component of the AI attack surface and applying appropriate controls to each — there is no single solution that addresses all vectors.
Sources and References
- Gartner. “AI Security Survey: State of Enterprise AI Protection.” 2025.
- CodeWall. “AI Security Report 2026.” 2026.
- Olsen, Chris (xyzeva). “McKinsey Lilli: Technical Analysis.” February 28, 2026.
- Anthropic. “Claude Code Security Advisory: CVE-2025-59536.” 2025.
- GitHub. “GitHub Copilot Security Advisory: CVE-2025-53773.” 2025.
- Cursor. “Security Advisory: CVE-2026-21852.” 2026.
- GitHub. “Octoverse 2025: The State of Open Source and AI.” 2025.
- Morris, John X. et al. “Text Embeddings Reveal (Almost) As Much As Text.” 2023.
- Perez, Fábio and Ribeiro, Ian. “Ignore This Title and HackAPrompt.” 2022.
- OWASP. “OWASP Top 10 for Large Language Model Applications, v2.0.” 2025.
- MITRE. “ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems.” 2025.