// AI / LLM & Agent Pentesting
We assess AI‑enabled applications end‑to‑end: prompts, agents/tools, plugins, the model supply chain, and data pipelines (RAG). Findings map to the OWASP Top 10 for LLM Applications, MITRE ATLAS, and the NIST AI RMF, and each comes with pragmatic guardrail recommendations.
// Common Attack Vectors
Prompt/Instruction Attacks
Jailbreaks, role confusion, system prompt leakage, refusal bypass, and indirect prompt injection via untrusted external content.
Agent/Tool Abuse
Over‑permissive tool actions (filesystem, HTTP, DB, shell) and OS command or SQL injection through tool inputs.
RAG/Data Leakage
Vector store poisoning, untrusted data sources, and exfiltration of PII or other sensitive data through retrieval and prompt stitching.
Supply Chain
Model provenance, plugin/tool trust boundaries, malicious packages, dependency confusion, model replacement/rollback.
Guardrails
Safety policies, adversarial evals, allow/deny lists, content filters, tool‑scoped permissions, audit logging.
Monitoring
Prompt abuse detections, anomaly signals, replayable traces, secure telemetry, break‑glass procedures.
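The tool‑scoped permissions, allow‑lists, confirmation gates, and audit logging listed above can be illustrated with a minimal sketch. It assumes an agent runtime that routes every tool call through a single authorization hook before execution; `ToolPolicy`, `authorize_tool_call`, the `confirm` callback, and the logger name are hypothetical and not taken from any specific agent framework.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable

# Hypothetical logger name; in practice route it to tamper-resistant storage.
audit_log = logging.getLogger("agent.audit")


class ToolDenied(Exception):
    """Raised when a tool call is blocked by policy."""


@dataclass(frozen=True)
class ToolPolicy:
    # Explicit allow-list: any tool not listed here is denied (default-deny).
    allowed_tools: frozenset
    # Sensitive tools that additionally require a human confirmation.
    needs_confirmation: frozenset = frozenset()


def authorize_tool_call(policy: ToolPolicy, tool: str, args: dict,
                        confirm: Callable[[str, dict], bool]) -> None:
    """Return normally if the call may proceed; raise ToolDenied otherwise.

    Every decision, allow or deny, is written to the audit log.
    """
    record = json.dumps({"tool": tool, "args": args}, default=str)
    if tool not in policy.allowed_tools:
        audit_log.warning("DENY not_allow_listed %s", record)
        raise ToolDenied(f"tool {tool!r} is not allow-listed")
    if tool in policy.needs_confirmation and not confirm(tool, args):
        audit_log.warning("DENY not_confirmed %s", record)
        raise ToolDenied(f"tool {tool!r} was not confirmed")
    audit_log.info("ALLOW %s", record)
```

For example, a policy of `ToolPolicy(frozenset({"search_docs", "send_email"}), frozenset({"send_email"}))` (hypothetical tool names) lets `search_docs` run freely, holds `send_email` until a human approves it, and denies anything else, such as a `shell` tool the model tries to invoke. Default‑deny is the key design choice: a new or renamed tool stays blocked until someone deliberately grants it.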
// Methodology
Discover
Threat modeling, asset inventory, tool capability review, data flow mapping.
Assess
Manual red‑teaming of prompts/agents, tool fuzzing, RAG abuse cases, and supply‑chain checks.
Report
Reproducible exploits, impact analysis, prioritized remediation with guardrail patterns and detections.
Validate
Retesting of fixes and adversarial eval baselines to catch regressions.
Enable
Developer playbooks, safety policy tuning, blue‑team detections, and SOC handoff.
Partner
Quarterly exercises and a roadmap for maturing AI security.
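The regression check described under Validate can be sketched as a comparison against a stored baseline. This is a minimal illustration, assuming each adversarial case has a stable ID and that `is_blocked` is a hypothetical hook into the application under test that returns True when its guardrails refuse or neutralize the input; real evals usually need richer grading than a boolean.

```python
from typing import Callable, Mapping


def find_regressions(cases: Mapping[str, str],
                     baseline: Mapping[str, bool],
                     is_blocked: Callable[[str], bool]) -> list[str]:
    """Return IDs of adversarial cases that the baseline run blocked but the
    current build no longer blocks, sorted for stable reports."""
    regressions = []
    for case_id, prompt in cases.items():
        # Cases absent from the baseline (or not blocked there) cannot regress.
        was_blocked = baseline.get(case_id, False)
        if was_blocked and not is_blocked(prompt):
            regressions.append(case_id)
    return sorted(regressions)
```

Running this in CI after every prompt, model, or guardrail change turns a one‑off red‑team result into a baseline that fails loudly when a previously fixed bypass reappears.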
// Sample Findings
- Indirect prompt injection via RAG context allows data exfiltration; caused by missing content sanitization and policy layering.
- Agent tool permits arbitrary HTTP/FS access; caused by insufficient allow‑listing and missing confirmation gates for sensitive actions.
- Plugin trust boundary confusion leads to SSRF; remediation requires scoped credentials and hostname allow‑lists.
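A hostname allow‑list for outbound plugin or tool requests, the SSRF remediation named above, might look like the following minimal sketch. `ALLOWED_HOSTS` and `is_url_allowed` are hypothetical, and the check assumes only HTTPS on the default port is needed.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; populate per plugin/integration.
ALLOWED_HOSTS = frozenset({"api.example.com", "docs.example.com"})


def is_url_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose exact hostname is allow-listed."""
    # Reject backslashes: some HTTP clients treat them as path separators,
    # so their view of the host could differ from urlsplit's.
    if "\\" in url:
        return False
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # Userinfo ("user@host") is a classic way to disguise the real host.
    if parts.username is not None or parts.password is not None:
        return False
    try:
        port = parts.port  # raises ValueError for malformed or out-of-range ports
    except ValueError:
        return False
    if port not in (None, 443):
        return False
    # .hostname is already lower-cased; drop a trailing root-label dot.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```

Exact matching rejects look‑alikes such as `api.example.com.evil.net` and raw metadata addresses such as `169.254.169.254`. This check alone does not stop redirects or DNS rebinding, so it should be paired with disabling automatic redirects (or re‑validating each hop), checking resolved IPs against internal ranges at connect time, and the scoped credentials mentioned in the finding, so that any bypass has limited reach.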
// Scoping Checklist
- Environments: test/prod, tenants, demo data
- Models/Providers: hosted vs. self‑hosted, versions
- Agents/Tools: capabilities, guardrails, approval flow
- Data: RAG sources, privacy/PII, redactions
- Integrations: plugins, APIs, webhooks, third‑party apps
- Telemetry: traces/logs, anomaly alerts, SOC integrations