// AI / LLM & Agent Pentesting

We assess AI‑enabled applications end‑to‑end — prompts, agents/tools, plugins, model supply chain, and data pipelines (RAG). Findings map to OWASP Top 10 for LLMs, MITRE ATLAS, and NIST AI RMF with pragmatic guardrails.

// Common Attack Vectors

Prompt/Instruction Attacks

Jailbreaks, role confusion, system prompt leakage, refusal bypass, indirect prompt injection via content.

Agent/Tool Abuse

Over‑permissive tool actions (filesystem, HTTP, DB, shell), OS/SQL/command injection through tool inputs.

RAG/Data Leakage

Vector store poisoning, untrusted data sources, PII/external data exfiltration through retrieval and prompt stitching.

Supply Chain

Model provenance, plugin/tool trust boundaries, malicious packages, dependency confusion, model replacement/rollback.

Guardrails

Safety policies, adversarial evals, allow/deny lists, content filters, tool‑scoped permissions, audit logging.

Monitoring

Prompt abuse detections, anomaly signals, replayable traces, secure telemetry, break‑glass procedures.

// Methodology

Discover

Threat modeling, asset inventory, tool capability review, data flow mapping.

Assess

Manual red‑teaming of prompts/agents, tool fuzzing, RAG abuse cases, and supply‑chain checks.

Report

Reproducible exploits, impact analysis, prioritized remediation with guardrail patterns and detections.

Validate

Retesting of fixes and adversarial eval baselines for regression.

Enable

Developer playbooks, safety policy tuning, blue‑team detections, and SOC handoff.

Partner

Quarterly exercises and roadmap for maturing AI security.

// Sample Findings

Indirect prompt injection allows data exfiltration via RAG context; missing content sanitization and policy layering.
Agent tool permits arbitrary HTTP/FS access; insufficient allow‑listing and missing confirmation gates for sensitive actions.
Plugin trust boundary confusion leads to SSRF; needs scoped credentials and hostname allow‑lists.

// Scoping Checklist

Environments: test/prod, tenants, demo data
Model/Providers: hosted vs. self‑hosted, versions
Agents/Tools: capabilities, guardrails, approval flow
Data: RAG sources, privacy/PII, redactions
Integrations: plugins, APIs, webhooks, third‑party apps
Telemetry: traces/logs, anomaly alerts, SOC integrations

Contact Us Back to Services