Overview

LLM vulnerability scanning is the practice of systematically probing a language model with crafted inputs to identify failure modes, safety gaps, and exploitable weaknesses before or during deployment. Analogous to network vulnerability scanners like nmap or penetration-testing frameworks like Metasploit, an LLM vulnerability scanner automates the process of sending adversarial probes, collecting model outputs, and detecting whether the model exhibited an undesirable behavior. The field combines static probe libraries (known attack patterns), dynamic generation (adaptive prompts that react to model responses), and detector modules that classify outputs as safe or unsafe.

Unlike human red teaming, automated scanning trades creativity for scale: it can run thousands of probe variants overnight against any model accessible via API or locally, producing quantitative failure rates per attack category.

Architecture of a vulnerability scanner

A general-purpose LLM vulnerability scanner (exemplified by garak) is structured around five plugin categories:

The default mode runs all known probes against a target; specific probe families or individual probes can be selected for targeted assessments.

Probe taxonomy

Common vulnerability categories covered by automated LLM scanners:

Safety and content policy

Prompt injection

Code and data safety

Adversarial robustness

Relationship to human red-teaming

Automated scanning and human red teaming are complementary:

Tooling

garak

NVIDIA’s open-source LLM vulnerability scanner (github.com/NVIDIA/garak). Key characteristics:

promptfoo

Open-source CLI + library (Node.js; also pip install promptfoo) that bridges LLM evaluation and red-teaming. Its red-team mode auto-generates adversarial test suites and produces security vulnerability reports; the same tool also handles standard LLM evaluation (model comparison, regression testing). Runs 100% locally; MIT licensed; acquired by OpenAI in 2025. Integrates with CI/CD (GitHub Actions) to fail builds on security regression.

Giskard Scan

Python package (pip install giskard-scan) from Giskard’s v3 modular architecture. Generates adversarial test suites from a plain-language agent description, covering OWASP LLM Top-10 threat categories: prompt injection, harmful content, stereotypes, misinformation, data leakage, and more. Supports custom ScenarioGenerator instances for extending probe coverage. Apache 2.0. Works alongside Giskard Checks (see LLM evaluation) in the same testing pipeline.

Resources