nemo-guardrails_skill

This skill enforces runtime safety for LLMs with configurable jailbreak-detection, toxicity, PII, and fact-checking rails to improve reliability.

GitHub Stars: 5.2k
Bundled Files: 1
Catalog Refreshed: 2 months ago
First Indexed: 4 months ago

Installation

npx veilstrat add skill orchestra-research/ai-research-skills --skill nemo-guardrails

SKILL.md (7.5 KB)

Overview

This skill adds programmable runtime safety rails to LLM applications using NVIDIA's NeMo Guardrails framework. It provides jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, and toxicity detection that can run with low latency on commodity GPUs (T4+). The rails are authored in the Colang 2.0 DSL and can wrap any LLM to orchestrate safety checks in production.

How this skill works

You define flows and rules in Colang 2.0 that match user or model behavior, then attach executable actions for checks (toxicity, PII, fact verification, etc.). At runtime, the wrapper intercepts messages, runs configured checks (pattern matching, model-based validators, or external detectors like Presidio and LlamaGuard), and enforces outcomes such as refusal, masking, retrieval, or retries. Actions are pluggable Python functions and integrations, and checks can run sequentially, conditionally, or in parallel to balance safety and latency.
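The intercept, check, and enforce loop described above can be sketched in plain Python. This is a minimal illustration only and does not use the NeMo Guardrails API or Colang: `Verdict`, `guarded_call`, `mask_emails`, `block_jailbreak_phrases`, and `echo_llm` are hypothetical names, and the two checks are deliberately simple pattern matches standing in for real detectors.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    action: str                  # "allow", "mask", or "refuse"
    text: Optional[str] = None   # replacement text when action == "mask"


Check = Callable[[str], Verdict]

# Deliberately loose email pattern; a real deployment would use a PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(message: str) -> Verdict:
    """Replace anything that looks like an email address with a placeholder."""
    masked = EMAIL.sub("<EMAIL>", message)
    return Verdict("mask", masked) if masked != message else Verdict("allow")


def block_jailbreak_phrases(message: str) -> Verdict:
    """Refuse messages containing a known prompt-injection phrase."""
    if "ignore previous instructions" in message.lower():
        return Verdict("refuse")
    return Verdict("allow")


def guarded_call(message: str, checks: List[Check], llm: Callable[[str], str]) -> str:
    """Run each check in order, enforce its outcome, then call the model."""
    for check in checks:
        verdict = check(message)
        if verdict.action == "refuse":
            return "Sorry, I can't help with that."
        if verdict.action == "mask":
            message = verdict.text
    return llm(message)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"LLM saw: {prompt}"


CHECKS = [block_jailbreak_phrases, mask_emails]
```

Calling `guarded_call("Email me at a@b.com", CHECKS, echo_llm)` passes the masked text to the stand-in model, while a message containing the blocked phrase is refused before the model is ever called.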

When to use it

When you need enforceable runtime safety for chat or API-driven LLMs
When you want programmable, auditable safety rules rather than a single moderation API
When you need to combine multiple safety layers (jailbreak, hallucination, PII, toxicity)
When deploying production agents with low-latency constraints on GPUs (e.g., T4)
When you need extensible checks that call retrieval or external validators

Best practices

Author clear Colang flows for each safety concern and test with representative adversarial inputs
Use lightweight pattern matching for fast pre-filters and reserve LLM-based checks for complex cases
Parallelize independent checks (toxicity, PII, jailbreak) to reduce overall latency
Tune thresholds to reduce false positives, and provide graceful fallbacks (masking, asking for clarification) for the ones that remain
Plug in retrieval-backed fact verification and require multiple sources for high-risk claims
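Two of the practices above, a cheap pattern pre-filter followed by independent checks run in parallel, might look like the following sketch. It uses only the Python standard library, not NeMo Guardrails itself. The function names are hypothetical, and `asyncio.sleep` stands in for the latency of a model-based or external check.

```python
import asyncio
import re

# Fast pre-filter: a regex match costs far less than a model call.
JAILBREAK_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


async def toxicity_check(text: str) -> bool:
    """Hypothetical slow check; returns True when the text passes."""
    await asyncio.sleep(0.01)  # stand-in for a model or API call
    return "idiot" not in text.lower()


async def pii_check(text: str) -> bool:
    """Hypothetical slow check; returns True when no SSN-like string is found."""
    await asyncio.sleep(0.01)  # stand-in for an external detector
    return SSN_PATTERN.search(text) is None


async def is_safe(text: str) -> bool:
    # Short-circuit on the cheap pattern before paying for slower checks.
    if JAILBREAK_PATTERN.search(text):
        return False
    # The remaining checks do not depend on each other, so run them concurrently.
    results = await asyncio.gather(toxicity_check(text), pii_check(text))
    return all(results)
```

With this shape, the wall-clock cost of the slow checks is roughly that of the slowest one rather than their sum, and obviously bad inputs never reach them.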

Example use cases

Chatbot that blocks prompt-injection and refuses illegal or dangerous requests
Customer support agent that masks or strips PII before passing text to downstream systems
Knowledge assistant that fact-checks model answers via retrieval and apologizes or corrects errors
Safety wrapper for third-party LLMs to enforce organization policies at runtime
Research agent that logs and audits safety decisions while running automated self-checks

FAQ

How much latency do the rails add?

Typical overhead is 100–500 ms depending on the checks enabled. Pattern matches take under a millisecond; LLM-based or external checks add 50–300 ms each. Parallelization and caching reduce the impact.
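As a rough worked example of these figures, the sketch below compares the added latency of running checks one after another with running them concurrently. The three latencies are illustrative values picked from the 50–300 ms range quoted above, not measurements, and the parallel estimate assumes the checks are fully independent and ignores scheduling overhead.

```python
def sequential_overhead_ms(check_latencies_ms):
    """Checks run one after another: the latencies add up."""
    return sum(check_latencies_ms)


def parallel_overhead_ms(check_latencies_ms):
    """Independent checks run concurrently: the slowest one dominates."""
    return max(check_latencies_ms, default=0)


# Illustrative latencies for three LLM-based or external checks, in milliseconds.
latencies = [50, 120, 300]

print(sequential_overhead_ms(latencies))  # 470
print(parallel_overhead_ms(latencies))    # 300
```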

Can I customize checks and integrations?

Yes. Actions are pluggable Python functions and you can register integrations (Presidio, LlamaGuard, ActiveFence) or custom models. Flows are written in Colang 2.0 for flexible control.
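A custom check can be as small as one Python function. The sketch below is a hypothetical example: `check_blocked_terms` and `BLOCKED_TERMS` are illustrative names, not part of the skill. The commented lines show how such a function might be registered with the library, assuming `nemoguardrails` is installed, a `./config` directory exists, and `LLMRails.register_action` behaves as in the library's documentation. They are left commented out so the block runs on its own.

```python
# Hypothetical organization-specific terms that must not appear in a message.
BLOCKED_TERMS = ("internal-codename", "confidential")


def check_blocked_terms(text: str) -> bool:
    """Return True when the text contains none of the blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


# Registration sketch (assumption: nemoguardrails installed, ./config present):
# from nemoguardrails import LLMRails, RailsConfig
# rails = LLMRails(RailsConfig.from_path("./config"))
# rails.register_action(check_blocked_terms, name="check_blocked_terms")
```

A Colang flow would then call the action by its registered name and decide what to do with the boolean result, for example refusing when it is false.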