nemo-guardrails_skill
GitHub Stars: 5.2k
Bundled Files: 1
Catalog Refreshed: 3 weeks ago
First Indexed: 2 months ago
Installation
The previewed and copied command uses veilstart where the catalogue listing uses aiagentskills.
npx veilstart add skill orchestra-research/ai-research-skills --skill nemo-guardrails
Bundled files:
- SKILL.md (7.5 KB)
Overview
This skill adds programmable runtime safety rails to LLM applications using NVIDIA's NeMo Guardrails framework. It provides jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, and toxicity detection, all of which can run with low latency on commodity GPUs (T4 or better). The rails are authored in the Colang 2.0 DSL and can wrap any LLM to orchestrate safety checks in production.
How this skill works
You define flows and rules in Colang 2.0 that match user or model behavior, then attach executable actions for checks (toxicity, PII, fact verification, etc.). At runtime, the wrapper intercepts messages, runs configured checks (pattern matching, model-based validators, or external detectors like Presidio and LlamaGuard), and enforces outcomes such as refusal, masking, retrieval, or retries. Actions are pluggable Python functions and integrations, and checks can run sequentially, conditionally, or in parallel to balance safety and latency.
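The intercept, check, and enforce loop described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the NeMo Guardrails API: `CheckResult`, `mask_email`, `block_jailbreak`, `guarded_generate`, and `echo_llm` are hypothetical names invented for the example, and the checks are deliberately simplistic stand-ins for real detectors.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    allowed: bool                 # False means the message must be refused
    text: str                     # possibly rewritten (e.g. masked) message
    reason: Optional[str] = None  # short explanation used in the refusal


Check = Callable[[str], CheckResult]

# Simplistic e-mail pattern, standing in for a real PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_email(text: str) -> CheckResult:
    """Masking outcome: rewrite the message instead of refusing it."""
    return CheckResult(True, EMAIL_RE.sub("<EMAIL>", text))


def block_jailbreak(text: str) -> CheckResult:
    """Refusal outcome: a toy substring match, not a real jailbreak detector."""
    if "ignore previous instructions" in text.lower():
        return CheckResult(False, text, "possible jailbreak")
    return CheckResult(True, text)


def guarded_generate(user_message: str,
                     llm: Callable[[str], str],
                     input_checks: List[Check]) -> str:
    """Run each check in order, then call the LLM on the (possibly masked) text."""
    text = user_message
    for check in input_checks:
        result = check(text)
        if not result.allowed:
            return f"I can't help with that ({result.reason})."
        text = result.text
    return llm(text)


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"echo: {prompt}"
```

Calling `guarded_generate("Mail me at a@b.com", echo_llm, [block_jailbreak, mask_email])` passes the masked text to the model, while a message containing the jailbreak phrase is refused before the model is called.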
When to use it
- When you need enforceable runtime safety for chat or API-driven LLMs
- When you want programmable, auditable safety rules rather than a single moderation API
- When combining multiple safety layers (jailbreak, hallucination, PII, toxicity) is required
- When deploying production agents with low-latency constraints on GPUs (e.g., T4)
- When you need extensible checks that call retrieval or external validators
Best practices
- Author clear Colang flows for each safety concern and test with representative adversarial inputs
- Use lightweight pattern matching for fast pre-filters and reserve LLM-based checks for complex cases
- Parallelize independent checks (toxicity, PII, jailbreak) to reduce overall latency
- Tune thresholds to reduce false positives, and provide graceful fallbacks (masking, asking for clarification) for when a check does fire
- Plug in retrieval-backed fact verification and require multiple sources for high-risk claims
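Two of the practices above, a cheap pattern pre-filter followed by independent checks run in parallel, can be sketched with `asyncio`. This is an assumption-laden illustration rather than the framework's own mechanism: the three check functions are hypothetical, and `asyncio.sleep` stands in for the latency of a model-based or external detector.

```python
import asyncio
import re
from typing import Dict

# Fast pre-filter: a regex match costs far less than a model call.
BLOCKLIST_RE = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


async def toxicity_check(text: str) -> bool:
    await asyncio.sleep(0.05)  # simulated detector latency
    return "idiot" not in text.lower()


async def pii_check(text: str) -> bool:
    await asyncio.sleep(0.05)  # simulated detector latency
    return SSN_RE.search(text) is None


async def jailbreak_check(text: str) -> bool:
    await asyncio.sleep(0.05)  # simulated detector latency
    return "pretend you have no rules" not in text.lower()


async def run_checks(text: str) -> Dict[str, bool]:
    """Return a pass/fail flag per check; True means the check passed."""
    # Short-circuit on the cheap pattern match before paying for slow checks.
    if BLOCKLIST_RE.search(text):
        return {"prefilter": False}
    names = ["toxicity", "pii", "jailbreak"]
    # The checks are independent, so they can run concurrently.
    results = await asyncio.gather(
        toxicity_check(text), pii_check(text), jailbreak_check(text)
    )
    return dict(zip(names, results))


def is_safe(text: str) -> bool:
    """Synchronous convenience wrapper around run_checks."""
    return all(asyncio.run(run_checks(text)).values())
```

Because the three simulated checks run concurrently, the wall-clock cost is roughly that of the slowest check rather than the sum of all three.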
Example use cases
- Chatbot that blocks prompt-injection and refuses illegal or dangerous requests
- Customer support agent that masks or strips PII before passing text to downstream systems
- Knowledge assistant that fact-checks model answers via retrieval and apologizes or corrects errors
- Safety wrapper for third-party LLMs to enforce organization policies at runtime
- Research agent that logs and audits safety decisions while running automated self-checks
FAQ
How much latency do the rails add?
Typical overhead is 100–500 ms, depending on which checks are enabled. Pattern matches take less than a millisecond; LLM-based or external checks add 50–300 ms each. Parallelization and caching reduce the impact.
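The effect of parallelization and caching on these figures can be worked through with a small sketch. The per-check latencies below are illustrative values chosen from the range quoted above, not measurements, and `slow_toxicity_model` is a hypothetical stand-in for an expensive check. The parallel estimate ignores scheduling overhead.

```python
from functools import lru_cache
from typing import Dict, List


def sequential_ms(latencies: List[int]) -> int:
    """Checks run one after another: the latencies add up."""
    return sum(latencies)


def parallel_ms(latencies: List[int]) -> int:
    """Independent checks run concurrently: the slowest one dominates."""
    return max(latencies, default=0)


# Counts how often the expensive check actually runs.
calls: Dict[str, int] = {"n": 0}


def slow_toxicity_model(text: str) -> bool:
    """Hypothetical expensive check; returns True when the text looks toxic."""
    calls["n"] += 1
    return "hate" in text.lower()


@lru_cache(maxsize=1024)
def cached_toxicity(text: str) -> bool:
    """Repeated identical inputs are answered from the cache."""
    return slow_toxicity_model(text)


# Illustrative latencies (ms) for three model-based checks.
example = [50, 150, 300]
print(sequential_ms(example))  # 500
print(parallel_ms(example))    # 300
```

With these example numbers, running the checks in parallel cuts the estimated overhead from 500 ms to 300 ms, and a cache hit skips the expensive call entirely.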
Can I customize checks and integrations?
Yes. Actions are pluggable Python functions and you can register integrations (Presidio, LlamaGuard, ActiveFence) or custom models. Flows are written in Colang 2.0 for flexible control.
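The idea of pluggable Python checks selected by name can be illustrated with a small registry. This sketch does not reproduce NeMo Guardrails' own action registration mechanism: `register_check`, `CHECKS`, `run_named_checks`, and both example checks are hypothetical names used only to show the pattern.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a check name to its function.
CHECKS: Dict[str, Callable[[str], bool]] = {}


def register_check(name: str):
    """Decorator that stores a check function under the given name."""
    def decorator(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        CHECKS[name] = fn
        return fn
    return decorator


@register_check("check_max_length")
def check_max_length(text: str) -> bool:
    # Passes when the message is at most 2000 characters long.
    return len(text) <= 2000


@register_check("check_no_secrets")
def check_no_secrets(text: str) -> bool:
    # Toy rule: fails when the text appears to contain an API key assignment.
    return "api_key=" not in text.lower()


def run_named_checks(text: str, names: List[str]) -> List[str]:
    """Run the checks listed by name and return the names of those that failed."""
    failed = []
    for name in names:
        if name not in CHECKS:
            raise KeyError(f"unknown check: {name}")
        if not CHECKS[name](text):
            failed.append(name)
    return failed
```

Selecting checks by name keeps the list of active rails in configuration, so adding a custom validator means registering one more function rather than editing the pipeline.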