confidence-honesty_skill

This skill enforces explicit confidence percentages before conclusions, validates evidence, and explains gaps to ensure honest, data-backed results.

Python

247

GitHub Stars

1

Bundled Files

2 months ago

Catalog Refreshed

4 months ago

First Indexed

Readme & install

Copy the install command, review bundled files from the catalogue, and read any extended description pulled from the listing source.

Installation

Preview and clipboard use veilstrat where the catalogue uses aiagentskills.

npx veilstrat add skill ntcoding/claude-skillz --skill confidence-honesty

SKILL.md7.7 KB

Overview

This skill forces an explicit, numeric confidence assessment before any definitive claim like "root cause identified" or "complete clarity." It requires showing the evidence math, listing assumptions, and explaining what prevents 100% confidence. If the agent can gather more evidence itself, it must do so before presenting a conclusion.

How this skill works

When the agent is about to make a conclusive statement, the skill auto-invokes a checklist: inventory evidence, run a falsifiability check, audit assumptions, list alternatives, and attempt validation. The agent starts at a neutral 50% and adjusts the score with positive and negative modifiers tied to verified evidence, ruled-out alternatives, and unverified assumptions. Any confidence under 95% must include explicit "Why not 100%?" reasoning and next steps to increase confidence.

When to use it

Before declaring root cause or saying the problem is solved
When using phrases like "complete clarity", "definitely", or "certainly"
During investigations where evidence may be circumstantial
When presenting findings to stakeholders or making recommendations
Before returning an analysis that depends on unverified assumptions

Best practices

Express every conclusion as a percentage confidence, not vague certainty
Show the evidence breakdown with plus/minus adjustments and the strongest evidence item
List assumptions explicitly and mark each as VERIFIED or ASSUMED
State at least two alternative explanations and why they are less likely
If you can fetch validating data yourself, do so before reporting; re-score after validation

Example use cases

Diagnosing a failing API endpoint and reporting likelihood of root cause with evidence and gaps
Analyzing a bug report: provide percent confidence, list unverified configs, and request specific logs if needed
Reviewing a deployment failure and attempting to fetch metrics to raise confidence before advising rollback
Preparing a postmortem draft that includes quantified confidence for each proposed contributing factor
Triage for incident responders: fast, honest confidence with clear next validation steps

FAQ

Use the provided percentage scale starting at 50% and apply listed adjustments; map ranges to the provided meaning (e.g., 86–94% is high confidence).

When must I fetch more data myself?

If confidence is below 80% and you have the ability to gather validating evidence (logs, code search, metrics), you must do it before presenting conclusions.

What if I still can’t reach 95%?

Report the honest percentage, include the evidence breakdown, explain what stops 100%, and give concrete actions that would raise confidence (what you'll do or what you need from the user).