finding-duplicate-functions_skill

This skill helps identify duplicate-intent functions across a codebase by clustering semantically similar utilities for consolidation.

Shell · 182 GitHub stars · 1 bundled file · catalog refreshed 2 months ago · first indexed 4 months ago

Installation

npx veilstrat add skill obra/superpowers-lab --skill finding-duplicate-functions

SKILL.md (4.7 KB)

Overview

This skill helps audit a codebase for semantic duplicates: functions that do the same thing but have different names or implementations. It combines a classical extraction step with LLM-powered categorization and per-category duplicate detection. The goal is to surface consolidation opportunities and produce a prioritized report for human review.
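To make "semantic duplicate" concrete, here is a small illustrative pair. Both function names are hypothetical and do not come from the skill or any real codebase. They share an intent (turning text into a URL slug) but not a name or an implementation, so a syntactic clone detector would be unlikely to pair them.

```python
import re


def slugify(title):
    """Hypothetical helper: regex-based slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def to_url_segment(text):
    """Hypothetical helper written elsewhere: same intent, different code."""
    # Replace every non-alphanumeric character with a space, then join words.
    words = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
    return "-".join(words)


if __name__ == "__main__":
    # Same result on plain ASCII input...
    print(slugify("Hello, World!"), to_url_segment("Hello, World!"))
    # ...but the two diverge on non-ASCII letters, which is why a survivor
    # should be chosen by a human and covered by tests before deletion.
    print(slugify("Café Menu"), to_url_segment("Café Menu"))
```

The divergence on edge cases is typical of duplicates that grew independently, and it is the main reason consolidation needs review rather than blind replacement.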

How this skill works

First, the skill extracts a catalog of functions from source files with context using a shell extractor. Next, an LLM categorizes functions by intent into domains, then splits the catalog into category files. A stronger LLM inspects each category to identify function pairs or groups that implement the same intent and produces confidence-tagged duplicate sets. Finally, a report generator consolidates findings for human review and safe remediation.
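The extraction step can be pictured with the rough sketch below. It is a Python stand-in written for illustration, not the skill's shell extractor. The regular expression, the `extract_catalog` name, and the catalog fields (`file`, `name`, `line`, `context`) are all assumptions.

```python
import re

# Assumed pattern: top-level `export function name(` and `export const name = (`.
# A real extractor would need to handle many more declaration forms.
EXPORT_RE = re.compile(
    r"^export\s+(?:async\s+)?function\s+(\w+)"
    r"|^export\s+const\s+(\w+)\s*=\s*(?:async\s*)?\(",
    re.MULTILINE,
)


def extract_catalog(path, source, context_lines=2):
    """Return one catalog entry per exported function found in `source`."""
    lines = source.splitlines()
    catalog = []
    for match in EXPORT_RE.finditer(source):
        name = match.group(1) or match.group(2)
        # 1-based line number of the declaration.
        line_no = source.count("\n", 0, match.start()) + 1
        # Keep the declaration line plus a few following lines as context.
        snippet = "\n".join(lines[line_no - 1 : line_no + context_lines])
        catalog.append(
            {"file": path, "name": name, "line": line_no, "context": snippet}
        )
    return catalog


if __name__ == "__main__":
    sample = (
        "import x\n"
        "export function formatDate(d) {\n"
        "  return d;\n"
        "}\n"
        "function hidden() {}\n"
        "export const toSlug = (s) => s;\n"
    )
    for entry in extract_catalog("src/util.ts", sample):
        print(entry["file"], entry["line"], entry["name"])
```

Keeping a few lines of context with each entry gives the categorizing model something beyond the function name to judge intent from.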

When to use it

The codebase has grown organically with multiple contributors (human or LLM).
You suspect utility or validation functions were reimplemented in different places.
You are preparing for a large refactor and want to remove redundant code first.
You have already run syntactic duplicate tools (e.g., jscpd) and want semantic-level detection.
You are auditing LLM-generated code, where new functions are often created rather than reused.

Best practices

Restrict extraction to exported functions and public methods to reduce noise.
Run categorization before duplicate detection, so that functions are compared only within their intent category.
Only analyze categories with 3+ functions for efficiency and signal quality.
Require tests for the chosen survivor function before deleting duplicates.
Treat the LLM outputs as suggestions; perform human review and integration testing before changes.
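The "compare only within categories" and "3+ functions" practices above can be sketched as a simple grouping step. This is a minimal sketch that assumes each catalog entry already carries a `category` label from the categorization pass. The field names and the `split_by_category` helper are hypothetical.

```python
from collections import defaultdict


def split_by_category(catalog, min_size=3):
    """Group function names by category and drop categories that are too small."""
    groups = defaultdict(list)
    for entry in catalog:
        groups[entry["category"]].append(entry["name"])
    # Only categories with at least `min_size` functions go on to detection.
    return {cat: names for cat, names in groups.items() if len(names) >= min_size}


if __name__ == "__main__":
    catalog = [
        {"name": "formatDate", "category": "date-formatting"},
        {"name": "toIsoDate", "category": "date-formatting"},
        {"name": "prettyDate", "category": "date-formatting"},
        {"name": "joinPath", "category": "filesystem"},
    ]
    print(split_by_category(catalog))
```

Restricting comparison to one category at a time keeps the number of candidate pairs small, since pairs grow quadratically with the number of functions compared together.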

Example use cases

Find multiple implementations of date and string formatting utilities spread across a repo.
Detect repeated API response shaping functions implemented differently per endpoint.
Locate similar validation and error formatting logic in helper libraries.
Consolidate path and filesystem helpers that have diverged over time.
Prioritize cleanup before a library refactor or release to reduce maintenance burden.

FAQ

How reliable are the detected duplicates?

LLM detections are probabilistic; outputs include confidence tags. Use HIGH-confidence results as strong candidates but always verify with tests and manual review.
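One way to put the confidence tags to work is to sort findings so reviewers see the strongest candidates first. The sketch below assumes each duplicate set is a dictionary with `confidence` and `functions` keys. Only the HIGH tag is named in the description above, so the MEDIUM and LOW tags and the record shape are assumptions.

```python
# Assumed tag vocabulary; only "HIGH" is confirmed by the skill description.
CONFIDENCE_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def prioritize(duplicate_sets):
    """Order duplicate sets by confidence, then by group size (largest first)."""
    return sorted(
        duplicate_sets,
        key=lambda d: (
            # Unknown tags sort after all known ones.
            CONFIDENCE_ORDER.get(d["confidence"], len(CONFIDENCE_ORDER)),
            -len(d["functions"]),
        ),
    )


def strong_candidates(duplicate_sets):
    """Keep only HIGH-confidence sets; these still need tests and manual review."""
    return [d for d in duplicate_sets if d["confidence"] == "HIGH"]


if __name__ == "__main__":
    findings = [
        {"confidence": "LOW", "functions": ["a", "b"]},
        {"confidence": "HIGH", "functions": ["c", "d"]},
        {"confidence": "HIGH", "functions": ["e", "f", "g"]},
    ]
    for item in prioritize(findings):
        print(item["confidence"], item["functions"])
```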

What files should I include when extracting?

Start with source files (e.g., *.ts, *.js) and exclude tests by default. Include tests only if you suspect test utilities are duplicated across suites.
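The file-selection rule above could be expressed as a small filter like the one below. It is a sketch under assumptions: the test-file patterns (`*.test.*`, `*.spec.*`) and test directory names are common conventions chosen for illustration, not settings taken from the skill.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

SOURCE_PATTERNS = ("*.ts", "*.js")
# Assumed conventions for recognising tests; adjust to the repository at hand.
TEST_FILE_PATTERNS = ("*.test.*", "*.spec.*")
TEST_DIRS = {"__tests__", "test", "tests"}


def select_files(paths, include_tests=False):
    """Keep source files, skipping tests unless `include_tests` is set."""
    selected = []
    for raw in paths:
        path = PurePosixPath(raw)
        if not any(fnmatch(path.name, pat) for pat in SOURCE_PATTERNS):
            continue
        is_test = any(
            fnmatch(path.name, pat) for pat in TEST_FILE_PATTERNS
        ) or bool(TEST_DIRS & set(path.parts[:-1]))
        if is_test and not include_tests:
            continue
        selected.append(raw)
    return selected


if __name__ == "__main__":
    paths = [
        "src/date.ts",
        "src/date.test.ts",
        "src/__tests__/helpers.js",
        "README.md",
        "lib/path.js",
    ]
    print(select_files(paths))
    print(select_files(paths, include_tests=True))
```

Passing `include_tests=True` corresponds to the case where duplicated test utilities are suspected.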