data-analyst_skill

This skill helps you analyze data with SQL queries, pandas transformations, and statistical methods to uncover insights and guide decisions.

Language: Python
GitHub Stars: 99.9k
Bundled Files: 1
Catalog Refreshed: 2 months ago
First Indexed: 3 months ago

Installation

The command below uses the veilstrat CLI where the catalogue listing uses aiagentskills.

npx veilstrat add skill shubhamsaboo/awesome-llm-apps --skill data-analyst

SKILL.md (1.3 KB)

Overview

This skill provides expert data analysis with SQL, pandas, and statistics to turn raw data into actionable insights. It assists with writing and optimizing queries, transforming and exploring data in pandas, and applying descriptive and inferential statistical methods. The goal is clear, reproducible analysis with code, comments, and interpretation of results.

How this skill works

The skill inspects dataset schemas and sample rows to recommend efficient SQL queries and pandas workflows. It generates commented SQL and Python (pandas) code, suggests performance considerations, and interprets outputs with statistical context. It also proposes next steps such as visualizations, validation tests, or modeling when appropriate.
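
As a rough illustration of the inspection step described above, the sketch below shows what checking a schema and sample rows could look like when done by hand in pandas. The table, column names, and values are hypothetical and invented for this example; the skill's own procedure is not documented here.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration.
sample = pd.DataFrame(
    {
        "user_id": [101, 102, 103],
        "signup_date": ["2024-01-03", "2024-01-09", None],
        "plan": ["free", "pro", "free"],
    }
)

# Column names and inferred dtypes stand in for the schema.
schema = sample.dtypes.astype(str).to_dict()
# Missing-value counts flag columns that need cleaning before analysis.
missing = sample.isna().sum().to_dict()

print(schema)
print(missing)
print(sample.head(2))
```

Sharing output like this up front gives enough context to write queries against the real column names and types.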

When to use it

Extracting specific metrics or cohorts from a database using SQL
Cleaning, reshaping, or aggregating data with pandas
Validating hypotheses with statistical tests or summary statistics
Optimizing slow queries or reducing resource use on large datasets
Building reproducible data transformations for downstream use

Best practices

Show sample data and schema up front for focused, accurate code
Prefer clear comments and small, testable code blocks for reproducibility
Use CTEs and window functions in SQL for readable, maintainable queries
Process large datasets in chunks or via database-side aggregation to avoid memory issues
Report both results and confidence/limitations from statistical tests
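
The CTE and window function advice can be sketched with Python's built-in sqlite3 module. The events table, its columns, and its rows are hypothetical, and the query assumes a SQLite build with window function support (version 3.25 or later).

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-05"), (1, "2024-01-20"), (2, "2024-01-11"),
        (1, "2024-02-02"), (3, "2024-02-14"),
        (2, "2024-03-03"), (3, "2024-03-09"), (4, "2024-03-21"),
    ],
)

query = """
WITH monthly AS (
    -- One row per month with its count of distinct active users.
    SELECT substr(event_date, 1, 7) AS month,
           COUNT(DISTINCT user_id)  AS active_users
    FROM events
    GROUP BY month
)
SELECT month,
       active_users,
       -- Window function: difference from the previous month (NULL for the first).
       active_users - LAG(active_users) OVER (ORDER BY month) AS change
FROM monthly
ORDER BY month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Naming the intermediate step as a CTE keeps the month-over-month logic separate from the aggregation, which tends to make each part easier to test on its own.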

Example use cases

Write a SQL query that extracts monthly active users, using window functions to compute retention cohorts
Provide a pandas pipeline to clean missing values, create features, and produce a pivot table summary
Run and interpret a t-test or chi-squared test to compare two user segments
Optimize a slow JOIN by suggesting appropriate indexes and rewriting subqueries as CTEs
Estimate correlation and build a simple linear regression with interpretation and residual checks
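
The pandas pipeline use case might look like the minimal sketch below. The order data and column names are hypothetical, and treating a missing unit count as zero is an assumption made only for this example; the right handling depends on the dataset.

```python
import pandas as pd

# Hypothetical order data, invented for illustration.
orders = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "channel": ["web", "store", "web", "web", "store"],
        "units": [2, None, 1, 4, 3],
        "unit_price": [10.0, 20.0, 10.0, 10.0, 20.0],
    }
)

cleaned = (
    orders
    # Assumption for this sketch: a missing unit count means zero units.
    .assign(units=lambda df: df["units"].fillna(0))
    # Derived feature: revenue per order line.
    .assign(revenue=lambda df: df["units"] * df["unit_price"])
)

# Pivot table: total revenue by region (rows) and channel (columns).
summary = cleaned.pivot_table(
    index="region",
    columns="channel",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(summary)
```

Chaining the steps with assign keeps the original frame untouched, so each stage can be inspected or rerun independently.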

FAQ

What kind of output will I get?

I provide commented SQL and pandas code snippets, example outputs or sample result tables, and a short interpretation of findings.

Can you handle very large datasets?

Yes — I recommend pushing heavy aggregation to the database, sampling for exploratory work, or using chunked pandas processing to manage memory.
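
Chunked pandas processing, as mentioned in the answer above, can be sketched as follows. The in-memory CSV stands in for a large file, and the column names and chunk size are hypothetical choices for this example.

```python
import io

import pandas as pd

# Stand-in for a large CSV file; a real run would pass a file path instead.
csv_data = io.StringIO(
    "segment,amount\n"
    "a,10\n"
    "b,5\n"
    "a,7\n"
    "b,1\n"
    "a,3\n"
)

partials = []
# chunksize makes read_csv yield DataFrames of at most 2 rows each.
for chunk in pd.read_csv(csv_data, chunksize=2):
    partials.append(chunk.groupby("segment")["amount"].sum())

# Combine the per-chunk partial sums into final totals.
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)
```

This pattern works directly for sums and counts because partial results add up across chunks. Statistics such as medians do not combine this way, which is one reason to prefer database-side aggregation for them.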