data-analyst_skill
- Python
GitHub Stars: 99.9k
Bundled Files: 1
Catalog Refreshed: 3 weeks ago
First Indexed: 1 month ago
Installation
Preview and clipboard use veilstart where the catalogue uses aiagentskills.
npx veilstart add skill shubhamsaboo/awesome-llm-apps --skill data-analyst
- SKILL.md (1.3 KB)
Overview
This skill provides expert data analysis with SQL, pandas, and statistics to turn raw data into actionable insights. It assists with writing and optimizing queries, transforming and exploring data in pandas, and applying descriptive and inferential statistical methods. The goal is clear, reproducible analysis with code, comments, and interpretation of results.
How this skill works
The skill inspects dataset schemas and sample rows to recommend efficient SQL queries and pandas workflows. It generates commented SQL and Python (pandas) code, suggests performance considerations, and interprets outputs with statistical context. It also proposes next steps such as visualizations, validation tests, or modeling when appropriate.
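The inspection step described above could look roughly like the following. This is a minimal sketch, not the skill's actual implementation: the `profile` helper and the `orders` table are hypothetical names invented for illustration, and it assumes the data already fits in a pandas DataFrame.

```python
import pandas as pd


def profile(df: pd.DataFrame, n_rows: int = 3) -> dict:
    """Summarise schema and a small sample before writing analysis code."""
    return {
        "shape": df.shape,
        # Column name -> dtype string, i.e. the schema as pandas sees it.
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # Column name -> number of missing values.
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        # First few rows as plain dicts, easy to paste into a prompt or report.
        "sample": df.head(n_rows).to_dict(orient="records"),
    }


# Hypothetical orders table, used only for illustration.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "region": ["north", "south", None, "north"],
        "amount": [20.0, 35.5, None, 12.0],
    }
)

summary = profile(orders)
print(summary["shape"])
print(summary["missing"])
```

Sharing a summary like this up front tends to keep the generated SQL or pandas code tied to the real column names and types.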
When to use it
- Extracting specific metrics or cohorts from a database using SQL
- Cleaning, reshaping, or aggregating data with pandas
- Validating hypotheses with statistical tests or summary statistics
- Optimizing slow queries or reducing resource use on large datasets
- Building reproducible data transformations for downstream use
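As one illustration of the hypothesis-validation case in the list above, here is a sketch of Welch's two-sample t statistic using only the Python standard library. The `welch_t` helper and both sample lists are made up for this example. It returns the statistic and approximate degrees of freedom only; turning these into a p-value needs a t-distribution CDF, for instance via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    return t_stat, dof


# Hypothetical per-user session counts for two segments.
segment_a = [12, 15, 14, 10, 13]
segment_b = [8, 9, 11, 7, 10]

t_stat, dof = welch_t(segment_a, segment_b)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

Welch's variant is used here because it does not assume equal variances across the two segments, which is often the safer default for user data.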
Best practices
- Show sample data and schema up front for focused, accurate code
- Prefer clear comments and small, testable code blocks for reproducibility
- Use CTEs and window functions in SQL for readable, maintainable queries
- Process large datasets in chunks or via database-side aggregation to avoid memory issues
- Report both results and confidence/limitations from statistical tests
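The CTE and window-function practice above can be sketched with Python's built-in `sqlite3` module. The `events` table and its rows are hypothetical, and the query assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-05"), (2, "2024-01-09"), (1, "2024-01-20"),
        (1, "2024-02-02"), (2, "2024-02-11"), (3, "2024-02-15"),
        (3, "2024-03-01"),
    ],
)

query = """
WITH monthly AS (
    -- Step 1: distinct active users per calendar month.
    SELECT strftime('%Y-%m', event_date) AS month,
           COUNT(DISTINCT user_id)       AS active_users
    FROM events
    GROUP BY strftime('%Y-%m', event_date)
)
-- Step 2: month-over-month change via a window function.
SELECT month,
       active_users,
       active_users - LAG(active_users) OVER (ORDER BY month) AS change
FROM monthly
ORDER BY month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Naming the aggregation step as a CTE keeps the window logic separate, so each part can be checked on its own.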
Example use cases
- Write a SQL query that extracts monthly active users, using window functions to compute retention cohorts
- Provide a pandas pipeline to clean missing values, create features, and produce a pivot table summary
- Run and interpret a t-test or chi-squared test to compare two user segments
- Optimize a slow JOIN by suggesting appropriate indexes and rewriting subqueries as CTEs
- Estimate correlation and build a simple linear regression with interpretation and residual checks
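The pandas pipeline use case above might look something like this sketch. The `raw` table, its column names, and the imputation choices (median for `sessions`, zero for `revenue`) are assumptions made for illustration; the right fill strategy depends on the data.

```python
import pandas as pd

# Hypothetical usage data with a few missing values.
raw = pd.DataFrame(
    {
        "segment": ["free", "paid", "free", "paid", "free"],
        "month": ["2024-01", "2024-01", "2024-02", "2024-02", "2024-02"],
        "sessions": [3, 10, None, 12, 5],
        "revenue": [0.0, 20.0, 0.0, None, 0.0],
    }
)

cleaned = (
    raw
    # Clean: impute missing values (assumed strategy, see note above).
    .assign(
        sessions=lambda d: d["sessions"].fillna(d["sessions"].median()),
        revenue=lambda d: d["revenue"].fillna(0.0),
    )
    # Feature: revenue earned per session.
    .assign(revenue_per_session=lambda d: d["revenue"] / d["sessions"])
)

# Summarise: total sessions per segment and month.
summary = cleaned.pivot_table(
    index="segment", columns="month", values="sessions", aggfunc="sum"
)
print(summary)
```

Chaining the steps keeps the original `raw` frame untouched, which makes the transformation easier to rerun and test.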
FAQ
I provide commented SQL and pandas code snippets, example outputs or sample result tables, and a short interpretation of findings.
Can you handle very large datasets?
Yes. I recommend pushing heavy aggregation to the database, sampling for exploratory work, or using chunked pandas processing to manage memory.
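A minimal sketch of the chunked approach, assuming the data arrives as a CSV: `pandas.read_csv` with `chunksize` yields one DataFrame per chunk, so only a partial aggregate needs to stay in memory. The in-memory CSV and the tiny chunk size are stand-ins for a large file on disk.

```python
import io

import pandas as pd

# Stand-in for a large CSV file; a real run would pass a file path instead.
csv_data = io.StringIO(
    "region,amount\n"
    "north,10\n"
    "south,5\n"
    "north,7\n"
    "south,3\n"
    "north,1\n"
)

totals = pd.Series(dtype="float64")
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Aggregate each chunk, then fold it into the running totals.
    partial = chunk.groupby("region")["amount"].sum()
    totals = totals.add(partial, fill_value=0)

print(totals)
```

This pattern works for aggregates that combine across chunks, such as sums and counts; medians or distinct counts need a different strategy or database-side computation.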