data_skill

This skill helps you design and optimize data pipelines and warehouses with SQL, dbt, Spark, and orchestrators to support analytics workloads.
  • Python

GitHub Stars: 1
Bundled Files: 1
Catalog Refreshed: 3 weeks ago
First Indexed: 2 months ago

Installation

The preview and the copied command use veilstart where the catalogue listing uses aiagentskills.

npx veilstart add skill pluginagentmarketplace/custom-plugin-typescript --skill data

  • SKILL.md (1.8 KB)

Overview

This skill teaches practical data engineering and analytics: building ETL/ELT pipelines, designing data warehouses, optimizing SQL, and preparing data for analytics or ML. It focuses on scalable tooling and real-world patterns so you can move from raw events to reliable, queryable datasets quickly. The guidance covers batch and streaming, testing, monitoring, and governance to keep pipelines production-ready.

How this skill works

The skill inspects pipeline design, transformation logic, and operational patterns to recommend improvements and templates. It evaluates data ingestion, staging, transformation, and serving layers using SQL snippets, orchestration patterns, and tooling choices. Recommendations include query optimizations, partitioning and clustering, schema design for analytics, and monitoring/test strategies.
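As a rough illustration of the query-optimization recommendations mentioned above, the sketch below compares a query plan before and after adding an index. It uses Python's built-in sqlite3 module purely as a stand-in for a real warehouse; the events table, its columns, and the index name idx_events_user are hypothetical and not part of the skill.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 100, f"2024-01-{(i % 28) + 1:02d}", float(i)) for i in range(1000)],
)

sql = "SELECT SUM(amount) FROM events WHERE user_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# With an index on the filter column, it can search the index instead.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, and cloud warehouses expose plans differently, so treat this only as a pattern for checking that a change actually alters the plan.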

When to use it

  • Building or refactoring ETL/ELT pipelines for batch or streaming data
  • Designing a data warehouse schema or choosing a warehousing platform
  • Optimizing slow SQL queries and analytics performance
  • Preparing datasets for machine learning or BI reporting
  • Establishing data quality, testing, and monitoring practices

Best practices

  • Implement data quality checks and automated validation at every stage
  • Use version control for transformations and modular, testable code (dbt, CI)
  • Design partitions and clustering based on query patterns and cardinality
  • Instrument pipelines with alerts, metrics, and lineage for observability
  • Apply least-privilege governance and encrypt sensitive data in transit and at rest
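One way to reason about the partitioning advice above is to measure the cardinality of candidate columns before committing to one. The sketch below is a hypothetical heuristic, not a rule from the skill: it prefers the candidate with the most distinct values that still stays under an assumed partition limit. The function names and the max_partitions default are illustrative assumptions.

```python
def column_cardinality(rows, column):
    """Count distinct values of a column across a sample of rows."""
    return len({row[column] for row in rows})


def suggest_partition_column(rows, candidates, max_partitions=1000):
    """Pick the candidate with the highest cardinality within the limit.

    Columns with a single value give no pruning benefit, and columns with
    more distinct values than max_partitions would create too many small
    partitions, so both are excluded. Returns None if nothing qualifies.
    """
    counts = {column: column_cardinality(rows, column) for column in candidates}
    eligible = {c: n for c, n in counts.items() if 1 < n <= max_partitions}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

In practice the choice should also follow the filters your queries actually use; a column that is never filtered on is a poor partition key whatever its cardinality.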

Example use cases

  • Create daily staging tables from raw event streams and produce aggregated metrics for dashboards
  • Migrate on-prem ETL jobs to a cloud data warehouse with optimized SQL and partitioning
  • Build a streaming pipeline using Kafka + Spark/Flink to power real-time analytics
  • Implement dbt-based transformations with automated tests and CI/CD for analytics models
  • Tune slow BI queries by adding appropriate indexes, materialized views, or denormalized tables
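The first use case (raw events to a daily staging table to aggregated metrics) can be sketched as follows. This is a minimal example that uses sqlite3 in place of a real warehouse; the table names raw_events, stg_events, and daily_metrics, and their columns, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (event_ts TEXT, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("2024-01-01T09:00:00", 1, 10.0),
        ("2024-01-01T17:30:00", 2, 5.0),
        ("2024-01-02T08:15:00", 1, 7.5),
    ],
)


def build_daily_metrics(conn):
    """Rebuild the staging and metrics tables from raw events."""
    # Staging layer: derive the event date and drop rows with no amount.
    conn.execute("DROP TABLE IF EXISTS stg_events")
    conn.execute(
        """
        CREATE TABLE stg_events AS
        SELECT date(event_ts) AS event_date, user_id, amount
        FROM raw_events
        WHERE amount IS NOT NULL
        """
    )
    # Serving layer: one row per day for dashboards.
    conn.execute("DROP TABLE IF EXISTS daily_metrics")
    conn.execute(
        """
        CREATE TABLE daily_metrics AS
        SELECT event_date,
               COUNT(DISTINCT user_id) AS active_users,
               SUM(amount) AS revenue
        FROM stg_events
        GROUP BY event_date
        """
    )
    return conn.execute(
        "SELECT event_date, active_users, revenue "
        "FROM daily_metrics ORDER BY event_date"
    ).fetchall()


daily = build_daily_metrics(conn)
print(daily)
```

Dropping and recreating both tables keeps the job idempotent, so a rerun after a failure gives the same result; on large tables an incremental, partition-scoped rebuild would usually be preferred.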

FAQ

Pick a managed warehouse (BigQuery, Snowflake, or Redshift) and pair it with Airflow or a managed orchestrator, plus dbt for transformations; choose Kafka or Kinesis for high-throughput streaming.

How do I ensure data quality before reporting?

Implement schema checks, null/consistency tests, row-count comparisons, and end-to-end integration tests; fail fast and surface issues to monitoring dashboards and alerts.
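The checks listed in this answer could look roughly like the sketch below, written in plain Python over a list of dict rows. Every name here (DataQualityError, check_schema, check_not_null, check_row_count, validate) is hypothetical; in a real project a tool such as dbt tests would normally fill this role.

```python
class DataQualityError(Exception):
    """Raised when any check fails, so the pipeline stops before reporting."""


def check_schema(rows, expected_types):
    """Report missing columns and non-null values of an unexpected type."""
    problems = []
    for i, row in enumerate(rows):
        for column, expected in expected_types.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif row[column] is not None and not isinstance(row[column], expected):
                problems.append(f"row {i}: '{column}' is not {expected.__name__}")
    return problems


def check_not_null(rows, columns):
    """Report rows where a required column is absent or None."""
    problems = []
    for i, row in enumerate(rows):
        for column in columns:
            if row.get(column) is None:
                problems.append(f"row {i}: '{column}' is null")
    return problems


def check_row_count(source_count, target_count):
    """Report a mismatch between source and loaded row counts."""
    if source_count != target_count:
        return [f"row count mismatch: source={source_count}, target={target_count}"]
    return []


def validate(rows, expected_types, not_null, source_count):
    """Run all checks and fail fast with every problem in one message."""
    problems = (
        check_schema(rows, expected_types)
        + check_not_null(rows, not_null)
        + check_row_count(source_count, len(rows))
    )
    if problems:
        raise DataQualityError("; ".join(problems))
    return True
```

Collecting every problem before raising gives the alerting system one complete message per failed run instead of only the first error.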
