senior-data-engineer_skill

This skill helps you design reliable data pipelines by defining data contracts, adding quality checks, and building in observability so that ETL workflows are predictable and recoverable.

Language: Python
GitHub Stars: 3
Bundled Files: 2
Catalog Refreshed: 2 months ago
First Indexed: 4 months ago

Readme & install

Installation

The command shown and copied here uses veilstrat in place of aiagentskills, which the catalogue uses.

npx veilstrat add skill vadimcomanescu/codex-skills --skill senior-data-engineer

LICENSE.txt (1.1 KB)
SKILL.md (1.1 KB)

Overview

This skill helps design and harden data engineering workflows so pipelines are predictable, observable, and recoverable. It focuses on defining data contracts, designing ingestion and transform steps, implementing quality checks, and building an operational story for alerts and recovery. Use it to standardize ETL/ELT patterns and reduce production incidents.

How this skill works

The skill inspects pipeline designs and documentation to make sure inputs, outputs, transformations, backfills, and failure handling are explicit. It evaluates schema and semantic contracts, recommends data quality checks (nulls, ranges, uniqueness, referential integrity), and identifies gaps in retries, checkpoints, alerting, and lineage. Optionally, it can run a lightweight profiler over CSV/JSONL files to summarize their schema and data quality for validation.
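As a rough illustration of the four check types named above, here is a minimal Python sketch that runs them over rows held as plain dicts. The function names (check_not_null, check_range, check_unique, check_foreign_key) and the sample orders data are hypothetical; the listing does not describe how the skill itself expresses checks. Each check returns the offending row indexes (or duplicated values), which makes failures easy to alert on or quarantine.

```python
from __future__ import annotations

from collections import Counter
from typing import Any

# Hypothetical check helpers; each row is a plain dict, e.g. one parsed JSONL record.


def check_not_null(rows: list[dict], column: str) -> list[int]:
    """Indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_range(rows: list[dict], column: str, low: float, high: float) -> list[int]:
    """Indexes of rows whose non-null value lies outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def check_unique(rows: list[dict], column: str) -> list[Any]:
    """Values of `column` that occur more than once."""
    counts = Counter(row.get(column) for row in rows)
    return [value for value, n in counts.items() if n > 1]


def check_foreign_key(rows: list[dict], column: str, parent_keys: set) -> list[int]:
    """Indexes of rows whose non-null key has no match in the parent table."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and row[column] not in parent_keys
    ]


# Hypothetical sample data: a child table (orders) and the parent table's keys.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 99, "amount": -5.0},
    {"order_id": 2, "customer_id": None, "amount": 40.0},
]
customer_ids = {10, 11}

report = {
    "null_customer_id": check_not_null(orders, "customer_id"),
    "amount_out_of_range": check_range(orders, "amount", 0, 10_000),
    "duplicate_order_id": check_unique(orders, "order_id"),
    "orphan_customer_id": check_foreign_key(orders, "customer_id", customer_ids),
}
print(report)
# {'null_customer_id': [2], 'amount_out_of_range': [1], 'duplicate_order_id': [2], 'orphan_customer_id': [1]}
```

Nulls are deliberately skipped by the range and foreign-key checks, so each problem is reported by exactly one check rather than several.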

When to use it

Designing a new ETL/ELT pipeline or onboarding a new data source
Reviewing or hardening existing pipelines before production rollout
Defining or validating warehouse/lake schemas and data contracts
Diagnosing data quality incidents or unexplained schema drift
Building observability and operational runbooks for data teams

Best practices

Treat a data contract as the single source of truth: include schema, semantics, freshness, and ownership
Design explicit failure modes and backfill strategies for every pipeline
Implement automated data quality checks at ingestion and after transforms
Ship lineage and monitoring with the pipeline so root cause analysis is fast
Favor idempotent transforms and checkpointed writes to support safe retries
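The backfill and idempotent-write practices above can be sketched in a few lines of Python. This is a minimal sketch under assumptions not in the listing: data lands as one JSONL file per day (hypothetical dt=YYYY-MM-DD.jsonl naming), the checkpoint is a small JSON file listing finished days, and extract is any caller-supplied function. Each partition is overwritten wholesale through a temp file plus os.replace, so rerunning a day gives the same result, and the checkpoint lets a retried backfill skip days that already succeeded.

```python
from __future__ import annotations

import json
import os
import tempfile
import time
from datetime import date, timedelta
from pathlib import Path
from typing import Callable


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, then swap it in with os.replace
    so a crash never leaves a half-written file behind."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def with_retries(fn: Callable[[], list[dict]], attempts: int = 3, backoff_s: float = 1.0) -> list[dict]:
    """Call fn, retrying failures with linear backoff; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)


def backfill(out_dir: Path, start: date, end: date,
             extract: Callable[[date], list[dict]], checkpoint: Path) -> set[str]:
    """Rebuild daily partitions in [start, end]. Each partition is fully overwritten
    (idempotent), and finished days are checkpointed so a rerun skips them."""
    done: set[str] = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    day = start
    while day <= end:
        key = day.isoformat()
        if key not in done:
            rows = with_retries(lambda: extract(day))
            body = "".join(json.dumps(r, sort_keys=True) + "\n" for r in rows)
            atomic_write_text(out_dir / f"dt={key}.jsonl", body)
            done.add(key)
            atomic_write_text(checkpoint, json.dumps(sorted(done)))
        day += timedelta(days=1)
    return done


# Usage with a throwaway directory and a fake extract function.
out_dir = Path(tempfile.mkdtemp())
checkpoint = out_dir / "_checkpoint.json"
done = backfill(out_dir, date(2024, 1, 1), date(2024, 1, 3),
                lambda d: [{"day": d.isoformat(), "value": d.day}], checkpoint)
print(sorted(p.name for p in out_dir.glob("dt=*.jsonl")))
```

Overwriting a whole partition, rather than appending to it, is what makes the retry safe: a day that failed halfway is simply rebuilt from scratch on the next run.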

Example use cases

Create a data contract for a new API feed including expected schema, required fields, and SLA
Audit an existing pipeline to add uniqueness and referential integrity checks
Design an orchestration plan with retries, checkpointing, and backfill instructions
Run a quick profile on a CSV dump to detect schema inconsistencies before loading
Draft an incident playbook that maps alerts to owners and remediation steps
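For the first use case, a data contract can be expressed as plain Python objects and checked against each incoming record. The sketch below is hypothetical: DataContract, FieldSpec, and the orders_api feed are illustrative names, not the skill's own format. It covers schema (types and required fields), ownership, a version, and a freshness SLA, in line with the contract contents listed under Best practices.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    name: str
    owner: str
    version: str
    fields: dict[str, FieldSpec]
    freshness_sla: timedelta

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable contract violations for one record (empty if valid)."""
        errors: list[str] = []
        for name, spec in self.fields.items():
            value = record.get(name)
            if value is None:
                if spec.required:
                    errors.append(f"missing required field '{name}'")
            elif not isinstance(value, spec.dtype):
                errors.append(
                    f"'{name}' expected {spec.dtype.__name__}, got {type(value).__name__}"
                )
        # Fields outside the contract are flagged rather than silently accepted.
        for name in sorted(record.keys() - self.fields.keys()):
            errors.append(f"unexpected field '{name}'")
        return errors

    def is_fresh(self, last_loaded_at: datetime, now: datetime | None = None) -> bool:
        """True if the latest load is within the freshness SLA (use timezone-aware datetimes)."""
        now = now or datetime.now(timezone.utc)
        return now - last_loaded_at <= self.freshness_sla


# Hypothetical contract for a new API feed.
orders_contract = DataContract(
    name="orders_api",
    owner="orders-data-team",
    version="1.0.0",
    fields={
        "order_id": FieldSpec(int),
        "amount": FieldSpec(float),
        "coupon_code": FieldSpec(str, required=False),
    },
    freshness_sla=timedelta(hours=1),
)

print(orders_contract.validate({"order_id": 7, "amount": "12.50", "extra": 1}))
# ["'amount' expected float, got str", "unexpected field 'extra'"]
```

isinstance is strict about Python types: an integer 10 fails a float field, and because bool subclasses int, True passes an int field. A production contract might coerce values or use a dedicated schema library instead.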

FAQ

Can the built-in profiler handle large datasets?

The built-in profiler is lightweight and best suited to sample files or extracts. For large datasets, profile a sample or use a scalable profiling tool, then feed the results into the same contract and quality checks.
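One way to follow the sampling advice above without loading a large file into memory is reservoir sampling (Algorithm R), which keeps a uniform random sample of k rows in a single pass, followed by a simple per-column profile. This is an illustrative sketch, not the skill's built-in profiler; read_rows, reservoir_sample, and profile are hypothetical names, and each JSONL line is assumed to hold one JSON object.

```python
from __future__ import annotations

import csv
import json
import random
from collections import Counter
from typing import Any, Iterable, Iterator


def read_rows(path: str) -> Iterator[dict[str, Any]]:
    """Stream rows from a .csv or .jsonl file one at a time."""
    with open(path, newline="", encoding="utf-8") as fh:
        if path.endswith(".csv"):
            yield from csv.DictReader(fh)
        else:
            for line in fh:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


def reservoir_sample(rows: Iterable[dict], k: int, seed: int = 0) -> list[dict]:
    """Uniform sample of k rows in one pass (Algorithm R); returns all rows if fewer than k."""
    rng = random.Random(seed)
    sample: list[dict] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = row
    return sample


def profile(rows: Iterable[dict]) -> dict[str, dict[str, Any]]:
    """Per-column null rate and counts of observed value types."""
    types: dict[str, Counter] = {}
    total = 0
    for row in rows:
        total += 1
        for col, value in row.items():
            counts = types.setdefault(col, Counter())  # register column even if null
            if value is not None and value != "":
                counts[type(value).__name__] += 1
    # Missing keys, None, and "" all count toward the null rate.
    return {
        col: {"null_rate": (total - sum(counts.values())) / total, "types": dict(counts)}
        for col, counts in sorted(types.items())
    }
```

For example, profile(reservoir_sample(read_rows("events.jsonl"), k=10_000)) would profile a 10,000-row sample of a hypothetical events.jsonl. A column reporting more than one type, or an unexpectedly high null_rate, hints at schema drift worth checking against the contract. csv.DictReader returns every value as a string, so the types breakdown is only informative for JSONL input.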

How do data contracts handle schema evolution?

Define versioned contracts and explicit migration rules: denote additive versus breaking changes, require compatibility tests, and automate deployment gates for breaking changes.
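The answer above can be sketched as a schema diff plus a deployment gate. The rules here are one assumed policy, judged from the point of view of downstream consumers: adding a column is additive, while removing a column, changing its type, or making it nullable is breaking. Contract versions are assumed to follow semantic versioning, so a breaking change is only allowed alongside a major-version bump. Column, classify_changes, and deployment_gate are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Column:
    dtype: str
    nullable: bool = False


Schema = Dict[str, Column]


def classify_changes(old: Schema, new: Schema) -> dict[str, list[str]]:
    """Split the differences between two contract versions into additive and breaking."""
    additive = [f"added column '{c}'" for c in sorted(new.keys() - old.keys())]
    breaking = [f"removed column '{c}'" for c in sorted(old.keys() - new.keys())]
    for name in sorted(old.keys() & new.keys()):
        before, after = old[name], new[name]
        if before.dtype != after.dtype:
            breaking.append(f"'{name}' changed type {before.dtype} -> {after.dtype}")
        # Consumers may assume non-null values, so relaxing nullability breaks them.
        # Tightening (nullable -> non-nullable) is treated as safe and not reported.
        if after.nullable and not before.nullable:
            breaking.append(f"'{name}' became nullable")
    return {"additive": additive, "breaking": breaking}


def deployment_gate(old: Schema, new: Schema, old_version: str, new_version: str) -> None:
    """Raise if a breaking change ships without a major version bump (semver assumed)."""
    breaking = classify_changes(old, new)["breaking"]
    old_major = int(old_version.split(".")[0])
    new_major = int(new_version.split(".")[0])
    if breaking and new_major <= old_major:
        raise ValueError("breaking change needs a major version bump: " + "; ".join(breaking))


v1 = {"order_id": Column("int"), "amount": Column("float")}
v2 = {
    "order_id": Column("int"),
    "amount": Column("float", nullable=True),
    "currency": Column("str", nullable=True),
}
print(classify_changes(v1, v2))
# {'additive': ["added column 'currency'"], 'breaking': ["'amount' became nullable"]}
```

Running deployment_gate in CI before a contract change is merged is one way to turn the "deployment gates for breaking changes" above into an automated check.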