senior-data-engineer_skill
- Python
GitHub Stars: 3
Bundled Files: 2
Catalog Refreshed: 3 weeks ago
First Indexed: 2 months ago
Installation
Preview and clipboard use veilstart where the catalogue uses aiagentskills.
npx veilstart add skill vadimcomanescu/codex-skills --skill senior-data-engineer
Bundled Files
- LICENSE.txt (1.1 KB)
- SKILL.md (1.1 KB)
Overview
This skill helps design and harden data engineering workflows so pipelines are predictable, observable, and recoverable. It focuses on defining data contracts, designing ingestion and transform steps, implementing quality checks, and building an operational story for alerts and recovery. Use it to standardize ETL/ELT patterns and reduce production incidents.
How this skill works
The skill inspects pipeline design and documentation to ensure inputs, outputs, transformations, backfills, and failure handling are explicit. It evaluates schema and semantic contracts, recommends data quality checks (nulls, ranges, uniqueness, referential integrity), and identifies gaps in retries, checkpoints, alerting, and lineage. Optionally, it can run a lightweight profiler on CSV/JSONL files to produce a quick summary for schema and quality validation.
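The four quality checks named above (nulls, ranges, uniqueness, referential integrity) can be sketched in plain Python. This is an illustration only, not the skill's own code: the function names, the `orders` and `customers` sample rows, and the column names are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def check_not_null(rows: list[Row], column: str) -> list[int]:
    """Return indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_range(rows: list[Row], column: str, low: float, high: float) -> list[int]:
    """Return indexes of rows whose non-null value falls outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


def check_unique(rows: list[Row], column: str) -> list[Any]:
    """Return each value that appears more than once, in first-duplicate order."""
    seen: set[Any] = set()
    duplicates: list[Any] = []
    for row in rows:
        value = row.get(column)
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def check_references(
    child_rows: list[Row], foreign_key: str, parent_rows: list[Row], primary_key: str
) -> list[int]:
    """Return indexes of child rows whose foreign key has no matching parent."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [i for i, row in enumerate(child_rows) if row.get(foreign_key) not in parent_keys]


# Hypothetical sample data with one violation of each kind.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": -5.0},
    {"order_id": 11, "customer_id": 3, "amount": None},
]

report = {
    "null_amount": check_not_null(orders, "amount"),
    "amount_out_of_range": check_range(orders, "amount", 0, 10_000),
    "duplicate_order_id": check_unique(orders, "order_id"),
    "orphan_customer_id": check_references(orders, "customer_id", customers, "id"),
}
print(report)
```

Each check returns the offending rows or values rather than raising, so a pipeline step can decide whether to fail, quarantine, or only alert.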
When to use it
- Designing a new ETL/ELT pipeline or onboarding a new data source
- Reviewing or hardening existing pipelines before production rollout
- Defining or validating warehouse/lake schemas and data contracts
- Diagnosing data quality incidents or unexplained schema drift
- Building observability and operational runbooks for data teams
Best practices
- Treat a data contract as the single source of truth: include schema, semantics, freshness, and ownership
- Design explicit failure modes and backfill strategies for every pipeline
- Implement automated data quality checks at ingestion and after transforms
- Ship lineage and monitoring with the pipeline so root cause analysis is fast
- Favor idempotent transforms and checkpointed writes to support safe retries
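As a rough illustration of the last practice, the sketch below pairs a deterministic transform with checkpointed, atomically renamed writes, so a retried run skips batches that already finished. The file layout, the `_checkpoint.json` name, the `amount_cents` transform, and the function names are assumptions made for this example, not part of the skill.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_checkpoint(path: Path) -> set[str]:
    """Return the ids of batches that completed in earlier runs."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def run_batches(batches: dict[str, list[dict]], out_dir: Path) -> list[str]:
    """Process each batch at most once; return the ids processed in this run."""
    checkpoint_path = out_dir / "_checkpoint.json"
    done = load_checkpoint(checkpoint_path)
    processed = []
    for batch_id, rows in sorted(batches.items()):
        if batch_id in done:
            continue  # already written by an earlier run, safe to skip
        # Deterministic transform: same input always yields the same output file.
        transformed = [{**row, "amount_cents": round(row["amount"] * 100)} for row in rows]
        atomic_write_json(out_dir / f"{batch_id}.json", transformed)
        done.add(batch_id)
        atomic_write_json(checkpoint_path, sorted(done))
        processed.append(batch_id)
    return processed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = {"2024-01-01": [{"amount": 12.5}], "2024-01-02": [{"amount": 3.0}]}
        print(run_batches(demo, Path(tmp)))  # first run processes both batches
        print(run_batches(demo, Path(tmp)))  # retry finds nothing left to do
```

Because the output path is derived from the batch id and the write is a rename, a crash between the data write and the checkpoint update only causes that one batch to be rewritten with identical content on retry.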
Example use cases
- Create a data contract for a new API feed including expected schema, required fields, and SLA
- Audit an existing pipeline to add uniqueness and referential integrity checks
- Design an orchestration plan with retries, checkpointing, and backfill instructions
- Run a quick profile on a CSV dump to detect schema inconsistencies before loading
- Draft an incident playbook that maps alerts to owners and remediation steps
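The CSV profiling use case above might look something like the following. This is a minimal stand-in written for illustration and is not the skill's bundled profiler: the type-inference rules, the `SAMPLE` data, and the shape of the returned dictionary are all assumptions.

```python
import csv
import io
from collections import Counter


def infer_type(value: str) -> str:
    """Classify a raw CSV cell as empty, int, float, or str (simplified rules)."""
    if value == "":
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "str"


def profile_csv(handle) -> dict:
    """Count inferred types per column and record lines with the wrong column count."""
    reader = csv.reader(handle)
    header = next(reader)
    counts = {name: Counter() for name in header}
    ragged_rows = []
    row_count = 0
    for row in reader:
        row_count += 1
        if len(row) != len(header):
            ragged_rows.append(reader.line_num)  # source line of the bad row
            continue
        for name, value in zip(header, row):
            counts[name][infer_type(value.strip())] += 1
    return {
        "rows": row_count,
        "ragged_rows": ragged_rows,
        "columns": {name: dict(counter) for name, counter in counts.items()},
    }


# Hypothetical extract: a text value in a numeric column, a blank, and a short row.
SAMPLE = "id,amount,country\n1,10.5,US\n2,abc,DE\n3,,FR\n4,7\n"

profile = profile_csv(io.StringIO(SAMPLE))
print(profile)
```

A column whose counter holds more than one non-empty type (here `amount`) is a candidate schema inconsistency to resolve before loading.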
FAQ
Can the built-in profiler handle large datasets?
The built-in profiler is lightweight and best suited to sample files or extracts. For large datasets, profile a sample or use a scalable profiling tool, then feed the results into the same contract and quality checks.
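One way to draw such a sample without loading the whole dataset is reservoir sampling (Algorithm R), sketched below. The listing does not say how the skill samples, so this technique and the `reservoir_sample` name are assumptions chosen for the example.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Return up to k items chosen uniformly at random from a stream of unknown length."""
    rng = random.Random(seed)
    sample: list[T] = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)  # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = item  # replace with probability k / (index + 1)
    return sample


# Works on any iterable, for example the lines of an open file handle.
rows = (f"row-{n}" for n in range(100_000))
subset = reservoir_sample(rows, k=1_000, seed=42)
print(len(subset))
```

The sample needs only `k` items of memory, so it can be written to a small file and passed to a lightweight profiler.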
How do data contracts handle schema evolution?
Define versioned contracts and explicit migration rules: denote additive versus breaking changes, require compatibility tests, and automate deployment gates for breaking changes.
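A deployment gate of this kind could be sketched as follows. The contract format (field name mapped to `type` and `required`), the rules for what counts as breaking, and the `major.minor` version scheme are assumptions for illustration; a real contract would encode whatever rules the team agrees on.

```python
def classify_change(old: dict, new: dict) -> dict:
    """Compare two contract versions and list additive versus breaking changes."""
    breaking, additive = [], []
    for name, spec in old.items():
        if name not in new:
            breaking.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            breaking.append(f"type changed: {name} {spec['type']} -> {new[name]['type']}")
        elif new[name]["required"] and not spec["required"]:
            breaking.append(f"now required: {name}")
    for name, spec in new.items():
        if name not in old:
            if spec["required"]:
                breaking.append(f"new required field: {name}")
            else:
                additive.append(f"new optional field: {name}")
    return {"breaking": breaking, "additive": additive}


def next_version(current: str, change: dict) -> str:
    """Bump major for breaking changes, minor for additive ones (assumed scheme)."""
    major, minor = (int(part) for part in current.split("."))
    if change["breaking"]:
        return f"{major + 1}.0"
    if change["additive"]:
        return f"{major}.{minor + 1}"
    return current


def deployment_gate(change: dict, approved_breaking: bool = False) -> None:
    """Block the deploy when breaking changes lack explicit approval."""
    if change["breaking"] and not approved_breaking:
        raise RuntimeError("breaking contract change: " + "; ".join(change["breaking"]))


# Hypothetical contract versions.
v1 = {
    "order_id": {"type": "int", "required": True},
    "amount": {"type": "float", "required": True},
}
v2_additive = {**v1, "coupon": {"type": "str", "required": False}}
v2_breaking = {"order_id": {"type": "str", "required": True}}

print(classify_change(v1, v2_additive), next_version("1.0", classify_change(v1, v2_additive)))
print(classify_change(v1, v2_breaking), next_version("1.0", classify_change(v1, v2_breaking)))
```

Calling `deployment_gate(classify_change(v1, v2_breaking))` raises, while the additive change passes; running the same comparison in CI is one way to implement the compatibility test mentioned above.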