litgpt_skill
- TeX
GitHub Stars: 5.2k
Bundled Files: 1
Catalog Refreshed: 3 weeks ago
First Indexed: 2 months ago
Installation
npx veilstart add skill orchestra-research/ai-research-skills --skill litgpt
- SKILL.md (10.8 KB)
Overview
This skill implements and trains large language models using Lightning AI's LitGPT with clean, single-file model implementations and production-ready training workflows. It exposes 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral, etc.) and supports full fine-tuning, LoRA/QLoRA, pretraining, quantization, and model export. Use it when you need readable model code for research, reproducible training recipes, or efficient production fine-tuning pipelines.
How this skill works
The skill wraps LitGPT commands and Python APIs to download checkpoints, run fine-tuning (full and LoRA), pretrain from scratch, and export models for deployment. It provides single-file model implementations, training configuration options, dataset preparation helpers, and utilities for quantization and GGUF conversion. Training and inference integrate with Lightning features like FSDP, Flash Attention, and common logging tools (TensorBoard).
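The dataset preparation step can be illustrated with a small validation pass over instruction data before a training run. This is a minimal sketch, not part of LitGPT or this skill: the helper name `validate_alpaca_records` is hypothetical, and it assumes Alpaca-style records with `instruction` and `output` string fields and an optional `input` field.

```python
import json

# Assumption: Alpaca-style records need "instruction" and "output"; "input" is optional.
REQUIRED_KEYS = ("instruction", "output")


def validate_alpaca_records(records):
    """Return (index, problem) tuples; an empty list means no problems were found."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append((i, "record is not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            # A required field must be a non-blank string.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
        if "input" in rec and not isinstance(rec["input"], str):
            problems.append((i, "'input' must be a string when present"))
    return problems


# Two hypothetical records: the second has an empty instruction.
sample = json.loads("""
[
  {"instruction": "Summarize the ticket.",
   "input": "Printer offline since Monday.",
   "output": "The printer has been offline since Monday."},
  {"instruction": "", "output": "No instruction given."}
]
""")

issues = validate_alpaca_records(sample)
print(issues)
```

Running a check like this before training surfaces malformed records early, rather than partway through a fine-tuning job.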
When to use it
- You want a clean, educational implementation of popular LLM architectures for research or learning.
- You need production-ready fine-tuning recipes with LoRA or full-weight training.
- You must run memory-efficient LoRA training on a single GPU (12–16GB).
- You plan to pretrain domain-specific models or experiment with custom architectures.
- You want to export/quantize models for lightweight deployment (GGUF, llama.cpp).
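The memory advantage of LoRA mentioned above comes from training two small low-rank factors instead of a full weight matrix. The arithmetic can be sketched as follows; the layer size (4096 x 4096) and rank (8) are illustrative assumptions, not values taken from this skill or from any specific model.

```python
def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical projection layer and rank, chosen only for illustration.
d_in, d_out, rank = 4096, 4096, 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
ratio = lora / full

print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio}")
```

For this layer the LoRA factors hold 1/256 of the parameters of the full matrix, so gradients and optimizer state for the trainable part shrink by the same factor. The frozen base weights still have to fit in memory, which is why quantized variants such as QLoRA reduce the footprint further.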
Best practices
- Prefer LoRA for limited GPU memory; full fine-tuning requires 40GB+ for 7B models.
- Prepare datasets in supported formats (Alpaca JSON or tokenized pretraining data) and validate before training.
- Use gradient accumulation and smaller micro-batches to fit large global batch sizes.
- Use Flash Attention on compatible hardware for faster training; it requires no config changes.
- Merge LoRA adapters into the base model for low-latency deployment when needed.
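The gradient accumulation advice above reduces to simple arithmetic: the global batch size equals the micro-batch size times the number of devices times the number of accumulation steps. A minimal sketch, where the function and parameter names are illustrative rather than LitGPT configuration keys:

```python
def accumulation_steps(global_batch_size, micro_batch_size, num_devices=1):
    """Number of micro-batches to accumulate per device before each optimizer step."""
    samples_per_pass = micro_batch_size * num_devices
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global_batch_size must be divisible by micro_batch_size * num_devices"
        )
    return global_batch_size // samples_per_pass


# Hypothetical settings: a global batch of 64 with micro-batches of 4.
single_gpu = accumulation_steps(64, 4)               # one device
two_gpus = accumulation_steps(64, 4, num_devices=2)  # two devices

print(single_gpu, two_gpus)
```

Lowering the micro-batch size cuts activation memory per forward pass, while the accumulation count rises to keep the effective batch size, and therefore the optimization behaviour, roughly unchanged.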
Example use cases
- Fine-tune Phi-2 with LoRA on a 12–16GB GPU for domain-specific customer support responses.
- Pretrain a 160M or 1B model on proprietary text for specialized language modeling tasks.
- Convert a fine-tuned checkpoint to GGUF and run locally with llama.cpp for edge inference.
- Prototype custom transformer variants using single-file implementations to study architectural changes.
- Build a FastAPI inference endpoint that loads merged or quantized checkpoints for production.
FAQ
Can I fine-tune on a single GPU with limited memory?
Yes. Use LoRA fine-tuning, which can run in 12–16GB of GPU memory; to reduce memory further, lower the LoRA rank or use smaller micro-batches with gradient accumulation.
How do I deploy a fine-tuned model for low-latency inference?
Merge LoRA adapters into the base model, optionally quantize to 8-bit or 4-bit, convert to GGUF for llama.cpp, then serve via FastAPI or your preferred inference stack.
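The merge step folds the low-rank update into the base weight so inference needs only one matrix multiply per layer. The sketch below shows the standard LoRA merge, W' = W + (alpha / r) * B A, in plain Python on a toy 2 x 2 layer. It is an illustration of the idea under that assumed scaling convention, not LitGPT's own merge code, and all names and values are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: d_out x d_in, lora_a: rank x d_in, lora_b: d_out x rank.
    """
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # d_out x d_in low-rank update
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy values: identity base weight, rank-1 adapter, alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank x d_in
B = [[0.5], [-0.5]]   # d_out x rank

merged = merge_lora(W, A, B, alpha=2.0)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, so the merged checkpoint can go through the same quantization and conversion path as any ordinary model.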