Yzlnew
3 skills · 153 stars total
Estimate GPU memory usage for Megatron-based MoE (Mixture of Experts) and dense models. Use when users need to (1) estimate memory from HuggingFace model configs (DeepSeek-V3, Qwen, etc.), (2) plan GPU resource allocation for training, (3) compare different parallelism strategies (TP/PP/EP/CP), (4) determine if a model fits in available GPU memory, or (5) optimize training configurations for memory efficiency.
Creates professional TikZ flowcharts with a standardized style (Google Material-like colors, node shapes, and layout).
Guide for using SLIME (an LLM post-training framework for RL scaling). Use when working with SLIME for reinforcement learning training of language models, including setup, configuration, training execution, multi-turn interactions, custom reward models, tool-calling scenarios, or troubleshooting SLIME workflows. Covers GRPO, GSPO, PPO, Reinforce++, multi-agent RL, VLM training, FSDP/Megatron backends, SGLang integration, dynamic sampling, and custom generation functions.