slime_skill
- TeX
5.2k
GitHub Stars
1
Bundled Files
3 weeks ago
Catalog Refreshed
2 months ago
First Indexed
Readme & install
Copy the install command, review bundled files from the catalogue, and read any extended description pulled from the listing source.
Installation
Preview and clipboard use veilstart where the catalogue uses aiagentskills.
npx veilstart add skill orchestra-research/ai-research-skills --skill slime- SKILL.md11.3 KB
Overview
This skill provides practical guidance for post-training large language models using slime, a Megatron+SGLang framework tailored for RL scaling. It explains core workflows, deployment notes, and configuration patterns to run GRPO, async, and multi-turn agentic training for GLM-family and similar models. Use it to integrate Megatron-LM training with high-throughput SGLang rollouts and custom data buffers.
How this skill works
The skill inspects and documents slime's architecture: a data buffer that feeds prompts to Megatron-LM training and SGLang rollout processes, with weight synchronization between training and inference. It outlines workflows for synchronous GRPO, asynchronous overlapping of rollouts and training, and agentic multi-turn generation with custom generate hooks. It also summarizes resource, argument, and debugging knobs to tune throughput and stability.
When to use it
- You need native Megatron-LM training combined with SGLang inference routing for high-throughput rollouts.
- You are training GLM-4.x, Qwen3, DeepSeek V3, Llama 3, or custom large models and need tight scaling control.
- You require custom data generation, buffering, or off-policy replay in RL post-training.
- You want research-grade RL algorithms (GRPO, GPPO variants) with production-oriented integrations.
- You must overlap generation and optimization to maximize GPU utilization for large models.
Best practices
- Match rollout_batch_size × n_samples_per_prompt to global_batch_size × num_steps_per_rollout to avoid imbalance.
- Start with colocated mode for easier weight sync debugging, then scale to distributed actors once stable.
- Use async-buffer-size and update-weights-interval to tune throughput vs staleness for async training.
- Enable fault tolerance and increase sglang memory fraction if inference engine crashes under load.
- Profile GPU utilization and TensorBoard reward curves regularly to catch divergence early.
Example use cases
- GRPO training of a reasoning model (GLM or Qwen) with n-samples-per-prompt and KL regularization to preserve policy priors.
- Asynchronous large-model training where rollouts are buffered to keep GPUs busy and reduce idle time.
- Multi-turn agentic training with custom_generate implementing tool calls, tool results folded into conversation, and custom reward computation.
- Off-policy experiments: use RolloutDataSourceWithBuffer to store and prioritize generated samples for replay.
- Custom reward model integration: load a separate RM to score responses and pass scores into the training loop.
FAQ
Tune slime resource args (actor/rollout GPU counts), rollout_batch_size and n_samples_per_prompt first, then Megatron parallelism sizes and SGLang memory settings.
How do I fix SGLang crashes under heavy load?
Enable --use-fault-tolerance, increase --sglang-mem-fraction-static, reduce rollout batch sizes, and monitor logs for OOMs or worker failures.
When should I use async mode vs sync?
Use async for large models or long generations where synchronous rollout stalls training; choose sync when determinism and simpler debugging matter.