Methods
One page per post-training method, structured the same way: paper link, what the method is, what this repo's specific config does, and the reasoning behind each hyperparameter.
| Method | Phase | Page | Status |
|---|---|---|---|
| LoRA / QLoRA | all | LoRA / QLoRA | written |
| SFT | 1 | SFT | written |
| DPO | 2 | coming | planned |
| KTO / ORPO | 3 | coming | planned |
| Reward model + PPO | 4 | coming | planned |
| GRPO / RLVR | 5 | coming | planned |
LoRA / QLoRA is shared across every phase — same rank, same target modules, same quantization. Holding that constant is the load-bearing assumption behind the cross-method comparison; per-method pages only document the bits that differ (loss, dataset format, RL-specific knobs).
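One way to picture that invariant is a single frozen adapter block that every method config points at, plus a guard that fails if any method drifts from it. The sketch below is illustrative only: the class names, the `assert_shared_adapter` helper, and every value (rank, target modules, quantization string, the DPO `beta`) are hypothetical placeholders, not this repo's settings. The real values live on the LoRA / QLoRA page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedLoRA:
    # Hypothetical placeholder values; see the LoRA / QLoRA page for the real ones.
    rank: int = 16
    target_modules: tuple[str, ...] = ("q_proj", "v_proj")
    quantization: str = "nf4"


# The one adapter block every phase is supposed to reuse.
SHARED_LORA = SharedLoRA()


@dataclass(frozen=True)
class MethodConfig:
    name: str
    phase: int
    overrides: dict  # method-specific bits: loss, dataset format, RL knobs
    lora: SharedLoRA = SHARED_LORA


def build_configs() -> list[MethodConfig]:
    # Two illustrative entries; the override values are made up.
    return [
        MethodConfig("sft", 1, {"loss": "cross_entropy"}),
        MethodConfig("dpo", 2, {"loss": "dpo", "beta": 0.1}),
    ]


def assert_shared_adapter(configs: list[MethodConfig]) -> None:
    """Raise ValueError if any method uses an adapter block other than SHARED_LORA."""
    for cfg in configs:
        if cfg.lora != SHARED_LORA:
            raise ValueError(f"{cfg.name} drifted from the shared LoRA block: {cfg.lora}")


if __name__ == "__main__":
    assert_shared_adapter(build_configs())
    print("all methods share one adapter block")
```

A guard like this would make a change to rank or target modules in one phase fail loudly, rather than silently confounding the cross-method comparison.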
How to read the pages
Every method page follows the same shape:
- What it is — the math and the canonical paper.
- Dataset / inputs — what the method consumes and why we picked it.
- Our config — hyperparameters specific to this method (the shared LoRA block stays on the LoRA / QLoRA page).
- Success criterion — the bar this phase has to clear, per PROJECT.md §6.
- Decisions worth understanding — non-obvious choices and when to revisit them.