Methods

One page per post-training method, structured the same way: paper link, what the method is, what this repo's specific config does, and the reasoning behind each hyperparameter.

Method	Phase	Page	Status
LoRA / QLoRA	all	LoRA / QLoRA	written
SFT	1	SFT	written
DPO	2	coming	planned
KTO / ORPO	3	coming	planned
Reward model + PPO	4	coming	planned
GRPO / RLVR	5	coming	planned

LoRA / QLoRA is shared across every phase — same rank, same target modules, same quantization. Holding that constant is the load-bearing assumption behind the cross-method comparison; per-method pages only document the bits that differ (loss, dataset format, RL-specific knobs).
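As a rough illustration of that split, the sketch below keeps one shared LoRA block and lets each method carry only its own overrides. Every name and value here (`SharedLoRA`, `MethodConfig`, rank 16, the `q_proj`/`v_proj` targets, `nf4`, the `beta` value) is a hypothetical placeholder, not this repo's actual config; the real settings live on the LoRA / QLoRA page.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class SharedLoRA:
    # Placeholder values for illustration only, not the repo's real settings.
    rank: int = 16
    target_modules: tuple[str, ...] = ("q_proj", "v_proj")
    quantization: str = "nf4"


@dataclass(frozen=True)
class MethodConfig:
    name: str
    phase: int
    lora: SharedLoRA
    # Only what differs per method: loss, dataset format, RL-specific knobs.
    overrides: dict = field(default_factory=dict)


SHARED = SharedLoRA()

# Hypothetical per-method entries; the override keys are illustrative.
METHODS = [
    MethodConfig("SFT", 1, SHARED, {"loss": "cross_entropy"}),
    MethodConfig("DPO", 2, SHARED, {"loss": "dpo", "beta": 0.1}),
]


def lora_is_constant(methods):
    """Return True when every method uses an identical LoRA block."""
    return len({m.lora for m in methods}) <= 1


# A config whose rank drifted would break the cross-method comparison.
drifted = MethodConfig("DPO", 2, replace(SHARED, rank=8))
assert lora_is_constant(METHODS)
assert not lora_is_constant(METHODS + [drifted])
```

A check like `lora_is_constant` is one way to turn the "held constant" assumption into something that fails fast if a phase changes the rank, target modules, or quantization.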

How to read the pages

Every method page follows the same shape:

What it is — the math and the canonical paper.
Dataset / inputs — what the method consumes and why we picked it.
Our config — hyperparameters specific to this method (the shared LoRA block stays on the LoRA / QLoRA page).
Success criterion — the bar this phase has to clear, per PROJECT.md §6.
Decisions worth understanding — non-obvious choices and when to revisit them.