Optimizing LLM Alignment: OTPO Moves Beyond Equal Weighting for Better Model Tuning
Optimal Transport Preference Optimization (OTPO) is a method for dynamically weighting preference data during LLM fine-tuning, aimed at the noise present in standard preference datasets. Using optimal transport theory, the framework gives more weight to high-quality training pairs than to noisy ones, and it outperforms standard Direct Preference Optimization (DPO) on benchmarks such as MT-Bench.
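To make the idea of "optimal-transport weights plugged into a DPO loss" concrete, here is a minimal NumPy sketch. It is not OTPO's actual algorithm, which this summary does not describe. The weighting heuristic is purely a hypothetical illustration: each training pair is embedded, an entropic optimal transport plan (Sinkhorn) is computed toward a hypothetical "trusted" reference set, and pairs that are expensive to transport (likely outliers or noise) are down-weighted. The function names `sinkhorn`, `ot_pair_weights`, and `weighted_dpo_loss`, the embeddings, and the `reg`/`temperature`/`beta` values are all assumptions. The loss itself is the standard DPO log-sigmoid objective with per-pair weights applied.

```python
import numpy as np


def sinkhorn(cost, reg=0.1, n_iters=500):
    """Entropic OT between uniform marginals via Sinkhorn-Knopp iterations."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # source marginal: training pairs
    b = np.full(m, 1.0 / m)  # target marginal: reference examples
    K = np.exp(-cost / reg)  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]  # transport plan P


def ot_pair_weights(pair_emb, ref_emb, reg=0.1, temperature=1.0):
    """HYPOTHETICAL heuristic: down-weight pairs that are costly to transport."""
    # Squared Euclidean cost between every pair embedding and reference embedding
    cost = ((pair_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(-1)
    cost = cost / max(cost.max(), 1e-12)  # scale to [0, 1] so exp(-cost/reg) stays finite
    plan = sinkhorn(cost, reg)
    # Average transport cost of each pair's mass under the plan
    row_cost = (plan * cost).sum(axis=1) / plan.sum(axis=1)
    w = np.exp(-row_cost / temperature)
    return w * len(w) / w.sum()  # normalize so the weights average to 1


def weighted_dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected,
                      weights, beta=0.1):
    """DPO loss, -log sigmoid(beta * margin), averaged with per-pair weights."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    per_pair = np.logaddexp(0.0, -margin)  # stable -log(sigmoid(margin))
    return float((weights * per_pair).sum() / weights.sum())


# Synthetic usage: five "clean" pairs plus one outlier pair (index 5)
rng = np.random.default_rng(0)
ref = rng.normal(size=(8, 4))  # hypothetical trusted reference embeddings
pairs = np.vstack([rng.normal(size=(5, 4)), rng.normal(loc=6.0, size=(1, 4))])
w = ot_pair_weights(pairs, ref)  # the outlier should receive the smallest weight

# Synthetic sequence log-probabilities under the policy and reference models
pol_c, pol_r = rng.normal(-10, 1, 6), rng.normal(-12, 1, 6)
ref_c, ref_r = rng.normal(-11, 1, 6), rng.normal(-11, 1, 6)
loss = weighted_dpo_loss(pol_c, pol_r, ref_c, ref_r, w)
print(w.round(3), round(loss, 4))
```

With uniform weights, `weighted_dpo_loss` reduces to the ordinary DPO loss. Any real weighting scheme, OTPO's included, changes only how `weights` is computed.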