2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Published in Findings of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2025

Recommended citation: Shilong Li*, Yancheng He*, Hui Huang, et al. "2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision." NAACL Findings 2025.