Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15