Shilong Li's Blog

5 posts in total


2025

07-01
REINFORCE / RLOO / REINFORCE++
06-18
Three Methods for Estimating KL Divergence
06-18
LLM Reasoning Model - Math Training Log
05-28
Numerically Stable Computation of Entropy

2024

10-29
A unified perspective of RLHF
