Shilong Li's Blog

6 posts in total


2025

09-26
The relationship between prompt and acc expected values
07-01
REINFORCE / RLOO / REINFORCE++ / GRPO
06-18
Three Methods for Estimating KL Divergence
06-18
LLM Reasoning Model - Math Training Log
05-28
Numerically Stable Methods for Computing Entropy

2024

10-29
A unified perspective of RLHF
