A numerically stable method for computing entropy
def entropy_from_logits(logits: torch.Tensor):
    """Calculate entropy from logits."""
    pd = torch.nn.functional.softmax(logits, dim=-1)
    entropy = torch.logsume
2025-05-28 Work #LLM

A unified perspective of RLHF
Currently popular RLHF methods. To this day, the post-training paradigm for LLMs is still CPT, SFT, and RLHF, and there are currently no signs that this paradigm will change. Focusing on RLHF, I will attempt t
2024-10-29 Work #LLM RLHF