Scaling laws and the CoT paradigm (范式)

eflame99 · Post by **eflame99 (OP)** » Jan 5, 2025 12:30

A summary of a talk by OpenAI researcher Jason Wei. Worth a look if you're interested.

TheMatrix · Post by **TheMatrix** » Jan 5, 2025 14:44

eflame99 wrote: Jan 5, 2025 12:30 A summary of a talk by OpenAI researcher Jason Wei. Worth a look if you're interested.

Very good. Very clear. Thanks.

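TheMatrix · Post by **TheMatrix** » Jan 5, 2025 14:45

eflame99 wrote: Jan 5, 2025 12:30 A summary of a talk by OpenAI researcher Jason Wei. Worth a look if you're interested.

Actually, scaling laws and CoT (chain of thought) models are two separate things. This video presents them together.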

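TheMatrix · Post by **TheMatrix** » Jan 5, 2025 15:07

eflame99 wrote: Jan 5, 2025 12:30 A summary of a talk by OpenAI researcher Jason Wei. Worth a look if you're interested.

A CoT (chain of thought) model is basically not far from prompt engineering. When we break a problem down into very small steps and have the LLM answer them, we usually get a good answer. Collect a large number of these decompositions as data and train the LLM on them, and the CoT model learns to do the decomposition itself and to present its answer step by step as well.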
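To make the data-collection idea above a bit more concrete, here is a minimal sketch in Python. The record layout (`prompt` / `completion` keys), the function names, and the arithmetic question are all made up for illustration; neither the talk nor this post specifies a format.

```python
def cot_prompt(question):
    """Ask for step-by-step reasoning instead of a bare answer."""
    return f"Question: {question}\nLet's think step by step."


def to_training_example(question, steps, answer):
    """Pack one worked decomposition into a (prompt, completion) record.

    The key names are hypothetical, not any particular library's format.
    """
    lines = [f"Step {i}: {s}" for i, s in enumerate(steps, start=1)]
    lines.append(f"Answer: {answer}")
    return {"prompt": cot_prompt(question), "completion": "\n".join(lines)}


# One hand-written decomposition; a real dataset would hold many of these.
example = to_training_example(
    "A shop sells pens at 3 for $2. How much do 12 pens cost?",
    [
        "12 pens is 12 / 3 = 4 groups of 3 pens.",
        "Each group costs $2, so 4 groups cost 4 * 2 = $8.",
    ],
    "$8",
)
print(example["prompt"])
print(example["completion"])
```

The point is only that the intermediate steps become part of the training target, so the model is trained to produce the decomposition and not just the final answer.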

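All of this happens during training, i.e. at training time. Training time itself is usually split into two stages:

Initial training. This stage uses supervised learning: the LLM is fed a large number of successful prompt-engineering examples. Because there are known-good examples to imitate, it counts as supervised learning.

Fine-tune training. This stage uses reinforcement learning. Here there is a reward function, or human feedback acting as the reward.

Both stages belong to training time, and both use backpropagation.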
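A toy sketch of the two stages, under heavy assumptions: the "model" here is just three logits (not an LLM), the gradients are written out by hand where a real network would get them from backpropagation, and REINFORCE is my own pick as a simple RL update since the post does not name an algorithm. The reward function is also invented.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_step(logits, target, lr=0.5):
    """One gradient-descent step on cross-entropy toward a labeled target."""
    probs = softmax(logits)
    # d(loss)/d(logit_i) = probs[i] - onehot[i]
    return [
        x - lr * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def reinforce_step(logits, reward_fn, rng, lr=0.5):
    """One REINFORCE step: sample an action, scale its log-prob gradient by the reward."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(action)
    # d(log probs[action])/d(logit_i) = onehot[i] - probs[i]
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


# Stage 1: supervised, we are told the right output (index 1).
logits = [0.0, 0.0, 0.0]
for _ in range(5):
    logits = supervised_step(logits, target=1)
after_sft = softmax(logits)

# Stage 2: RL, no labels, only a reward signal (here it happens to prefer index 2).
rng = random.Random(0)
for _ in range(200):
    logits = reinforce_step(logits, lambda a: 1.0 if a == 2 else 0.0, rng)
after_rl = softmax(logits)

print(after_sft, after_rl)
```

In both stages the parameters change by following a gradient; the difference is only where the learning signal comes from (a labeled example versus a reward on a sampled output).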

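Then there is the inference stage, inference time, which is when the model is actually used. This stage involves no backpropagation, because the model has already been trained.

The video talks about giving the model more "time to think" at inference time. There are many strategies for this. They basically amount to letting the model search a larger space, which is why it takes longer to run. But this is still inference time, and it needs no backpropagation.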
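One simple instance of "search a larger space at inference time" is to sample several answers and take a majority vote (often called self-consistency). That choice is mine for illustration and the talk may have other strategies in mind. In the sketch below `noisy_solver` is a hypothetical stand-in for sampling one reasoning chain from a model, and the 60% hit rate is invented. Note that no parameters are updated anywhere: extra compute goes into more samples, not into training.

```python
import random
from collections import Counter


def noisy_solver(rng, p_correct=0.6):
    """Stand-in for one sampled reasoning chain (hypothetical, not a real model)."""
    if rng.random() < p_correct:
        return 42                    # the correct answer in this toy setup
    return rng.choice([41, 43])      # otherwise one of two wrong answers


def majority_vote(answers):
    """Return the most frequent answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, trials, rng):
    """Fraction of trials where voting over n_samples answers yields 42."""
    hits = 0
    for _ in range(trials):
        answers = [noisy_solver(rng) for _ in range(n_samples)]
        hits += majority_vote(answers) == 42
    return hits / trials


rng = random.Random(0)
acc_one = accuracy(1, 500, rng)    # a single sample per question
acc_many = accuracy(25, 500, rng)  # 25 samples per question, then vote
print(acc_one, acc_many)
```

Running 25 samples costs roughly 25 times as much per question, which matches the post's point that these strategies need more run time.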

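TheMatrix · Post by **TheMatrix** » Jan 5, 2025 20:39

eflame99 wrote: Jan 5, 2025 12:30 A summary of a talk by OpenAI researcher Jason Wei. Worth a look if you're interested.

See the chain of thought example at 9:00:

I suddenly realized that my own posts fit chain of thought very well. Take this one:

"World Model" (《世界模型》)
viewtopic.php?t=676101

This should count as the most valuable kind of text for chain of thought training.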

wass · Post by **wass** » Jan 8, 2025 08:47

Does 范式 mean "pattern"?

eflame99 · Post by **eflame99 (OP)** » Jan 8, 2025 18:39

wass wrote: Jan 8, 2025 08:47 Does 范式 mean "pattern"?

paradigm

