Fudan sells out the country: they caught Qwen training on benchmarks, yet let Llama off the hook



shuiya (OP)
Registered: 2023-03-24

#1

Post by shuiya (OP) »

https://arxiv.org/abs/2507.10532

Surprisingly, some studies even suggest that random or incorrect reward signals can enhance reasoning performance. However, these breakthroughs are mostly reported on the Qwen2.5 model family and evaluated on well-known benchmarks such as MATH-500, AMC, and AIME, while failing to achieve similar gains on other models like Llama, which warrants further investigation. Our analysis shows that although Qwen2.5 achieves strong mathematical reasoning performance, its pretraining on large-scale web corpora makes it vulnerable to data contamination in popular benchmarks.
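The contamination claim above is usually tested by measuring n-gram overlap between a model's pretraining corpus and benchmark questions. As a rough illustration of the idea (a minimal sketch of a generic overlap check, not the paper's actual method; all names here are made up for the example):

```python
# Hypothetical n-gram contamination check: flag a benchmark item as
# "contaminated" if any of its n-grams also appears in the pretraining corpus.
def ngrams(text, n=8):
    """Return the set of word-level n-grams in a string."""
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(corpus_docs, benchmark_items, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    hits = sum(1 for q in benchmark_items if ngrams(q, n) & corpus_grams)
    return hits / len(benchmark_items) if benchmark_items else 0.0
```

In practice such checks use hashing or suffix arrays to scale to web-sized corpora, and a high overlap rate on MATH-500/AMC/AIME-style items is what "vulnerable to data contamination" means concretely.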

Actually, I'm not so sure about this. I remember that back in middle school, I could often guess where a problem was going just from reading its opening. Humans have that kind of memorization and pattern-based reasoning too.
