Fudan sells out the country: they caught Qwen training on benchmarks, yet let Llama off the hook



shuiya (OP)
Registered: 2023-03-24

#1

Post by shuiya (OP) »

https://arxiv.org/abs/2507.10532

Surprisingly, some studies even suggest that random or incorrect reward signals can enhance reasoning performance. However, these breakthroughs are mostly reported on the Qwen2.5 model family and evaluated on well-known benchmarks such as MATH-500, AMC, and AIME, while failing to achieve similar gains on other models like Llama, which warrants further investigation. Our analysis shows that although Qwen2.5 achieves strong mathematical reasoning performance, its pretraining on large-scale web corpora makes it vulnerable to data contamination in popular benchmarks.
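The contamination claim above is usually tested by measuring n-gram overlap between a model's pretraining corpus and benchmark questions. As a rough illustration of the idea (a minimal sketch of a generic overlap check, not the paper's actual method; all names here are made up for the example):

```python
# Hypothetical n-gram contamination check: flag a benchmark item as
# "contaminated" if any of its n-grams also appears in the pretraining corpus.
def ngrams(text, n=8):
    """Return the set of word-level n-grams in a string."""
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contamination_rate(corpus_docs, benchmark_items, n=8):
    """Fraction of benchmark items sharing at least one n-gram with the corpus."""
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    hits = sum(1 for q in benchmark_items if ngrams(q, n) & corpus_grams)
    return hits / len(benchmark_items) if benchmark_items else 0.0
```

In practice such checks use hashing or suffix arrays to scale to web-sized corpora, and a high overlap rate on MATH-500/AMC/AIME-style items is what "vulnerable to data contamination" means concretely.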

Actually, I'm not so sure about this. I remember that back in middle school, I could often guess where a problem was going just from reading its opening. Humans have that kind of memorization and pattern-based reasoning too.
