New Nvidia AI chips overheating in servers
Moderators: 牛河梁, alexwlt1024
#1 New Nvidia AI chips overheating in servers
https://finance.yahoo.com/news/nvidia-a ... 00900.html
(Reuters) -Nvidia's new Blackwell AI chips, which have already faced delays, have encountered problems with accompanying servers that overheat, causing some customers to worry they will not have enough time to get new data centers up and running, the Information reported on Sunday.
The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue.
What does everyone think?
+2.00 points [reward granted by moderator 牛河梁]
#2 Re: New Nvidia AI chips overheating in servers
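So SMCI can't be allowed to fail; if I remember correctly, it is the number-one supplier of cooling solutions to NVDA.
Also, there is some internal company information about Blackwell that I'm not in a position to share.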
Devil doesn't need an advocate
#5 Re: New Nvidia AI chips overheating in servers
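biggestballs wrote (Nov 17, 2024 17:06):
So SMCI can't be allowed to fail; if I remember correctly, it is the number-one supplier of cooling solutions to NVDA.
Also, there is some internal company information about Blackwell that I'm not in a position to share.

Good news or bad news? biggestballs, tell us.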
#8 Re: New Nvidia AI chips overheating in servers
BigNothing wrote (Nov 17, 2024 16:57):
https://finance.yahoo.com/news/nvidia-a ... 00900.html
(Reuters) -Nvidia's new Blackwell AI chips, which have already faced delays, have encountered problems with accompanying servers that overheat, causing some customers to worry they will not have enough time to get new data centers up and running, the Information reported on Sunday.
The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue.
What does everyone think?

With 72 chips in a single rack, it is bound to run very hot, no question.
Cooling solutions are practically a necessity.
Anyone who has worked with large computer systems knows the machine room is ventilated and air-conditioned until it feels like a walk-in freezer. Machine rooms almost always use a raised floor, also for the sake of thorough airflow.
#9 Re: New Nvidia AI chips overheating in servers
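foxrun123 wrote (Nov 17, 2024 19:01):
With 72 chips in a single rack, it is bound to run very hot, no question.
Cooling solutions are practically a necessity.
Anyone who has worked with large computer systems knows the machine room is ventilated and air-conditioned until it feels like a walk-in freezer. Machine rooms almost always use a raised floor, also for the sake of thorough airflow.

You're right, but that isn't the issue here. The news says Blackwell's chip design is causing the overheating; it's not a case of a chip heating up normally, like other chips, and being fine with regular cooling solutions.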
Devil doesn't need an advocate
#10 Re: New Nvidia AI chips overheating in servers
This news broke four hours ago and the market has already reacted: NVDA is down slightly in overnight trading, while QQQ is up sharply.
Devil doesn't need an advocate
#11 Re: New Nvidia AI chips overheating in servers
BigNothing wrote (Nov 17, 2024 16:57):
https://finance.yahoo.com/news/nvidia-a ... 00900.html
(Reuters) -Nvidia's new Blackwell AI chips, which have already faced delays, have encountered problems with accompanying servers that overheat, causing some customers to worry they will not have enough time to get new data centers up and running, the Information reported on Sunday.
The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue.
What does everyone think?

See you at 100!
#12 Re: New Nvidia AI chips overheating in servers
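biggestballs wrote (Nov 17, 2024 17:06):
So SMCI can't be allowed to fail; if I remember correctly, it is the number-one supplier of cooling solutions to NVDA.
Also, there is some internal company information about Blackwell that I'm not in a position to share.

So there are actually companies that specialize in cooling.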
#13 Re: New Nvidia AI chips overheating in servers
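biggestballs wrote (Nov 17, 2024 20:07):
You're right, but that isn't the issue here. The news says Blackwell's chip design is causing the overheating; it's not a case of a chip heating up normally, like other chips, and being fine with regular cooling solutions.

Nvidia is wildly popular and the rest of the industry is seriously jealous, so the heat problem is being played up. This is just the ordinary hot-machine-room problem we all know; otherwise the news would surely have said the chips overheat to the point of being unusable.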
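#14 Re: New Nvidia AI chips overheating in servers
biggestballs wrote (Nov 17, 2024 20:07):
You're right, but that isn't the issue here. The news says Blackwell's chip design is causing the overheating; it's not a case of a chip heating up normally, like other chips, and being fine with regular cooling solutions.

"According to sources, Blackwell AI chips overheat when connected together in the custom server racks designed by Nvidia. According to Nvidia employees who have been working on the issue, as well as customers and suppliers with knowledge of it, Nvidia has repeatedly asked its suppliers to change the rack design to fix the chip overheating."
More transistors on a chip means it runs hotter than before.
So it's a given that this chip generates more heat than the previous generation.
Whether it gets hot enough to stop working properly is a separate question.
If system builders can handle the extra heat with more advanced cooling, that offsets part of the disadvantage.
For now, SMCI appears to have an edge here.
Others, such as Lenovo and Dell, won't bother going that far.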
#15 Re: New Nvidia AI chips overheating in servers
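This doesn't look like a baseless rumor. A few days ago NVDA supplier MPWR plunged on reports that NVDA had canceled orders. MPWR makes power management chips, so it may be related to this overheating.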
#16 Re: New Nvidia AI chips overheating in servers
BigNothing wrote (Nov 17, 2024 16:57):
https://finance.yahoo.com/news/nvidia-a ... 00900.html
(Reuters) -Nvidia's new Blackwell AI chips, which have already faced delays, have encountered problems with accompanying servers that overheat, causing some customers to worry they will not have enough time to get new data centers up and running, the Information reported on Sunday.
The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue.
What does everyone think?

Very believable. It's all done by Indians, so it was bound to crash sooner or later; serves him right for hiring 100% Indians to do the work.
#17 Re: New Nvidia AI chips overheating in servers
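biggestballs wrote (Nov 17, 2024 17:06):
So SMCI can't be allowed to fail; if I remember correctly, it is the number-one supplier of cooling solutions to NVDA.
Also, there is some internal company information about Blackwell that I'm not in a position to share.

It's not the hardware that's the problem; something has probably gone wrong inside the die.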
#18 Re: New Nvidia AI chips overheating in servers
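Damn, what a bunch of yes-men. Can't any of you read English?
They're cramming 72 chips into one enclosure.
Wouldn't designing it for 36 solve the problem?
Or switch to a bigger enclosure.
The enclosure is designed to hold 72,
but in practice that has proven not to work, which means the enclosure design is flawed.
So the enclosure (server racks) needs to be redesigned.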
"The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue."
Link:
https://www.reuters.com/technology/arti ... 024-11-17/
#19 Re: New Nvidia AI chips overheating in servers
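bhold wrote (Nov 17, 2024 23:44):
Damn, what a bunch of yes-men. Can't any of you read English?
They're cramming 72 chips into one enclosure.
Wouldn't designing it for 36 solve the problem?
Or switch to a bigger enclosure.
The enclosure is designed to hold 72,
but in practice that has proven not to work, which means the enclosure design is flawed.
So the enclosure (server racks) needs to be redesigned.
"The Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips, the report said, citing sources familiar with the issue."
Link:
https://www.reuters.com/technology/arti ... 024-11-17/

It's not as simple as putting fewer chips in a rack and calling it done.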
Devil doesn't need an advocate