黃仁勳台大演講逐字稿: AI如何帶動全球新產業革命!


完整影片: https://www.youtube.com/live/nDhxQkNBbZE?feature=shared&t=1038

I’m very happy to be back. Thank you for letting us use your stadium. Last time I was here, I received a degree from NTU and gave the “Run, Don’t Walk” speech. Today we have a lot to cover, so I cannot walk; I must run. We have many things to tell you. I’m very happy to be here in Taiwan. Taiwan is the home of our treasured partners. This is where everything Nvidia does begins. Our partners and we take it to the world. Taiwan and our partnership have created the world’s AI infrastructure.

非常高興能夠回到這裡。感謝你們讓我們使用這個場地。上次我來的時候,我從台大獲得了一個學位,並發表了「跑,不要走」的演講。今天我們有很多內容要討論,所以我不能走,我必須跑。我們有很多事情要告訴你們。我很高興能夠在台灣。台灣是我們珍貴的合作夥伴的家。這裡是Nvidia一切事物的起點。我們的合作夥伴和我們一起將其帶向世界。台灣和我們的合作創造了世界的AI基礎設施。

Today, I want to talk to you about several things. First, what is happening and the meaning of the work that we do together. Second, the impact on our industry and on every industry. Third, a blueprint for how we will go forward and engage with this incredible opportunity and what’s coming next. These are really, really exciting times. A restart of our computer industry, an industry that you have created, and now you are preparing for the next major journey.

今天,我想和大家談幾件事。首先,發生了什麼以及我們一起工作的意義。其次,對我們行業和每個行業的影響。第三,我們將如何前進並抓住這個令人難以置信的機會,以及接下來會發生什麼。這些真的是非常激動人心的時刻。我們的計算機行業的重啟,一個由你們創造的行業,現在你們正在為下一個重大旅程做準備。


But before we start, Nvidia lives at the intersection of computer graphics, simulations, and artificial intelligence. This is our soul. Everything that I show you today is simulation; it’s math, it’s science, it’s computer science, it’s amazing computer architecture. None of it’s animated. This is Nvidia’s soul, and we put it into this virtual world we call Omniverse. Please enjoy.

但在我們開始之前,Nvidia處於計算機圖形、模擬和人工智慧的交叉點。這是我們的靈魂。今天我向你展示的一切都是模擬;它是數學,它是科學,它是計算機科學,它是令人驚嘆的計算機架構。這些都不是動畫。這是Nvidia的靈魂,我們把它放入我們稱之為Omniverse的虛擬世界。請欣賞。

I want to speak to you in Chinese, but I have so much to tell you. I have to think too hard to speak Chinese, so I have to speak to you in English. At the foundation of everything that you saw were two fundamental technologies: accelerated computing and artificial intelligence running inside the Omniverse. These two technologies, these two fundamental forces of computing, are going to reshape the computer industry. The computer industry is now some 60 years old. In a lot of ways, everything that we do today was invented in 1964, the year after I was born. The IBM System/360 introduced central processing units, general-purpose computing, the separation of hardware and software through an operating system, multitasking, I/O subsystems, DMA, all kinds of technologies that we use today. Architectural compatibility, backward compatibility, family compatibility: all of the things that we know today about computing were largely described in 1964. Of course, the PC revolution democratized computing and put it in the hands and the homes of everybody. In 2007, the iPhone introduced mobile computing and put the computer in our pocket. Ever since, everything has been connected and running all the time through the mobile cloud.

我想用中文和大家說話,但我有太多事情要告訴你們。我必須花很多心思來說中文,所以我必須用英文和大家說話。你們所看到的一切的基礎是兩項基本技術:加速計算和在Omniverse內運行的人工智慧。這兩項技術,這兩個計算的基本力量,將重新塑造計算機行業。計算機行業現在已有大約60年的歷史。在很多方面,我們今天所做的一切都是在1964年,也就是我出生後的第二年發明的。IBM System/360引入了中央處理單元、通用計算、通過操作系統實現的硬件和軟件分離、多任務處理、I/O子系統、DMA,所有今天我們使用的各種技術。架構兼容性、向後兼容性、家族兼容性,所有我們今天所知道的計算機的東西在1964年基本上都已經描述出來了。當然,個人電腦革命使計算機民主化,並將其置於每個人的手中和家中。2007年,iPhone引入了移動計算,並將計算機放在我們的口袋裡。從那時起,所有東西都通過移動雲端連接並隨時運行。

In these last 60 years, we saw just a few, not that many actually, two or three major technology shifts, two or three tectonic shifts in computing where everything changed, and we're about to see that happen again. There are two fundamental things happening. The first is that the processor, the engine on which the computer industry runs, the central processing unit: its performance scaling has slowed tremendously, and yet the amount of computation we have to do is still doubling very quickly, exponentially. If the data that we need to process continues to scale exponentially but performance does not, we will experience computation inflation, and in fact, we're seeing that right now. As we speak, the amount of data center power used all over the world is growing quite substantially. The cost of computing is growing. We are seeing computation inflation. This, of course, cannot continue. The data is going to continue to increase exponentially, and CPU performance scaling will never return.

在過去的60年中,我們實際上只看到了幾次,不是很多,兩三次重大技術轉變,兩三次計算機行業的構造轉變,一切都改變了,我們即將再次看到這一切的發生。有兩件基本的事情正在發生。首先是處理器,計算機行業運行的引擎,中央處理單元,其性能擴展已經大幅放緩,然而我們需要進行的計算量仍在快速指數級增加。如果我們需要處理的數據繼續以指數級擴展但性能卻沒有,那麼我們將會經歷計算膨脹,事實上,我們現在就正在看到這一點。在我們說話的當下,全球數據中心使用的電力量正在大幅增長。計算成本正在增加。我們正在看到計算膨脹。這當然不能繼續下去。數據將繼續以指數級增加,而CPU性能擴展永遠不會回來。

There is a better way. For almost two decades now, we’ve been working on accelerated computing. CUDA augments a CPU, offloads, and accelerates the work that a specialized processor can do much, much better. In fact, the performance is so extraordinary that it is very clear now, as CPU scaling has slowed and substantially stopped, we should accelerate everything. I predict that every application that is processing intensive will be accelerated, and surely every data center will be accelerated in the near future.

有一個更好的方法。近二十年來,我們一直在致力於加速計算。CUDA增強了CPU,卸載並加速了專用處理器可以做得更好的工作。事實上,性能如此卓越,現在非常清楚,隨著CPU擴展的放緩和實質上的停止,我們應該加速一切。我預測,所有計算密集型應用程序都將被加速,並且可以肯定,未來每個數據中心都將被加速。


Now, accelerated computing is very sensible; it's very common sense. If you take a look at an application, here 100t means 100 units of time. It could be 100 seconds; it could be 100 hours. In many cases, as you know, we're now working on artificial intelligence applications that run for 100 days. The 1t is the code that requires sequential processing, where single-threaded CPUs are really quite essential: operating systems, control logic, places where it's essential to have one instruction executed after another. However, there are many algorithms, computer graphics being one, that you can operate completely in parallel. Computer graphics, image processing, physics simulations, combinatorial optimizations, graph processing, database processing, and of course, the very famous linear algebra of deep learning. There are many types of algorithms that are very conducive to acceleration through parallel processing.

現在,加速計算是非常明智的;這是非常常識的。如果你看看一個應用程序,這裡的100t意味著100個時間單位。可能是100秒;可能是100小時。在許多情況下,正如你所知,我們現在正在處理運行100天的人工智慧應用程序。1t是一段需要順序處理的代碼,其中單線程CPU非常必要:操作系統、控制邏輯,在這些地方一條指令接著一條指令執行是必要的。然而,有許多算法,計算機圖形就是一個可以完全並行操作的例子。計算機圖形、圖像處理、物理模擬、組合優化、圖處理、數據庫處理,當然還有非常著名的深度學習線性代數。有許多類型的算法非常適合通過並行處理進行加速。
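The arithmetic behind this 100t/1t split can be sketched in a few lines of Python. This is a toy Amdahl's-law illustration under assumed numbers (1t sequential, 99t parallelizable, a 100x accelerator), not NVIDIA tooling:

```python
# Amdahl's-law-style sketch: total time 100t, of which 1t is inherently
# sequential (stays on the CPU) and 99t is parallelizable (offloaded).
def accelerated_runtime(total=100.0, sequential=1.0, accel_factor=100.0):
    parallel = total - sequential
    # Sequential part runs as-is; parallel part is sped up by the accelerator.
    return sequential + parallel / accel_factor

runtime = accelerated_runtime()   # 1 + 99/100 = 1.99 units of time
speedup = 100.0 / runtime         # overall speed-up of roughly 50x
```

The sequential 1t becomes the limit: even an infinitely fast accelerator cannot push the overall speed-up past 100x here, which is why the talk stresses which code is sequential and which is parallel.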

So, we invented an architecture to do that. By adding the GPU to the CPU, the specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast. Because the two processors can work side by side, they’re both autonomous and they’re both separate, independent. That is, we could accelerate what used to take 100 units of time down to one unit of time. Well, the speed-up is incredible. It almost sounds unbelievable, but today I’ll demonstrate many examples for you. The benefit is quite extraordinary. A 100 times speed-up, but you only increase the power by about a factor of three, and you increase the cost by only about 50%. We do this all the time in the PC industry. We add a GPU, a $500 GPU, GeForce GPU to a $1000 PC, and the performance increases tremendously. We do this in a data center, a billion-dollar data center. We add $500 million worth of GPUs, and all of a sudden, it becomes an AI factory. This is happening all over the world today.

所以,我們發明了一種架構來做到這一點。通過將GPU添加到CPU中,專用處理器可以將需要大量時間的任務加速到非常快的程度。因為這兩個處理器可以並肩工作,它們都是自主的並且是獨立的。也就是說,我們可以將以前需要100個時間單位的任務加速到只需要一個時間單位。嗯,加速是不可思議的。這幾乎聽起來令人難以置信,但今天我將為你展示許多例子。這個好處是非常非凡的。速度提升100倍,但你只需增加大約三倍的功率,而成本只增加約50%。我們在PC行業中一直這樣做。我們添加一個GPU,一個500美元的GPU,GeForce GPU到一個1000美元的PC上,性能會大大提高。我們在數據中心中也是這樣做的,一個價值十億美元的數據中心。我們增加價值5億美元的GPU,突然之間,它變成了一個AI工廠。這種情況正在全球範圍內發生。

Now, why is that? The reason is very clear. We've been experiencing inflation for so long in general-purpose computing. Now that we have finally determined to accelerate, there's an enormous amount of captured loss that we can now regain, a great deal of retained waste that we can now release from the system, and that will translate into savings: savings of money, savings of energy. That's the reason why you've heard me say, "The more you buy, the more you save." And now I've shown you the mathematics. It is not accurate, but it is correct. That's called CEO math. CEO math is not accurate, but it is correct. The more you buy, the more you save.

那是為什麼呢?原因非常清楚。我們在通用計算中已經經歷了太久的通貨膨脹。現在我們終於決定加速,有大量的損失可以重新獲得,有很多保留的浪費可以從系統中釋放出來,這將轉化為節省。節省金錢,節省能源,這就是為什麼你聽我說過,「你買得越多,你節省的越多」。現在我已經給你展示了數學原理。這並不準確,但它是正確的。這叫做CEO數學。CEO數學不準確,但它是正確的。你買得越多,你節省的越多。
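The "CEO math" can be made concrete with a back-of-the-envelope calculation, using the rough ratios quoted earlier in the talk (about 100x speed-up for about 3x power and about 1.5x cost):

```python
# "CEO math": cost and energy per unit of work, after adding the accelerator.
# The three ratios below are the rough figures quoted in the talk.
speedup = 100.0        # work per unit time goes up ~100x
power_factor = 3.0     # the system draws ~3x the power
cost_factor = 1.5      # the system costs ~1.5x as much

cost_per_work = cost_factor / speedup      # 0.015: ~98.5% cheaper per task
energy_per_work = power_factor / speedup   # 0.03: ~97% less energy per task
savings_pct = (1 - cost_per_work) * 100    # "the more you buy, the more you save"
```

Not accurate, but correct: the absolute spend goes up, while the cost of each unit of computation collapses.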

Well, accelerated computing does deliver extraordinary results, but it is not easy. Why is it that it saves so much money, but people haven't done it for so long? The reason is that it's incredibly hard. There is no such thing as software that you can just run through a C compiler and all of a sudden that application runs 100 times faster. That is not even logical. If it were possible to do that, they would have just changed the CPU to do it. You, in fact, have to rewrite the software. That's the hard part. The software has to be completely rewritten so that you can re-express the algorithms that were written for a CPU so that they can be accelerated, offloaded, and run in parallel. That computer science exercise is insanely hard. We've made it easy for the world over the last 20 years. Of course, there is the very famous cuDNN, the deep learning library that processes neural networks. We have a library for AI physics that you can use for fluid dynamics and many other applications where the neural network has to obey the laws of physics. We have a great new library called Aerial, a CUDA-accelerated 5G radio, so that we can software-define and accelerate the telecommunications networks the way we've software-defined the world's networking, the internet. The ability to accelerate that allows us to turn all of telecom into essentially the same type of platform, a computing platform, just like we have in the cloud. cuLitho is a computational lithography platform that allows us to process the most computationally intensive part of chip manufacturing, making the mask. TSMC is in the process of going to production with cuLitho, saving enormous amounts of energy and enormous amounts of money. The goal for TSMC is to accelerate their stack so that they're prepared for even further advances in algorithms and the computation needed for deeper and narrower transistors. Parabricks is our gene sequencing library.
It is the highest-throughput library in the world for gene sequencing. cuOpt is an incredible library for combinatorial optimization: route planning, optimization, the traveling salesman problem. Incredibly complicated. Scientists had largely concluded that you needed a quantum computer to do that. We created an algorithm that runs on accelerated computing, and it runs lightning fast. We hold every single major world record today. cuQuantum is an emulation system for a quantum computer. If you want to design a quantum computer, you need a simulator to do so. If you want to design quantum algorithms, you need a quantum emulator to do so. How would you do that? How would you design these quantum computers, create these quantum algorithms, if the quantum computer doesn't exist? You use the fastest computer in the world that exists today, and we call it NVIDIA CUDA. On that, we have an emulator that simulates quantum computers. It is used by several hundred researchers around the world. It is integrated into all the leading frameworks for quantum computing, and it's used in scientific supercomputing centers all over the world.

嗯,加速計算確實帶來了非凡的結果,但這並不容易。為什麼它能節省這麼多錢,但人們這麼久以來都沒有做到呢?原因是因為這非常困難。沒有那種你只需通過C編譯器運行軟件,突然間應用程序就能快100倍的軟件。這根本不合邏輯。如果可以這樣做,他們早就改變CPU來這樣做了。事實上,你必須重新編寫軟件。這就是困難的部分。軟件必須完全重新編寫,這樣你才能重新表達那些為CPU編寫的算法,使其可以加速、卸載並並行運行。那項計算機科學的工作非常困難。過去20年裡,我們讓這件事對世界變得容易。當然,有非常著名的cuDNN,處理神經網絡的深度學習庫。我們有一個用於流體動力學和其他許多應用的AI物理庫,其中神經網絡必須遵守物理定律。我們有一個很棒的新庫,叫做Aerial,是一個CUDA加速的5G無線電,這樣我們可以像軟件定義世界的網絡(互聯網)一樣,軟件定義並加速電信網絡。我們加速它的能力使我們可以將所有的電信轉變為基本上相同類型的平台,一個計算平台,就像我們在雲端一樣。cuLitho是一個計算光刻平台,允許我們處理芯片製造中計算最密集的部分,製作掩模。TSMC正在將cuLitho投入生產,節省了大量的能源和金錢。TSMC的目標是加速他們的堆棧,這樣他們就可以為算法的進一步發展以及更深更窄的晶體管所需的更多計算做好準備。Parabricks是我們的基因測序庫。它是世界上吞吐量最高的基因測序庫。cuOpt是一個令人難以置信的組合優化庫:路徑規劃、優化、旅行推銷員問題。非常複雜。科學家們大多認為你需要一個量子計算機才能做到這一點。我們創建了一個在加速計算上運行的算法,速度快如閃電。我們今天保持著每一項主要的世界紀錄。cuQuantum是一個量子計算機的仿真系統。如果你想設計一個量子計算機,你需要一個模擬器。如果你想設計量子算法,你需要一個量子仿真器。你會怎麼做?如果量子計算機還不存在,你會怎麼設計這些量子計算機、創建這些量子算法?你使用當今世界上存在的最快的計算機,我們稱之為NVIDIA CUDA。在那上面,我們有一個模擬量子計算機的仿真器。世界上有幾百名研究人員在使用它。它集成在所有領先的量子計算框架中,並在世界各地的科學超級計算中心使用。

RAPIDS is an unbelievable library for data processing. Data processing consumes the vast majority of our spend today. All of it should be accelerated. RAPIDS accelerates the major libraries used in the world: Spark, many of you probably use Spark in your companies, Pandas, a new one called Polars, and of course, NetworkX, which is a graph analytics library. These are just some examples. There are so many more. Each one of them had to be created so that we can enable the ecosystem to take advantage of accelerated computing. If we hadn't created cuDNN, CUDA alone wouldn't have made it possible for all the deep learning scientists around the world to use, because the gap between CUDA and the deep learning algorithms used in TensorFlow and PyTorch is too wide. It's almost like trying to do computer graphics without OpenGL. It's almost like doing data processing without SQL. These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries are what it takes, and what has made it possible, for us to open so many markets. I'll show you some other examples today. Just last week, Google announced that they'd put RAPIDS in the cloud and accelerated Pandas. Pandas is the most popular data science library in the world. Many of you here probably already use Pandas. It's used by 10 million data scientists in the world, downloaded 170 million times each month. It is the Excel, the spreadsheet, of data scientists. Well, with just one click, you can now use Pandas in Colab, Google's cloud notebook platform, accelerated by RAPIDS. The speed-up is really incredible. Let's take a look.

RAPIDS是一個令人難以置信的數據處理庫。數據處理消耗了我們今天大部分的花費。所有這些都應該加速。RAPIDS加速了世界上使用的主要庫:Spark,許多公司可能在使用Spark,Pandas,一個叫做Polars的新庫,當然還有NetworkX,它是一個圖分析庫。這些只是一些例子。還有很多更多的例子。每一個都必須創建,這樣我們才能使生態系統利用加速計算。如果我們沒有創建cuDNN,僅靠CUDA是不可能讓世界各地的深度學習科學家使用的,因為CUDA與TensorFlow和PyTorch中使用的深度學習算法之間的距離太遠了。這就像嘗試在沒有OpenGL的情況下進行計算機圖形一樣。這就像在沒有SQL的情況下進行數據處理一樣。這些特定領域的庫真的是我們公司的寶藏。我們有350個這樣的庫。這些庫是使我們能夠開拓這麼多市場的原因。今天我將向你展示其他一些例子。就在上週,Google宣布他們將RAPIDS放在雲端並加速Pandas。Pandas是世界上最受歡迎的數據科學庫。這裡的許多人可能已經在使用Pandas。它被全球1000萬數據科學家使用,每月下載1.7億次。它是數據科學家的Excel,數據科學家的電子表格。好吧,只需一次點擊,你現在可以在Colab(Google的雲端筆記本平台)中使用由RAPIDS加速的Pandas。速度提升真的是令人難以置信的。我們來看看。
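As a rough sketch of the workflow being accelerated: the code below is ordinary Pandas, and the point of the RAPIDS integration is that the `cudf.pandas` accelerator mode (enabled in a notebook with `%load_ext cudf.pandas`) runs this same code on the GPU without changes. The data here is made up for illustration:

```python
import pandas as pd

# Ordinary Pandas code. With RAPIDS, enabling cudf.pandas first
# (e.g. `%load_ext cudf.pandas` in Colab) accelerates the same code on a GPU.
df = pd.DataFrame({
    "city": ["Taipei", "Taipei", "Hsinchu", "Hsinchu"],
    "sales": [100, 150, 80, 120],
})

# A typical group-aggregate step, the kind of operation the demo speeds up.
totals = df.groupby("city")["sales"].sum().sort_values(ascending=False)
```

The drop-in design is the point: data scientists keep the Pandas API they already know, and the acceleration happens underneath.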


That was a great demo, right? It didn't take long. When you accelerate data processing that fast, demos don't take long. Well, CUDA has now achieved what people call a tipping point, but it's even better than that: CUDA has now achieved a virtuous cycle. This rarely happens. If you look at the history of computing architectures and platforms, in the case of the microprocessor CPU, it has been here for 60 years, unchanged at this level for 60 years, and this way of doing computing, accelerated computing, has been around for a long time too. Creating a new platform is extremely hard because it's a chicken-and-egg problem. If there are no developers that use your platform, then, of course, there will be no users. But if there are no users, there is no install base. If there is no install base, developers aren't interested in it. Developers want to write software for a large install base, but a large install base requires a lot of applications so that users would create that install base.

這是一個很棒的演示,對吧?沒花多長時間。當你加速數據處理這麼快時,演示不會花太長時間。現在,CUDA已經實現了人們所說的臨界點,但這比那更好。CUDA現在已經實現了一個良性循環。這很少發生。如果你看歷史和所有的計算架構,計算平台,在微處理器CPU的情況下,它已經存在了60年。在這個層面上,它已經60年沒有改變了。這種計算方式,加速計算,一直存在。創建一個新平台極其困難,因為這是一個雞和蛋的問題。如果沒有開發者使用你的平台,那麼當然就不會有用戶。但如果沒有用戶,就沒有安裝基礎。如果沒有安裝基礎,開發者就不會對它感興趣。開發者希望為一個大的安裝基礎編寫軟件,但一個大的安裝基礎需要大量的應用程序,以便用戶創建這個安裝基礎。

This chicken-and-egg problem has rarely been broken. It has taken us now 20 years, one domain library after another, one acceleration library after another, and now we have 5 million developers around the world. We serve every single industry: healthcare, financial services, of course the computer industry, the automotive industry, just about every major industry in the world, just about every field of science. Because there are so many customers for our architecture, OEMs and cloud service providers are interested in building our systems; system makers, amazing system makers like the ones here in Taiwan, are interested in building our systems, which then brings more systems to the market, which of course creates greater opportunity for us, which allows us to increase our scale, our R&D scale, which speeds up the applications even more. Every single time we speed up the applications, the cost of computing goes down. This is that slide I was showing you earlier: a 100x speed-up translates to 97%, 96%, 98% savings. So, when we go from a 100x speed-up to a 200x speed-up to a 300x speed-up, the savings continue, and the marginal cost of computing continues to fall. Of course, we believe that by reducing the cost of computing incredibly, the market, developers, scientists, and inventors will continue to discover new algorithms that consume more and more computing, so that one day a phase shift happens: the marginal cost of computing is so low that a new way of using computers emerges. In fact, that's what we're seeing now. Over the years, we have driven down the marginal cost of computing. In the last 10 years, in one particular algorithm, by a million times. As a result, it is now very logical and very common sense to train large language models with all of the data on the internet. Nobody thinks twice.
This idea that you could create a computer that could process so much data to write its own software, the emergence of artificial intelligence was made possible because of this complete belief that if we made computing cheaper and cheaper and cheaper, somebody’s going to find a great use. Today, CUDA has achieved the virtuous cycle. Installed base is growing, computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand, and now we’re in the beginning of something very, very important.

這個雞和蛋的問題很少被打破。我們花了20年的時間,一個又一個領域庫,一個又一個加速庫,現在我們在全球擁有500萬開發者。我們為每一個行業服務,從醫療保健,金融服務,當然,計算機行業,汽車行業,幾乎世界上的每一個主要行業,幾乎每一個科學領域。由於我們的架構有這麼多客戶,OEM和雲服務提供商有興趣建立我們的系統,系統製造商,像台灣這樣的了不起的系統製造商有興趣建立我們的系統,這樣就可以向市場提供更多的系統,這當然為我們創造了更大的機會,使我們可以擴大規模,研發規模,這進一步加速了應用程序的運行。每一次我們加速應用程序,計算成本就會下降。這是我之前展示給你們的幻燈片。100倍加速轉化為97%,96%,98%的節省。所以,當我們從100倍加速到200倍加速再到300倍加速時,節省下來的邊際計算成本繼續下降。當然,我們相信通過極大地降低計算成本,市場,開發者,科學家和發明家將繼續發現更多的算法,這些算法消耗更多更多的計算力,以至於有一天發生了一個相變,邊際計算成本低到一種新的計算機使用方式出現了。事實上,這就是我們現在看到的。多年來,我們降低了計算的邊際成本。在過去的10年中,在一個特定的算法中,降低了百萬倍。結果現在訓練大語言模型用互聯網上的所有數據是非常合乎邏輯和常識的。沒有人會懷疑這一點。這種你可以創建一個可以處理這麼多數據來編寫自己的軟件的計算機的想法,人工智慧的出現是因為我們完全相信,如果我們讓計算變得越來越便宜,總有人會找到一個很好的用途。今天,CUDA實現了良性循環。安裝基礎在增長,計算成本在下降,這導致更多開發者提出更多想法,這驅動了更多需求,現在我們正處於非常非常重要的事情的開始。

But before I show you that, I want to show you what is not possible if not for the fact that we created CUDA, that we created the modern version of generative AI, the modern Big Bang of AI. What I’m about to show you would not be possible. This is Earth 2, the idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet to better avert disasters or better understand the impact of climate change so that we can adapt better, so that we could change our habits. This digital twin of Earth is probably one of the most ambitious projects that the world has ever undertaken. We’re taking large steps every single year, and I’ll show you results every single year, but this year we made some great breakthroughs. Let’s take a look.

但在我向你展示那個之前,我想向你展示如果沒有我們創建CUDA,沒有我們創建現代版本的生成AI,現代AI大爆炸,那將是不可能的。我即將向你展示的是不可能的。這是Earth 2,這個想法是我們將創建一個地球的數字雙胞胎,我們將模擬地球,以便我們可以更好地預測我們星球的未來,更好地避免災難,或者更好地理解氣候變化的影響,以便我們可以更好地適應,以便我們可以改變我們的習慣。這個地球數字雙胞胎可能是世界上有史以來最雄心勃勃的項目之一。我們每年都在邁出大步,我每年都會向你展示結果,但今年我們取得了一些重大突破。我們來看看。

On Monday, the storm will veer north again and approach Taiwan. There are big uncertainties regarding its path; different paths will have different levels of impact on Taiwan. That was NVIDIA Earth 2. I wrote that, but an AI Jensen had to say it. Because of our dedication to continuously improving performance and driving the cost down, AI researchers discovered CUDA in 2012. That was NVIDIA's first contact with AI. This was a very important day. We had the good wisdom to work with the scientists to make it possible for deep learning to happen, and AlexNet achieved, of course, a tremendous computer vision breakthrough. But the great wisdom was to take a step back and understand what was the background, what is the foundation of deep learning, what is its long-term impact, what is its potential. We realized that this technology had great potential to scale: an algorithm that had been invented and discovered, neural networks, and, very importantly, a lot more compute. All of a sudden, deep learning was able to achieve what no human-designed algorithm could. Now imagine if we were to scale up the architecture even more: larger networks, more data, and more compute. What could be possible?

週一,風暴將再次轉向北方並接近台灣。它的路徑有很大的不確定性;不同的路徑對台灣的影響程度不同。那是NVIDIA Earth 2。那段話是我寫的,但必須由AI黃仁勳來說。因為我們致力於不斷提升性能並降低成本,AI研究人員在2012年發現了CUDA。那是NVIDIA與AI的第一次接觸。這是非常重要的一天。我們有幸與科學家合作,使深度學習成為可能,當然,AlexNet實現了一個巨大的計算機視覺突破。但偉大的智慧在於退一步,理解深度學習的背景是什麼,它的基礎是什麼,它的長期影響是什麼,它的潛力是什麼。我們意識到這項技術具有巨大的擴展潛力:一個被發明和發現的算法,神經網絡,更重要的是,更多的計算。突然之間,深度學習能夠實現人類設計的算法無法實現的目標。現在想像一下,如果我們進一步擴展架構:更大的網絡,更多的數據,更多的計算。會有什麼樣的可能性?


So, we dedicated ourselves to reinvent everything. After 2012, we changed the architecture of our GPU to add Tensor Cores. We invented NVLink; that was 10 years ago now. cuDNN, TensorRT, NCCL; we bought Mellanox; TensorRT-LLM, the Triton Inference Server, and all of it came together in a brand-new computer. Nobody understood it, nobody asked for it, and in fact, I was certain nobody wanted to buy it. So, we announced it at GTC, and OpenAI, a small company in San Francisco, saw it. They asked me to deliver one to them. I delivered the first DGX, the world's first AI supercomputer, to OpenAI in 2016. Well, after that, we continued to scale, from one AI supercomputer, one AI appliance, to large supercomputers, then even larger. By 2017, the world discovered Transformers, so that we could train on enormous amounts of data and recognize and learn patterns that are sequential over large spans of time. It became possible for us to train these large language models to understand and achieve a breakthrough in natural language understanding. We kept going. After that, we built even larger ones, and then in November 2022, trained on thousands, tens of thousands of NVIDIA GPUs in a very large AI supercomputer, OpenAI announced ChatGPT. One million users after five days, 100 million after two months. The fastest-growing application in history. The reason for that is very simple: it is just so easy to use, and it was so magical to use, to be able to interact with a computer like it's human. Instead of having to be clear about exactly what you want, it's like the computer understands your meaning; it understands your intention.

所以,我們致力於重塑一切。2012年之後,我們改變了GPU的架構,增加了張量核心。我們發明了NVLink;那是10年前的事了。cuDNN,TensorRT,NCCL,我們收購了Mellanox,TensorRT-LLM,Triton推理服務器,所有這些都集中在一台全新的計算機上。沒有人理解它,沒有人要求它,事實上,我確定沒有人想買它。所以,我們在GTC上宣布了它,OpenAI,一家位於舊金山的小公司,看到了它。他們要求我送一台給他們。我在2016年將第一台DGX,世界上第一台AI超級計算機,送給了OpenAI。嗯,之後,我們繼續從一台AI超級計算機,一台AI設備,擴展到大型超級計算機,再到更大的規模。到2017年,世界發現了Transformer,我們可以訓練大量數據,並識別和學習在長時間跨度內的序列模式。我們現在可以訓練這些大語言模型來理解並實現自然語言理解的突破。我們繼續前進。之後,我們建造了更大的,然後在2022年11月,在一個非常大的AI超級計算機中,使用數千,數萬個NVIDIA GPU進行訓練,OpenAI宣布了ChatGPT。五天後有一百萬用戶,兩個月後一億。歷史上增長最快的應用程序。原因非常簡單:它非常容易使用,而且使用起來非常神奇,能夠像與人類一樣與計算機互動。你不必明確地說出你想要什麼,計算機就像理解你的意思一樣,它理解你的意圖。

Oh, I think here I asked it for the closest night market. As you know, the night market is very important to me. When I was young, four and a half years old, I used to love going to the night market because I just loved watching people, and my parents used to take us there. One day, and you guys might see that I have a large scar on my face, my face was cut because somebody was washing their knife, and I was a little kid. My memories of the night market are so deep because of that. I used to love, and I still love, going to the night market. I just need to tell you guys this: the Tonghua Night Market is really good because there's a lady who's been working there for 43 years. She's the fruit lady, and she's right in the middle of the street. Go find her, okay? She's really terrific. I think it would be funny if all of you went to see her after this. Every year she's doing better and better; her cart has improved. I just love watching her succeed.

哦,我想這裡我問了它最近的夜市。你知道,夜市對我來說非常重要。當我年輕時,大約四歲半,我非常喜歡去夜市,因為我喜歡看人,我的父母曾經帶我們去夜市。有一天,你們可能會看到我臉上有一條很大的疤痕,我的臉被割傷了,因為有人在洗刀,而我還是個小孩子。因為這個,我對夜市的記憶非常深刻。我曾經喜歡,現在仍然喜歡去夜市。我只是需要告訴你們:通化夜市真的很好,因為有一位女士已經在那裡工作了43年。她是賣水果的女士,就在街道中間。去找她,好嗎?她真的很棒。我覺得之後你們都去看她會很有趣。她每年都越做越好;她的攤位有了改善。我只是喜歡看她成功。

Anyways, ChatGPT came along, and something very important in this slide here, let me show you something. This slide, the fundamental difference is this: until ChatGPT revealed it to the world, AI was all about perception, natural language understanding, computer vision, speech recognition. It was all about perception and detection. This was the first time the world saw a generative AI. It produced tokens, one token at a time, and those tokens were words. Some of the tokens, of course, could now be images or charts or tables, songs, words, speech, videos. Those tokens could be anything, anything that you can learn the meaning of. It could be tokens of chemicals, tokens of proteins, genes. You saw earlier in Earth 2, we were generating tokens of the weather. We can learn physics. If you can learn physics, you could teach an AI model physics. The AI model could learn the meaning of physics, and it can generate physics. We were scaling down to 1 km, not by using filtering; it was generating. We can use this method to generate tokens for almost anything, almost anything of value. We can generate steering wheel control for a car. We can generate articulation for a robotic arm. Everything that we can learn, we can now generate. We have now arrived not at the AI era but a generative AI era.

不管怎樣,ChatGPT出現了,這張幻燈片中有一個非常重要的內容,讓我給你們看一下。這張幻燈片,根本的區別在於:在ChatGPT向世界展示之前,AI一直都是關於感知,自然語言理解,計算機視覺,語音識別。這一切都是關於感知和檢測的。這是世界第一次看到生成型AI。它一次生成一個標記,而那些標記是單詞。當然,現在有些標記可以是圖像或圖表或表格,歌曲,單詞,語音,視頻。那些標記可以是任何東西,任何你能理解其意義的東西。它可以是化學標記,蛋白質標記,基因標記。你早些時候在Earth 2中看到,我們正在生成天氣的標記。我們可以學習物理。如果你能學習物理,你可以教AI模型物理。AI模型可以理解物理的意義,並且可以生成物理。我們將分辨率降低到1公里,不是通過濾波,而是生成。我們可以用這種方法生成幾乎任何東西,幾乎任何有價值的東西。我們可以生成汽車的方向盤控制。我們可以生成機械臂的關節運動。我們能學的東西,我們現在都能生成。我們現在不再是AI時代,而是生成型AI時代。
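"One token at a time" is an autoregressive loop: each generated token is fed back in to produce the next. A toy sketch of that loop, with a hand-made bigram lookup table standing in for a trained model (nothing like a real LLM, which predicts from learned probabilities over a huge vocabulary):

```python
# Toy autoregressive generation: each step feeds the last token back in,
# and the "model" (here, just a lookup table) emits the next token.
bigram_model = {
    "the": "more",
    "more": "you",
    "you": "buy,",
    "buy,": "the",
}

def generate(prompt_token, steps):
    tokens = [prompt_token]
    for _ in range(steps):
        tokens.append(bigram_model[tokens[-1]])
    return tokens

print(" ".join(generate("the", 4)))  # prints: the more you buy, the
```

The loop is the same whether the tokens are words, image patches, weather states, or robot joint angles; only the model and the token vocabulary change.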

What’s really important is this: this computer that started out as a supercomputer has now evolved into a data center, and it produces one thing: it produces tokens. It’s an AI factory. This AI factory is generating, creating, producing something of great value, a new commodity. In the late 1890s, Nikola Tesla invented an AC generator. We invented an AI generator. The AC generator generated electrons; NVIDIA’s AI generator generates tokens. Both of these things have large market opportunities. It’s completely fungible in almost every industry, and that’s why it’s a new industrial revolution. We have now a new factory producing a new commodity for every industry that is of extraordinary value. The methodology for doing this is quite scalable and quite repeatable. Notice how quickly so many different AI models, generative AI models, are being invented, literally daily. Every single industry is now piling on. For the very first time, the IT industry, which is a $3 trillion industry, is about to create something that can directly serve a $100 trillion industry. No longer just an instrument for information storage or data processing but a factory for generating intelligence for every industry. This is going to be a manufacturing industry, not a manufacturing industry of computers, but using the computers in manufacturing. This has never happened before. Quite an extraordinary thing. What we started with accelerated computing led to AI, led to generative AI, and now an industrial revolution.

真正重要的是這一點:這台最初作為超級計算機的計算機現在已經發展成為一個數據中心,並且只生成一樣東西:它生成標記。這是一個AI工廠。這個AI工廠正在生成,創造,生產一種具有巨大價值的新商品。1890年代末,尼古拉·特斯拉發明了交流發電機。我們發明了一個AI發電機。交流發電機生成電子;NVIDIA的AI發電機生成標記。這兩者都有巨大的市場機會。在幾乎每個行業中,它都是完全可替代的,這就是為什麼它是一場新的工業革命。我們現在有了一個新工廠,為每個行業生產一種具有非凡價值的新商品。這種做法的方法是非常可擴展和可重複的。注意到各種不同的AI模型,生成型AI模型,被發明的速度有多快,幾乎是每天都有新模型出現。每個行業現在都在積極參與。第一次,IT行業,這是一個3萬億美元的行業,即將創造出一個可以直接服務於100萬億美元行業的東西。不再僅僅是一個信息存儲或數據處理的工具,而是一個為每個行業生成智慧的工廠。這將成為一個製造業,不是製造計算機的製造業,而是使用計算機進行製造的製造業。這在以前從未發生過。這是一個相當非凡的事情。我們從加速計算開始,導致了AI,導致了生成型AI,現在是一場工業革命。

The impact on our industry is also quite significant. Of course, we can create a new commodity, a new product we call tokens, for many industries, but the impact on ours is also quite profound. For the very first time, as I was saying earlier, in 60 years, every single layer of computing has been changed: from CPUs, general-purpose computing, to accelerated GPU computing. Where the computer ran on instructions, now computers process LLMs, large language models, AI models. Whereas the computing model of the past was retrieval-based, almost every time you touch your phone, some pre-recorded text or pre-recorded image or pre-recorded video is retrieved for you and recomposed based on a recommender system to present to you based on your habits. But in the future, your computer will generate as much as possible and retrieve only what's necessary. The reason is that generated data requires less energy than going to fetch information. Generated data is also more contextually relevant; it will encode knowledge, it will encode its understanding of you. Instead of "get that information for me" or "get that file for me," you just ask it for an answer. Instead of your computer being a tool that we use, the computer will now generate skills; it performs tasks. Instead of an industry that is producing software, which was a revolutionary idea in the early 90s, remember, the idea that Microsoft created of packaging software revolutionized the PC industry. Without packaged software, what would we use the PC to do? It drove this industry, and now we have a new factory, a new computer. What we will run on top of this is a new type of software, and we call it NIMs, NVIDIA Inference Microservices.

這對我們行業的影響也非常顯著。當然,我們可以為許多行業創造一種新商品,我們稱之為標記(token)的新產品,但對我們行業的影響也非常深遠。正如我剛才所說,這是60年來第一次,每一層計算都發生了變化,從CPU,通用計算,到加速GPU計算。過去計算機執行指令,現在計算機處理LLM,大型語言模型,AI模型。過去的計算模型是基於檢索的,幾乎每次你觸摸你的手機,系統都會為你檢索一些預先錄製的文本或預先錄製的圖像或預先錄製的視頻,然後根據推薦系統根據你的習慣重新組合呈現給你。但在未來,你的計算機將盡可能多地生成,只檢索必要的內容。原因是生成數據比去獲取信息需要更少的能量。生成的數據也更具上下文相關性;它將編碼知識,它將編碼它對你的理解。你不用再說「幫我獲取那個信息」或「幫我獲取那個文件」,你只需要求它給你答案。你的計算機不再是我們使用的工具,計算機現在將生成技能,它執行任務。這不再是一個生產軟件的行業,這在90年代初是一個革命性的想法,記得嗎?微軟創造的包裝軟件的想法徹底改變了PC行業。沒有包裝軟件,我們用PC做什麼?它推動了這個行業,現在我們有了一個新工廠,一台新計算機。我們將在其上運行的是一種新型軟件,我們稱之為NIM,NVIDIA推理微服務(NVIDIA Inference Microservices)。


Now, what happens is the NIM runs inside this factory, and this NIM is a pre-trained model; it's an AI. This AI is, of course, quite complex in itself, but the computing stack that runs AI is insanely complex. When you go and use ChatGPT, underneath their stack is a whole bunch of software. Underneath that pre-trained model is a ton of software, and it's incredibly complex because the models are large, billions to trillions of parameters. It doesn't run on just one computer; it runs on multiple computers. It has to distribute the workload across multiple GPUs: tensor parallelism, pipeline parallelism, data parallelism, expert parallelism, all kinds of parallelism, distributing the workload across multiple GPUs, processing it as fast as possible. In a factory, if you run a factory, your throughput directly correlates to your revenues, your throughput directly correlates to quality of service, and your throughput directly correlates to the number of people who can use your service. We are now in a world where data center throughput utilization is vitally important. It was important in the past, but not vitally important, and people didn't measure it. Today, every parameter is measured: start time, uptime, utilization, throughput, idle time, you name it. Because it's a factory, when something is a factory, its operations directly correlate to the financial performance of the company. So, we realized that this is incredibly complex for most companies to do.

現在,NIM在這個工廠內運行,這個NIM是一個預訓練模型;它是一個AI。這個AI本身當然非常複雜,但運行AI的計算堆棧是非常複雜的。當你使用ChatGPT時,其堆棧下有一大堆軟件。在那個預訓練模型下有大量的軟件,而且它非常複雜,因為模型很大,從數十億到數萬億參數。它不僅在一台計算機上運行;它在多台計算機上運行。它必須將工作負載分配到多個GPU上:張量並行,管道並行,數據並行,專家並行,各種並行方式將工作負載分配到多個GPU上,儘可能快地處理。在工廠中,如果你運行工廠,你的吞吐量直接與你的收入相關,你的吞吐量直接與服務質量相關,你的吞吐量直接與能夠使用你服務的人數相關。我們現在處於一個數據中心吞吐量利用率至關重要的世界。在過去這很重要,但不是至關重要的,人們也並不測量它。今天,每個參數都被測量:啟動時間,正常運行時間,利用率,吞吐量,閒置時間,你說得出來的都有。因為這是一個工廠,當某物是一個工廠時,它的運營直接與公司的財務表現相關。所以,我們意識到這對大多數公司來說是非常複雜的。
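The parallelism strategies mentioned above can be illustrated with a toy sketch. Here tensor parallelism is shown as splitting one layer's weight matrix column-wise across workers: each "GPU" (a plain NumPy array in this sketch, not real NVIDIA software) computes its shard of the output independently, and the shards are gathered at the end.

```python
import numpy as np

# Toy sketch of tensor parallelism: one layer's weight matrix is split
# column-wise across "GPUs" (here, plain arrays); each worker computes
# its shard of the output, and the shards are concatenated at the end.
# All names here are illustrative, not an actual NVIDIA API.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # activations: batch of 4, hidden dim 8
W = rng.standard_normal((8, 16))       # full weight matrix of one layer

num_gpus = 4
W_shards = np.split(W, num_gpus, axis=1)   # each "GPU" holds an 8x4 slice

# each worker computes a partial output independently (in parallel on real HW)
partial_outputs = [x @ w for w in W_shards]

# gather: concatenate shards to reconstruct the full layer output
y_parallel = np.concatenate(partial_outputs, axis=1)
y_reference = x @ W

assert np.allclose(y_parallel, y_reference)
print(y_parallel.shape)  # (4, 16)
```

On real hardware, the gather step is a collective communication over NVLink rather than a local concatenation, which is why interconnect bandwidth dominates throughput.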

What we did was we created this AI in a box, and it contains an incredible amount of software. Inside this container is CUDA, cuDNN, TensorRT, and Triton Inference Server. It is cloud-native so that you could auto-scale in a Kubernetes environment. It has management services and hooks so that you can monitor your AI services. It has common APIs, standard APIs, so that you could literally chat with this box. You download this NIM and you can talk to it. As long as you have CUDA on your computer, which is now, of course, everywhere, it’s in every cloud, available from every computer maker, it is available in hundreds of millions of PCs. When you download this, you have an AI, and you can chat with it like ChatGPT. All of the software is now integrated, 400 dependencies all integrated into one. We tested this NIM, each one of these pre-trained models, against our entire install base that’s in the cloud, all the different versions of Pascal, Ampere, Hopper, and all kinds of different versions. I even forget some. NIMs, an incredible invention. This is one of my favorites.

我們所做的是創建了這個「盒中AI」,它包含了大量的軟件。在這個容器內有CUDA,cuDNN,TensorRT,Triton推理服務。它是雲原生的,這樣你可以在Kubernetes環境中自動擴展。它有管理服務和掛鉤,這樣你可以監控你的AI服務。它有通用API,標準API,這樣你可以真正與這個盒子聊天。你下載這個NIM,你可以與它交談。只要你的計算機上有CUDA,現在當然到處都有,每個雲端都有,從每個計算機製造商那裡都可以獲得,它在數億台PC上都可用。當你下載這個,你就有了一個AI,你可以像ChatGPT一樣與它聊天。所有的軟件現在都集成了,400個依賴項全部集成到一個中。我們在雲端對我們的整個安裝基礎進行了測試,所有不同版本的Pascal,Ampere,Hopper,各種不同的版本。我甚至忘了一些。NIMs,一個令人難以置信的發明。這是我最喜歡的之一。
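As a concrete sketch of the "standard APIs" point above: NIM containers expose an OpenAI-style chat-completions endpoint. The host, port, and model identifier below are placeholders assumed for a locally hosted deployment, not values from this talk; the snippet only builds the request body and shows, in comments, how one might send it.

```python
import json

# Hedged sketch: a locally hosted NIM is assumed to expose an OpenAI-style
# chat-completions endpoint. URL and model id below are placeholders;
# adjust them for your deployment.

NIM_URL = "http://localhost:8000/v1/chat/completions"  # placeholder address

payload = {
    "model": "meta/llama3-8b-instruct",   # example model id, may differ
    "messages": [
        {"role": "user", "content": "Summarize what a NIM is in one sentence."}
    ],
    "max_tokens": 128,
}

body = json.dumps(payload)

# To actually send the request (requires a running NIM):
#   import urllib.request
#   req = urllib.request.Request(NIM_URL, data=body.encode(),
#                                headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read())

print(body[:40])
```

Because the interface mirrors the chat-completions convention, existing client tooling can usually point at a NIM simply by swapping the base URL.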

As you know, we now have the ability to create large language models and pre-trained models of all kinds. We have all of these various versions, whether it’s language-based or vision-based or imaging-based. We have versions that are available for healthcare, digital biology, we have versions that are digital humans that I’ll talk to you about. The way you use this, just come to ai.nvidia.com, and today we just posted up in Hugging Face the Llama 3 NIM, fully optimized. It’s available there for you to try, and you can even take it with you. It’s available to you for free. You can run it in the cloud, run it in any cloud, you can download this container, put it into your own data center, and you can host it, make it available for your customers. We have, as I mentioned, all kinds of different domains: physics, some of it is for semantic retrieval, called RAG, retrieval-augmented generation, vision, languages, all kinds of different languages.

如你所知,我們現在有能力創建大型語言模型和各種預訓練模型。我們有所有這些不同的版本,無論是基於語言的,基於視覺的,還是基於圖像的。我們有可用於醫療保健,數字生物學的版本,我們有數字人的版本,我會向你介紹。你使用這個的方法,只需訪問ai.nvidia.com,今天我們剛剛在Hugging Face上發布了Llama 3 NIM,完全優化。它在那裡可供你試用,你甚至可以隨身攜帶。它對你是免費的。你可以在雲端運行它,在任何雲端運行它,你可以下載這個容器,放入你自己的數據中心,你可以托管它,使其可供你的客戶使用。如我所說,我們有各種不同的領域:物理學,有些是用於語義檢索,叫做RAGs,視覺,語言,各種不同的語言。

The way you use it is by connecting these microservices into large applications. One of the most important applications in the coming future, of course, is customer service agents. Customer service agents are necessary in just about every single industry. It represents trillions of dollars of customer service around the world. Nurses are customer service agents in some ways. Some of their work is non-prescription, non-diagnostic. Nurses are essentially customer service. Customer service for retail, for quick service foods, financial services, insurance, just tens and tens of millions of customer service agents can now be augmented by language models and augmented by AI. These boxes that you see are basically NIMs. Some of the NIMs are reasoning agents, given a task, figure out what the mission is, break it down into a plan. Some of the NIMs retrieve information. Some of the NIMs might go and do search. Some of the NIMs might use a tool, like cuOpt that I was talking about earlier. They could use a tool that could be running on SAP, and so it has to learn a particular language called ABAP. Maybe some NIMs have to do SQL queries. So, all of these NIMs are experts that are now assembled as a team.

你使用它的方法是將這些微服務連接到大型應用程序中。未來最重要的應用程序之一當然是客戶服務代理。客戶服務代理在幾乎每個行業都是必要的。它代表了全球數萬億美元的客戶服務。護士在某些方面是客戶服務代理。他們的部分工作不涉及處方或診斷。護士基本上是客戶服務。零售、快餐食品、金融服務、保險的客戶服務,數以千萬計的客戶服務代理現在可以通過語言模型和AI來增強。你看到的這些盒子基本上是NIMs。有些NIMs是推理代理,給定一個任務,找出任務是什麼,將其分解成計劃。有些NIMs檢索信息。有些NIMs可能會去搜索。有些NIMs可能會使用一個工具,比如我之前提到的cuOpt。他們可以使用在SAP上運行的工具,因此它必須學習一種特定的語言,叫做ABAP。也許一些NIMs需要進行SQL查詢。因此,所有這些NIMs都是現在組成團隊的專家。
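The team-of-experts pattern described above can be sketched in a few lines. The "experts" here are plain Python functions standing in for NIMs, and the leader's plan is hard-coded; in a real system a reasoning model would produce the plan itself. None of the names below are real NVIDIA APIs.

```python
# Minimal sketch of the "team of experts" pattern: a leader agent breaks a
# mission into subtasks and routes each one to a worker. The workers are
# stub functions standing in for NIMs; everything here is illustrative.

def retrieval_expert(task: str) -> str:
    return f"[retrieved documents for: {task}]"

def sql_expert(task: str) -> str:
    return f"[SQL query results for: {task}]"

EXPERTS = {"retrieve": retrieval_expert, "sql": sql_expert}

def leader(mission: str) -> str:
    # A real reasoning NIM would generate this plan; here it is fixed.
    plan = [("retrieve", f"background on {mission}"),
            ("sql", f"sales figures related to {mission}")]
    results = [EXPERTS[kind](task) for kind, task in plan]
    # The leader reasons over the workers' results and reports back.
    return " | ".join(results)

report = leader("customer churn")
print(report)
```

The key design point is that the application developer describes the mission, not the control flow; the routing between experts is the leader's job.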

What’s happening is the application layer has been changed. What used to be applications written with instructions are now applications that are assembling teams. Very few people know how to write programs; almost everybody knows how to break down a problem and assemble teams. Every company, I believe, in the future will have a large collection of NIMs, and you would bring down the experts that you want. You connect them into a team, and you don’t even have to figure out exactly how to connect them. You just give the mission to an agent, to a NIM, to figure out how to break the task down and whom to give each piece to. That central agent, the leader of the application, if you will, the leader of the team, would break down the task and give it to the various team members. The team members would perform their task, bring it back to the team leader. The team leader would reason about that and present the information back to you, just like humans. This is in our near future. This is the way applications are going to look.

現在發生的事情是應用層已經發生了變化。以前用指令編寫的應用程序現在是組裝團隊的應用程序。很少有人知道如何編寫程序;幾乎每個人都知道如何分解問題和組裝團隊。我相信未來每個公司都會有大量的NIMs,你會帶來你想要的專家。你將他們連接成一個團隊,你甚至不需要確切地知道如何連接他們。你只需將任務交給一個代理,交給一個NIM,讓它找出如何分解任務以及將每一部分分配給誰。那個中心代理,也就是應用程序的領導者,團隊的領導者,會分解任務並將其分配給各個團隊成員。團隊成員會執行他們的任務,然後將結果帶回給團隊領導者。團隊領導者會對其進行推理,並將信息反饋給你,就像人類一樣。這是我們的近期未來。這就是未來應用程序的樣子。


Now, of course, we could interact with these large AI services with text prompts and speech prompts. However, there are many applications where we would like to interact with what is otherwise a humanlike form. We call them digital humans. Nvidia has been working on digital human technology for some time. Let me show it to you. Well, before I do that, hang on a second. Before I do that, okay, digital humans have the potential of being a great interactive agent with you. They make interactions much more engaging; they could be much more empathetic. We have to cross this incredible chasm of realism, the uncanny valley, so that digital humans appear much more natural. This is, of course, our vision. This is a vision of where we’d love to go, but let me show you where we are.

現在,當然,我們可以用文本提示和語音提示與這些大型AI服務互動。但是,有很多應用程序我們希望與類似人類形式的東西互動。我們稱他們為數字人。Nvidia已經在數字人技術上工作了一段時間。讓我給你們展示一下。在我這樣做之前,等一下。在我這樣做之前,好吧,數字人有成為你偉大互動代理的潛力。他們使互動更加引人入勝;他們可能更加富有同情心。我們必須跨越這個巨大的真實感鴻溝,也就是「恐怖谷」,使數字人看起來更加自然。這當然是我們的願景。這是我們希望達到的願景,但讓我給你們展示一下我們目前在哪裡。

Great to be in Taiwan. Before I head out to the night market, let’s dive into some exciting frontiers of digital humans. Imagine a future where computers interact with us just like humans can. Hi, my name is Sophie, and I am a digital human brand ambassador for UneeQ. This is the incredible reality of digital humans. Digital humans will revolutionize industries from customer service to advertising and gaming. The possibilities for digital humans are endless. Using the scans you took of your current kitchen with your phone, they will be AI interior designers helping generate beautiful photorealistic suggestions and sourcing the materials and furniture. We have generated several design options for you to choose from. They’ll also be AI customer service agents, making the interaction more engaging and personalized. Or digital healthcare workers who will check on patients, providing timely personalized care. I did forget to mention to the doctor that I am allergic to penicillin. Is it still okay to take the medications? The antibiotics you’ve been prescribed, ciprofloxacin and metronidazole, don’t contain penicillin, so it’s perfectly safe for you to take them. They’ll even be AI brand ambassadors, setting the next marketing and advertising trends. Hi, I’m imma, Japan’s first virtual model. New breakthroughs in generative AI and computer graphics let digital humans see, understand, and interact with us in humanlike ways. From what I can see, it looks like you’re in some kind of recording or production setup. The foundation of digital humans are AI models built on multilingual speech recognition and synthesis and LLMs that understand and generate conversation. The AIs connect to another generative AI to dynamically animate a lifelike 3D mesh of a face. 
Finally, AI models that reproduce lifelike appearances, enabling real-time path traced subsurface scattering to simulate the way light penetrates the skin, scatters, and exits at various points, giving skin its soft and translucent appearance. Nvidia ACE is a suite of digital human technologies packaged as easy-to-deploy, fully optimized microservices or NIMs. Developers can integrate ACE NIMs into their existing frameworks, engines, and digital human experiences. Nemotron SLM and LLM NIMs to understand our intent and orchestrate other models. Riva speech NIMs for interactive speech and translation. Audio2Face and Audio2Gesture NIMs for facial and body animation. Omniverse RTX with DLSS for neural rendering of skin and hair. ACE NIMs run on Nvidia GDN, a global network of Nvidia-accelerated infrastructure that delivers low-latency digital human processing to over 100 regions.

很高興來到台灣。在我去夜市之前,讓我們深入了解數字人的一些激動人心的前沿。想像一個未來,計算機可以像人類一樣與我們互動。你好,我叫Sophie,我是UneeQ的數字人品牌大使。這是數字人的令人難以置信的現實。數字人將徹底改變從客戶服務到廣告和遊戲的行業。數字人的可能性是無窮無盡的。使用你用手機拍攝的當前廚房的掃描,他們將是AI室內設計師,幫助生成美麗的照片般逼真的建議,並採購材料和家具。我們已經生成了幾個設計選項供你選擇。他們還將是AI客戶服務代理,使互動更加引人入勝和個性化。或者是數字醫療工作者,將檢查病人,提供及時的個性化護理。我確實忘了告訴醫生我對青黴素過敏。我還能吃這些藥嗎?你被處方的抗生素,環丙沙星(ciprofloxacin)和甲硝唑(metronidazole),不含青黴素,所以你可以安全服用它們。他們甚至將成為AI品牌大使,設定下一個營銷和廣告趨勢。你好,我是imma,日本的第一個虛擬模特。生成AI和計算機圖形學的新突破使數字人能夠以類似人類的方式看、理解和與我們互動。從我所看到的情況來看,你似乎在某種錄製或製作環境中。數字人的基礎是基於多語言語音識別和合成的AI模型以及理解和生成對話的LLMs。AI連接到另一個生成AI,以動態動畫化逼真的3D面部網格。最後,AI模型再現逼真的外觀,實現實時路徑跟踪的皮下散射,以模擬光線穿透皮膚、散射並在各個點退出的方式,賦予皮膚柔軟和半透明的外觀。Nvidia ACE是一套數字人技術,打包為易於部署、完全優化的微服務或NIMs。開發人員可以將ACE NIMs集成到他們現有的框架、引擎和數字人體驗中。Nemotron SLM和LLM NIMs用於理解我們的意圖並協調其他模型。Riva語音NIMs用於互動語音和翻譯。Audio2Face和Audio2Gesture NIMs用於面部和身體動畫。Omniverse RTX與DLSS用於神經渲染皮膚和頭髮。ACE NIMs運行在Nvidia GDN上,這是一個全球網絡的Nvidia加速基礎設施,能夠在超過100個地區提供低延遲的數字人處理服務。

Pretty incredible. Well, ACE runs in the cloud, but it also runs on PCs. We had the good wisdom of including tensor core GPUs in all of RTX, so we’ve been shipping AI GPUs for some time, preparing ourselves for this day. The reason for that is very simple: we always knew that in order to create a new computing platform, you need an install base first. Eventually, the application will come. If you don’t create the install base, how could the application come? So, we installed every single RTX GPU with tensor core processing, and now we have 100 million GeForce RTX AI PCs in the world, and we’re shipping 200. This Computex, we’re featuring four new amazing laptops, all of them able to run AI. Your future laptop, your future PC will become an AI; it’ll be constantly helping you, assisting you in the background. The PC will also run applications that are enhanced by AI. Of course, all your photo editing and your writing and your tools, all the things that you use will all be enhanced by AI. Your PC will also host applications with digital humans that are AIs. There are different ways that AIs will manifest themselves and become used in PCs, but PCs will become a very important AI platform.

相當令人難以置信。好吧,ACE在雲端運行,但也可以在PC上運行。我們有幸在所有的RTX中包括了張量核心GPU,所以我們已經出貨AI GPU有一段時間了,為這一天做準備。原因非常簡單:我們總是知道,要創建一個新的計算平台,你首先需要安裝基礎。最終,應用程序會出現。如果你不創建安裝基礎,應用程序怎麼會來呢?所以,我們在每個RTX GPU中都安裝了張量核心處理,現在我們有1億台GeForce RTX AI PC在全球使用,而且還在持續出貨。本次Computex,我們展示了四款新的驚人筆記本電腦,所有這些都能運行AI。你未來的筆記本電腦,你未來的PC將成為AI;它會不斷在背後幫助你,協助你。PC也將運行由AI增強的應用程序。當然,你的所有照片編輯、寫作和工具,所有你使用的東西都將由AI增強。你的PC也將托管由數字人作為AI的應用程序。AI將以不同的方式呈現自己並在PC中被使用,但PC將成為一個非常重要的AI平台。

Where do we go from here? I spoke earlier about the scaling of our data centers, and every single time we scaled, we found a new phase change. When we scaled from DGX into large AI supercomputers, we enabled transformers to be able to train on enormously large datasets. What happened was, in the beginning, the data was human-supervised. It required human labeling to train AIs. Unfortunately, there is only so much you can human label. Transformers made it possible for unsupervised learning to happen. Now, transformers just look at an enormous amount of data, look at an enormous amount of video, look at an enormous amount of images, and they can learn from studying an enormous amount of data, finding the patterns and relationships itself. The next generation of AI needs to be physically based. Most of the AIs today don’t understand the laws of physics. It’s not grounded in the physical world. In order for us to generate images and videos and 3D graphics and many physics phenomena, we need AIs that are physically based and understand the laws of physics. The way you could do that is, of course, learning from video is one source. Another way is synthetic data, simulation data. Another way is using computers to learn with each other.

我們從這裡去哪裡?我之前談到了我們數據中心的擴展,每次擴展,我們都發現了一個新的相變。當我們從DGX擴展到大型AI超級計算機時,我們使transformers能夠訓練巨大的數據集。最初發生的事情是,數據是由人類監督的。它需要人工標註來訓練AI。不幸的是,你能進行人工標註的數量是有限的。Transformers使無監督學習成為可能。現在,transformers只需查看大量數據,查看大量視頻,查看大量圖像,他們可以從研究大量數據中學習,自己找到模式和關係。下一代AI需要基於物理。今天的大多數AI不理解物理定律。它不基於物理世界。為了生成圖像和視頻以及3D圖形和許多物理現象,我們需要基於物理並理解物理定律的AI。你可以這樣做的方法當然是從視頻學習是一種來源。另一種方式是合成數據,模擬數據。另一種方式是讓計算機彼此學習。


This is really no different than using AlphaGo, having AlphaGo play itself, self-play. Between the two capabilities, same capabilities, playing each other for a very long period of time, they emerge even smarter. You’re going to start to see this type of AI emerging. If the AI data is synthetically generated and using reinforcement learning, it stands to reason that the rate of data generation will continue to advance. Every single time data generation grows, the amount of computation that we have to offer needs to grow with it. We are about to enter a phase where AIs can learn the laws of physics and understand and be grounded in physical world data. We expect that models will continue to grow, and we need larger GPUs.

這實際上與使用AlphaGo讓AlphaGo自我對弈,自己玩沒有什麼不同。在這兩種能力之間,相同的能力,彼此玩很長一段時間後,他們會變得更聰明。你會開始看到這種類型的AI出現。如果AI數據是合成生成的並使用強化學習,理所當然的是數據生成的速度將繼續提高。每次數據生成增長,我們必須提供的計算量也需要隨之增長。我們即將進入一個階段,AI可以學習物理定律並理解並基於物理世界數據。我們預計模型將繼續增長,我們需要更大的GPU。
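The self-play idea above can be reduced to a minimal sketch: two copies of the same (here, random) policy play a trivial game against each other, and every finished game is logged as synthetic training data. A real system would retrain the policy on this data and repeat; this shows only the data-generation loop, and the game itself is invented for illustration.

```python
import random

# Sketch of self-play data generation. The "policy" is a random stand-in
# for a learned model; the fixed-length game is a toy invented here.

random.seed(0)

def policy(state):
    return random.choice([0, 1, 2])      # a trained model would act on state

def play_one_game():
    history = []
    state = 0
    for turn in range(6):                # fixed-length toy game
        action = policy(state)
        history.append((state, action))  # (state, action) pairs = training data
        state += action
    winner = "A" if state % 2 == 0 else "B"
    return history, winner

# each iteration yields one game's worth of synthetic training data
dataset = [play_one_game() for _ in range(100)]
print(len(dataset))
```

The point of the loop is the one made in the talk: every pass generates more data, so the compute needed to consume that data grows with it.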

Blackwell was designed for this generation. This is Blackwell, and it has several very important technologies. One, of course, is just the size of the chip. We took two of the largest chips that TSMC makes, connected them together with a 10 terabytes per second link between the world’s most advanced SerDes. We then put two of them on a computer node connected with a Grace CPU. The Grace CPU could be used for several things. In a training situation, it could be used for fast checkpoint and restart. In the case of inference and generation, it could be used for storing context memory so that the AI has memory and understands the context of the conversation you would like to have. It’s our second-generation transformer engine. The transformer engine allows us to adapt dynamically to a lower precision based on the precision and the range necessary for that layer of computation. This is our second-generation GPU that has secure AI so that you could ask your service providers to protect your AI from being either stolen or tampered with. This is our fifth-generation NVLink. NVLink allows us to connect multiple GPUs together, and I’ll show you more of that in a second. This is also our first generation with a reliability and availability engine.

Blackwell是為這一代設計的。這是Blackwell,它有幾項非常重要的技術。當然,其中之一是芯片的大小。我們取了TSMC製造的兩個最大的芯片,通過世界上最先進的SerDes之間的10 TB/s鏈接將它們連接在一起。然後我們將它們放在一個計算節點上,並用一個Grace CPU連接。Grace CPU可以用於幾個方面。在訓練情況下,它可以用於快速檢查點和重啟。在推理和生成的情況下,它可以用於存儲上下文記憶,這樣AI就有了記憶,並理解你想要進行的對話的上下文。這是我們的第二代Transformer引擎。Transformer引擎允許我們根據計算層所需的精度和範圍動態適應到更低的精度。這是我們的第二代GPU,具有安全AI,這樣你可以要求你的服務提供商保護你的AI不被盜取或篡改。這是我們的第五代NVLink。NVLink允許我們將多個GPU連接在一起,稍後我會給你展示更多內容。這也是我們第一代具有可靠性和可用性引擎的產品。

This RAS system allows us to test every single transistor, flip-flop, memory on-chip, memory off-chip, so that we can, in the field, determine whether a particular chip is failing. The MTBF, the mean time between failure of a supercomputer with 10,000 GPUs, is measured in hours. The mean time between failure of a supercomputer with 100,000 GPUs is measured in minutes. The ability for a supercomputer to run for a long period of time and train a model that could last for several months is practically impossible if we don’t invent technologies to enhance its reliability. Reliability would, of course, enhance its uptime, which directly affects the cost. Lastly, the decompression engine. Data processing is one of the most important things we have to do. We added a data compression engine, a decompression engine, so that we can pull data out of storage 20 times faster than what’s possible today. All of this represents Blackwell, and I think we have one here that’s in production.

這個RAS系統允許我們測試每一個晶體管、觸發器、芯片上的記憶體、芯片外的記憶體,這樣我們可以在現場確定特定芯片是否故障。具有1萬個GPU的超級計算機的平均故障間隔時間(MTBF)以小時計算。具有10萬個GPU的超級計算機的平均故障間隔時間以分鐘計算。如果我們不發明技術來提高其可靠性,超級計算機運行很長時間並訓練一個可能持續幾個月的模型實際上是不可能的。可靠性當然會提高其正常運行時間,這直接影響成本。最後是解壓引擎。數據處理是我們必須做的最重要的事情之一。我們添加了一個數據壓縮引擎,一個解壓引擎,這樣我們可以比今天可能的速度快20倍地從存儲中提取數據。所有這些代表了Blackwell,我想我們這裡有一個正在生產中。
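The MTBF claim above follows from simple arithmetic: if failures are independent, system-level mean time between failures is roughly the per-component MTBF divided by the component count. The per-GPU figure below is an assumed, illustrative number, not an NVIDIA specification.

```python
# Back-of-envelope check of the MTBF scaling described above.
# per_gpu_mtbf_hours is an assumption for illustration only.

per_gpu_mtbf_hours = 50_000.0   # assumed per-GPU figure, not a real spec

mtbf_10k = per_gpu_mtbf_hours / 10_000      # 5.0 hours: measured in hours
mtbf_100k = per_gpu_mtbf_hours / 100_000    # 0.5 hours = 30 minutes

print(mtbf_10k, "hours between failures at 10,000 GPUs")
print(mtbf_100k * 60, "minutes between failures at 100,000 GPUs")
```

Whatever the true per-GPU number is, the 10x jump in GPU count shrinks the system MTBF by the same 10x, which is why hours at 10,000 GPUs become minutes at 100,000.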

During GTC, I showed you Blackwell in a prototype state. The other side, this is why we practice. Ladies and gentlemen, this is Blackwell. Blackwell is in production. Incredible amounts of technology. This is our production board. This is the most complex, highest performance computer the world has ever made. This is the Grace CPU, and these are, you can see, each one of these Blackwell dies, two of them connected together. You see that it is the largest die, the largest chip the world makes. Then we connect two of them together with a 10 terabytes per second link, and that makes the Blackwell computer. The performance is incredible. Take a look at this. You see the computational flops, the AI flops for each generation has increased by 1,000 times in eight years. Moore’s Law in eight years is something along the lines of, oh, I don’t know, maybe 40x, 60x, and in the last eight years, Moore’s Law has delivered a lot, lot less than that. Just compare even Moore’s Law at its best of times to what Blackwell can do. The amount of computation is incredible. Whenever we drive the computation up, the thing that happens is the cost goes down, and I’ll show you. What we’ve done is, through its computational capability, driven down the energy used to train a GPT model with 2 trillion parameters on 8 trillion tokens.

在GTC期間,我向大家展示了Blackwell的原型狀態。另一邊,這就是為什麼我們要練習。女士們,先生們,這就是Blackwell。Blackwell正在生產中。驚人的技術量。這是我們的生產板。這是世界上最複雜、性能最高的計算機。這是Grace CPU,你可以看到,每一個Blackwell晶片,它們兩兩相連。你看到這是世界上最大的晶片,最大的芯片。然後我們用每秒10 TB的鏈接將它們兩兩相連,這就形成了Blackwell計算機。性能是令人難以置信的。看看這個。你看到每一代的計算浮點運算、AI浮點運算在八年內增加了1000倍。摩爾定律在八年內大約是40倍或60倍,而在過去八年裡,摩爾定律的實際進展遠低於此。即使拿摩爾定律最好的時期與Blackwell的性能相比也是如此。計算量是驚人的。每當我們將計算能力提高時,發生的事情就是成本下降,我會展示給你們看。我們所做的是通過其計算能力,降低了訓練一個2萬億參數、8萬億標記的GPT模型所用的能量。

The amount of energy that is used has gone down by 350 times. Pascal would have taken 1,000 gigawatt hours. 1,000 gigawatt hours means that it would take a gigawatt data center. The world doesn’t have a gigawatt data center, but if you had one, it would take a month. If you had a 100 megawatt data center, it would take about a year, and so nobody would, of course, create such a thing. That’s the reason why these large language models, ChatGPT, were impossible only eight years ago. By driving performance up while keeping and improving energy efficiency along the way, we’ve now taken with Blackwell what used to be 1,000 gigawatt hours down to three. An incredible advance. Three gigawatt hours, with 10,000 GPUs, for example, would only take a few days, 10 days or so. The amount of advance in just eight years is incredible. This is for inference. This is for token generation. Our token generation performance has made it possible for us to drive the energy down by 45,000 times. 17,000 joules per token, that was Pascal. 17,000 joules is kind of like two light bulbs running for two days. It would take two light bulbs running for two days, 200 watts running for two days, to generate one token of GPT-4. It takes about three tokens to generate one word. So, the amount of energy necessary for Pascal to run GPT-4 and have a ChatGPT experience with you was practically impossible. Now we only use 0.4 joules per token, and we can generate tokens at incredible rates with very little energy. Blackwell is just an enormous leap.

所使用的能量減少了350倍。Pascal需要1000千兆瓦時。1000千兆瓦時意味著需要一個千兆瓦數據中心。世界上沒有千兆瓦數據中心,但如果你有一個千兆瓦數據中心,它需要一個月。如果你有一個100兆瓦數據中心,它大約需要一年,所以沒有人會創建這樣的東西。這就是為什麼這些大型語言模型,ChatGPT,在八年前是不可能的。通過在提高性能的同時保持並改善能源效率,我們現在用Blackwell將原來的1000千兆瓦時降到了3。這是一個驚人的進步。3千兆瓦時,如果是1萬個GPU,例如,這只需要幾天,約10天。僅在八年內的進步是驚人的。這是用於推理的。這是用於標記生成的。我們的標記生成性能使我們能夠將能量降低45000倍。17,000焦耳每個標記,那是Pascal。17,000焦耳大概相當於兩個燈泡連續運行兩天。需要兩個燈泡連續運行兩天,200瓦連續運行兩天,才能生成一個GPT-4的標記。生成一個單詞大約需要三個標記。所以,Pascal運行GPT-4並與你進行ChatGPT體驗所需的能量幾乎是不可能的。現在我們只需0.4焦耳每個標記,我們可以用非常少的能量以驚人的速度生成標記。Blackwell是一個巨大的飛躍。
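The scaling figures in this passage can be checked with quick arithmetic: 1,000 gigawatt hours drawn at a 1-gigawatt facility is 1,000 hours of runtime (about 42 days, roughly a month); at 100 megawatts it is 10,000 hours (over a year); and a 350x reduction lands near 3 GWh. The per-token number can be cross-checked from the quoted 45,000x reduction and 0.4 joules per token.

```python
# Quick check of the energy arithmetic quoted above.

train_energy_gwh = 1_000.0

hours_at_1_gw = train_energy_gwh / 1.0      # GWh / GW = hours
hours_at_100_mw = train_energy_gwh / 0.1

print(hours_at_1_gw / 24, "days")           # ~42 days: about a month
print(hours_at_100_mw / 24, "days")         # ~417 days: about a year

blackwell_energy_gwh = train_energy_gwh / 350
print(round(blackwell_energy_gwh, 2), "GWh")   # ~2.86, quoted as "three"

# The 45,000x reduction and 0.4 J/token together imply a starting point
# near 18,000 J/token for the Pascal generation.
implied_pascal_joules = 0.4 * 45_000
print(implied_pascal_joules)
```

The numbers are consistent to keynote precision: 1,000 / 350 is about 2.86 GWh ("three"), and 0.4 J x 45,000 is 18,000 J per token, in line with the quoted Pascal figure.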

Even so, it’s not big enough. We have to build even larger machines. The way that we build it is called DGX. These are our Blackwell chips, and they go into DGX systems. That’s why we should practice. This is a DGX Blackwell. This is air-cooled, has eight of these GPUs inside. Look at the size of the heat sinks on these GPUs, about 15 kilowatts, 15,000 watts, and completely air-cooled. This version supports x86, and it goes into the infrastructure that we’ve been shipping Hoppers into. However, if you would like to have liquid cooling, we have a new system, and this new system is based on this board, and we call it MGX for modular. This modular system, you won’t be able to see this. Can they see this? Can you see this? Are you okay?

即便如此,這還不夠大。我們必須建造更大的機器。我們建造它的方式叫做DGX。這是我們的Blackwell芯片,它們被放入DGX系統。這就是為什麼我們要練習。這是一個DGX Blackwell。這是風冷的,裡面有八個GPU。看看這些GPU上的散熱器大小,大約15千瓦,15000瓦,完全風冷。這個版本支持x86,它進入我們一直在發貨的Hopper的基礎設施中。然而,如果你想要液冷,我們有一個新的系統,這個新系統基於這個板,我們稱之為MGX,代表模塊化。這個模塊化系統,你們看不到這個。你們能看到這個嗎?你們看得到嗎?


This is the MGX system, and here’s the two Blackwell boards. This one node has four Blackwell chips. These four Blackwell chips are liquid-cooled. Nine of them, well, 72 of these GPUs are then connected together with a new NVLink. This is NVLink switch, fifth generation. The NVLink switch is a technology miracle. This is the most advanced switch the world has ever made. The data rate is insane, and these switches connect every single one of these Blackwells to each other so that we have one giant 72 GPU Blackwell. The benefit of this is that in one domain, one GPU domain, this now looks like one GPU. This one GPU has 72 versus the last generation of eight. So, we increased it by nine times. The amount of bandwidth we’ve increased by 18 times, the AI flops we’ve increased by 45 times, and yet the amount of power is only 10 times. This is a 100-kilowatt system, and that is 10 kilowatts, and that’s for one. Now, of course, you could always connect more of these together, and I’ll show you how to do that in a second. What’s the miracle is this chip, this NVLink chip. People are starting to awaken to the importance of this NVLink chip as it connects all these different GPUs together because the large language models are so large, it doesn’t fit on just one GPU. It doesn’t fit on just one node. It’s going to take the entire rack of GPUs like this new DGX that I was just standing next to to hold a large language model that has tens of trillions of parameters. The NVLink switch chip, itself a technology miracle, has 50 billion transistors and 74 ports at 400 gigabits each, with a cross-sectional bandwidth of 7.2 terabytes per second. One of the important things is that it has mathematics inside the switch so that we can do reductions, which is really important in deep learning, right on the chip.

這是MGX系統,這裡有兩個Blackwell板。這個節點有四個Blackwell芯片。這四個Blackwell芯片是液冷的。其中72個GPU然後用一個新的NVLink連接在一起。這是第五代NVLink開關。NVLink開關是一個技術奇蹟。這是世界上最先進的開關。數據速率是驚人的,這些開關將每一個這些Blackwell連接在一起,這樣我們就有了一個巨大的72 GPU Blackwell。這樣的好處是,在一個域中,一個GPU域中,現在看起來像一個GPU。這一個GPU有72個,而上一代只有8個。所以,我們增加了九倍。帶寬增加了18倍,AI浮點運算增加了45倍,但功耗只增加了10倍。這是一個100千瓦的系統,而那是一個10千瓦的,這是一個。當然,你總是可以將更多的這些連接在一起,我會在稍後展示給你們。這個奇蹟是這個芯片,這個NVLink芯片。人們開始意識到這個NVLink芯片的重要性,因為它將這些不同的GPU連接在一起,因為大型語言模型太大了,單個GPU放不下,單個節點也放不下。它將需要像我剛才站在旁邊的這個新DGX那樣的整個GPU機架,來容納一個具有數十萬億參數的大型語言模型。NVLink開關芯片本身就是一個技術奇蹟,有500億個晶體管,74個端口,每個端口400 Gb,橫截面帶寬為每秒7.2 TB。重要的一點是,它在開關內有數學運算,因此我們可以直接在芯片上進行歸約(reduction)運算,這在深度學習中非常重要。
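The in-network "reduction" mentioned above is just a sum of partial results from many GPUs. A sketch of the idea: combining values pairwise in a tree halves the number of live values each round, which is how switch and collective hardware avoids a serial chain of 71 additions. This is an illustration of the operation, not NVIDIA's implementation.

```python
# Sketch of a tree reduction: partial gradients from many "GPUs" are summed
# pairwise, log2(N) rounds instead of a serial O(N) chain.

def tree_reduce(values):
    while len(values) > 1:
        paired = []
        for i in range(0, len(values) - 1, 2):
            paired.append(values[i] + values[i + 1])   # combine neighbors
        if len(values) % 2:                            # odd element carries over
            paired.append(values[-1])
        values = paired
    return values[0]

# 72 "GPUs" each contribute a partial gradient (a single float here).
partials = [float(i) for i in range(72)]
total = tree_reduce(partials)
print(total)   # 2556.0, same as sum(range(72))
```

Doing this inside the switch means the summed result, not every individual contribution, is what crosses the fabric, which saves both bandwidth and latency.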

This is what a DGX looks like now. A lot of people ask us, you know, they say, and there’s this confusion about what NVIDIA does and how it is possible that NVIDIA became so big building GPUs. So, there’s an impression that this is what a GPU looks like. Now, this is a GPU. This is one of the most advanced GPUs in the world, but this is a gamer GPU. But you and I know that this is what a GPU looks like. This is one GPU, ladies and gentlemen. DGX GPU. You know, the back of this GPU is the NVLink spine. The NVLink spine is 5,000 wires, two miles, and it’s right here. This is an NVLink spine, and it connects 72 GPUs to each other. This is an electrical mechanical miracle. The transceivers make it possible for us to drive the entire length in copper, and as a result, this switch, the NVLink switch, driving the NVLink spine in copper makes it possible for us to save 20 kilowatts in one rack. Twenty kilowatts could now be used for processing. Just an incredible achievement. So, this is the NVLink spine. Wow. I went down today, and even this is not big enough. Even this is not big enough for AI factories, so we have to connect it all together with very high-speed networking. We have two types of networking. We have Infiniband, which has been used in supercomputing and AI factories all over the world, and it is growing incredibly fast for us. However, not every data center can handle Infiniband because they’ve already invested in their ecosystem in Ethernet for too long, and it does take some specialty and expertise to manage Infiniband switches and Infiniband networks. So, what we’ve done is we’ve brought the capabilities of Infiniband to the Ethernet architecture, which is incredibly hard.

這就是現在的DGX樣子。很多人問我們,你知道的,他們說,對於NVIDIA到底做什麼以及NVIDIA如何通過建造GPU變得如此龐大存在困惑。所以,有一種印象認為這就是GPU的樣子。現在,這是一個GPU。這是世界上最先進的GPU之一,但這是一個遊戲GPU。但你我都知道這就是GPU的樣子。這是一個GPU,女士們,先生們。DGX GPU。你知道,這個GPU的背面是NVLink脊柱。NVLink脊柱是5000根電線,兩英里,就在這裡。這是一個NVLink脊柱,它將72個GPU連接在一起。這是一個電氣機械奇蹟。收發器使我們能夠在銅線中驅動整個長度,因此,這個開關,NVLink開關,在銅線中驅動NVLink脊柱使我們能夠在一個機架中節省20千瓦。20千瓦現在可以用於處理。真的是一個令人難以置信的成就。所以,這是NVLink脊柱。哇。我今天走下來,即便這也不夠大。即便這也不夠大,無法應對AI工廠,所以我們必須用非常高速的網絡將它們全部連接在一起。我們有兩種類型的網絡。我們有Infiniband,已經在全球的超級計算和AI工廠中使用,而且它對我們來說增長得非常快。然而,並不是每個數據中心都能處理Infiniband,因為他們已經在以太網上投資了太久,管理Infiniband開關和Infiniband網絡需要一些專業知識。所以,我們所做的是將Infiniband的能力帶到以太網架構中,這非常困難。

The reason for that is Ethernet was designed for high average throughput because every single node, every single computer, is connected to a different person on the internet, and most of the communication is the data center with somebody on the other side of the internet. However, in deep learning and AI factories, the GPUs are not communicating with people on the internet mostly. They’re communicating with each other. They’re communicating with each other because they’re all collecting partial products, and they have to reduce it and then redistribute it. Chunks of partial products, reduction, redistribution. That traffic is incredibly bursty, and it is not the average throughput that matters. It’s the last arrival that matters because if you’re reducing, collecting partial products from everybody, if I’m trying to take all of your… So, it’s not the average throughput, it’s whoever gives me the answer last. Ethernet has no provision for that. There are several things that we had to create. We created an end-to-end architecture so that the NIC and the switch can communicate, and we applied four different technologies to make this possible.

原因是以太網是為高平均吞吐量設計的,因為每個節點,每台計算機,都連接到互聯網上的不同人,大多數通信是數據中心與互聯網另一端的某人之間的通信。然而,在深度學習和AI工廠中,GPU主要不是與互聯網上的人通信。他們彼此之間在通信。他們之間在通信,因為他們都在收集部分產品,然後必須減少並重新分配它。部分產品的塊,減少,重新分配。這種流量是非常突發的,重要的不是平均吞吐量,而是最後的到達時間。因為如果你正在減少,收集每個人的部分產品,如果我試圖從你們所有人中獲得…所以,重要的不是平均吞吐量,而是最後給我答案的人。以太網對此沒有規定。我們不得不創建幾個東西。我們創建了一個端到端的架構,使網絡接口卡(NIC)和開關可以通信,我們應用了四種不同的技術來實現這一點。
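The "last arrival" point above can be put in numbers: for a bursty all-reduce, the step completes only when the slowest contribution lands, so tail (maximum) latency governs throughput even when the average looks fine. The worker counts and latencies below are invented for illustration.

```python
import random

# 64 workers with ~1 ms typical latency, plus one straggler at 5 ms.
# The reduction cannot finish until the last partial result arrives.

random.seed(1)
arrivals_ms = [random.uniform(0.9, 1.1) for _ in range(63)] + [5.0]

average_ms = sum(arrivals_ms) / len(arrivals_ms)
completion_ms = max(arrivals_ms)   # the reduction waits for the last arrival

print(round(average_ms, 2))     # around 1.1: the average barely notices
print(completion_ms)            # 5.0: one straggler sets the step time
```

This is why congestion control, adaptive routing, and noise isolation matter: they attack the tail, which ordinary average-throughput Ethernet tuning never sees.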


Number one, NVIDIA has the world’s most advanced RDMA, and so now we have the ability to have a network-level RDMA for Ethernet that is incredibly great. Number two, we have congestion control. The switch does telemetry at all times, incredibly fast, and whenever the GPUs or the NICs are sending too much information, we can tell them to back off so that it doesn’t create hot spots. Number three, adaptive routing. Ethernet needs to transmit and receive in order. If we see congestion or we see ports that are not currently being used, irrespective of the ordering, we will send it to the available ports, and BlueField, on the other end, reorders it so that it comes back in order. That adaptive routing is incredibly powerful. Lastly, noise isolation. There’s more than one model being trained or something happening in the data center at all times, and their noise and their traffic could get into each other and cause jitter. When the noise of one model training causes the last arrival to end up too late, it really slows down the training.

第一,NVIDIA擁有世界上最先進的RDMA,現在我們有能力在以太網上進行網絡級的RDMA,這非常棒。第二,我們有擁塞控制。開關在所有時間進行遙測,非常快,當GPU或NIC發送太多信息時,我們可以告訴他們減速,這樣就不會產生熱點。第三,自適應路由。以太網需要按順序發送和接收。如果我們看到擁塞或看到當前未被使用的端口,不管順序如何,我們將其發送到可用的端口,Bluefield在另一端重新排序,使其按順序返回。這種自適應路由非常強大。最後,噪聲隔離。在任何時候數據中心都有多個模型在訓練或其他事情在發生,他們的噪聲和流量可能會相互干擾並導致抖動。當一個模型訓練的噪聲導致最後的到達時間太晚時,它真的會減慢訓練速度。

Overall, remember, you have built a $5 billion or $3 billion data center, and you’re using this for training. If the network utilization was 40% lower, and as a result, the training time was 20% longer, the $5 billion data center is effectively like a $6 billion data center. The cost impact is quite high. Ethernet with Spectrum-X basically allows us to improve the performance so much that the network is basically free. This is quite an achievement. We have a whole pipeline of Ethernet products behind us. This is Spectrum-X 800. It is 51.2 terabits per second with a radix of 256. The next one coming, one year from now, has a radix of 512, and that’s called Spectrum-X 800 Ultra. The one after that is X1600. The important idea is this: X800 is designed for tens of thousands of GPUs. X800 Ultra is designed for hundreds of thousands of GPUs, and X1600 is designed for millions of GPUs. The days of millions of GPU data centers are coming.

總的來說,記住,你已經建造了一個價值50億或30億美元的數據中心,你正在使用這個來進行訓練。如果網絡利用率降低了40%,因此,訓練時間延長了20%,50億美元的數據中心實際上相當於60億美元的數據中心。成本影響相當高。Spectrum-X的以太網基本上使我們能夠大幅提高性能,以至於網絡基本上是免費的。這是相當大的成就。我們在背後有一整條以太網產品線。這是Spectrum-X 800。它是51.2 Tbps,基數(radix)為256。下一個是基數512,將在一年後推出,叫做Spectrum-X 800 Ultra。再下一個是X1600。重要的想法是這樣的:X800是為數以萬計的GPU設計的。X800 Ultra是為數十萬個GPU設計的,X1600是為數百萬個GPU設計的。數百萬個GPU數據中心的時代即將到來。
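The cost claim above is a one-line calculation: if poor networking stretches training time by 20%, a $5 billion data center delivers its work on what is effectively a $6 billion budget.

```python
# The cost arithmetic from the passage above.

data_center_cost_b = 5.0          # $5B capital cost
slowdown = 0.20                   # training takes 20% longer

effective_cost_b = data_center_cost_b * (1 + slowdown)
print(effective_cost_b)           # 6.0: the $5B facility behaves like a $6B one
```

Framed this way, a networking upgrade that recovers that 20% pays for itself, which is the sense in which "the network is basically free."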

The reason for that is very simple. Of course, we want to train much larger models, but very importantly, in the future, almost every interaction you have with the internet or with a computer will likely have a generative AI running in the cloud somewhere, and that generative AI is working with you, interacting with you, generating videos or images or text or maybe a digital human. So, you’re interacting with your computer almost all the time, and there’s always a generative AI connected to that. Some of it is on-prem, some of it is on your device, and a lot of it could be in the cloud. These generative AIs will also do a lot of reasoning capability. Instead of just one-shot answers, they might iterate on answers so that it improves the quality of the answer before they give it to you. The amount of generation we’re going to do in the future is going to be extraordinary. Let’s take a look at all of this put together. Now, tonight, this is our first nighttime keynote. I want to thank all of you for coming out tonight at 7:00, and what I’m about to show you has a new vibe. There’s a new vibe. This is kind of the nighttime keynote vibe, so enjoy. This black, let’s go, go, go, go, go.

原因非常簡單。當然,我們希望訓練更大的模型,但非常重要的是,在未來,幾乎你與互聯網或計算機的每一次交互都可能有一個生成型AI在雲端某處運行,這個生成型AI與你一起工作,與你互動,生成視頻、圖像、文本,或者可能是數字人。因此,你幾乎一直在與你的計算機互動,而且總有一個生成型AI連接在其中。有些在本地,有些在你的設備上,很多可能在雲端。這些生成型AI還將具備大量的推理能力。與其只給出一次性的答案,它們可能會反覆改進答案的質量,然後再提供給你。我們未來將進行的生成量將是非凡的。讓我們看看所有這些組合在一起的樣子。現在,今晚,這是我們第一次夜間主題演講。我想感謝你們今晚7點來參加,我即將展示的內容有一種新的氛圍。有一種新的氛圍。這有點像夜間主題演講的氛圍,所以盡情享受吧。開始吧,走,走,走!
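The "iterate on answers" idea he describes is essentially a self-refinement loop. Here is a minimal sketch of that control flow, where `generate` and `critique` are toy stand-ins for real model calls (the function names and prompt format are illustrative assumptions, not any particular API):

```python
# Sketch of iterative answer refinement: instead of returning the first draft,
# the system critiques and revises its own answer a few times before replying.
# `generate` and `critique` are placeholders for real model calls.

def generate(prompt: str) -> str:
    return f"draft answer to: {prompt}"

def critique(answer: str) -> str:
    return f"make '{answer}' more specific"

def refine(prompt: str, rounds: int = 3) -> str:
    answer = generate(prompt)
    for _ in range(rounds):
        feedback = critique(answer)
        # Feed the previous answer and the critique back in as context.
        answer = generate(f"{prompt}\nprevious: {answer}\nfeedback: {feedback}")
    return answer

print(refine("explain NVLink"))
```

Each extra round of this loop is one more full generation pass, which is why reasoning-style inference multiplies the amount of generation a data center must serve.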


Okay. “The more you buy, the more you save.” [lyrics unintelligible] Now, you can’t do that in a morning keynote. I think that style of keynote has never been done at Computex, ever. It might be the last. Only NVIDIA can pull that off. Only I can do that.

好吧。「買得越多,省得越多。」(歌詞難以辨識)現在,你不能在早上的主題演講中這樣做。我認為這種風格的主題演講在Computex上從未出現過。這可能是最後一次。只有NVIDIA能夠做到。只有我能做到。

Blackwell, of course, is the first generation of NVIDIA platforms launched just as the world realized the generative AI era is here, just as the world realized the importance of AI factories, just at the beginning of this new industrial revolution. We have so much support: nearly every OEM, every computer maker, every CSP, every GPU cloud, sovereign clouds, even telecommunication companies, enterprises all over the world. The amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts, and I want to thank everybody for that. We’re not stopping there. During this time of incredible growth, we want to make sure that we continue to enhance performance, continue to drive down costs, the cost of training, the cost of inference, and continue to scale out AI capabilities for every company to embrace. The further we drive up performance, the greater the cost decline. The Hopper platform, of course, was probably the most successful data center processor in history, and this is just an incredible, incredible success story. However, Blackwell is here, and every single platform, as you’ll notice, has several things. You’ve got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the NVLink switch that connects all of the GPUs together into as large a domain as we can, and wherever we can, we connect it with very large and very high-speed switches. Every single generation, as you’ll see, is not just a GPU, but an entire platform.

當然,Blackwell是NVIDIA的第一代平台,恰好在全世界知道生成型AI時代已經到來、全世界意識到AI工廠的重要性、這場新工業革命剛剛開始之時推出。我們得到了非常多的支持:幾乎每個OEM、每個計算機製造商、每個CSP、每個GPU雲、主權雲,甚至電信公司,以及全球各地的企業。對Blackwell的成功、採用和熱情真的是超乎尋常,我想感謝每一個人。我們不會停下來。在這個驚人的增長時期,我們要確保繼續提高性能,繼續降低成本,包括訓練成本和推理成本,並繼續擴展AI能力,以便每個公司都能擁抱。性能提升越多,成本下降越大。當然,Hopper平台可能是歷史上最成功的數據中心處理器,這是一個令人難以置信的成功故事。然而,Blackwell已經到來,每一個平台,如你所見,都有幾個組成部分。你有CPU,你有GPU,你有NVLink,你有NIC,你有將所有GPU連接成盡可能大的域的NVLink交換器,只要可以,我們就用非常大、非常高速的交換器將其連接起來。每一代,如你所見,不僅僅是一個GPU,而是一個完整的平台。
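The "a generation is a platform, not just a GPU" idea can be pictured as a bundle of co-designed components that upgrade together. A tiny illustrative sketch (the specific component pairings below are representative examples chosen for illustration, not an official spec sheet):

```python
# Each platform generation bundles the same kinds of components:
# CPU, GPU, NVLink interconnect, NIC, and an NVLink switch.
from dataclasses import dataclass

@dataclass
class Platform:
    name: str
    cpu: str
    gpu: str
    interconnect: str   # NVLink generation linking GPUs into one large domain
    nic: str            # network interface for scale-out
    switch: str         # switch tying all the GPUs together

# Illustrative pairings only.
hopper = Platform("Hopper", cpu="Grace", gpu="H100",
                  interconnect="NVLink 4", nic="ConnectX-7", switch="NVLink Switch")
blackwell = Platform("Blackwell", cpu="Grace", gpu="B200",
                     interconnect="NVLink 5", nic="ConnectX-8", switch="NVLink Switch")

for p in (hopper, blackwell):
    print(f"{p.name}: {p.cpu} + {p.gpu} over {p.interconnect}")
```

The design point: because every generation carries the whole bundle forward, software written against one generation keeps running on the next.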

We build the entire platform and integrate it into an AI factory supercomputer. Then we disaggregate it and offer it to the world. The reason is that all of you can create interesting and innovative configurations, all kinds of different styles, fitting different data centers and different customers in different places: some for edge, some for telco. All of these different innovations are possible because we make the systems open and make it possible for you to innovate. So we design it integrated, but we offer it to you disaggregated so that you can create modular systems. The Blackwell platform is here. Our company is on a one-year rhythm. Our basic philosophy is very simple: one, build the entire data center at scale, disaggregate it, and sell it to you in parts on a one-year rhythm. We push everything to its technology limits. Whatever TSMC process technology, we’ll push it to the absolute limits; whatever packaging technology, push it to the absolute limits; whatever memory technology, push it to the absolute limits; SerDes technology, optics technology, everything is pushed to the limit. Then, after that, do everything in such a way that all of our software runs on this entire installed base.

我們建造整個平台,並將其整合成AI工廠超級計算機。然後,我們將其拆分並提供給全世界。這樣做的原因是,所有人都可以創造有趣且創新的配置和各種不同風格,並適應不同的數據中心和不同地方的不同客戶,有些用於邊緣計算,有些用於電信。因為我們將系統開放並使你們能夠創新,所有不同的創新才成為可能。因此,我們以整合的方式設計,但以拆分的方式提供給您,以便您可以構建模組化系統。Blackwell平台已經到來。我們公司以一年為節奏。我們的基本理念非常簡單:第一,按整個數據中心的規模來建造,將其拆分,並以每年一次的節奏將其以零部件形式出售給您。我們將一切推向技術極限:無論是台積電的製程技術,我們將其推向絕對極限;無論是封裝技術,推向絕對極限;無論是記憶體技術,推向絕對極限;SerDes技術、光學技術,一切都推向極限。然後,在此之後,以確保我們所有軟體都能在整個安裝基礎上運行的方式來做每一件事。

Software inertia is the single most important thing in computing. When a computer is backward compatible and architecturally compatible with all the software that has already been created, your ability to upgrade and your ability to buy new systems become incredibly easy. So we want to make sure that our entire platform is architecturally compatible with the future and the past, as much as possible. So we build the entire data center, and we do everything so that it fits into the one-year rhythm of our company. Lastly, make everything open. Work with the world’s computing ecosystem, the most advanced companies, the most advanced engineers and computer scientists, and make our platform open so that we can advance computing at the fastest rate possible. Every single computer manufacturer in the world is now building Blackwell. Some of them, of course, are our partners in Taiwan: ASUS, GIGABYTE, ASRock, PEGATRON, Quanta, Inventec, the entire Taiwanese computing industry. This ecosystem is going to make it possible for the world to advance computing at the fastest rate possible, and with their partnership, their hard work, their ingenuity, and their dedication, we’re going to make that happen. Thank you, Taiwan.

軟體慣性是計算機中最重要的事情。當一台計算機向後兼容,並且與所有已經創建的軟體在架構上兼容時,您的升級能力和購買新系統的能力就變得非常容易。因此,我們希望確保我們的整個平台盡可能在架構上兼容未來和過去。因此,我們建造了整個數據中心,並做了一切事情,使其符合我們公司的年度節奏。最後,開放一切。與世界計算生態系統合作,與最先進的公司、最先進的工程師和計算機科學家合作,使我們的平台開放,以便我們能夠以最快的速度推進計算。世界上每一個計算機製造商現在都在製造Blackwell。當然,其中一些是我們在台灣的合作夥伴:華碩、技嘉、華擎、和碩、廣達、英業達,整個台灣計算產業。這個生態系統將使世界能夠以最快的速度推進計算,並且通過他們的合作、他們的努力、他們的創新、他們的奉獻,我們將實現這一目標。謝謝你們,台灣。


The next version of Blackwell, we call it Grace Hopper MGX. MGX is designed to be a modular system, and it is designed to be modular so that you could have one CPU, two GPUs, or two CPUs, four GPUs. You could connect multiple modules together into one configuration. It gives you an enormous amount of flexibility for you to build your own systems.

Blackwell的下一個版本,我們稱之為Grace Hopper MGX。MGX被設計為一個模組化系統,它被設計為模組化,以便你可以有一個CPU,兩個GPU,或兩個CPU,四個GPU。你可以將多個模組連接在一起形成一個配置。它為你提供了巨大的靈活性,使你能夠構建自己的系統。
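The modular composition he describes (1 CPU + 2 GPUs, or 2 CPUs + 4 GPUs, with multiple modules joined into one configuration) can be sketched as follows. The fixed 2:1 GPU-to-CPU ratio used for validation is an assumption drawn from the two examples in the talk, not an MGX specification:

```python
# Sketch of MGX-style modular composition: configurations are built from
# CPU/GPU module counts, and modules can be connected into larger systems.
# The 2:1 GPU-to-CPU ratio check is an illustrative assumption.

def mgx_config(cpus: int, gpus: int) -> dict:
    if gpus != 2 * cpus:
        raise ValueError("this sketch assumes a fixed 2:1 GPU-to-CPU ratio")
    return {"cpus": cpus, "gpus": gpus}

def connect(*modules: dict) -> dict:
    # Multiple modules joined into one larger configuration.
    return {
        "cpus": sum(m["cpus"] for m in modules),
        "gpus": sum(m["gpus"] for m in modules),
    }

small = mgx_config(1, 2)
large = connect(mgx_config(2, 4), mgx_config(2, 4))
print(small, large)  # {'cpus': 1, 'gpus': 2} {'cpus': 4, 'gpus': 8}
```

The flexibility claim is exactly this composability: system builders pick module counts and connect them, rather than accepting one fixed box.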

Let me show you how MGX, Grace Hopper, Blackwell, Spectrum-X, and all of it come together to build the world’s largest systems. Ladies and gentlemen, let me introduce to you the new NVIDIA AI supercomputer, Israel-1. Israel-1 was designed in close collaboration with our partners in Taiwan, Korea, and Japan. There are several reasons why we did that. First of all, Taiwan is the home of our manufacturing partners. You are our most important partners, and in many cases we co-design the systems together. We co-designed Blackwell, we co-designed the Hopper platform, we co-designed Israel-1, and many of these systems will be built by our manufacturing partners here in Taiwan. So it was very important to us to debut this machine here. This machine is designed to handle AI models on the scale of 100 trillion parameters. It is the most powerful supercomputer in the world. Let me show you some of the specs. The computational throughput is 400 exaFlops for training one large language model in memory. This is a 16x throughput improvement over the fastest supercomputer we have in operation today. BlueField-3, BlueField-4: the amount of computing throughput, the amount of networking throughput, it’s literally unbelievable.

讓我向您展示MGX、Grace Hopper、Blackwell、Spectrum-X以及所有這些如何結合在一起,構建世界上最大的系統。女士們,先生們,讓我向您介紹新的NVIDIA AI超級計算機,Israel-1。Israel-1是我們與台灣、韓國和日本的合作夥伴緊密合作設計的。我們這樣做有幾個原因。首先,台灣是我們製造合作夥伴的所在地。你們是我們最重要的合作夥伴,在許多情況下,我們共同設計系統。我們共同設計了Blackwell,我們共同設計了Hopper平台,我們共同設計了Israel-1,許多這些系統將由我們在台灣的製造合作夥伴建造。因此,對我們來說,在這裡首次展示這台機器非常重要。這台機器被設計為處理規模達100萬億參數的AI模型。它是世界上最強大的超級計算機。讓我向您展示一些規格。計算吞吐量為400 exaFlops,可在記憶體中訓練一個大型語言模型。這是我們今天運行的最快超級計算機的16倍吞吐量提升。BlueField-3、BlueField-4:計算吞吐量和網路吞吐量,真的是令人難以置信。

There’s a lot of technology that we put into this. We’re running it and co-developing it with our partners in Taiwan, and it will be available to all the CSPs, all the clouds, all the hyperscalers, and of course even the on-prem companies and our telecommunication partners. So, ladies and gentlemen, we put all of this technology together to create a system that is absolutely incredible: NVIDIA Israel-1. The technology inside it is absolutely incredible. The amount of throughput, the cost savings, and our ability to drive down the cost of training AI and the cost of inference are going to make AI much, much more accessible to many companies all over the world. I have one more thing to talk to you about, and it is this: we are on the verge of a new industrial revolution, the fourth industrial revolution, and we believe there are several parts of it that are very different from the prior ones. Of course, the first one was the steam engine, then we had electricity, and after that we had computers. This next one, of course, is about AI and robotics and Omniverse: the ability to simulate all kinds of digital data and real-time data, and the ability to deploy and orchestrate it in the real world. This is going to be the fourth industrial revolution, and it’s going to revolutionize many, many industries. This is a factory that was created by a great company, Foxconn. Foxconn designed this factory, and we collaborated with them very closely. I want to show you an example of how our technologies will all come together to power the fourth industrial revolution. Foxconn is at the forefront of a new industrial revolution, and it begins in its factories. Partnering with NVIDIA, Foxconn’s Smart Manufacturing Initiative is revolutionizing automation and robotics.

我們在這裡投入了大量的技術。我們正在運行它,與我們在台灣的合作夥伴共同開發,它將可供所有CSP,所有雲端,所有超大規模計算公司,甚至當然還有本地公司,我們的電信合作夥伴使用。因此,女士們,先生們,我們將所有這些技術結合在一起,創建了一個絕對令人難以置信的系統。NVIDIA Israel-1。這裡面的技術是絕對令人難以置信的。吞吐量的數量,成本節省,我們現在降低AI訓練成本和推理成本的能力,將使AI變得更容易讓世界各地的許多公司接觸。我還有一件事要告訴你們,那就是:我們正處於工業革命的邊緣,第四次工業革命,在這次革命中,我們相信它的幾個部分與之前的工業革命非常不同。當然,第一次是蒸汽機,然後是電力,然後是計算機。當然,這下一個是關於AI和機器人技術以及Omniverse,模擬各種數字數據、即時數據並將其部署到現實世界中的能力。這將是第四次工業革命,它將革新許多行業。這是一家偉大公司,富士康創建的工廠。富士康設計了這家工廠,我們與他們密切合作。我想向你們展示一個例子,說明我們的技術如何結合在一起,為第四次工業革命提供動力。富士康處於新工業革命的前沿,這始於其工廠。與NVIDIA合作,富士康的智慧製造計劃正在革新自動化和機器人技術。

The factory you see here is real, digitally recreated and designed in NVIDIA Omniverse. This enables real-time AI training, testing, and digital twin simulation. Foxconn’s industrial AI platform combines Isaac Sim, Metropolis, and Nvidia AI, leveraging the latest advancements in generative AI, robotics simulation, and 3D visualization. AI-enabled robotics can handle dangerous tasks, increase productivity, and enable faster response times to customer demands. Industrial manufacturing demands safety, precision, and reliability, and NVIDIA and Foxconn are bringing cutting-edge AI and robotics technologies into their factories. AI-enabled robots work together, allowing for the safe and efficient completion of tasks. Foxconn’s Smart Manufacturing facilities are powered by NVIDIA AI technology, transforming how products are manufactured and creating a new industrial revolution.

你們在這裡看到的工廠是真實的,數位重建並在NVIDIA Omniverse中設計。這使實時AI訓練、測試和數字雙胞胎模擬成為可能。富士康的工業AI平台結合了Isaac Sim、Metropolis和Nvidia AI,利用生成型AI、機器人模擬和3D可視化的最新進展。AI驅動的機器人可以處理危險任務,提高生產力,並能夠更快地響應客戶需求。工業製造要求安全、精確和可靠,NVIDIA和富士康正在將尖端的AI和機器人技術引入他們的工廠。AI驅動的機器人協同工作,允許安全高效地完成任務。富士康的智慧製造設施由NVIDIA AI技術提供支持,改變了產品的製造方式,創造了一場新的工業革命。

What you just saw is not a dream. It’s not even a science project. It’s running in production. Today, they are improving productivity, reducing costs, and improving the quality of everything that they do. Today, they’re in production. They’ve been running for the last couple of years, and this revolution, the fourth industrial revolution, is happening. It’s happening all over the world. The level of automation, the level of robotics, the level of digitization is just incredible. The most important thing, of course, is to engage and to inspire. We need the next generation of scientists and technologists and engineers to come and imagine and create and invent and build. The only way to do that, of course, is to put technology in their hands. One of the things that we love to do is to take our technology and put it in the hands of researchers, engineers, computer scientists, and even artists, and inspire them to do their best work. I’m really excited to share this next part with you.

你剛才看到的不是夢想。這甚至不是一個科學項目。它正在生產中運行。今天,他們正在提高生產力,降低成本,並提高他們所做的一切的質量。今天,他們正在生產中。他們已經運行了幾年,這場革命,第四次工業革命,正在發生。這正在全世界發生。自動化水平、機器人水平、數字化水平真是令人難以置信。當然,最重要的是參與和啟發。我們需要下一代科學家、技術專家和工程師來想像、創造、發明和建設。當然,做到這一點的唯一方法是將技術放在他們手中。我們喜愛做的一件事是將我們的技術放在研究人員、工程師、計算機科學家甚至藝術家的手中,啟發他們做出最好的工作。我非常興奮地與你分享這下一部分。

These are the very first NVIDIA Fellows for the NVIDIA HBCU Fellowship program. My name is Anthony Gaskins. I’m a rising fourth-year Ph.D. student at North Carolina A&T State University. What is incredibly exciting about the future of AI is the idea that these systems can go on and be independent and intelligent. My name is Breannah Carson. I am a rising third-year Ph.D. student at Florida Agricultural and Mechanical University. My name is Michael Scott. I am a Ph.D. student at Morgan State University. Artificial intelligence is something that will take over the entire world. The possibilities are limitless. My name is Toni N. Adams, and I’m a fourth-year Ph.D. student at Florida A&M University. Being able to use Omniverse to build out a warehouse in virtual space allows us to scale it up as large as we want. This, I’m very excited about. Last year, we made a decision that I think will benefit a lot of people, and that is to put Omniverse in the hands of every single engineering student in Taiwan. All of you, this is a real-time simulator. It is a collaborative platform for engineers, scientists, artists, creators. It’s used for connecting robotics, connecting IoT devices, and for connecting people. You can all work together in this virtual world, and it’s used for simulating digital twins of factories, buildings, cities, and all kinds of things. It is an amazing, amazing piece of technology. It’s been used to create beautiful films, like the ones that I showed you earlier. It’s been used for the creation of artificial intelligence, trained on simulated data. It’s now going to be available to every single engineering student in Taiwan.

這些是第一批NVIDIA HBCU獎學金計劃的NVIDIA研究員。我的名字是Anthony Gaskins。我是北卡羅來納農工州立大學即將升入四年級的博士生。關於AI未來,令人難以置信的激動人心的想法是,這些系統可以獨立存在並且智能化。我的名字是Breannah Carson。我是佛羅里達農工大學即將升入三年級的博士生。我的名字是Michael Scott。我是摩根州立大學的博士生。人工智能是一個將接管整個世界的東西。可能性是無限的。我的名字是Toni N. Adams,我是佛羅里達農工大學的四年級博士生。能夠使用Omniverse在虛擬空間中建造一個倉庫,使我們能夠將其擴展到我們想要的任何規模。這讓我非常興奮。去年,我們做出了一個決定,我認為這將使許多人受益,那就是將Omniverse放在每一位台灣工程學生的手中。你們所有人,這是一個實時模擬器。這是一個為工程師、科學家、藝術家、創作者設計的協作平台。它用於連接機器人、連接物聯網設備,並用於連接人們。你們都可以在這個虛擬世界中一起工作,它用於模擬工廠、建築、城市和各種事物的數字雙胞胎。這是一個驚人的,驚人的技術作品。它已被用來創造美麗的電影,就像我之前展示給你們的那些。它還被用於創建人工智能,基於模擬數據進行訓練。現在,它將向每一位台灣工程學生開放。

The reason for that is because this is the way we will create the future of computing and the future of engineering and the future of many things, whether it’s in healthcare, manufacturing, or whatever industry. If you would like to try it, go to Omniverse, and we would love to get your feedback and love to see the great things that you create. Thank you for inviting me to Computex. It’s great to be back in Taiwan. I hope to see you guys at the night market. Have a great time. Thank you.

這樣做的原因是,這是我們創造計算未來、工程未來以及許多事物未來的方式,無論是醫療、製造業還是任何行業。如果您想嘗試一下,請訪問Omniverse,我們非常希望收到您的反饋,也希望看到您創造的偉大事物。感謝您邀請我來Computex。很高興回到台灣。我希望在夜市見到你們。祝你們玩得開心。謝謝。

圖片出處:  https://news.housefun.com.tw/news/article/116788425243.html
