What I really enjoy is a bacon sandwich, but that's not what I wanted to talk about here. Another thing I enjoy is watching startups evolve from a twinkle in their founder's eye to offering their first product.
In the mists of time we used to call May 2020 (I'm writing these words two years later), Jim Turley penned a column here on EE Journal titled “Creating a Universal Processor.” The focus of that column was a new type of processor called Prodigy, which was being developed by a startup called Tachyum.
As Jim said about the Tachyum folks at the time, “Their goals are nothing if not bold. Prodigy is faster than Intel's Xeon, yet it consumes one-tenth the power of current server processors. It has one-quarter the total cost of ownership (TCO). It occupies less silicon than an ARM design. It runs both AI and hyperscale server workloads with equal aplomb on a single chip. Versions will scale from 16 to 128 cores, all in the same 6,400-ball package. It's also 3x cheaper than Intel and AMD chips. That last one isn't hard to achieve. The others? We'll have to wait and see.”
Just a year after Jim's column, and a year before the piece you are currently perusing, I posed a question in a column of my own: “Are You Ready for Human Brain-Scale AI?” I've been refreshing my memory about that column, which is open on one of my screens as we speak. (When I say “speak,” I'm speaking figuratively, figuratively speaking, which reminds me of the saying, “In order to understand recursion, you must first understand recursion.”) In that column, I mentioned Johnny and the Bomb by the late, great Terry Pratchett (but, as Basil Fawlty famously said, “I think I'm running away with myself”).
The reason I broached this book is that it features a character called Mrs. Tachyon, who pushes a supermarket trolley around while uttering unexpected exclamations and cryptic twitterings that no one can understand. It's fair to say that Mrs. Tachyon's effect on those around her is hard to describe. Eventually, it turns out that her trolley functions as a time machine, which explains a lot of things that confused us earlier in the story, but let's not get lost in the weeds. Whenever I hear news about Tachyum, Mrs. Tachyon pops into my mind (I say this with a laugh). Thank you for bearing with me (cue maniacal laughter).
The reason I've been musing on all this is that the wait is almost over: Tachyum has officially launched Prodigy, the “world's first universal processor.” The underlying idea is that people currently use different types of processors to perform different types of tasks: central processing units (CPUs) for general-purpose processing, graphics processing units (GPUs) for graphics, and AI accelerators for data-parallel and artificial intelligence (AI) applications.
A really (really) simplified way to visualize this is that CPUs are best at working with scalar values, GPUs are best with vector values, and AI accelerators are best with matrix values. What Prodigy does is integrate CPU, GPU, and TPU functionality into a single architecture implemented in a single monolithic device.
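To make the scalar/vector/matrix distinction a little more concrete, here is a minimal plain-Python sketch of what each kind of operation looks like. It is purely illustrative and has nothing to do with how Prodigy is actually programmed; the function names are hypothetical, and the mapping of each operation to a processor type simply mirrors the deliberately simplified picture painted above.

```python
from __future__ import annotations

from collections.abc import Sequence


def scalar_op(a: float, b: float) -> float:
    """Scalar: a single multiply-add on single values (classic CPU territory)."""
    return a * b + 1.0


def vector_op(x: Sequence[float], y: Sequence[float]) -> float:
    """Vector: the same operation applied across many elements, here a dot product
    (the kind of data-parallel work GPUs excel at)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def matrix_op(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> list[list[float]]:
    """Matrix: a full matrix multiply, the workhorse of neural-network layers
    (AI-accelerator territory). Each output element is a dot product of a row of a
    with a column of b."""
    cols = list(zip(*b))  # transpose b so we can iterate over its columns
    return [[vector_op(row, col) for col in cols] for row in a]


if __name__ == "__main__":
    print(scalar_op(2.0, 3.0))                          # 7.0
    print(vector_op([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
    print(matrix_op([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
    # [[19.0, 22.0], [43.0, 50.0]]
```

Note how each level is built from the one below it: a matrix multiply is a grid of dot products, and a dot product is a run of scalar multiply-adds. A “universal” processor aims to run all three levels efficiently without handing work off to a separate chip.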
Tachyum's Prodigy is the world's first universal processor.
(Image source: Tachyum)
Note that, in a computing context, the TLA (three-letter abbreviation) TPU is usually assumed to stand for “tensor processing unit.” This is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning, particularly using Google's own TensorFlow software. Google started using TPUs internally in 2015 and made them available to third parties in 2018. In the context of this column, however, I understand that TPU stands for “Tachyum Processing Unit.”
Tachyum's first commercial product, the Prodigy Cloud/AI/HPC supercomputer processor chip, which is implemented in an advanced 5nm process technology, delivers up to 4x the performance of the fastest Xeon, 3x the raw performance of NVIDIA's H100 on HPC workloads, and 6x the raw performance of the H100 on AI training and inference workloads, with up to 10x the performance at the same power.
FPGA-based Prodigy prototype (Image source: Tachyum)
The folks at Tachyum say Prodigy is poised to overcome the challenges of rising data center power consumption, low server utilization, and stalled performance scaling. Some of the highlights of the newly launched Prodigy processor are as follows:
- 128 high-performance unified 64-bit cores running at up to 5.7 GHz
- 16 DDR5 memory controllers
- 64 PCIe 5.0 lanes
- Multiprocessor support for 4-socket and 2-socket platforms
- Rack solutions for both air-cooled and liquid-cooled data centers
- SPECrate 2017 Integer performance around 4x that of the Intel 8380 and around 3x that of the AMD 7763
- HPC double-precision floating-point performance 3x that of the NVIDIA H100
- AI FP8 performance 6x that of the NVIDIA H100
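For a rough sense of scale, the 64 PCIe 5.0 lanes in the list above can be turned into a raw link-bandwidth figure. The sketch below assumes the standard PCIe 5.0 signaling rate of 32 GT/s per lane with 128b/130b line encoding, and it ignores packet and protocol overhead, so real-world throughput will be lower. This is a back-of-the-envelope calculation of my own, not a figure published by Tachyum, and the function name is hypothetical.

```python
# Raw PCIe 5.0 link bandwidth, ignoring packet/protocol overhead.
# Assumptions: 32 GT/s per lane (PCIe 5.0) and 128b/130b line encoding.

PCIE5_GT_PER_S = 32.0            # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130  # 128 payload bits carried per 130 bits on the wire


def pcie_bandwidth_gbytes(lanes: int, gt_per_s: float = PCIE5_GT_PER_S) -> float:
    """Return the raw one-direction bandwidth in GB/s for the given lane count."""
    gbits_per_lane = gt_per_s * ENCODING_EFFICIENCY  # one bit per transfer per lane
    return lanes * gbits_per_lane / 8.0              # convert bits to bytes


if __name__ == "__main__":
    per_direction = pcie_bandwidth_gbytes(64)
    print(f"{per_direction:.1f} GB/s per direction")        # 252.1 GB/s per direction
    print(f"{2 * per_direction:.1f} GB/s both directions")  # 504.1 GB/s both directions
```

PCIe links are full duplex, which is why the total for both directions is simply twice the per-direction figure.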
Unlike other CPU and GPU solutions, Prodigy was designed from the ground up to handle vector and matrix processing, rather than having it bolted on as an afterthought. Prodigy's vector and matrix features include support for a range of data types (FP64, FP32, TF32, BF16, Int8, FP8, and TAI); 2 x 1024-bit vector units per core; AI sparsity and super-sparsity support; and no penalty for misaligned vector loads or stores, even when they cross cache lines. This built-in support delivers high performance for AI training and inference workloads, increasing performance while reducing memory utilization.
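One way to appreciate the “2 x 1024-bit vector units per core” figure is to work out how many elements of each data type those units can hold at once. The sketch below assumes standard storage widths for each type (with TF32 assumed to occupy a 32-bit container) and assumes both units operate on full-width vectors in parallel; Tachyum's TAI format is left out because its width isn't stated here. The names are hypothetical, and this says nothing about actual per-cycle throughput, which also depends on issue rates and pipelines.

```python
# Elements that fit across Prodigy's per-core vector hardware, assuming
# 2 vector units x 1024 bits each and standard storage widths per data type.

VECTOR_UNITS_PER_CORE = 2
VECTOR_WIDTH_BITS = 1024

# Storage width in bits for each data type (TF32 assumed to sit in a 32-bit slot).
# TAI is omitted because its width is not given.
DATA_TYPE_BITS = {
    "FP64": 64,
    "FP32": 32,
    "TF32": 32,
    "BF16": 16,
    "FP8": 8,
    "Int8": 8,
}


def lanes_per_core(bits: int) -> int:
    """Number of elements held across both vector units for a given element width."""
    return VECTOR_UNITS_PER_CORE * (VECTOR_WIDTH_BITS // bits)


if __name__ == "__main__":
    for name, bits in DATA_TYPE_BITS.items():
        print(f"{name:>5}: {lanes_per_core(bits)} elements per core")
```

The pattern is simple: every halving of the element width doubles the number of elements, going from 32 FP64 values per core up to 256 FP8 or Int8 values, which is a big part of why low-precision formats are so attractive for AI workloads.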
The Tachyum guys and gals are happy to talk to anyone who will listen: “Prodigy significantly outperforms the best-performing processors currently available in the hyperscale, HPC, and AI markets. Prodigy delivers up to 3x the performance of the highest-performing x86 processors for cloud workloads, and up to 3x (for HPC) and up to 6x (for AI applications) the performance of the highest-performing GPUs. By delivering more performance with less power, Prodigy offers unmatched reductions in carbon dioxide emissions, thereby solving the problem of sustainable data center growth. This is especially important as the pervasiveness of AI continues to attract attention. Prodigy sets a precedent as part of this new world market, enabling data centers to reduce their TCO.”
To back all of this up, the chaps and chapesses at Tachyum spent a lot of time presenting me with an amazing array of charts and graphs, including the following:
- Prodigy vs. x86 (AMD 7763 and Intel 8380): FP64 raw floating-point performance
- Prodigy vs. NVIDIA H100 GPU (H100 DP and H100 AI): HPC and AI
- Prodigy vs. AMD MI250X GPU (MI250X DP and MI250X AI): HPC and AI
- Prodigy vs. x86: SPECrate 2017 Integer (AMD 7763 and Intel 8380 performance)
- Prodigy vs. NVIDIA H100: rack-level comparison (H100 DGX POD vs. Prodigy air-cooled and liquid-cooled racks)
I'm not as stupid as I look (but then, who could be?). Using my amazing wetware processor (one of my three favorite organs), I detected a subtle pattern running through all of the bar charts: namely, the columns representing Tachyum's Prodigy stand proudly above those of the competing products.
Sampling of Prodigy will begin later this year, with volume production slated for the first half of 2023. I don't know about you, but I can't wait to meet and greet the first Prodigy chips. How about you? Do you have any thoughts you'd care to share?