Tachyum Runs Supercomputing Vector LINPACK on Prodigy FPGA

December 6, 2023 · 4 min read

LAS VEGAS, December 6, 2023 – Tachyum®, creator of Prodigy®, the world’s first Universal Processor, today announced it has successfully completed vector-based High-Performance LINPACK (HPL) testing using 1,024-bit (1Kb) vectors on the Prodigy FPGA.

LINPACK, a widely used benchmarking program for supercomputers, measures a system’s floating-point computing power by solving a random, dense system of linear equations to determine performance and accuracy. Following scalar LINPACK benchmarks using Prodigy’s IEEE-compliant scalar Floating-Point Unit (FPU), Tachyum has now advanced to vector LINPACK.
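To make concrete what a LINPACK-style measurement involves, here is a minimal sketch in plain Python. It is not Tachyum's test setup and not the HPL code itself: it solves a random dense system with Gaussian elimination and partial pivoting, estimates a floating-point rate from the nominal operation count of roughly 2/3·n³ + 2·n² traditionally associated with LINPACK, and checks accuracy through the residual. The function names `solve_dense` and `linpack_like` are illustrative assumptions.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_like(n, seed=0):
    """Return (estimated FLOP/s, max-norm residual) for one random n x n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero timer reading
    nominal_flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # nominal count, not a measured one
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return nominal_flops / elapsed, residual
```

Calling `linpack_like(200)` returns a rate and a residual for a 200×200 system. Real HPL runs use far larger, blocked, distributed problems and a scaled residual check, so the numbers from this sketch are only illustrative.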

Prodigy’s vector unit was designed with a range of cutting-edge features to provide industry-leading performance. Prodigy has two vector pipelines, each with a 1,024-bit-wide data path. Executing two 1,024-bit SIMD (Single Instruction/Multiple Data) operations per cycle, each Prodigy core performs 32 double-precision IEEE fused multiply-add (FMA) operations per cycle. Because each FMA consists of a multiplication and an addition, this amounts to 64 double-precision floating-point operations per cycle per core.
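The per-cycle figures follow from simple arithmetic on the numbers quoted above: two pipelines, 1,024-bit vectors, and 64-bit double-precision elements. The short sketch below reproduces that calculation; the function name `flops_per_cycle` and its parameter names are illustrative, not part of any Tachyum tooling.

```python
def flops_per_cycle(pipelines=2, vector_bits=1024, element_bits=64):
    """Return (FMA operations, floating-point operations) per cycle per core."""
    lanes = vector_bits // element_bits   # 1024 / 64 = 16 doubles per vector
    fmas = pipelines * lanes              # one FMA per lane in each pipeline
    return fmas, 2 * fmas                 # each FMA counts as a multiply plus an add


fmas, flops = flops_per_cycle()
print(fmas, flops)  # prints "32 64"
```

Multiplying the second value by a clock frequency and a core count would give a theoretical per-chip peak; the clock rate is not stated in this announcement, so that step is left out here.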

Prodigy’s memory access micro-architecture uses an innovative approach to support unaligned data, meaning that unaligned operations are processed without incurring the performance loss that many other architectures in the market encounter. These features, along with high clock rates, enable Tachyum to deliver the highest vectorized data processing performance in the industry. Additionally, masking, unaligned memory access and loop control operations of the Prodigy architecture allow for efficient auto-vectorization in compilers. Compilers from Tachyum’s software ecosystem already fully utilize these features.

Vector instructions have been verified running cleanly on the Prodigy FPGA. This covers vectorization in the GNU Compiler Collection (GCC), vectorized libraries, and vector support in Linux, as well as applications that run and deliver correct results.

With vector LINPACK completed, Tachyum is now focused on the final stage of vector unit verification and testing with FPU: AI matrix operations.

“Supercomputing is more than merely an exponent of standard computing. There are complex processes to measure performance and capability: ensuring that all bits in vectors are correctly wired, that IEEE flags are correctly reported, and many cases of data-shuffling vector operations, but we got there,” said Dr. Radoslav Danilak, founder and CEO of Tachyum. “We can see the light at the end of the tunnel as we move towards volume production in 2024, and towards fulfilling our multibillion-dollar sales pipeline.”

As a Universal Processor offering utility for all workloads, Prodigy-powered data center servers can seamlessly and dynamically switch between computational domains (such as AI/ML, HPC, and cloud) on a single architecture. By eliminating the need for expensive dedicated AI hardware and dramatically increasing server utilization, Prodigy reduces CAPEX and OPEX significantly while delivering unprecedented data center performance, power, and economics. Prodigy integrates 192 high-performance, custom-designed 64-bit compute cores to deliver up to 4.5x the performance of the highest-performing x86 processors for cloud workloads, up to 3x that of the highest-performing GPU for HPC, and 6x for AI applications.

A video demonstrating the vector LINPACK on Prodigy FPGA is available for viewing below.

Follow Tachyum

https://twitter.com/tachyum

https://www.linkedin.com/company/tachyum

https://www.facebook.com/Tachyum/

About Tachyum

Tachyum is transforming the economics of AI, HPC, public and private cloud workloads with Prodigy, the world’s first Universal Processor. Prodigy unifies the functionality of a CPU, a GPU, and a TPU in a single processor to deliver industry-leading performance, cost and power efficiency for both specialty and general-purpose computing. As global data center emissions continue to contribute to a changing climate, with projections of their consuming 10 percent of the world’s electricity by 2030, the ultra-low power Prodigy is positioned to help balance the world’s appetite for computing at a lower environmental cost. Tachyum recently received a major purchase order from a US company to build a large-scale system that can deliver more than 50 exaflops performance, which will exponentially exceed the computational capabilities of the fastest inference or generative AI supercomputers available anywhere in the world today. When complete in 2025, the Prodigy-powered system will deliver a 25x multiplier vs. the world’s fastest conventional supercomputer – built just this year – and will achieve AI capabilities 25,000x larger than models for ChatGPT4. Tachyum has offices in the United States and Slovakia. For more information, visit https://www.tachyum.com/.