Our software engineers are working hard to enable the full potential of the Tachyum Prodigy® Universal Processor from day one with a rich ecosystem of applications, system software, and frameworks and libraries that are ported to run natively on Prodigy.
Furthermore, the Prodigy ecosystem is continuing to grow with an exciting roadmap that is constantly adding new software.
In addition, Prodigy is able to run binaries for x86, Arm, and RISC-V, enabling quick, easy, out-of-the-box testing and evaluation for customers and partners using their existing ISA before running their applications natively on Prodigy.
64KB Pages for OS and Apps
64K pages enable much higher performance for the operating system and applications and are the preferred mode for Tachyum Prodigy.
64K page support includes 512MB THP and, once it is released in Linux, 32MB mTHP (multi-size THP) pages on Tachyum Prodigy.
Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.
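The effect of page size on translation overhead can be illustrated with simple arithmetic. The sketch below counts how many pages cover a working set and how much memory a TLB can map at once for the page sizes named above. The 64 GiB working set and the 1024-entry TLB are hypothetical values chosen for illustration, not Prodigy specifications.

```python
# Page sizes named in the text above, in bytes.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PAGE_SIZES = {
    "4KB base page": 4 * KIB,
    "64KB base page": 64 * KIB,
    "32MB mTHP": 32 * MIB,
    "512MB THP": 512 * MIB,
}


def pages_needed(working_set_bytes, page_size_bytes):
    """Number of pages (and so distinct translations) covering a working set."""
    return -(-working_set_bytes // page_size_bytes)  # ceiling division


def tlb_reach(entries, page_size_bytes):
    """Bytes of memory that a TLB with `entries` entries can map at once."""
    return entries * page_size_bytes


if __name__ == "__main__":
    working_set = 64 * GIB  # hypothetical working set
    tlb_entries = 1024      # hypothetical TLB size, not a Prodigy figure
    for name, size in PAGE_SIZES.items():
        print(
            f"{name}: {pages_needed(working_set, size)} pages, "
            f"TLB reach {tlb_reach(tlb_entries, size) // MIB} MiB"
        )
```

Larger pages shrink the number of translations needed for the same working set, which is why fewer TLB lookups miss.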
The operating system, file systems, and applications will be optimized for a 16KB Indirection Unit (IU) for SSDs to prepare for high-capacity SSDs.
64K pages will be the default for Tachyum Linux and FreeBSD. 4KB-page builds will be custom-built only for customers with special requirements and agreements with Tachyum.
AutoFDO
Tachyum’s Prodigy Universal Processor platform supports out-of-box AutoFDO flow for hardware-assisted profile collection, offering customers a choice of optimization techniques depending on their specific needs.
PERF Instruction Profiling
perf can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is capable of lightweight profiling. It is also included in the Linux kernel, under tools/perf, and is frequently updated and enhanced.
Watchdog
Prodigy’s watchdog timer is used to detect and recover from device or system malfunction and is used to facilitate automatic correction of temporary hardware faults and to prevent errant or malevolent software from disrupting system operations.
Available as both a hardware and a virtualized timer, Prodigy’s watchdog is part of a self-healing infrastructure within datacenters. It allows servers that hang to recover automatically, without human intervention. This functionality is important for organizations with hundreds to thousands of servers.
Prodigy’s hardware watchdog timer must be cleared by the operating system within a programmable timeout value. If it is not cleared in time, it triggers a non-maskable interrupt that initiates corrective actions: a soft reset, followed by an attempt to boot the OS on a single core, save the kernel dump, and then reboot. In a virtualized environment, the watchdogs are themselves virtualized so that each VM maintains an individual timer.
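The clear-or-expire behavior described above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not Prodigy's hardware interface: the class `WatchdogTimer`, the helper `run`, and the tick-based notion of time are all hypothetical names introduced for illustration.

```python
class WatchdogTimer:
    """Toy watchdog: it expires unless cleared within `timeout` ticks."""

    def __init__(self, timeout):
        if timeout <= 0:
            raise ValueError("timeout must be positive")
        self.timeout = timeout    # programmable timeout value, in ticks
        self.remaining = timeout
        self.expired = False

    def clear(self):
        """Called by the (simulated) OS to show that it is still alive."""
        if not self.expired:
            self.remaining = self.timeout

    def tick(self):
        """Advance time by one tick; return True once the timer has expired."""
        if not self.expired:
            self.remaining -= 1
            if self.remaining <= 0:
                self.expired = True
        return self.expired


def run(timeout, total_ticks, hang_at=None):
    """Simulate an OS that clears the watchdog every tick until it hangs.

    Returns the tick at which the watchdog expired, or None if it never did.
    """
    wd = WatchdogTimer(timeout)
    for t in range(total_ticks):
        if hang_at is None or t < hang_at:
            wd.clear()
        if wd.tick():
            return t  # a real system would start its recovery sequence here
    return None


if __name__ == "__main__":
    print("healthy OS:", run(timeout=3, total_ticks=10))
    print("OS hangs at tick 4:", run(timeout=3, total_ticks=10, hang_at=4))
```

Giving each simulated VM its own `WatchdogTimer` instance mirrors the per-VM timers mentioned above: one guest hanging does not affect another guest's timer.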
Emulators
gdbsim
This CPU emulator can be used either as a standalone functional emulator of the Prodigy ISA or as part of cross-mode GDB, which allows a Prodigy user-mode application to be debugged on any other host platform.
QEMU
QEMU is a generic and open-source machine emulator and virtualizer. It supports both User-mode emulation and Full-system emulation.
The Tachyum port can be used for two purposes: to emulate the Prodigy ISA on any other supported architecture, or to run pre-existing applications on a Prodigy Linux/BSD system.
When used as a user-mode binary emulator, QEMU enables dynamic binary translation from x86-64, ARM v8 and RISC-V.
Binary Tools
Debugging
GDB
GDB, the GNU Project debugger, allows you to see what is going on “inside” another program while it executes or what another program was doing at the moment it crashed.
With GDB, a Prodigy application can be debugged in various scenarios: as part of a cross-toolchain on a non-Prodigy host, using either the functional simulator or QEMU as the execution engine; by connecting, from any supported host, to a GDB server running on a native or full-system-emulation Linux/BSD kernel; or as a native Linux/BSD debugger.
OpenOCD
OpenOCD (“Open On-Chip Debugger”) connects to a target over a low-level JTAG interface to facilitate system bring-up and the initial stages of very low-level debugging.
KGDB
Kgdb is intended to be used as a source level debugger for the Linux kernel. It is used along with gdb to debug a Linux kernel.
JTAG Debugger
JTAG is more than debugging and programming. Processors use JTAG to provide access to their debug/emulation functions and all FPGAs and CPLDs use JTAG to provide access to their programming functions.
Compilers and Libraries
GCC
The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D, as well as libraries for these languages (libstdc++,…).
Existing since 1987, this compiler is a de-facto standard for almost every major CPU platform.
Glibc
The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more. These APIs include such foundational facilities as open, read, write, malloc, printf, getaddrinfo, dlopen, pthread_create, crypt, login, exit and more.
The GNU C Library is designed to be a backwards compatible, portable, and high-performance ISO C library. It aims to follow all relevant standards including ISO C11, POSIX.1-2008, and IEEE 754-2008.
Go Compiler
The Go language has always been defined by a spec, not an implementation. The Go team has written two different compilers that implement that spec: gc and gccgo. gc is the original compiler, and the go tool uses it by default; gccgo is a different implementation with a different focus.
Both of these compilers are natively supported by the Tachyum Prodigy® ISA.
Clang/LLVM
The Clang project provides a language front-end and tooling infrastructure for languages in the C language family (C, C++, Objective C/C++, OpenCL, CUDA, and RenderScript) for the LLVM project.
Clang is considered to be a production-quality C, Objective-C, C++ and Objective-C++ compiler. For example, Clang is used in production to build performance-critical software like Chrome or Firefox.
Boot Loaders and Monitors
UEFI
UEFI stands for “Unified Extensible Firmware Interface.” The UEFI Specification defines a new model for the interface between personal-computer operating systems and platform firmware. The interface consists of data tables that contain platform-related information, plus boot and runtime service calls that are available to the operating system and its loader. Together, these provide a standard environment for booting an operating system and running pre-boot applications.
OpenBMC
The OpenBMC project is a Linux Foundation collaborative open-source project whose goal is to produce an open-source implementation of the Baseboard Management Controllers (BMC) Firmware Stack. OpenBMC is a Linux distribution for BMCs meant to work across heterogeneous systems that include enterprise, high-performance computing (HPC), telecommunications, and cloud-scale data centers.
OS Kernels and Distro Tools
Linux kernel
The Linux kernel is a free and open-source, monolithic, Unix-like operating system kernel. It is deployed on a wide variety of computing systems, from personal computers, mobile devices, mainframes, and supercomputers to embedded devices, such as routers, wireless access points, private branch exchanges, set-top boxes, FTA receivers, smart TVs, personal video recorders, and NAS appliances.
Its availability, continuous development, and ongoing support have spawned a plethora of operating system distributions, commonly also called Linux.
FreeBSD
FreeBSD is an operating system used to power modern servers, desktops, and embedded platforms. A large community has continually developed it for almost thirty years. Its advanced networking, security, and storage features have made FreeBSD the platform of choice for many of the busiest web sites and most pervasive embedded networking and storage devices.
Virtualization & Interoperability
Linux KVM
KVM (for Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions. It consists of a loadable kernel module, kvm.ko, that provides the core virtualization infrastructure and a processor specific module, kvm-intel.ko or kvm-amd.ko.
Xen Project
The Xen Project is focused on advancing virtualization in a number of different commercial and open source applications, including server virtualization, Infrastructure as a Service (IaaS), desktop virtualization, security applications, embedded and hardware appliances, and automotive/aviation.
Docker
Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.
Programming Languages
C/C++
C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system.
C++ is a general-purpose programming language created by Bjarne Stroustrup as an extension of the C programming language, or “C with Classes”.
Fortran
Fortran is a general-purpose, compiled imperative programming language that is especially suited to numeric computation and scientific computing.
Go
Go is a statically typed, compiled programming language designed at Google. Go is syntactically similar to C, but with memory safety, garbage collection, structural typing, and CSP-style concurrency.
Java-JVM
JVM (Java Virtual Machine) is an abstract machine. It is a specification that provides a runtime environment in which Java bytecode can be executed.
Lua
Lua is a powerful, efficient, lightweight, embeddable scripting language. It supports procedural programming, object-oriented programming, functional programming, data-driven programming, and data description.
Perl
Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. It runs on over 100 platforms from portables to mainframes and is suitable for both rapid prototyping and large-scale development projects.
PHP
PHP is a popular open source general-purpose scripting language that is especially suited to web development and can be embedded into HTML.
Python
Python is an interpreted, high-level and general-purpose programming language. It is used successfully in thousands of real-world business applications around the world, including many large and mission critical systems.
Ruby
Ruby is an interpreted, high-level and general-purpose programming language with a focus on simplicity and productivity.
Tcl
Tcl is a high-level, general-purpose, interpreted, dynamic programming language. It is suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more.
Performance Testing and Optimization
SPEC’s Benchmarks
Benchmark suites developed by The Standard Performance Evaluation Corporation (SPEC).
SPECint2006
Test suite consists of 12 benchmark programs, designed to test exclusively the integer performance of the system.
SPECint2017
Test suite consists of benchmark programs designed to test exclusively the integer performance of the system.
SPECfp2006
Test suite contains 17 benchmark programs, designed to evaluate the floating-point operations performance of the system.
SPECfp2017
Test is organized in 2 suites: SPECrate 2017 Floating Point and SPECspeed 2017 Floating Point containing in total 23 benchmark programs, designed to evaluate the floating-point operations performance of the system.
LINPACK Benchmark
The Linpack Benchmark is a measure of a computer’s floating-point rate of execution. It is determined by running a computer program that solves a dense system of linear equations. The Linpack Benchmark is something that grew out of the Linpack software project. It was originally intended to give users of the package a feeling for how long it would take to solve certain matrix problems.
HPL (High Performance LINPACK)
HPL is a High-Performance Linpack benchmark implementation. The code solves a uniformly random system of linear equations and reports time and floating-point execution rate using a standard formula for operation count.
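As a rough illustration of how a floating-point rate follows from an operation count, the sketch below assumes the commonly cited count for solving a dense n-by-n system, 2/3 n^3 + 3/2 n^2. The function names are hypothetical, and the formula is stated here as an assumption rather than taken from this page.

```python
def hpl_flop_count(n):
    """Assumed operation count for an n-by-n dense solve: 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2


def hpl_gflops(n, seconds):
    """Floating-point rate in GFLOP/s for problem size n solved in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return hpl_flop_count(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical run: problem size 100000 solved in 600 seconds.
    print(f"{hpl_gflops(100_000, 600.0):.1f} GFLOP/s")
```

Because the count is dominated by the cubic term, doubling the problem size multiplies the work by roughly eight.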
PERF Counters
perf can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is capable of lightweight profiling. It is also included in the Linux kernel, under tools/perf, and is frequently updated and enhanced.
Error Handling
Workload Management
Web and E-mail
Apache
Apache is the most widely used open-source cross-platform web server software. The Apache HTTP Server Project is part of the Apache Software Foundation.
Postfix
Postfix is a free and open-source mail transfer agent that routes and delivers electronic mail.
Dovecot
Dovecot is an open-source IMAP and POP3 server for Unix-like operating systems, written primarily with security in mind.
Database Systems
MariaDB
MariaDB Server is one of the most popular open source relational databases. It’s made by the original developers of MySQL. MariaDB supports a lot of different storage engines.
MongoDB
MongoDB is a general purpose, document-based, distributed database built for modern application developers and for the cloud era. MongoDB is a NoSQL database program, which stores data in JSON-like documents with dynamic schema.
SQLite
SQLite is the most used database engine in the world. SQLite is a C-language library that implements a small, fast, self-contained, high reliability, full-featured SQL database engine.
RocksDB
RocksDB is a storage engine with key/value interface, where keys and values are arbitrary byte streams. It is a C++ library. It was developed at Facebook based on LevelDB and provides backwards-compatible support for LevelDB APIs.
Editors and VCS
Git
Git is an open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Subversion, Svn
Subversion (abbreviated Svn after its command name svn) is a software versioning and revision control system distributed as open source under the Apache License.
Vim
Vim is a highly configurable text editor built to make creating and changing any kind of text very efficient. It is included as “vi” with most UNIX systems and with Apple OS X.
Other Applications
Sed
Sed is a stream editor, which is used to perform basic text transformation on an input stream.
Gawk
Gawk is the GNU implementation of Awk, a specialized programming language for the easy manipulation of formatted text, such as tables of data.
Grep
Grep is a command-line utility for searching plain-text data sets for lines that match a regular expression.
gzip
gzip is a single-file/stream lossless data compression utility, where the resulting compressed file generally has the suffix .gz.
tar
GNU tar is an archiving program designed to store multiple files in a single file (an archive), and to manipulate such archives.
Ceph
Ceph is an open-source software storage platform, which implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage.
Operating System Libraries
Buildroot
Buildroot is a simple, efficient and easy-to-use tool to generate embedded Linux systems through cross-compilation.
It can handle everything: the cross-compilation toolchain, root filesystem generation, kernel image compilation, and bootloader compilation.
Thanks to its kernel-like menuconfig, gconfig and xconfig configuration interfaces, building a basic system with Buildroot is easy and typically takes 15-30 minutes.
Yocto Project (Open Embedded)
The Yocto Project (YP) is an open-source collaboration project that helps developers create custom Linux-based systems regardless of the hardware architecture.
The project provides a flexible set of tools and a space where embedded developers worldwide can share technologies, software stacks, configurations, and best practices that can be used to create tailored Linux images for embedded and IOT devices, or anywhere a customized Linux OS is needed.
Parallel Processing
RabbitMQ
With tens of thousands of users, RabbitMQ is one of the most popular open source message brokers.
RabbitMQ is lightweight and easy to deploy on premises and in the cloud. It supports multiple messaging protocols. RabbitMQ can be deployed in distributed and federated configurations to meet high-scale, high-availability requirements.
gRPC
gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in last mile of distributed computing to connect devices, mobile applications and browsers to backend services.
OpenMP
A specification for a set of compiler directives, library routines, and environment variables that can be used to specify high-level parallelism in Fortran and C/C++ programs. OpenMP is managed by the OpenMP Architecture Review Board (OpenMP ARB).
Open MPI
The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.
Scientific Libraries
Eigen Library
Dense and Sparse BLAS - optimized using Tachyum vector and matrix instructions in standard and low precision modes.
Dense and Sparse LAPACK - LU, QR, Cholesky, Eigenvector/Eigenvalue, SVD decompositions.
LAPACK
LAPACK is written in Fortran 90 and provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.
Real and Complex Eigensolvers for Large linear systems
Iterative methods, conjugate gradient and polynomial filtered Lanczos optimized using Tachyum BLAS and GEMM.
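For readers unfamiliar with the conjugate gradient method named above, here is a minimal textbook sketch in pure Python for a symmetric positive-definite system. It is illustrative only and is not Tachyum's optimized implementation. The helpers `dot` and `matvec` stand in for the BLAS kernels that an optimized library would call.

```python
def dot(u, v):
    """Inner product of two vectors (a BLAS level-1 operation)."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product (a BLAS level-2 operation)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

Each iteration is dominated by one matrix-vector product plus a few inner products, which is why the speed of the underlying BLAS routines largely determines the solver's speed.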
FFT Library
Optimized using Tachyum vector and matrix instructions.
ODE/ PDE numerical solvers
AI Frameworks Custom Extensions
Stochastic Rounding for BLAS-GEMM
The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations.
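Stochastic rounding itself is simple to state: a value is rounded up or down to a neighboring representable value, with the probability of rounding up equal to its fractional distance from the lower neighbor, so the result is unbiased on average. The sketch below shows the idea on a uniform grid of spacing `step`. It is a scalar illustration with hypothetical names, not the GEMM implementation.

```python
import math
import random


def stochastic_round(x, step=1.0, rng=random):
    """Round x to a multiple of `step`, rounding up with probability
    equal to the fractional distance from the lower grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower              # in [0, 1)
    round_up = rng.random() < frac     # never true when x is already on the grid
    return (lower + (1 if round_up else 0)) * step


if __name__ == "__main__":
    rng = random.Random(0)             # seeded for repeatability
    samples = [stochastic_round(2.3, rng=rng) for _ in range(10_000)]
    print("mean of rounded samples:", sum(samples) / len(samples))
```

Round-to-nearest would map 2.3 to 2.0 every time, whereas the average of many stochastically rounded samples stays close to 2.3. That property helps small updates survive accumulation in low-precision arithmetic.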
Mixed precision training
Static and dynamic loss scaling.
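To make the distinction concrete: static loss scaling multiplies the loss by a fixed constant, while dynamic loss scaling adjusts that constant during training. A minimal sketch of one common dynamic policy follows. The class name, parameter names, and default values are assumptions chosen for illustration and do not describe Tachyum's implementation.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler: back off on overflow, grow after a clean streak."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.good_steps = 0   # consecutive steps without overflow

    def unscale(self, grads):
        """Undo the scaling applied to the loss before the optimizer step."""
        return [g / self.scale for g in grads]

    def update(self, grads):
        """Inspect scaled gradients; return True if the step should be applied."""
        if any(not math.isfinite(g) for g in grads):
            self.scale *= self.backoff_factor   # overflow: shrink and skip step
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            self.scale *= self.growth_factor    # stable streak: try a larger scale
            self.good_steps = 0
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
    for grads in ([1.0], [2.0], [float("inf")], [0.5]):
        applied = scaler.update(grads)
        print(f"applied={applied} scale={scaler.scale}")
```

A static scaler corresponds to never calling `update` and keeping `scale` at its initial value.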
Compression algorithms
Magnitude based block pruning
Lottery Ticket
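Magnitude-based block pruning, listed above, can be sketched as follows: the weight matrix is divided into fixed-size blocks, each block is scored by the sum of the absolute values of its entries, and the lowest-scoring blocks are zeroed as a whole. The function name, the choice of L1 norm as the score, and the `keep_fraction` parameter are assumptions made for illustration.

```python
def block_prune(matrix, block_rows, block_cols, keep_fraction):
    """Zero whole blocks with the smallest L1 norm, keeping `keep_fraction` of blocks."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % block_rows or cols % block_cols:
        raise ValueError("matrix dimensions must be multiples of the block size")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")

    # Score every block by the sum of absolute values of its entries.
    scores = []
    for bi in range(0, rows, block_rows):
        for bj in range(0, cols, block_cols):
            score = sum(
                abs(matrix[i][j])
                for i in range(bi, bi + block_rows)
                for j in range(bj, bj + block_cols)
            )
            scores.append((score, bi, bj))

    # Keep the highest-scoring blocks and zero the rest in a copy.
    scores.sort(key=lambda item: item[0], reverse=True)
    n_keep = round(keep_fraction * len(scores))
    pruned = [list(row) for row in matrix]
    for _, bi, bj in scores[n_keep:]:
        for i in range(bi, bi + block_rows):
            for j in range(bj, bj + block_cols):
                pruned[i][j] = 0.0
    return pruned


if __name__ == "__main__":
    weights = [
        [1.0, 1.0, 9.0, 9.0],
        [1.0, 1.0, 9.0, 9.0],
        [5.0, 5.0, 0.1, 0.1],
        [5.0, 5.0, 0.1, 0.1],
    ]
    for row in block_prune(weights, 2, 2, keep_fraction=0.5):
        print(row)
```

Pruning whole blocks rather than individual weights leaves a regular sparsity pattern, which is easier for vector and matrix hardware to exploit.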
AI Frameworks
PyTorch
Activation & Loss Function – optimized utilizing Tachyum vector instructions in standard and low precision modes.
Dense GEMM (General Matrix Multiply) library implemented utilizing Tachyum matrix instructions in standard and low precision modes, stochastic rounding, single and multithreaded.
Custom Sparse GEMM library implemented utilizing Tachyum vector and matrix instructions.
Convolutional and Dense operators implemented utilizing Tachyum matrix instructions in standard and low precision modes, including depthwise separable and pointwise convolutions.
Circulant and Butterfly Convolutional and Dense operators implemented utilizing custom FFT (fast Fourier transform) for matrix multiplication.
TensorFlow Lite
INT8 quantized version of TensorFlow optimized utilizing Tachyum vector and matrix instructions in standard and low precision modes.
AI Models
Computer Vision
Implemented in standard and low precision modes.
- Resnet: Residual Networks.
- Vision Transformer – ViT, DeiT
- MobileNet: Model designed to be used in mobile applications.
- ShuffleNet: An extremely efficient convolutional neural network for mobile devices.
Object detection and semantic segmentation models
Implemented in standard and low precision modes.
- YOLO: A clever convolutional neural network (CNN) for doing object detection in real-time.
- SSD
- MaskRCNN: A deep neural network aimed to solve instance segmentation problem in machine learning or computer vision.
- EfficientDet
- DETR
NLP Transformer Models
Implemented in standard and low precision modes with block structured sparsity.
- BERT: Bidirectional Encoder Representations from Transformers is a Transformer-based machine learning technique for NLP pre-training developed by Google.
- DistilBERT
- Q8BERT
- Sparse Transformer: Transformer based architecture which utilises sparse factorizations of the attention matrix.
- Performer
- GPT-Neo
- GPT
Scientific ML models, Physics Informed NN, Differentiable Programming
Neural ODE: Neural Ordinary Differential Equations.
Graph Neural ODE: A counterpart to GNNs (graph neural networks) where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations.
Neural PDE: Neural partial differential equation.
HPC/AI for Life Sciences and Physical Sciences
DeepMD
DeepMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep learning based model of interatomic potential energy and force field and to perform molecular dynamics (MD). This brings new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations.
Quantum Espresso
Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
LAMMPS
LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It’s an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.
Tachyum Software Distribution Package Beta and Production Releases
Our complete software distribution package, which is currently in its alpha stage, will receive significant updates throughout the year. The first upgrade has already been delivered, bringing important version updates, such as:
- The latest version of the QEMU emulator, 8.2
- GCC 13.2 (GNU Compiler Collection) and glibc 2.39 (GNU C Library)
- Linux 6.6 LTS (Long Term Support), which contains a large number of changes, updates and improvements
The Tachyum Software Distribution package is scheduled to enter the beta stage in the coming months, followed by the production release later this year.
Security Layer
Security Layer adds support for security monitor and secure applications.
It operates under a separate privilege mode and provides hardware isolation from other modes.
In addition, secure memory and secure I/O can be accessed only in secure mode.
System Management
System Management enables System Management Mode (SMM) and System Management Interrupts (SMI).
SMI handlers are the most important components of SMM drivers. They are used for managing system thermal and other system support functions.
SMM uses separate on-chip protected SRAM and is installed by UEFI.
Tachyum provides a separate privilege mode (ring 00) for SMM to provide availability, safety, and isolation for SMM and SMI.