ATCA-compliant ARM+DSP compute blade
Prodrive Technologies’ ATCA-TK2-6PU blade combines high compute density and raw performance, built on Texas Instruments’ KeyStone multicore architecture.
Targeting high-performance computing (HPC) and high-performance embedded computing (HPeC) workloads, the ATCA-TK2-6PU blade offers a peak performance of 2.7TFLOPS for fixed-point and single-precision (SP) floating-point math, and 690GFLOPS for double-precision (DP) floating-point math.
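The peak figures can be reproduced from the per-PU numbers in the feature list (6 PUs, 24 C66x DSP cores per PU, 19.2GFLOPS per core). The sketch below is a back-of-the-envelope check, not a datasheet calculation: 144 cores × 19.2GFLOPS gives about 2765GFLOPS, and the quoted 690GFLOPS DP figure works out to roughly 4.8GFLOPS per core, i.e. about a quarter of the SP rate (144 × 4.8 = 691.2). That per-core DP rate is an inference from the quoted totals, not a stated specification.

```python
# Back-of-the-envelope peak-throughput check for the ATCA-TK2-6PU,
# using figures from the feature list (6 PUs x 24 C66x DSP cores,
# 19.2 GFLOPS single precision per core).
PUS_PER_BLADE = 6
DSP_CORES_PER_PU = 24
SP_GFLOPS_PER_CORE = 19.2


def blade_peak_gflops(per_core_gflops: float) -> float:
    """Aggregate peak of all DSP cores on one blade, in GFLOPS."""
    return PUS_PER_BLADE * DSP_CORES_PER_PU * per_core_gflops


total_cores = PUS_PER_BLADE * DSP_CORES_PER_PU        # 144 DSP cores
sp_peak = blade_peak_gflops(SP_GFLOPS_PER_CORE)       # about 2764.8 GFLOPS

# The quoted 690 GFLOPS DP total implies this per-core DP rate.
# This is derived from the totals, not taken from a datasheet.
implied_dp_per_core = 690 / total_cores

print(f"{total_cores} DSP cores")
print(f"SP peak: {sp_peak / 1000:.2f} TFLOPS")
print(f"Implied DP per core: {implied_dp_per_core:.2f} GFLOPS")
```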
Typical applications include embedded and medical image processing, scientific computing and video transcoding. A broad set of high-speed connectivity options (HyperLink, Serial RapidIO, PCI Express and 10 gigabit Ethernet) keeps the DSP cores supplied with data.
Each of the six processing units (PUs) of the ATCA-TK2-6PU consists of three SoCs that together provide four ARM cores and 24 DSP cores. The SoCs are interconnected through TI’s HyperLink technology at 40GBaud per link.
The embedded Ethernet switch connects all SoCs within a PU and links each PU to its direct neighbors. For central data distribution between PUs and to other blades, the Serial RapidIO and 10 gigabit Ethernet fabric networks can be used.
Each PU also has a point-to-point, dual-lane PCI Express link to Zone 3 for capturing data from a data acquisition rear transition module (RTM), which makes the blade well suited to embedded HPC.
Scalability and programmability
In each PU, Linux running on the quad-core ARM cluster offloads compute kernels to the 24 DSP cores. Both task-parallel and data-parallel workloads are supported through industry-standard tools and frameworks such as OpenMP and OpenCL.
The data plane can be 10 gigabit Ethernet or Serial RapidIO. The industry-standard MPI implementations Open MPI and MPICH can be used for task coordination and data distribution. The control and management planes are based on out-of-band Ethernet and IPMI.
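When MPI distributes a dataset unevenly divisible across ranks, `MPI_Scatterv` needs a per-rank `sendcounts` array and a `displs` (displacement) array. The sketch below computes a balanced block decomposition for those two arrays. The rank layout (one MPI rank per PU, six ranks per blade) and the 1000-row workload are hypothetical examples; a real deployment may map ranks to SoCs or cores instead.

```python
# Balanced block decomposition producing the sendcounts / displs arrays
# that MPI_Scatterv expects. Hypothetical layout: one rank per PU.
from typing import List, Tuple


def block_partition(n_items: int, n_ranks: int) -> Tuple[List[int], List[int]]:
    """Split n_items as evenly as possible over n_ranks.

    The first (n_items % n_ranks) ranks receive one extra item, so the
    counts never differ by more than one. displs[i] is the offset of
    rank i's first item in the root's send buffer.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_items, n_ranks)
    counts = [base + (1 if rank < extra else 0) for rank in range(n_ranks)]
    displs = [0] * n_ranks
    for rank in range(1, n_ranks):
        displs[rank] = displs[rank - 1] + counts[rank - 1]
    return counts, displs


# Example: 1000 image rows distributed over the 6 PUs of one blade.
counts, displs = block_partition(1000, 6)
print(counts)  # [167, 167, 167, 167, 166, 166]
print(displs)  # [0, 167, 334, 501, 668, 834]
```

Giving the remainder to the lowest ranks keeps the load imbalance to at most one item per rank, which matters when every DSP core is expected to finish its block at about the same time.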
The ATCA form factor provides reliability, availability and serviceability (RAS) as well as scalability. An ATCA backplane offers full-mesh or dual-star high-speed serial channels between blades, and switch blades and top-of-rack switches make it straightforward to scale from a single shelf to a rack or an entire room.
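The two backplane topologies scale differently: a full mesh gives every slot a dedicated channel to every other slot, so the channel count grows quadratically, while a dual star connects each node slot to two hub (switch) slots, so the link count grows linearly. The sketch below counts both; the slot counts are illustrative assumptions, and hub-to-hub links are deliberately left out of the dual-star count.

```python
# Count point-to-point fabric channels for the two ATCA backplane
# topologies as the number of slots grows. Slot counts are examples.


def full_mesh_channels(slots: int) -> int:
    """Every slot has a dedicated channel to every other slot: N(N-1)/2."""
    return slots * (slots - 1) // 2


def dual_star_node_links(slots: int, hubs: int = 2) -> int:
    """Each node (non-hub) slot connects once to each hub slot.

    Hub-to-hub links are not counted here.
    """
    return hubs * (slots - hubs)


for n in (6, 14, 16):
    print(f"{n:2d} slots: full mesh {full_mesh_channels(n):3d} channels, "
          f"dual star {dual_star_node_links(n):2d} node-to-hub links")
```

For a 14-slot shelf this gives 91 full-mesh channels versus 24 node-to-hub links in a dual star, which is the trade-off between direct blade-to-blade bandwidth and simpler, switch-based scaling.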
6 processing units (PUs), each containing
- 4 ARM Cortex-A15 cores
- 24 C66x DSP cores, 19.2GFLOPS (SP) per core
- Up to 1GiB DDR3 per core at 1600MT/s
- 3x Serial RapidIO quad-lane
- 3x PCI Express dual-lane
- 1x 10 gigabit Ethernet
- 3x gigabit Ethernet
2 IDT Serial RapidIO switches
- 480Gbps non-blocking aggregate bandwidth
Broadcom 10 gigabit Ethernet switch
- 160Gbps non-blocking aggregate bandwidth
- ARM Cortex-A9 management CPU
- Non-volatile storage for boot images and applications
AdvancedTCA-compliant blade
- PICMG 3.0 R3.0
- PICMG 3.8 R1.0
- PICMG IRTM.0 R1.1