Intel Expands Its Comfort Zone with New ARM-Powered FPGAs for Datacenters



Intel announced it is sampling its Stratix 10 FPGAs, the latest family of field-programmable gate arrays designed to accelerate a number of datacenter workloads. The new devices, which Intel is calling “the most significant FPGA innovations in over a decade,” offer advanced features like embedded 64-bit ARM processors, second-generation High Bandwidth Memory (HBM2), and DSP blocks.





The server applications Intel is targeting with the Stratix 10 family are somewhat tangential to those of Nvidia's and AMD's newest GPU accelerators, as well as Intel’s own Knights Landing Xeon Phi. However, Intel believes the new FPGAs suit workloads such as signal processing, data compression, data encryption, storage management, and video encoding, and in truth practically any server-side application where data throughput is the driving criterion. With the DSP units offering lots of hardwired flops, these devices can also be used for high performance computing.


Read more...

The debugging set for MALT has been released


 

The first version of the toolkit for MALT software development and debugging has been released. The kit includes an emulator, a debugger and a profiler. The emulator makes it possible to run and debug MALT programs on general-purpose computers running Unix-like systems. The emulator, its integrated GDB debugger and the profiler significantly simplify developing programs for the MALT system and porting programs to it. They also make it possible to evaluate the efficiency of an algorithm implementation on MALT without running it on real hardware.

 

Read more ...

Kilocore - World's First 1,000-Processor Chip

 

Image: The University of California

A microchip containing 1,000 independent programmable processors has been designed by a team at the University of California, Davis, Department of Electrical and Computer Engineering.

 

Read more ...

ISC High Performance 2016 Conference

 

ISC High Performance 2016

Image: isc-hpc.com

The ISC High Performance (ISC 2016) conference, held 19-23 June, attracted 3,092 attendees from 53 countries, as well as 146 companies and research organizations showcasing their technologies and services at the ISC exhibition.

 

Read more ...

A C compiler for the programmable accelerator of the first-generation MALT has been developed


 

We’ve developed a C compiler that generates optimized code for the programmable accelerator architecture. On target tasks, code generated by the compiler reaches 80% of the performance of code hand-written by a programmer in assembly language! The compiler was built with a set of domain-specific languages (DSLs) for rapid translator construction. This DSL set describes the main phases of translation. In particular, it includes Prolog-like descriptions of program conversion rules and a combinatorial approach to building a traversal strategy for intermediate representation graphs.
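To give a feel for conversion rules combined with a traversal strategy, here is a minimal Python sketch. It is not the MALT DSL: the tuple-based expression format, the two simplification rules and the helper names (`choice`, `attempt`, `bottomup`) are all hypothetical, and reading the "combinatorial approach" as composing small strategy functions is an assumption.

```python
# Expressions are tuples: ("add", a, b), ("mul", a, b), ("const", n), ("var", name).
# This node format is hypothetical and chosen only for the illustration.

def rule_mul_one(t):
    """Conversion rule x * 1 -> x. Returns None when the rule does not match."""
    if len(t) == 3 and t[0] == "mul" and t[2] == ("const", 1):
        return t[1]
    return None

def rule_add_zero(t):
    """Conversion rule x + 0 -> x. Returns None when the rule does not match."""
    if len(t) == 3 and t[0] == "add" and t[2] == ("const", 0):
        return t[1]
    return None

def choice(*rules):
    """Try the rules in order and return the first successful result."""
    def apply(t):
        for rule in rules:
            out = rule(t)
            if out is not None:
                return out
        return None
    return apply

def attempt(rule):
    """Apply a rule if it matches, otherwise leave the node unchanged."""
    def apply(t):
        out = rule(t)
        return t if out is None else out
    return apply

def bottomup(strategy):
    """Traversal strategy: rewrite the children first, then the node itself."""
    def apply(t):
        if t[0] in ("add", "mul"):
            t = (t[0],) + tuple(apply(child) for child in t[1:])
        return strategy(t)
    return apply

# Rules and traversal are composed separately, then applied to a whole tree.
simplify = bottomup(attempt(choice(rule_mul_one, rule_add_zero)))
```

For example, `simplify(("add", ("mul", ("var", "x"), ("const", 1)), ("const", 0)))` reduces to `("var", "x")`: the inner multiplication is rewritten first, then the outer addition.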

 

Read more ...

We've started to create the first-generation MALT netlist using 28 nanometer TSMC technology


 

We've started to create the first-generation MALT netlist using 28 nanometer HPC+ (high-performance computing) TSMC technology. The planned chip area is 12 mm². This area is the economic optimum for pilot batch manufacturing under MPW (Multi-Project Wafer). Estimated power consumption on a target task is 1 W, which makes computation considerably more energy-efficient than on a CPU or GPU.

 

Read more ...

An assembler and an emulator for the programmable accelerator have been developed as parts of MALT


 

We’ve developed an assembler with an algebraic syntax similar to that of the C language. As a result, a program written in this assembly language is also a valid C program. This beneficial side effect of the algebraic notation made it possible to build a high-performance software model of the programmable accelerator using an ordinary C compiler.

 

Read more ...

The team has grown significantly!


 

The MALT team is growing. We’ve moved to a new laboratory. There are new computers, modern design, a coffee machine - in general, there is everything we need for productive work. There are new members in the team: VHDL specialists and system programming specialists. There are 12 of us already! And that number includes only those who work on a full-time basis.

 

Read more ...

Leopard processor element for vector MALT coprocessor has been designed


 

The development of a processor element for the Leopard vector accelerator has been completed. The processor element architecture was chosen to provide maximum flexibility from a programming perspective together with high performance and energy efficiency on target tasks. As a result, an architecture based on an ALU tree was selected.
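The announcement does not describe how the Leopard ALU tree is organized. As a generic illustration only, the sketch below models a binary tree of ALUs that combines its inputs pairwise, level by level, so that n inputs need about log2(n) levels. The function name and the pass-through handling of an odd input are assumptions made for the example.

```python
from operator import add

def alu_tree_reduce(values, op=add):
    """Combine inputs pairwise, level by level, like a binary tree of ALUs.

    Returns (result, levels), where levels is the number of tree levels used.
    This is a generic model, not the actual Leopard design.
    """
    if not values:
        raise ValueError("need at least one input")
    level = list(values)
    levels = 0
    while len(level) > 1:
        # Each ALU on this level combines one adjacent pair of inputs.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd leftover input passes through unchanged
        level = nxt
        levels += 1
    return level[0], levels
```

Summing eight inputs with `alu_tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])` returns `(36, 3)`: three levels instead of the seven sequential additions a single ALU would need. Passing a different `op`, such as `max`, reuses the same tree shape for another operation.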

 

Read more ...

The development of MALT-processors with vector and mixed architecture has been started


 

The development of MALT-processors with vector and mixed architecture has been started.

 

Extremely hard problems in discrete mathematics that are perfectly or almost perfectly data-parallel, and that rely on mathematical procedures which are compact (in terms of core size) yet particularly complex, can be solved energy-efficiently only on specialized programmable or configurable computing structures, implemented on FPGA/VLSI or as CPU/GPU blocks.

 

Read more ...

210-core processor on FPGA Xilinx Virtex7 has been built


 

We’ve recently finished assembling and debugging a new monster: a 210-core processor prototype on the Xilinx Virtex-7 2000T FPGA. This is the largest chip in the 7th generation of Xilinx FPGAs, and our MALT system is the largest array of independent 32-bit RISC cores prototyped on a single FPGA known to date.

 

Read more ...

A new smart memory controller with DMA support has been developed


 

A new version of the smart memory controller has been developed and tested. The controller now supports block data transfers, usually called ‘DMA’: a mechanism that copies data from one memory area to another without involving the processing cores. Using this mechanism speeds up data-exchange-intensive tasks several times over. The controller also provides a basic set of "real" atomic operations; for instance, atomic increment is supported.
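The two controller features can be pictured with a small software model. The Python sketch below is only an illustration: the class `SmartMemoryModel`, its methods and the helper `count_with_workers` are hypothetical names rather than the MALT interface, and a lock stands in for the atomicity that the real controller provides in hardware.

```python
import threading

class SmartMemoryModel:
    """Toy model of a memory controller with block copy and atomic increment."""

    def __init__(self, size):
        self.mem = [0] * size
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def dma_copy(self, src, dst, length):
        """Block transfer: the controller moves the data, not the calling core."""
        if src < 0 or dst < 0 or length < 0:
            raise ValueError("addresses and length must be non-negative")
        if src + length > len(self.mem) or dst + length > len(self.mem):
            raise IndexError("transfer exceeds memory bounds")
        self.mem[dst:dst + length] = self.mem[src:src + length]

    def atomic_inc(self, addr):
        """Increment one word as a single indivisible step; return the old value."""
        with self._lock:
            old = self.mem[addr]
            self.mem[addr] = old + 1
        return old

def count_with_workers(model, addr, workers, increments):
    """Several threads, playing the role of cores, bump one shared counter."""
    def body():
        for _ in range(increments):
            model.atomic_inc(addr)
    threads = [threading.Thread(target=body) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model.mem[addr]
```

Because each increment is indivisible, four workers doing 1,000 increments each always leave the counter at 4,000; with a plain read-modify-write sequence some updates could be lost.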

 

Read more ...

ISC'14

 

ISC’14 was held June 22-26 in Leipzig, Germany. ISC is the world’s oldest high-performance computing conference and exhibition, and the most significant one in Europe for the global HPC community.

 

Read more ...