The ‘ExpoElectronica 2018’ exhibition has ended


The 21st International Exhibition of Electronic Components, Modules and Parts ‘ExpoElectronica’ was held at Crocus Expo (Moscow) from 17 to 19 April 2018, simultaneously with the 16th International Exhibition of Technology, Equipment and Materials for Manufacturing Electronic and Electrotechnical Devices ‘ElectronTechExpo’.



Software development kit 1.3 has been released


OpenCL support has been added to the MALT software development kit v1.3; it is currently in test mode. The coprocessor software required by the MALT-C processor family is now a separate module that can be installed on top of the basic MALT software development kit, which supports the general-purpose processor cores. In the current version of the kit, demo examples are separated from test scripts and located in the ‘../src/sw/examples’ folder. The ‘Malt_start_thr’ function, which starts threads on slave cores and passes parameters to them, has been implemented in the Libmalt library. Support for selecting the emulated processor type has been added to the make scripts. When the ‘MALT_SIMD=...’ variable is set, the MALTemu emulator loads the corresponding dynamic library.



Software development kit 1.2 has been released


Build support for the MALT-C 9Mb96G test chip has been added to the current debugging set. Set the ‘TASIC=1’ environment variable during installation to enable compatibility mode. The ‘Rm_mod’ and ‘locked’ parameters have been added to the ‘WriteFill’ and ‘ReadModify’ functions in Maltemu 1.2. Support for operations with a variable-length interconnect buffer has been added. Procedures that calculate the maximum allowed parallelism during installation of the required packages have been implemented for Linux and FreeBSD. The ‘–Hw-trace’ option, which enables monitoring of function calls, has been added to the MALTCC frontend.



Software development kit 1.1 has been released


The installer is now modular! The programmer tools ‘malt-tools’ and the ‘malt_sw’ system are downloaded separately. The installation directory can be changed with the ‘MALT_HOME’ environment variable. By default, ‘malt-tools’ is installed to the ‘/opt/MALT’ directory and ‘malt_sw’ to ‘$HOME/MALT’. The GNU development tools have been upgraded to the latest versions, including gcc 7.2.0. Manycore processor support has been added to the MALTemu emulator, so its performance scales almost in proportion to the number of processor cores in the host system. Support for any number of {scalar|array} arguments of arbitrary size has been added to MALTCC for both SIMD and slave functions.



The first samples of MALT-C 9Mb96G have been received from the TSMC foundry


Samples of the MALT-C 9Mb96G processor have been received from the TSMC foundry (Taiwan). MALT-C 9Mb96G is the first chip of the MALT-C family to be manufactured in silicon, using the 28 nm TSMC HPCPlus (high-performance computing) process. The processor contains 9 general-purpose RISC cores and 96 specialized processor elements organized in 3 SIMD clusters of 32 elements each. The MALT-C 9Mb96G samples have successfully passed incoming tests on test vectors on the “FORMULA” chip tester, and the characteristics measured under full load matched the estimates. In particular, total power consumption at the operating frequency of 800 MHz was 1 W.



The first-generation 96-core processor design has been sent for MPW manufacturing


The first-generation 96-core processor design has been sent for manufacturing in a multi-project wafer (MPW) run. The VLSI chips will be manufactured at the TSMC semiconductor foundry (Taiwan) using the 28 nm TSMC HPCPlus (high-performance computing) process. MALT-C 9Mb96G is the first MALT-C chip to be manufactured “in silicon”. The processor contains 9 general-purpose RISC cores and 96 specialized processor elements organized in 3 SIMD clusters of 32 elements each.



Design of the second-generation processor has started


Design of the second-generation processor has started. The first version of the processor element, whose architecture is based on an improved Gepard architecture, has been developed. Testing and measurement of energy consumption on target algorithms, taking timing delays into account, have been completed in Cadence CAD tools at a frequency of 1 GHz for the TSMC HPC+ 28 nm manufacturing process.



Google Reveals Technical Specs and Business Rationale for TPU Processor

Although Google’s Tensor Processing Unit (TPU) has been powering the company’s vast empire of deep learning products since 2015, very little was known about the custom-built processor. This week the web giant published a description of the chip and explained why it’s an order of magnitude faster and more energy-efficient than the CPUs and GPUs it replaces.

First a little context. The TPU is a specialized ASIC developed by Google engineers to accelerate inferencing of neural networks, that is, it speeds the production phase of these applications for networks that have already been trained. So, when a user initiates a voice search, asks to translate text, or looks for an image match, that’s when inferencing goes to work. For the training phase, Google, like practically everyone else in the deep learning business, uses GPUs.

The distinction is important because inferencing can do a lot of its processing with 8-bit integer operations, while training is typically performed with 32-bit or 16-bit floating point operations. As Google pointed out in its TPU analysis, multiplying 8-bit integers can use six times less energy than multiplying 16-bit floating point numbers, and for addition, thirteen times less.

The TPU ASIC takes advantage of this by incorporating an 8-bit matrix multiply unit that can perform 64K multiply-accumulate operations in parallel. At peak output, it can deliver 92 teraops/second. The processor also has 24 MiB of on-chip memory, a relatively large amount for a chip of its size. Memory bandwidth, however, is a rather meagre 34 GB/second. To optimize energy usage, the TPU runs at a leisurely 700 MHz and draws 40 watts of power (75 watts TDP). The ASIC is manufactured on the 28nm process node.
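The quoted peak figure can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything taken from Google's analysis: it assumes that "64K" means 65,536 multiply-accumulate units and that each multiply-accumulate counts as two operations (one multiply, one add). The function names are illustrative.

```python
# Back-of-the-envelope check of the TPU figures quoted in the article.
MACS_PER_CYCLE = 64 * 1024   # "64K" parallel multiply-accumulate units
OPS_PER_MAC = 2              # assumption: one multiply plus one add
CLOCK_HZ = 700e6             # 700 MHz clock
MEM_BANDWIDTH_BYTES = 34e9   # 34 GB/second memory bandwidth


def peak_ops_per_second(macs=MACS_PER_CYCLE, ops_per_mac=OPS_PER_MAC,
                        clock_hz=CLOCK_HZ):
    """Peak throughput if every MAC unit is busy on every cycle."""
    return macs * ops_per_mac * clock_hz


def ops_per_byte_at_peak(bandwidth=MEM_BANDWIDTH_BYTES):
    """Operations that must be done per byte fetched to sustain the peak."""
    return peak_ops_per_second() / bandwidth


if __name__ == "__main__":
    print(f"peak: {peak_ops_per_second() / 1e12:.1f} teraops/second")
    print(f"needed reuse: {ops_per_byte_at_peak():.0f} ops per byte")
```

Under these assumptions the product comes to roughly 92 teraops/second, matching the article. The second number shows how many operations the chip must perform per byte read from memory to stay at peak, which illustrates why the 34 GB/second bandwidth is described as meagre.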

When it comes to computer hardware, energy usage is the prime consideration for Google, since it’s tied directly to total cost of ownership (TCO) in datacenters. And for hyperscale-sized datacenters, energy costs can escalate quickly with hardware that’s too big for its intended task. Or as the authors of the TPU analysis put it: “when buying computers by the thousands, cost-performance trumps performance.”

In Google's benchmarks, the TPU performed 15 to 30 times faster than the K80 GPU and Haswell CPU. Performance/watt was even more impressive, with the TPU beating its competition by a factor of 30 to 80. Google projects that if they used the higher-bandwidth GDDR5 memory in the TPU, they could triple the chip’s performance.


Applied Micro Claims Third-Generation ARM Chip Ready to Take on Intel Xeon

Applied Micro announced it is sampling X-Gene 3, its third-generation ARM SoC for servers. According to a report by The Linley Group, the new platform will provide comparable performance to the latest Intel Xeon processors, but at a significantly lower price point.

X-Gene 3's respectable performance profile is a result of its relatively high clock speed and memory bandwidth. The CPU runs its 32 cores at a base frequency of 3.0 GHz, and can achieve 3.3 GHz under turbo mode. To feed those cores with data, the chip includes eight memory channels, which can serve DDR4 devices at up to 2667 MHz, yielding 170 GB/sec of aggregate bandwidth. The SoC also includes 42 lanes of PCIe 3.0 links for external connectivity.
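The aggregate bandwidth figure follows from the channel count and transfer rate. The sketch below assumes a standard 64-bit (8-byte) DDR4 channel and reads the quoted "2667 MHz" as 2667 million transfers per second; both are assumptions for illustration, and the function name is hypothetical.

```python
def ddr_bandwidth_gb_per_s(channels, megatransfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    bytes_per_transfer defaults to 8, i.e. an assumed 64-bit data bus
    per memory channel.
    """
    megabytes_per_s = channels * megatransfers_per_s * bytes_per_transfer
    return megabytes_per_s / 1000


if __name__ == "__main__":
    # Eight channels at 2667 MT/s, as quoted for X-Gene 3.
    print(f"{ddr_bandwidth_gb_per_s(8, 2667):.1f} GB/s")
```

With these inputs the result is about 170.7 GB/s, consistent with the 170 GB/sec quoted. This is a theoretical peak; sustained bandwidth in practice is lower.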

The report states that the X-Gene 3 can handle a “broad range of cloud workloads, including scale-up and scale-out applications.” It should be particularly adept at so-called big data applications like in-memory database processing, thanks to its superior memory bandwidth. Coincidentally (or perhaps not), AMD is touting its upcoming “Naples” x86 chip for very similar memory bandwidth capabilities, based on the same 8-channel per socket design.


The design of the first-generation 96-core MALT processor’s front end has been completed


The front-end design of the first-generation 96-core processor has been completed for VLSI manufacturing on the TSMC 28 nm HPC+ process (Taiwan). This process is intended for the development of high-performance integrated circuits (high-performance computing, HPC). The processor belongs to the MALT-C family. It contains 9 general-purpose RISC cores and 96 SIMD processor elements. The estimated chip area is 12 mm², and the estimated power consumption is 1.2 W at a frequency of 0.8 GHz. Estimated date of sample delivery: January 2018.



Intel Expands Its Comfort Zone with New ARM-Powered FPGAs for Datacenters

Intel announced it is sampling its Stratix 10 FPGAs, the latest family of field programmable gate arrays that are designed to accelerate a number of datacenter workloads. The new devices, which Intel is calling “the most significant FPGA innovations in over a decade,” offer advanced features like embedded 64-bit ARM processors, second-generation High Bandwidth Memory (HBM2), and DSP blocks.

The server applications Intel is targeting with the Stratix 10 family are somewhat tangential to those of Nvidia's and AMD's newest GPU accelerators, as well as Intel’s own Knights Landing Xeon Phi. However, Intel believes the devices suit workloads such as signal processing, data compression, data encryption, storage management, and video encoding; in truth, though, practically any server-side application where data throughput is the driving criterion. With the DSP unit offering lots of hardwired flops, these devices can also be used for high performance computing.


The debugging set for MALT has been released


The first version of the toolkit for MALT software development and debugging has been released. The kit includes an emulator, a debugger and a profiler. The emulator makes it possible to execute and debug MALT programs on general-purpose computers running Unix-like systems. The emulator, its integrated GDB debugger and the profiler significantly simplify developing programs for MALT and porting existing ones to it, and also make it possible to evaluate the efficiency of an algorithm implementation on MALT without running it on real hardware.



Kilocore - World's First 1,000-Processor Chip

A microchip containing 1,000 independent programmable processors has been designed by a team at the University of California, Davis, Department of Electrical and Computer Engineering.



ISC High Performance 2016 Conference

The ISC High Performance (ISC 2016) conference, held 19-23 June, attracted 3,092 attendees from 53 countries, as well as 146 companies and research organizations showcasing their technologies and services at the ISC exhibition.

