Participation in the Consortium of Design Centers’ strategy game


On November 28, 2018, we took part in an event organized by the Consortium of Design Centers of the Russian Federation's microelectronics industry. The event included a strategy game on the theme “Critical information infrastructure as a new market for the Russian high-tech complex”.



MALT-C 21Mb480PLVDDR microprocessor synthesis using 28 nanometer TSMC technology


The netlist for the MALT 21Mb480PLVDDR processor has been completed using the 28 nanometer TSMC HPC+ (high-performance computing) technology. The chip area is 36 mm². The project has moved on to the layout (topology) design stage. Manufacturing of the pilot batch on an MPW (Multi-Project Wafer) run is scheduled for the second quarter of 2019.



MALT has been presented at Electronica-2018 in Munich


The MALT project took part in the international exhibition “Electronica-2018”, held in Munich (Germany) from 13 to 16 November 2018. “Electronica-2018” brought together manufacturers, designers, distributors and customers from various countries and is recognized as the world’s largest event in the field of electronic components, systems and electronics industry applications.



MALT at the “Microelectronica 2018” forum


At the beginning of October 2018, the international forum “Microelectronica 2018” took place in Alushta (Crimea). We took an active part in the meeting: Sergey Elizarov, the MALT project leader, gave a presentation on the topic “Multicore computing system testing based on RSA algorithm ideas”.



‘ExpoElectronica 2018’ exhibition has ended


The 21st International exhibition of electronic components, modules and parts ‘ExpoElectronica’ was held at Crocus Expo (Moscow) from 17 to 19 April 2018, simultaneously with the 16th International exhibition of technology, equipment and materials for manufacturing devices for the electronics and electrotechnical industries, ‘ElectronTechExpo’.



Software development kit 1.3 has been released


OpenCL support has been added to the MALT software development kit v1.3; it is currently in test mode. The coprocessor software required by the MALT-C processor family is now a separate module that can be installed on top of the basic MALT software development kit, which supports general-purpose processor cores. In the current version of the kit, demo examples are separated from test scripts and located in the ‘../src/sw/examples’ folder. The ‘Malt_start_thr’ function, which starts threads on slave cores and passes parameters to them, has been implemented in the Libmalt library. Support for selecting the emulated processor type has been added to the make scripts. When the ‘MALT_SIMD=...’ variable is set, the MALTemu emulator loads the corresponding dynamic library.



Software development kit 1.2 has been released


Support for building programs for the MALT-C 9Mb96G test chip has been added to the current debugging kit. Set the ‘TASIC=1’ environment variable during installation to enable compatibility mode. The ‘Rm_mod’ and ‘locked’ parameters have been added to the ‘WriteFill’ and ‘ReadModify’ functions in Maltemu 1.2. Support for operations with a variable-length interconnect buffer has been added. Procedures for calculating the maximum allowed parallelism when installing required packages have been implemented for Linux and FreeBSD. The ‘–Hw-trace’ option, which makes it possible to monitor function calls, has been added to the MALTCC frontend.



Software development kit 1.1 has been released


The installer is modular now! The 'malt-tool' programmer tools and the 'malt_sw' system are downloaded separately. The installation directory can be changed with the ‘MALT_HOME’ environment variable. By default, ‘malt-tools’ is installed to the ‘/opt/MALT’ directory and ‘malt_sw’ to ‘$HOME/MALT’. The GNU development tools have been upgraded to the latest versions, including gcc 7.2.0. Manycore processor support has been added to the MALTemu emulator. As a result, emulator performance scales almost in proportion to the number of processor cores in the host system. Support for any number of {scalar|array} arguments of arbitrary size has been added to MALTCC for both SIMD and slave functions.



The first samples of MALT-C 9Mb96G have been received from TSMC foundry


Samples of the MALT-C 9Mb96G processor have been received from the TSMC foundry (Taiwan). MALT-C 9Mb96G is the first chip of the MALT-C family to be manufactured in silicon, using the 28 nanometer TSMC HPCPlus (high-performance computing) process. The processor contains 9 general-purpose RISC cores and 96 specialized processor elements grouped into 3 SIMD clusters of 32 elements each. The MALT-C 9Mb96G samples have successfully passed entry tests on test vectors on the “FORMULA” chip tester, and the characteristics measured under full load matched the estimates. In particular, total power consumption at the operating frequency of 800 MHz was 1 W.



The first-generation 96-core processor project has been sent for MPW manufacturing


The first-generation 96-core processor project has been sent for manufacturing on an MPW run. The chips will be manufactured at the TSMC semiconductor foundry (Taiwan) using the 28 nanometer TSMC HPCPlus (high-performance computing) process. MALT-C 9Mb96G is the first MALT-C chip to be manufactured “in silicon”. The processor contains 9 general-purpose RISC cores and 96 specialized processor elements grouped into 3 SIMD clusters of 32 elements each.



Design of the second-generation processor has started


Design of the second-generation processor has started. The first version of the processor element, whose architecture is based on an improved Leopard architecture, has been developed. Testing and energy consumption measurements on target algorithms, taking timing delays into account, have been completed in Cadence CAD tools at a frequency of 1 GHz for the TSMC HPC+ 28 nm manufacturing process.



Google Reveals Technical Specs and Business Rationale for TPU Processor

Although Google’s Tensor Processing Unit (TPU) has been powering the company’s vast empire of deep learning products since 2015, very little was known about the custom-built processor. This week the web giant published a description of the chip and explained why it’s an order of magnitude faster and more energy-efficient than the CPUs and GPUs it replaces.

First, a little context. The TPU is a specialized ASIC developed by Google engineers to accelerate inferencing of neural networks; that is, it speeds up the production phase of these applications for networks that have already been trained. So, when a user initiates a voice search, asks to translate text, or looks for an image match, that’s when inferencing goes to work. For the training phase, Google, like practically everyone else in the deep learning business, uses GPUs.

The distinction is important because inferencing can do a lot of its processing with 8-bit integer operations, while training is typically performed with 32-bit or 16-bit floating point operations. As Google pointed out in its TPU analysis, multiplying 8-bit integers can use six times less energy than multiplying 16-bit floating point numbers, and for addition, thirteen times less.
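To make the 8-bit point concrete, here is a minimal sketch of a dot product computed with 8-bit integers and then rescaled to floating point. It is an illustration only: the symmetric quantization scheme, the function names and the sample values are assumptions chosen for clarity, not a description of how the TPU or Google's software quantizes networks.

```python
# Illustrative sketch only: symmetric int8 quantization of a dot product.
# The scheme, names and sample data are assumptions, not taken from the TPU design.

def quantize(values, scale):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def int8_dot(weights, activations):
    """Approximate a float dot product using integer multiply-accumulates."""
    w_scale = max(abs(w) for w in weights) / 127
    a_scale = max(abs(a) for a in activations) / 127
    q_weights = quantize(weights, w_scale)
    q_activations = quantize(activations, a_scale)
    # All multiplies and adds below operate on small integers.
    accumulator = sum(w * a for w, a in zip(q_weights, q_activations))
    # A single floating-point rescale recovers an approximate real-valued result.
    return accumulator * w_scale * a_scale

weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 0.5, -2.0, 0.1]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The integer result differs slightly from the floating-point one; inference workloads can often tolerate that loss of precision, which is what lets the cheaper 8-bit arithmetic be used.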

The TPU ASIC takes advantage of this by incorporating an 8-bit matrix multiply unit that can perform 64K multiply-accumulate operations in parallel. At peak output, it can deliver 92 teraops/second. The processor also has 24 MiB of on-chip memory, a relatively large amount for a chip of its size. Memory bandwidth, however, is a rather meagre 34 GB/second. To optimize energy usage, the TPU runs at a leisurely 700 MHz and draws 40 watts of power (75 watts TDP). The ASIC is manufactured on the 28nm process node.
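The peak figure can be sanity-checked from the other numbers in the paragraph. The short calculation below assumes that “64K” means 65,536 multiply-accumulate units and that each multiply-accumulate counts as two operations (one multiply, one add); neither assumption is spelled out in the text.

```python
# Back-of-the-envelope check of the quoted peak throughput.
# Assumptions: "64K" = 65,536 MAC units; one MAC = 2 operations (multiply + add).
MAC_UNITS = 64 * 1024
OPS_PER_MAC = 2
CLOCK_HZ = 700e6  # 700 MHz, as quoted

peak_ops_per_second = MAC_UNITS * OPS_PER_MAC * CLOCK_HZ
peak_teraops = peak_ops_per_second / 1e12
print(f"Peak throughput: {peak_teraops:.2f} teraops/second")
```

Under these assumptions the result comes out just under 92 teraops/second, consistent with the quoted figure.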

When it comes to computer hardware, energy usage is the prime consideration for Google, since it’s tied directly to total cost of ownership (TCO) in datacenters. And for hyperscale-sized datacenters, energy costs can escalate quickly with hardware that’s too big for its intended task. Or as the authors of the TPU analysis put it: “when buying computers by the thousands, cost-performance trumps performance.”

In Google's benchmarks, the TPU performed 15 to 30 times faster than the K80 GPU and Haswell CPU. Performance/watt was even more impressive, with the TPU beating its competition by a factor of 30 to 80. Google projects that if they used the higher-bandwidth GDDR5 memory in the TPU, they could triple the chip’s performance.


Applied Micro Claims Third-Generation ARM Chip Ready to Take on Intel Xeon

Applied Micro announced it is sampling X-Gene 3, its third-generation ARM SoC for servers. According to a report by The Linley Group, the new platform will provide comparable performance to the latest Intel Xeon processors, but at a significantly lower price point.

X-Gene 3's respectable performance profile is a result of its relatively high clock speed and memory bandwidth. The CPU runs its 32 cores at a base frequency of 3.0 GHz, and can achieve 3.3 GHz under turbo mode. To feed those cores with data, the chip includes eight memory channels, which can serve DDR4 devices at up to 2667 MHz, yielding 170 GB/sec of aggregate bandwidth. The SoC also includes 42 lanes of PCIe 3.0 links for external connectivity.
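The aggregate bandwidth figure follows from the channel count and memory speed. The calculation below is a rough sketch that assumes each DDR4 channel is 64 bits (8 bytes) wide and that the quoted “2667 MHz” is the transfer rate in millions of transfers per second; both are assumptions about standard DDR4 rather than details given in the report.

```python
# Rough check of the quoted aggregate memory bandwidth.
# Assumptions: 64-bit (8-byte) DDR4 channels; "2667 MHz" read as 2667 MT/s.
CHANNELS = 8
TRANSFERS_PER_SECOND = 2667e6
BYTES_PER_TRANSFER = 8

bandwidth_bytes_per_second = CHANNELS * TRANSFERS_PER_SECOND * BYTES_PER_TRANSFER
bandwidth_gb_per_second = bandwidth_bytes_per_second / 1e9
print(f"Aggregate bandwidth: {bandwidth_gb_per_second:.1f} GB/sec")
```

This gives a theoretical peak of a little over 170 GB/sec, in line with the number quoted above.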

The report states that the X-Gene 3 can handle a “broad range of cloud workloads, including scale-up and scale-out applications.” It should be particularly adept at so-called big data applications like in-memory database processing, thanks to its superior memory bandwidth. Coincidentally (or perhaps not), AMD is touting its upcoming “Naples” x86 chip for very similar memory bandwidth capabilities, based on the same 8-channel per socket design.