ResearchHub | Open Science Community

TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Architecture and Hardware Implementation

Cristian Sestito et al., Jan 1, 2025

Modern hardware architectures for Convolutional Neural Networks (CNNs) aim not only at high performance but also at low energy dissipation. Reducing the cost of data movement between the computing cores and the memory is one way to lower energy consumption. Systolic arrays are well suited to this objective: they use multiple processing elements that communicate with each other to maximize data utilization, based on dataflows such as weight stationary and row stationary. Motivated by this, we have proposed TrIM, an innovative dataflow based on a triangular movement of inputs and capable of reducing the number of memory accesses by one order of magnitude when compared to state-of-the-art systolic arrays. In this paper, we present a TrIM-based hardware architecture for CNNs. As a showcase, the accelerator is implemented on a Field Programmable Gate Array (FPGA) to execute the VGG-16 and AlexNet CNNs. The architecture achieves a peak throughput of 453.6 Giga Operations per Second, outperforming a state-of-the-art row stationary systolic array by up to $\sim 3\times$ in terms of memory accesses, and being up to $\sim 11.9\times$ more energy-efficient than other FPGA accelerators.
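To give a feel for why input reuse matters for memory traffic, the sketch below counts input reads for a single-channel K x K convolution under two idealized extremes: every output re-fetches its whole window, or every input pixel is fetched exactly once. This is only an illustration under assumed parameters (the function names and the 224 x 224, K = 3 layer are hypothetical choices); it does not model the TrIM dataflow or any of the systolic arrays compared in the abstract.

```python
def input_reads_no_reuse(height, width, k):
    """Input reads when every output pixel re-fetches its full k x k window
    (valid convolution, stride 1, single channel)."""
    out_h, out_w = height - k + 1, width - k + 1
    return out_h * out_w * k * k


def input_reads_full_reuse(height, width):
    """Input reads when each input pixel is fetched from memory exactly once."""
    return height * width


# Assumed example layer: 224 x 224 input, 3 x 3 kernel (illustrative only).
h, w, k = 224, 224, 3
naive = input_reads_no_reuse(h, w, k)
ideal = input_reads_full_reuse(h, w)
print(f"no reuse: {naive} reads, full reuse: {ideal} reads, "
      f"ratio: {naive / ideal:.2f}")
```

For a 3 x 3 kernel the gap between the two extremes approaches a factor of 9 as the image grows, which is the kind of headroom a reuse-oriented dataflow tries to capture.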

Low-Complexity Convolutional Neural Network for Channel Estimation

Simona Sibio et al., Nov 19, 2024

This paper presents a deep learning algorithm for channel estimation in 5G New Radio (NR). The classical approach that uses neural networks for channel estimation requires more than one stage to obtain the full channel matrix: first, the channel is constructed from the received reference signal, and then the precision is improved. In contrast, to reduce the computational cost, the proposed neural network method generates the channel matrix from the information captured from a few subcarriers along the slot. This information is extrapolated by applying the Least Squares technique only on the Demodulation Reference Signal (DMRS). The received DMRS placed in the grid can be seen as a 2D low-resolution image, and it is processed to generate the full channel matrix. To reduce complexity in the hardware implementation, a convolutional neural network (CNN) structure is selected. This solution is analyzed by comparing its Mean Square Error (MSE) and computational cost with those of other deep learning-based channel estimation methods, as well as traditional channel estimation methods. It is demonstrated that the proposed neural network delivers substantial complexity savings and favorable error performance. It reduces the computational cost by an order of magnitude, and it has a maximum error discrepancy of 0.018 at 5 dB compared to Minimum Mean Square Error (MMSE) channel estimation.
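The Least Squares step on pilot symbols can be sketched as a per-position division of the received symbol by the known transmitted pilot. The example below is a minimal, noise-free illustration with made-up pilot and channel values; the function name `ls_estimate` is hypothetical, and the sketch covers neither the DMRS grid layout of 5G NR nor the paper's CNN stage.

```python
def ls_estimate(received, pilots):
    """Per-pilot Least Squares channel estimate: h_hat = y / x.

    `received` and `pilots` are equal-length sequences of complex symbols
    at the pilot positions; pilots must be non-zero.
    """
    return [y / x for y, x in zip(received, pilots)]


# Assumed QPSK-like pilot symbols and channel coefficients (illustrative only).
pilots = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
channel = [0.5 + 0.2j, 0.4 - 0.1j, 0.3 + 0.3j, 0.6 + 0.0j]

# Noise-free received pilots: y = h * x.
received = [h * x for h, x in zip(channel, pilots)]

estimate = ls_estimate(received, pilots)
for h_true, h_hat in zip(channel, estimate):
    print(f"true {h_true:.3f}  estimated {h_hat:.3f}")
```

Without noise the estimate matches the channel at the pilot positions; with noise, each estimate is perturbed by the noise divided by the pilot symbol, which is why a later refinement or interpolation stage is normally needed to recover the full grid.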
