ResearchHub | Open Science Community

Shaojun Wei

Author with expertise in Deep Learning in Computer Vision and Image Recognition

Achievements

Cited Author

Key Stats

Upvotes received: 0

Publications: 5 (0% Open Access)

Cited by: 531

h-index: 33

i10-index: 120

Reputation

Biology: < 1%

Chemistry: < 1%

Economics: < 1%

Publications

Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns

Fengbin Tu et al., Apr 12, 2017

Deep convolutional neural networks (DCNNs) have been successfully used in many computer vision tasks. Previous works on DCNN acceleration usually use a fixed computation pattern for diverse DCNN models, leading to an imbalance between power efficiency and performance. We solve this problem by designing a DCNN acceleration architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different models. The computation pattern comprises a data reuse pattern and a convolution mapping method. For massive and different layer sizes, DNA reconfigures its data paths to support a hybrid data reuse pattern, which reduces total energy consumption by 5.9 to 8.4 times over conventional methods. For various convolution parameters, DNA reconfigures its computing resources to support a highly scalable convolution mapping method, which obtains 93% computing resource utilization on modern DCNNs. Finally, a layer-based scheduling framework is proposed to balance DNA's power efficiency and performance for different DCNNs. DNA is implemented in an area of 16 mm² at 65 nm. On the benchmarks, it achieves 194.4 GOPS at 200 MHz and consumes only 479 mW. The system-level power efficiency is 152.9 GOPS/W (considering DRAM access power), which outperforms the state-of-the-art designs by one to two orders of magnitude.
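
The throughput and power figures quoted in the abstract above can be related with a little arithmetic. The sketch below is illustrative only: it assumes the system-level efficiency is measured at the same 194.4 GOPS throughput, which the abstract does not state, and the function and variable names are made up for this example.

```python
# Figures quoted in the abstract.
throughput_gops = 194.4          # benchmark throughput at 200 MHz
chip_power_w = 0.479             # 479 mW chip power
system_eff_gops_per_w = 152.9    # system-level efficiency, DRAM access included


def chip_efficiency(gops: float, watts: float) -> float:
    """Chip-only power efficiency in GOPS/W."""
    return gops / watts


def implied_system_power(gops: float, efficiency: float) -> float:
    """System power in W implied by a throughput and an efficiency.

    Assumption: the efficiency was measured at this same throughput.
    """
    return gops / efficiency


chip_eff = chip_efficiency(throughput_gops, chip_power_w)
system_power_w = implied_system_power(throughput_gops, system_eff_gops_per_w)
# Whatever is not chip power is attributed to DRAM access under the assumption.
dram_power_w = system_power_w - chip_power_w

print(f"chip-only efficiency: {chip_eff:.1f} GOPS/W")
print(f"implied system power: {system_power_w:.3f} W")
print(f"implied DRAM share:   {dram_power_w:.3f} W")
```

Under that assumption the chip alone works out to roughly 406 GOPS/W, and the gap to the quoted 152.9 GOPS/W would be explained by DRAM access power.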

Artificial Intelligence

Electrical And Electronic Engineering

FP-BNN: Binarized neural network on FPGA

Shuang Liang et al., Oct 19, 2017

Artificial Intelligence

Electrical And Electronic Engineering

Research on Performance Optimization of Encryption Algorithms for Network Security Framework

Ting Li et al., Mar 1, 2024

Artificial Intelligence

Theoretical Computer Science

TensorCIM: Digital Computing-in-Memory Tensor Processor With Multichip-Module-Based Architecture for Beyond-NN Acceleration

Yiqi Wang et al., Jan 1, 2024

Computer Networks And Communications

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering

Ruiqi Guo et al., Jan 1, 2024

Transformer models have achieved impressive performance in various artificial intelligence (AI) applications. However, their high computation cost and memory footprint make inference inefficient. Although digital compute-in-memory (CIM) is a promising hardware architecture with high accuracy, the Transformer's attention mechanism raises three challenges in the access and computation of CIM: 1) the attention computation involving Query and Key results in massive data movement and under-utilization in CIM macros; 2) the attention computation involving Possibility and Value exhibits plenty of dynamic bit-level sparsity, resulting in redundant bit-serial CIM operations; and 3) the restricted data reload bandwidth in CIM macros results in a significant decrease in performance for large Transformer models. To address these challenges, we design a CIM accelerator called CIM Transformer (CIMFormer) with three corresponding features. First, the token-pruning-aware attention reformulation (TPAR) is a technique that adjusts attention computations according to the token-pruning ratio. This reformulation reduces the real-time access to and under-utilization of CIM macros. Second, the principal possibility gather-scatter scheduler (PPGSS) gathers the possibilities with greater effective bit-width as concurrent inputs to CIM macros, enhancing the efficiency of bit-serial CIM operations. Third, the systolic X|W-CIM macro array efficiently handles the execution of large Transformer models that exceed the storage capacity of the on-chip CIM macros. Fabricated in a 28-nm technology, CIMFormer achieves a peak energy efficiency of 15.71 TOPS/W, an improvement of over 1.46× compared with the state-of-the-art Transformer accelerator under equivalent conditions.
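
The second challenge in the abstract, redundant bit-serial operations caused by bit-level sparsity, can be illustrated with a toy cost model. This is not the paper's PPGSS. It is a minimal sketch assuming that a group of concurrent inputs takes as many bit-serial cycles as its widest member, and the sample values, group size and function names are hypothetical.

```python
from typing import List


def effective_bits(value: int) -> int:
    """Bits up to and including the highest set bit (0 for the value 0)."""
    return value.bit_length()


def bit_serial_cycles(values: List[int], group_size: int) -> int:
    """Total cycles when each group costs as much as its widest member.

    Modelling assumption: inputs in a group are fed concurrently, one bit
    per cycle, so the group finishes when its widest value finishes.
    """
    total = 0
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        total += max(effective_bits(v) for v in group)
    return total


def gather_by_width(values: List[int]) -> List[int]:
    """Reorder values so that those with similar effective width are adjacent."""
    return sorted(values, key=effective_bits, reverse=True)


# Hypothetical 8-bit quantized attention values: a few large, many near zero.
probs = [200, 1, 3, 0, 150, 2, 0, 1, 90, 0, 1, 2, 0, 3, 1, 0]

baseline = bit_serial_cycles(probs, group_size=4)
gathered = bit_serial_cycles(gather_by_width(probs), group_size=4)
print(f"cycles in original order: {baseline}")
print(f"cycles after gathering:   {gathered}")
```

In this toy model, placing the wide values in the same group keeps the mostly-zero groups from being held up by one large member. A real scheduler would also need to scatter the results back to their original positions, which the sketch omits.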

Electrical And Electronic Engineering

Computer Networks And Communications
