Shaojun Wei
2020 – today
- 2024
- [j173]Weiwei Wu, Fengbin Tu, Xiangyu Li, Shaojun Wei, Shouyi Yin:
SWG: an architecture for sparse weight gradient computation. Sci. China Inf. Sci. 67(2) (2024) - [j172]Chenchen Deng, Tianzhu Xiong, Zhaoshi Li, Zhiwei Liu, Yao Wang, Jianfeng Zhu, Jun Yang, Shaojun Wei, Leibo Liu:
CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency. Sci. China Inf. Sci. 67(4) (2024) - [j171]Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity. IEEE J. Solid State Circuits 59(1): 90-101 (2024) - [j170]Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu:
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024(2): 630-653 (2024) - [j169]Xiangren Chen, Bohan Yang, Jianfeng Zhu, Jun Liu, Shuying Yin, Guang Yang, Min Zhu, Shaojun Wei, Leibo Liu:
UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024(2): 677-713 (2024) - [j168]Gang Zeng, Jianfeng Zhu, Yichi Zhang, Ganhui Chen, Zhenhai Yuan, Shaojun Wei, Leibo Liu:
A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm. IEEE Trans. Parallel Distributed Syst. 35(2): 237-249 (2024) - [c170]Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism. AICAS 2024: 297-301 - [c169]Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism. AICAS 2024: 302-306 - [c168]Yichi Zhang, Dibei Chen, Gang Zeng, Jianfeng Zhu, Zhaoshi Li, Longlong Chen, Shaojun Wei, Leibo Liu:
Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads. ASPLOS (3) 2024: 512-527 - [c167]Ting Li, Jinjiang Yang, Yin Zhou, Shaojun Wei:
Research on Performance Optimization of Encryption Algorithms for Network Security Framework. CSAIDE 2024: 650-653 - [c166]Hang Zhao, Cankun Zhao, Wenping Zhu, Bohan Yang, Shaojun Wei, Leibo Liu:
Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium. HOST 2024: 150-159 - [c165]Zhiheng Yue, Huizheng Wang, Jiahao Fang, Jinyi Deng, Guangyang Lu, Fengbin Tu, Ruiqi Guo, Yuxuan Li, Yubin Qin, Yang Wang, Chao Li, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture. ISCA 2024: 396-409 - [c164]Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin:
MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. ISCA 2024: 1032-1047 - [c163]Zhiheng Yue, Xujiang Xiang, Fengbin Tu, Yang Wang, Yiming Wang, Shaojun Wei, Yang Hu, Shouyi Yin:
15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch. ISSCC 2024: 276-278 - [c162]Yihong Zhu, Wenping Zhu, Yi Ouyang, Junwen Sun, Min Zhu, Qi Zhao, Jinjiang Yang, Chen Chen, Qichao Tao, Guang Yang, Aoyang Zhang, Shaojun Wei, Leibo Liu:
16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems. ISSCC 2024: 298-300 - [c161]Ruiqi Guo, Lei Wang, Xiaofeng Chen, Hao Sun, Zhiheng Yue, Yubin Qin, Huiming Han, Yang Wang, Fengbin Tu, Shaojun Wei, Yang Hu, Shouyi Yin:
20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models. ISSCC 2024: 362-364 - [c160]Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications. ISSCC 2024: 566-568 - [i11]Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin:
Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture. CoRR abs/2405.17221 (2024) - [i10]Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu:
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion. IACR Cryptol. ePrint Arch. 2024: 45 (2024) - 2023
- [j167]Yihong Zhu, Wenping Zhu, Chongyang Li, Min Zhu, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems. IEEE J. Solid State Circuits 58(1): 124-140 (2023) - [j166]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention. IEEE J. Solid State Circuits 58(1): 227-242 (2023) - [j165]Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration. IEEE J. Solid State Circuits 58(1): 243-255 (2023) - [j164]Ruiqi Guo, Zhiheng Yue, Xin Si, Hao Li, Te Hu, Limei Tang, Yabing Wang, Hao Sun, Leibo Liu, Meng-Fan Chang, Qiang Li, Shaojun Wei, Shouyi Yin:
TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization. IEEE J. Solid State Circuits 58(3): 852-866 (2023) - [j163]Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes. IEEE J. Solid State Circuits 58(6): 1798-1809 (2023) - [j162]Fengbin Tu, Yiqi Wang, Ling Liang, Yufei Ding, Leibo Liu, Shaojun Wei, Shouyi Yin, Yuan Xie:
SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(1): 109-121 (2023) - [j161]Mingyang Kou, Jiangyuan Gu, Hailong Yao, Shaojun Wei, Shouyi Yin:
TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(8): 2552-2565 (2023) - [j160]Xiangyu Kong, Jianfeng Zhu, Xingchen Man, Guihuan Song, Yi Huang, Chenchen Deng, Pengfei Gou, Shouyi Yin, Shaojun Wei, Leibo Liu:
M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(9): 2938-2951 (2023) - [j159]Yiqi Wang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization. IEEE Trans. Circuits Syst. I Regul. Pap. 70(1): 214-227 (2023) - [j158]Shaojun Wei, Xinhan Lin, Fengbin Tu, Yang Wang, Leibo Liu, Shouyi Yin:
Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips. IEEE Trans. Circuits Syst. I Regul. Pap. 70(3): 1228-1241 (2023) - [j157]Weiwei Wu, Fengbin Tu, Mengqi Niu, Zhiheng Yue, Leibo Liu, Shaojun Wei, Xiangyu Li, Yang Hu, Shouyi Yin:
STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition. IEEE Trans. Circuits Syst. I Regul. Pap. 70(6): 2370-2383 (2023) - [j156]Shuqin Su, Bohan Yang, Vladimir Rozic, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu:
A Closer Look at the Chaotic Ring Oscillators based TRNG Design. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2023(2): 381-417 (2023) - [j155]Longlong Chen, Jianfeng Zhu, Guiqiang Peng, Mingxu Liu, Shaojun Wei, Leibo Liu:
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering. IEEE Trans. Parallel Distributed Syst. 34(12): 3059-3072 (2023) - [c159]Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions. AICAS 2023: 1-5 - [c158]Zhou Wang, Jingchuan Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism. AICAS 2023: 1-5 - [c157]Ruiqi Guo, Yang Wang, Xiaofeng Chen, Lei Wang, Hao Sun, Jingchuan Wei, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering. A-SSCC 2023: 1-3 - [c156]Yubin Qin, Yang Wang, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Yang Zhou, Yuanqi Fan, Jingchuan Wei, Tianbao Chen, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination. A-SSCC 2023: 1-3 - [c155]Zhou Wang, Jingchuan Wei, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism. DAC 2023: 1-6 - [c154]Qidie Wu, Jiangyuan Gu, Youxu Lin, Boxiao Han, Hongjun He, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA. DAC 2023: 1-6 - [c153]Yihong Zhu, Wenping Zhu, Chen Chen, Min Zhu, Zhengdong Li, Shaojun Wei, Leibo Liu:
Mckeycutter: A High-throughput Key Generator of Classic McEliece on Hardware. DAC 2023: 1-6 - [c152]Shuohang Peng, Bohan Yang, Shuying Yin, Hang Zhao, Cankun Zhao, Shaojun Wei, Leibo Liu:
A Low-Randomness First-Order Masked Xoodyak. HOST 2023: 48-56 - [c151]Dibei Chen, Tairan Zhang, Yi Huang, Jianfeng Zhu, Yang Liu, Pengfei Gou, Chunyang Feng, Binghua Li, Shaojun Wei, Leibo Liu:
Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues. ISCA 2023: 11:1-11:14 - [c150]Yubin Qin, Yang Wang, Dazheng Deng, Zhiren Zhao, Xiaolong Yang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction. ISCA 2023: 22:1-22:14 - [c149]Xiangyu Kong, Yi Huang, Jianfeng Zhu, Xingchen Man, Yang Liu, Chunyang Feng, Pengfei Gou, Minggui Tang, Shaojun Wei, Leibo Liu:
MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search. ISCA 2023: 46:1-46:14 - [c148]Yibo Wu, Jianfeng Zhu, Wenrui Wei, Longlong Chen, Liang Wang, Shaojun Wei, Leibo Liu:
Shogun: A Task Scheduling Framework for Graph Mining Accelerators. ISCA 2023: 51:1-51:15 - [c147]Zhiheng Yue, Yang Wang, Huizheng Wang, Yabing Wang, Ruiqi Guo, Limei Tang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction. ISSCC 2023: 138-139 - [c146]Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
MuITCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers. ISSCC 2023: 248-249 - [c145]Fengbin Tu, Yiqi Wang, Zihan Wu, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration. ISSCC 2023: 254-255 - [c144]Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. MICRO 2023: 1395-1408 - [c143]Yi Huang, Lingkun Kong, Dibei Chen, Zhiyu Chen, Xiangyu Kong, Jianfeng Zhu, Konstantinos Mamouras, Shaojun Wei, Kaiyuan Yang, Leibo Liu:
CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment. MICRO 2023: 1423-1436 - [c142]Yang Wang, Yubin Qin, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing. VLSI Technology and Circuits 2023: 1-2 - [i9]Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. CoRR abs/2307.02847 (2023) - [i8]Haojia Hui, Jiangyuan Gu, Xunbo Hu, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow. CoRR abs/2309.01273 (2023) - [i7]Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin:
Wafer-scale Computing: Advancements, Challenges, and Future Perspectives. CoRR abs/2310.09568 (2023) - [i6]Shuqin Su, Bohan Yang, Vladimir Rozic, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu:
A Closer Look at the Chaotic Ring Oscillators based TRNG Design. IACR Cryptol. ePrint Arch. 2023: 40 (2023) - 2022
- [b2]Shaojun Wei, Leibo Liu, Jianfeng Zhu, Chenchen Deng:
Software Defined Chips - Volume I, 2. Springer 2022, ISBN 978-981-19-6993-5, pp. 1-311 - [j154]Chenchen Deng, Min Zhu, Jinjiang Yang, Youyu Wu, Jiaji He, Bohan Yang, Jianfeng Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance. Sci. China Inf. Sci. 65(4) (2022) - [j153]Huiyu Mo, Wenping Zhu, Wenjing Hu, Qiang Li, Ang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction. IEEE J. Solid State Circuits 57(5): 1542-1557 (2022) - [j152]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin, Leibo Liu, Shaojun Wei, Shouyi Yin:
Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning. IEEE J. Solid State Circuits 57(10): 3164-3178 (2022) - [j151]Zongsheng Hou, Neng Zhang, Bohan Yang, Hanning Wang, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(7): 2024-2037 (2022) - [j150]Baofen Yuan, Jianfeng Zhu, Xingchen Man, Zijiao Ma, Shouyi Yin, Shaojun Wei, Leibo Liu:
Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(9): 2929-2942 (2022) - [j149]Ang Li, Huiyu Mo, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(11): 4747-4757 (2022) - [j148]Yong Wu, Honglan Jiang, Zining Ma, Pengfei Gou, Yong Lu, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation. IEEE Trans. Circuits Syst. I Regul. Pap. 69(7): 2655-2668 (2022) - [j147]Zhiheng Yue, Yabing Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
BR-CIM: An Efficient Binary Representation Computation-In-Memory Design. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 3940-3953 (2022) - [j146]Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4014-4027 (2022) - [j145]Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4042-4055 (2022) - [j144]Jianxun Yang, Fengbin Tu, Yixuan Li, Yiqi Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4069-4082 (2022) - [j143]Xiangren Chen, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 94-126 (2022) - [j142]Cankun Zhao, Neng Zhang, Hanning Wang, Bohan Yang, Wenping Zhu, Zhengdong Li, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 270-295 (2022) - [c141]Xiangren Chen, Bohan Yang, Yong Lu, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient access scheme for multi-bank based NTT architecture through conflict graph. DAC 2022: 91-96 - [c140]Jinyi Deng, Linyun Zhang, Lei Wang, Jiawei Liu, Kexiang Deng, Shibin Tang, Jiangyuan Gu, Boxiao Han, Fei Xu, Leibo Liu, Shaojun Wei, Shouyi Yin:
Mixed-granularity parallel coarse-grained reconfigurable architecture. DAC 2022: 343-348 - [c139]Zhiheng Yue, Yabing Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation. DAC 2022: 457-462 - [c138]Shixuan Zheng, Xianjue Zhang, Leibo Liu, Shaojun Wei, Shouyi Yin:
Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators. HPCA 2022: 475-489 - [c137]Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Jianfeng Zhu, Honglan Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems. HPCA 2022: 986-1000 - [c136]Weiliang Chen, Zhaoshi Li, Leibo Liu, Shaojun Wei:
Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit. ICTA 2022: 1-2 - [c135]Mingyuan Yang, Yemeng Zhang, Bohan Yang, Hanning Wang, Shouyi Yin, Shaojun Wei, Leibo Liu:
A SHA-512 Hardware Implementation Based on Block RAM Storage Structure. IPDPS Workshops 2022: 132-135 - [c134]Xingchen Man, Jianfeng Zhu, Guihuan Song, Shouyi Yin, Shaojun Wei, Leibo Liu:
CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process. ISCA 2022: 259-273 - [c133]Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration. ISSCC 2022: 1-3 - [c132]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing. ISSCC 2022: 1-3 - [c131]Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes. ISSCC 2022: 466-468 - [c130]Yihong Zhu, Wenping Zhu, Min Zhu, Chongyang Li, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems. ISSCC 2022: 514-516 - [i5]Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences. CoRR abs/2210.08450 (2022) - [i4]Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
HQNAS: Auto CNN deployment framework for joint quantization and architecture search. CoRR abs/2210.08485 (2022) - [i3]Yihong Zhu, Wenping Zhu, Chen Chen, Min Zhu, Zhengdong Li, Shaojun Wei, Leibo Liu:
Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece. IACR Cryptol. ePrint Arch. 2022: 1277 (2022) - 2021
- [j141]Hai Huang, Leibo Liu, Min Zhu, Shouyi Yin, Shaojun Wei:
Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers. Sci. China Inf. Sci. 64(8) (2021) - [j140]Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning. IEEE J. Solid State Circuits 56(2): 658-673 (2021) - [j139]Jianfeng Zhu, Ao Luo, Guanhua Li, Bowei Zhang, Yong Wang, Gang Shan, Yi Li, Jianfeng Pan, Chenchen Deng, Shouyi Yin, Shaojun Wei, Leibo Liu:
Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters. IEEE J. Solid State Circuits 56(8): 2585-2601 (2021) - [j138]Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning". IEEE J. Solid State Circuits 56(9): 2895 (2021) - [j137]Jianxun Yang, Yuyao Kong, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Yiqi Wang, Yonggang Liu, Chenfu Guo, Te Hu, Congcong Li, Leibo Liu, Jin Zhang, Shaojun Wei, Jun Yang, Shouyi Yin:
TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs. IEEE J. Solid State Circuits 56(10): 3021-3038 (2021) - [j136]Neng Zhang, Qiao Qin, Zongsheng Hou, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(9): 1896-1908 (2021) - [j135]Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(10): 2170-2183 (2021) - [j134]Hui Wu, Zhe Su, Jilin Zhang, Shaojun Wei, Zhihua Wang, Hong Chen:
A Design Flow for Click-Based Asynchronous Circuits Design With Conventional EDA Tools. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(11): 2421-2425 (2021) - [j133]Yihong Zhu, Min Zhu, Bohan Yang, Wenping Zhu, Chenchen Deng, Chen Chen, Shaojun Wei, Leibo Liu:
LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR. IEEE Trans. Circuits Syst. I Regul. Pap. 68(3): 1146-1159 (2021) - [j132]Huiyu Mo, Leibo Liu, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei:
A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment. IEEE Trans. Multim. 23: 1122-1135 (2021) - [j131]Longlong Chen, Jianfeng Zhu, Yangdong Deng, Zhaoshi Li, Jian Chen, Xiaowei Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Parallel Distributed Syst. 32(12): 3066-3080 (2021) - [c129]Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training. AICAS 2021: 1-4 - [c128]Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning. AICAS 2021: 1-4 - [c127]