Shouyi Yin
2020 – today
2024
- [j159] Weiwei Wu, Fengbin Tu, Xiangyu Li, Shaojun Wei, Shouyi Yin:
  SWG: an architecture for sparse weight gradient computation. Sci. China Inf. Sci. 67(2) (2024)
- [j158] Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
  MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity. IEEE J. Solid State Circuits 59(1): 90-101 (2024)
- [j157] Ruiqi Guo, Xiaofeng Chen, Lei Wang, Yang Wang, Hao Sun, Jingchuan Wei, Huiming Han, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering. IEEE J. Solid State Circuits 59(10): 3317-3329 (2024)
- [j156] Yubin Qin, Yang Wang, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Yang Zhou, Yuanqi Fan, Jingchuan Wei, Tianbao Chen, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow. IEEE J. Solid State Circuits 59(10): 3342-3356 (2024)
- [c163] Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
  RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism. AICAS 2024: 297-301
- [c162] Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
  RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism. AICAS 2024: 302-306
- [c161] Pengyu He, Yuanzhe Zhao, Heng Xie, Yang Wang, Shouyi Yin, Li Li, Yan Zhu, Rui Paulo Martins, Chi-Hang Chan, Minglei Zhang:
  A 28nm 314.6TLFOPS/W Reconfigurable Floating-Point Analog Compute-In-Memory Macro with Exponent Approximation and Two-Stage Sharing TD-ADC. CICC 2024: 1-2
- [c160] Zhiheng Yue, Shaojun Wei, Yang Hu, Shouyi Yin:
  CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm. DAC 2024: 22:1-22:6
- [c159] Xujiang Xiang, Zhiheng Yue, Yuxuan Li, Liuxin Lv, Shaojun Wei, Yang Hu, Shouyi Yin:
  Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization. DAC 2024: 35:1-35:6
- [c158] Dajiang Liu, Decai Pan, Xiao Xiong, Jiaxing Shang, Shouyi Yin:
  PMP: Pattern Morphing-based Memory Partitioning in High-Level Synthesis. DAC 2024: 205:1-205:6
- [c157] Zheng Xu, Xu Dai, Shaojun Wei, Shouyi Yin, Yang Hu:
  GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference. DAC 2024: 214:1-214:6
- [c156] Xiaolong Yang, Yang Wang, Yubin Qin, Jiachen Wang, Shaojun Wei, Yang Hu, Shouyi Yin:
  FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing. DAC 2024: 230:1-230:6
- [c155] Dajiang Liu, Yuxin Xia, Jiaxing Shang, Jiang Zhong, Peng Ouyang, Shouyi Yin:
  E2EMap: End-to-End Reinforcement Learning for CGRA Compilation via Reverse Mapping. HPCA 2024: 46-60
- [c154] Zhiheng Yue, Huizheng Wang, Jiahao Fang, Jinyi Deng, Guangyang Lu, Fengbin Tu, Ruiqi Guo, Yuxuan Li, Yubin Qin, Yang Wang, Chao Li, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
  Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture. ISCA 2024: 396-409
- [c153] Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin:
  MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. ISCA 2024: 1032-1047
- [c152] Zhiheng Yue, Xujiang Xiang, Fengbin Tu, Yang Wang, Yiming Wang, Shaojun Wei, Yang Hu, Shouyi Yin:
  15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch. ISSCC 2024: 276-278
- [c151] Ruiqi Guo, Lei Wang, Xiaofeng Chen, Hao Sun, Zhiheng Yue, Yubin Qin, Huiming Han, Yang Wang, Fengbin Tu, Shaojun Wei, Yang Hu, Shouyi Yin:
  20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models. ISSCC 2024: 362-364
- [c150] Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
  34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications. ISSCC 2024: 566-568
- [c149] Yiqi Wang, Zhen He, Chenggang Zhao, Zihan Wu, Mingyu Gao, Huiming Han, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin:
  ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction. VLSI Technology and Circuits 2024: 1-2
- [c148] Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
  A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating. VLSI Technology and Circuits 2024: 1-2
- [c147] Ruiqi Guo, Xiaofeng Chen, Lei Wang, Fengbin Tu, Shaojun Wei, Yang Hu, Shouyi Yin:
  A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation. VLSI Technology and Circuits 2024: 1-2
- [c146] Yubin Qin, Yang Wang, Xiaolong Yang, Zhiren Zhao, Shaojun Wei, Yang Hu, Shouyi Yin:
  A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication. VLSI Technology and Circuits 2024: 1-2
- [i13] Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin:
  Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture. CoRR abs/2405.17221 (2024)
- [i12] Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin:
  PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training. CoRR abs/2406.03868 (2024)
- [i11] Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin:
  SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling. CoRR abs/2407.10416 (2024)
2023
- [j155] Yihong Zhu, Wenping Zhu, Chongyang Li, Min Zhu, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
  RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems. IEEE J. Solid State Circuits 58(1): 124-140 (2023)
- [j154] Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
  An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention. IEEE J. Solid State Circuits 58(1): 227-242 (2023)
- [j153] Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration. IEEE J. Solid State Circuits 58(1): 243-255 (2023)
- [j152] Ruiqi Guo, Zhiheng Yue, Xin Si, Hao Li, Te Hu, Limei Tang, Yabing Wang, Hao Sun, Leibo Liu, Meng-Fan Chang, Qiang Li, Shaojun Wei, Shouyi Yin:
  TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization. IEEE J. Solid State Circuits 58(3): 852-866 (2023)
- [j151] Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes. IEEE J. Solid State Circuits 58(6): 1798-1809 (2023)
- [j150] Fengbin Tu, Yiqi Wang, Ling Liang, Yufei Ding, Leibo Liu, Shaojun Wei, Shouyi Yin, Yuan Xie:
  SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(1): 109-121 (2023)
- [j149] Mingyang Kou, Jiangyuan Gu, Hailong Yao, Shaojun Wei, Shouyi Yin:
  TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(8): 2552-2565 (2023)
- [j148] Xiangyu Kong, Jianfeng Zhu, Xingchen Man, Guihuan Song, Yi Huang, Chenchen Deng, Pengfei Gou, Shouyi Yin, Shaojun Wei, Leibo Liu:
  M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(9): 2938-2951 (2023)
- [j147] Yiqi Wang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization. IEEE Trans. Circuits Syst. I Regul. Pap. 70(1): 214-227 (2023)
- [j146] Shaojun Wei, Xinhan Lin, Fengbin Tu, Yang Wang, Leibo Liu, Shouyi Yin:
  Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips. IEEE Trans. Circuits Syst. I Regul. Pap. 70(3): 1228-1241 (2023)
- [j145] Weiwei Wu, Fengbin Tu, Mengqi Niu, Zhiheng Yue, Leibo Liu, Shaojun Wei, Xiangyu Li, Yang Hu, Shouyi Yin:
  STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition. IEEE Trans. Circuits Syst. I Regul. Pap. 70(6): 2370-2383 (2023)
- [c145] Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
  A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions. AICAS 2023: 1-5
- [c144] Zhou Wang, Jingchuan Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
  TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism. AICAS 2023: 1-5
- [c143] Ruiqi Guo, Yang Wang, Xiaofeng Chen, Lei Wang, Hao Sun, Jingchuan Wei, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering. A-SSCC 2023: 1-3
- [c142] Yubin Qin, Yang Wang, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Yang Zhou, Yuanqi Fan, Jingchuan Wei, Tianbao Chen, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination. A-SSCC 2023: 1-3
- [c141] Dajiang Liu, Di Mou, Rong Zhu, Yan Zhuang, Jiaxing Shang, Jiang Zhong, Shouyi Yin:
  DARIC: A Data Reuse-Friendly CGRA for Parallel Data Access via Elastic FIFOs. DAC 2023: 1-6
- [c140] Zhou Wang, Jingchuan Wei, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
  CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism. DAC 2023: 1-6
- [c139] Qidie Wu, Jiangyuan Gu, Youxu Lin, Boxiao Han, Hongjun He, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
  RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA. DAC 2023: 1-6
- [c138] Yubin Qin, Yang Wang, Dazheng Deng, Zhiren Zhao, Xiaolong Yang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction. ISCA 2023: 22:1-22:14
- [c137] Zhiheng Yue, Yang Wang, Huizheng Wang, Yabing Wang, Ruiqi Guo, Limei Tang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction. ISSCC 2023: 138-139
- [c136] Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
  MulTCIM: A 28nm 2.24µJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers. ISSCC 2023: 248-249
- [c135] Fengbin Tu, Yiqi Wang, Zihan Wu, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
  TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration. ISSCC 2023: 254-255
- [c134] Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. MICRO 2023: 1395-1408
- [c133] Yang Wang, Yubin Qin, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing. VLSI Technology and Circuits 2023: 1-2
- [i10] Shitong Shao, Xu Dai, Shouyi Yin, Lujun Li, Huanran Chen, Yang Hu:
  Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling. CoRR abs/2305.10769 (2023)
- [i9] Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
  Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. CoRR abs/2307.02847 (2023)
- [i8] Haojia Hui, Jiangyuan Gu, Xunbo Hu, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
  WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow. CoRR abs/2309.01273 (2023)
- [i7] Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin:
  Wafer-scale Computing: Advancements, Challenges, and Future Perspectives. CoRR abs/2310.09568 (2023)
2022
- [j144] Chenchen Deng, Min Zhu, Jinjiang Yang, Youyu Wu, Jiaji He, Bohan Yang, Jianfeng Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
  An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance. Sci. China Inf. Sci. 65(4) (2022)
- [j143] Huiyu Mo, Wenping Zhu, Wenjing Hu, Qiang Li, Ang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
  A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction. IEEE J. Solid State Circuits 57(5): 1542-1557 (2022)
- [j142] Jung-Hwan Choi, Po-Chiun Huang, Shouyi Yin, Woogeun Rhee:
  Guest Editorial Introduction to the Special Section on the 2021 Asian Solid-State Circuits Conference (A-SSCC). IEEE J. Solid State Circuits 57(10): 2895-2897 (2022)
- [j141] Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin, Leibo Liu, Shaojun Wei, Shouyi Yin:
  Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning. IEEE J. Solid State Circuits 57(10): 3164-3178 (2022)
- [j140] Zongsheng Hou, Neng Zhang, Bohan Yang, Hanning Wang, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(7): 2024-2037 (2022)
- [j139] Baofen Yuan, Jianfeng Zhu, Xingchen Man, Zijiao Ma, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(9): 2929-2942 (2022)
- [j138] Ang Li, Huiyu Mo, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
  BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(11): 4747-4757 (2022)
- [j137] Yong Wu, Honglan Jiang, Zining Ma, Pengfei Gou, Yong Lu, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
  An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation. IEEE Trans. Circuits Syst. I Regul. Pap. 69(7): 2655-2668 (2022)
- [j136] Zhiheng Yue, Yabing Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
  BR-CIM: An Efficient Binary Representation Computation-In-Memory Design. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 3940-3953 (2022)
- [j135] Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
  SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4014-4027 (2022)
- [j134] Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
  PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4042-4055 (2022)
- [j133] Jianxun Yang, Fengbin Tu, Yixuan Li, Yiqi Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
  GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4069-4082 (2022)
- [j132] Xiangren Chen, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
  CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 94-126 (2022)
- [j131] Cankun Zhao, Neng Zhang, Hanning Wang, Bohan Yang, Wenping Zhu, Zhengdong Li, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
  A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 270-295 (2022)
- [c132] Xiangren Chen, Bohan Yang, Yong Lu, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Efficient access scheme for multi-bank based NTT architecture through conflict graph. DAC 2022: 91-96
- [c131] Jinyi Deng, Linyun Zhang, Lei Wang, Jiawei Liu, Kexiang Deng, Shibin Tang, Jiangyuan Gu, Boxiao Han, Fei Xu, Leibo Liu, Shaojun Wei, Shouyi Yin:
  Mixed-granularity parallel coarse-grained reconfigurable architecture. DAC 2022: 343-348
- [c130] Zhiheng Yue, Yabing Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
  MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation. DAC 2022: 457-462
- [c129] Shixuan Zheng, Xianjue Zhang, Leibo Liu, Shaojun Wei, Shouyi Yin:
  Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators. HPCA 2022: 475-489
- [c128] Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Jianfeng Zhu, Honglan Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems. HPCA 2022: 986-1000
- [c127] Mingyuan Yang, Yemeng Zhang, Bohan Yang, Hanning Wang, Shouyi Yin, Shaojun Wei, Leibo Liu:
  A SHA-512 Hardware Implementation Based on Block RAM Storage Structure. IPDPS Workshops 2022: 132-135
- [c126] Xingchen Man, Jianfeng Zhu, Guihuan Song, Shouyi Yin, Shaojun Wei, Leibo Liu:
  CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process. ISCA 2022: 259-273
- [c125] Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration. ISSCC 2022: 1-3
- [c124] Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
  A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing. ISSCC 2022: 1-3
- [c123] Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes. ISSCC 2022: 466-468
- [c122] Yihong Zhu, Wenping Zhu, Min Zhu, Chongyang Li, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
  A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems. ISSCC 2022: 514-516
- [i6] Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
  FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences. CoRR abs/2210.08450 (2022)
- [i5] Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
  HQNAS: Auto CNN deployment framework for joint quantization and architecture search. CoRR abs/2210.08485 (2022)
2021
- [j130] Hai Huang, Leibo Liu, Min Zhu, Shouyi Yin, Shaojun Wei:
  Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers. Sci. China Inf. Sci. 64(8) (2021)
- [j129] Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning. IEEE J. Solid State Circuits 56(2): 658-673 (2021)
- [j128] Jianfeng Zhu, Ao Luo, Guanhua Li, Bowei Zhang, Yong Wang, Gang Shan, Yi Li, Jianfeng Pan, Chenchen Deng, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters. IEEE J. Solid State Circuits 56(8): 2585-2601 (2021)
- [j127] Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
  Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning". IEEE J. Solid State Circuits 56(9): 2895 (2021)
- [j126] Jianxun Yang, Yuyao Kong, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Yiqi Wang, Yonggang Liu, Chenfu Guo, Te Hu, Congcong Li, Leibo Liu, Jin Zhang, Shaojun Wei, Jun Yang, Shouyi Yin:
  TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs. IEEE J. Solid State Circuits 56(10): 3021-3038 (2021)
- [j125] Neng Zhang, Qiao Qin, Zongsheng Hou, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
  Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(9): 1896-1908 (2021)
- [j124] Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
  A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(10): 2170-2183 (2021)
- [j123] Kai Zhou, Shouyi Yin, Peng Ouyang, Yinan Liu, Shibin Tang:
  Flexible Rectification of a Speckle Projection System for Depth Sensing. IEEE Trans. Instrum. Meas. 70: 1-13 (2021)
- [j122] Huiyu Mo, Leibo Liu, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei:
  A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment. IEEE Trans. Multim. 23: 1122-1135 (2021)
- [j121] Longlong Chen, Jianfeng Zhu, Yangdong Deng, Zhaoshi Li, Jian Chen, Xiaowei Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
  An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Parallel Distributed Syst. 32(12): 3066-3080 (2021)
- [c121] Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
  LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training. AICAS 2021: 1-4
- [c120] Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
  HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning. AICAS 2021: 1-4
- [c119] Cheng Li, Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
  Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs. ASP-DAC 2021: 204-209
- [c118] Song Zhang, Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
  A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating. ASP-DAC 2021: 229-234
- [c117] Huiyu Shi, Xi Chen, Tianlong Kong, Shouyi Yin, Peng Ouyang:
  GLMSnet: Single Channel Speech Separation Framework in Noisy and Reverberant Environments. ASRU 2021: 663-670
- [c116] Zhendong Wang, Rujia Wang, Zihang Jiang, Xulong Tang, Shouyi Yin, Yang Hu:
  Towards a Secure Integrated Heterogeneous Platform via Cooperative CPU/GPU Encryption. ATS 2021: 115-120
- [c115]