


29th HPCA 2023: Montreal, QC, Canada
- IEEE International Symposium on High-Performance Computer Architecture, HPCA 2023, Montreal, QC, Canada, February 25 - March 1, 2023. IEEE 2023, ISBN 978-1-6654-7652-2

Session 1A: Neural Networks and Accelerators 1
- Mingi Yoo, Jaeyong Song, Jounghoo Lee, Namhyung Kim, Youngsok Kim, Jinho Lee:
SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators. 1-14
- Shurui Li, Hangbo Yang, Chee Wei Wong, Volker J. Sorger, Puneet Gupta:
PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator. 15-28
- Bokyung Kim, Shiyu Li, Hai Li:
INCA: Input-stationary Dataflow at Outside-the-box Thinking about Deep Learning Accelerators. 29-41
- Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu:
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks. 42-55
- Jo Sanghoon, Hyojun Son, John Kim:
Logical/Physical Topology-Aware Collective Communication in Deep Learning Training. 56-68
- Dongseok Im, Gwangtae Park, Zhiyong Li, Junha Ryu, Hoi-Jun Yoo:
Sibia: Signed Bit-slice Architecture for Dense DNN Acceleration with Slice-level Sparsity Exploitation. 69-80
Session 1B: NVRAM and Hybrid Memory
- Siddharth Gupta, Yunho Oh, Lei Yan, Mark Sutherland, Abhishek Bhattacharjee, Babak Falsafi, Peter Hsu:
AstriFlash A Flash-Based System for Online Services. 81-93
- Xijing Han, James Tuck, Amro Awad:
Thoth: Bridging the Gap Between Persistently Secure Memories and Memory Interfaces of Emerging NVMs. 94-107
- Hongchao Du, Qiao Li, Riwei Pan, Tei-Wei Kuo, Chun Jason Xue:
Multi-Granularity Shadow Paging with NVM Write Optimization for Crash-Consistent Memory-Mapped I/O. 108-121
- Yina Lv, Liang Shi, Qiao Li, Congming Gao, Yunpeng Song, Longfei Luo, Youtao Zhang:
MGC: Multiple-Gray-Code for 3D NAND Flash based High-Density SSDs. 122-136
- Yiwei Li, Mingyu Gao:
Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 137-151
- Jianming Huang, Yu Hua:
Root Crash Consistency of SGX-style Integrity Trees in Secure Non-Volatile Memory Systems. 152-164
Session 1C: Caching and Memory Management
- Yunjin Wang, Chia-Hao Chang, Anand Sivasubramaniam, Niranjan Soundararajan:
ACIC: Admission-Controlled Instruction Cache. 165-178
- Carlos Escuin, Asif Ali Khan, Pablo Ibáñez, Teresa Monreal, Jerónimo Castrillón, Víctor Viñals:
Compression-Aware and Performance-Efficient Insertion Policies for Long-Lasting Hybrid LLCs. 179-192
- Youngin Kim, Hyeonjin Kim, William J. Song:
NOMAD: Enabling Non-blocking OS-managed DRAM Cache via Tag-Data Decoupling. 193-205
- Anirudh Jain, Divya Kiran Kadiyala, Alexandros Daglis:
Safety Hints for HTM Capacity Abort Mitigation. 206-219
- Weijian Chen, Shuibing He, Yaowen Xu, Xuechen Zhang, Siling Yang, Shuang Hu, Xian-He Sun, Gang Chen:
iCache: An Importance-Sampling-Informed Cache for Accelerating I/O-Bound DNN Model Training. 220-232
- Anirban Chakraborty, Sarani Bhattacharya, Sayandeep Saha, Debdeep Mukhopadhyay:
Are Randomized Caches Truly Random? Formal Analysis of Randomized-Partitioned Caches. 233-246
Session 2A: Accelerators
- Hesam Shabani, Abhishek Singh, Bishoy Youhana, Xiaochen Guo:
HIRAC: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications. 247-258
- Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna:
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs. 259-272
- Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin:
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design. 273-286
- Chirag Sakhuja, Zhan Shi, Calvin Lin:
Leveraging Domain Information for the Efficient Automated Design of Deep Learning Accelerators. 287-301
- Zhe Zhou, Cong Li, Fan Yang, Guangyu Sun:
DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing. 302-316
Session 2B: Security
- Mulong Luo, Wenjie Xiong, Geunbae Lee, Yueying Li, Xiaomeng Yang, Amy Zhang, Yuandong Tian, Hsien-Hsin S. Lee, G. Edward Suh:
AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks. 317-332
- Minbok Wi, Jaehyun Park, Seoyoung Ko, Michael Jaemin Kim, Nam Sung Kim, Eojin Lee, Jung Ho Ahn:
SHADOW: Preventing Row Hammer in DRAM with Intra-Subarray Row Shuffling. 333-346
- Erhu Feng, Dong Du, Yubin Xia, Haibo Chen:
Efficient Distributed Secure Memory with Migratable Merkle Tree. 347-360
- Mehrnoosh Raoufi, Jun Yang, Xulong Tang, Youtao Zhang:
AB-ORAM: Constructing Adjustable Buckets for Space Reduction in Ring ORAM. 361-373
- Jeonghyun Woo, Gururaj Saileshwar, Prashant J. Nair:
Scalable and Secure Row-Swap: Efficient and Safe Row Hammer Mitigation in Memory Systems. 374-389
Session 2C: Applications 1
- Yu Wen, Chenhao Xie, Shuaiwen Leon Song, Xin Fu:
Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. 390-402
- Faquan Chen, Rendong Ying, Jianwei Xue, Fei Wen, Peilin Liu:
ParallelNN: A Parallel Octree-based Nearest Neighbor Search Accelerator for 3D Point Clouds. 403-414
- Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin:
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention. 415-428
- Haoran Wang, Haobo Xu, Ying Wang, Yinhe Han:
CTA: Hardware-Software Co-design for Compressed Token Attention Mechanism. 429-441
- Peiyan Dong, Mengshu Sun, Alec Lu, Yanyue Xie, Kenneth Liu, Zhenglun Kong, Xin Meng, Zhengang Li, Xue Lin, Zhenman Fang, Yanzhi Wang:
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers. 442-455
Session 3B: Datacenters and HPC
- Bingyao Li, Jieming Yin, Anup Holey, Youtao Zhang, Jun Yang, Xulong Tang:
Trans-FW: Short Circuiting Page Table Walk in Multi-GPU Systems via Remote Forwarding. 456-470
- Yuhang Liu, Xin Deng, Jiapeng Zhou, Mingyu Chen, Yungang Bao:
Ah-Q: Quantifying and Handling the Interference within a Datacenter from a System Perspective. 471-484
- Md Rajib Hossen, Kishwar Ahmed, Mohammad A. Islam:
Market Mechanism-Based User-in-the-Loop Scalable Power Oversubscription for HPC Systems. 485-498
- Yifan Yuan, Jinghan Huang, Yan Sun, Tianchen Wang, Jacob Nelson, Dan R. K. Ports, Yipeng Wang, Ren Wang, Charlie Tai, Nam Sung Kim:
Rambda: RDMA-driven Acceleration Framework for Memory-intensive µs-scale Datacenter Applications. 499-515
Session 3C: GPUs
- Harini Muthukrishnan, Daniel Lustig, Oreste Villa, Thomas F. Wenisch, David W. Nellans:
FinePack: Transparently Improving the Efficiency of Fine-Grained Transfers in Multi-GPU Systems. 516-529
- Aaron Barnes, Fangjia Shen, Timothy G. Rogers:
Mitigating GPU Core Partitioning Performance Effects. 530-542
- Rahaf Abdullah, Huiyang Zhou, Amro Awad:
Plutus: Bandwidth-Efficient Memory Security for GPUs. 543-555
- Quan Zhou, Haiquan Wang, Xiaoyan Yu, Cheng Li, Youhui Bai, Feng Yan, Yinlong Xu:
MPress: Democratizing Billion-Scale Model Training on Multi-GPU Servers via Memory-Saving Inter-Operator Parallelism. 556-569
Session 4A: Neural Networks and Accelerators 2
- Linyan Mei, Koen Goetschalckx, Arne Symons, Marian Verhelst:
DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling. 570-583
- Yue Dai, Youtao Zhang, Xulong Tang:
CEGMA: Coordinated Elastic Graph Matching Acceleration for Graph Matching Networks. 584-597
- Yifan Yang, Joel S. Emer, Daniel Sánchez:
ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining. 598-610
- Junkyum Kim, Myeonggu Kang, Yunki Han, Yanggon Kim, Lee-Sup Kim:
OptimStore: In-Storage Optimization of Large Scale DNNs with On-Die Processing. 611-623
- Marcus Chow, Ali Jahanshahi, Daniel Wong:
KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers. 624-637
- Vahid Janfaza, Kevin Weston, Moein Razavi, Shantanu Mandal, Farabi Mahmud, Alex Hilty, Abdullah Muzahid:
MERCURY: Accelerating DNN Training By Exploiting Input Similarity. 638-650
Session 4B: PIMs and Persistent Memory
- Ming Zhang, Yu Hua:
Silo: Speculative Hardware Logging for Atomic Durability in Persistent Memory. 651-663
- Chencheng Ye, Yuanchao Xu, Xipeng Shen, Yan Sha, Xiaofei Liao, Hai Jin, Yan Solihin:
Reconciling Selective Logging and Hardware Persistent Memory Transaction. 664-676
- Alexander Freij, Huiyang Zhou, Yan Solihin:
SecPB: Architectures for Secure Non-Volatile Memory with Battery-Backed Persist Buffers. 677-690
- Khalid Al-Hawaj, Tuan Ta, Nick Cebry, Shady Agwa, Olalekan Afuye, Eric Hall, Courtney Golden, Alyssa B. Apsel, Christopher Batten:
EVE: Ephemeral Vector Engines. 691-704
- Ben Perach, Ronny Ronen, Shahar Kvatinsky:
On Consistency for Bulk-Bitwise Processing-in-Memory. 705-717
- Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi:
Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications. 718-730
Session 4C: Quantum and FPGAs
- Siwei Tan, Mingqian Yu, Andre Python, Yongheng Shang, Tingting Li, Liqiang Lu, Jianwei Yin:
HyQSAT: A Hybrid Approach for 3-SAT Problems by Integrating Quantum Annealer with CDCL. 731-744
- Ang Li, August Ning, David Wentzlaff:
Duet: Creating Harmony between Processors and Embedded FPGAs. 745-758
- Evan McKinney, Mingkang Xia, Chao Zhou, Pinlei Lu, Michael Hatridge, Alex K. Jones:
Co-Designed Architectures for Modular Superconducting Quantum Computers. 759-772
- Yan-Hao Chen, Yuwei Jin, Fei Hua, Ari B. Hayes, Ang Li, Yunong Shi, Eddy Z. Zhang:
A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis. 773-786
- Poulami Das, Eric Kessler, Yunong Shi:
The Imitation Game: Leveraging CopyCats for Robust Native Gate Selection in NISQ Programs. 787-801
Session 5A: Cloud and Edge Computing
- Junkang Zhu, Yaoyu Tao, Zhengya Zhang:
eNODE: Energy-Efficient and Low-Latency Edge Inference and Training of Neural ODEs. 802-813
- Jovan Stojkovic, Tianyin Xu, Hubertus Franke, Josep Torrellas:
SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution. 814-827
- Seah Kim, Hasan Genc, Vadim Vadimovich Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao:
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks. 828-841
- Junyeol Yu, Jongseok Kim, Euiseong Seo:
Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving. 842-854
- Dimosthenis Masouros, Christian Pinto, Michele Gazzetti, Sotirios Xydis, Dimitrios Soudris:
Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures. 855-869
Session 5B: Encryption and SGX
- Yinghao Yang, Huaizhi Zhang, Shengyu Fan, Hang Lu, Mingzhe Zhang, Xiaowei Li:
Poseidon: Practical Homomorphic Encryption Accelerator. 870-881
- Rashmi Agrawal, Leo de Castro, Guowei Yang, Chiraag Juvekar, Rabia Tugce Yazicigil, Anantha P. Chandrakasan, Vinod Vaikuntanathan, Ajay Joshi:
FAB: An FPGA-based Accelerator for Bootstrappable Fully Homomorphic Encryption. 882-895
- Yilan Zhu, Xinyao Wang, Lei Ju, Shanqing Guo:
FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference. 896-907
- Md Hafizul Islam Chowdhuryy, Myoungsoo Jung, Fan Yao, Amro Awad:
D-Shield: Enabling Processor-side Encryption and Integrity Verification for Secure NVMe Drives. 908-921
- Shengyu Fan, Zhiwei Wang, Weizhi Xu, Rui Hou, Dan Meng, Mingzhe Zhang:
TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU. 922-934
Session 5C: Reliability
- George Papadimitriou, Dimitris Gizopoulos:
AVGI: Microarchitecture-Driven, Fast and Accurate Vulnerability Assessment. 935-948
- Jiangwei Zhang, Chong Wang, Zhenhua Zhu, Donald Kline, Alex K. Jones, Huazhong Yang, Yu Wang:
Realizing Extreme Endurance Through Fault-aware Wear Leveling and Improved Tolerance. 964-976
- Chunfeng Du, Suzhen Wu, Jiapeng Wu, Bo Mao, Shengzhe Wang:
ESD: An ECC-assisted and Selective Deduplication for Encrypted Non-Volatile Main Memory. 977-990
Session 6A: Industry Track Session
- Majed Valad Beigi, Yi Cao, Sudhanva Gurumurthi, Charles Recchia, Andrew C. Walton, Vilas Sridharan:
A Systematic Study of DDR4 DRAM Faults in the Field. 991-1002
- Jianguo Yao, Hao Zhou, Yalin Zhang, Ying Li, Chuang Feng, Shi Chen, Jiaoyan Chen, Yongdong Wang, Qiaojuan Hu:
High Performance and Power Efficient Accelerator for Cloud Inference. 1003-1016
- Sungyeob Yoo, Hyunsung Kim, Jinseok Kim, Sunghyun Park, Joo-Young Kim, Jinwook Oh:
LightTrader: A Standalone High-Frequency Trading System with Deep Learning Inference Accelerators and Proactive Scheduler. 1017-1030
- Yiquan Chen, Jiexiong Xu, Chengkun Wei, Yijing Wang, Xin Yuan, Yangming Zhang, Xulin Yu, Yi Chen, Zeke Wang, Shuibing He, Wenzhi Chen:
BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment. 1031-1044
Session 6B: NICs and Networks
- Hamed Seyedroudbari, Srikar Vanavasam, Alexandros Daglis:
Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs. 1045-1058
- Yinxiao Feng, Dong Xiang, Kaisheng Ma:
A Scalable Methodology for Designing Efficient Interconnection Network of Chiplets. 1059-1071
- Hans Kasan, John Kim:
VVQ: Virtualizing Virtual Channel for Cost-Efficient Protocol Deadlock Avoidance. 1072-1084
Session 7A: Neural Network and Accelerators 3
- Enrico Reggiani, Alessandro Pappalardo, Max Doblas, Miquel Moretó, Mauro Olivieri, Osman Sabri Unsal, Adrián Cristal:
Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices. 1085-1098
- Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao:
FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference. 1099-1112
- Size Zheng, Siyuan Chen, Peidi Song, Renze Chen, Xiuhong Li, Shengen Yan, Dahua Lin, Jingwen Leng, Yun Liang:
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion. 1113-1126
- Nivedita Shrivastava, Smruti Ranjan Sarangi:
Securator: A Fast and Secure Neural Processing Unit. 1127-1139
- Shao-Fu Lin, Yi-Jung Chen, Hsiang-Yun Cheng, Chia-Lin Yang:
Tensor Movement Orchestration in Multi-GPU Training Systems. 1140-1152
Session 7B: Microarchitecture and Memory Systems
- Truls Asheim, Boris Grot, Rakesh Kumar:
A Storage-Effective BTB Organization for Servers. 1153-1167
- Haifeng Li, Ke Liu, Ting Liang, Zuojun Li, Tianyue Lu, Hui Yuan, Yinben Xia, Yungang Bao, Mingyu Chen, Yizhou Shan:
HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory. 1168-1181
- Sanyam Mehta:
Speculative Register Reclamation. 1182-1194
- Jiwon Lee, Ju Min Lee, Yunho Oh, William J. Song, Won Woo Ro:
SnakeByte: A TLB Design with Adaptive and Recursive Page Merging in GPUs. 1195-1207
- Xiaoyang Lu, Rujia Wang, Xian-He Sun:
CARE: A Concurrency-Aware Enhanced Lightweight Cache Management Framework. 1208-1220
- Jovan Stojkovic, Namrata Mantri, Dimitrios Skarlatos, Tianyin Xu, Josep Torrellas:
Memory-Efficient Hashed Page Tables. 1221-1235
Session 7C: Applications 2 & Potpourri
- Yewen Li, Xueqi Li, Ruihao Gao, Wanqi Liu, Guangming Tan:
NvWa: Enhancing Sequence Alignment Accelerator Throughput via Hardware Scheduling. 1236-1248
- Ying Xu, Long Cheng, Xuyi Cai, Xiaohan Ma, Weiwei Chen, Lei Zhang, Ying Wang:
Efficient Supernet Training Using Path Parallelism. 1249-1261
- Quan M. Nguyen, Daniel Sánchez:
Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism. 1262-1274
- Xiangjun Peng, Yaohua Wang, Ming-Chang Yang:
CHOPPER: A Compiler Infrastructure for Programmable Bit-serial SIMD Processing Using Memory in DRAM. 1275-1288
- Julián Pavón, Iván Vargas Valdivieso, Joan Marimon, Roger Figueras, Francesc Moll, Osman S. Unsal, Mateo Valero, Adrián Cristal:
VAQUERO: A Scratchpad-based Vector Accelerator for Query Processing. 1289-1302
