


ICS 2025: Salt Lake City, UT, USA
- Proceedings of the 39th ACM International Conference on Supercomputing, ICS 2025, Salt Lake City, UT, USA, June 8-11, 2025. ACM 2025, ISBN 979-8-4007-1537-2

Approximation
- Lorenzo Carpentieri, Biagio Cosenza:
  SYprox: Combining Host and Device Perforation with Mixed Precision Approximation on Heterogeneous Architectures. 1-12
- Garrett Gagnon, Srikanth Malla, Yangwook Kang, Liu Liu:
  BitWeaver: Read-Time Truncation in Memory. 13-25
- Wenqi Jia, Zhewen Hu, Youyuan Liu, Boyuan Zhang, Jinzhen Wang, Jinyang Liu, Wei Niu, Stavros Kalafatis, Junzhou Huang, Sian Jin, Daoce Wang, Jiannan Tian, Miao Yin:
  NeurLZ: An Online Neural Learning-based Method to Enhance Scientific Lossy Compression. 26-42
- Jiajun Huang, Sheng Di, Yafan Huang, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
  ghZCCL: Advancing GPU-aware Collective Communications with Homomorphic Compression. 43-56
Graph Neural Networks
- Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers. 57-72
- Lixing Zhang, Yingxia Shao, Shigang Li:
  CoLa: Towards Communication-efficient Distributed Sparse Matrix-Matrix Multiplication on GPUs. 73-87
- Yan Wang, Qinghua Guo, Haoran Kong, Kai Sheng, Zhen Xie, Hao Chen, Weile Jia, Dingwen Tao, Xin He:
  Cherry: Breaking the GPU Memory Wall for Large-Scale GNN Training via Micro-Batching. 88-103
- Zitong Li, Aparna Chandramowlishwaran:
  Fused3S: Fast Sparse Attention on Tensor Cores. 104-118
Sparse Linear Algebra
- Hao Luo, Qianchao Zhu, Xiaochen Hao, Chunxi Lei, Chengdi Ma, Chenchen Zhang, Yun Liang, Chao Yang:
  StructILU: Dependency-Preserving Incomplete LU with Hierarchical Parallelism for Structured Grid PDEs on GPUs. 119-134
- Jixiao Deng, Qinglin Wang, Lin Chen, Tun Li, Bo Yang, Xinhai Chen, Jie Liu:
  IA-Chol: Input-Aware Cholesky Decomposition on CPU and GPU. 135-148
- Xing Cong, FuKai Sun, YiFan Chen, Chenhao Xie, Yi Liu, Depei Qian:
  CB-SpMV: A Data Aggregating and Balance Algorithm for Cache-Friendly Block-Based SpMV on GPUs. 149-160
- Qi Wang, Yaobin Wang, Yi Luo, Rong Luo, Pingping Tang:
  HR-SpMM: Adaptive Row Partitioning and Hybrid Kernel Design for Sparse Matrix Multiplication. 161-172
Acceleration
- Yeejoo Han, Sunwoo Kim, Seongyeon Park, Jinho Lee:
  G^3SA: A GPU-Accelerated Gold Standard Genomics Library for End-to-End Sequence Alignment. 173-188
- Zhengang Li, Hongwu Peng, Xuan Shen, Masoud Zabihi, Xi Xie, Geng Yuan, Yanzhi Wang, Olivia Chen, Caiwen Ding:
  Graph Convolutional Network Acceleration Using Adiabatic Superconductor Josephson Devices. 189-204
- Jiexiong Guan, Zhenqing Hu, Christos D. Antonopoulos, Nikolaos Bellas, Spyros Lalis, Evgenia Smirni, Gang Zhou, Gagan Agrawal, Bin Ren:
  TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations. 205-220
- Yuebo Luo, Shiyang Li, Junran Tao, Kiran Gautam Thorat, Xi Xie, Hongwu Peng, Nuo Xu, Caiwen Ding, Shaoyi Huang:
  DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs. 221-235
Applications
- Victor Kamel, Hanxueyu Yan, Sean Chester:
  CLOVER: A GPU-native, Spatio-graph-based Approach to Exact kNN. 236-249
- Shanghao Liu, Hailong Yang, Xin You, Zhongzhi Luan, Yi Liu, Depei Qian:
  Efficient Locality-aware Instruction Stream Scheduling for Stencil Computation on ARM Processors. 250-264
- Siqi Wang, Hailong Yang, Pengbo Wang, Shaokang Du, Yufan Xu, Qingxiao Sun, Xiaoyan Liu, Xuezhu Wang, Xuning Liang, Zhongzhi Luan, Yi Liu, Depei Qian:
  Accelerating Complex Stencil Computations with Adaptive Fusion Strategy. 265-278
- Shuo Xin, Haiyu Wang, Sai Qian Zhang:
  A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality. 279-292
- Ricardo Nobre, Miguel Graça, Leonel Sousa, Aleksandar Ilic:
  EPIClear: Exploiting Domain-Specific Features for Epistasis Detection Acceleration on Tensor Cores. 293-307
- Max Heldman, Johann Rudi, Julie Bessac:
  Statistical Treatment of Variable MPI Latencies and MPI-Communication Hiding for Matrix-Free Finite Element Operators. 308-323
GPU Scheduling
- Zizhao Mo, Huanle Xu, Wing Cheong Lau:
  Fast and Fair Training for Deep Learning in Heterogeneous GPU Clusters. 324-338
- Seok Namkoong, Taehyeong Park, Kiung Jung, Jinyoung Kim, Yongjun Park:
  SortingHat: System Topology-aware Scheduling of Deep Neural Network Models on Multi-GPU Systems. 339-354
- Zhuotong Li, Liang Xu, Ziqi Huang, Shuyun Qian, Hongwei Bu, Ming Yang, Mengyun Luan, Weiguo Chen, Xu Wen:
  CTCCL: Cost-Efficient Joint Device-Network Load Balancing for LLM Training in RoCE-based Intelligent Computing Network. 355-367
- Runsheng Benson Guo, Utkarsh Anand, Arthur Chen, Khuzaima Daudjee:
  Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models. 368-383
- Ilyas Turimbetov, Mohamed Wahib, Didem Unat:
  A Device-Side Execution Model for Multi-GPU Task Graphs. 384-396
Solvers & Sparsity
- Fan Yuan, Xiaojian Yang, Yunqing Huang, Dezun Dong, Chuanfu Xu, Jie Liu, Xiaoqiang Yue, Shengguo Li, Hongxia Wang:
  CRAMG: A Communication-Reduced Algebraic Multigrid Method. 397-411
- Yongxiao Zhou, Yi Zong, Yuyang Jin, Heng Li, Wei Xue:
  An Efficient 2D Fusion Method for High-Performance Two-Stage Eigensolvers on Modern Heterogeneous Architectures. 412-425
- Chaewon Kim, Jaehwan Lee, Jinpyo Kim, Dohyun Kim, Kyusu Ahn, Hyung Uk Cho, Seungin Baek, Jaejin Lee:
  SnuSOLVER: Optimizing Sparse Direct Solvers for Heterogeneous Systems. 426-441
- Jordi Wolfson-Pou, Jan Laukemann, Fabrizio Petrini:
  MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs. 442-457
Processing-in-Memory
- Inyong Hwang, Donghyeon Kim, Seokwon Kang, Taehyeong Park, Taehoon Kim, Jiwon Seo, Hanjun Kim, Youngsok Kim, Yongjun Park:
  PIM-CARE: A Compiler-Assisted Dynamic Resource Allocation Framework for Real-world DRAM PIM. 458-472
- Geraldo Francisco de Oliveira Junior, Mayank Kabra, Yuxin Guo, Kangqi Chen, Abdullah Giray Yaglikçi, Melina Soysal, Mohammad Sadrosadati, Joaquín Olivares Bueno, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu:
  Proteus: Achieving High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic. 473-494
- Taewoon Kang, Geonwoo Choi, Taeweon Suh, Gunjae Koo:
  SparsePIM: An Efficient HBM-Based PIM Architecture for Sparse Matrix-Vector Multiplications. 495-512
- Melina Soysal, Konstantina Koliogeorgi, Can Firtina, Nika Mansouri-Ghiasi, Rakesh Nadig, Haiyu Mao, Geraldo Francisco de Oliveira Junior, Yu Liang, Klea Zambaku, Mohammad Sadrosadati, Onur Mutlu:
  MARS: Processing-In-Memory Acceleration of Raw Signal Genome Analysis Inside the Storage Subsystem. 513-534
Efficiency
- Aoyang Tong, Yu Hua, Menglei Chen:
  DALdex: A DPU-Accelerated Persistent Learned Index via Incremental Learning. 535-549
- Mingtian Shao, Ruibo Wang, Wenzhe Zhang, Kai Lu, Yiqin Dai, Huijun Wu:
  From Islands to Archipelago: Towards Collaborative and Adaptive Burst Buffer for HPC Systems. 550-563
- Yunmo Zhang, Jiacheng Huang, Xizhe Yin, Junqiao Qiu, Hong Xu, Chun Jason Xue:
  PIE: Enabling Fast and Scalable Incremental Evolving Graph Analytics on Persistent Memory. 564-579
- Safdar Jamil, Awais Khan, Xubin He, Youngjae Kim:
  DEDUPKV: A Space-Efficient and High-Performance Key-Value Store via Fine-Grained Deduplication. 580-595
Optimizing Compilation
- Quazi Ishtiaque Mahmud, Ali TehraniJamsaz, Nesreen K. Ahmed, Theodore L. Willke, Ali Jannesari:
  ConTraPh: Contrastive Learning for Parallelization and Performance Optimization. 596-610
- Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan:
  UJOpt: Heuristic Approach for Applying Unroll-and-Jam Optimization and Loop Order Selection. 611-624
- Mohammad Mahdi Salehi Dezfuli, Kazem Cheshmi:
  Loop Fusion in Matrix Multiplications with Sparse Dependence. 625-639
- Jiamin Lu, Jingwei Sun, Yunlong Xu, Peng Sun, Guangzhong Sun:
  ConCo: Optimizing Compilation of Concurrent Tensor Programs on Shared GPU. 640-653
Best Papers
- Boyuan Zhang, Yafan Huang, Sheng Di, Fengguang Song, Guanpeng Li, Franck Cappello:
  Pushing the Limits of GPU Lossy Compression: A Hierarchical Delta Approach. 654-669
- Zijin Wan, Xiaojun Dong, Letong Wang, Enzuo Zhu, Yan Gu, Yihan Sun:
  Parallel Contraction Hierarchies Can Be Efficient and Scalable. 670-688
- Boyuan Zhang, Bo Fang, Fanjiang Ye, Luanzheng Guo, Fengguang Song, Nathan R. Tallent, Dingwen Tao:
  BMQSim: Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework. 689-704
- Dimitrios Galanopoulos, Panagiotis Mpakos, Petros Anastasiadis, Nectarios Koziris, Georgios I. Goumas:
  DIV: An Index & Value compression method for SpMV on large matrices. 705-717
- Marco Minutoli, Reece Neff, Naw Safrin Sattar, Hao Lu, John Feo, Henning S. Mortveit, Anil Vullikanti, Dawen Xie, Mandy L. Wilson, Gregor von Laszewski, Parantapa Bhattacharya, S. M. Ferdous, Ananth Kalyanaraman, Michela Becchi, Madhav V. Marathe, Mahantesh Halappanavar:
  DIMPLES: Distributed Influence Maximization for Pandemic pLanning on Exascale Systems. 718-733
- Jiazhi Mi, Li Chen, Haoyu Wang, Ruixiang Gao, Hongze Zhang, Ronghong Shen, Kai Lin, You Fu, Huimin Cui:
  Light-FP: Analyze Floating-Point Error in a Highly Condensed Approach. 734-748
Performance Analysis
- Izzet Yildirim, Hariharan Devarajan, Anthony Kougkas, Xian-He Sun, Kathryn M. Mohror:
  WisIO: Automated I/O Bottleneck Detection with Multi-Perspective Views for HPC Workflows. 749-763
- Pablo Abad, Pablo Prieto, Valentin Puente, José-Ángel Gregorio:
  Efficient Server Consolidation through a balanced mix of Transformer-based and Conventional Applications. 764-775
- Joshua Hoke Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav Bhatele:
  Taking GPU Programming Models to Task for Performance Portability. 776-791
- Dragana Grbic, John M. Mellor-Crummey:
  Analyzing the Performance of Applications at Exascale. 792-806
Heterogeneity
- Arjun Kashyap, Yuke Li, Darren Ng, Xiaoyi Lu:
  Understanding the Idiosyncrasies of Emerging BlueField DPUs. 807-821
- Ahmedur Rahman Shovon, Yihao Sun, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar:
  Multi-Node Multi-GPU Datalog. 822-836
- Anqi Guo, Yuchen Hao, Xiteng Yao, Shining Yang, Jianyu Huang, Tony Tong Geng, Martin C. Herbordt:
  SmartNIC-GPU-CPU Heterogeneous System for Large Machine Learning Model with Software-Hardware Codesign. 837-852
- Maxime Gonthier, Dante D. Sánchez-Gallegos, Haochen Pan, Bogdan Nicolae, Sicheng Zhou, Hai Duc Nguyen, Valérie Hayot-Sasson, J. Gregory Pauloski, Jesús Carretero, Kyle Chard, Ian T. Foster:
  D-Rex: Heterogeneity-Aware Reliability Framework and Adaptive Algorithms for Distributed Storage. 853-867
Resource Management
- Zhixin Tong, Jiuchen Shi, Quan Chen, Pu Pang, Shixuan Sun, Jie Meng, Jiang Liu, En Shao, Minyi Guo:
  ORION: Optimizing OLAP Query Execution with Proactive Caching and Separate Operators. 868-883
- Hongyi Liu, Yinping Ma, Xiaosong Huang, Lingzhe Zhang, Tong Jia, Ying Li:
  ORA: Job Runtime Prediction for High-Performance Computing Platforms Using the Online Retrieval-Augmented Language Model. 884-894
- Fanrong Du, Jiuchen Shi, Quan Chen, Pu Pang, Li Li, Minyi Guo:
  Generating Microservice Graphs with Production Characteristics for Efficient Resource Scaling. 895-910
- Ismet Dagli, Justin Davis, Mehmet Esat Belviranli:
  HARNESS: Holistic Resource Management for Diversely Scaled Edge Cloud Systems. 911-927
Code Optimization
- Chenchen Zhang, Hao Luo, Chao Yang:
  Leonid: Exploring Automated Kernel Fusion in Performance-Portable Programming Models for Scientific Computation. 928-942
- Tianming Cui, Pen-Chung Yew, Stephen McCamant, Antonia Zhai:
  DeCOS: Data-Efficient Reinforcement Learning for Compiler Optimization Selection Ignited by LLM. 943-958
- Djamel Rassem Lamouri, Iheb Nassim Aouadj, Smail Kourta, Riyadh Baghdadi:
  Pearl: Automatic Code Optimization Using Deep Reinforcement Learning. 959-974
- Xiaoyu Hao, Sen Zhang, Liang Qiao, Qingcai Jiang, Jun Shi, Junshi Chen, Hong An, Xulong Tang, Hao Shu, Honghui Yuan:
  CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction. 975-990
Energy & Servers
- Anna Yue, Pen-Chung Yew, Sanyam Mehta:
  EVeREST-C: An Effective and Versatile Runtime Energy Saving Tool for CPUs. 991-1004
- Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert W. Wisniewski, Torsten Hoefler:
  EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC. 1005-1019
- Hao Zhang, Haibo Zhang, Chengpeng Xia, Zhiyi Huang, Yawen Chen, Amanda Barnard:
  ROCKET: An RNS-based Photonic Accelerator for High-Precision and Energy-Efficient DNN Training. 1020-1033
- Tapasya Patki, Barry Rountree, Torsten Wilde, Andrea Bartolini, Stephanie Brink, Esa Heiskanen, Sachin Idgunji, Matthias Maiterth, James H. Rogers, Ermal Rrapaj, Ralf Schneider, Woong Shin, Kathleen Shoga, Christian Simmendinger, Nicholas J. Wright, Zhengji Zhao:
  A Global Perspective on Supercomputer Power Provisioning: Case Studies from United States and Europe. 1034-1051
Potpourri
- Peirui Cao, Rui Ning, Hongwei Yang, Zhaochen Zhang, Chang Liu, Rui Li, Yongqi Yang, Yunzhuo Liu, Chengyuan Huang, Tao Sun, Xiaodong Duan, Guihai Chen, Chen Tian:
  PortFC: Designing High-performance Deadlock-free BCube Networks. 1052-1063
- Ali Suvizi, Guru Venkataramani:
  Auto-Healer: Self-Healing Hardware for Perception Stage Faults in Autonomous Driving Systems. 1064-1078
- Tirthak Patel, Aditya Ranjan, Daniel Silver, Harshitta Gandhi, William Cutler, Devesh Tiwari:
  OpaQue: Program Output Obfuscation for Quantum Software Circuits in Quantum Clouds. 1079-1091
- Yang Su, Sheng Li, Huilong Jiang, Haofei Yin, Rongliang Fu, Junying Huang, Xiaochun Ye, Zhimin Zhang, Jie Ren, Xiaoping Gao, Tsung-Yi Ho, Dongrui Fan:
  JBSA: A Bit-Serial Accelerator for Deep Neural Networks Using Superconducting SFQ Logic. 1092-1105
Graph Algorithms
- Xinbiao Gan, Tiejun Li, Chunye Gong, Jie Liu, Kai Lu:
  YH-Light: Yielding Hierarchy-aware Partitioner for Large-scale Graph Processing. 1106-1116
- Shuai Yang, Changyou Zhang:
  MG-αGCD: Accelerating Graph Community Detection on Multi-GPU Platforms. 1117-1130
- Sakib Fuad, Amir Hossein Nodehi Sabet, Umar Farooq, Zhijia Zhao:
  GraCFL: A Holistically Designed Vertex-Centric Graph System for CFL Reachability. 1131-1145
- Leo Gold, Adam Bienkowski, David Sidoti, Krishna R. Pattipati, Omer Khan:
  OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths. 1146-1161
- Anju Mongandampulath Akathoott, Benila Virgin Jerald Xavier, Martin Burtscher:
  A Multi-GPU Algorithm for Computing Maximal Independent Sets in Large Graphs. 1162-1175
Memory Systems
- Kevin Weston, Vahid Janfaza, Avery Johnson, Abdullah Muzahid:
  A Cost-Effective Dueling Framework for Set-Associative Cache Indexing. 1176-1189
- Nurlan Nazaraliyev, Elaheh Sadredini, Nael B. Abu-Ghazaleh:
  DREAM: Device-Driven Efficient Access to Virtual Memory. 1190-1205
- Archit Patke, Christian Pinto, Saurabh Jha, Haoran Qiu, Zbigniew Kalbarczyk, Ravishankar K. Iyer:
  Page Migration for Hardware Memory Disaggregation Across a Network. 1206-1218
- Neethu Bal Mallya, Bhavishya Goel, Ioannis Sourdis:
  MEMPLEX: A Memory System with Replication and Migration of Data for Multi-Chiplet NUMA Architectures. 1219-1233
- Derrick Greenspan, Naveed Ul Mustafa, Jongouk Choi, Mark Heinrich, Yan Solihin:
  Persistent Memory Objects on the Cheap. 1234-1249















