


ICS 2024: Kyoto, Japan
- Kenji Kise, Valentina Salapura, Murali Annavaram, Ana Lucia Varbanescu:
  Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024. ACM 2024
Session 2: Best Paper Nominees
- Yelai Feng, Huaixi Wang, Yining Zhu, Xiandong Liu, Hongyi Lu, Qing Liu:
  DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs. 1-13
- Durga Keerthi Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni:
  Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing. 14-25
- Bennett Cooper, Thomas R. W. Scogland, Rong Ge:
  Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. 26-37
- Reece Neff, Mostafa Eghbali Zarch, Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Michela Becchi:
  FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems. 38-49
- Juhyeon Lee, Insung Bahk, Hoseung Kim, Sinjin Jeong, Suyeon Lee, Donghyun Min:
  An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices. 50-61
Session 3A: Memory and Storage Systems
- Chengtao Lai, Zhongchun Zhou, Akash Poptani, Wei Zhang:
  LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators. 62-73
- Qi Shao, Angelos Arelakis, Per Stenström:
  HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory. 74-84
- Raveendra Soori, Shreyas Prabhu, Harpreet Singh Chawla, Michael Ferdman:
  NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches. 85-97
- Francesc Martínez Palau, Martí Torrents, Adrià Armejach, Marc Casas:
  Exploiting Vector Code Semantics for Efficient Data Cache Prefetching. 98-109
Session 3B: Emerging supercomputing applications
- Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  Real-time High-resolution X-Ray Computed Tomography. 110-123
- Liang Geng, Rubao Lee, Xiaodong Zhang:
  RayJoin: Fast and Precise Spatial Join. 124-136
- Xiao Fu, Weiling Yang, Dezun Dong, Xing Su:
  Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs. 137-149
- Hans Vandierendonck:
  Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size. 150-163
Session 5A: Reliability, dependability and availability
- Soheil Khadirsharbiyani, Movahhed Sadeghi, Mostafa Eghbali Zarch, Mahmut Taylan Kandemir:
  Minimizing Coherence Errors via Dynamic Decoupling. 164-175
- Jianping Zeng, Shao-Yu Huang, Jiuyang Liu, Changhee Jung:
  Soft Error Resilience at Near-Zero Cost. 176-187
- Vladyslav Oles, Anna Schmedding, George Ostrouchov, Woong Shin, Evgenia Smirni, Christian Engelmann:
  Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study. 188-200
- Dolores Miao, Ignacio Laguna, Cindy Rubio-González:
  Input Range Generation for Compiler-Induced Numerical Inconsistencies. 201-212
Session 5B: Heterogeneous software: GPUs and domain specific accelerators
- Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg:
  Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs. 213-224
- Benjamin Brock, Aydin Buluç, Katherine A. Yelick:
  RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs. 225-235
- Benjamin Brock, Robert Cohn, Suyash Bakshi, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Lukasz Slusarczyk, Kacper Stefanski, Timothy G. Mattson:
  Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views. 236-246
- Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang:
  Stencil Computation with Vector Outer Product. 247-258
Session 6A: Cloud and ML Systems Efficiency
- Wei Gao, Weiming Zhuang, Minghao Li, Peng Sun, Yonggang Wen, Tianwei Zhang:
  Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters. 259-271
- Franz Kevin Stehle, Wainer Vandelli, Felix Zahn, Giuseppe Avolio, Holger Fröning:
  DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems. 272-285
- Quentin R. Petit, Chong Li, Nahid Emad:
  An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer. 286-297
- Justin McGowen, Ismet Dagli, Neil T. Dantam, Mehmet E. Belviranli:
  Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints. 298-311
Session 6B: Accelerator Designs
- Raúl Taranco, José-María Arnau, Antonio González:
  SLIDEX: A Novel Architecture for Sliding Window Processing. 312-323
- Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang:
  Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers. 324-337
- Sungmin Yun, Hwayong Nam, Kwanhee Kyung, Jaehyun Park, Byeongho Kim, Yongsuk Kwon, Eojin Lee, Jung Ho Ahn:
  CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers. 338-351
- Xianbin Li, Yinyi Liu, Fan Jiang, Chengeng Li, Yuxiang Fu, Wei Zhang, Jiang Xu:
  NEOCNN: NTT-Enabled Optical Convolution Neural Network Accelerator. 352-362
Session 8A: Supercomputing Software and Security
- Stepan Vanecek, Martin Schulz:
  sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems. 363-375
- Yubo Du, Yanan Guo, Youtao Zhang, Jun Yang:
  RTT-UAF: Reuse Time Tracking for Use-After-Free Detection. 376-387
- Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan:
  Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures. 388-399
- Alexandre Chen, Brittany A. Erickson, Jeremy E. Kozdon, Jee Choi:
  Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs. 400-412
Session 8B: Interconnects and Networks
- Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, Ang Li, Anthony Skjellum, Tong Geng, Martin C. Herbordt:
  SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications. 413-425
- Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken:
  CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. 426-436
- Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
  gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. 437-448
- Ram Sharan Chaulagain, Xin Yuan:
  Enhanced UGAL Routing Schemes for Dragonfly Networks. 449-459
Session 9A: Machine learning systems
- Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui, Ke Meng, Zehua Wang, Yunfei Pang, Guangming Tan:
  A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations. 460-472
- Wei Gao, Xu Zhang, Shan Huang, Shangwei Guo, Peng Sun, Yonggang Wen, Tianwei Zhang:
  AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads. 473-484
- Baorun Mu, Christina Giannoula, Shang Wang, Gennady Pekhimenko:
  Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information. 485-497
- Hanxian Huang, Xin Chen, Jishen Zhao:
  Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment. 498-510
Session 9B: Software Design for Accelerators
- Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li:
  FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks. 511-524
- Mohammad Kefah Taha Issa, Muhammad Aditya Sasongko, Ilyas Turimbetov, Javid Baydamirli, Dogan Sagbili, Didem Unat:
  Snoopie: A Multi-GPU Communication Profiler and Visualizer. 525-536
- Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen:
  RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection. 537-548
- Chendi Li, Yufan Xu, Sina Mahdipour Saravani, Ponnuswamy Sadayappan:
  Accelerated Auto-Tuning of GPU Kernels for Tensor Computations. 549-561















