


ISPASS 2025: Ghent, Belgium
- IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2025, Ghent, Belgium, May 11-13, 2025. IEEE 2025, ISBN 979-8-3315-0294-2 
- Nebil Ozer, Gregory Kollmer, Ramyad Hadidi, Bahar Asgari: 
 La Superba: Leveraging a Self-Comparison Method to Understand the Performance Benefits of Sparse Acceleration Optimizations. 1-12
- Yang Yang, Mohammad Sonji, Adwait Jog:
 Dissecting Performance Overheads of Confidential Computing on GPU-based Systems. 1-16
- Thomas Rauber, Gudula Rünger: 
 Evaluation and Comparison of the Energy Efficiency of Several Intel Multicore Processors. 1-3
- Iris Uwizeyimana, Natalie Enright Jerger: 
 Carbon-Aware Server Replacement. 1-3
- Tanvi Sharma, Indranil Chakraborty, Mustafa Fayez Ali, Kaushik Roy: 
 Evaluating Compute in Memory Architectures for Matrix Multiplication: A Dataflow-Centric Perspective. 1-3
- Carlos Agulló-Domingo, Óscar Vera-López, Seyda Guzelhan, Lohit Daksha, Aymane El Jerari, Kaustubh Shivdikar, Rashmi S. Agrawal, David R. Kaeli, Ajay Joshi, José L. Abellán: 
 FIDESlib: A Fully-Fledged Open-Source FHE Library for Efficient CKKS on GPUs. 1-3
- Fareed Qararyah, Mohammad Ali Maleki, Pedro Trancoso:
 An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators. 1-13
- Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon: 
 Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads. 1-14
- Jaeyoung Kang, Qirong Xia, Ipoom Jeong, Yongjoo Park, Nam Sung Kim: 
 Intel® in-Memory Analytics Accelerator: Performance Characterization and Guidelines. 1-13
- Chenji Han, Huai Xu, Guangyao Guo, Yuxuan Wu, Fuxin Zhang: 
 MeMo: Enhancing Representative Sampling via Mechanistic Micro-Model Signatures. 1-13
- Jamin Seo, Jianming Tong, Tushar Krishna, Hyoukjun Kwon: 
 Exploring Constrained Dataflow Accelerators for Real-Time Multi-Task Multi-Model ML Workloads. 1-11
- Anirudha Agrawal, Shaizeen Aga, Suchita Pati, Mahzabeen Islam: 
 ConCCL: Optimizing ML Concurrent Computation and Communication with GPU DMA Engines. 1-11
- Junsoo Kim, Hunjong Lee, Geonwoo Ko, Gyubin Choi, Seri Ham, Seongmin Hong, Joo-Young Kim: 
 ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput. 15-25
- Zishen Wan, Jiayi Qian, Yuhang Du, Jason Jabbour, Yilun Du, Yang Zhao, Arijit Raychowdhury, Tushar Krishna, Vijay Janapa Reddi: 
 Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability. 26-37
- Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen: 
 Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures. 49-61
- Eunsoo Jung, Eunbi Jeong, Gunjae Koo, Yunho Oh, Myung Kuk Yoon: 
 Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing. 62-72
- Fangjia Shen, Aaron Barnes, Anusuya Nallathambi, Timothy G. Rogers: 
 RayFlex: An Open-Source RTL Implementation of the Hardware Ray Tracer Datapath. 73-84
- Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim: 
 FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights. 96-107
- Matin Raayai Ardakani, Andrew Nguyen, Ivan Rosales, Daoxuan Xu, Yuwei Sun, Yifan Sun, David Kaeli, Norman Rubin: 
 Luthier: A Dynamic Binary Instrumentation Framework Targeting AMD GPUs. 137-149
- Kaustubh Manohar Mhatre, Venkata Guru Prashanth Mulleti, Curt John Bansil, Endri Taka, Aman Arora: 
 Performance Analysis of GEMM Workloads on the AMD Versal Platform. 150-161
- Jaeheon Lee, Juhyung Park, Seonggyun Oh, Jinhyung Koo, Sungjin Lee: 
 Beyond the Numbers: Measuring Android Performance Through User Perception. 162-173
- Mansi Choudhary, Chris Kjellqvist, Jiaao Ma, Lisa Wu Wills: 
 COCOSSim: A Cycle-Accurate Simulator for Heterogeneous Systolic Array Architectures. 174-185
- Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdhar, Tushar Krishna: 
 SCALE-Sim V3: a Modular Cycle-Accurate Systolic Accelerator Simulator for End-To-End System Analysis. 186-200
- Kaifeng Xu, Georgios Tziantzioulis, David Wentzlaff: 
 Evaluation of MindPalace for Chip Design Tradeoffs on Function-as-a-Service. 201-212
- Steven van der Vlugt, Leon C. Oostrum, Gijs Schoonderbeek, Ben van Werkhoven, Bram Veenboer, Krijn Doekemeijer, John W. Romein: 
 PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool. 213-226
- Saichand Samudrala, Sushant Kondguli, Paul Gratz:
 Benchmarking 3D Gaussian Splatting Rendering. 227-238
- Yongju Lee, Jaewon Kwon, Cheolhwan Kim, Enhyeok Jang, Jiwon Lee, Hyunwuk Lee, Won Woo Ro:
 COSMOS: An LLC Contention Slowdown Model for Heterogeneous Multi-Core Systems. 264-275
- Lieven Eeckhout: 
 Use Equal-Work or Equal-Time Speedup, Not Geomean Speedup. 276-285
- Noushin Azami, Martin Burtscher:
 Identifying Important Data Transformations for Synthesizing Effective Lossless Compressors. 286-296
- Chris Kjellqvist, Brendan Peercy, Alvin R. Lebeck, Lisa Wu Wills: 
 Beethoven: A Heterogeneous Multi-Core Accelerator System Composer. 297-308
- Panteleimonas Chatzimiltis, Georgia Antoniou, Haris Volos, Yiannakis Sazeides: 
 SAGA: A Surrogate Assisted Genetic Algorithm for Fast CPU Power Virus Generation. 309-319
- Sudhanshu Gupta, Niti Madan, Sooraj Puthoor, Nuwan Jayasena, Sandhya Dwarkadas: 
 Concurrent PIM and Load/Store Servicing in PIM-Enabled Memory. 320-334
- Rashid Aligholipour, Yuan Yao: 
 The Fake-Busy and True-Idle Problems of Running Graph Applications on Chiplet-Based Multi-Cores. 347-349
- Alexandra W. Chadwick, Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Yuxin Guo, Richard Cooper, Giacomo Gabrielli, Timothy M. Jones:
 The Future of Instruction-Level Parallelism (ILP). 350-352
- Seonho Lee, Jihwan Oh, Seokjin Go, Divya Mahajan: 
 Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications. 353-355
- Inseong Hwang, Jihoon Jang, Chaewon Park, Hyun Kim:
 PIM-BEACON: A Benchmarking and Emulation Framework Supporting Adaptive CONfigurations in DRAM-Based Processing-in-Memory Systems. 356-358
- Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle:
 Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson. 359-361
- Christin Bose, Cesar Avalos, Junrui Pan, Yechen Liu, Mahmoud Khairy, Clay Hughes, Timothy G. Rogers: 
 ASLink: Modeling Multi-GPU Execution in Accel-Sim. 362-364
- S. M. Mojahidul Ahsan, Mohammad Nouri, Ramesh Reddy Ganapam, Mohammad Alian, Tamzidul Hoque: 
 A Flexible and Accurate Circuit-Level Substrate for Future DRAM Design and Analysis. 371-373
- Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Meena Arunachalam, Mahmut Taylan Kandemir, Vijaykrishnan Narayanan: 
 TPNM: A CXL Based General Purpose Tiered Process Near Memory Framework. 374-376
- Kewei Yan, Yonghong Yan: 
 A Real-Time, Auto-Regression Method for in-Situ Feature Extraction in Hydrodynamics Simulations. 377-378
- Aniket Chatterjee, Conor James Green, Mithuna Thottethodi: 
 Library of Networks: An Online Tool for Design and Analysis of Network Topologies. 379-381
- Martí Torrents, Paul Caheny, Stijn Eyerman, Wim Heirman: 
 Multi-Core Aware Evaluation of Prefetchers. 382-384
- Martin Troiber, Martin Schulz, Blaise Tine, Hyesoon Kim: 
 Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels. 388-389
- Rodrigo Huerta, Antonio González: 
 GPU Simulation Acceleration via Parallelization. 390-392
- Wenzhe Guo, Joyjit Kundu, Uras Tos, Giuliano Sisto, Cedric Rolin, Lars-Åke Ragnarsson, Timon Evenblij: 
 Energon: A Sustainability-Driven Modeling Framework for AI Data Centers. 393-395
- Yves Vandriessche, Wim Heirman, Ed Nutting, Jeremy Birch, Judah Daniels, Mae Hood, Pascal Costanza: 
 Measuring Performance Overheads of Software Memory Management Using Functional-First Simulators. 399-400
- Rahul Tripathy, Sumit K. Mandal: 
 Interconnect Performance Estimation for ML Accelerators via Lightweight Analytical Model. 401-403















