


ICPP 2019: Kyoto, Japan
- Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019, Kyoto, Japan, August 05-08, 2019. ACM 2019, ISBN 978-1-4503-6295-5

Best Paper for ICPP 2019
- Ian Bogle, Karen D. Devine, Mauro Perego, Sivasankaran Rajamanickam, George M. Slota: A Parallel Graph Algorithm for Detecting Mesh Singularities in Distributed Memory Ice Sheet Simulations. 1:1-1:10
T1A: Memory Architectures
- Xi Wang, Antonino Tumeo, John D. Leidel, Jie Li, Yong Chen: MAC: Memory Access Coalescer for 3D-Stacked Memory. 2:1-2:10
- Jason Hiebel, Laura E. Brown, Zhenlin Wang: Machine Learning for Fine-Grained Hardware Prefetcher Control. 3:1-3:9
- Albin Eldstål-Damlin, Pedro Trancoso, Ioannis Sourdis: AVR: Reducing Memory Traffic with Approximate Value Reconstruction. 4:1-4:10
- Hui Sun, Wei Liu, Jianzhong Huang, Song Fu, Zhi Qiao, Weisong Shi: Near-Data Processing-Enabled and Time-Aware Compaction Optimization for LSM-tree-based Key-Value Stores. 5:1-5:11
T1B: Workflow and Data Analysis Systems
- Amelie Chi Zhou, Yao Xiao, Bingsheng He, Shadi Ibrahim, Reynold Cheng: Incorporating Probabilistic Optimizations for Resource Provisioning of Data Processing Workflows. 6:1-6:10
- Maria Malik, Hassan Ghasemzadeh, Tinoosh Mohsenin, Rosario Cammarota, Liang Zhao, Avesta Sasan, Houman Homayoun, Setareh Rafatirad: ECoST: Energy-Efficient Co-Locating and Self-Tuning MapReduce Applications. 7:1-7:11
- Wujie Shao, Fei Xu, Li Chen, Haoyue Zheng, Fangming Liu: Stage Delay Scheduling: Speeding up DAG-style Data Analytics Jobs with Resource Interleaving. 8:1-8:11
- Frank Schoeneman, Jaroslaw Zola: Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark. 9:1-9:10
T1C: Data Centers
- Xiaofeng Hou, Jiacheng Liu, Chao Li, Minyi Guo: Unleashing the Scalability Potential of Power-Constrained Data Center in the Microservice Era. 10:1-10:10
- Jiaqi Zheng, Qiming Zheng, Xiaofeng Gao, Guihai Chen: Dynamic Load Balancing in Hybrid Switching Data Center Networks with Converters. 11:1-11:10
- Mathieu Bacou, Grégoire Todeschi, Alain Tchana, Daniel Hagimont: Nested Virtualization Without the Nest. 12:1-12:10
- Xiaofeng Hou, Mingyu Liang, Chao Li, Wenli Zheng, Quan Chen, Minyi Guo: When Power Oversubscription Meets Traffic Flood Attack: Re-Thinking Data Center Peak Load Management. 13:1-13:10
T2A: Memory Optimizations
- Adrian Garcia-Garcia, Juan Carlos Saez, Fernando Castro, Manuel Prieto-Matías: LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores. 14:1-14:10
- Konstantinos Nikas, Nikela Papadopoulou, Dimitra Giantsidi, Vasileios Karakostas, Georgios I. Goumas, Nectarios Koziris: DICER: Diligent Cache Partitioning for Efficient Workload Consolidation. 15:1-15:10
- Yaocheng Xiang, Chencheng Ye, Xiaolin Wang, Yingwei Luo, Zhenlin Wang: EMBA: Efficient Memory Bandwidth Allocation to Improve Performance on Intel Commodity Processor. 16:1-16:12
- Jun Xiao, Andy D. Pimentel, Xu Liu: CPpf: a prefetch aware LLC partitioning approach. 17:1-17:10
T2B: Parallel Systems Algorithms
- Jinbin Hu, Jiawei Huang, Wenjun Lv, Weihe Li, Jianxin Wang, Tian He: TLB: Traffic-aware Load Balancing with Adaptive Granularity in Data Center Networks. 18:1-18:10
- Carlos Fernandez Musoles, Daniel Coca, Paul Richmond: HyperPRAW: Architecture-Aware Hypergraph Restreaming Partition to Improve Performance of Parallel Applications Running on High Performance Computing Systems. 19:1-19:10
- Yidan Wang, Zahir Tari, Xiaoran Huang, Albert Y. Zomaya: A Network-aware and Partition-based Resource Management Scheme for Data Stream Processing. 20:1-20:10
- Zhengyu Liao, Shiyou Qian, Jian Cao, Yanhua Cao, Guangtao Xue, Jiadi Yu, Yanmin Zhu, Minglu Li: PhSIH: A Lightweight Parallelization of Event Matching in Content-based Pub/Sub Systems. 21:1-21:10
T2C: NVRAM and SSD
- Mengting Lu, Fang Wang, Dan Feng, Yuchong Hu: A Read-leveling Data Distribution Scheme for Promoting Read Performance in SSDs with Deduplication. 22:1-22:10
- Gaoxiang Xu, Dan Feng, Zhipeng Tan, Xinyan Zhang, Jie Xu, Xi Shu, Yifeng Zhu: RFPL: A Recovery Friendly Parity Logging Scheme for Reducing Small Write Penalty of SSD RAID. 23:1-23:10
- Bin Xu, Jianzhong Huang, Qiang Cao, Xiao Qin: TEA: A Traffic-efficient Erasure-coded Archival Scheme for In-memory Stores. 24:1-24:10
- Jiahao Liu, Fang Wang, Dan Feng: CostPI: Cost-Effective Performance Isolation for Shared NVMe SSDs. 25:1-25:10
T3A: Parallel Architectures
- Muhammad Waqar Azhar, Miquel Pericàs, Per Stenström: SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems. 26:1-26:12
- Yunfan Li, Di Zhu, Lizhong Chen: Express Link Placement for NoC-Based Many-Core Platforms. 27:1-27:10
- Fazeleh Sadat Hoseini, Aras Atalar, Philippas Tsigas: Modeling the Performance of Atomic Primitives on Modern Architectures. 28:1-28:11
- Michihiro Koibuchi, Ikki Fujiwara, Naoya Niwa, Tomohiro Totoki, Shoichi Hirasawa: The Case for Water-Immersion Computer Boards. 29:1-29:10
T3B: Scheduling
- Zhuozhao Li, Haiying Shen: JobPacker: Job Scheduling for Data-Parallel Frameworks with Hybrid Electrical/Optical Datacenter Networks. 30:1-30:10
- Marco D'Amico, Ana Jokanovic, Julita Corbalán: Holistic Slowdown Driven Scheduling and Resource Management for Malleable Jobs. 31:1-31:10
- Ana Gainaru, Guillaume Pallez, Hongyang Sun, Padma Raghavan: Speculative Scheduling for Stochastic HPC Applications. 32:1-32:10
- Guoxin Liu, Haiying Shen, Haoyu Wang: Cooperative Job Scheduling and Data Allocation for Busy Data-Intensive Parallel Computing Clusters. 33:1-33:11
T3C: I/O Systems
- Ping Xie, Zhu Yuan, Jianzhong Huang, Xiao Qin: N-Code: An Optimal RAID-6 MDS Array Code for Load Balancing and High I/O Performance. 34:1-34:10
- Chunjie Zhu, Fang Wang, Binbing Hou: BPP: A Realtime Block Access Pattern Mining Scheme for I/O Prediction. 35:1-35:10
- Yuanning Gao, Xiaofeng Gao, Guihai Chen: DeepHash: An End-to-End Learning Approach for Metadata Management in Distributed File Systems. 36:1-36:10
- Shiyi Cao, Yuanning Gao, Xiaofeng Gao, Guihai Chen: AdaM: An Adaptive Fine-Grained Scheme for Distributed Metadata Management. 37:1-37:10
T4A: On Node Optimization
- Seonmyeong Bak, Yanfei Guo, Pavan Balaji, Vivek Sarkar: Optimized Execution of Parallel Loops via User-Defined Scheduling Policies. 38:1-38:10
- Nicolas Denoyelle, Brice Goglin, Emmanuel Jeannot, Thomas Ropars: Data and Thread Placement in NUMA Architectures: A Statistical Learning Approach. 39:1-39:10
- Emre Ates, Yijia Zhang, Burak Aksar, Jim M. Brandt, Vitus J. Leung, Manuel Egele, Ayse K. Coskun: HPAS: An HPC Performance Anomaly Suite for Reproducing Performance Variations. 40:1-40:10
- Daniel Zahka, Brian Kocoloski, Kate Keahey: Reducing Kernel Surface Areas for Isolation and Scalability. 41:1-41:10
T4B: Parallel Algorithms 1
- Yulin Che, Zhuohang Lai, Shixuan Sun, Qiong Luo, Yue Wang: Accelerating All-Edge Common Neighbor Counting on Three Processors. 42:1-42:10
- Liang Yuan, Shan Huang, Yunquan Zhang, Hang Cao: Tessellating Star Stencils. 43:1-43:10
- Sivan Toledo, Amit Waisel: Parallel Algorithms for Evaluating Matrix Polynomials. 44:1-44:10
- Ancy Sarah Tom, George Karypis: A 2D Parallel Triangle Counting Algorithm for Distributed-Memory Architectures. 45:1-45:10
T4C: Communication Architectures
- Xiaojun Shang, Zhenhua Liu, Yuanyuan Yang: Network Congestion-aware Online Service Function Chain Placement and Load Balancing. 46:1-46:10
- Rohit Zambre, Megan Grodowitz, Aparna Chandramowlishwaran, Pavel Shamis: Breaking Band: A Breakdown of High-performance Communication. 47:1-47:10
- Jesper Larsson Träff, Sascha Hunold: Cartesian Collective Communication. 48:1-48:11
T5A: System Software for GPUs
- Akbar Majidi, Xiaofeng Gao, Shunjia Zhu, Nazila Jahanbakhsh, Guihai Chen: Adaptive Routing Reconfigurations to Minimize Flow Cost in SDN-Based Data Center Networks. 50:1-50:10
- David Troendle, Tuan Ta, Byunghyun Jang: A Specialized Concurrent Queue for Scheduling Irregular Workloads on GPUs. 51:1-51:11
- Kaijie Fan, Biagio Cosenza, Ben H. H. Juurlink: Predictable GPUs Frequency Scaling for Energy and Performance. 52:1-52:10
- Hyunjun Kim, Sungin Hong, Hyeonsu Lee, Euiseong Seo, Hwansoo Han: Compiler-Assisted GPU Thread Throttling for Reduced Cache Contention. 53:1-53:10
T5B: Parallel Algorithms 2
- Rui Xia, Haipeng Dai, Jiaqi Zheng, Rong Gu, Xiaoyu Wang, Guihai Chen: SAFE: Service Availability via Failure Elimination Through VNF Scaling. 54:1-54:10
- Yitong Guan, Chuanyou Li, Xueyan Tang: On Max-min Fair Resource Allocation for Distributed Job Execution. 55:1-55:10
- Wei Zhou, K. Preston White, Hongfeng Yu: Improving Short Job Latency Performance in Hybrid Job Schedulers with Dice. 56:1-56:10
- Tingzhe Zhou, Maged M. Michael, Michael F. Spear: A Practical, Scalable, Relaxed Priority Queue. 57:1-57:10
T5C: Networking
- Ke Wu, Dezun Dong, Cunlu Li, Shan Huang, Yi Dai: Network Congestion Avoidance through Packet-chaining Reservation. 58:1-58:10
- Rui Li, Yu Pang, Jin Zhao, Xin Wang: A Tale of Two (Flow) Tables: Demystifying Rule Caching in OpenFlow Switches. 59:1-59:10
- Xuebing Li, Bingyang Liu, Yang Chen, Yu Xiao, Jiaxin Tang, Xin Wang: Artemis: A Practical Low-latency Naming and Routing System. 60:1-60:10
- Yunren Bai, Zihan Xu, Haixia Wang, Dongsheng Wang: Fast Recovery Techniques for Erasure-coded Clusters in Non-uniform Traffic Network. 61:1-61:10
T6A: Accelerator Applications
- Yohei Miki: Gravitational Octree Code Performance Evaluation on Volta GPU. 62:1-62:10
- Robin Kobus, Daniel Jünger, Christian Hundt, Bertil Schmidt: Gossip: Efficient Communication Primitives for Multi-GPU Systems. 63:1-63:10
- Ali Eker, Barry Williams, Kenneth Chiu, Dmitry Ponomarev: Controlled Asynchronous GVT: Accelerating Parallel Discrete Event Simulation on Many-Core Clusters. 64:1-64:10
- Chengxin Guo, Hong Chen, Feng Zhang, Cuiping Li: Distributed Join Algorithms on Multi-CPU Clusters with GPUDirect RDMA. 65:1-65:10
T6B: Fault Tolerance
- Ji Zhang, Ke Zhou, Ping Huang, Xubin He, Zhili Xiao, Bin Cheng, Yongguang Ji, Yinhu Wang: Transfer Learning based Failure Prediction for Minority Disks in Large Data Centers of Heterogeneous Disk Systems. 66:1-66:10
- Carlos Pachajoa, Markus Levonyak, Wilfried N. Gansterer, Jesper Larsson Träff: How to Make the Preconditioned Conjugate Gradient Method Resilient Against Multiple Node Failures. 67:1-67:10
- Yingyao Rong, Weigang Wu, Zhiguang Chen: COMBFT: Conflicting-Order-Match based Byzantine Fault Tolerance Protocol with High Efficiency and Robustness. 68:1-68:10
- Da Yan, James Cheng, Hongzhi Chen, Cheng Long, Purushotham V. Bangalore: Lightweight Fault Tolerance in Pregel-Like Systems. 69:1-69:10
T6C: Applications 1 - Simulations
- Marquita Ellis, Giulia Guidi, Aydin Buluç, Leonid Oliker, Katherine A. Yelick: diBELLA: Distributed Long Read to Long Read Alignment. 70:1-70:11
- Zonghao Feng, Shuang Qiu, Lipeng Wang, Qiong Luo: Accelerating Long Read Alignment on Three Processors. 71:1-71:10
- Kai Xu, Zhenya Song, Yuandong Chan, Shida Wang, Xiangxu Meng, Weiguo Liu, Wei Xue: Refactoring and Optimizing WRF Model on Sunway TaihuLight. 72:1-72:10
- Mauro Del Ben, Osni Marques, Andrew Canning: Improved Unconstrained Energy Functional Method for Eigensolvers in Electronic Structure Calculations. 73:1-73:11
T7A: Programming Systems and Runtimes
- Zhuohang Lai, Qiong Luo, Xiaolong Xie: Efficient Data-Parallel Primitives on Heterogeneous Systems. 74:1-74:10
- D. Brian Larkins, John Snyder, James Dinan: Accelerated Work Stealing. 75:1-75:10
- Bibek Wagle, Mohammad Alaul Haque Monil, Kevin A. Huck, Allen D. Malony, Adrian Serio, Hartmut Kaiser: Runtime Adaptive Task Inlining on Asynchronous Multitasking Runtime Systems. 76:1-76:10
- Masahiro Yasugi, Daisuke Muraoka, Tasuku Hiraishi, Seiji Umatani, Kento Emoto: HOPE: A Parallel Execution Model Based on Hierarchical Omission. 77:1-77:11
T7B: Performance Modeling
- Xingfu Wu, Valerie E. Taylor, Justin M. Wozniak, Rick Stevens, Thomas S. Brettin, Fangfang Xia: Performance, Energy, and Scalability Analysis and Improvement of Parallel Cancer Deep Learning CANDLE Benchmarks. 78:1-78:11
- Sandeep Madireddy, Prasanna Balaprakash, Philip H. Carns, Robert Latham, Glenn K. Lockwood, Robert B. Ross, Shane Snyder, Stefan M. Wild: Adaptive Learning for Concept Drift in Application Performance Modeling. 79:1-79:11
- Fahim Chowdhury, Yue Zhu, Todd Heer, Saul Paredes, Adam Moody, Robin Goldstone, Kathryn M. Mohror, Weikuan Yu: I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning. 80:1-80:10
- Suraj Kumar, Lionel Eyraud-Dubois, Sriram Krishnamoorthy: Performance Models for Data Transfers: A Case Study with Molecular Chemistry Kernels. 81:1-81:10
T7C: Simulation Techniques
- Haozhao Wang, Song Guo, Ruixuan Li: OSP: Overlapping Computation and Communication in Parameter Server for Fast Machine Learning. 82:1-82:10
- Jan Hückelheim, Navjot Kukreja, Sri Hari Krishna Narayanan, Fabio Luporini, Gerard Gorman, Paul D. Hovland: Automatic Differentiation for Adjoint Stencil Loops. 83:1-83:10
- Sidharth Kumar, Steve Petruzza, Will Usher, Valerio Pascucci: Spatially-aware Parallel I/O for Particle Data. 84:1-84:10
- Yi Liu, Xiaowei Guo, Chao Li, Canqun Yang, Xinbiao Gan, Peng Zhang, Yi Wang, Ran Zhao, Sijiang Fan: The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations. 85:1-85:12
T8A: Deep Learning
- Haoyue Zheng, Fei Xu, Li Chen, Zhi Zhou, Fangming Liu: Cynthia: Cost-Efficient Cloud Resource Provisioning for Predictable Distributed Deep Neural Network Training. 86:1-86:11
- Wenjia Zheng, Michael Tynes, Henry Gorelick, Ying Mao, Long Cheng, Yantian Hou: FlowCon: Elastic Flow Configuration for Containerized Deep Learning Applications. 87:1-87:10
- Yang Cheng, Dan Li, Zhiyuan Guo, Binyao Jiang, Jiaxin Lin, Xi Fan, Jinkun Geng, Xinyi Yu, Wei Bai, Lei Qu, Ran Shu, Peng Cheng, Yongqiang Xiong, Jianping Wu: DLBooster: Boosting End-to-End Deep Learning Workflows with Offloading Data Preprocessing Pipelines. 88:1-88:11
- Wei Gao, Jiarui Fang, Wenlai Zhao, Jinzhe Yang, Long Wang, Lin Gan, Haohuan Fu, Guangwen Yang: swATOP: Automatically Optimizing Deep Learning Operators on SW26010 Many-Core Processor. 89:1-89:10
T8B: Tools and Their Use
- Allen D. Malony, Srinivasan Ramesh, Kevin A. Huck, Nicholas Chaimov, Sameer Shende: A Plugin Architecture for the TAU Performance System. 90:1-90:11
- Tao Wang, Nikhil Jain, David Beckingsale, David Böhme, Frank Mueller, Todd Gamblin: FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation. 91:1-91:10
- Jakub Kurzak, Yaohung M. Tsai, Mark Gates, Ahmad Abdelfattah, Jack J. Dongarra: Massively Parallel Automated Software Tuning. 92:1-92:10
- Chih-Min Lin, Sheng-Yu Fu, Ding-Yong Hong, Yu-Ping Liu, Jan-Jan Wu, Wei-Chung Hsu: Exploiting Vector Processing in Dynamic Binary Translation. 93:1-93:10
T8C: Applications 2 - Emerging Applications
- Jingya Zhou, Jianxi Fan, Jin Wang: Cosin: Controllable Social Influence Maximization and Its Distributed Implementation in Large-scale Social Networks. 94:1-94:10
- Huayi Jin, Chentao Wu, Xin Xie, Jie Li, Minyi Guo, Hao Lin, Jianfeng Zhang: Approximate Code: A Cost-Effective Erasure Coding Framework for Tiered Video Storage in Cloud Systems. 95:1-95:10
- Chen Zhang, Qiang Cao, Jie Yao, Yuanyuan Dong, Puyuan Yang: VScan: Efficiently Analyzing Surveillance Videos via Model-joint Mechanism. 96:1-96:10
- Xin Chen, Dmytro Konobrytskyi, Thomas M. Tucker, Thomas R. Kurfess, Richard W. Vuduc: Faster parallel collision detection at high resolution for CNC milling applications. 97:1-97:10
T9A: Neural Networks
- Deguang Wang, Junzhong Shen, Mei Wen, Chunyuan Zhang: An Efficient Design Flow for Accelerating Complicated-connected CNNs on a Multi-FPGA Platform. 98:1-98:10
- Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang: A Unified Optimization Approach for CNN Model Inference on Integrated GPUs. 99:1-99:10
- André Weißenberger, Bertil Schmidt: Massively Parallel ANS Decoding on GPUs. 100:1-100:10
T9B: Parallel Data Structures
- Mengxing Liu, Jiankai Xing, Kang Chen, Yongwei Wu: Building Scalable NVM-based B+tree with HTM. 101:1-101:10
- Benjamin Brock, Aydin Buluç, Katherine A. Yelick: BCL: A Cross-Platform Distributed Data Structures Library. 102:1-102:10
- Caixin Gong, Shuibing He, Yili Gong, Yingchun Lei: On Integration of Appends and Merges in Log-Structured Merge Trees. 103:1-103:10
T9C: IoT and Edge Computing
- Zichuan Xu, Yutong Zhang, Weifa Liang, Qiufen Xia, Omer F. Rana, Alex Galis, Guowei Wu, Pan Zhou: NFV-Enabled Multicasting in Mobile Edge Clouds with Resource Sharing. 104:1-104:10
- Ke Li, Haowei Huang, Xiaofeng Gao, Fan Wu, Guihai Chen: QLEC: A Machine-Learning-Based Energy-Efficient Clustering Algorithm to Prolong Network Lifespan for IoT in High-Dimensional Space. 105:1-105:10
- Alexandre Da Silva Veith, Felipe Rodrigo de Souza, Marcos Dias de Assunção, Laurent Lefèvre, Julio Cesar Santos dos Anjos: Multi-Objective Reinforcement Learning for Reconfiguring Data Stream Analytics on Edge Computing. 106:1-106:10
