IPDPS 2017: Orlando, FL, USA
- 2017 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, Orlando, FL, USA, May 29 - June 2, 2017. IEEE Computer Society 2017, ISBN 978-1-5386-3914-6
Keynote 1
- Tandy J. Warnow:
Computational Challenges in Constructing the Tree of Life. 1
Session 1: Graph Algorithms
- Gal Yehuda, Daniel Keren, Islam Akaria:
Monitoring Properties of Large, Distributed, Dynamic Graphs. 2-11
- Patrick Flick, Srinivas Aluru:
Parallel Construction of Suffix Trees and the All-Nearest-Smaller-Values Problem. 12-21
- Ariful Azad, Mathias Jacquelin, Aydin Buluç, Esmond G. Ng:
The Reverse Cuthill-McKee Algorithm in Distributed-Memory. 22-31
- Maciej Besta, Florian Marending, Edgar Solomonik, Torsten Hoefler:
SlimSell: A Vectorizable Graph Representation for Breadth-First Search. 32-41
Session 2: Computational Biology
- Haidong Lan, Weiguo Liu, Yongchao Liu, Bertil Schmidt:
SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search. 42-51
- Yuandong Chan, Kai Xu, Haidong Lan, Weiguo Liu, Yongchao Liu, Bertil Schmidt:
PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment. 52-61
- Jing Zhang, Sanchit Misra, Hao Wang, Wu-chun Feng:
Eliminating Irregularities of Protein Sequence Search on Multicore Architectures. 62-71
- Jie Wang, Xinfeng Xie, Jason Cong:
Communication Optimization on GPU: A Case Study of Sequence Alignment Algorithms. 72-81
Session 3: Caches
- Bingchao Li, Jizhou Sun, Murali Annavaram, Nam Sung Kim:
Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management. 82-91
- Qi Zeng, Jih-Kwon Peir:
Content-Aware Non-Volatile Cache Replacement. 92-101
- Jiguang Wan, Wei Wu, Ling Zhan, Qing Yang, Xiaoyang Qu, Changsheng Xie:
DEFT-Cache: A Cost-Effective and Highly Reliable SSD Cache for RAID Storage. 102-111
- Pengcheng Li, Dhruva R. Chakrabarti, Chen Ding, Liang Yuan:
Adaptive Software Caching for Efficient NVRAM Data Persistence. 112-122
Session 4: Cloud & OS
- Song Wu, Chao Niu, Jia Rao, Hai Jin, Xiaohai Dai:
Container-Based Cloud Platform for Mobile Computation Offloading. 123-132
- Hao He, Jiang Hu, Dilma Da Silva:
Enhancing Datacenter Resource Management through Temporal Logic Constraints. 133-142
- Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters. 143-152
- Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, Brian C. Van Essen, Roberto Gioiosa, Kamil Iskra, Maya B. Gokhale, Kazutomo Yoshii, Peter H. Beckman:
Argo NodeOS: Toward Unified Resource Management for Exascale. 153-162
Session 5: Distributed Algorithms
- Andrea Clementi, Luciano Gualà, Guido Proietti, Giacomo Scornavacca:
Rational Fair Consensus in the Gossip Model. 163-171
- Calvin Newport:
Leader Election in a Smartphone Peer-to-Peer Network. 172-181
- Karine Altisen, Ajoy K. Datta, Stéphane Devismes, Anaïs Durand, Lawrence L. Larmore:
Leader Election in Asymmetric Labeled Unidirectional Rings. 182-191
- Petra Berenbrink, Peter Kling, Christopher Liaw, Abbas Mehrabian:
Tight Load Balancing Via Randomized Local Search. 192-201
Session 6: Numerical Simulation
- Hiroshi Nakashima, Yoshiki Summura, Keisuke Kikura, Yohei Miyake:
Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning. 202-212
- Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim:
Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors. 213-223
- Kshitij Mehta, Maxime R. Hugues, Oscar R. Hernandez, David E. Bernholdt, Henri Calandra:
One-Way Wave Equation Migration at Scale on GPUs Using Directive Based Programming. 224-233
- Mathias Jacquelin, Wibe A. de Jong, Eric J. Bylaska:
Towards Highly scalable Ab Initio Molecular Dynamics (AIMD) Simulations on the Intel Knights Landing Manycore Processor. 234-243
Session 7: Novel Architectures
- Xubin Tan, Jaume Bosch, Miquel Vidal, Carlos Álvarez, Daniel Jiménez-González, Eduard Ayguadé, Mateo Valero:
General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models. 244-253
- Halit Dogan, Farrukh Hijaz, Masab Ahmad, Brian Kahne, Peter Wilson, Omer Khan:
Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for In-hardware Explicit Messaging. 254-264
- Xiang Pan, Anys Bacha, Radu Teodorescu:
Respin: Rethinking Near-Threshold Multiprocessor Design with Non-volatile Memory. 265-275
- Syed Mohammad Asad Hassan Jafri, Ahmed Hemani, Kolin Paul, Naeem Abbas:
MOCHA: Morphable Locality and Compression Aware Architecture for Convolutional Neural Networks. 276-286
Session 8: Performance Modeling and Tuning
- Biagio Cosenza, Juan José Durillo, Stefano Ermon, Ben H. H. Juurlink:
Autotuning Stencil Computations with Structural Ordinal Regression Learning. 287-296
- Sabela Ramos, Torsten Hoefler:
Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL. 297-306
- David Beckingsale, Olga Pearce, Ignacio Laguna, Todd Gamblin:
Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code. 307-316
- Ryan D. Friese, Nathan R. Tallent, Abhinav Vishnu, Darren J. Kerbyson, Adolfy Hoisie:
Generating Performance Models for Irregular Applications. 317-326
Session 9: Communication & Coordination
- Keishla D. Ortiz-Lopez, Jennifer L. Welch:
Bounded Reordering Allows Efficient Reliable Message Transmission. 327-336
- Dongxiao Yu, Yuexuan Wang, Tigran Tonoyan, Magnús M. Halldórsson:
Dynamic Adaptation in Wireless Networks Under Comprehensive Interference via Carrier Sense. 337-346
- Pawel Garncarek, Tomasz Jurdzinski, Krzysztof Lorys:
Fault-Tolerant Online Packet Scheduling on Parallel Channels. 347-356
- Torsten Hoefler, Amnon Barak, Amnon Shiloh, Zvi Drezner:
Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems. 357-366
Session 10: Tools 1
- Hao Xu, Shasha Wen, Alfredo Giménez, Todd Gamblin, Xu Liu:
DR-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning. 367-376
- Hui Zhang, Jeffrey K. Hollingsworth:
Data Centric Performance Measurement Techniques for Chapel Programs. 377-386
- Young Wn Song, Yann-Hang Lee:
A Parallel FastTrack Data Race Detector on Multi-core Systems. 387-396
- Gokcen Kestor, Sriram Krishnamoorthy, Wenjing Ma:
Localized Fault Recovery for Nested Fork-Join Programs. 397-408
Session 11: Networks
- Roberto Gioiosa, Antonino Tumeo, Jian Yin, Thomas Warfel, David J. Haglin, Santiago Betelú:
Exploring DataVortex Systems for Irregular Applications. 409-418
- Jiyan Sun, Yan Zhang, Xin Wang, Shihan Xiao, Zhen Xu, Hongjing Wu, Xin Chen, Yanni Han:
DC2-MTCP: Light-Weight Coding for Efficient Multi-Path Transmission in Data Center Network. 419-428
- Yi Dai, Kefei Wang, Gang Qu, Liquan Xiao, Dezun Dong, Xingyun Qi:
A Scalable and Resilient Microarchitecture Based on Multiport Binding for High-Radix Router Design. 429-438
- Nikhil Jain, Abhinav Bhatele, Xiang Ni, Todd Gamblin, Laxmikant V. Kalé:
Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference. 439-448
Session 12: Libraries & Frameworks
- Jan Wroblewski, Kazuaki Ishizaki, Hiroshi Inoue, Moriyoshi Ohara:
Accelerating Spark Datasets by Inlining Deserialization. 449-458
- Hong Zhang, Hai Huang, Liqiang Wang:
MRapid: An Efficient Short Job Optimizer on Hadoop. 459-468
- Samuel K. Gutierrez, Kei Davis, Dorian C. Arnold, Randal S. Baker, Robert W. Robey, Patrick S. McCormick, Daniel Holladay, Jon A. Dahl, R. Joe Zerr, Florian Weik, Christoph Junghans:
Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications. 469-478
- Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens:
Multi-GPU Graph Analytics. 479-490
Industry Tutorial
- Julie Bernauer:
NVIDIA Deep Learning Tutorial. 491
Keynote 2
- Mark Seager:
A Scalable System Architecture to Addressing the Next Generation of Predictive Simulation Workflows with Coupled Compute and Data Intensive Applications. 492
Session 13: Motion Planning & Similarity Search
- Sergio Rajsbaum, Armando Castañeda, David Flores-Peñaloza, Manuel Alcantara:
Fault-Tolerant Robot Gathering Problems on Graphs With Arbitrary Appearing Times. 493-502
- Akhil Krishnan, Mikhail Markov, Borzoo Bonakdarpour:
Distributed Vehicle Routing Approximation. 503-512
- Gokarna Sharma, Ramachandran Vaidyanathan, Jerry L. Trahan, Costas Busch, Suresh Rai:
O(log N)-Time Complete Visibility for Asynchronous Robots with Lights. 513-522
- Vincent T. Lee, Justin Kotalik, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin:
Similarity Search on Automata Processors. 523-534
Session 14: Applications
- Yulong Ao, Chao Yang, Xinliang Wang, Wei Xue, Haohuan Fu, Fangfang Liu, Lin Gan, Ping Xu, Wenjing Ma:
26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. 535-544
- Bram Veenboer, Matthias Petschow, John W. Romein:
Image-Domain Gridding on Graphics Processors. 545-554
- Beverly A. Sanders, Jason N. Byrd, Nakul Jindal, Victor F. Lotrich, Dmitry I. Lyakh, Ajith Perera, Rodney J. Bartlett:
Aces4: A Platform for Computational Chemistry Calculations with Extremely Large Block-Sparse Arrays. 555-564
- Shun Yao, Dantong Yu:
PhiOpenSSL: Using the Xeon Phi Coprocessor for Efficient Cryptographic Calculations. 565-574
Session 15: Tools 2
- Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-chun Feng:
Directive-Based Partitioning and Pipelining for Graphics Processing Units. 575-584
- Xiaoqing Luo, Frank Mueller, Philip H. Carns, Jonathan Jenkins, Robert Latham, Robert B. Ross, Shane Snyder:
ScalaIOExtrap: Elastic I/O Tracing and Extrapolation. 585-594
- Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, Hyesoon Kim:
SimProf: A Sampling Framework for Data Analytic Workloads. 595-604
- Hao Wang, Jing Zhang, Da Zhang, Sarunya Pumma, Wu-chun Feng:
PaPar: A Parallel Data Partitioning Framework for Big Data Applications. 605-614
Session 16: Data and Graph Analytics
- Jiarui Fang, Haohuan Fu, Wenlai Zhao, Bingwei Chen, Weijie Zheng, Guangwen Yang:
swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. 615-624
- Md. Naim, Fredrik Manne, Mahantesh Halappanavar, Antonino Tumeo:
Community Detection on the GPU. 625-634
- Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin, Weimin Zheng:
Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores. 635-645
- George M. Slota, Sivasankaran Rajamanickam, Karen D. Devine, Kamesh Madduri:
Partitioning Trillion-Edge Graphs in Minutes. 646-655
Session 17: Linear Algebra
- Jianyu Huang, Leslie Rice, Devin A. Matthews, Robert A. van de Geijn:
Generating Families of Practical Fast Matrix Multiplication Algorithms. 656-667
- Mathieu Faverge, Julien Langou, Yves Robert, Jack J. Dongarra:
Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation. 668-677
- Tobias Wicky, Edgar Solomonik, Torsten Hoefler:
Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations. 678-687
- Ariful Azad, Aydin Buluç:
A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm. 688-697
Session 18: Power Management
- Abdulaziz Tabbakh, Murali Annavaram, Xuehai Qian:
Power Efficient Sharing-Aware GPU Data Management. 698-707
- Rahul Boyapati, Jiayi Huang, Ningyuan Wang, Kyung Hoon Kim, Ki Hwan Yum, Eun Jung Kim:
Fly-Over: A Light-Weight Distributed Power-Gating Mechanism for Energy-Efficient Networks-on-Chip. 708-717
- Zhenhua Li, Yuanyuan Yang:
RCube: A Power Efficient and Highly Available Network for Data Centers. 718-727
- Thang Cao, Wei Huang, Yuan He, Masaaki Kondo:
Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems. 728-737
Session 19: Scheduling
- Vincenzo Bonifaci, Gianlorenzo D'Angelo, Alberto Marchetti-Spaccamela:
Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling. 738-747
- Odorico Machado Mendizabal, Ruda S. T. De Moura, Fernando Luís Dotti, Fernando Pedone:
Efficient and Deterministic Scheduling for Parallel State Machine Replication. 748-757
- Guillaume Aupy, Clement Brasseur, Loris Marchal:
Dynamic Memory-Aware Task-Tree Scheduling. 758-767
- Olivier Beaumont, Lionel Eyraud-Dubois, Suraj Kumar:
Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs. 768-777
Session 20: Code Optimization
- Philippe Clauss, Ervin Altintas, Matthieu Kuhn:
Automatic Collapsing of Non-Rectangular Loops. 778-787
- Yonghong Yan, Jiawen Liu, Kirk W. Cameron, Mariam Umar:
HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems. 788-798
- Jaime Arteaga Molina, Stéphane Zuckerman, Guang R. Gao:
Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading. 799-808
- Josep M. Pérez, Vicenç Beltran, Jesús Labarta, Eduard Ayguadé:
Improving the Integration of Task Nesting and Dependencies in OpenMP. 809-818
Keynote 3
- Mateo Valero:
Runtime Aware Architectures. 819
Best Papers
- Scott Beamer, Krste Asanovic, David A. Patterson:
Reducing Pagerank Communication via Propagation Blocking. 820-831
- Michael G. Gowanlock, Cody M. Rude, David M. Blair, Justin D. Li, Victor Pankratius:
Clustering Throughput Optimization on the GPU. 832-841
- Pablo Fuentes, Enrique Vallejo, Ramón Beivide, Cyriel Minkenberg, Mateo Valero:
FlexVC: Flexible Virtual Channel Management in Low-Diameter Networks. 842-854
- Benjamin Klenk, Holger Fröning, Hans Eberle, Larry Dennison:
Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors. 855-865
Session 21: Algorithms
- Reza Mokhtari, Michael Stumm:
The SEPO Model of Computation to Enable Larger-Than-Memory Hash Tables for GPU-Accelerated Big Data Analytics. 866-875
- Wei Xie, Yong Chen:
Elastic Consistent Hashing for Distributed Storage Systems. 876-885
- Chenhan D. Yu, William B. March, George Biros:
An N log N Parallel Fast Direct Solver for Kernel Matrices. 886-896
- Pieter Ghysels, Xiaoye Sherry Li, Christopher Gorman, François-Henry Rouet:
A Robust Parallel Preconditioner for Indefinite Systems Using Hierarchical Matrices and Randomized Sampling. 897-906
Session 22: Coordination
- Sergei Arnautov, Pascal Felber, Christof Fetzer, Bohdan Trach:
FFQ: A Fast Single-Producer/Multiple-Consumer Concurrent FIFO Queue. 907-916
- Ivan Walulya, Philippas Tsigas:
Scalable Lock-Free Vector with Combining. 917-926
- Wei-Lun Hung, Vijay K. Garg:
Automatic-Signal Monitors with Multi-object Synchronization. 927-936
- Yujie An, Quentin F. Stout:
Optimal Algorithms for a Mesh-Connected Computer with Limited Additional Global Bandwidth. 937-946
Session 23: Power Management 2
- Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, Jan F. Prins:
An Adaptive Core-Specific Runtime for Energy Efficiency. 947-956
- Ryuichi Sakamoto, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Tapasya Patki, Daniel A. Ellsworth, Barry Rountree, Martin Schulz:
Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework. 957-966
- Qi Zhu, Bo Wu, Xipeng Shen, Li Shen, Zhiying Wang:
Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems. 967-977
- Vignesh Adhinarayanan, Wu-chun Feng, David H. Rogers, James P. Ahrens, Scott Pakin:
Characterizing and Modeling Power and Energy for Extreme-Scale In-Situ Visualization. 978-987
Session 24: MPI
- Wim Lavrijsen, Costin Iancu:
Application Level Reordering of Remote Direct Memory Access Operations. 988-997
- Sergio M. Martin, Marsha J. Berger, Scott B. Baden:
Toucan - A Translator for Communication Tolerant MPI Applications. 998-1007
- Yanfei Guo, Charles J. Archer, Michael Blocksome, Scott Parker, Wesley Bland, Ken Raffenetti, Pavan Balaji:
Memory Compression Techniques for Network Address Management in MPI. 1008-1017
- Salvatore Di Girolamo, Flavio Vella, Torsten Hoefler:
Transparent Caching for RMA Systems. 1018-1027
Session 25: ML & Tensors
- El Mahdi El Mhamdi, Rachid Guerraoui:
When Neurons Fail. 1028-1037
- Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal, Dheeraj Sreedhar:
On Optimizing Distributed Tucker Decomposition for Dense Tensors. 1038-1047
- Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, Richard W. Vuduc:
Model-Driven Sparse CP Decomposition for Higher-Order Tensors. 1048-1057
- Shaden Smith, Jongsoo Park, George Karypis:
Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory. 1058-1067
Session 26: Resource Management
- Ali Pourmiri, Mahdi Jafari Siavoshani, Seyed Pooya Shariatpanahi:
Proximity-Aware Balanced Allocations in Cache Networks. 1068-1077
- Wei Chen, Jia Rao, Xiaobo Zhou:
Addressing Performance Heterogeneity in MapReduce Clusters with Elastic Tasks. 1078-1087
- Masahiro Tanaka, Kenjiro Taura, Kentaro Torisawa:
Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis. 1088-1097
- Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, Michela Taufer:
Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. 1098-1108
Session 27: Compression & Memoization
- Bo Mao, Hong Jiang, Suzhen Wu, Yaodong Yang, Zaifa Xi:
Elastic Data Compression with Improved Performance and Space Efficiency for Flash-Based Storage Systems. 1109-1118