Abstract
This study presents the first constrained sparse tensor factorization (cSTF) framework that optimizes and fully offloads computation to massively parallel GPU architectures, together with the first performance characterization of cSTF on GPUs. In contrast to prior work on tensor factorization, where the matricized tensor times Khatri-Rao product (MTTKRP) is the primary performance bottleneck, our systematic analysis of the cSTF algorithm on GPUs reveals that adding constraints creates an additional bottleneck in the update operation for many real-world sparse tensors. Although executing the update operation on the GPU yields substantial speedup over its CPU counterpart, it remains a major bottleneck. To further accelerate the update operation, we propose cuADMM, a new update algorithm that combines algorithmic and code optimization strategies to minimize both computation and data movement on GPUs. As a result, our framework delivers significantly improved performance over the prior state of the art. On 10 real-world sparse tensors, our framework achieves geometric-mean speedups of 5.1× (max 41.59×) and 7.01× (max 58.05×) on the NVIDIA A100 and H100 GPUs, respectively, over the state-of-the-art SPLATT library running on a 26-core Intel Ice Lake Xeon CPU.