GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism...

by Seung-hwan Lim
Publication Type
Conference Paper
Book Title
Proceedings of The Eighth Annual Conference on Machine Learning and Systems
Publication Date
Page Number
273
Conference Name
The Eighth Annual Conference on Machine Learning and Systems
Conference Location
Santa Clara, California, United States of America
Conference Sponsor
Systems and Machine Learning Foundation
Conference Date

Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance on various graph analytical tasks. Mini-batch training is commonly used to train GNNs on large graphs, and data parallelism is the standard approach to scale mini-batch training across multiple GPUs. Data-parallel approaches, however, perform redundant work because the subgraphs sampled by different GPUs overlap significantly. To address this issue, we introduce a hybrid parallel mini-batch training paradigm called Split parallelism. Split parallelism avoids redundant work by splitting the sampling, loading, and training of each mini-batch across multiple GPUs. Split parallelism, however, introduces communication overheads that can exceed the savings from eliminating redundant work. We further present a lightweight partitioning algorithm that probabilistically minimizes these overheads. We implement split parallelism in GSplit and show that it outperforms state-of-the-art mini-batch training systems such as DGL, Quiver, and P3.
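
The split-parallel idea in the abstract can be illustrated with a small sketch: rather than every GPU sampling and loading its own, heavily overlapping subgraph, a single sampled mini-batch is split by vertex ownership so that each vertex's features are loaded and aggregated on exactly one GPU, at the cost of communicating embeddings along edges whose endpoints are owned by different GPUs. The Python sketch below is a toy illustration under those assumptions; the names (probabilistic_partition, split_minibatch) and the uniformly random partitioner are stand-ins, not GSplit's actual API, and the real lightweight partitioning algorithm is designed to probabilistically minimize the resulting communication overheads rather than ignore graph structure as this one does.

import random
from collections import defaultdict

NUM_GPUS = 4

def probabilistic_partition(num_vertices, num_gpus, seed=0):
    """Assign each vertex to a GPU uniformly at random.

    Stand-in for a lightweight partitioning step; the paper's algorithm
    additionally aims to minimize expected cross-GPU communication.
    """
    rng = random.Random(seed)
    return {v: rng.randrange(num_gpus) for v in range(num_vertices)}

def split_minibatch(batch_vertices, batch_edges, owner):
    """Split one sampled mini-batch subgraph across GPUs by vertex ownership."""
    local_vertices = defaultdict(set)   # vertices whose features each GPU loads
    local_edges = defaultdict(list)     # edges aggregated on the destination's owner
    cross_gpu_edges = 0                 # edges whose endpoints live on different GPUs
    for v in batch_vertices:
        local_vertices[owner[v]].add(v)
    for (src, dst) in batch_edges:
        local_edges[owner[dst]].append((src, dst))
        if owner[src] != owner[dst]:
            cross_gpu_edges += 1        # source embedding must be communicated
    return local_vertices, local_edges, cross_gpu_edges

if __name__ == "__main__":
    # A small synthetic "sampled mini-batch": 1000 vertices, 5000 edges.
    rng = random.Random(42)
    vertices = list(range(1000))
    edges = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(5000)]

    owner = probabilistic_partition(len(vertices), NUM_GPUS)
    local_v, local_e, crossing = split_minibatch(vertices, edges, owner)

    for gpu in range(NUM_GPUS):
        print(f"GPU {gpu}: {len(local_v[gpu])} owned vertices, "
              f"{len(local_e[gpu])} edges to aggregate")
    print(f"Cross-GPU edges requiring communication: {crossing} / {len(edges)}")

Running the sketch prints, for each simulated GPU, how many vertex features it would load and how many edges it would aggregate, along with the count of cross-GPU edges that would require communication, which is the overhead the abstract's partitioning algorithm seeks to keep below the savings from removing redundant work.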