
Why is Pipeline parallelism not compatible with ZeRO-2 and ZeRO-3? #1629

Open
Dounm opened this issue Dec 10, 2021 · 3 comments
Labels
enhancement New feature or request

Comments


Dounm commented Dec 10, 2021

Could you explain why Pipeline parallelism is not compatible with ZeRO-2 and ZeRO-3? Are there any design tradeoffs?

As far as I know, it is pretty common to train large models with data parallelism and pipeline parallelism together, and with the constraint above the offload mechanism cannot be enabled, since it depends on ZeRO-2/3.

Also, Megatron-DeepSpeed's pretrain_gpt.py uses GPTModelPipe, a subclass of PipelineModule, as the model module passed to deepspeed.initialize(), so it is impossible to enable ZeRO-2/3 in the config JSON. Are there any examples that run with ZeRO-2/3?
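To make the constraint concrete, here is a minimal sketch of a DeepSpeed config for a pipeline-parallel run. The stage value and field names follow DeepSpeed's documented `zero_optimization` schema; the claim that only stage 0 or 1 works with PipelineModule is the premise of this issue, not something the sketch verifies.

```python
# Minimal sketch: a DeepSpeed config dict for a PipelineModule run.
# ZeRO stage is capped at 1 (optimizer-state partitioning only);
# stages 2/3 additionally partition gradients/parameters across
# data-parallel ranks, which the pipeline engine does not support
# per this issue. Batch sizes here are illustrative placeholders.
ds_config = {
    "train_batch_size": 256,
    "train_micro_batch_size_per_gpu": 4,  # remainder becomes gradient accumulation
    "zero_optimization": {
        "stage": 1,  # highest stage compatible with pipeline parallelism
    },
    "fp16": {"enabled": True},
}
```

Setting `"stage": 2` or `"stage": 3` in this config while passing a PipelineModule to deepspeed.initialize() is what triggers the incompatibility being asked about.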

@Dounm Dounm added the enhancement New feature or request label Dec 10, 2021

2catycm commented May 6, 2023

Some relevant information is in #1110.


fxmarty commented May 6, 2024

ZeRO Stage 3: The 16-bit model parameters are partitioned across the processes. ZeRO-3 will automatically collect and partition them during the forward and backward passes.

How is that not pipeline parallelism?


fxmarty commented May 6, 2024

[image attachment]
