Is your feature request related to a problem? Please describe.
When we have many users and/or items, the sizes of these embedding tables quickly increase the amount of GPU memory we consume, leading to an OOM error even before training starts.
Describe the solution you'd like
Ideally, the embedding tables would instead live in CPU memory; during training, each batch would index into them and bring only that subset of the embeddings to the GPU for the model's forward and backward passes. After the batch, that subset of embeddings would be released from GPU memory.
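For concreteness, here is a minimal sketch of this behavior, assuming the model is built on PyTorch; the `CPUEmbeddingLookup` class and its names are hypothetical, not an existing API in the library:

```python
# A minimal sketch of the proposed behavior, assuming a PyTorch-based model.
# `CPUEmbeddingLookup` is a hypothetical name, not an existing API.
import torch
import torch.nn as nn

class CPUEmbeddingLookup(nn.Module):
    """Keeps the full embedding table in CPU memory and moves only the
    rows needed for the current batch onto the GPU."""

    def __init__(self, num_embeddings: int, dim: int):
        super().__init__()
        # sparse=True so the backward pass only produces gradients for
        # the rows actually looked up in this batch.
        self.table = nn.Embedding(num_embeddings, dim, sparse=True)  # stays on CPU
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        # Index on CPU, then transfer just the gathered rows to the GPU.
        # Only this batch-sized slice occupies GPU memory, and it is
        # freed once the batch's backward pass completes.
        rows = self.table(indices.cpu())
        return rows.to(self.device)

# Usage: the optimizer steps the CPU-resident table as usual.
emb = CPUEmbeddingLookup(num_embeddings=1_000_000, dim=64)
opt = torch.optim.SGD(emb.parameters(), lr=0.1)  # SGD accepts sparse grads
batch = torch.randint(0, 1_000_000, (512,))
emb(batch).sum().backward()
opt.step()
```

Because the `.to()` transfer is differentiable, gradients flow back to the CPU-resident table, so only a batch-sized slice of rows ever touches GPU memory.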
Describe alternatives you've considered
It's also possible to accomplish this by splitting the embeddings across multiple GPUs in a model-parallel, multi-GPU setup. But the solution outlined above allows a scalable model on a single node, which may be more desirable for more users of the library.
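For comparison, a rough sketch of that model-parallel alternative, again assuming PyTorch; the row-wise sharding scheme and the `RowShardedEmbedding` name are illustrative only:

```python
# Illustrative only: one logical embedding table split row-wise across GPUs.
import torch
import torch.nn as nn

class RowShardedEmbedding(nn.Module):
    def __init__(self, num_embeddings: int, dim: int, devices: list):
        super().__init__()
        # Each shard holds a contiguous block of rows on its own device.
        self.shard_size = -(-num_embeddings // len(devices))  # ceil division
        self.devices = devices
        self.shards = nn.ModuleList(
            nn.Embedding(
                min(self.shard_size, num_embeddings - i * self.shard_size), dim
            ).to(dev)
            for i, dev in enumerate(devices)
        )

    def forward(self, indices: torch.Tensor, out_device: str) -> torch.Tensor:
        out = torch.empty(indices.numel(), self.shards[0].embedding_dim,
                          device=out_device)
        for i, (shard, dev) in enumerate(zip(self.shards, self.devices)):
            owned = (indices // self.shard_size) == i  # rows this shard owns
            if owned.any():
                local = (indices[owned] - i * self.shard_size).to(dev)
                out[owned] = shard(local).to(out_device)
        return out

# e.g. RowShardedEmbedding(1_000_000, 64, ["cuda:0", "cuda:1"])
```

This trades host-device traffic for cross-device traffic, but it requires a multi-GPU node, whereas the CPU-resident approach above scales on a single GPU.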
Additional context
See here for some related discussion on this.