
Anyone able to run 7B on google colab? #120

Closed
andrewmlu opened this issue Mar 5, 2023 · 13 comments

Comments

@andrewmlu

Interested to see if anyone has been able to run the 7B model on Google Colab. It seems like 16 GB should be enough, and the free tier often grants that much. Not sure whether Colab Pro would do any better, but if anyone has gotten it working, advice would be much appreciated.

@reycn

reycn commented Mar 5, 2023

Not for free users. The model must be loaded into CPU memory (if we are talking about this repo), but free Colab provides less than 13 GB of RAM, so it cannot be run on the free tier.

However, it may work if you have a Pro subscription. I look forward to hearing about your outcome.
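A quick way to check what a runtime actually provides (a minimal sketch; `psutil` and `torch` are preinstalled on Colab, but exact capacities vary by runtime type):

```python
import psutil
import torch

# Total system RAM visible to the runtime (free Colab is usually ~12-13 GB).
print(f"System RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")

# GPU VRAM, if a GPU runtime is attached.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1e9:.1f} GB")
```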

@Daviljoe193

Daviljoe193 commented Mar 5, 2023

What about loading it onto the TPU? KoboldAI can load 20B models (like Erebus 20B) onto the TPU just fine, barring the fact that it takes about 15 minutes to load a ~40 GB model (Erebus specifically is split into 23 parts), though once running they're pretty fast. Maybe there's a way to reduce the per-segment size from ~15 GB down to 2 GB by splitting the checkpoint into smaller parts, the same way Erebus is.
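A minimal sketch of that kind of size-based split, assuming the 7B checkpoint's `consolidated.00.pth` file and a ~2 GB target per shard (the filenames are illustrative):

```python
import torch

MAX_SHARD_BYTES = 2 * 1024**3  # target roughly 2 GB per shard

# Loading the consolidated checkpoint still needs enough CPU RAM to hold the
# full state dict once; the payoff is that later loads can go shard by shard.
state_dict = torch.load("consolidated.00.pth", map_location="cpu")

shard, shard_bytes, shard_idx = {}, 0, 0
for name, tensor in state_dict.items():
    size = tensor.numel() * tensor.element_size()
    if shard and shard_bytes + size > MAX_SHARD_BYTES:
        torch.save(shard, f"shard_{shard_idx:02d}.pth")
        shard, shard_bytes, shard_idx = {}, 0, shard_idx + 1
    shard[name] = tensor
    shard_bytes += size
if shard:
    torch.save(shard, f"shard_{shard_idx:02d}.pth")
```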

@elephantpanda

I got it to run on a Shadow PC (#105), which has only 12 GB of RAM. It crunched the page file a fair bit, but it still loaded the model in about 110 seconds, since it clears the RAM after moving the weights to the GPU.

So it can work on a computer with less than 14 GB of RAM, but perhaps Google Colab doesn't have a page file? I don't know.

@brendan-donohoe

brendan-donohoe commented Mar 5, 2023

It's possible to make it work on the free version. Since Colab gives you more GPU VRAM than RAM, what you'll want to do is load the checkpoint into CUDA rather than CPU. Once you've done that, split the state dict on the layers, save the sharded state dict, and then, after freeing your GPU memory (or in another run), sequentially load each shard into the model on the GPU, deleting each shard once you're done with it. You'll save quite a bit of RAM during the loading process, and from there it should work.
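In code, the first phase (split and save) might look roughly like this; a minimal sketch only, assuming the 7B `consolidated.00.pth` checkpoint and its `layers.N.*` key layout:

```python
import re
import torch

# Load the checkpoint straight onto the GPU (Colab's VRAM exceeds its RAM),
# then split the state dict per transformer layer.
state_dict = torch.load("consolidated.00.pth", map_location="cuda")

shards = {}
for name, tensor in state_dict.items():
    # Keys like "layers.0.attention.wq.weight" are grouped by layer index;
    # everything else (embeddings, norm, output) goes into a "misc" shard.
    match = re.match(r"layers\.(\d+)\.", name)
    key = f"layer_{int(match.group(1)):02d}" if match else "misc"
    shards.setdefault(key, {})[name] = tensor

for key, shard in shards.items():
    # Move tensors back to CPU one shard at a time, so CPU RAM only ever holds
    # a single shard's worth of weights.
    torch.save({n: t.cpu() for n, t in shard.items()}, f"{key}.pth")

# Free the GPU before constructing the model.
del state_dict, shards
torch.cuda.empty_cache()
```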

@andrewmlu
Author

Update: I was able to run it on Google Colab Pro. It seems the 12 GB of RAM on the free tier is the issue.

@andrewmlu
Author

> It's possible to make it work on the free version. Since Colab gives you more GPU VRAM than RAM, what you'll want to do is load the checkpoint into CUDA rather than CPU. Once you've done that, split the state dict on the layers, save the sharded state dict, and then, after freeing your GPU memory (or in another run), sequentially load each shard into the model on the GPU, deleting each shard once you're done with it. You'll save quite a bit of RAM during the loading process, and from there it should work.

I attempted loading on the GPU, and it still fails to load fully: CUDA out of memory.

@brendan-donohoe

brendan-donohoe commented Mar 5, 2023

Here's a notebook that goes through the steps I just mentioned. It works for me on Colab Pro's standard GPU (~15 GB VRAM) and regular RAM runtime (~12.7 GB RAM), which I think is identical to the free tier, though I'm not completely certain. If free Colab gives less VRAM than the Pro standard runtime, it may indeed be impossible, but the approach should at least use compute units more efficiently on Pro:

https://pastebin.com/Le2zaJCy

This uses a 15 GB T4 GPU. If you have Colab Pro, there's an option to run 13B that should work as well, though you'll have to be patient executing the second cell. Colab is slow to save files, so you may have to wait and check your Drive to make sure everything has saved as it should before proceeding.
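For reference, the second phase (sequentially loading the saved shards) might look roughly like this; a minimal sketch assuming the model skeleton has already been built on the GPU as in this repo's example.py, and that the shards use the illustrative `layer_*.pth`/`misc.pth` naming from the earlier sketch:

```python
import glob
import torch
from torch import nn


def load_shards_sequentially(model: nn.Module) -> None:
    """Stream per-layer shards into an already-constructed model, one at a time."""
    for path in sorted(glob.glob("layer_*.pth")) + ["misc.pth"]:
        shard = torch.load(path, map_location="cuda")
        # strict=False: each shard covers only a subset of the parameters.
        model.load_state_dict(shard, strict=False)
        del shard
        torch.cuda.empty_cache()
```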

@Daviljoe193

Daviljoe193 commented Mar 6, 2023

I've gotten this one notebook from a 4chan user to work for me on the free tier. It's VERY cumbersome to get working, but it does work. The only change I made was to skip Google Drive and instead pull the model from somebody who mirrored it on Hugging Face (brave soul, but the model got flagged and will probably vanish from there). It splits the model like I mentioned, so again, if somebody could get it working on a TPU and split the models the way this notebook does, then maybe the higher-parameter models would be workable without a Colab Pro subscription.

@usmanovaa

I was able to run the model on Colab Pro. It took 27 GB of RAM for me.

For this I recommend switching to the TPU runtime (it has about 35 GB of RAM) and adding `low_cpu_mem_usage=True` to the `from_pretrained` call.
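A minimal sketch of that approach, assuming the weights have already been converted to the Hugging Face format ("path/to/llama-7b-hf" is a placeholder for wherever the converted weights live):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# low_cpu_mem_usage avoids materializing a second full copy of the weights
# in RAM while the model is being loaded.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "path/to/llama-7b-hf",
    low_cpu_mem_usage=True,
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```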

@zeeboi9

zeeboi9 commented May 7, 2023

> Interested to see if anyone has been able to run the 7B model on Google Colab. It seems like 16 GB should be enough, and the free tier often grants that much. Not sure whether Colab Pro would do any better, but if anyone has gotten it working, advice would be much appreciated.

I was able to run it on normal Colab, but it is horribly slow because the model is loaded from Google Drive. Can anyone help me make the loading time, and the time the model takes to type out its response, faster?

Link to the code/commands (because it is a Linux environment): https://colab.research.google.com/drive/1otfwOihFBtNznj7ZXqiUJV_OXPm_BnN3?usp=sharing

@johnwick123f

I'm writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even run a model over 30B that way.
You don't even need Colab: on my phone it's possible to run a 3B model, and it outputs about half a token to one token per second, which is slow but pretty surprising given it's running on a phone!
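A minimal sketch of that route using the llama-cpp-python bindings ("path/to/llama-7b-q4_0.bin" is a placeholder for whatever quantized file llama.cpp's conversion and quantization tools produced):

```python
from llama_cpp import Llama

# Load a quantized model; n_ctx is the context window in tokens.
llm = Llama(model_path="path/to/llama-7b-q4_0.bin", n_ctx=512)

output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(output["choices"][0]["text"])
```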

@liushiyi1994

liushiyi1994 commented Jul 21, 2023

> I'm writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even run a model over 30B that way. You don't even need Colab: on my phone it's possible to run a 3B model, and it outputs about half a token to one token per second, which is slow but pretty surprising given it's running on a phone!

I'm doing some edge computing research; mind if I ask how you run it on the phone?

@windmaple

> > I'm writing this a few months later, but it's easy to run the model if you use llama.cpp and a quantized version of the model. You can even run a model over 30B that way. You don't even need Colab: on my phone it's possible to run a 3B model, and it outputs about half a token to one token per second, which is slow but pretty surprising given it's running on a phone!
>
> I'm doing some edge computing research; mind if I ask how you run it on the phone?

llama.cpp supports Android. Ref: https://github.com/ggerganov/llama.cpp#android
