Reduce model loading time (ggml-org#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <[email protected]>
maekawatoshiki and ggerganov authored Mar 13, 2023
1 parent 2a20f48 commit 63fd76f
Showing 1 changed file (main.cpp) with 4 additions and 0 deletions.
@@ -87,7 +87,10 @@ struct llama_model {
 bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx) {
     printf("%s: loading model from '%s' - please wait ...\n", __func__, fname.c_str());
 
+    std::vector<char> f_buf(1024*1024);
+
     auto fin = std::ifstream(fname, std::ios::binary);
+    fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());
     if (!fin) {
         fprintf(stderr, "%s: failed to open '%s'\n", __func__, fname.c_str());
         return false;
@@ -325,6 +328,7 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
         printf("%s: loading model part %d/%d from '%s'\n", __func__, i+1, n_parts, fname_part.c_str());
 
         fin = std::ifstream(fname_part, std::ios::binary);
+        fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());
         fin.seekg(file_offset);
 
         // load weights
