Skip to content
This repository has been archived by the owner on May 15, 2024. It is now read-only.

Optimize retrieval to do minimal fetcher from singularity #195

Merged
merged 15 commits into from
Nov 3, 2023

Conversation

gammazero
Copy link
Collaborator

Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches.

Fixes #143

@codecov-commenter
Copy link

codecov-commenter commented Oct 30, 2023

Codecov Report

Merging #195 (419afb6) into main (0ef904d) will increase coverage by 6.71%.
The diff coverage is 51.89%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #195      +/-   ##
==========================================
+ Coverage   31.80%   38.51%   +6.71%     
==========================================
  Files           5        6       +1     
  Lines         761      914     +153     
==========================================
+ Hits          242      352     +110     
- Misses        482      516      +34     
- Partials       37       46       +9     
Files Coverage Δ
blob/blob.go 37.50% <ø> (ø)
integration/singularity/range_reader.go 78.37% <78.37%> (ø)
integration/singularity/reader.go 61.78% <63.46%> (+61.78%) ⬆️
integration/singularity/store.go 35.36% <2.27%> (-1.56%) ⬇️

@gammazero gammazero marked this pull request as draft October 30, 2023 16:42
@gammazero gammazero marked this pull request as ready for review October 30, 2023 20:22
@filecoin-project filecoin-project deleted a comment from willscott Oct 31, 2023
@gammazero gammazero requested review from xinaxu and rvagg October 31, 2023 16:26
@gammazero gammazero marked this pull request as draft October 31, 2023 19:54
@gammazero gammazero marked this pull request as ready for review October 31, 2023 22:43
@xinaxu
Copy link
Collaborator

xinaxu commented Nov 1, 2023

I see the intent of reducing number of requests to Singularity API, and I understand it is not simple to implement io.ReadSeekCloser by making requests to an HTTP API.
If we don't have the Store.Get interface, we can simply open another HTTP request to Singularity API with the same range header and let the Singularity API handle the request. It seems that Store.Get is making this more difficult than I thought.
I am concerned that having a custom implementation of ServeContent may be hard to maintain and test in long run, and outweights the benefits of complying to Store.Get.
Hence, I am proposing to have Singularity_store implement a pass_through interface and bypass the store.Get interface

PassThrough interface {
  Pass(context.Context, http.ResponseWriter, http.Request, ID) error
}

handleBlobGetByID(w, r, id) {
  if pass, ok := m.store.(PassThrough); ok {
    return pass.Pass(ctx, w, r, id)
  } else {
    blobReader, err := m.store.Get(r.Context(), id)
  }
}

what do you think?

Copy link
Collaborator

@xinaxu xinaxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@gammazero
Copy link
Collaborator Author

@xinaxu I think it makes sense. So the PassThrough will translate the blob ID into a singularity file ID, and then call the singularity API. The generated Singularity client code is called like this:

r.client.File.RetrieveFile(&file.RetrieveFileParams{
    Context: ctx,
    ID:      fileID,
    Range:   ptr.String(byteRange),
}, w)

So it seems like the generated client will not be usable for PassThrough. Should a new client function be created, or should motion construct a custom HTTP request?

@xinaxu
Copy link
Collaborator

xinaxu commented Nov 2, 2023

So it seems like the generated client will not be usable for PassThrough.

Could you explain why it is not usable?

@gammazero
Copy link
Collaborator Author

Could you explain why it is not usable?

Actually, I suppose it will now that I see that the Range parameter is the whole range header, not just a single range as I had previously though. I am coding/testing pass-through now.

Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches.

Fixes #143
Need to parse range headers so that entire range can be requested from singularity. Otherwise it is necessary to read the whole file.

An optimization would be to determine if the ranges are contiguous, and then only make one request that spans contiguous ranges. This is not necessary though if clients already combine contiguous ranges into a single larger range.
}
logger.Info("Retrieved file", "id", id.String())
}

func (s *SingularityStore) Get(ctx context.Context, id blob.ID) (io.ReadSeekCloser, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do you think it's better to just return

new Error("not implemented, please use PassGet")

so we don't end up maintaining code path that never gets hit

Copy link
Collaborator Author

@gammazero gammazero Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left Get because I did not know if there would ever be a case where we want to retrieve a blob from singularity, but we are not doing this in response to an HTTP request. For example, suppose we want to retrieve a file from singularity, and then do something to that file (filter it, compress it, etc.), and then return the result. In that case, we will need the original Get. WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always remove it later in another PR if we determine there is no use.

@gammazero gammazero requested a review from xinaxu November 2, 2023 23:09
@gammazero gammazero merged commit 9218c3e into main Nov 3, 2023
@gammazero gammazero deleted the optimize-retrieve branch November 3, 2023 17:05
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retrieval Optimization To Singularity
3 participants