-
Notifications
You must be signed in to change notification settings - Fork 5
Optimize retrieval to do minimal fetcher from singularity #195
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #195 +/- ##
==========================================
+ Coverage 31.80% 38.51% +6.71%
==========================================
Files 5 6 +1
Lines 761 914 +153
==========================================
+ Hits 242 352 +110
- Misses 482 516 +34
- Partials 37 46 +9
|
3b94982
to
f249e19
Compare
3a0c7c2
to
bfb3d2b
Compare
bfb3d2b
to
a837654
Compare
I see the intent of reducing number of requests to Singularity API, and I understand it is not simple to implement io.ReadSeekCloser by making requests to an HTTP API. PassThrough interface {
Pass(context.Context, http.ResponseWriter, http.Request, ID) error
}
handleBlobGetByID(w, r, id) {
if pass, ok := m.store.(PassThrough); ok {
return pass.Pass(ctx, w, r, id)
} else {
blobReader, err := m.store.Get(r.Context(), id)
}
} what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
@xinaxu I think it makes sense. So the PassThrough will translate the blob ID into a singularity file ID, and then call the singularity API. The generated Singularity client code is called like this: r.client.File.RetrieveFile(&file.RetrieveFileParams{
Context: ctx,
ID: fileID,
Range: ptr.String(byteRange),
}, w) So it seems like the generated client will not be usable for PassThrough. Should a new client function be created, or should motion construct a custom HTTP request? |
Could you explain why it is not usable? |
Actually, I suppose it will now that I see that the |
Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches. Fixes #143
Need to parse range headers so that entire range can be requested from singularity. Otherwise it is necessary to read the whole file. An optimization would be to determine if the ranges are contiguous, and then only make one request that spans contiguous ranges. This is not necessary though if clients already combine contiguous ranges into a single larger range.
c4eb394
to
419afb6
Compare
} | ||
logger.Info("Retrieved file", "id", id.String()) | ||
} | ||
|
||
func (s *SingularityStore) Get(ctx context.Context, id blob.ID) (io.ReadSeekCloser, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: do you think it's better to just return
new Error("not implemented, please use PassGet")
so we don't end up maintaining code path that never gets hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left Get
because I did not know if there would ever be a case where we want to retrieve a blob from singularity, but we are not doing this in response to an HTTP request. For example, suppose we want to retrieve a file from singularity, and then do something to that file (filter it, compress it, etc.), and then return the result. In that case, we will need the original Get
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always remove it later in another PR if we determine there is no use.
Read HTTP range headers and do fetch for entire range instead of allowing the
io.CopyN
, used byhttp.ServeContent
, to do multiple small fetches.Fixes #143