-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scalars: multiplex data fetches within a tag #4050
Conversation
Summary: The data loader behavior maintains a fine-grained cache of key-value pairs. This is useful because when the set of requested keys expands, only the new keys need to be fetched. But, until now, the behavior has been hard-coded to fire a separate request for each key-value pair. Clients can have either fine-grained cache invalidation or efficient batched requests, but not both at the same time. This patch enriches the behavior to support just that. In a future change, the scalars dashboard will take advantage of this to batch requests for multiple runs and a single tag. In doing so, we need to shuffle around the API a bit. Instead of asking clients to provide `getDataLoadUrl: (Item) => string` plus a separate function `requestData: (url: string) => Promise<Data>` (where “`Item`” and “`Data`” are the keys and values, respectively), clients now provide a single function `requestData` that takes raw `Item`s (now plural), performs the request(s), and returns the data. The function provides a stream of key-value pairs, which is represented in callback style for convenience. (We don’t want to drag Observable into this.) The purpose of this approach, as opposed to a perhaps more natural approach that simply adapts `getDataLoadUrl` to return some kind of request struct with a callback to map a response into key-value pairs, is to accommodate the variety of existing clients. The structures get pretty wild: e.g., `tf-line-chart-data-loader` mixes in the behavior but doesn’t actually provide the needed properties; they’re provided instead by `tf-scalar-card`, but then `tf-hparams-session-group-details` further overrides some of those properties of `tf-scalar-card` with an entirely different backend. It’s a bit wordier for clients, but at least there are fewer moving pieces to keep track of. Test Plan: The scalars, custom scalars, distributions, histograms, and hparams dashboards all work. The fine-grained invalidation on the scalars dashboard works: e.g., set the tag filter to `mnist` and then to `mnist|hparams`, and watch only the hparams demo data load; then, set it to `hparams` and watch the MNIST charts disappear without any repaints to the hparams demo charts. The post-load callback properly causes scalar charts’ domains to adjust. The refresh button in the dashboard UI properly invalidates and re-fetches data. (Make sure to run with `TB_POLYMER3=1` until that’s the default.) wchargin-branch: dlb-batch-finegrained wchargin-source: 14ec8abde36c4563a4922209d361fc5bd16b6061
Summary: DO NOT SUBMIT until corresponding internal change is submitted. Test Plan: Manually tested; to be documented. wchargin-branch: scalars-mux-runs wchargin-source: 9459c8cc6bc0b041016dfc689306639a04080217
No internal change actually needed. (The relevant internal ML dashboard |
With a networked data provider (Google-internally) on a dataset with |
wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a # Conflicts: # tensorboard/plugins/scalar/polymer3/tf_scalar_dashboard/tf-scalar-card.ts
wchargin-branch: scalars-mux-runs wchargin-source: 2e1bd5c8cd3393ac0a79c64b0c87ea43bf346b8a
wchargin-branch: scalars-mux-runs wchargin-source: 0484e726315d46c282c3d61994ffb9e4dc07c20b
Accepts form-encoded POST data with a (required) singleton key `tag` and a | ||
repeated key `runs`. Returns a JSON object mapping run names to arrays of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the contract if I violate this constraint? (0 or 2+ tag
)? Unlike the zero length runs
case, it looks like we are throwing 400.
Also, do you think it is a good idea to perhaps write an example of the request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero-length runs
is fine; it just returns an empty object. Missing
tag
is a 400. Duplicate tag
s will silently pick one of them
(probably last?). I can make that an error, though; probably a good
idea. Thanks.
Also, do you think it is a good idea to perhaps write an example of
the request?
Yep. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I expected to see FormData but this works too :)
return this.requestManager.request(url, {tag, runs}).then((allData) => { | ||
for (const run of runs) { | ||
const item = {tag, run}; | ||
if (Object.prototype.hasOwnProperty.call(allData, run)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we have to use Object.prototype.hasOwnProperty? (i.e., why not allData.hasOwnProperty
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a run called hasOwnProperty
, then that would be a type
error.
> ({train: 1}).hasOwnProperty("train")
true
> ({train: 1, hasOwnProperty: 2}).hasOwnProperty("train")
Thrown:
TypeError: {(intermediate value)(intermediate value)}.hasOwnProperty is not a function
> Object.prototype.hasOwnProperty.call({train: 1, hasOwnProperty: 2}, "train")
true
Yeah.
I prefer to write this kind of code defensively—even if it seems
unlikely, it sometimes happens (e.g.: #1283).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One day https://github.com/hasOwnProperty will show up in a JSON
object mapping GitHub login names to database IDs, and someone will have
a fun time…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks.
wchargin-branch: scalars-mux-runs wchargin-source: a4ca83ea5bf6b665052c076df835592105e5eefb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
return this.requestManager.request(url, {tag, runs}).then((allData) => { | ||
for (const run of runs) { | ||
const item = {tag, run}; | ||
if (Object.prototype.hasOwnProperty.call(allData, run)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have a run called hasOwnProperty
, then that would be a type
error.
> ({train: 1}).hasOwnProperty("train")
true
> ({train: 1, hasOwnProperty: 2}).hasOwnProperty("train")
Thrown:
TypeError: {(intermediate value)(intermediate value)}.hasOwnProperty is not a function
> Object.prototype.hasOwnProperty.call({train: 1, hasOwnProperty: 2}, "train")
true
Yeah.
I prefer to write this kind of code defensively—even if it seems
unlikely, it sometimes happens (e.g.: #1283).
Accepts form-encoded POST data with a (required) singleton key `tag` and a | ||
repeated key `runs`. Returns a JSON object mapping run names to arrays of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zero-length runs
is fine; it just returns an empty object. Missing
tag
is a 400. Duplicate tag
s will silently pick one of them
(probably last?). I can make that an error, though; probably a good
idea. Thanks.
Also, do you think it is a good idea to perhaps write an example of
the request?
Yep. Done.
return this.requestManager.request(url, {tag, runs}).then((allData) => { | ||
for (const run of runs) { | ||
const item = {tag, run}; | ||
if (Object.prototype.hasOwnProperty.call(allData, run)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks.
Accepts form-encoded POST data with a (required) singleton key `tag` and a | ||
repeated key `runs`. Returns a JSON object mapping run names to arrays of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I expected to see FormData but this works too :)
Summary: In #4050, we added multiplexing to scalar chart fetches for performance. This works nicely in local TensorBoard and public Colab, but the POST request machinery isn’t yet supported in the Google-internal version of Colab, so #4050 caused a regression there. This patch conditionally rolls back the change for Colab only. This includes rolling it back for public Colab, where it worked fine. We hope that this isn’t too big of a problem, since we expect that most Colab users are inspecting datasets with few runs (generated from within Colab) rather than massive hyperparameter sweeps. We also hope that we can simply revert this patch once the Google-internal behavior is fixed. Jupyter environments are unaffected (and still work). We used to have a global `window.TENSORBOARD_ENV` that listed whether we were in Colab, but we removed that in #2798 because we thought that it was no longer necessary. Since in that PR we switched to create an iframe rather than manually linking and loading the HTML (much nicer, to be sure), we now plumb this information via a query parameter. This also has the advantage that it’s easy to test the Colab codepaths by simply adding that query parameter to a normal TensorBoard instance. Test Plan: First, test that normal TensorBoard (`bazel run //tensorboard`) still works with multiplexing, and that adding `?tensorboardColab=true` causes it to send multiple GET requests instead. Then, build the Pip package (`bazel run //tensorboard/pip_package:extract_pip_packages`), upload it to public Colab, and install it into the runtime. Verify that the scalar charts still work there, albeit without multiplexing. Finally, cherry-pick these changes into google3 via a test sync and follow the test plan at <http://cl/333398676> to verify the fix. wchargin-branch: scalars-nomux-colab wchargin-source: 453503f1dea196985e889be483c2a7675cc87aa1
Summary: In #4050, we added multiplexing to scalar chart fetches for performance. This works nicely in local TensorBoard and public Colab, but the POST request machinery isn’t yet supported in the Google-internal version of Colab, so #4050 caused a regression there. This patch conditionally rolls back the change for Colab only. This includes rolling it back for public Colab, where it worked fine. We hope that this isn’t too big of a problem, since we expect that most Colab users are inspecting datasets with few runs (generated from within Colab) rather than massive hyperparameter sweeps. We also hope that we can simply revert this patch once the Google-internal behavior is fixed. Jupyter environments are unaffected (and still work). We used to have a global `window.TENSORBOARD_ENV` that listed whether we were in Colab, but we removed that in #2798 because we thought that it was no longer necessary. Since in that PR we switched to create an iframe rather than manually linking and loading the HTML (much nicer, to be sure), we now plumb this information via a query parameter. This also has the advantage that it’s easy to test the Colab codepaths by simply adding that query parameter to a normal TensorBoard instance. Fixes #4174. Test Plan: First, test that normal TensorBoard (`bazel run //tensorboard`) still works with multiplexing, and that adding `?tensorboardColab=true` causes it to send multiple GET requests instead. Then, build the Pip package (`bazel run //tensorboard/pip_package:extract_pip_packages`), upload it to public Colab, and install it into the runtime. Verify that the scalar charts still work there, albeit without multiplexing. Finally, cherry-pick these changes into google3 via a test sync and follow the test plan at <http://cl/333398676> to verify the fix. wchargin-branch: scalars-nomux-colab
Summary:
As of this patch, a
tf-scalar-card
can make just one network requestto fetch its data, instead of one request per time series (i.e., one
request per run, since each scalar chart renders a single tag). This
reduces network overhead, improves throughput due to higher request
concurrency, and offers the opportunity for data providers to more
efficiently request the data in batch.
This is implemented with a new POST route
/scalars_multirun
, since thelist of run names may be long. The frontend is configured to batch
requests to at most 64 runs at once, so the multiplexing is bounded.
This only affects the scalars plugin. Other scalar chart sources, like
PR curves, custom scalars, and the hparams metrics views, are not
affected.
Supersedes #3835, with the same idea and same general backend approach,
but using the frontend APIs enabled by #4045.
Test Plan:
On the hparams demo with four charts showing, each fetching 50 runs, we
now make only four requests as opposed to 200. On a Google-internal
networked data provider, this improves end-to-end time (measured from
“first spinner appears” to “last spinner disappears”) by about 50%, from
22±1 seconds to 11±1 seconds. (Before this patch, the network requests
were being queued all the way to the end of the test period.)
Changing the batch size to 4 and then running on a dataset with 14 runs
shows that the requests are properly batched, including the last one
with just 2 runs.
Testing hparams, custom scalars, and PR curves shows that they continue
to work, even when multiple time series are requested.
wchargin-branch: scalars-mux-runs