Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to estimate the minimum required buffer length when initializing StreamsSequenceStream instances #9925

Merged

Conversation

Snuffleupagus
Copy link
Collaborator

For most other DecodeStream based streams, we'll attempt to estimate the minimum buffer length based on the raw stream data. The purpose of this is to avoid having to unnecessarily re-size the buffer, thus reducing the number of intermediate allocations necessary when decoding the stream data.
However, currently no such optimization is attempted for StreamsSequenceStream, and given that they can often be quite large that seems unfortunate. To improve this, at least somewhat, this patch utilizes the raw sizes of the StreamsSequenceStream sub-streams to estimate the minimum required buffer length.

Most likely this patch won't have a huge effect on memory consumption, however for pathological cases it should help reduce peak memory usage slightly.
One example is the PDF file in issue #2813, where currently the StreamsSequenceStream instances would grow their buffers as 2 MiB -> 4 MiB -> 8 MiB -> 16 MiB -> 32 MiB. With this patch, the same stream bufferss grow as 8 MiB -> 16 MiB -> 32 MiB, thus avoiding a total of 12 MiB of intermediate allocations (since there's two StreamsSequenceStream used, for rendering/text-extraction).

…izing `StreamsSequenceStream` instances

For most other `DecodeStream` based streams, we'll attempt to estimate the minimum `buffer` length based on the raw stream data. The purpose of this is to avoid having to unnecessarily re-size the `buffer`, thus reducing the number of *intermediate* allocations necessary when decoding the stream data.
However, currently no such optimization is attempted for `StreamsSequenceStream`, and given that they can often be quite large that seems unfortunate. To improve this, at least somewhat, this patch utilizes the raw sizes of the `StreamsSequenceStream` sub-streams to estimate the minimum required `buffer` length.

Most likely this patch won't have a huge effect on memory consumption, however for pathological cases it should help reduce peak memory usage slightly.
One example is the PDF file in issue 2813, where currently the `StreamsSequenceStream` instances would grow their `buffer`s as `2 MiB -> 4 MiB -> 8 MiB -> 16 MiB -> 32 MiB`. With this patch, the same stream `buffers`s grow as `8 MiB -> 16 MiB -> 32 MiB`, thus avoiding a total of `12 MiB` of *intermediate* allocations (since there's two `StreamsSequenceStream` used, for rendering/text-extraction).
@Snuffleupagus
Copy link
Collaborator Author

/botio-windows test

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/c1b60031b515e90/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/c1b60031b515e90/output.txt

Total script time: 26.48 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

/botio-linux test

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/0a57612ad95fb08/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/0a57612ad95fb08/output.txt

Total script time: 19.47 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/eb6fc56209447cb/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/eb6fc56209447cb/output.txt

Total script time: 2.89 mins

Published

@timvandermeij timvandermeij merged commit a2c317f into mozilla:master Jul 29, 2018
@timvandermeij
Copy link
Contributor

Nice find!

@Snuffleupagus Snuffleupagus deleted the StreamsSequenceStream-maybeLength branch July 30, 2018 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants