Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break added documents into batches when processing solution/compilation translation states. #72621

Merged
merged 29 commits into from
Mar 21, 2024

Conversation

CyrusNajmabadi
Copy link
Member

I noticed high overhead due to how many forks we make of all data when we linearize each added document into a different state. Batching heavily removes that overhead, while also allowing us to still benefit from parallelization in many cases.

@CyrusNajmabadi CyrusNajmabadi requested a review from ToddGrun March 20, 2024 19:03
@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners March 20, 2024 19:03
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Mar 20, 2024
// Note: we intentionally kick these off in a fire-and-forget fashion. If we get canceled, all
// the tasks will attempt to cancel. If we complete, that's only because these tasks would
// complete as well. There's no need to track this with any sort of listener as this work just
// acts to speed up getting to a compilation, but is otherwise unobservable.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were already doing this when we got to the add-docuemtns action, which already processes thigns in parallel. so this didn't seem to actually help as much as i wanted, and it meant a lot of task explosion as we were processing a long action queue.

@@ -9,6 +9,7 @@
using System.Diagnostics.CodeAnalysis;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was this added for?

@dotnet dotnet deleted a comment from azure-pipelines bot Mar 20, 2024
Comment on lines 156 to 157
var trees = await Task.WhenAll(tasks).ConfigureAwait(false);
currentCompilation = currentCompilation.AddSyntaxTrees(trees);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not want to collect all the trees separately, and then add them at the end?

for (int i = batchStart, batchEnd = Math.Min(batchStart + AddDocumentsBatchSize, total); i < batchEnd; i++)
{
var document = this.Documents[i];
tasks.Add(Task.Run(async () => await document.GetSyntaxTreeAsync(cancellationToken).ConfigureAwait(false), cancellationToken));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Consider ignoring cancellation exceptions in these tasks (always returning null instead), and then check for cancellation after the WhenAll completes to avoid throwing more exceptions than necessary.

Copy link
Contributor

@ToddGrun ToddGrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

using System.Collections.Immutable;
using System.Linq;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't spot what required this.


for (int i = batchStart, batchEnd = Math.Min(batchStart + AddDocumentsBatchSize, total); i < batchEnd; i++)
{
var document = this.Documents[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var document = this.Documents[i];

One particularly slow GetSyntaxTreeAsync call here could completely block any new work from starting. What about using a semaphore to do the parallelism? Something like:

var throttler = new SemaphoreSlim(initialCount: AddDocumentsBatchSize);
var tasks = this.Documents.Select(async document =>
{
    await throttler.WaitAsync(cancellationToken).ConfigureAwait(false);
    try
    {
        return await document.GetSyntaxTreeAsync(cancellationToken).ConfigureAwait(false);
    }
    finally
    {
        throttler.Release();
    }
});

var batch = await Task.WhenAll(tasks).ConfigureAwait(false);
trees.AddRange(batch);

var semaphore = new SemaphoreSlim(initialCount: AddDocumentsBatchSize);
foreach (var document in this.Documents)
{
tasks.Add(Task.Run(async () =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tasks.Add(Task.Run(async () =>

The only slight disadvantage of the current form that I see is that all the tasks are always created now and that cancellation will have to go through all of them. By having the semaphore usage outside the task, could that be avoided?

@CyrusNajmabadi CyrusNajmabadi merged commit c4f4202 into dotnet:main Mar 21, 2024
30 checks passed
@CyrusNajmabadi CyrusNajmabadi deleted the docStates branch March 21, 2024 15:41
@dotnet-policy-service dotnet-policy-service bot added this to the Next milestone Mar 21, 2024
@RikkiGibson RikkiGibson modified the milestones: Next, 17.10 P3 Mar 25, 2024
@CyrusNajmabadi
Copy link
Member Author

@jasonmalinowski For review when you get back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants