Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce virtual function usage to mitigate performance penalty due to CFG #10610

Open
skyline75489 opened this issue Jul 11, 2021 · 8 comments
Open
Labels
Area-Performance Performance-related issue Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Milestone

Comments

@skyline75489
Copy link
Collaborator

skyline75489 commented Jul 11, 2021

What is CFG?

Control Flow Guard (CFG) is MSVC/Win32 feature that can combat memory exploits & vulnerabilities. I found this accidentally when I was exploring the performance bottleneck in #10564. I found that even if I replace the entire GetPatternId with return 0, that call is still very expensive. Then I realized it's not about what's inside the call, it's about the call itself. For every virtual function call, the compiler will insert the following instruction before the call:

call __guard_dispatch_icall_fptr, [SomeMethod]

This will then leads to several extra instructions before calling the actual method.

Disabling CFG

CFG can be disabled in the system setting (restart required). With CFG disabled system-wide, the call __guard_dispatch_icall_fptr will simple return quickly without the extra instructions.

CFG can also be disabled in compiler option (/guard:cf). I failed to actually disable this in WT for some reason. I suppose with the option disabled, no __guard_dispatch_icall_fptr will be inserted any more.

Proposed solution

OK here's my refurbished plan;

  1. Remove all render related methods in IRenderEngine and replace them with a single method PaintFrame(IRenderData *pData) (Initial refactor of IRenderEngine interface #10615)
  2. Replace InvalidateSomething methods in IRenderEngine with TriggerSomething to give the actual engine more control & reduce the coupling with renderer.cpp.
  3. Let's see what we can do for the left methods.

CC @DHowett @miniksa @zadjii-msft @lhecker for discussion.

@skyline75489 skyline75489 added Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Area-Performance Performance-related issue Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal. labels Jul 11, 2021
@ghost ghost added the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jul 11, 2021
@skyline75489
Copy link
Collaborator Author

As for the actual performance gain, I found 10% improvement in VT throughput in my experimental branch.

For the main branch right now, the performance gain isn't really obvious, because other code paths are too slow 😅 .

@DHowett
Copy link
Member

DHowett commented Jul 11, 2021

WDDM, BGFX and GDI are selected at runtime; there is no separate build of conhost for OneCore machines.

@skyline75489
Copy link
Collaborator Author

That's sad :( . Can we at least make VtRenderer a compile-time thing? And those renderer in WT.

@DHowett
Copy link
Member

DHowett commented Jul 11, 2021

When the DX engine is in conhost it too is selected at runtime.

I suspect we will see more improvement from reducing how many times we have to call through the virtual functions and increasing the amount of data we exchange for each one than by simply using a static interface.

@skyline75489
Copy link
Collaborator Author

reducing how many times we have to call through the virtual functions

Great idea!

If you're OK with what I did in #10596 (comment) , we can still reduce a lot of unnecessary virtual function calls.

@DHowett
Copy link
Member

DHowett commented Jul 11, 2021

I'm not sure. Wouldn't better batching help every renderer at the same time?

For what it's worth, I did once propose having a "render feature flags" enum such that a render engine could specify what it needed (rather than what it is). I think that that is a better granularity than trying to teach the base renderer about the different kinds of render engine. After all, what you've done is reimplement RTTI but as an enum value returned from a method, and it would break encapsulation in the same exact way as using RTTI would.

@zadjii-msft zadjii-msft removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Aug 5, 2021
@zadjii-msft zadjii-msft added this to the Terminal Backlog milestone Aug 5, 2021
@skyline75489
Copy link
Collaborator Author

At first I think "render feature flag" sounds like a nice idea. However, after the experiment (#10615), I think that is not really needed, because most of the functionalities in the base class can simply be extracted into smaller helper classes. We might want to choose the "composition over inheritance" path.

@zadjii-msft zadjii-msft modified the milestones: Terminal Backlog, Backlog Jan 4, 2022
@sfiruch
Copy link

sfiruch commented Jul 20, 2023

Making the dispatch explicit would also solve any CFG-related performance issues:

if(flagIsFeature1Available)
   useFeature1();
else if(flagIsFeature2Available)
   useFeature2();
else
   useFallback();

This is very easy for branch predictors to handle, so could be used in hot code paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Projects
None yet
Development

No branches or pull requests

4 participants