-
Notifications
You must be signed in to change notification settings - Fork 8.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce virtual function usage to mitigate performance penalty due to CFG #10610
Comments
As for the actual performance gain, I found 10% improvement in VT throughput in my experimental branch. For the main branch right now, the performance gain isn't really obvious, because other code paths are too slow 😅 . |
WDDM, BGFX and GDI are selected at runtime; there is no separate build of conhost for OneCore machines. |
That's sad :( . Can we at least make VtRenderer a compile-time thing? And those renderer in WT. |
When the DX engine is in conhost it too is selected at runtime. I suspect we will see more improvement from reducing how many times we have to call through the virtual functions and increasing the amount of data we exchange for each one than by simply using a static interface. |
Great idea! If you're OK with what I did in #10596 (comment) , we can still reduce a lot of unnecessary virtual function calls. |
I'm not sure. Wouldn't better batching help every renderer at the same time? For what it's worth, I did once propose having a "render feature flags" enum such that a render engine could specify what it needed (rather than what it is). I think that that is a better granularity than trying to teach the base renderer about the different kinds of render engine. After all, what you've done is reimplement RTTI but as an enum value returned from a method, and it would break encapsulation in the same exact way as using RTTI would. |
At first I think "render feature flag" sounds like a nice idea. However, after the experiment (#10615), I think that is not really needed, because most of the functionalities in the base class can simply be extracted into smaller helper classes. We might want to choose the "composition over inheritance" path. |
Making the dispatch explicit would also solve any CFG-related performance issues:
This is very easy for branch predictors to handle, so could be used in hot code paths. |
What is CFG?
Control Flow Guard (CFG) is MSVC/Win32 feature that can combat memory exploits & vulnerabilities. I found this accidentally when I was exploring the performance bottleneck in #10564. I found that even if I replace the entire
GetPatternId
withreturn 0
, that call is still very expensive. Then I realized it's not about what's inside the call, it's about the call itself. For every virtual function call, the compiler will insert the following instruction before the call:This will then leads to several extra instructions before calling the actual method.
Disabling CFG
CFG can be disabled in the system setting (restart required). With CFG disabled system-wide, the
call __guard_dispatch_icall_fptr
will simple return quickly without the extra instructions.CFG can also be disabled in compiler option
(/guard:cf)
. I failed to actually disable this in WT for some reason. I suppose with the option disabled, no__guard_dispatch_icall_fptr
will be inserted any more.Proposed solution
OK here's my refurbished plan;
IRenderEngine
and replace them with a single methodPaintFrame(IRenderData *pData)
(Initial refactor of IRenderEngine interface #10615)InvalidateSomething
methods inIRenderEngine
withTriggerSomething
to give the actual engine more control & reduce the coupling withrenderer.cpp
.CC @DHowett @miniksa @zadjii-msft @lhecker for discussion.
The text was updated successfully, but these errors were encountered: