geo_heap and cb_heap sizes #47

Open
ghost opened this issue Mar 25, 2018 · 23 comments

@ghost

ghost commented Mar 25, 2018

over at https://github.com/Firerat/wine-pba/tree/heap_size_envvars
specifically 0011-wined3d-GEO-and-CB-heap-size-override-envvars.patch
0010-OPTIONAL-wined3d-GEO-and-CB-heap-size-envvars.patch

I have added an option to tweak the geo and cb heap sizes
the reason behind this was to enable PBA on my puny laptop
it works like this: __PBA_GEO_HEAP=200 __PBA_CB_HEAP=1 wine somedx9game.exe

@GloriousEggroll has used this to get FF XIV to behave along with Guild Wars 2 and Warframe render lag
@shmerl has had some limited success with TheWitcher3

for reference, the current PBA hardcoded defaults are GEO = 512 MB and CB = 128 MB
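As a rough illustration of how such an override can work, here is a minimal sketch. It is not the code from the patch: the helper names are hypothetical, and it assumes the values are given in MB (which matches the 512/128 defaults). It also falls back to the default on 0, which mirrors the behaviour described for the patch further down the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from the patch: parse a heap size given
 * in MB. Falls back to the default when the value is unset, empty, not a
 * plain number, zero, or too large to convert to bytes. */
static size_t heap_bytes_from_string(const char *value, size_t default_mb)
{
    const size_t mb = 1024u * 1024u;
    assert(default_mb > 0 && default_mb <= (size_t)-1 / mb);

    if (value == NULL || *value == '\0')
        return default_mb * mb;

    char *end = NULL;
    errno = 0;
    unsigned long long parsed = strtoull(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || parsed == 0
            || parsed > (size_t)-1 / mb)
        return default_mb * mb;

    return (size_t)parsed * mb;
}

/* Read the override from the environment; the variable names used in this
 * thread are __PBA_GEO_HEAP and __PBA_CB_HEAP. */
static size_t heap_bytes_from_env(const char *name, size_t default_mb)
{
    return heap_bytes_from_string(getenv(name), default_mb);
}
```

With this sketch, `heap_bytes_from_env("__PBA_GEO_HEAP", 512)` would stand in for the hardcoded geo default and `heap_bytes_from_env("__PBA_CB_HEAP", 128)` for the cb default.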

I would love for people to both

  1. check my code (my C is very limited)
  2. try these envvars to resolve issues with specific games

at least initially it is probably better to concentrate on the geo heap size; if things appear to be missing, up it a little.

please mention your video card's RAM

this one should apply over an existing wine-pba
wined3d-GEO-and-CB-heap-size-override-envvars.patch.gz
0010-OPTIONAL-wined3d-GEO-and-CB-heap-size-envvars.patch.gz

no need to get my full repo or anything like that

@IngeniousDox

I don't think I have issues right now, but I'm willing to build and play with the env vars to see if I can get better results (or worse).

Could you explain what each of the envvars does? What does increasing / decreasing them give? And what size they should be min/max. And perhaps a suggested size based on VRAM.

@ghost
Author

ghost commented Mar 26, 2018

I'm not sure I can explain it very well, I shall try

https://comminos.com/posts/2018-02-21-wined3d-profiling.html

Suppose we had access to a large, persistently mapped buffer in the host address space. We never had to unmap it to make a draw call, and writes to it were coherently visible to the GPU without any GL calls.

  • If the flag D3DLOCK_NOOVERWRITE was provided, then we can return the address of the last persistent mapping for that buffer.
      • Just pointer arithmetic- no need to talk to the command stream thread!
      • D3DLOCK_NOOVERWRITE fundamentally lets us ignore synchronization.

  • If the flag D3DLOCK_DISCARD was provided, then we can remap the buffer to an unused section of persistently mapped GPU memory.
      • A bit trickier to implement- we’ll need an allocator in order to avoid fragmentation.

  • Otherwise, we need to wait for the GPU to finish using the buffer (i.e. talk to the command stream thread).
      • This is fine for our purposes, since this (common) streaming geometry technique doesn’t need to wait for buffers.
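To make the three cases concrete, here is a minimal sketch of that decision. It is not the wine-pba implementation: the struct and function names are hypothetical, plain memory stands in for the persistently mapped heap, and a simple bump offset stands in for the real allocator (which also has to recycle regions once the GPU is done with them).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; a real build would take D3DLOCK_NOOVERWRITE
 * and D3DLOCK_DISCARD from the Direct3D headers. */
enum { LOCK_NOOVERWRITE = 0x1000, LOCK_DISCARD = 0x2000 };

/* Stand-in for a persistently mapped heap: base pointer plus bump offset. */
struct heap {
    unsigned char *base;
    size_t size;
    size_t next_free;
};

struct buffer {
    size_t offset;  /* where this buffer currently lives in the heap */
    size_t size;
};

/* Returns the address to write to, or NULL when the fast path is not
 * available (heap full, or the caller must first wait for the GPU). */
static void *map_buffer(struct heap *h, struct buffer *b, unsigned flags)
{
    assert(h != NULL && b != NULL && h->next_free <= h->size);

    if (flags & LOCK_NOOVERWRITE)
        return h->base + b->offset;        /* pointer arithmetic only */

    if (flags & LOCK_DISCARD) {
        if (b->size > h->size - h->next_free)
            return NULL;                   /* no unused space left */
        b->offset = h->next_free;          /* move to an unused section */
        h->next_free += b->size;
        return h->base + b->offset;
    }

    return NULL;                           /* needs synchronization first */
}
```

The heap-full branch is where the heap size starts to matter: the larger the heap, the more DISCARD maps can be served before space has to be reclaimed.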

This is the GEO and CB bit

Enter the holy grail: ARB_buffer_storage. This lets us allocate an immutable section of GPU memory, and allow persistent (always mapped) and coherent (write-through) maps of it. We’re effectively replacing the role of the driver here, which would handle DISCARD (INVALIDATE in GL) and NOOVERWRITE (UNSYNCHRONIZED in GL) buffer maps itself.

This is an AZDO (approaching zero driver overhead) style GL extension. If you’re interested, check out this article by NVIDIA.

in short, a big chunk of video ram that we use so we don't have to shuffle things around in memory so much

some practical observations

  • FF XIV (using dx9): this game would crash with the defaults, GEO=512 and CB=128.
    Reducing the CB heap to 1 and GEO to 256 prevents the crash.
    Note: setting CB to 0 would actually get you the default 128, due to my crude coding.

  • dx11 had other issues (un-rendered textures), also observed upstream. However, that may have been resolved upstream (could also be related to mesa, idk); also see The Witcher 3 below.

  • The Witcher 3: un-rendered vegetation. Upping GEO to 2048 (and CB to 1024, but that may not be relevant) brings back the vegetation.

  • Warframe: significant rendering lag when changing zones (invisible mobs, if I recall the video correctly). Upping GEO to 2048 fixes it.

> And what size they should be min/max. And perhaps a suggested size based on VRAM.

Well, this is the kind of data I wish to collect
from the code

        // TODO(acomminos): kill this magic number. perhaps base on vram.
        GLsizeiptr geo_heap_size = 512 * 1024 * 1024;
        // We choose a constant buffer size of 128MB, the same as NVIDIA claims to
        // use in their Direct3D driver for discarded constant buffers.
        GLsizeiptr cb_heap_size = 128 * 1024 * 1024;

For now, leave CB_HEAP at 128 (unless dx9; in that case set it to 1, refer to FF XIV)

With the very limited data we have currently, it seems 2048 is likely a sweet spot for GEO. Trouble is, I only have 1024 on my gt620, so the defaults are near max for me. Oh the irony 😞

EDIT:
it is probably worth noting that my initial (hardcoded) hack to get PBA working on my laptop was
GEO = 1/2 vram, CB = 1/8 vram
end result was no change on desktop, and PBA working on laptop (geo 128, cb 32)
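The arithmetic behind that rule can be sketched as below. The struct and function names are hypothetical, and the 256 MB laptop figure is inferred from geo 128 / cb 32 rather than stated in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two heap sizes, in MB. */
struct heap_sizes_mb {
    size_t geo;
    size_t cb;
};

/* GEO = 1/2 of VRAM, CB = 1/8 of VRAM, as in the hardcoded hack. */
static struct heap_sizes_mb heap_sizes_from_vram(size_t vram_mb)
{
    assert(vram_mb >= 8);  /* below this the CB heap would round to 0 */
    struct heap_sizes_mb sizes = { vram_mb / 2, vram_mb / 8 };
    return sizes;
}
```

A 1024 MB card gives 512 / 128, which is exactly the stock defaults (hence no change there), while 256 MB gives 128 / 32.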

@IngeniousDox

IngeniousDox commented Mar 26, 2018

Aight, the way I understand it now:

  1. geo_heap = Heap for changing geometry, just allocating enough in advance so the d3d thread never has to stall on making a buffer. Buffer space in here is used, then reused whenever NOOVERWRITE calls are made and it is still available. DISCARD buffers are just discarded after use, and the buffer space is reused after the GPU is done with it.

  2. cb_heap = Heap for shader constants data, which should never change at all. (Introduced in dx10, so that is why it isn't used for dx9)

I understood 1) already, but I simply didn't understand what 2) was used for, but I'm assuming that is a direct mapping of DX10/11 constant shader buffers now.

Right, so we need to set sizes enough for both to make it smooth. Running out of heap space will result in the allocator having to drop older data I guess. So having more is better, but only if you don't run out of VRAM.

For my 4gb card, I'll start with GEO=1536, and CB=256, and pay attention to my VRAM usage.

PS: I compiled and ran into the VSync bug, so I won't do timed tests yet. I'll rebuild Fri/Sat when 3.5 is out and the vsync bug is gone, and then test more.

EDIT: Reduced my starting values a bit, and removed a sentence.

@SveSop

SveSop commented Mar 30, 2018

Have you found any performance gains tweaking these values @IngeniousDox ?

@ghost
Author

ghost commented Mar 30, 2018

@SveSop it is not about performance

@IngeniousDox

I figured that if the GEO/CB heap isn't big enough, you might end up in a situation where you run out of room. I do not know what happens at that point, and if you could get more performance by just having a bigger heap in the first place. Almost impossible for me to check.

That said, I played some WoW last night with those values, but the max VRAM WoW ever used was around 1400 MB. This was in Antorus. I don't think I gained any fps over the "original values".

@SveSop

SveSop commented Mar 30, 2018

@FireRat I know that it is not made for performance... but it COULD have a side-effect of more performance if a particular game gets somewhat starved or whatever of heap space no?

And since me and @IngeniousDox both play wow, it was worth asking if he found something :)

@ghost
Author

ghost commented Mar 31, 2018

I've updated my patch
https://github.com/Firerat/wine-pba/commit/9c275c9bfb341a5ea6f813306d53ccd5698368a6
0010-OPTIONAL-wined3d-GEO-and-CB-heap-size-envvars.patch.gz

no change in function, but I believe it is technically cleaner
I should put some safeguards in, which should be easier now since I understand C a little better

@Svyatpro

Svyatpro commented Sep 9, 2018

I think adding a new dll like wined3d_dx9.dll with hardcoded GEO=256 and CB=1 for d3d9.dll, and using the default PBA envvars in wined3d.dll for d3d11.dll, could be an easy solution for most cases.

@ghost
Author

ghost commented Sep 12, 2018

nah, a basic shell script is a far better solution.

@Svyatpro

Yeah, but only for Linux. I am mostly testing on Windows so I have to tune values and recompile each time.

@ghost
Author

ghost commented Sep 12, 2018

fair enough ..

here is my original patch, which used the registry
https://gist.github.com/Firerat/2b90b59ecf78bbaa1a420b66ca9088ef
no idea if it will apply cleanly to current

@ghost
Author

ghost commented Sep 12, 2018

oh forgot
in user.reg

[Software\Wine\Direct3D]
"cb_heap"="1"
"geo_heap"="256"

@Svyatpro

It seems it is using user-mode system memory instead of GPU memory. When I use the /3GB switch it works without crashes with any GEO and CB values. I have 3GB of video memory but only 3.5GB of system memory, of which only 2GB is available as user-mode address space.

@ghost
Author

ghost commented Sep 14, 2018

I guess your application is 32bit
https://en.m.wikipedia.org/wiki/3_GB_barrier

it crashes because the 'blind' creation of the heap goes beyond the 32bit address space
the only thing you can do is reduce the heap sizes, or use a 64bit application.
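A back-of-the-envelope check, as a sketch only: the function name is hypothetical, the 1500 MB "already in use" figure in the note below is made up for illustration, and the check is optimistic because it ignores fragmentation (each heap needs one contiguous range of addresses).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Do both heaps fit into what is left of the process address space?
 * All values are in MB. */
static bool heaps_fit(uint64_t geo_mb, uint64_t cb_mb,
                      uint64_t user_space_mb, uint64_t in_use_mb)
{
    assert(in_use_mb <= user_space_mb);
    return geo_mb + cb_mb <= user_space_mb - in_use_mb;
}
```

For a 32bit process that already uses 1500 MB of a 2048 MB user-mode space, the defaults (512 + 128 = 640) do not fit into the remaining 548 MB, while GEO=256 with CB=1 does. With a 3072 MB user-mode space (the /3GB case) the defaults fit as well.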

@ghost ghost closed this as completed Sep 14, 2018
@ghost ghost reopened this Sep 14, 2018
@Svyatpro

I am using 32 bit version of Windows

@qwertychouskie

How about https://en.wikipedia.org/wiki/Physical_Address_Extension, at least for Wine? How hard would it be to implement this ability in Wine?

@ghost
Author

ghost commented Sep 14, 2018

> I am using 32 bit version of Windows

well, your 3gb video ram is wasted on that OS

> How about https://en.wikipedia.org/wiki/Physical_Address_Extension, at least for Wine? How hard would it be to implement this ability in Wine?

read
https://en.m.wikipedia.org/wiki/3_GB_barrier
section Windows version dependencies

and note that Svyatpro is using 32bit wine on a 32bit windows (undisclosed version) OS
hacking pae into wine32 would not help them since a microsoft windows 32bit (non-server) OS can not use pae

normally (without PBA) the vram gets freed once the gfx job is done, so address space isn't an issue but performance is poor.
PBA hacks it so it stays in vram (Persistent Buffer Allocator); performance is better, but 32bit clients are prone to running out of address space, which results in a crash.

If a 64bit client is available, you should use that to avoid the limited address space problem.
If you only have access to a 32bit client, then use my patches to limit the heap sizes
https://gitlab.com/Firer4t/wine-pba

some 32bit clients work fine with PBA's defaults, some don't

@Svyatpro

Svyatpro commented Sep 14, 2018

I have WS03, which "can" see up to 64GB of RAM despite being 32-bit, and PAE works without problems. But the problem is that it still provides only 2GB of user-mode memory by default. I heard about AWE (Address Windowing Extensions), but it seems no applications use it except SQL Server.

Personally, I don't care how I waste my VRAM. I just care about WineD3D quality, and I think it should be stable in all cases, like DX9 on XP with any amount of RAM or VRAM: it doesn't crash despite being 32-bit and having little user-mode memory.

@ghost
Author

ghost commented Sep 15, 2018

personally I think you should provide details initially instead of dripping in details bit by bit.

I'm not wasting any more time on this.

good luck.

@IngeniousDox

Just for the record: PBA, in some form, is being scheduled for inclusion into Wine. Wine PBA in its current form is a good proof of concept, but as we all know it has its quirks. But it works. And it has been noticed.

Now I don't know the details, but they will either adapt the current implementation or write it from the ground up with the same concepts, but more "robust". So for now use what there is, and just have patience till it lands in actual Wine in the coming months.

Source: Discord discussion with people part of Proton development.

@ghost
Author

ghost commented Sep 15, 2018

that does seem promising Doc Dox
I think it would be best to start from scratch
I may seek out that discord, I've not looked at proton yet.

@IngeniousDox

Actually, it was discussed on the DXVK Discord. Don't even know if there is a Proton server out there. ;)
