Replies: 6 comments 16 replies
-
dbgeng -> lm
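For the dynamic side, the usual dbgeng/WinDbg commands for listing modules and managing symbols look something like the following (standard WinDbg syntax; the local cache path `C:\symbols` is just a placeholder):

```text
$$ list loaded modules
lm
$$ point the symbol path at a local cache plus Microsoft's public symbol server
.sympath srv*C:\symbols*https://msdl.microsoft.com/download/symbols
$$ force-reload symbols for all modules (expensive)
.reload /f
```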
-
also, for programs you have loaded or are loading, you can automate the downloading of the PDBs.
-
OK, let me try to be a bit clearer.... There are two areas we could be talking about, and possibly also a crossover between the two: the dynamic space associated with the debugger and a running target, and the static space associated with the Ghidra project and whatever programs you have loaded.

Loading symbols into the dynamic space, typically for use in the debugger, is a function of the underlying API. This functionality may or may not be exported to the CLI. The dbgeng/dbgmodel APIs have built-in support for refreshing symbols, which is exposed in the CLI. Refreshing all the symbols is a VERY expensive task, so we do not automate that on a per-event basis, although that could be done relatively straightforwardly. We also generally try to implement functionality that has general utility and some common way of implementing it across debugger sets. For example, breakpoints are common to the Windows, Linux, and macOS debuggers and are implemented in similar ways. Downloading symbols is a little different, and you've already highlighted the issues here, e.g. using lldb on Windows is not a common use case, and supporting symbol downloads through some GUI interface would be highly idiosyncratic. x64dbg supports debugging of x32/x64 executables on Windows only, so its developers have a lot more flexibility regarding functions tailored to a single, very narrowly-defined environment. They do a great job in that environment, but their use cases differ considerably from ours.

Loading symbols into the static space is typically done at load time, although there are cases when you might want to re-provision the symbols at some point after. The reason is that the symbols directly and positively influence the analysis of the binary; applying them after the fact produces less satisfying results. This is also a VERY expensive operation, particularly if you wish the symbols to re-shape the analysis. If you decide to bulk-download hundreds of DLLs and apply the results to the existing project, you are queueing up a boatload of work, and your results will also suffer from the fact that the symbol information was not available for the original analysis.

All that said, if you think applying symbols to the static analysis after the fact from the debugger has some utility, feel free to put in a ticket. In doing so, you'll need to put together a detailed description of your use case and why you think this solution is on par with or supplants the existing solutions. We have weekly reviews to discuss the merits of every ticket, and, if we think it has merit, we will prioritize it and add it to our work queues. Alternatively, you could write a script for your own use and/or submit it for public consumption if it proves useful to you.
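On the "write a script" suggestion: Ghidra ships a headless launcher, `analyzeHeadless`, which can batch-import binaries into a project. As a sketch (not an endorsement of bulk-importing hundreds of DLLs), here is a small generator that emits one import command per module path; the project directory and name are placeholders, and the quoting is POSIX-style:

```python
# Sketch: emit Ghidra analyzeHeadless invocations to batch-import modules
# into an existing project. Project path/name are placeholders for your setup.
import shlex

def headless_import_cmds(modules,
                         project_dir="/path/to/projects",
                         project_name="MyProject",
                         headless="analyzeHeadless"):
    """Yield one analyzeHeadless command line per module path."""
    for mod in modules:
        yield " ".join([headless,
                        shlex.quote(project_dir),
                        shlex.quote(project_name),
                        "-import", shlex.quote(mod)])

for cmd in headless_import_cmds(["/tmp/example.dll"]):
    print(cmd)
```

You could feed this from a module list exported from the debugger, then run the resulting commands in a shell; each import triggers analysis (and PDB application, if configured), so expect it to be slow for large lists, as noted above.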
-
Right, the two processes are essentially distinct. The overlap occurs in the mapping from the dynamic view to the static view, i.e. symbols are not moved from the debugger to the static view or vice versa, but, if you're at an offset in the debugger and the view is tracking accurately to the static view, obviously symbols applied to the static view should inform your understanding. There are some exceptions to this in that the Dynamic View shares an interface with the Static View. For example, you can drag & drop a structure from the DataType Manager onto the Dynamic Listing, and that should work. Bear in mind, generally speaking, the two views are drawing from the same source, i.e. the symbols generated at compile time, whether archived, online, or local.

So, the question I guess I would put to you is: what exactly is your use case, i.e. what do you want to do that you cannot do with the current tool set? More specifically, is there something you're trying to do that requires loading symbols in the debugger and applying them to the project?
-
OK, getting somewhere now! I think I understand your use case and the feature you want. I'm going to try to summarize it, just to make sure we're on the same page, suggest a couple possible solutions, and perhaps argue that this may not be the best approach. The problem, if I'm understanding it correctly, is that the debugger has (a) an accurate list of loaded modules during the target's execution, (b) the full paths to those modules, and (c) a call stack with runtime addresses in those modules. You would like to load these modules into the current project to (a) aid understanding, and (b) enable search. In particular, while you could easily use the "Import File" function or cut&paste the path from the debugger module list into the Import function for one or two modules, doing that for hundreds of modules is an unpleasant prospect. Hopefully, I haven't misrepresented your description, but obviously let me know if I have. So, solutions:
This last point is super-important and bears more discussion. I can't think of any common reverse engineering scenario where you would want to import the entire list. Most of these modules are kernel modules and highly unlikely to be relevant to any analysis you're undertaking of the target. (If you're interested in understanding the kernel, that's fine, but there are much better ways to do it.) The list you posted above, for example, is entirely composed of kernel DLLs, except for ntdll.dll, which is basically the gate to the syscall mechanism. If you need to understand functions in ntdll or the system calls behind them, you're much better off googling them than disassembling the constituent code. The same could be said for the call stack, in general. It's very unlikely that the top half of the call stack will be worth exploring in the context of any current execution. I know you expressed wanting to know "what is really happening inside each (imported) function", but that's generally not a sane strategy for RE, if you'll pardon my saying so.

Two other small points. First, you mentioned having to map all the modules by hand. I'm not sure I understand what you're referring to here. The debugger matches running targets to static programs based on names and some heuristics. 90% of the time (or better) that's an accurate match. Discrepancies occur when a program was renamed after compilation or renamed on disk (or renamed on import into Ghidra). Those are the cases where "Map Module" is needed, but they shouldn't be the norm. These matches will hold even if you decide to import a module while the target is running. Second, you mentioned wanting to search using the project. There are times this makes sense, but again not the norm. Searching memory (even all of loaded memory) in the debugger is much, much more efficient than searching multiple programs in the database. (I'm actually not even sure you can do searches across the project database - am pretty sure the search feature operates per program. Will check tomorrow.) A case for using the static listing might be wanting to compose a complex search or searching for things only found in the static listing. Do you have a particular search in mind?

OK, so apologies for the long-winded rant, but I think we're moving the ball forward here. I would probably lean towards option four, extending the selection range, as this seems closest to your original request, relatively straightforward, and possibly useful in other contexts. Let me know what you think.
-
@nsadeveloper789 and I are leaning towards at least option 4. I'm not sure you'd need anything more if 4 were implemented, i.e. if multi-entry selection were enabled (and the ability to use that correctly on the back end), then you could select the entire table contents or any subset, right-click load, and all of those would be imported (with PDBs, assuming that was set up correctly in the tool options). Option 3 would be a bonus for batch processing.

Regarding PDB mismatches, it looks like windbg is confused by the fact that you have multiple copies of ntdll.dll, i.e. one in your local python path and one in system32. Haven't seen that before, so, sadly, no guesses on a fix. And I would never expect the decompilation to be perfect without some effort on the user's part, symbols or no. Worth working through the decompiler-related sections of the GhidraClass for pointers. Re M$ documentation, heard, although for straight API descriptions they're not that horrible. There are other options as well. The Windows Internals books are a good starting point for understanding the kernel.

IsDebuggerPresent - set the breakpoint. That's the obvious first-shot approach. If the program has more complicated anti-RE, try setting a hardware breakpoint. If that doesn't work, well, time to break out a copy of "Crackproof Your Software" and equivalents.

One last note, which I should have thought of earlier: are you debugging with "dbgmodel" checked in the startup dialog? If so, maybe try unselecting that - might improve the single-stepping experience.
-
x64dbg has this nice symbols table where I can right click -> download pdb for all modules. It also automatically shows full paths to them. I wonder what would be the correct way to mimic this in ghidra at the moment? This seems really effortless!