Support for unicode static variable names in smali bytecode regex #18

Closed
mnixry opened this issue Feb 15, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@mnixry

mnixry commented Feb 15, 2025

Thank you for the amazing work.

I'm currently working with some modules that have been obfuscated by LSParanoid, which appears to use Unicode static variable names for further obfuscation.

The existing regular expression approach for smali bytecode doesn't match Unicode static variable names.

I attempted to modify the regex this way:

SGET_SPUT = re.compile(
    r"s(?:put|get)(?:-(?:wide|object|boolean|byte|char|short))?\s+([vp][0-9]+),\s+((L[a-zA-Z0-9$_\- /]+;)->([^:]+):(\[*(?:L[a-zA-Z0-9$_\- /]+;|[VZBSCIJFD])))",
)

It seems to work, but I'm unsure if this regex is safe and what the equivalent would be for invoke-static and other bytecodes that interact with static variable names.
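A quick way to sanity-check the modified pattern is to run it against a sample line. The sketch below does that with a made-up `sget-wide` instruction whose field name uses non-ASCII characters; the class name `Lcom/example/Foo;` and the field name are hypothetical, not taken from the sample app.

```python
import re

# Pattern copied from the snippet above.
SGET_SPUT = re.compile(
    r"s(?:put|get)(?:-(?:wide|object|boolean|byte|char|short))?\s+([vp][0-9]+),\s+((L[a-zA-Z0-9$_\- /]+;)->([^:]+):(\[*(?:L[a-zA-Z0-9$_\- /]+;|[VZBSCIJFD])))",
)

# Hypothetical smali line; the field name is two Arabic-block code points.
line = "sget-wide v0, Lcom/example/Foo;->\u06e5\u06e3:J"

match = SGET_SPUT.search(line)
if match:
    register, _full_ref, owner, field_name, field_type = match.groups()
    print(register, owner, repr(field_name), field_type)
```

Note that `[^:]+` accepts any run of characters other than `:`, so it is more permissive than the DEX SimpleName grammar.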

@giacomoferretti
Owner

Yes, you are right. The regex does not include all the possible SimpleName values.

The correct regex for SimpleName should be this:

[a-zA-Z0-9$_\- \u00A0-\u1FFF\u2000-\u200A\u2010-\u2027\u202F\u2030-\uD7FF\uE000-\uFFEF\U00010000-\U0010FFFF]+

Reference: https://source.android.com/docs/core/runtime/dex-format#simplename
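As a minimal sketch of how that character class could be used in Python: the `re` module understands `\uXXXX` and `\UXXXXXXXX` escapes in string patterns, so the class can be compiled as written. The names `SIMPLE_NAME_CHARS`, `SIMPLE_NAME`, and `is_simple_name` are hypothetical and are not part of the project.

```python
import re

# Character class from the comment above, split across lines for readability.
SIMPLE_NAME_CHARS = (
    r"a-zA-Z0-9$_\- "
    r"\u00A0-\u1FFF\u2000-\u200A\u2010-\u2027\u202F"
    r"\u2030-\uD7FF\uE000-\uFFEF\U00010000-\U0010FFFF"
)
SIMPLE_NAME = re.compile(rf"[{SIMPLE_NAME_CHARS}]+")


def is_simple_name(name: str) -> bool:
    """Return True if every character of `name` is in the class above."""
    return SIMPLE_NAME.fullmatch(name) is not None


print(is_simple_name("getString"))     # plain ASCII name
print(is_simple_name("\u06e5\u06e3"))  # non-ASCII name
print(is_simple_name("a:b"))           # ':' is not in the class
```

The same `SIMPLE_NAME_CHARS` string could be interpolated into larger patterns in place of `[^:]+`, at the cost of a longer regex.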

@giacomoferretti
Owner

Also, did you get it to work just by modifying that regex?

Your sample contains functions with parameters, where those parameters are used to call getString(J). This is currently not supported because the parser only performs a simple line-by-line scan.

paranoid_deobfuscator.paranoid.ParanoidSmaliParserError: Register not found
{
    "registers": {
        "v0": {
            "type": "const",
            "value": 29
        }
    },
    "register": "p0",
    "line": "invoke-static {p0, p1}, LGs;->l(J)Ljava/lang/String;"
}
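To illustrate why a line-by-line scan fails here, the sketch below tracks only registers that were assigned a constant earlier in the same method. It is a simplified illustration, not the project's actual parser; `CONST`, `INVOKE_STATIC`, and `scan` are hypothetical names, and the patterns cover only the two instruction shapes shown.

```python
import re

# Simplified patterns: numeric const instructions and two-register invoke-static.
CONST = re.compile(r"const(?:-wide)?(?:/\w+)?\s+([vp][0-9]+),\s+(-?0x[0-9a-fA-F]+)")
INVOKE_STATIC = re.compile(r"invoke-static\s+\{([vp][0-9]+),\s*[vp][0-9]+\},\s+(\S+)")


def scan(lines):
    """Yield (method, value) for each call whose first register holds a known constant."""
    registers = {}
    for raw in lines:
        line = raw.strip()
        if m := CONST.match(line):
            registers[m.group(1)] = int(m.group(2), 16)
        elif m := INVOKE_STATIC.match(line):
            register, method = m.groups()
            if register not in registers:
                # A parameter register (p0) never appears in a const instruction,
                # so a purely local scan cannot know its value.
                raise LookupError(f"Register not found: {register}")
            yield method, registers[register]
```

Resolving `p0` would require looking at the callers of the method, which a single pass over one method body cannot do.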

@mnixry
Author

mnixry commented Feb 17, 2025

> Also, did you get it to work just by modifying that regex?

No, I didn’t. I tried commenting out the section of the code that raises exceptions, and it partially works: some strings do get restored. However, I’m unable to recompile the modified Smali code back into a DEX file :(

@giacomoferretti
Owner

SGET_SPUT = re.compile(
    r"s(?:put|get)(?:-(?:wide|object|boolean|byte|char|short))?\s+([vp][0-9]+),\s+((L[a-zA-Z0-9$_\- /]+;)->([^:]+):(\[*(?:L[a-zA-Z0-9$_\- /]+;|[VZBSCIJFD])))",
)

Forgot to mention, your regex is valid. Never thought about it that way. Great job!

> No, I didn’t. I tried commenting out the section of the code that raises exceptions, and it partially works: some strings do get restored. However, I’m unable to recompile the modified Smali code back into a DEX file :(

Yep, I need to write new code to handle this case.

@giacomoferretti
Owner

> However, I’m unable to recompile the modified Smali code back into a DEX file :(

Can you try with the latest commit? 8329c5c

I successfully deobfuscated and reassembled the app, going from 8234 references to Gs.l to only 8. The remaining ones are inside methods that use the parameter; read more in #19.

@mnixry
Author

mnixry commented Feb 18, 2025

> However, I’m unable to recompile the modified Smali code back into a DEX file :(

> Can you try with the latest commit? 8329c5c

> I successfully deobfuscated and reassembled the app, going from 8234 references to Gs.l to only 8.

Your solution was exactly what I needed 👍. I managed to deobfuscate and reassemble the app, removing most of the string obfuscation. Thank you very much for your quick response and for your hard work on this project!

> The remaining ones are inside methods that use the parameter; read more in #19.

The details in #19 were especially helpful, and they made me realize how hard this problem is. I have a quick question: are the remaining references caused by optimization, or is this the intended behavior of the obfuscator? Thanks again for your help!

@giacomoferretti
Owner

giacomoferretti commented Feb 18, 2025

> are the remaining references caused by optimization, or is this the intended behavior of the obfuscator? Thanks again for your help!

Actually, I've never seen anything like this. I am almost certain that this is not the expected behavior of paranoid, but rather the result of some optimization applied afterwards.

Btw, I added a function that saves the chunks to a file and you can manually "deobfuscate" a string by giving it a long value.

Example:

$ python -m paranoid_deobfuscator helpers extract-chunks tests/samples/LuckyTool_v1.2.7.18005 luckytools-chunks.json
$ python -m paranoid_deobfuscator helpers deobfuscate-string -- luckytools-chunks.json -650834787994841
[-24fee48588cd9]:<this>

In this case, the string is "<this>".

NOTE: You have to include -- in the command before the arguments; otherwise click tries to parse the negative long as an option and reports an error.

Let me know if you have any problems. Commit: 5ec909f


EDIT:

In your case, you have 8 methods that use Gs.l (a.k.a. getString). You need to manually search for calls to those methods.

For example: Bz.p has 23 references.

Image

Now copy each of those values into the command above to get the deobfuscated string.
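If the values are copied from smali, they will usually be hex literals with a trailing `L` (for example `-0x24fee48588cd9L`), whereas the example command passes a decimal long. A minimal conversion sketch follows. It assumes the command expects decimal input, which this thread does not confirm, and `smali_long_to_int` is a hypothetical helper name.

```python
def smali_long_to_int(literal: str) -> int:
    """Convert a smali long literal such as '-0x24fee48588cd9L' to a Python int."""
    text = literal.strip().rstrip("Ll")  # drop the long suffix if present
    return int(text, 0)                  # base 0 honours the sign and the 0x prefix


value = smali_long_to_int("-0x24fee48588cd9L")
print(value)               # decimal form to pass after `--`
print(format(value, "x"))  # hex form, as shown in the tool's output
```

The hex form produced by `format(value, "x")` corresponds to the `[-24fee48588cd9]` prefix in the example output.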

The other option is for you to write a Frida script.
