TON Static Analysis based on Symbolic Execution #489

korifey · 2024-03-16T20:18:59Z

Summary

Static analysis tool for TON offers following advantages:

Interprocedural and context-sensitive
Path-sensitive
Bytecode level
Bit-precise
Minimal false-positives
Extensible with new user rules
Can be extended to automated test generation and property based testing tool

Context

TON requires advanced tools for testing smart contracts so we'd like to implement the best possible bunch of tools on the same symbolic execution core. We want to start with Static Analysis but a single core will enormously reduce development costs for other tools like Property-based testing and Automated test generation in future.

From a business perspective, we believe we can create a unique selling point for TON. Our tool will not only find bugs and vulnerabilities but will also significantly automate smart contracts audit and reduce costs for users.

Scenarios

Static analyzer usually takes source/bytecode of a smart contract as input and provides analysis results as output verifying some rules. Contemporary static analysis tool provides output in SARIF format supported by many tools.
There are several forms how user can run the tool:

CLI (Command Line Interface) - the simplest form that we'll support first.
IDE Plugin (e.g. vscode) - can run static analyzer in background while modifying code, so developer gets results immediately.
CI Plugin (e.g. github) - static analysis is one of the quality gates for code.

Typically static analysis can have predefined and custom rules. Some predefined examples:

Integer overflows/underflows
Division by zero
Dead code
Cells overflow
Cells underflow
Sending of invalid messages
Non-deterministic storage growth

It's extremely important to provide flexible API to define user-defined checks for certain classes of analysis such as:

Taint analysis
Typestate analysis
Temporal properties checks (like sum_backers <= balance)
Gas usage warnings

Technical advantages of the proposed approach

The core of analyzer is built upon a cutting-edge symbolic execution engine for TVM bytecode. It uses symbolic execution with SMT solving and taint analysis to find bugs, vulnerabilities, and automatically generate unit tests with mathematical precision.

Interprocedural and context-sensitive. The analyzer considers all possible behaviors of a contract taking into account complex data flow with function calls and recursion.
Path-sensitive. The analyzer takes into account all branching conditions in a contract. The majority of existing static analyzers do not support this, leading to multiple false alarms. Our engine checks the feasibility of various execution paths of a contract.
Bytecode level. Compilers have bugs. If an analyzer hasn't found an error in source code, strictly speaking, this does not mean the contract which is actually executed doesn't have it. Our engine works on TVM bytecode level and misses nothing. Additionally, libraries are written on bytecode. As we noticed developers love to inject bytecode right into their smart contracts. So, source level analysis is not enough.
Bit-precise: the engine will take into account all potential integer overflows, bit-level hacks, etc. This is especially important, taking into account the TVM memory model.
Minimal false-positives. The semantics of Ton Virtual Machine is modeled with mathematical precision. This means the analyzer understands what the contract does and how much gas it costs without over-approximation of language features. Thus, the only source of false alarms is the interaction of a contract with the environment, but user assistance might fix that.
Extensible with new user rules. Users can give the analyzer more information on what the environment does: which invariants are true for the data in the message etc. Also, the engine supports customizable taint analysis, and users can configure it with new rules to check, for instance, that potentially malicious data does not get into sensitive APIs.

Team

We are a team of professionals with a rich industry and academic background: 10+ years in developer tools, 7+ years in code analysis and symbolic execution.
We’ve been delivering precise and scalable symbolic execution engines for C/C++, Java and .NET Core into industrial settings. We gained awards in acknowledged academic competitions:

kleef, our overhaul of the well-known klee engine, was chosen the best symbolic execution engine in TestComp 2024.
UTBot Java, was chosen the best symbolic execution engine in SBFT 2023.
UTBot Python, was chosen the best symbolic execution engine for Python in SBFT 2024.

Plan for milestone 1

Since we've already spent 1 month on R&D How to implement symbolic execution for TVM based on USVM, we need additional 6 weeks to deliver a demo of our static analysis capabilities.

Sprint 0 (already done)

Symbolic interpreter supporting bytecode instructions:

Jumps
Conditions
Cells

Sprint 1 (2 weeks)

Support bytecode instructions to verify some exiting real smart contracts:

Dictionaries
Loops

Sprint 2 (2 weeks)

Try to interpret some real contracts.
Support base checks for demo:

Integer overflow/underflow (TVM exit code 4) in all instructions that can throw it (such as ADD, SUB, etc, except divisions).
Division by zero (also TVM exit code 4). Checking all instructions DIV, MOD, DIVMOD, etc.
Cell underflow (TVM exit code 9) in all cell parsing instructions (such as LDU, LDREF, etc)
Cell overflow (TVM exit code 8) in all builder writing instructions (such as STU, STREF, etc)

Sprint 3 (2 weeks)

Polish checks on real contracts, maybe support additional instructions.
Provide CLI that takes a contract and returns check results in SARIF format.
Provide documentation and report how it works.

References

Similar engines have been successfully applied for the analysis of Ethereum smart contracts on EVM bytecode level. Among them are mythril, manticore, and oyente.

Some of mentioned links above:

Estimate suggested reward

💰 9000 USD
🕑 6 weeks (3 sprints)

anton-trunov · 2024-03-19T08:30:47Z

This is an absolutely fantastic proposal based on state-of-the-art research! Compared to the one in #436 it works at the level of TVM which makes it suitable for any language that compiler down to TVM.

@korifey I'm wondering in case some bugs are found at the level of TVM instructions, how do you report the error to the programmer so they would understand where to look for it in the source code which is FunC or Tact at the moment?

howardpen9 · 2024-03-19T10:32:27Z

happy to see more dev tool guys! (also support in Tact plz)

korifey · 2024-03-19T16:00:13Z

This is an absolutely fantastic proposal based on state-of-the-art research! Compared to the one in #436 it works at the level of TVM which makes it suitable for any language that compiler down to TVM.

@korifey I'm wondering in case some bugs are found at the level of TVM instructions, how do you report the error to the programmer so they would understand where to look for it in the source code which is FunC or Tact at the moment?

Thanks for you good words.

How to properly render in source code errors found by bytecode analyzer?
Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code. For example in Java:
public equals(Ljava/lang/Object;)Z L0 LINENUMBER 149 L0 ALOAD 0 ALOAD 1 IF_ACMPNE L1 ICONST_1 GOTO L2 L1

So, if we get such kind of support from Tact/FuncC compiler, analyzer will return nice SARIF traces.

Another option is to generate test that lead to error. This is an ultimate power of symbolic execution - not only show error but create test (exploit) that demonstrate error behavior.

anton-trunov · 2024-03-19T22:16:44Z

Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code

This is a nice solution and is certainly doable.

I have one more concern, though. Symbolic execution can be quite slow, so it could be challenging to integrate analyzers based on it into some interactive dev tools like IDEs like Intellij IDEA or programmer's editors like VS Code. How do you solve this issue?

korifey · 2024-03-22T10:20:30Z

Usually high-level languages compilers remain some debug information (like line number) when compile source code into low level code

This is a nice solution and is certainly doable.

I have one more concern, though. Symbolic execution can be quite slow, so it could be challenging to integrate analyzers based on it into some interactive dev tools like IDEs like Intellij IDEA or programmer's editors like VS Code. How do you solve this issue?

In fact for traditional programming languages programs can be very large (millions lines of code) so symbolic execution alone is too slow. In this case we use primary filter of vulnerabilities by dataflow (IFDS) + point-to. Then symbolic execution comes to play, filtering out false positives of previous stage. But smart contract are relatively small programs and highly optimized summary-based symbolic execution seems like enough for small contracts.
Speaking about interactive mode (background analysis in IDE during user typing), I'd say it's not a target scenario for this bounty. Default way of running delivered analysis in IDE is "push-button" scenario when user explicitly say "analyze this" and wait until compilation (to get bytecode) and wait some time until precise analysis results.

I'm not sure compilation phase for Tact is fast enough for real interactive mode. Our combined dataflow & symbolic execution engine allows different tradeoffs between precision and speed. We could implement source code level analysis but it requires another bounty and more R&D.

korifey · 2024-03-28T09:29:16Z

Hi guys, what is your opinion about the proposal? Do you need more clarification or not?

Damtev · 2024-04-24T13:48:51Z

Hi @anton-trunov. Is there any progress on the remaining debug information (especially line numbers) during compilation?

anton-trunov · 2024-04-25T06:10:29Z

Hi @Damtev, no progress so far from our side, unfortunately. I opened an issue tact-lang/tact#296, so you can subscribe to it and track progress. We have an upcoming Tact v1.3.0 release this week or next week tops. We'll try our best to get line number information in Tact v1.4.0

korifey · 2024-05-02T16:44:34Z

Hi, we finished first milestone and prepared some video and script for you to show capabilities of proposed approach.

Here is video report: https://youtu.be/0r5TbLAXF9o
Here is open source repository with examples, README: https://github.com/explyt/ton-bounties/

P.S. We are ready to continue evolve our engine for TON static analysis, property based testing and beyond

delovoyhomie · 2024-08-09T11:48:56Z

https://tonviewer.com/transaction/5f8107b2caea9d60bab96a57acf9f0bc522f4676c1a1f2810f416c700acbba2d

korifey added the Developer Tool Related to tools or utilities used by developers label Mar 16, 2024

byakuren-hijiri mentioned this issue Mar 20, 2024

TON Static Analyzer #436

Open

delovoyhomie added the Approved This proposal is approved by the committee label Mar 30, 2024

delovoyhomie assigned korifey Mar 30, 2024

anton-trunov mentioned this issue Apr 25, 2024

Include debug information needed for third-party tools tact-lang/tact#296

Open

anton-trunov mentioned this issue Apr 26, 2024

Incremental and error-resilient parser tact-lang/tact#286

Open

delovoyhomie mentioned this issue Jun 14, 2024

CONTINUOUS AUDIT OF TON SMART CONTRACTS #603

Closed

delovoyhomie closed this as completed Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TON Static Analysis based on Symbolic Execution #489

TON Static Analysis based on Symbolic Execution #489

korifey commented Mar 16, 2024 •

edited

Loading

anton-trunov commented Mar 19, 2024

howardpen9 commented Mar 19, 2024

korifey commented Mar 19, 2024 •

edited

Loading

anton-trunov commented Mar 19, 2024

korifey commented Mar 22, 2024

korifey commented Mar 28, 2024

Damtev commented Apr 24, 2024

anton-trunov commented Apr 25, 2024

korifey commented May 2, 2024 •

edited

Loading

delovoyhomie commented Aug 9, 2024

TON Static Analysis based on Symbolic Execution #489

TON Static Analysis based on Symbolic Execution #489

Comments

korifey commented Mar 16, 2024 • edited Loading

Summary

Context

Scenarios

Technical advantages of the proposed approach

Team

Plan for milestone 1

Sprint 0 (already done)

Sprint 1 (2 weeks)

Sprint 2 (2 weeks)

Sprint 3 (2 weeks)

References

Estimate suggested reward

anton-trunov commented Mar 19, 2024

howardpen9 commented Mar 19, 2024

korifey commented Mar 19, 2024 • edited Loading

anton-trunov commented Mar 19, 2024

korifey commented Mar 22, 2024

korifey commented Mar 28, 2024

Damtev commented Apr 24, 2024

anton-trunov commented Apr 25, 2024

korifey commented May 2, 2024 • edited Loading

delovoyhomie commented Aug 9, 2024

korifey commented Mar 16, 2024 •

edited

Loading

korifey commented Mar 19, 2024 •

edited

Loading

korifey commented May 2, 2024 •

edited

Loading