Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instances of operations on uninitialized values #753

Closed
elichai opened this issue May 19, 2020 · 10 comments · Fixed by #754
Closed

Instances of operations on uninitialized values #753

elichai opened this issue May 19, 2020 · 10 comments · Fixed by #754

Comments

@elichai
Copy link
Contributor

elichai commented May 19, 2020

Hi,
Currently we have 2 instances of operations on uninitialized values, in the cmov implementation.
all the _cmov implementations read from the resulting variable(ie r->n[0] & mask0) so if r points to uninitialized values this is undefined behavior [1] [2]
Unlike the memcpy debate, actual user operations and conditions on uninit values are assumed to not happen in the compiler and are used for optimizations (especially in clang)

Instance 1:
In ecdsa_sign, if noncefp returns 0 then we get to:

secp256k1/src/secp256k1.c

Lines 517 to 518 in 41fc785

secp256k1_scalar_cmov(&r, &secp256k1_scalar_zero, !ret);
secp256k1_scalar_cmov(&s, &secp256k1_scalar_zero, !ret);

with both r and s being unitialized.
Instance 2:
In ecdh, we pass &tmpa here:
ECMULT_CONST_TABLE_GET_GE(&tmpa, pre_a, i, WINDOW_A);
to the ECMULT_CONST_TABLE_GET_GE in an uninit state, which become the r value in the macro.
Then in:
secp256k1_fe_cmov(&(r)->x, &(pre)[m].x, m == idx_n); \
secp256k1_fe_cmov(&(r)->y, &(pre)[m].y, m == idx_n); \

we cmov into &(r)->x which will perform operations on uninit values.
notice that under VERIFY we actually do clear those before:
VERIFY_SETUP(secp256k1_fe_clear(&(r)->x)); \
VERIFY_SETUP(secp256k1_fe_clear(&(r)->y)); \

Possible solutions:
The only solution I can currently think of is default initializing those arguments.

Mitigations:
We could add VALGRIND_CHECK_MEM_IS_DEFINED(r, sizeof(*r)); to all the cmovs but we only enable valgrind(-DVALGRIND) under tests, which use the VERIFY macro too, so that will not catch the second case.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_451.htm
[2] https://markshroyer.com/2012/06/c-both-true-and-false/

@real-or-random
Copy link
Contributor

That's an interesting find. I'm curious how you found this.

Currently we have 2 instances of operations on uninitialized values, in the cmov implementation.
all the _cmov implementations read from the resulting variable(ie r->n[0] & mask0) so if r points to uninitialized values this is undefined behavior [1] [2]
Unlike the memcpy debate, actual user operations and conditions on uninit values are assumed to not happen in the compiler and are used for optimizations (especially in clang)

To be fair, all these values have its address taken and from types guaranteed no to have trap representations, so (ignoring differences between C versions) these are unspecified values, and not immediate UB. But yes, if the result of the cmov is unspecified, that's not much better. This means that the result of the sign function is unspecified.

Currently compilers don't exploit this, so this is not a vulnerability, but it should be fixed.

Possible solutions:
The only solution I can currently think of is default initializing those arguments.

This seems sensible to me. It's certainly better than exposing unspecified values to the caller or foregoing constant-time code.

If we want to make sure that valgrind will complain about uninitialized values in 1), we could alternatively initialize to zero but then call VALGRIND_MAKE_MEM_UNDEFINED, and keep the cmov.
I think the same applies to 2.

Mitigations:
We could add VALGRIND_CHECK_MEM_IS_DEFINED(r, sizeof(*r)); to all the cmovs but we only enable valgrind(-DVALGRIND) under tests, which use the VERIFY macro too, so that will not catch the second case.

Well, why not. Depending on your abstraction, it's unexpected that cmov functions read the target. I guess this was the reason why this was not caught by code review. We should also add a note to the docs of the cmov functions.

@practicalswift
Copy link
Contributor

Wow, great find @elichai!

Did you find these by fuzzing under MSan?

Very exciting to follow your great work!

@peterdettman
Copy link
Contributor

secp256k1_fe_cmov(&(r)->x, &(pre)[m].x, m == idx_n); \
secp256k1_fe_cmov(&(r)->y, &(pre)[m].y, m == idx_n); \

Note that the logic for ECMULT_CONST_TABLE_GET_GE would work fine still if the first loop iteration was an unconditional copy. The m == idx_n can be ignored because either the first value is the correct one, or else it will be replaced by the correct one in a later iteration.

I suppose it should be verified that the table size is > 0, and that the index is definitely in bounds.

@elichai
Copy link
Contributor Author

elichai commented May 20, 2020

That's an interesting find. I'm curious how you found this.

Wow, great find @elichai!

Did you find these by fuzzing under MSan?

Very exciting to follow your great work!

Thanks.
Actually no, I couldn't get MSan or valgrind to trip here (without adding an explicit call to VALGRIND_CHECK_MEM_IS_DEFINED)
I think I saw some weird gcc-10 warning on the scalar_cmov impl in scalar_low that made me investigate these functions, but for some reason I can't recreate that, so I'm honestly not sure I got there

@elichai
Copy link
Contributor Author

elichai commented May 20, 2020

Well, why not. Depending on your abstraction, it's unexpected that cmov functions read the target. I guess this was the reason why this was not caught by code review. We should also add a note to the docs of the cmov functions.

Mostly because it has overhead https://godbolt.org/z/MSTRKM. (scroll down to main)
at least this overhead doesn't add any conditions that allow sidechannel attacks :)

But also this interacts badly with valgrind_ctime_test, because both the input and output are definitely tainted, so this makes valgrind trip :/

@real-or-random
Copy link
Contributor

I think I saw some weird gcc-10 warning on the scalar_cmov impl in scalar_low that made me investigate these functions, but for some reason I can't recreate that, so I'm honestly not sure I got there

Ah I see. It's on my list to try to new sanitizer in gcc 10. It has landed in Arch Linux now, so I could give it a try.

Adding MEM_DEFINED to the cmov: We can leave this aside then. (I'm not sure how arbitrary this is, we do this nowhere else.) But I don't really understand the issues. Efficiency should not be a concern if do this in VERIFY only. And how exactly does it affect the ct time test?

@elichai
Copy link
Contributor Author

elichai commented May 20, 2020

Efficiency should not be a concern if do this in VERIFY only.

Oh, I thought about adding it under VALGRIND, but yeah under VERIFY probably makes more sense.

And how exactly does it affect the ct time test?

It trips valgrind in the cmov, although putting it under VERIFY solves this.

I'll open a PR with those in separate commits so I can drop that if there's any resistance

@jonasnick
Copy link
Contributor

Any ideas why this wasn't caught by the existing tests run under valgrind, such as

CHECK(is_empty_signature(&sig));
?

@sipa
Copy link
Contributor

sipa commented May 22, 2020

@jonasnick I think the answer is just that Valgrind can only detect bugs that are present in the compiled code. x86 machine code does not have a requirement of initializing a register before you can use it, which a compiler can exploit.

Specifically, I think Valgrind keeps track of definedness per bit of every register and memory byte, but will treat e.g. (x & 0) as 0, even when (bits of) x are undefined. That makes sense because at the machine code level, whatever value x has, the output will always be a well-defined 0.

In other words: the fact that the Valgrind test does not detect this is evidence that (on the platform the Valgrind test is compiled for), this code does not result in non-constant time effects (but doesn't guarantee it's correct, of course).

@real-or-random
Copy link
Contributor

See https://valgrind.org/docs/manual/mc-manual.html#mc-manual.uninitvals

"A complaint is issued only when your program attempts to make use of uninitialised data in a way that might affect your program's externally-visible behaviour." And as @sipa points out correctly, "uninit & 0" is always "0" for the compiled code that runs your real machine, but not the abstract C "machine" that we have in mind when talking about the semantics of C.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants
@sipa @real-or-random @elichai @jonasnick @peterdettman @practicalswift and others