Following up on the SHA-3 buffer overflow in the “official” SHA-3 implementation, many people have been asking me how the vulnerability was found…

As I explain in an upcoming paper at ACISP 2023, the vulnerability was found using the KLEE symbolic execution engine. A preprint is now available.

So, people have been asking me how to use KLEE… In this blog post, I’ll give a brief intro to three indispensable tools: KLEE Web, CodeQL for GitHub, and deps.dev. No downloads nor installs required!

KLEE

In an older version of BoringSSL’s HKDF implementation, we have the following code. (This pattern was found by Jaroslav Lobačevski of GitHub Security Lab. A similar pattern appeared in an older version of OpenSSL’s HKDF implementation.)

if (done + todo > out_len) {
  todo = out_len - done;
}

If we move the done variable to the other side of the inequality, it’s easy to see that we’re computing the minimum of todo and outlen - done:

if (todo > out_len - done) {
  todo = out_len - done;
}

In C/C++, however, these two code snippets are not the same if “done + todo” overflows. The variables involved are all unsigned, so in C/C++ the integer overflow is defined to wrap around. This if-statement is the root cause of the SHA-3 buffer overflow. (The code pattern was actually found after the SHA-3 buffer overflow was discovered. The paper explains how KLEE was used to find the SHA-3 buffer overflow.)

Can “done + todo” overflow? If so, for which input value(s)? I’ll explain how to use KLEE to answer this question in less than a minute…

First, we remove the lines of code that are unrelated to the variables that we are interested in. Then, the lines in black are added to make out_len a symbolic variable (meaning it can take any value), and we add a klee_assert() to give us the value of out_len that leads to the integer overflow.

The code below can be copy-pasted into the middle column of KLEE Web. Then, click “RUN KLEE” on the top right. (Clicking more than once may be necessary until “Executing KLEE” shows up.)

#include <stddef.h>
#include <stdint.h>
#include <assert.h>
#include <klee/klee.h>

void HKDF_expand(size_t out_len) {
  const size_t digest_len = 32;
  size_t n, done = 0;
  unsigned i;

  n = (out_len + digest_len - 1) / digest_len;
  if (out_len + digest_len < out_len || n > 255) {
    return;
  }

  for (i = 0; i < n; i++) {
    uint8_t ctr = i + 1;
    size_t todo = digest_len;
    klee_assert(done + todo >= done);
    if (done + todo > out_len) {
      todo = out_len - done;
    }
    done += todo;
  }
}

int main() {
  size_t out_len;
  klee_make_symbolic(&out_len, sizeof(out_len), "out_len");
  HKDF_expand(out_len);
  return 0;
}

KLEE terminates in less than a minute without any error, meaning that the addition can never overflow. (Otherwise, we would see “Assertion fail” in red text as KLEE finds a value of out_len that fails the assertion.)

I can’t think of a quick and easy way to solve this problem without KLEE… If there is, thanks for letting me know!

CodeQL

Now that we know about a potentially vulnerable code pattern, how can we easily detect this pattern in existing code? This is where CodeQL comes in…

For public GitHub repositories, CodeQL scanning is free and can be enabled with just a few clicks: navigate to the main page of the repository, click “Settings” > “Code security and analysis.” Look for “CodeQL analysis,” then click “Set Up” > “Advanced.” This creates a new codeql.yml file. Look for the line:

# queries: security-extended,security-and-quality

and replace it by:

queries: +security-extended,security-and-quality,github/codeql/cpp/ql/src/experimental/Security/CWE/CWE-190/IfStatementAdditionOverflow.ql@main

(As one line without the # at the beginning.) This adds my proposed CodeQL query to detect the code pattern that caused the SHA-3 buffer overflow.

Then just click “Commit changes” twice and you’re all done! GitHub will compile the code. (And give a warning if the code doesn’t compile!) Then, scanning results will automatically appear in the “Security” tab of the repository.

Do you know of other vulnerabilities that can be eradicated by detecting a code pattern? Check out the CodeQL Bug Bounty program and submit your own CodeQL query!

Deps.dev

The SHA-3 buffer overflow also impacts projects that rely on a vulnerable package. For example, Python projects may rely (directly or indirectly) on the abandoned pysha3 package.

It’s unfortunately quite common that Python packages (and packages for other programming languages) contain unpatched security vulnerabilities. So, we need a way to find out if a project directly or indirectly depends on a vulnerable package.

This is where deps.dev comes in. Check out the pysha3 page for a list of projects that depend on this package. Or check if your own project(s) have vulnerable dependencies!

The deps.dev website also lists the OpenSSF Scorecard, which shows (among other things) whether code scanning tools such as CodeQL are turned on.

In the meanwhile, the vulnerable code pattern found by CodeQL has been eliminated in both BoringSSL and OpenSSL. KLEE confirmed that the integer addition can’t overflow, but I hope you agree that it’s always a good idea to detect and eliminate potentially vulnerable patterns!

SHA-3 Buffer Overflow (Part 2)