Demo to see how compiler optimization can introduce security vulnerabilities

I took CS4239 Software Security under Dr Roland Yap at my alma mater, the NUS (National University of Singapore) School of Computing, back in 2017 Semester 1. It was offered as part of the Lifelong Education programme under SCALE (School of Continuing and Lifelong Education), which opened modules up to the general public.

One interesting lesson came from a lab assignment that showed how compiler optimization can introduce security vulnerabilities. A few years later, I brought it up during a company workshop on Ethereum, when the speaker mentioned the optimizer in the Solidity compiler.

This lesson also made me curious enough to inspect the JavaScript code transpiled from a TypeScript project at work, and I pointed out bugs introduced by the transpiler to the project leader, who was strongly advocating TypeScript over vanilla JavaScript as the holy grail (while allowing the team to turn off type checks and use the any type liberally). By the way, minifying/mangling JavaScript code can introduce security vulnerabilities too; see "backdooring your javascript using minifier bugs" for more info.

It would be hard to explain the lesson in plain English, so I have cleaned up the code and put in extensive comments. It’s been 5 years already, so I guess it should be alright to post the lab assignment code here 🙂

/**
 * NUS CS4239 Lab 10 (2017 Semester 1)
 *
 * Shows how compiler optimization can introduce security vulnerabilities.
 * On macOS, clang can be used if gcc is not available.
 *
 * Scenario 1:
 *   - Compile with no optimization: gcc -O0 -o level0.out cs4239-lab10.c
 *   - Run: ./level0.out
 *   - Sample output:
 *       &secret 0x7ffea7f70192 (address)
 *       secret = 12345 (value, ASCII codes: 49 50 51 52 53)
 *       peek address @0x7ffea7f70192: 0 0 0 0 0 (ASCII codes)
 *   - Buffer clearing code working at optimization level 0.
 *
 * Scenario 2:
 *   - Compile with optimization level 3: gcc -O3 -o level3.out cs4239-lab10.c
 *   - Run: ./level3.out
 *   - Sample output:
 *       &secret 0x7ffe8f76ebb2 (address)
 *       secret = 12345 (value, ASCII codes: 49 50 51 52 53)
 *       peek address @0x7ffe8f76ebb2: 49 50 51 52 53 (ASCII codes)
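 *   - Buffer clearing code NOT working at optimization level 3. Most likely
 *     the compiler inlines zero_buf() and then removes its stores as dead
 *     code, because secret is never read again before hide() returns.
 *   - A possible patch is to include unistd.h and add usleep(1) after the
 *     loop in zero_buf(). A likely explanation: usleep() is an external call
 *     the compiler cannot see into, and the address of secret has already
 *     been passed to printf(), so the compiler has to assume usleep() might
 *     read the buffer and must keep the stores that come before it. The
 *     system call itself is probably not what matters.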
 *
 * This security issue arises from a potential mismatch of intent between the
 * programmer and the compiler. The spirit of the C mantra is
 * "Trust the programmer". In this case, the programmer does his due diligence
 * in clearing a buffer storing sensitive information, but the compiler ends up
 * optimizing away the buffer clearing code. An attacker may then be able to
 * sift out the secret key as it is not cleared.
 */

#include <stdio.h>
#include <stddef.h>

int offset = 0;
char *stack;

/**
 * Set 1st N bytes of buffer to zero
 *
 * @link See https://stackoverflow.com/a/25653168 for difference btw char * and
 *     char[] types.
 * @param {char *} p - Pointer to string, i.e. character buffer.
 * @param {size_t} n - No. of bytes to set to zero.
 * @returns {void}
 */
void zero_buf(char *p, size_t n)
{
    size_t i; // same type as n, avoids a signed/unsigned comparison
    for (i = 0; i < n; i++) {
        p[i] = 0;
    }
}

/**
 * Print out secret and remove it from memory
 *
 * @returns {void}
 */
void hide(void)
{
    char secret[] = "12345";
    char *p;
    int a;

    printf("&secret %p (address)\n", (void *) secret);
    printf(
        "secret = %s (value, ASCII codes: %d %d %d %d %d)\n",
        secret,
        secret[0],
        secret[1],
        secret[2],
        secret[3],
        secret[4]
    );

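    // Record where secret sits relative to the local variable a, so that
    // main() can peek at this stack frame after hide() returns. Reading a
    // dead frame is undefined behaviour; it is done here only for the demo.
    p = (char *) &a;
    offset = secret - p;
    stack = p;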

    zero_buf(secret, 5); // remove secret from memory
}

/**
 * Main function
 *
 * @returns {int}
 */
int main(void)
{
    hide();

    printf(
        "peek address @%p: %d %d %d %d %d (ASCII codes)\n",
        (void *) (stack + offset),
        stack[offset],
        stack[offset + 1],
        stack[offset + 2],
        stack[offset + 3],
        stack[offset + 4]
    );

    return 0;
}
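
The usleep(1) patch above works more by accident than by design, so here is a minimal sketch of a more deliberate fix. It writes the zeros through a volatile-qualified pointer, which GCC and Clang in practice treat as stores they must not remove. The name secure_zero is my own and not part of the original lab code. Where available, explicit_bzero() (glibc and the BSDs) or memset_s() (optional C11 Annex K) are meant for the same job.

```c
#include <assert.h>
#include <stddef.h>

/*
 * secure_zero is a hypothetical helper, not part of the original lab code.
 *
 * Each store goes through a volatile-qualified pointer, so the compiler
 * treats it as an observable side effect and keeps it, even when the
 * buffer is never read again afterwards.
 */
void secure_zero(void *buf, size_t n)
{
    volatile unsigned char *p = (volatile unsigned char *) buf;
    size_t i;

    /* A NULL buffer is only acceptable when there is nothing to clear. */
    assert(buf != NULL || n == 0);

    for (i = 0; i < n; i++) {
        p[i] = 0;
    }
}
```

To try it in the demo, replace zero_buf(secret, 5) in hide() with secure_zero(secret, sizeof secret), then rebuild with -O3 and compare the peeked bytes.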