Side-channel hacks

One of the more technically challenging parts of the RFD77 design is its layers of defences against side-channel attacks. This is particularly relevant to the soft-token component, which expects to perform cryptographic operations on the same physical hosts that run untrusted customer workloads, using key material that those customer workloads must never be able to extract. A browse of the literature on timing, cache and DRAM side-channels, and on other kinds of information leakage between processes on the same CPU, should quickly convince anybody that this is far from an easy task.

The first layer of defence, of course, is to make algorithm and code choices that prevent or reduce leaks at the source. To that end, wherever possible the soft-token uses Ed25519, ChaCha20 and other algorithms designed expressly with many common side-channel attacks in mind. We also made the call to ban the use of ECDSA on the host as part of the soft-token infrastructure, given that the track record of ECDSA implementations at large with respect to side-channel leakage has been vastly worse than that of other public-key algorithm families. In the places where we still use RSA today, we have a plan for phasing it out (in favour of Ed25519), and we are careful to use state-of-the-art implementations that are kept up to date, to attempt to stay ahead of the problems.
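To make "leaks at the source" a little more concrete, here is a minimal sketch of the kind of code choice involved. It is not taken from the soft-token: `ct_equal` is a hypothetical helper, written only to illustrate comparing secrets (say, two MAC tags) without an early exit, so that the running time depends on the length and not on where the first mismatch is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only (not soft-token code).
 * Compares two buffers of the same length. Every byte is always
 * examined, so the time taken does not reveal the position of the
 * first differing byte the way an early-exit memcmp() loop can.
 * Returns 1 if the buffers are equal and 0 otherwise.
 */
int
ct_equal(const void *a, const void *b, size_t len)
{
    const volatile uint8_t *pa = a;
    const volatile uint8_t *pb = b;
    uint8_t diff = 0;
    size_t i;

    assert(a != NULL && b != NULL);

    /* Accumulate the XOR of every byte pair; no data-dependent branch. */
    for (i = 0; i < len; i++)
        diff |= (uint8_t)(pa[i] ^ pb[i]);

    /*
     * Map 0 to 1 and any non-zero value to 0 without branching:
     * (0 - 1) wraps to 0xFFFFFFFF, whose bit 8 is set, while any
     * value in 1..255 minus one stays below 256.
     */
    return ((int)((((uint32_t)diff - 1) >> 8) & 1));
}
```

Real constant-time code also has to worry about what the compiler and the CPU do with it, which is a large part of why we prefer well-reviewed implementations of algorithms designed for this over writing our own.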

As a second line of defence, aimed at the well-known Flush+Reload, Flush+Flush and other cache side-channel techniques that rely on (or at least are vastly easier with) shared memory pages, we eliminate the possibility of sharing physical page frames with the soft-token code. The illumos operating system has no support for Kernel Samepage Merging (KSM) or other features like it, so the sharing we have to worry about comes from file-backed mappings of our code. To deal with that, we use a combination of two things: static linking of cryptographic code into the soft-token binary, and a technique to change our code mappings to private duplicated pages.

The plan is to add this functionality to the illumos linker before shipping RFD77, but for now we use a hack:

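#include <sys/types.h>
#include <sys/mman.h>
#include <sys/debug.h>
#include <dlfcn.h>
#include <link.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>

static void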
unshare_code(void)
{
    Dl_mapinfo_t mi;
    uint cnt;
    volatile char *ptr, *base, *limit;
    size_t sz;
    char tmp;
    intptr_t pgsz = sysconf(_SC_PAGE_SIZE);
    intptr_t pgmask = ~(pgsz - 1);

    bzero(&mi, sizeof (mi));

    /* Retrieve the list of mappings for our own object */
    VERIFY0(dlinfo(RTLD_SELF, RTLD_DI_MMAPCNT, &mi.dlm_acnt));
    mi.dlm_maps = calloc(mi.dlm_acnt, sizeof (mmapobj_result_t));
    VERIFY(mi.dlm_maps != NULL);
    VERIFY0(dlinfo(RTLD_SELF, RTLD_DI_MMAPS, &mi));

    for (cnt = 0; cnt < mi.dlm_rcnt; ++cnt) {

        /* We only care about executable mappings for now. */
        if ((mi.dlm_maps[cnt].mr_prot & PROT_EXEC) == PROT_EXEC) {
            ptr = mi.dlm_maps[cnt].mr_addr;
            sz = mi.dlm_maps[cnt].mr_msize;
            limit = ptr + sz;
            base = (volatile char *)((intptr_t)ptr & pgmask);
            /* Measure from the page-aligned base so the last page is covered. */
            sz = (size_t)(limit - base);

            VERIFY0(mprotect((caddr_t)base, sz,
                PROT_READ | PROT_WRITE | PROT_EXEC));

            /*
             * Touch every page in the mapping to make it private
             * to this process.
             */
            for (ptr = base; ptr < limit; ptr += pgsz) {
                tmp = *ptr;
                *ptr = tmp;
            }

            VERIFY0(mprotect((caddr_t)base, sz,
                mi.dlm_maps[cnt].mr_prot));
        }
    }

    free(mi.dlm_maps);
}

This code uses the illumos linker introspection facilities (in the form of dlinfo(3C)) to find the mappings for the statically linked binary. It then walks through them and, for each executable mapping, temporarily changes it to R|W|X protection (sadly we can't use R|W, because this might be the mapping containing the code we're running right now!), touches one byte in every page, and changes the mapping back to its original protection.

This results in the kernel performing copy-on-write (COW) on all of the pages containing our code and giving us our own private copy of each. Now when we execute from these pages, we don't leak as much information as easily to other processes that might map the same binary out of the filesystem cache (since they will get the pages we originally had, instead of the new ones we now have after COW). We can also run this function every time we fork() to isolate ourselves from our child processes, which is quite handy. This technique isn't perfect by any means, but it does make quite a few of these well-known attacks considerably harder.
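The copy-on-write effect is easy to see in isolation with an ordinary private file mapping. The sketch below is not soft-token code: `cow_touch_demo` and the temporary file name are made up for illustration, and it uses a data page instead of a code page. It maps one page of a file with MAP_PRIVATE, touches it the same way the loop in unshare_code() does, then changes the file underneath and checks that the touched page still holds the old contents.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demonstration, for illustration only.
 * Returns 1 if the touched page kept its private contents after the
 * backing file changed, 0 if it did not, and -1 on a setup error.
 */
int
cow_touch_demo(void)
{
    char path[] = "/tmp/cowdemoXXXXXX";   /* assumed scratch location */
    long pgsz = sysconf(_SC_PAGESIZE);
    volatile char *p;
    char tmp;
    int fd, kept = -1;

    assert(pgsz > 0);

    if ((fd = mkstemp(path)) < 0)
        return (-1);
    (void) unlink(path);

    /* One page of file, starting with the byte 'A'. */
    if (ftruncate(fd, pgsz) != 0 || pwrite(fd, "A", 1, 0) != 1)
        goto out;

    p = mmap(NULL, (size_t)pgsz, PROT_READ | PROT_WRITE,
        MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    /* Read and write back one byte: the store forces a private copy. */
    tmp = *p;
    *p = tmp;

    /* Change the file; our private copy should not follow it. */
    if (pwrite(fd, "B", 1, 0) == 1)
        kept = (*p == 'A');

    (void) munmap((void *)p, (size_t)pgsz);
out:
    (void) close(fd);
    return (kept);
}
```

The pointer is volatile for the same reason as in unshare_code(): without it, the compiler would be free to discard a store that writes back the value it just read.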

We also make certain that the binaries running as part of the soft-token are not visible in the filesystem of any non-global zone.

As a third layer of defence against CPU-cache-centric side-channel attacks, the RFD77 design proposes the use of Intel Cache Allocation Technology (CAT). This mitigation is aimed squarely at attack families such as Prime+Probe, which do not require shared pages with the target. In the CATalyst paper, Fangfei Liu et al. show that Intel CAT can be used to effectively divide the CPU cache into sub-segments and "pin" pages into one of them so that they cannot be evicted. We intend to use an implementation of their technique for this third mitigation measure. We haven't implemented this functionality yet, though we have made a careful study of the Intel documentation on the feature and looked at how it will fit into the kernel. The plan currently is to set aside some of the system's CPU cache for the exclusive use of the soft-token.

As the small number of pages containing the critical cryptographic code and data will be pinned into the dedicated CPU cache segment and cannot leave it, this method should also provide some protection against the DRAM buffer side-channel discussed by Michael Schwarz et al. in their "Malware Guard Extension" paper. This will come at a high performance cost for the soft-token code. However, based on our analysis and benchmarks so far, we still believe that this will be far more cost-effective than buying dedicated high-performance hardware cryptographic modules for every machine in a large cloud installation.

Finally, a note on something that is not a cache side-channel defence mechanism: Intel SGX. We do plan to use SGX to help make harvesting keys in bulk very difficult even for an attacker who has gained control of the hypervisor kernel, but it does not constitute any kind of serious defence against side-channel attacks in and of itself (see e.g. the discussion in Schwarz et al.'s paper). We are not planning to expose SGX to end-users on SmartOS or SDC at any point soon.

One of the very nice parts of the RFD77 design, and one that makes me a little less scared of attacks like these, is that we are not seeking to defend the soft-token entirely against a compromised kernel (we only seek to make defeating it expensive enough that going to the hardware PIV module will be just as fast). We are also in the position of being able to alter the operating system itself very widely to support our goals. Timing side-channels on a shared CPU are still vaguely terrifying when working at a company built around sharing CPUs between users, but with these plans in place I'm happy to say that I don't think they will be the weakest link in the security of this part of the system.