Dissecting Lemire’s nearly divisionless random


This blog post is very late. It’s well over a year since I posted a coding challenge on Twitter. The idea was simple: I’ve always felt that code readability is undervalued, so I figured I’d put up cold hard cash. I announced a $1,000 pot, divided into $500, $300, and $200 prizes for the most readable implementations of Daniel Lemire’s nearly divisionless algorithm for selecting a random number from an interval. I now have winners to announce and congratulate, and they’re in this blog post, but there’s more to this story.

I didn’t know it at the time but I’d end up reviewing the entries with colleagues to see if they could follow what was happening (spoiler: they couldn’t). When people got stuck, we’d whiteboard our way through so that we could better understand why and where people were getting stuck. Oh, to be able to whiteboard in person again!

I learned a lot about how people think through problems, and that process led to me writing a separate implementation of Lemire’s algorithm, with a twenty-to-one comment-to-code ratio (that’s a lot of comments), and finding a sort-of security issue with the algorithm in the process. I say “sort of” because you would really struggle to find a case where it’s going to be practical to exploit. But it still surprised me, and you probably can’t use Lemire’s algorithm in Vegas.

Why Lemire’s Algorithm?

I chose Lemire’s algorithm because it is brilliant. When I read Lemire’s code I get that brain-tingling feeling and gawk at the sheer “how on earth did someone think of this?” of it all. Lemire has a mastery of code and how code is executed, and pairs that with transcendent creativity and concision. Lemire also writes well, and the papers that accompany his code and algorithms are easily some of the most cogent and approachable you’ll find in academia. They are short and clear and avoid the jargon and obtuseness that plague the field, while containing just enough formalism to be rigorous.

Lemire’s algorithm is a solution to the problem “give me a random number between 0 and N, not including N itself”. For simulating a die, N would be 6 and you’d get back either 0, 1, 2, 3, 4, or 5 with equal probability. Computers like to start at zero; we can always add one to the result to get the familiar numbers you’d see on a real die. I’ve worked on random number generators and written quite a few, and in 20 years of doing that I’d never come across a solution as cool as Lemire’s.

Before Lemire, the best-known solutions to this problem required one or two divisions per number generated. You probably learned long division when you were quite young, and you may remember that it can get pretty tedious and cumbersome. It’s the same for computers: division is one of the slowest operations we can ask a computer to perform. Lemire avoids division by translating numbers into a much larger number “space” and using binary-friendly power-of-two operations instead. His technique almost always avoids needing a division.

The second reason I chose Lemire’s algorithm is that it is impenetrable upon first reading. There are a lucky few people who are so practiced and conversant in number manipulation that they can see inside algorithms like Neo in the Matrix, but I don’t mean them. To the average reader, myself included, it’s not clear what’s going on and why.

Lemire’s reference code is fourteen lines long, like a sonnet. Twelve of the lines are simple, and any proficient programmer could tell what they are doing. Two specific lines of the code are pretty hard to decipher. But the real difficulty is understanding why all fourteen lines are doing what they are doing, and why the result is a fair and correct algorithm for choosing a random number.

As a case in point, if you read the comments on Lemire’s blog, you’ll find several misunderstandings of the code. That’s within an audience of primed Lemire fans and critics who are familiar with the field. Lemire’s accompanying paper is great and very readable, but it still takes effort and concentration to follow everything. I work on cryptography and write cryptographic code for a living, and I’m not ashamed to tell you it took me about three readings to really get it. The algorithm is based on an interaction between two modular spaces, one sized 2 to the power of 64 and the other sized N, and if I just totally lost you there, I beg forgiveness; if you bear with me, I promise there’s a link to my detailed dissection of how and why everything works.

All of this makes Lemire’s algorithm a really good challenge for creating a more readable version. Ideally something that an average coder can read in one pass, understand, and agree that it’s correct.

Why does readability matter?

Code readability is the most important pillar of code correctness and maintainability. When code is unreadable, it is harder to spot errors. That’s true when you’re doing a code review and it’s even more true when you’re adding more code to an existing code base. Great testing can compensate for some of the correctness challenges, but it doesn’t do as much for maintainability.

In s2n, one of our development principles is to write clear readable code with a light cognitive load. We’re serious about it, and I wasn’t going to try and use Lemire’s algorithm in s2n without a very readable implementation.

Working in software development, I’ve heard more than once the opinion that programmers and coders should be more familiar with the fundamental mathematics of logic and number theory that underpin their field, and that academic-style papers and explanations should be sufficient. I happen to work with teams made up of both expert coders and expert mathematicians, and even there I don’t think this is a good idea.

Software engineering is engineering, which means translating science into solutions that work in the real world. To wildly generalize, there are better dividends in deepening one’s understanding of the “real world” part of that, the users, the customers, the business models, usability, ergonomics, and so on, than the “science” part. Some science is needed, but there’s a shallow limit, just as a civil engineer doesn’t need too detailed an understanding of the chemistry of concrete: pH values and setting times, yes, but not quantum numbers.

Code should be self-documenting. The average programmer, including someone straight out of college, should be able to understand and work on a well put-together codebase. This greatly improves the odds of finding problems, and that’s what matters.

The contest

One of the reasons I’ve been so tardy about this blog post is that the contest didn’t go as I’d expected.

Let’s start with the great thing that happened. I got several submissions, more than enough to pick winners from. Many of these submissions are good and do improve upon Lemire’s code. The most common improvement was to use more descriptive names for the variables. Lemire uses variable names such as “s”, “t”, and “l”, which seemingly don’t correspond to much. In such a short piece of code, this actually isn’t too big an offense, and I kept these terms in my own implementation because I want to be consistent with the paper, but it’s absolutely a valid improvement. Another great improvement is to write and include tests, which is really an essential part of software development.

The first-place winner did both of these, and also cleared up some of the confusing lines of code by making an implicit truncation explicit. According to GitHub the winning entry had 5 contributors, but Twitter user komu_wairagu submitted it, so that’s who I’ll be contacting to Venmo $500.

The second-place winner comes from Nimrod Aviram, who I know professionally as the discoverer of the DROWN vulnerability and a fun person to hang out with at RealWorldCrypto. Nimrod’s implementation includes some great testing too.

The third place winner wants to stay anonymous, and cheated their way to some bonus points by writing a great implementation in LISP. How can you not love that? If they change their minds about anonymity I’ll update this post with a link. The sha256 checksum of their implementation is ae008644b2f0b10eddbbce3a0508b103b19925e4e0c9354c569ff0816acb760c.

Now on to the not-as-great, and one of the reasons I’ve taken this long to write. It doesn’t feel right to criticize any of these efforts, made in spare time as part of a fun challenge I put out there, but none of them actually explained Lemire’s algorithm or broke down how and why it worked. I asked several colleagues to look through these implementations and tell me if they understood what was going on, and the response was a uniform zero. No one could fully understand even one of the implementations, or Lemire’s original. When I asked people to also read the paper, things got a bit better, but everyone still seemed to trip on some issues. These are smart people who absolutely know what they are doing. I’m sure given more time they could fully understand everything, but that wasn’t the goal. The goal was to have understanding after one or two, or at most just a few, readings.

What did people find hard to follow?

Talking through things with people I found some common problems. Some directly in the code, and others in the paper. The first line of code that threw people was:

uint64_t l = (uint64_t) m;

This line of code performs an unusual operation called truncation. It takes a 128-bit value, m, and truncates it to its least significant 64 bits. All of this is implicit in the code, rather than explicit; reading it, you have to recall that m is 128 bits wide. Then there’s also the question of why we are doing this truncation at all. I’ll leave that for a minute.

The next problematic line of code was:

uint64_t t = -s % s;

This line of code uses an unusual combination of unary negation on an unsigned integer and the modulus operator. Nobody remembers what unary negation does to an unsigned integer (and why should they?), and it’s gymnastic math to try to do in your head. Then why is it using the modulus operator? What is going on? Worse still, Lemire’s paper references this line of code only obliquely. It’s not in the body of the paper; the pseudo-code expresses the same step differently, and you have to realize that the pseudo-code and this line have equivalent effects. If you happen to notice it, some OpenBSD code in the appendix, where this technique seems to have come from, can be used as a Rosetta Stone. This line seems to have tripped up commenters on Lemire’s blog too.

Coming back to the “why”, and to the paper itself, people found it hard to piece together. One concrete challenge was that early on Lemire establishes a lemma about how a range of contiguous numbers of size N * S will always include N values where X % S is 0, N values where X % S is 1 and so on, up to S - 1. The lemma is absolutely true, and people could intuit that, but the example used in establishing the lemma is a range starting at zero. Later in the paper the same lemma is applied to a range that doesn’t start at zero. People weren’t as confident, or couldn’t as easily “see” why it had to be true in both cases and was safe to apply.

The (very) annotated edition

Taking all of this into account, I’ve worked up an annotated version of Lemire’s algorithm for inclusion in s2n. The code is virtually identical to Lemire’s, but it’s preceded by a 300-ish line comment that explains how and why the algorithm works. The explanation mostly follows the structure and lemmas of Lemire’s paper, but I’ve made some significant departures.

The first departure is that the comment includes fully worked examples of the algorithm being applied, but with fictional 3-bit and 6-bit integer types. This restricted size makes it feasible to just work through all of the values and convince yourself that everything is correct. We can really see inside the machine this way.

The second departure is that the comment uses parallel number lines to trace what happens. Since Lemire’s algorithm works by translating random values from a small number space into a much bigger one, this trick of using parallel lines really helps track what’s happening, though it did force some improvement in my ASCII art skills.

The explanation also works out examples of the confusing integer truncation, as well as a fully worked-out logic table for that unary negation business. Give it a read, and let me know if you can follow it. At the very least, the comment will explain why “random() % s” doesn’t work on its own (a common coding error).

The surprising sort-of security issue

In writing out the algorithm this way, and thoroughly insisting on understanding it, I discovered the sort-of security issue. I’m sure it’s obvious to Lemire, though it’s not mentioned in the paper, which seems a bit of a miss.

The basic problem is that Lemire’s algorithm will take different amounts of time to run. Lemire’s algorithm uses a technique called rejection sampling. It’s normal for algorithms using this technique to take a variable amount of time, but usually that doesn’t reveal anything about the result. With Lemire’s algorithm, those differences in time actually do reveal information about what the result was, and I include some examples in my version.

If the output was meant to be secret, say for an encryption key, or as the seed for a Vegas slot machine, this is bad. The difference in timing is very small, probably not even measurable, and due to the nature of random functions it’s going to be immune to statistical attempts to measure it through repetition. Also, with its 64-bit and 128-bit integers, the differences in timing are going to be very rare unless you’re using a highly unusual bound for your random number. Here’s another fun Twitter thread discussing the worst-case values. It’s not a big deal in my opinion; this side-channel is far smaller and much harder to encounter than what you can routinely find in error handling code. The function still works for what we need in s2n, where the output is public anyway, but it would probably technically disqualify the algorithm from certain applications like Vegas slot machines.

If I hadn’t worked the problem through, I wouldn’t have realized any of this, and now we get to have a careful note that makes clear we can’t use the same algorithm in a private or secret context. I hadn’t expected that going in, but to me this is what readability is all about, and it was worth the time spent rather than simply copying and pasting the code in.

The code I wrote is now going through s2n’s code review, and already that’s resulted in some further improvements. Hopefully it’ll make it into the repository in the next week or so. Even though I’m the original author of s2n, that’s out of my hands; one of our other rules is that no one can merge their own code. You can watch along on GitHub if you’re interested.
