VBA's Pseudo Random Number Generator

(Keywords: VB, VBA, PRNG, LCPRNG, rand, rnd, randomize, randomise, invert, modulo, reverse engineer)

There it is - hexadecimal code

[NOTE: This documentation refers to prior versions of Visual Basic. However, the concepts are useful for PRNGS in general.]

The pseudo-random number generator discussed is the one provided by Microsoft in its Visual Basic library, used by Visual Basic itself and its variant, VBA. Most generators that are supplied with 3GL and 4GL languages are very standard and are variations of the Linear Congruential Pseudo-Random Number Generators (LCPRNGs). These operate in the following way:

Each new number is the product of the previous number times a constant (a) plus another constant (c). The result is kept modulo a third number (m), which is invariably the width of its ‘long’ integer size (or fraction thereof). Standard examples are m=2^64, 2^48 or 2^32. Mathematically, this is represented as: r(i+1)=r(i)*a+c (mod m), where r(0) is the ‘seed’ value.

Although the algorithm in use can be obtained from analysing a string of outputs from the generator and deducing the values of a, c and M, Microsoft has published theirs on their web site (
Knowledge article) and it turns out that they use an LCPRNG with:

a = 16598013, c = 2820163 and m=2^24 = 16777216


These PRNGs operate on integer spaces, VB’s returning an integer number from 0 to 16777215 (M-1). VB provides one more step in this, converting the resulting number to a floating point representation. It does this by simply dividing the resulting integer by 2^24 and thus returning a floating point number in the range 0-1.

VB’s random number generator uses a standard LCPRNG model, with a,c & m as described above. This gives it a period of about 17 million before it cycles. (Most of the sensible PRNGs will use a modulus of, say, 2^64 and return the top 32 bits only, which makes it more random and harder (though still possible) to invert due to the much larger period. Certainly, more bits of output are required in order to crack these.)

One extra function that VB provides is the ‘randomize()’ function. This function takes a floating point number as input and reseeds the PRNG, so that r(0) is some function of this input. It is easy to determine that this function is just an exclusive or of the first and third bytes of the input with the second and fourth. This is a 'lossy' mapping, which should theoretically make it impractical to determine what possible values of f were used in the call from just the output alone. If a good hash function were used for randomize(), there would be around 2^42 alternative f's that would create the same R(0). As we will see, VB's randomize() function is not very one-way, using XORs on two input bytes to create two of the output bytes.

Now suppose we are given a few floating point numbers that have been produced by VB’s rnd() function. Firstly, we can multiply them by 2^24 to get the original in integer form. So we have a few integers in our hands that we know have been generated from the rnd() function. How easy is it for us to determine what the seed value was that started this sequence? Is it a difficult problem? Can we step back through the generator and determine all of the previous random numbers prior to those we had at hand? Can we determine what was used by the randomize() function to create the seed?

Reversing the Algorithm

One less well-known feature of LCPRNGS is the fact that they are invertible. That is, given a particular number r, its predecessor, r(-1) can be found and the sequence essentially run backwards. This is done quite simply. Firstly, the multiplicative inverse of a mod m is found. This is the number which, when multiplied by a modulo m equals 1.

Under modulo arithmetic, the inverse of a, ainv, acts like rational division in normal arithmetic. Thus a *ainv = a/a = 1. Given this, we can invert the LCPRNG equation, arriving at r(i)=ainv(r(i+1)-c) (mod m) , where it is now only required to determine ainv from a. Although this can be determined readily by
trial and error (because VB's rnd() space is so small), a simple algorithm for computing multiplicative inverses is also available and called the 'Extended Euclidean Algorithm'.

In our case, a=16598013 and ainv is found to be
602453. This can be checked by 16598013*602453 = 9999522725889 = 1 mod 2^24. So, given any number output from the VB random number generator, we can calculate the previous number in the sequence by applying the general rule r(i)=602453(r(i+1)-2820163) (mod 16777216)
It is quite common to want to find out what generator and seed values were used to produce a certain sequence of output characters, which are invariably number in a smaller range, e.g. 0-9 or A-Z (ascii). It is not difficult, given only a handful of these in many cases, to determine the seed.

Summary

In summary, we have shown that VB's pseudo random number generator, rnd(), is quite weak for anything but very simple needs:

  • It has too small a period, only 2^24 = 16 million values before it cycles.
  • The output stream is too predictable. For example, all the output numbers alternate between even and odd!
  • The randomize() function is not a strong hash function and enables much information to be determined about the initialiser value from the output stream.

So be warned if you were thinking of using VB's PRNG for any cryptographic purposes!

SW July, 2001

 

Excel's PRNG

(Keywords: Excel, PRNG, FPRN, rand, rnd, randomize, randomise, reverse engineer, seed, rho, period)

Binary tunnel

This article concentrates on the earlier versions of Excel where the PRNG was rather problematic. See another article for more information on the generators in different versions.

"In Excel, FPRNs (Floating Point Random Numbers) are returned by the RAND function.However the algorithms are different. In Excel 2000, RAND is an FPRN. In Excel 2003,RAND is generated from three separate IRN(16)s, converted to FP values, and rescaledto the zero to one interval, giving a single FPRN output. The seed in this case is the threeseparate IRN(16)s which change on every RAND call. These three are hidden fromaccess and can’t be modified. This means that a given sequence can never be repeated."



In Excel 2003, however, Microsoft apparently tried to fix the weaknesses in the generator - but got it quite wrong. See
this article.

"If you have a spreadsheet that uses the RAND() function, and you’re using Excel 2003, you may be getting bad results – and you probably don’t even know it. Excel 2003’s RAND() has a big, big bug in it.In short, RAND() should produce a pseudo-random number between 0 and 1.  Microsoft reworked the function for Excel 2003 so that it would produce a better quality of randomness, especially when you ask for a large number of random numbers.Guess what?  If you do try to get a lot of random number, the function gets buggy, instead of numbers from 0 to 1 you start getting negative numbers.  If that weren't enough, the negative numbers aren't sufficiently distributed to be considered 'random'.And to top it all off, Microsoft has been told about the problem and has done nothing.  Like serious Excel problems in the past this new bug has been ignored."




Excel (at least pre 2003) has a rather interesting PRNG that it inherited from the earliest versions. Interesting because it is 'almost' linear - but not quite. As Excel deals with floating point numbers as a rule, it seems that the algorithm chosen tries to generate numbers using a floating point algorithm rather than an integer one. (see
Microsoft Knowledge Base article.)

First random number:
  random_number=fractional part of (9821 * r + 0.211327),    where r = .5

Successive random numbers:
   random_number=fractional part of (9821 * r + 0.211327),    where r = the previous random number

In fact, it almost appears as if the developer copied a Linear Congruential PRNG algorithm into floating point space and tried to make it fit. Even if this isn't the case, the fit isn't so special. It turns out that this quasi-linear implementation makes the algorithm much worse by introducing large numbers of loops within the stream.

These loops are called rho loops in the mathematical literature on account of the shape of the Greek letter rho. Basically, you start at a certain seeded number, iterate for some amount of time and then suddenly a number is hit that appeared before. When this happens the series enters a loop, never to exit. If the loop is small, then the numbers produced are extremely non random.

The algorithm with the default starting seed generates about a million unique numbers before repeating even though the floating point numbers have vastly more bit size. VBA's algorithm - weak by comparison to most PRNGs in use - generates about 16 million. A 32 bit generator will do about 2 billion.

Simple code to find the inverse of a number modulo 2^24


Appendices

'C' trial and error test program for finding the inverse of a in the VBA PRNG.
(This is quite an inefficient way to calculate the multiplicative inverse of a number. However, it will work for small spaces like 2^24 and it is left as an exercise for the reader to create an algorithm based on the extended Euclid’s method. See
here for more details.