job I often interviewed candidates for computer programming positions. One of the most clever questions, or so I thought, was the following:

We discovered the question when one of my co-worker's wife needed to solve this for a program she was working on.

This was an interesting problem and had two solutions of which I knew. One could solve the problem in O(1) time, but was potentially memory intensive. The other could solve the problem with a much smaller amount of memory but was O(logn). Both used an array as an internal memory structure to map the values as appropriate. I still believe my old co-workers may ask this question, so I will leave the details of the answers as an excercise to the reader.

At my new job I was discussing non-uniform random distribution with a co-worker, he needed a bell curve distribution, and I realized he was discussing the problem this interview question was all about. I explained the two answers to him, thinking that I was being clever. He then proceeded to tell me the solution he used, and I was highly impressed. Since he had a specific distribution in mind, he could actually write a function which took the output of rand() as the input and directly calculated an output using no interal memory structures (other than variables on the stack).

It made me realize that in essence any solution of this problem was just a mapping from X (the output of rand()) to Y the number you are looking for. The simplest case is uniform random distribution in which you have Y = X. While it seems obvious in retrospect I had never thought of the problem in those terms.

With this new perspective on the problem I realized the solutions we were looking for in our interviews were completely generic, and could solve any distribution you cared to make an array for. However, it was probably never going to be the most efficient for a specific distribution, and it makes me wonder if you can programatically come up with a function for a non-uniform random distribution given a set of points, where the function uses no internal memory structure other than variables on the stack. Maybe some of my mathier friends can shed some light on it.

At my old
*There is a function rand(). The funtion rand() will return the numbers between 0 and 1. The function rand() can return 0, but it will not return 1. All numbers returned by the rand() function have the same probability of being returned. We will call this uniform random distribution.*

Given rand() write an algorithm which can return non-uniformly distributed numbers. Specifically as input you will be given an array of floats. Each float in the array determines the percentage chance of the index of that array element being returned. For example if given the array [0.1, 0.2, 0.3, 0.5] 0 will be returned 10% of the time, 1 will be returned 20% of the time, and so on. The values of the array are guaranteed to sum to 1. We will call this non-uniform random distribution.Given rand() write an algorithm which can return non-uniformly distributed numbers. Specifically as input you will be given an array of floats. Each float in the array determines the percentage chance of the index of that array element being returned. For example if given the array [0.1, 0.2, 0.3, 0.5] 0 will be returned 10% of the time, 1 will be returned 20% of the time, and so on. The values of the array are guaranteed to sum to 1. We will call this non-uniform random distribution.

We discovered the question when one of my co-worker's wife needed to solve this for a program she was working on.

This was an interesting problem and had two solutions of which I knew. One could solve the problem in O(1) time, but was potentially memory intensive. The other could solve the problem with a much smaller amount of memory but was O(logn). Both used an array as an internal memory structure to map the values as appropriate. I still believe my old co-workers may ask this question, so I will leave the details of the answers as an excercise to the reader.

At my new job I was discussing non-uniform random distribution with a co-worker, he needed a bell curve distribution, and I realized he was discussing the problem this interview question was all about. I explained the two answers to him, thinking that I was being clever. He then proceeded to tell me the solution he used, and I was highly impressed. Since he had a specific distribution in mind, he could actually write a function which took the output of rand() as the input and directly calculated an output using no interal memory structures (other than variables on the stack).

It made me realize that in essence any solution of this problem was just a mapping from X (the output of rand()) to Y the number you are looking for. The simplest case is uniform random distribution in which you have Y = X. While it seems obvious in retrospect I had never thought of the problem in those terms.

With this new perspective on the problem I realized the solutions we were looking for in our interviews were completely generic, and could solve any distribution you cared to make an array for. However, it was probably never going to be the most efficient for a specific distribution, and it makes me wonder if you can programatically come up with a function for a non-uniform random distribution given a set of points, where the function uses no internal memory structure other than variables on the stack. Maybe some of my mathier friends can shed some light on it.

## Comments

(Anonymous)and

n

∑ (p) = 1

0

and

R = rand(); and 0 <= R < 1

also

consider the rough sketch

R

0---------------------------->1

0-->p1------->p1----->p2--->pn

and if R is uniform then the following Java code should work,

it may not be the most efficient :-) but ...

public int getIndex( float[] pa )

{

float sum = 0.0f;

float R = rand();

for( int i = 0; i < pa.length; i++ )

{

sum += pa[i];

if( R < sum ) // < rather than <= because it could never be 1

{

return i;

}

}

return 0; // got to cover all eventualities for the compiler

}

Regards,

Angels husband Matt

grieveAs written your solution will take O(n), but it is a precursor to the O(logn) solution. We certainly would have accepted it as a legitimate answer during the interview. With just a minor bit of tweaking you can get the O(logn) answer. :)