by Tanada » Wed 31 Dec 2008, 01:47:56
yippleflipple wrote: this is really interesting.. is there something specific or fundamental that causes this shape in nature?? I'm no math whiz but seems like it can be applied to human behavior, or maybe that's just me

For any series of values in increasing order you always start low and get larger. Seems simple enough, right? Well, as a consequence of that fact, if you draw fewer numbers than the entire set, any random sampling will be weighted towards the low end.
Take the street address numbers as an example. Almost any street is going to have a 100 block, and except for really short streets a 200, 300 and 400 as well. However, there are many streets in a randomly constructed city that are not more than 19 blocks in length on one side of the dividing line (e-w or n-s in most cities). So if a city grid is 8 blocks to the mile and the average small city is 3 miles across, then you will get blocks 100-1200 in each direction, assuming equal distribution. This gives you blocks 100, 1000, 1100 and 1200 all starting with a 1. Now add a little complexity, say the city is 3 miles by 4 miles, roughly rectangular in shape. That gives you another array of block numbers from 100-1600 in the other direction, which leaves you with a second data set of 100, 1000, 1100, 1200, 1300, 1400, 1500, 1600 all trending to give you a 1 as the first digit. By the same token you have only 200, 300, 400, 500, 600, 700, 800, 900 to give you 2, 3, 4, 5, 6, 7, 8, 9. Unless the city is carefully planned and rigorously laid out you will have subdivisions and side streets and such cropping up that will add more low numbers like 200 and 300 but fewer high numbers like 800 and 900. Throw all those random values in and you get more of any value as it approaches 1, and your graph makes one of those neat declining value curves as shown above.
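For anyone who wants to check the block counts, here is a rough Python sketch of the arithmetic above. It assumes my simplified city (8 blocks to the mile, split evenly at the dividing line, every street running the full width), and the helper name `leading_digit_counts` is just something I made up for illustration.

```python
from collections import Counter

def leading_digit_counts(values):
    """Count how often each leading digit 1-9 occurs among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts.get(d, 0) for d in range(1, 10)}

# Assumption: 8 blocks per mile, city centered on the dividing line,
# so 3 miles across gives 12 blocks per side and 4 miles gives 16.
blocks_3_mile = range(100, 1300, 100)  # 100, 200, ..., 1200
blocks_4_mile = range(100, 1700, 100)  # 100, 200, ..., 1600

counts_3 = leading_digit_counts(blocks_3_mile)
counts_4 = leading_digit_counts(blocks_4_mile)

print("3 miles across:", counts_3)
print("4 miles across:", counts_4)
```

In both directions the digit 1 shows up several times while 2 through 9 show up once each, and that is before you add any short side streets.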
Now go back and look at Angrybill's data. Because the set of possible numbers was 1-99 and only 41 inputs were used, not every number was picked. Presuming that there were no doubles, a truly random sample would give nearly equal values for first digit 1-9; however, people like certain numbers and are predisposed to choose them. For example, I picked 23 because it's my favorite number and has been since around 1980; it has nothing to do with that Jim Carrey movie of a couple years ago. Based on Angrybill's data only 4 numbers in the 2 set were picked out of 41. In a truly random sample that would be about right: each of the subsets has 11 possible integers, for example 9, 90-99 or 3, 30-39, and 41 choices spread over 9 subsets gives you about 4.56 per subset. However 1, 10-19 received 10 entries instead of 4.56, while 5, 50-59 and 8, 80-89 each only received 2. Assuming no doubles, then 10 of the 11 possible for 1, 10-19 were chosen. To really get a Benford's Law type distribution you have to sample a statistical universe of subsets; Angrybill only has one sample set of 41 answers to work with. In the street address number sample the numbers came from a lot of different streets, i.e. from an array of subsets, not a single subset.
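The per-subset figure is easy to verify. This little sketch assumes the picks are uniform over 1-99, which is my idealization and not a claim about how anyone actually chose; the variable names are only for illustration.

```python
from collections import Counter

pool = range(1, 100)  # the allowed answers, 1 through 99
picks = 41            # number of answers in Angrybill's sample

# How many of the 99 possible numbers start with each digit.
per_digit = Counter(int(str(n)[0]) for n in pool)

# Expected number of picks per leading digit if every number is equally likely.
expected = {d: picks * per_digit[d] / len(pool) for d in range(1, 10)}

for d in range(1, 10):
    print(d, per_digit[d], round(expected[d], 2))
```

Every digit owns 11 of the 99 numbers, so a flat pick gives the same expectation for all nine digits, which is why the 10 entries for the 1 set stand out.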
As someone mentioned, a lot of people asked this question will pick a partial birthdate, and here again the artificial rules that govern a calendar weight things to the low end. If you pick day of the month you max out at 31 choices, 1-31. That gives you 11 in the 1, 10-19, 11 in the 2, 20-29 but only 3 in the 3, 30, 31. Now add in that 4 months have 30 days and one month has 28/29 and you put even more emphasis lower on the set. What about those who pick their birth month instead? Well, that gets you 1, 10, 11, 12 for the 1 integer and only one occurrence of each of the remaining numbers.
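The calendar counts can be tallied the same way. A quick sketch, assuming a non-leap year (I used 2023 purely as an example) and assuming birthdays are spread evenly over the year, which real birthdays are not quite:

```python
import calendar
from collections import Counter

def leading_digit(n):
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])

# Leading digits of the possible day numbers 1-31 and month numbers 1-12.
day_counts = Counter(leading_digit(d) for d in range(1, 32))
month_counts = Counter(leading_digit(m) for m in range(1, 13))

# Weight each day number by how many months actually contain it.
YEAR = 2023  # assumption: any non-leap year gives the same tallies
year_counts = Counter(
    leading_digit(day)
    for month in range(1, 13)
    for day in range(1, calendar.monthrange(YEAR, month)[1] + 1)
)

print("day numbers:  ", dict(sorted(day_counts.items())))
print("month numbers:", dict(sorted(month_counts.items())))
print("whole year:   ", dict(sorted(year_counts.items())))
```

Over the whole year the 3 set loses a little more ground to the 1 and 2 sets because not every month has a 30th or 31st.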
Is that clear as mud for you now?