First, let me mention two dead ends. The number of finishers plotted against time gives a sigmoidal curve, which immediately had me thinking of the simple logistic curve used in population dynamics. It doesn't work, even if one considers overlapping discrete populations (which makes the math intractable). The number of finishers in large races of homogeneous populations, when plotted against the logarithm of time, gives a bell-shaped normal distribution; the math is ridiculously difficult and the results are largely meaningless (the fact that the curve becomes linear within 0.3 standard deviations either side of the mean led nowhere).
The solution was to use the engineering trick of assuming a generalized power function, here a cubic in finish time:

$$ n = a + bT + cT^2 + dT^3 $$
Almost any curve can be described with four constants. If one chooses four points, one has four equations with four unknowns and can solve for a, b, c and d. The problem with this is that any four points give a different result from any other four. After playing with data for a while, I began to notice that often 3a/b = b/c = c/(3d). This was an unexpected gift, because those ratios are exactly what a perfect cube produces: expanding a(1 - T/To)³ gives b = -3a/To, c = 3a/To² and d = -a/To³, so all three ratios equal -To. What I'd been overlooking was the fact that one is looking at a smooth curve, with the first and second derivatives being 0 at n = 0. This simplifies the equation to
$$ n = a\left(1 - \frac{T}{T_o}\right)^3 $$
where n is place, T is finish time, and a and To are constants. If one plots the cube root of finish place against time, the points fall on a straight line. A linear regression gives an intercept equal to the cube root of a (cube it to get a) and a slope equal to minus that cube root divided by To, so To = -intercept/slope, the time at which the fitted line reaches zero. To is a characteristic time, a theoretical time which cannot be run, even with an infinite number of entrants doing the race an infinite number of times. This time can be used to compare races. One takes one's time from a race one has done, divides it by the To of that race, and multiplies the result by the To of another race to find one's expected finish time in that race.
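The regression and the race-to-race scaling can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's own code. Finish times are assumed to be in decimal hours and sorted fastest first, so the finisher at index i has place n = i + 1. Plain ordinary least squares stands in for "a linear regression". The names `fit_t0` and `predict_time` are hypothetical.

```python
def fit_t0(times: list[float]) -> tuple[float, float]:
    """Fit n**(1/3) = intercept + slope * T by ordinary least squares.

    Assumes `times` are finish times in decimal hours, sorted fastest first,
    so the finisher at index i has place n = i + 1. Returns (To, a): To is the
    time at which the fitted line reaches zero, and a is the intercept cubed.
    """
    if len(times) < 2:
        raise ValueError("need at least two finishers")
    xs = times
    ys = [n ** (1 / 3) for n in range(1, len(times) + 1)]  # cube root of place
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    t0 = -intercept / slope  # zero crossing of the fitted line
    a = intercept ** 3  # comes out negative, since place rises with time
    return t0, a


def predict_time(my_time: float, t0_done: float, t0_target: float) -> float:
    """Scale a finish time from one race to another by the ratio of their To values."""
    return my_time / t0_done * t0_target


# Hypothetical field lying exactly on the model: To = 10 h, 100 finishers.
field = [10 + 0.5 * n ** (1 / 3) for n in range(1, 101)]
t0, a = fit_t0(field)
print(round(t0, 6))  # 10.0
# A hypothetical 20-hour finish in a race with To = 10 h, carried to a race with To = 12 h:
print(predict_time(20.0, 10.0, 12.0))  # 24.0
```

With real results, the fitted To should land somewhat below the winner's time, because To is a limit no one reaches.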
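Few people besides me would bother to do this much work. There is a shortcut that often works. If the data co-operate, To can be found from the times of the first, eighth, twenty-seventh and sixty-fourth finishers. Because 1, 8, 27 and 64 are the cubes of 1, 2, 3 and 4, any two of these finishers fix the straight line. Writing T(n) for the time of the nth finisher, the six possible pairs give:

$$
\begin{aligned}
T_o &= 2T(1) - T(8) \\
T_o &= \frac{3T(1) - T(27)}{2} \\
T_o &= \frac{4T(1) - T(64)}{3} \\
T_o &= 3T(8) - 2T(27) \\
T_o &= 2T(8) - T(64) \\
T_o &= 4T(27) - 3T(64)
\end{aligned}
$$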
Those six equations will usually give somewhat different results. Those using the winner's time, T(1), are generally the least accurate. Usually at least a few of them agree closely, and their common value is very close to the To found by a linear regression. The two equations in bold are easy to use and to remember.
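The shortcut can be sketched in Python as well, so all six pairings can be compared at a glance. This is a minimal sketch, not the author's code. It assumes times in decimal hours keyed by place. Each estimate is the zero crossing of the straight line through two of the four finishers on the cube-root-of-place scale. The name `shortcut_estimates` and the sample times are hypothetical.

```python
from itertools import combinations


def shortcut_estimates(t: dict[int, float]) -> dict[tuple[int, int], float]:
    """Estimate To from every pair of the 1st, 8th, 27th and 64th finishers.

    `t` maps place (1, 8, 27, 64) to finish time in decimal hours. The cube
    roots of those places are 1, 2, 3 and 4, so the straight line through
    places i**3 and j**3 reaches zero at To = (j*T(i**3) - i*T(j**3)) / (j - i).
    """
    estimates = {}
    for i, j in combinations((1, 2, 3, 4), 2):
        estimates[(i ** 3, j ** 3)] = (j * t[i ** 3] - i * t[j ** 3]) / (j - i)
    return estimates


# Hypothetical times generated from To = 10 h, for illustration only.
times = {n: 10 + 0.5 * n ** (1 / 3) for n in (1, 8, 27, 64)}
estimates = shortcut_estimates(times)
for pair, est in estimates.items():
    print(pair, round(est, 3))

# The three pairings that avoid the winner's time, which the text suggests trusting more.
without_winner = {pair: est for pair, est in estimates.items() if 1 not in pair}
print(without_winner)
```

On real data, the values will scatter. The point of printing them all is to spot the ones that cluster together.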
Does it work?
Using the To values for the 2007 Superior Trail 50 Mile and 2008 Superior Sawtooth 100, it predicted Chris Gardner would finish the 100 in just over 22 hours (he ran 21:57) and Adam Harmer would finish in 24:48 (he didn't finish).
Here are a few numbers for this year:
Superior 100: To=17:12 (in 2007, To=16:57. This implies that the course was harder than last year, though everyone agrees it was easier. The margin of error is about 10 minutes, which is 1%; can you predict times within 1%?)
Lean Horse 100: To=12:21
Rocky Raccoon 100: To=10:09 (Note that this is much faster than the world record!)
Hardrock 100: To=23:35!
That looks about right. Superior is about 70% harder than Rocky Raccoon (17:12 / 10:09 ≈ 1.69), and Hardrock is nearly twice as hard as Lean Horse (23:35 / 12:21 ≈ 1.91).
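The difficulty comparisons are simply ratios of To values, so a few lines of Python reproduce them from the list above. This is a sketch for checking the arithmetic. The helper names `hours` and `relative_difficulty` are hypothetical, and treating "how much harder" as the plain ratio of To values is the reading assumed here.

```python
def hours(hmm: str) -> float:
    """Convert an 'h:mm' time string to decimal hours."""
    h, m = hmm.split(":")
    return int(h) + int(m) / 60


# To values for this year, copied from the list above.
T0 = {
    "Superior 100": hours("17:12"),
    "Lean Horse 100": hours("12:21"),
    "Rocky Raccoon 100": hours("10:09"),
    "Hardrock 100": hours("23:35"),
}


def relative_difficulty(race: str, baseline: str) -> float:
    """How many times harder `race` is than `baseline`, measured by the ratio of To values."""
    return T0[race] / T0[baseline]


print(f"{relative_difficulty('Superior 100', 'Rocky Raccoon 100'):.2f}")  # 1.69
print(f"{relative_difficulty('Hardrock 100', 'Lean Horse 100'):.2f}")  # 1.91
# A 10-minute margin of error as a fraction of Superior's To:
print(f"{(10 / 60) / T0['Superior 100']:.1%}")  # 1.0%
```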
This raises an interesting point. Kyle Skaggs ran Hardrock faster than its To, that is, faster than the model says is possible. How can this be? The math could, of course, be wrong, but there are other possibilities. It's possible that Skaggs, who trains on the course, took a shortcut in the night. It's also possible that he's the only person who has ever trained properly for the race, and the others have been out there just to run, not to compete. It's possible Skaggs is a freak of nature (either naturally or by dint of chemicals). It's also possible, and this is my personal view, that this is a performance like Bob Beamon's long jump record: something thought impossible until it was done, and unlikely to be repeated by him or anyone else for decades.