"There's only one hard and fast rule in running: sometimes you have to run one hard and fast."

Monday, December 1, 2008

Comparing Trail Races

The one thing that's always bothered me about trail races is getting a finishing time and place and having no idea what it means in the context of other races. Comparing races is frustratingly difficult: different lengths, different terrain, different runners, different weather. There have been a number of attempts to compare results, but I think I've finally come up with a solution that's fairly accurate and easy to use (trust me on that last bit - there's some difficult math coming, but it can all be disregarded).

First, let me mention two dead ends. The number of finishers plotted against time gives a sigmoidal curve, which immediately had me thinking of the simple logistic curve used in population dynamics; it doesn't work, even if one considers overlapping discrete populations (which makes the math intractable). In large races with homogeneous fields, the number of finishers plotted against the logarithm of finish time gives a bell-shaped normal distribution; the math is ridiculously difficult and the results are largely meaningless (the fact that the curve is linear within 0.3 standard deviations either side of the mean led nowhere).

The solution was to use the engineering trick of assuming a generalized power function,

$$n = a + bT + cT^2 + dT^3$$

Almost any smooth curve of this shape can be approximated with four constants. If one chooses four points, one has four equations with four unknowns and can solve for a, b, c and d. The problem with this is that any four points give a different result from any other four. After playing with data for a while, I began to notice that often 3a/b = b/c = c/3d. This was an unexpected gift: those are exactly the coefficient ratios of a perfect cube, a(1 - T/To)^3, with each ratio equal to -To. What I'd been overlooking was the fact that one is looking at a smooth curve, with the first and second derivatives both 0 at n = 0 (that is, at T = To). This simplifies the equation to

$$n = a\left(1 - \frac{T}{T_o}\right)^3$$

where n is place, T is finish time, and a and To are constants. Taking cube roots turns this into a straight line: if one plots the cube root of finish place against time, a linear regression gives an intercept equal to the cube root of a and a slope equal to minus that intercept divided by To. So To is minus the intercept divided by the slope, the time at which the line reaches place zero (and because every real finish time is longer than To, a comes out negative). To is a characteristic time, a theoretical time which cannot be run, even with an infinite number of entrants doing the race an infinite number of times. This time can be used to compare races. One takes one's time from a race one has done, divides it by that race's To, and multiplies by another race's To to find one's expected finish time in that race.
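A minimal Python sketch of that regression and the scaling step, assuming finish times are in minutes and sorted fastest-first (so the k-th entry is place k). `fit_to` and `predict_time` are hypothetical names, and the sample field is synthetic, generated from the model itself rather than taken from any real race, so the fit should recover To = 1000.

```python
# Sketch: fit n = a(1 - T/To)^3 by regressing the cube root of place on finish time.
# Assumes `times` is sorted fastest-first, in minutes, so times[k] is place k + 1.


def fit_to(times: list[float]) -> tuple[float, float]:
    """Return (To, a) from a least-squares line through (T, n**(1/3)).

    The fitted line is n**(1/3) = intercept + slope * T, where
    intercept = cube root of a and slope = -intercept / To.
    """
    xs = times
    ys = [place ** (1.0 / 3.0) for place in range(1, len(times) + 1)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    t_o = -intercept / slope  # time at which the line reaches place zero
    a = intercept ** 3        # negative, since real times are longer than To
    return t_o, a


def predict_time(my_time: float, t_o_done: float, t_o_other: float) -> float:
    """Scale a finish time by the ratio of the two races' To values."""
    return my_time / t_o_done * t_o_other


if __name__ == "__main__":
    # Synthetic field built from the model (not real results):
    # T(n) = 1000 + 100 * n**(1/3) minutes, so To should come out at 1000.
    field = [1000 + 100 * n ** (1.0 / 3.0) for n in range(1, 65)]
    t_o, a = fit_to(field)
    print(f"To = {t_o:.1f} min, a = {a:.1f}")  # To = 1000.0 min, a = -1000.0
    # A hypothetical 1200-minute finish here, scaled to a race with To = 800.
    print(f"predicted: {predict_time(1200, t_o, 800):.0f} min")  # predicted: 960 min
```

On real results the points scatter around the line, so the recovered To is only as good as that fit.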

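Few people besides me would bother to do this much work, but there is a shortcut that often works. Write T(n) for the time of the nth finisher. If the data co-operate, To can be found from the times of the first, eighth, twenty-seventh and sixty-fourth finishers. Those places are perfect cubes, with cube roots 1, 2, 3 and 4, so on the fitted line their times are evenly spaced (To plus one, two, three and four equal steps), and any two of them are enough to pin down To: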

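$$
\begin{aligned}
T_o &= 2T(1) - T(8)\\
T_o &= \tfrac{3}{2}T(1) - \tfrac{1}{2}T(27)\\
T_o &= 3T(8) - 2T(27)\\
T_o &= \frac{4T(1) - T(64)}{3}\\
T_o &= 4T(27) - 3T(64)\\
T_o &= 2T(8) - T(64)
\end{aligned}
$$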

Those six equations will usually give a bunch of different results. Those using the winner's time, T(1), are generally the least accurate. Usually at least a few of them will agree closely, and that common value will be very close to the To found by a linear regression. The two equations in bold are easy to use and to remember.
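All six shortcut equations follow one pattern: if places k^3 and m^3 finish at To + ck and To + cm, then To = (m T(k^3) - k T(m^3)) / (m - k). A short Python sketch of that pattern; `two_point_to` and `shortcut_estimates` are hypothetical names, and the finish times in the example are made up, not from any real race.

```python
from itertools import combinations

# Places 1, 8, 27, 64 are the cubes of 1, 2, 3, 4, so under the model
# T(k**3) = To + c*k for some constant c, and any two of them fix To.


def two_point_to(k: int, m: int, t_k: float, t_m: float) -> float:
    """To from the times of places k**3 and m**3 (k < m)."""
    return (m * t_k - k * t_m) / (m - k)


def shortcut_estimates(times: dict[int, float]) -> dict[tuple[int, int], float]:
    """All six two-point estimates, keyed by the pair of places used."""
    roots = {1: 1, 8: 2, 27: 3, 64: 4}
    return {
        (p, q): two_point_to(roots[p], roots[q], times[p], times[q])
        for p, q in combinations(sorted(roots), 2)
    }


if __name__ == "__main__":
    # Made-up finish times in minutes for places 1, 8, 27 and 64 (not real data).
    times = {1: 1100, 8: 1180, 27: 1270, 64: 1330}
    for (p, q), t_o in shortcut_estimates(times).items():
        print(f"T({p}), T({q}): To = {t_o:.0f} min")
```

The pair (1, 8) reproduces To = 2T(1) - T(8), (2, 3) gives 3T(8) - 2T(27), and so on; comparing the six printed values shows which ones agree.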

Does it work?

Using the To values for the 2007 Superior Trail 50 Mile and 2008 Superior Sawtooth 100, it predicted Chris Gardner would finish the 100 in just over 22 hours (he ran 21:57) and Adam Harmer would finish in 24:48 (he didn't finish).

Here are a few numbers for this year:

Superior 100: To=17:12 (In 2007, To=16:57. This implies that the course was more difficult than last year, though everyone agrees it was easier. The margin of error is about 10 minutes, which is 1% - can you predict times within 1%?)

Lean Horse 100: To=12:21

Rocky Raccoon 100: To=10:09 (Note that this is much faster than the world record!)

Hardrock 100: To=23:35!

That looks about right. Superior is about 70% harder than Rocky Raccoon, and Hardrock is nearly twice as hard as Lean Horse.
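Those comparisons are just ratios of To values. A small Python sketch of that arithmetic, using the four 2008 figures listed above; `to_minutes` and `difficulty_ratio` are illustrative names only.

```python
# Compare races by the ratio of their To values (2008 figures from the list above).


def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' string such as '17:12' to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


T_O = {
    "Superior 100": "17:12",
    "Lean Horse 100": "12:21",
    "Rocky Raccoon 100": "10:09",
    "Hardrock 100": "23:35",
}


def difficulty_ratio(harder: str, easier: str) -> float:
    """How many times longer the first race's To is than the second's."""
    return to_minutes(T_O[harder]) / to_minutes(T_O[easier])


if __name__ == "__main__":
    print(f"{difficulty_ratio('Superior 100', 'Rocky Raccoon 100'):.2f}")  # 1.69
    print(f"{difficulty_ratio('Hardrock 100', 'Lean Horse 100'):.2f}")  # 1.91
```

Superior comes out at about 1.69 times Rocky Raccoon (roughly 70% harder), and Hardrock at about 1.91 times Lean Horse, a little under double.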

This raises an interesting point. Kyle Skaggs ran Hardrock faster than To, a time the model says cannot be run. How can this be? The math could, of course, be wrong, but there are other possibilities. It's possible that Skaggs, who trains on the course, took a shortcut in the night; it's also possible that he's the only person who's ever trained properly for the race and the others have been out there just to run and not compete. It's possible Skaggs is a freak of nature (either naturally or by dint of chemicals). It's also possible - and this is my personal view - that this is a performance like Bob Beamon's long jump record, something thought impossible until done and unlikely to be repeated by him or anyone else for decades.

9 comments:

brothergrub said...

I am a total numbers geek - My economics degree came with a lot of coursework in stats including regression analysis... CAN'T WAIT to go home and start crunching the numbers! coooooool!

brothergrub said...

Why the 1st, 8th, 27th and 64th ? Wouldn't that skew the numbers for results with larger vs. smaller fields? (8th of 120 runners as opposed to 8th of 240 runners...)

Wayne said...

Welcome back, Steve. How was your 5 minute retirement?

Wayne said...

ok, my bad... now I see it was 3 days and 5 minutes. I'm glad you weren't gone long... I say keep posting.

SteveQ said...

Kevin, I tried to come up with something that anyone could do, and cube roots of perfect cubes give simple answers. I skipped the higher numbers because most 100s have a limit under 125 entrants.

Wayne, still no running, though. A THIRD knee injury on my one attempt! I thought this post might be useful or at least interesting.

Carl Gammon said...

Very interesting. For races that don't even have 64 finishers, do you just solve whatever equations you can for To?

Theoretically, it looks like anyone able to finish the Superior 100 in the allowed time of 38+ hours should be able to break 24 hours at Rocky Raccoon. (But then, my math might be suspect.)

johnmaas said...

Neat statistical stuff there, Steve.
I was tempted into toying with the numbers.
My calculations come up with a projected 15:17 finish at Rocky Raccoon. I know that is not possible for me.
Perhaps the Rocky Raccoon results used to figure the To are influenced by a very strong and competitive field there and the Lean Horse field was much less talented.
Or maybe I should plan a trip to Houston.......???

SteveQ said...

John, I included the Rocky Raccoon because the results ARE odd - you couldn't do 15 hours, Skaggs couldn't do 10 (c'mon, 6 minute miles?!). One can show anything with statistics... and, yes, the fields are very different, which is probably why that one is off - it pulls in those who can run 13-14 hours and not those who relish difficult terrain.

Steve said...

Wow, that retirement didn't last long.......welcome back!

To brothergrub: you put the "geek" in "numbers geek". Ha ha!