
Discussion: OUSA Rankings Formula


May 10, 2023 12:35 AM # 
Danny Riley:
Can someone help fill in the gaps in the wording of OUSA's ranking methodology? Each competitor starts with 50 points, which gets multiplied by their time in minutes to get their first Personal Course Difficulty. Then what? How are these Personal Course Difficulties used to update the competitors' points in the iterative process?
May 10, 2023 12:59 AM # 
jjcote:
Working from memory here...

The Personal Course Difficulties* are merged together using a harmonic mean to get the overall Course Difficulty. The course difficulty is then divided by each competitor's time to get a new number of ranking points (substituting for the initial 50). Lather-rinse-repeat until the numbers stop changing (there's some threshold of change it has to get below).

(Oh, that agrees with what's in the link above, so that's good.)

( * "Personal Course Difficulty" is a boringized version of the original term, which was Personal Gnarliness Value.)

(And it was not always thus. There were old methodologies that got improved upon, until we came to the current one. 35 years ago, it was all done with paper and a calculator.)
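In code, one pass of the current method looks something like this (a sketch with made-up numbers, not the actual OUSA program):

    # one race: PCD = points * time, Course Difficulty = harmonic mean of
    # the PCDs, new points = CD / time
    times = [52.0, 58.5, 64.0, 71.5]              # finish times in minutes
    points = [50.0] * len(times)                  # everybody starts at 50

    pcds = [p * t for p, t in zip(points, times)]
    cd = len(pcds) / sum(1.0 / x for x in pcds)   # harmonic mean
    points = [cd / t for t in times]
    print(round(cd, 1), [round(p, 1) for p in points])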
May 10, 2023 1:05 AM # 
Danny Riley:
Thanks! But once the ranking points are updated, you also need to update the Personal Course Difficulty, right? Do you recall how that is done? Initially it's by taking the product of running time with ranking points, but if that's done after the first step then they all evaluate to the same number: the prior step's Course Difficulty.
May 10, 2023 1:20 AM # 
jjcote:
Oh, right, the critical step: what you say would be true if there were only one race. But instead you average each person's ranking points from all of their races during the time period in question (typically the past year), and then you crank through the calculation again. There are some details about exactly how many races count (maybe they all count during the iteration, but not for the final step after you've reached convergence?). And I don't recall what's currently done for things like DNF, that's something that has changed over the years.
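Sketched out, the whole loop is something like this (my own naming and toy data, not the actual OUSA program, which surely differs in details like drop rules and DNF handling):

    # races: name -> list of (runner, finish time in minutes)
    races = {
        "race1": [("ann", 55.0), ("bob", 63.0), ("cat", 71.0)],
        "race2": [("ann", 48.0), ("cat", 52.0), ("dee", 60.0)],
    }
    runners = {r for e in races.values() for r, _ in e}
    points = {r: 50.0 for r in runners}          # everybody seeds at 50

    for _ in range(1000):
        # Course Difficulty per race: harmonic mean of the PCDs (points * time)
        cd = {name: len(e) / sum(1.0 / (points[r] * t) for r, t in e)
              for name, e in races.items()}
        # each result is worth CD / time; a runner's new ranking points are
        # the average over all of their races in the period
        new = {}
        for r in runners:
            s = [cd[name] / t for name, e in races.items() for rr, t in e if rr == r]
            new[r] = sum(s) / len(s)
        # lather-rinse-repeat until the numbers stop changing
        delta = max(abs(new[r] - points[r]) for r in runners)
        points = new
        if delta < 1e-9:
            break

    print({r: round(v, 2) for r, v in points.items()})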
May 10, 2023 4:48 PM # 
soonerjcb:
I spent about a year trying to figure this out. I think I finally did. But it would take another year to be able to explain it better than... it basically works as well as any other method.
May 10, 2023 9:02 PM # 
Danny Riley:
It sounds like the only thing I'm missing is how the new iteration of ranking points updates Course Difficulty, because from there I think I understand how that updates ranking points. It's somehow done using prior rankings?
May 10, 2023 9:49 PM # 
ajriley:
If you do good you get more point, if you do bad you get less point
May 10, 2023 9:57 PM # 
jjcote:
"Prior" in the sense that it's the points from the previous iteration. Everybody starts with 50, and after you do the calculation as described, the faster people have more points and the slower people have lower point values. Then you do it again, and they all move a bit more. Etc. At each step, you multiply each person's new ranking points by their time, and harmonic mean those PGVs together to get a new Course Difficulty.

In days of yore, the whole thing was seeded with the previous year's rankings, and there was no iteration, it was just one step. Then when we got enough computing power to iterate until it was stable, there were experiments to make sure that it didn't depend on the initial conditions. One of those was to give one person 100 points and everybody else one point. Another was to start everybody at 50 points. They all converged to the same answer.
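That experiment is easy to redo on toy data. One wrinkle when you try it: the raw update is scale-invariant (double every seed and every number doubles all the way through), so this sketch pins the scale by rescaling to a mean of 50 on each pass before comparing seeds; I'm assuming the real program anchors the scale in some similar way:

    def rank(races, seed):
        runners = sorted({r for e in races.values() for r, _ in e})
        points = {r: seed(r) for r in runners}
        for _ in range(10000):
            cd = {name: len(e) / sum(1.0 / (points[r] * t) for r, t in e)
                  for name, e in races.items()}
            new = {}
            for r in runners:
                s = [cd[name] / t for name, e in races.items() for rr, t in e if rr == r]
                new[r] = sum(s) / len(s)
            m = sum(new.values()) / len(new)          # pin the scale: mean of 50
            new = {r: 50.0 * v / m for r, v in new.items()}
            delta = max(abs(new[r] - points[r]) for r in runners)
            points = new
            if delta < 1e-10:
                break
        return points

    races = {
        "race1": [("ann", 55.0), ("bob", 63.0), ("cat", 71.0)],
        "race2": [("ann", 48.0), ("cat", 52.0), ("dee", 60.0)],
    }
    flat = rank(races, lambda r: 50.0)
    spiky = rank(races, lambda r: 100.0 if r == "ann" else 1.0)
    print(max(abs(flat[r] - spiky[r]) for r in flat))  # ~0 if seeds really don't matter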
May 10, 2023 10:29 PM # 
Danny Riley:
Ah ok I think I get it. So the ranking points that get multiplied by time aren't just the points from the last step; they're the average of the last step's points over however many races each runner has from the last year. That average then gets multiplied by the runner's time to give a slightly updated PGV.
May 11, 2023 1:09 AM # 
yurets:
but this just a preliminary step, before we tally in our privilege points, right?
May 11, 2023 1:25 AM # 
jjcote:
Danny: right, but... it's not just this new race that's getting recalculated, but rather all of the races in the past year. So in principle, for the rolling rankings, every time a new race happens, or one ages out, the Course Difficulties for all of the races could change, because the whole calculation starts from scratch. And in fact they probably do, but not by much, assuming the data set is large and well-connected. (I could dream up pathological cases where a particular lightly-attended race would shift significantly, but that's not very likely to happen.)

yurets: the way things were calculated many years ago, your comment wouldn't have been entirely off-base...
May 11, 2023 1:50 AM # 
Danny Riley:
That's actually really interesting. I don't have any experience using harmonic means, but it seems remarkable that numerical tests show that the iterative method converges to the same number regardless of starting value! Is there any published research about these properties? Does this general schema have a name so that I could look more into it? Or is this a homebrewed method that only we use?
May 11, 2023 2:13 AM # 
yurets:
>>Does this general schema have a name

Contraction mapping / fixed-point theorems;
any topology text
May 11, 2023 3:19 AM # 
jjcote:
A historical note: prior to the use of harmonic mean, the method in use was to take the PGVs and sort them, then use the 40th percentile value (there was a rationale for doing this instead of using the median). That dated back to before iteration. Harmonic mean was a better approach, but the people who had proposed the improvements over the really early method (where only the top three results were considered) didn't think of it. The 40th percentile method wasn't always stable under iteration; sometimes it oscillated. I'd be somewhat surprised if any reasonable approach could converge to different values based on initial conditions, but instability is a different matter. A good process will neither oscillate nor diverge.

(Oh god, I'm having flashbacks to a class that I took in 1982 that involved drawing a lot of root-locus diagrams. I dropped that class.)
May 11, 2023 11:39 AM # 
cmpbllj:
Hi, we're the sport that:

-uses hieroglyphics (clue sheets)
-stares at barely decipherable brightly colored pieces of paper.
-runs full speed through thickets. Or tries to.
-determines our best by a simple iterative process of convergence of the harmonic mean of the personal gnarliness value.

Definitely on track to take over the sporting world...

But, it's the weekend of the Billygoat race, so all is well in the world...
May 11, 2023 3:06 PM # 
soonerjcb:
These are from my notes:

To get a score, they (basically) do the following: Give you a daily ranking score. Adjust it for Course Difficulty. To do this, they need to assign a "Personal" Course Difficulty score to everyone and eventually calculate an overall Course Difficulty from it.

How is this done….okay. Let me try.
1. The equation is one that is run over and over again until the results become static. It is iterative.
2. Essentially it compares you against the field, adjusts your average, which adjusts the field average, which adjusts your ranking points, until it becomes static.

Use the value 50 for initial ranking points if no existing score exists for a competitor. Run a similar equation to the one below through the computer. The key to the equation is to get the Course Difficulty Rating and divide by your time. How they get that is a bit of a process. Basically they try to compare how difficult the course was for you individually by pseudo-comparing the entire field against their expected results. See the Disney example below for the idea and the math as I understand it.

EQUATION FOR PCD: Ranking Points x Time
Mickey Mouse - 50 x 19.15 min = 957.5 (new competitor) PCD Score. Initial
Donald Duck - 50 x 26.10 min = 1305 (new competitor) PCD Score. Initial
Goofy - 79.25 x 28.30 min = 2242.8 (existing competitor) PCD Score. Initial
Cinderella- 32.20 x 42.80 min = 1378.2 (existing competitor) PCD Score. Initial
Aladdin- 50 x 114.75 min = 5737.5 (new competitor) PCD Score. Initial
Next Find Initial CD for everyone
Calculate First CD (Course Difficulty) - the harmonic mean of the PCDs: 5 / ((1/A) + (1/B) + (1/C) + (1/D) + (1/E)) => 1584.06
Next Find new Ranking Points
Initial CD for everyone / your Time
Mickey Mouse - 1584.06 / 19.15 = 82.72 points (a quick logic check makes sense… Mickey had the fastest time, so his points should rise in the first iteration).
Next Recalculate everyone's second round PCD using the new Ranking Points
Mickey Mouse = 82.72 x 19.15 minutes ≈ 1584.06. So Mickey started with a 957.5 PCD, and after the first iteration it rose to 1584.06. His initial ranking points of 50 rose to 82.72. Mickey is the man...errr, mouse!

Repeat for all competitors and then recalculate the harmonic mean. This changes all calculations by a bit. Then the equation is run again, and they change by a touch less. On and on until change no longer occurs. The final answer is the one with no change.
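You can check that arithmetic with a few lines (a throwaway script, not the ranking program):

    times  = {"Mickey": 19.15, "Donald": 26.10, "Goofy": 28.30,
              "Cinderella": 42.80, "Aladdin": 114.75}
    points = {"Mickey": 50.0, "Donald": 50.0, "Goofy": 79.25,
              "Cinderella": 32.20, "Aladdin": 50.0}

    pcd = {r: points[r] * times[r] for r in times}       # Mickey: 957.5
    cd = len(pcd) / sum(1.0 / v for v in pcd.values())   # harmonic mean, ~1584.06
    print(round(cd, 2), round(cd / times["Mickey"], 2))  # ~1584.06 and ~82.72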

-Except this is not completely how it is done. There are plenty of things in the program that are done to deal with specific situations and strange things that are one-offs.

Back to the simplification: run fast, score well. Compare times against others in your age category. Do at least 4 NREs to get points.

Hope this helps and is correctish:).
May 11, 2023 4:39 PM # 
Danny Riley:
I guess I should clarify: as a competitor, I'm not worried at all about its efficacy. I'm simply asking as a math student who has worked with evaluating ranking systems before, because this process is just downright interesting to me. Understanding the exact methodology helps me build a more complete picture of the types of algorithms used in generating rankings, a topic I am casually researching.

soonerjcb: your process sounds like it aligns with what J-J was saying except for when you iterate. If you were to now compute ranking points * course time per competitor, you would recover a PCD of exactly what the total CD was last step (as your Mickey numbers show), and so nothing changes. Taking the new preliminary ranking points from this race and averaging them with ranking points from past races is what allows the system to keep changing incrementally.

J-J: I certainly agree that good processes shouldn't diverge or oscillate, but it's not clear to me what exactly is making this process converge, hence my interest! Thanks for breaking it down for me; I'll be able to make and test statements about the algorithm's properties now.
May 11, 2023 6:16 PM # 
feet:
@cmpbllj: we could make the rankings simpler if only we agreed to standardize the courses, for example, Start-5000m marked route-Finish. Seems a small price to pay.

Face it, orienteering in the U.S. is already for nerds with aerobic ability (aeronerds? nerdrobics?) so we may as well lean in and have a ranking system that's worth proving theorems about.

Danny: I confess I'm kind of hoping for a counterexample to convergence. The proof, if it is robust, will probably indeed be a matter of a small tweak to a familiar fixed-point theorem. Much more interesting if there's a weird case on the edge of the parameter space that breaks it. My intuition is that something weird could happen when the competitors' rankings are updated given that some races are dropped - problems seem like they could arise when each competitor has different worst races. (For example, take a case where there are two different groups of runners, A and B, and only one race where there are people from multiple groups, and in that race there is only one person from A and one from B.* Trivial observation is that the relative rankings of runners from A vs. runners from B is crucially dependent on the result of that race. Now imagine that that race gets dropped from the rankings calculation for at least one of those runners. Seems like weird things could happen.)

*Or duplicate everybody so there are four runners in this race, to meet the minimum runner count required for races to be included.
May 11, 2023 7:13 PM # 
jjcote:
Yeah, the biggest threat to numerical stability is a poorly-connected dataset. East coast and west coast, and only a few people make the trip. Or an oddball race that almost nobody attends except locals. If you had two disjoint populations, the best they could do would be to converge on two independent solutions that don't really have any relation to each other, even though people might assume that they do. If you have one strong runner who ventures to the other side just once and has a bad run (or vice versa), that can drive the two populations apart.
May 11, 2023 7:30 PM # 
feet:
Actually, here's a better kind of problematic example. Imagine there are three groups of runners: A, B, and C, who entirely race among themselves. Choose three individual runners, one from each group, called a, b, and c.

Then add three races:
Race ab, in which a beats b.
Race bc, in which b beats c.
Race ca, in which c beats a.

And adjust the margins in these races so that they either count in or get dropped from the ranking scores for a, b, or c. This will crucially affect the assessment of which group is best.

Or variants of this kind of idea.

I still think that 'drop the worst ranking scores of some runners' is the most likely source of a problem in numerical stability, since which races get dropped depends on the course difficulty scores, so this could generate some oscillation.

My feeling is that in realistic data sets, dropping some races is a good thing for generating sensible rankings, because it means that someone who makes a 30 minute mistake once but is otherwise excellent does not get assigned a low ranking score. Doing that would make the runners they beat when running well get penalized when they should not be. So, to be clear, I think the real-world system SHOULD have this feature. But it seems like a potential source of oscillation in poorly-connected data sets.
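If anyone wants to poke at this, the cycle is easy to set up as toy data for the basic iteration (my own sketch; no drop rule implemented, and the interesting experiments start when you add one and fiddle with the margins):

    # three groups that race among themselves, plus one thin race per pair
    # (the real rules would demand bigger fields; see the footnote above)
    races = {
        "A1": [("a", 50.0), ("a2", 55.0), ("a3", 60.0), ("a4", 65.0)],
        "B1": [("b", 50.0), ("b2", 55.0), ("b3", 60.0), ("b4", 65.0)],
        "C1": [("c", 50.0), ("c2", 55.0), ("c3", 60.0), ("c4", 65.0)],
        "ab": [("a", 50.0), ("b", 58.0)],   # a beats b (tweak these margins)
        "bc": [("b", 50.0), ("c", 58.0)],   # b beats c
        "ca": [("c", 50.0), ("a", 58.0)],   # c beats a
    }
    runners = {r for e in races.values() for r, _ in e}
    points = {r: 50.0 for r in runners}
    for _ in range(5000):
        cd = {name: len(e) / sum(1.0 / (points[r] * t) for r, t in e)
              for name, e in races.items()}
        new = {}
        for r in runners:
            s = [cd[name] / t for name, e in races.items() for rr, t in e if rr == r]
            new[r] = sum(s) / len(s)
        delta = max(abs(new[r] - points[r]) for r in runners)
        points = new
        if delta < 1e-9:
            break

    # symmetric margins leave the three groups level; skew one margin and the
    # group averages separate, driven entirely by the thin cross races
    for g in "abc":
        members = [r for r in runners if r.startswith(g)]
        print(g, round(sum(points[r] for r in members) / len(members), 2))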
May 11, 2023 7:33 PM # 
graeme:
I suspect that negative times will break the convergence. That won't bother orienteers but might trouble passing mathematicians. Even convergence isn't enough; you need to have only one fixed point.
In the UK we have an algorithm that excludes "silly scores" (like when you turn up an hour late for your start time and they won't adjust it). That did produce instabilities where the score alternated between silly and not-silly on each iteration.
May 11, 2023 7:56 PM # 
yurets:
Wow, we discovered non-transitivity of the "beats" relation... time to prove a version of Arrow's Theorem for orienteering rankings
May 11, 2023 10:51 PM # 
jjcote:
I'm not sure, but I think there may be no dropping of races during the iteration process, only at the final step where you extract the published score for each runner. But what to do with DNFs is a good question. You can argue that a runner deserves a low score for not completing a course. But it doesn't make sense to be averaging in some artificial large time when assessing the difficulty of a course. The "would have medaled but didn't punch at the water stop" situation which has occurred several times recently is the prime example of that. Those shouldn't figure into the calculation (I don't know if they are).
May 12, 2023 11:02 PM # 
bill3:
I was the OUSA Rankings coordinator from 1993 to 2002 and spent many hours watching datasets converge. The rankings were calculated in Excel using a macro. I know some things have changed since I stopped doing the rankings, but I still have a pretty good recollection of what I was doing with the calculations at the time.

First of all, I can tell you that everything discussed here is mostly true. Races were not dropped until the iteration process was over. However, non-finishes (DNF, MSP, DSQ) were assigned 20 min + the course time limit (so usually 200 minutes), but those scores were NOT included in the iterations, as they didn't offer a useful point for the calculation. For that same reason, I switched from using the harmonic mean in the first few years to using the 40th percentile to calculate the CD (Course Difficulty). While the harmonic mean would give a lower weight to someone with an abnormally slow time compared to their ranking, the 40th percentile would not give any weight to an abnormally slow time. Using the 40th percentile didn't seem to make the rankings any less stable.

But still, many ranking datasets would not converge at all because of the lack of connectivity or overlap of competitors. (As an aside, I always wondered if you could mathematically describe connectivity as a single number and relate it to whether a dataset would converge or not.) To deal with these unconverging datasets, I instituted a rule where I would not include a race with fewer than 5 competitors (they would be given credit days, which I believe are no longer part of the current ranking system). This would solve most of the non-convergence issues, but often it did not solve the issue for the White and Yellow courses, because of the low numbers of competitors in those classes and the lack of traveling to multiple races around the country. I was under pressure, though, to make sure juniors in the M/F-12 and M/F-14 categories got rankings, so sometimes I would just run 10 iterations, stop, and call it good; I didn't know what else to do in those cases.

Anyway, this idea that poorly-connected data sets lead to inaccurate rankings is definitely a real problem in my opinion, and I've seen lots of published rankings in the last 20 years that look suspect. But I think this ranking system is the best that we can do with the data that we have.
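On my aside about a single connectivity number: graph theory does offer a candidate, the "algebraic connectivity" (the second-smallest eigenvalue of the Laplacian of the who-raced-with-whom graph). It is exactly zero when the dataset is disconnected and near zero when it is barely connected, which sounds a lot like the datasets that wouldn't converge. A sketch with made-up data (not the old Excel macro):

    import numpy as np

    races = {
        "east1": [("ann", 55.0), ("bob", 63.0)],
        "east2": [("ann", 48.0), ("bob", 52.0)],
        "west1": [("cat", 61.0), ("dee", 70.0)],
        # "bridge": [("bob", 50.0), ("cat", 57.0)],  # uncomment to connect the coasts
    }

    runners = sorted({r for e in races.values() for r, _ in e})
    idx = {r: i for i, r in enumerate(runners)}
    A = np.zeros((len(runners), len(runners)))
    for e in races.values():
        names = [r for r, _ in e]
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0   # shared a race

    L = np.diag(A.sum(axis=1)) - A               # graph Laplacian
    fiedler = np.linalg.eigvalsh(L)[1]           # eigenvalues come back ascending
    print(fiedler)   # 0.0 as written; goes positive once the bridge race is added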
May 13, 2023 2:31 AM # 
jjcote:
I'll note that in the original ranking system, White and Yellow courses were just given a flat, arbitrary Course Difficulty, always the same. (Or something like that.)(IIRC)
May 18, 2023 5:10 AM # 
ebone:
The IOF ranking system is very different from the OUSA system. The IOF system accounts for the spread in times, so that a race with more tightly clustered times (due to relatively easy terrain or course setting) can still produce a wide distribution of ranking points. The OUSA system, on the other hand, has no way of accounting for course difficulty and the corresponding results spread. Thus, courses where times are more spread out tend to generate a very wide point spread between the top runners and the back-of-the-pack finishers. This usually seems like a virtue, because it effectively gives more weight to performance in tougher races, but it can also be a vice when a very good performance in a sprint race is nonetheless not rewarded with commensurately high ranking points, simply because the fast terrain produced less spread in the finish time distribution. I'd be interested in whether others have observations regarding the strengths or predictive power of one or the other system. I guess this could be tested by feeding historical results data from IOF elite races into the OUSA system and seeing whether the resulting rankings are more or less closely correlated with subsequent race results than the IOF rankings were.
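To make the contrast concrete, here is a toy comparison (an illustration of the two principles only; neither formula is the real OUSA or IOF calculation):

    from statistics import mean, stdev

    sprint  = [13.0, 13.2, 13.4, 13.6, 13.8]    # tightly clustered times (minutes)
    classic = [75.0, 82.0, 89.0, 96.0, 103.0]   # widely spread times

    def ratio_points(times):
        # OUSA-flavored: points scale like a course constant over time
        # (here, crudely, the harmonic mean of the times)
        cd = len(times) / sum(1.0 / t for t in times)
        return [round(50.0 * cd / t, 1) for t in times]

    def spread_points(times):
        # IOF-flavored: points per standard deviation faster than the mean
        mu, sd = mean(times), stdev(times)
        return [round(50.0 + 10.0 * (mu - t) / sd, 1) for t in times]

    print(ratio_points(sprint), ratio_points(classic))    # tiny spread vs. wide spread
    print(spread_points(sprint), spread_points(classic))  # identical spread in both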
May 18, 2023 12:02 PM # 
jjcote:
I haven't looked at the IOF system in a long time, but my recollection is that it has other very deep shortcomings.
Jun 18, 2023 3:21 PM # 
CompassCoyote:
(Warning: Interesting, but ultimately not very useful, topic on the subject of ranking system 'shortcomings.')

Does anyone have any insight as to how the IOF, ergo OUSA, age classes were determined (to what extent are the classes not mostly arbitrary)?

It seems highly unlikely that orienteering success naturally falls into the 21+ then multiple of five year after 35 structure, convenient and brain-friendly though it is.

I've anecdotally observed that the top finishers in the next older class than mine are often faster than those in my class, and I can show (in the crudest form) that, within the male subclasses M50+ through M60+ (all on Green courses), Age and Score are moderately positively correlated.

The obvious problem is that there is too much Score variability within too small a sample size (esp so within the Age subsets comprising the same individuals over time) to really make a scientific assessment, but the implication is that the groupings may be more arbitrary than we'd like to imagine they are. I.e., being ranked nth in Mxx+ doesn't really mean anything if there's nothing demonstrably unique about the Mxx+ class.

Given sufficient data and allowance for course length we might discover age and sex performance clusters that naturally exist in the data and be able to propose a new set of "Individual Championship Classes" based on those clusters. The discovery of natural clusters might reveal (total speculation) that there really ought to be a M27+ class, that M45+ through M55+ should be combined to a single class, or that F43-49 and M51-60 should be a single class, etc. The same data might theoretically generate a Sex and Age handicapping formula.
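For illustration, the cluster hunt could start as bluntly as k-means on (age, pace) pairs; everything below is synthetic data, just to show the shape of the analysis:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    age = rng.uniform(35, 75, 300)
    pace = 5.0 + 0.05 * age + rng.normal(0, 0.6, 300)   # fake min/km, slowing with age

    # standardize both columns so age doesn't dominate the distance metric
    X = np.column_stack([(age - age.mean()) / age.std(),
                         (pace - pace.mean()) / pace.std()])
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    for k in range(4):
        grp = labels == k
        print(k, round(age[grp].mean(), 1), round(pace[grp].mean(), 2), int(grp.sum()))

If the recovered cluster boundaries landed nowhere near the existing five-year cut points, that would be the interesting result.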

Another approach would be to (tenuously) assume that there's no relationship between age and the mental parts of orienteering, forsake actual orienteering data, and instead look at the mass of data from thousands of 5k foot race times, and use that data to derive self-evident performance clusters.

The result might mean fewer (or more) awards to give out at the next NRE. :-D
Jun 18, 2023 4:25 PM # 
Cristina:
It would be interesting to see what such clusters exist, but the source data would have to have lots of instances of all people actually running their actual age class (or at least course). I guess WMOC would have that? At US NREs, plenty of stronger orienteers run 'up' to a younger age group, weakening any conclusions from the existing natural data. The best M35 runners may have lower scores than the best M40 runners, but is that because the fastest 35-39 year old men are still running in M21? At WMOC I suppose the problem is that you don't get the best runners in the younger age groups as they may find other competitions more interesting.

It is an interesting question, and I suspect five year age groups in orienteering (at least for the first few decades of "masters" competition) would not really be justified by the data.
Jun 18, 2023 4:37 PM # 
yurets:
This phenomenon IMO deserves a special name, to be studied by the sport science.
I would name it as "JJ Paradox in Orienteering", when someone becomes faster and faster with age...has to be due to a special secret diet.
Jun 18, 2023 5:32 PM # 
gordhun:
The person to ask about the origin and evolution of orienteering age classes in America is Hans Bengtsson. After all, he has competed in all of them.
My memory might be playing tricks, but I seem to remember that originally the age class divisions were 35-43-50 and perhaps older, but instead of being called by age they were the 'veterans', 'old boys' and 'younger old boys', or something like that. That was following the Scandinavian example.
Then, also following the IOF, we switched to 5-year age groups (Canada 10 years). There were and continue to be two reasons for running other than our normal age class. One is to get a better course challenge. For instance, recently at a Canadian regional competition there were two entrants in M75. In Canada that class gets lumped with those over 80, which, given the depth and quality of current M75 orienteers in Canada, is a ridiculous place to have them. They both elected to run with the M65 category (and finished in the middle of the field).
The other reason to run out of category has been that many old guys have found it more reasonable to compete against younger guys than against Peter Gagarin and perhaps other superstars.
Jun 18, 2023 7:38 PM # 
jjcote:
The reason to have age categories is so that you can get everybody out in the woods with interval starts in a reasonable amount of time. This assumes that the age categories all have their own distinct courses. But in the USA, we don't have enough people for that to be an issue, and we only have multiple courses to provide different lengths. For that reason, I essentially refuse to recognize the five year groupings, and as far as I'm concerned, all men over 35 for example who are running Red are really competing against each other. And I therefore sign up for M35 despite the fact that I'm old enough to run in a different Red category. (And I would be quite pleased if everyone else were to do likewise.)

As for getting faster with age, there's something to be said for continuing to get smarter.
Jun 18, 2023 9:14 PM # 
Mr Wonderful:
Source cited

I always thought of a classic race at my speed as sort of a half marathon, so using that as a surrogate, behold, we see times start to increase from the "fast person well" right around... age 35. You are welcome to quibble that the slope of decline is not steep enough to justify 5-year groups, but some grouping seems reasonable. E.g., 35->40 is only a two-minute slowdown, but 35->45 is a six-minute slowdown, and now you are outside of the target winning time window.

Jun 18, 2023 9:22 PM # 
jjcote:
But for the M35, M40, M45 group in the USA, who all run the same course at national-level events, it's not uncommon for the best time to belong to someone who is in fact over 50.
Jun 18, 2023 11:57 PM # 
blairtrewin:
Presumably, though, a lot of the actual M35s/40s are running M21? Agree that WMOC would be a good data set to explore this, as almost everybody runs their age group and the standard is relatively consistent from year to year.
Jun 19, 2023 1:14 AM # 
jjcote:
For sure some of the best M35s are running M21. But it's still the case that the M50s are often faster than the M45s.
Jun 19, 2023 1:54 AM # 
yurets:
At the age of 45 males did not quite mature yet, so cannot handle the brutality of a good orienteering course. At 50 --- it is finally time to prove yourself, "now or never"

In reality, it is only because ridiculously few people go orienteering in America.
At a major event elsewhere, if a top M50 moves to M45, one could still probably get to the A-final, but not much more.
Jun 19, 2023 4:33 AM # 
jjcote:
That's quite true. What makes sense in places where orienteering is popular doesn't necessarily make sense here. I often hear the justification for something as "But that's how they do it in Sweden!" (even though that's not always even true).
