in: Orienteering; General

May 10, 2023 12:35 AM
#

Can someone help fill in the gaps in the wording on OUSA's ranking methodology? Each competitor starts with 50 points, which gets multiplied by their time in minutes to get their first Personal Course Difficulty. Then what? How are these Personal Course Difficulties used to update the competitors' points in the iterative process?

May 10, 2023 12:59 AM
#

Working from memory here...

The Personal Course Difficulties* are merged together using a harmonic mean to get the overall Course Difficulty. The course difficulty is then divided by each competitor's time to get a new number of ranking points (substituting for the initial 50). Lather-rinse-repeat until the numbers stop changing (there's some threshold of change it has to get below).

(Oh, that agrees with what's in the link above, so that's good.)

( * "Personal Course Difficulty" is a boringized version of the original term, which was Personal Gnarliness Value.)

(And it was not always thus. There were old methodologies that got improved upon, until we came to the current one. 35 years ago, it was all done with paper and a calculator.)

May 10, 2023 1:05 AM
#

Thanks! But once the ranking points are updated, you also need to update the Personal Course Difficulty, right? Do you recall how that is done? Initially it's by taking the product of running time with ranking points, but if that's done after the first step then they all evaluate to the same number: the prior step's Course Difficulty.

May 10, 2023 1:20 AM
#

Oh, right, the critical step: what you say would be true if there were only one race. But instead you average each person's ranking points from all of their races during the time period in question (typically the past year), and *then* you crank through the calculation again. There are some details about exactly how many races count (maybe they all count during the iteration, but not for the final step after you've reached convergence?). And I don't recall what's currently done for things like DNF, that's something that has changed over the years.
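The loop described above can be sketched in a few lines of Python. This is a toy illustration, not the OUSA implementation: the race and runner names, the convergence threshold, and the dataset are all invented, and real-world details (race dropping, DNFs, the ranking-period window) are omitted.

```python
from statistics import harmonic_mean

# Toy results: race -> {runner: time in minutes}. All names invented.
races = {
    "meet1": {"ann": 40.0, "bob": 50.0, "cat": 60.0},
    "meet2": {"ann": 35.0, "cat": 45.0, "dan": 55.0},
    "meet3": {"bob": 42.0, "dan": 48.0},
}

def rank(races, seed=50.0, tol=1e-9, max_iter=1000):
    runners = {r for result in races.values() for r in result}
    points = {r: seed for r in runners}              # everyone starts at 50
    for _ in range(max_iter):
        per_race = {r: [] for r in runners}
        for result in races.values():
            # personal course difficulty = current points x time
            pcds = [points[r] * t for r, t in result.items()]
            cd = harmonic_mean(pcds)                 # overall course difficulty
            for r, t in result.items():
                per_race[r].append(cd / t)           # this race's ranking points
        # average each runner's ranking points over all of their races
        new_points = {r: sum(v) / len(v) for r, v in per_race.items()}
        done = max(abs(new_points[r] - points[r]) for r in runners) < tol
        points = new_points
        if done:
            break
    return points

pts = rank(races)
```

With this data, ann (the fastest in her races) ends up with the most points, as expected.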

May 10, 2023 4:48 PM
#

I spent about a year trying to figure this out. I think I finally did. But it would take another year to be able to explain it better than: it basically works as well as any other method.

May 10, 2023 9:02 PM
#

It sounds like the only thing I'm missing is how the new iteration of ranking points updates course difficulty, because from there I think I understand how that updates ranking points. It's somehow done using prior rankings?

May 10, 2023 9:57 PM
#

"Prior" in the sense that it's the points from the previous iteration. Everybody starts with 50, and after you do the calculation as described, the faster people have more points and the slower people have lower point values. Then you do it again, and they all move a bit more. Etc. At each step, you multiply each person's new ranking points by their time, and harmonic mean those PGVs together to get a new Course Difficulty.

In days of yore, the whole thing was seeded with the previous year's rankings, and there was no iteration, it was just one step. Then when we got enough computing power to iterate until it was stable, there were experiments to make sure that it didn't depend on the initial conditions. One of those was to give one person 100 points and everybody else one point. Another was to start everybody at 50 points. They all converged to the same answer.

May 10, 2023 10:29 PM
#

Ah ok, I think I get it. So the ranking points you multiply by time are not just the ranking points from the last step; they're the average of last step's points with the ranking points accrued from that runner's races over the last year. That number then gets multiplied by the runner's time to get a slightly updated PGV.

May 11, 2023 1:09 AM
#

but this just a preliminary step, before we tally in our privilege points, right?

May 11, 2023 1:25 AM
#

Danny: right, but... it's not just this new race that's getting recalculated, but rather __all__ of the races in the past year. So in principle, for the rolling rankings, every time a new race happens, or one ages out, the Course Difficulties for all of the races could change, because the whole calculation starts from scratch. And in fact they probably do, but not by much, assuming the data set is large and well-connected. (I could dream up pathological cases where a particular lightly-attended race would shift significantly, but that's not very likely to happen.)

yurets: the way things were calculated many years ago, your comment wouldn't have been entirely off-base...

May 11, 2023 1:50 AM
#

That's actually really interesting. I don't have any experience using harmonic means, but it seems remarkable that numerical tests show that the iterative method converges to the same number regardless of starting value! Is there any published research about these properties? Does this general schema have a name so that I could look more into it? Or is this a homebrewed method that only we use?

May 11, 2023 2:13 AM
#

>>Does this general schema have a name

Contraction Mapping/ Fixed Point Theorems,

any topology text

May 11, 2023 3:19 AM
#

A historical note: prior to the use of the harmonic mean, the method was to take the PCDs, sort them, and use the 40th-percentile value (there was a rationale for doing this instead of using the median). That dated back to before iteration. The harmonic mean was a better approach, but the people who had proposed the improvements over the really early method (where only the top three results were considered) didn't think of it. The 40th-percentile method wasn't always stable under iteration; sometimes it oscillated. I'd be somewhat surprised if any reasonable approach could converge to different values based on initial conditions, but instability is a different matter. A good process will neither oscillate nor diverge.
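The robustness difference between the two methods can be seen with a toy example (made-up PCD values; the exact index rule for the old 40th-percentile method is a guess on my part):

```python
from statistics import harmonic_mean, mean

def percentile_40(values):
    # Old-style CD: sort the PCDs and take the one 40% of the way up
    # the field. The precise interpolation rule used back then is a guess.
    s = sorted(values)
    return s[round(0.4 * (len(s) - 1))]

pcds = [950, 1000, 1050, 1100, 1150]
with_outlier = pcds + [5000]          # one abnormally slow run

# The 40th percentile ignores the outlier entirely; the harmonic mean
# moves, but far less than an arithmetic mean would.
print(percentile_40(pcds), percentile_40(with_outlier))   # both 1050
print(harmonic_mean(pcds), harmonic_mean(with_outlier))
print(mean(pcds), mean(with_outlier))
```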

(Oh god, I'm having flashbacks to a class that I took in 1982 that involved drawing a lot of root-locus diagrams. I dropped that class.)

May 11, 2023 11:39 AM
#

Hi, we're the sport that:

-uses hieroglyphics (clue sheets)

-stares at barely decipherable brightly colored pieces of paper.

-runs full speed through thickets. Or tries to.

-determines our best by a simple iterative process of convergence of the harmonic mean of the personal gnarliness value.

Definitely on track to take over the sporting world...

But, it's the weekend of the Billygoat race, so all is well in the world...

May 11, 2023 3:06 PM
#

These are from my notes:

To get a score, they (basically) do the following: Give you a daily ranking score. Adjust it for Course Difficulty. To do this, they need to assign a "Personal" Course Difficulty score to everyone and eventually calculate an overall Course Difficulty from it.

How is this done….okay. Let me try.

1. The equation is one that is run over and over again until the results become static. It is iterative.

2. Essentially it compares you against the field and adjusts your average, which adjusts the field average, which adjusts your ranking points, until everything becomes static.

Use the value 50 for initial ranking points if no existing score exists for a competitor. Run an equation like the one below through the computer. The key to the equation is to get the Course Difficulty rating and divide it by your time. How they get that is a bit of a process: basically, they try to gauge how difficult the course was for you individually by pseudo-comparing the entire field against their expected results. See the Disney example below for the idea and the math as I understand it.

EQUATION FOR PCD: Ranking Points x Time

Mickey Mouse - 50 x 19.15 min = 957.5 (new competitor) PCD score, initial

Donald Duck - 50 x 26.10 min = 1305 (new competitor) PCD score, initial

Goofy - 79.25 x 28.30 min = 2242.8 (existing competitor) PCD score, initial

Cinderella - 32.20 x 42.80 min = 1378.2 (existing competitor) PCD score, initial

Aladdin - 50 x 114.75 min = 5737.5 (new competitor) PCD score, initial

Next, find the initial CD for everyone.

Calculate the first CD (Course Difficulty) using the harmonic mean: 5 / ((1/A) + (1/B) + (1/C) + (1/D) + (1/E)) => about 1584.06

Next, find the new ranking points:

New Ranking Points = CD / your time

Mickey Mouse - 1584.06 / 19.15 = 82.72 points (a quick logic check: this makes sense, since Mickey had the fastest time, so his points should rise in the first iteration).

Next, recalculate everyone's second-round PCD using the new ranking points:

Mickey Mouse = 82.72 x 19.15 min = 1584.1. So Mickey started with a 957.5 PCD, and after the first iteration it rose to about 1584, the Course Difficulty itself. His initial ranking points of 50 rose to 82.72. Mickey is the man...errr, mouse!

Repeat for all competitors and then recalculate the harmonic mean. (In this single-race example everyone's second-round PCD equals the CD, so a second pass changes nothing; in the real system each runner's new points are first averaged with their points from other races in the ranking period, and that averaging is what drives further change.) Run the calculation again and the numbers shift by a touch less each pass, on and on until change no longer occurs. The final answer is the one with no change.
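The worked example above can be checked in a few lines, using Mickey's 19.15 min time throughout (exact recomputation shifts the figures slightly from rounded quotes). Note how every second-round PCD collapses to the CD: with a single race and no averaging against other races, the process reaches its fixed point immediately, as discussed earlier in the thread.

```python
from statistics import harmonic_mean

# The Disney example data from the post above.
times  = {"Mickey": 19.15, "Donald": 26.10, "Goofy": 28.30,
          "Cinderella": 42.80, "Aladdin": 114.75}
points = {"Mickey": 50.0, "Donald": 50.0, "Goofy": 79.25,
          "Cinderella": 32.20, "Aladdin": 50.0}

pcds = {r: points[r] * times[r] for r in times}          # PCD = points x time
cd = harmonic_mean(list(pcds.values()))                  # course difficulty
new_points = {r: cd / times[r] for r in times}           # updated ranking points
new_pcds = {r: new_points[r] * times[r] for r in times}  # all equal to cd
```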

-Except this is not completely how it is done. There are plenty of things in the program that deal with specific situations and strange one-offs.

Back to the simplification: run fast, score well. Compare times against others in your age category. Do at least 4 NRE’s to get points.

Hope this helps and is correctish:).

May 11, 2023 4:39 PM
#

I guess I should clarify as a competitor I'm not worried at all about its efficacy. I'm simply asking as a math student who has worked with evaluating ranking systems before because this process is just downright interesting to me. Understanding the exact methodology here helps me have a more complete picture of the types of algorithms used in generating rankings, a topic which I am casually researching.

soonerjcb: your process sounds like it aligns with what J-J was saying, except for when you iterate. If you were to now compute ranking points x course time per competitor, you would recover a PCD exactly equal to the previous step's total CD, and so nothing would change. Averaging the new preliminary ranking points from this race with the ranking points from past races is what allows the system to keep changing incrementally.

J-J: I certainly agree that good processes shouldn't diverge or oscillate, but it's not clear to me what exactly is making *this* process converge, hence my interest! Thanks for breaking it down for me; I'll be able to make and test statements about the algorithm's properties now.

May 11, 2023 6:16 PM
#

feet:

@cmpllj: we could make the rankings simpler if only we agreed to standardize the courses, for example, Start-5000m marked route-Finish. Seems a small price to pay.

Face it, orienteering in the U.S. is already for nerds with aerobic ability (aeronerds? nerdrobics?) so we may as well lean in and have a ranking system that's worth proving theorems about.

Danny: I confess I'm kind of hoping for a counterexample to convergence. The proof, if it is robust, will probably indeed be a matter of a small tweak to a familiar fixed-point theorem. Much more interesting if there's a weird case on the edge of the parameter space that breaks it. My intuition is that something weird could happen when the competitors' rankings are updated given that some races are dropped - problems seem like they could arise when each competitor has different worst races. (For example, take a case where there are two different groups of runners, A and B, and only one race where there are people from multiple groups, and in that race there is only one person from A and one from B.* Trivial observation is that the relative rankings of runners from A vs. runners from B is crucially dependent on the result of that race. Now imagine that that race gets dropped from the rankings calculation for at least one of those runners. Seems like weird things could happen.)

*Or duplicate everybody so there are four runners in this race, to meet the minimum runner count required for races to be included.

May 11, 2023 7:13 PM
#

Yeah, the biggest threat to numerical stability is a poorly-connected dataset. East coast and west coast, and only a few people make the trip. Or an oddball race that almost nobody attends except locals. If you had two disjoint populations, the best they could do would be to converge on two independent solutions that don't really have any relation to each other, even though people might assume that they do. If you have one strong runner who ventures to the other side just once and has a bad run (or vice versa), that can drive the two populations apart.
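That disconnection can be seen directly in a simplified sketch (toy data, invented names; seed points, harmonic-mean CD, per-runner averaging, and none of the real system's prior-year data or normalization). Because every step scales linearly with the points, a disconnected group's overall level is set entirely by its seed, so the two groups' converged numbers say nothing about each other:

```python
from statistics import harmonic_mean

def iterate(races, seeds, n_iter=200):
    # a fixed number of passes of the PCD -> CD -> points loop
    points = dict(seeds)
    for _ in range(n_iter):
        per_race = {r: [] for r in points}
        for result in races.values():
            pcds = [points[r] * t for r, t in result.items()]
            cd = harmonic_mean(pcds)
            for r, t in result.items():
                per_race[r].append(cd / t)
        points = {r: sum(v) / len(v) for r, v in per_race.items()}
    return points

# Two completely disjoint populations: nobody races across the divide.
races = {
    "east1": {"e1": 40.0, "e2": 50.0},
    "east2": {"e1": 35.0, "e2": 44.0},
    "west1": {"w1": 41.0, "w2": 49.0},
}

a = iterate(races, {"e1": 50.0, "e2": 50.0, "w1": 50.0, "w2": 50.0})
b = iterate(races, {"e1": 50.0, "e2": 50.0, "w1": 100.0, "w2": 100.0})
# east scores come out identical in both runs; west scores simply double
```

In other words, in this sketch the east and west point scales are independent free parameters, which is exactly why comparing them across a poorly-connected divide is meaningless.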

May 11, 2023 7:30 PM
#

feet:

Actually, here's a better kind of problematic example. Imagine there are three groups of runners: A, B, and C, who entirely race among themselves. Choose three individual runners, one from each group, called a, b, and c.

Then add three races

Race ab, in which a beats b.

Race bc, in which b beats c.

Race ca, in which c beats a.

And adjust the margins in these races so that they either count in or get dropped from the ranking scores for a, b, or c. This will crucially affect the assessment of which group is best.

Or variants of this kind of idea.

(I still think that 'drop the worst ranking scores of some runners' is the most likely source of a problem in numerical stability, since which races those are depends on the race difficulty scores, so this could generate some oscillation.)

My feeling is that in realistic data sets, dropping some races is a good thing for generating sensible rankings, because it means that someone who makes a 30-minute mistake once but is otherwise excellent does not get assigned a low ranking score; otherwise the runners they beat when running well would be unfairly penalized. So, to be clear, I think the real-world system SHOULD have this feature. But it seems like a potential source of oscillation in poorly-connected data sets.

May 11, 2023 7:33 PM
#

I suspect that negative times would break the convergence. That won't bother orienteers, but it might trouble passing mathematicians. And convergence isn't enough; you need to have only one fixed point.
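The negative-time failure can be seen directly. Here's a hand-rolled reciprocal-sum mean fed one fabricated negative time; for what it's worth, Python's own statistics.harmonic_mean refuses negative data outright:

```python
from statistics import harmonic_mean, StatisticsError

def naive_hmean(xs):
    # reciprocal-sum harmonic mean with no input validation
    return len(xs) / sum(1 / x for x in xs)

pcds = [1000.0, 1200.0, -250.0]   # last PCD from a (nonsense) negative time

print(naive_hmean(pcds))          # a negative "course difficulty"

try:
    harmonic_mean(pcds)
except StatisticsError:
    print("statistics.harmonic_mean rejects negative values")
```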

In the UK we have an algorithm that excludes "silly scores" (like when you turn up an hour late for your start time and they won't adjust it). That did produce instabilities where a score alternates between silly and not-silly on each iteration.

May 11, 2023 7:56 PM
#

Wow, we've discovered non-transitivity of the "beats" relation... time to prove a version of Arrow's Theorem for orienteering rankings.

May 11, 2023 10:51 PM
#

I'm not sure, but I think there may be no dropping of races during the iteration process, only at the final step where you extract the published score for each runner. But what to do with DNFs is a good question. You can argue that a runner deserves a low score for not completing a course. But it doesn't make sense to be averaging in some artificial large time when assessing the difficulty of a course. The "would have medaled but didn't punch at the water stop" situation which has occurred several times recently is the prime example of that. Those shouldn't figure into the calculation (I don't know if they are).

May 12, 2023 11:02 PM
#

I was the OUSA Rankings coordinator from 1993 to 2002 and spent many hours watching datasets converge. The rankings were calculated in Excel using a macro. I know some things have changed since I stopped doing the rankings, but I still have a pretty good recollection of the calculations at the time. First of all, I can tell you that everything discussed here is mostly true. Races were not dropped until the iteration process was over. However, non-finishes (DNF, MSP, DSQ) were assigned 20 min + the course time limit (so usually 200 minutes), but those scores were NOT included in the iterations, as they didn't offer a useful point for the calculation.

For that same reason, I switched from using the harmonic mean in the first few years to using the 40th percentile to calculate the CD (course difficulty). While the harmonic mean would give a lower weight to someone with an abnormally slow time relative to their ranking, the 40th percentile would not give any weight to an abnormally slow time at all. Using the 40th percentile didn't seem to make the rankings any less stable. But still, many ranking datasets would not converge at all because of the lack of connectivity, or overlap of competitors. (As an aside, I always wondered if you could mathematically describe connectivity as a single number and relate it to whether a dataset would converge or not.)

To deal with these unconverging datasets, I instituted a rule that I would not include a race with fewer than 5 competitors (those competitors would be given credit days, which I believe are no longer part of the current ranking system). This solved most of the non-convergence issues, but it often did not solve them for the White and Yellow courses, because of the low numbers of competitors in those classes and the lack of traveling to multiple races around the country.

I was under pressure, though, to make sure juniors in the M/F-12 and M/F-14 categories got rankings, so sometimes I would just run 10 iterations, stop, and call it good; I didn't know what else to do in those cases. Anyway, this idea that poorly-connected datasets lead to inaccurate rankings is definitely a real problem in my opinion, and I've seen lots of published rankings in the last 20 years that look suspect. But I think this ranking system is the best we can do with the data that we have.

May 13, 2023 2:31 AM
#

I'll note that in the original ranking system, White and Yellow courses were just given a flat, arbitrary Course Difficulty, always the same. (Or something like that.)(IIRC)

May 18, 2023 5:10 AM
#

The IOF ranking system is very different from the OUSA system. The IOF system accounts for the spread in times, so that a race with tightly clustered times (due to relatively easy terrain or course setting) can still produce a wide distribution of ranking points. The OUSA system, on the other hand, has no way of accounting for course difficulty and the corresponding results spread. Thus, courses where times are more spread out tend to generate a very wide point spread between the top runners and the back-of-the-pack finishers. This usually seems like a virtue, because it effectively gives more weight to performance in tougher races, but it can also be a vice when a very good performance in a sprint race is not rewarded with commensurately high ranking points, simply because the fast terrain produced less spread in the finish-time distribution. I'd be interested in whether others have observations regarding the strengths or predictive power of one system or the other. I guess this could be tested by feeding historical results from IOF elite races into the OUSA system and seeing whether the resulting rankings correlate more or less closely with subsequent race results than the IOF rankings did.
