Attackpoint - performance and training tools for orienteering athletes

Discussion: Rankings

in: Orienteering; General

Mar 29, 2016 12:56 AM # 
rm:
Since ranking system calculations have become a minor "thing" today on AttackPoint, I was wondering whether anyone had considered using "match-ups": for each race, each competitor gets a match-up number against every other competitor on the same course, calculated from the ratio of the two competitors' times multiplied by the other competitor's most recent ranking points. A competitor's ranking for the year is then the median of all of their match-up points for the year.

The advantage over other common ranking methods would be the larger sample size. If one competes ten times a year, and each competition has eleven competitors on one's course, one has a hundred match-up numbers to take the median of at the end of the year. Many conventional ranking systems calculate similar formulas, but then reduce them to a single number for each race (reducing ten numbers to one in my example), and then reduce the race numbers to a single number for the year (again ten to one). That double reduction means working with far fewer samples at each step, which is generally worse statistically than working with one large sample. With a match-up approach, one could even use a course on which only two competitors ran (as long as one of them were ranked), because what matters is the total number of match-ups in the year, not the number in any individual race. One could also rank a competitor who raced only a few times in the year, at very large races.
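A minimal sketch of what one match-up and the year-end median might look like (Python; the direction of the time ratio and the point scale are my assumptions, not anything AttackPoint or OUSA actually uses):

```python
from statistics import median

def matchup_points(own_time, other_time, other_ranking_points):
    # One match-up: the ratio of the other competitor's time to one's own,
    # multiplied by that competitor's most recent ranking points.
    # (Assumed orientation: beating a competitor yields more than their points.)
    return (other_time / own_time) * other_ranking_points

def yearly_ranking(matchups):
    # matchups: all (own_time, other_time, other_ranking_points) triples
    # accumulated across every race of the year.
    points = [matchup_points(own, other, rank) for own, other, rank in matchups]
    return median(points) if points else None
```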

One could use weightings to emphasize major competitions (which competitors are presumably taking more seriously) or more recent events, and so forth. Presumably the rankings would be rolling. Or one could use standard deviations to compensate for technical versus fast races. Etc. But my main ponder is about the overall approach.

Anyone with statistics knowledge able to shoot this down, or otherwise comment? Or thoughts from rankings folk, or anyone else?
Mar 29, 2016 1:04 AM # 
jjcote:
I'll suggest that the biggest obstacle to this sort of thing is data entry. However, the database should be available from OUSA (talk to Valerie). Then spend an hour or two coding some evening and see whether what you get looks good.
Mar 29, 2016 1:05 AM # 
rm:
The motivation, to clarify, is that often in orienteering (outside Europe) one is dealing with smallish numbers of competitors per race, and smallish numbers of races per year, with which to calculate rankings. Sometimes competitors get no ranking points at all because only a few others showed up, or because, although they got to a couple of major events, they didn't meet the minimum number of events. This approach wouldn't do much to fix sets of results that didn't overlap much (such as parts of a continent separated by large distances that only handfuls of competitors cross), but would (I think?) help with the overall number-of-samples issue.
Mar 29, 2016 1:08 AM # 
rm:
OK, a project for some snowed-in day.

Yes, data entry can be a problem for rankings. It's great that RMOC manages to get results uploaded to an online database, and that AttackPoint seems to get lots of results uploaded. I remember, in some past position, trying to get all clubs to submit results.
Mar 29, 2016 1:20 AM # 
tRicky:
Our biggest problem is that many of the races I do on the east coast (of Australia) - where there are much larger numbers competing - don't make it into AP. Our west coast ones have a higher probability because we have more than one person (that uses AP) putting results into Eventor but then we often don't have enough people on the course (i.e. it's not unusual to have just four or five people running on the course I run).
Mar 29, 2016 1:27 AM # 
rm:
My hope is that this would improve things for those four or five participant courses, so long as one participated in enough such events, or in several such events and one large one.

One thing I've noticed that inhibits posting official post-race results is a desire to check them over first. The larger the event, the greater the chance of oddities one wants to check before posting, and thus the longer the delay, and the greater the hesitancy to commit oneself by publishing. I dunno if this is still a factor, but it seemed to be long ago.
Mar 29, 2016 1:46 AM # 
jjcote:
ernst would be a good person to talk to about this.
Mar 29, 2016 2:26 AM # 
cmpbllv:
For local events, QOC results are AP-ready by the time we finish tearing down, and sometimes posted to AP from the venue itself if we have wifi. The results crew just has to be diligent about checking results as they come in, but it usually slows down enough at the end that we can clean up irregularities. And if there's a change, it's pretty easy to re-post.
Mar 29, 2016 4:57 AM # 
origamiguy:
One thing we (usually) do before posting is reclassify some of the MPs as DNFs. The epunch software generally uses the international definitions of these, while the US rules are different. I don't know if that makes a difference for rankings, though.
Mar 29, 2016 5:27 AM # 
MChub:
First of all, "double reduction" does not necessarily lead to more noise. Suppose you have 100 data values and want to calculate their average. Whether you simply take the average of these 100 numbers, or divide them into 10 groups of 10 numbers in each and calculate the average in each group first and then the average over the groups, the end answer is exactly the same and thus the standard deviation (or the error) of the answer is also the same.

Now, the situation is a bit different if the groups are unequal. If there are two groups, with 99 numbers and 1 number, i.e., you average the first 99 numbers and then take the average of the result and the remaining number, then, obviously, the end result will be much noisier than if you simply averaged all 100 numbers in one step. So your scheme produces less noise if the sizes of the competitions are very unequal (very different numbers of competitors). However, I think that just giving less weight to smaller competitions in existing ranking schemes would produce essentially the same result.
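To illustrate the two cases with synthetic numbers (nothing orienteering-specific, just a quick check of the averaging argument):

```python
import random

random.seed(1)
data = [random.gauss(0, 1) for _ in range(100)]

overall = sum(data) / len(data)

# Ten equal groups of ten: the mean of the group means equals the overall mean.
groups = [data[i:i + 10] for i in range(0, 100, 10)]
equal_groups = sum(sum(g) / len(g) for g in groups) / len(groups)

# Two very unequal groups (99 values and 1 value): the lone value gets half
# the weight, so this estimate is far noisier than the plain overall mean.
unequal_groups = (sum(data[:99]) / 99 + data[99]) / 2

print(overall, equal_groups, unequal_groups)
```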

By the way, in existing ranking schemes only a certain number of best races (four in the case of IOF rankings) are taken into account. How would you achieve a similar effect in your scheme?
Mar 29, 2016 5:36 AM # 
MChub:
By the way, I don't understand why everybody is so negative about IOF rankings. Are there any orienteers who participate in WREs regularly (so there is enough data to rank them), yet whose rankings are obviously too high or too low? Some complain that WOC and World Cup races having more weight disadvantages those who do not participate in those events, but does it really have a significant effect, given that nearly all top athletes participate in them anyway? Are there any other issues with IOF rankings besides this?
Mar 29, 2016 5:45 AM # 
tRicky:
I thought this was about AP rankings, not IOF rankings. I did do two WREs over the weekend and didn't do any good at either of them.
Mar 29, 2016 8:05 AM # 
MChub:
I saw a negative comment about IOF rankings in the Otter Creek course design thread (and recall many other similar comments) and thought this thread would be a better place to have that discussion.
Mar 29, 2016 12:55 PM # 
rm:
MChub: For my proposed rankings approach, one would need to set a minimum number of match-ups (say, forty over the year) in order to get a ranking. Since it might give less info if those all came from a single race (and thus a single run of the competitor), it would probably also make sense to require a minimum number of races, but perhaps a low number.

What you say about averages is true, although medians change things a bit, and some rankings systems use medians or percentiles. Yes, weighting races by number of competitors on the course could have much of the effect in practice.

(As an aside, one could also weight match-ups slightly by the number of results that the other competitor's rankings are based on, in order to give more weight to more "certain" rankings, or by the standard deviation in the other competitor's match-up points, to give more weight to match-ups with more consistent other competitors.)
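For instance, a weighted median is one way to fold such weights into the final step (the weighting choice below is purely illustrative, a made-up proxy for how "certain" the other competitor's ranking is):

```python
def weighted_median(values, weights):
    # Smallest value at which the cumulative weight reaches half the total.
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value

# Example: weight each match-up by how many results the other competitor's
# ranking is based on.
matchup_scores = [812.0, 790.0, 845.0, 770.0]
opponent_result_counts = [25, 4, 18, 2]
print(weighted_median(matchup_scores, opponent_result_counts))
```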

Yes, this thread is a fine place for rankings discussions; it works well to shift it here from the course design thread.
Mar 29, 2016 12:58 PM # 
acjospe:
Ok, I didn't read any of this thread except the first post, mostly to avoid getting angry about anything. But, attackpoint definitely needs Nemeses and Victims, a la crossresults.com. No reason to go beyond the attackpoint community, really.

It's certainly a nerdy enough audience to support something like that.
Mar 30, 2016 12:51 AM # 
TheInvisibleLog:
Expressing results as ratios between pairs of competitors will create an incomplete matrix. With enough results (i.e. a determinate matrix) you can use matrix algebra to scale the competitors. I did this years ago with a small paired-comparison matrix of consumer assessments of peach and nectarine flavours; it was the only way to compare early- and late-season varieties. It should probably work with orienteering data, if one could manage to automate restructuring of the data.
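One simple sketch of scaling competitors from an incomplete set of pairwise time ratios (an assumed least-squares variant; the classic paired-comparison methods referenced later in the thread use a different procedure):

```python
import numpy as np

def scale_competitors(observations, n_competitors):
    # observations: (i, j, log_time_ratio) triples, where log_time_ratio is
    # log(time_j / time_i) from a course both i and j ran; the matrix of
    # pairs can be incomplete as long as the system is determinate.
    rows, rhs = [], []
    for i, j, log_ratio in observations:
        row = np.zeros(n_competitors)
        row[i], row[j] = 1.0, -1.0
        rows.append(row)
        rhs.append(log_ratio)
    # Strengths are only defined up to an additive constant, so pin
    # competitor 0 at zero to make the least-squares problem well posed.
    anchor = np.zeros(n_competitors)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    strengths, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return strengths
```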
Apr 4, 2016 2:13 AM # 
Shep:
I like it JimBaker.

What I don't like is our current (Australian) ranking system, which seems to be designed for large fields with a reliable mean and standard deviation. I also don't like how what happens at the bottom of the field affects the points at the top. E.g. in a difficult race the times (and standard deviation) spread a lot more, so the time gaps between the top runners are worth fewer points. You could argue this is right - a 1 minute gap in a sprint should be worth more than a 1 minute gap in a long - but should that 1 minute gap in the long be worth less because a couple of beginners decided to race and pulled the standard deviation right out?

So what I do like about JimBaker's idea is that it appears to be less reliant on a possibly very noisy statistical measure. Whether it fixes my complaints above, and whether the median is the best metric to pull from a runner's results, probably needs a snow day...
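A toy illustration of that effect (made-up times, not the actual Australian points formula):

```python
from statistics import stdev

def top_gap_in_sd(times):
    # Gap between the top two finishers, measured in standard deviations of
    # the whole field - roughly the quantity an SD-based scheme rewards.
    first, second = sorted(times)[:2]
    return (second - first) / stdev(times)

elite_field = [60, 61, 63, 64, 66, 68]          # minutes, tight field
with_beginners = elite_field + [95, 110, 130]   # same leaders, long tail

print(top_gap_in_sd(elite_field))     # ~0.33 SD for a 1-minute gap
print(top_gap_in_sd(with_beginners))  # ~0.04 SD for the same 1-minute gap
```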

@TheInvisibleLog : is an incomplete matrix actually an issue? I'd say the current system is essentially representing an incomplete matrix - it will pretty confidently say athlete A is better than athlete B even if those two have never raced each other.
Apr 4, 2016 3:34 AM # 
MChub:
The IOF scheme addresses this issue by including only "ranked" athletes in the calculations. To quote the IOF document: "A ranked athlete for a particular race is defined as an athlete who finishes within the winner's time plus 50% and who has scored World Ranking points in the 18 months before the event with an average points score greater than or equal to 600, disregarding zero scores."
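A rough paraphrase of that definition in code (the names and data shapes are mine, not the IOF's):

```python
def is_ranked_athlete(finish_time, winner_time, scores_last_18_months):
    # Within the winner's time plus 50%, and an average of at least 600
    # World Ranking points over the preceding 18 months, disregarding zeros.
    nonzero_scores = [s for s in scores_last_18_months if s > 0]
    if not nonzero_scores:
        return False
    within_time = finish_time <= 1.5 * winner_time
    return within_time and sum(nonzero_scores) / len(nonzero_scores) >= 600
```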

As far as I can tell, JimBaker's idea in its basic form does not address the issue of race difficulty at all. Match-ups are considered without taking into account the difficulty of the race in which the match-up occurred. As a result, someone who only takes part in sprint races may end up with a ranking that is too high. Of course, JimBaker's scheme can be modified to take race difficulty into account, but then it should be possible to make a similar modification in a traditional scheme, with a similar result.
Apr 4, 2016 5:12 AM # 
TheInvisibleLog:
@Shep. The incomplete matrix just needs to be resolvable; then the matrix algebra is simple enough. The paired-comparison approach has been used in psychometrics since the 1950s. Sports ranking methods seem stuck in older analytic techniques, leading to the issues you bemoan. I suspect the reason they continue to be used is that spreadsheets can do the arithmetic and they are relatively easy to understand.
The methods are explained in this classic from 1958. Chapter 10.
https://books.google.com.au/books?id=FWN-AAAAMAAJ&...
The initial publishing of the technique-
http://www.ets.org/research/policy_research_report...
Abstract
A precise and rapid procedure has been devised for dealing with a matrix of incomplete data in paired comparisons. This method should increase the general applicability of paired comparisons since experiments involving large numbers of stimuli may now be shortened to feasible experimental proportions. Also we may now use sets of stimuli which cover a wide range resulting in a considerable number of 100% vs. 0% judgments, and still give a precise solution depending equally on each of the observations.
Just needs someone with the skill to write the data extraction and reformulation. Beyond me.
Apr 4, 2016 10:50 AM # 
rm:
Thanks for the reference link.
Apr 4, 2016 1:11 PM # 
Shep:
Yeah thanks invisible. I'll have a read...

@MChub yep, the Aussie system does that too, that is, it only includes "ranked" athletes. I was exaggerating a little saying "beginners", but the problem is still there even with so-called ranked runners.

I don't see why JimBaker's scheme needs to specifically take race difficulty into account. It's athlete vs athlete, not athlete vs a statistical measure (which is dependent upon difficulty and length) - which I think is fair.

So if we have two equally ranked athletes X and Y, and athlete X beats athlete Y when it's hard while Y beats X by the same (proportional) amount when it's easy, they'll still be equally ranked. And I don't have an issue with that.

What I do have an issue with is that the current system won't end up with them being equal, and in fact will generally rank Y (who is better when it's easy) higher, as Y's winning margin will be more standard deviations in the more tightly bunched field - which is what we get when it's easier and/or shorter. I actually did a study of that on Australian national series results a few years ago.

Of course you can argue that a winning margin of more standard deviations is more impressive and deserves more points. But my issue is that this comparison of X vs Y is so dependent on the rest of the field (and even the size of the field - is standard deviation really valid with 10 runners?)...

This discussion thread is closed.