...reminds me of the time it took to make a simple credit card payment (check out) at a parking meter last week. First contacting the parking system, then contacting the credit card company and finally matching those two. Every single customer took close to two minutes and the waiting line grew... Paying by cash was faster!
Your points re. 25manna are valid.
I don't know for sure, but I think it was the first time (or second?) they were matching SI cards and numbers this way. Maybe my memory is short, but I don't recall major crashes as bad as this having occurred before. I know a few people in the IT group and know they have been doing this before, so they should know the routine. Something new must have happened; whether it's various new components of the system that hadn't been tested together or not, I can't say.
Manual entry of SI & bib number (i.e. with pen and paper) may not have been faster but at least it would have worked...
I didn't run myself so I've only heard all this from the "outside", from runners that were affected. That the results didn't work at all was something everyone noticed though...
... they did manual entry of SI & bib number eventually - but obviously they didn't get it all down, and now they say they might contact clubs to get SI badge numbers for the runners. Don't think they'll ever get correct results this year.
It depends on how much is missing, but they may get some results. Will all clubs be able to tell the SI numbers? We used the club's SI cards and I have no clue what my number was, but we have the numbers written down somewhere. Our card numbers are not in Eventor for sure. I can imagine some teams may just have had a bag of sticks and you were just supposed to pick one...
Maybe they should set up a web page where teams could type in their card numbers for each runner, especially those who were not checked in properly.
Anyway, just imagine the check-in app being designed in the first place to store pairs locally in a file and to post data in the background if the network is up.
Any rumours about what the cause was? I find it hard to believe it was a software thing; they must have tested it well enough to be sure it works if the server and network work. More likely it was a hardware failure, like a broken network adapter/router or cable. If packets are lost, protocols may try again and again until everything gets transferred or there is a timeout. A failure like that may make things really slow and may make the system behave about like it did. I have seen broken adapter issues twice; the system mostly worked but was way too slow. For events like this you have to move hardware around, and gear gets bumps and other abuse along the way, so it is just normal that gear gets broken. And if designed right, the essential parts of the system should stay operational even if there is some broken hardware. You know, there are _only_ two things the system should take care of no matter what happens: 1) store the information on who ran with which card; 2) store the card's punch info. If those two are stored, you will be able to get results. And network, servers etc. are really not needed to get those two done. Even if the database server burns and vandals steal your routers, you could still take a USB stick, pick up the files from each check-in and download station, import/merge them and print out preliminary results in pretty much no time.
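To make the "USB stick" idea concrete, here is a rough sketch of the merge step. It assumes each check-in station wrote JSON-lines records like {"bib": ..., "si_card": ...} and each download station wrote {"si_card": ..., "start": ..., "finish": ...} with times in seconds. Those file formats and the function names are made up for illustration; this is not how the real 25manna system stores anything, and it ignores relay legs and control checking.

```python
import json
from pathlib import Path


def read_jsonl(paths):
    """Yield records from several JSON-lines files, skipping blank lines."""
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if line.strip():
                yield json.loads(line)


def merge_results(checkin_files, download_files):
    """Join check-in pairs with card downloads on the SI card number.

    The record layout is hypothetical (see the note above the code).
    Returns (results, unmatched): results is a list of (elapsed, bib)
    sorted by time, unmatched lists cards that were read out but have
    no check-in pair.
    """
    bib_by_card = {}
    for rec in read_jsonl(checkin_files):
        # If a card appears twice, the later record wins.
        bib_by_card[rec["si_card"]] = rec["bib"]

    results, unmatched = [], []
    for rec in read_jsonl(download_files):
        bib = bib_by_card.get(rec["si_card"])
        if bib is None:
            unmatched.append(rec["si_card"])
            continue
        # Elapsed time only; checking the punches against the course is omitted.
        results.append((rec["finish"] - rec["start"], bib))
    return sorted(results), unmatched
```

The unmatched list is exactly the set of cards someone would have to chase the clubs about.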
To be honest, to me it looked like the crew did a decent job and mostly made good decisions. They tried to use the check-in app until it became impossible, then they switched to the manual pen and paper method (maybe better than typing them into Notepad?). They knew they needed to get those in digital form and they had volunteers available, so they soon set up a tent with lots of laptops and people typing those pairs in (that's why they already have it done). I guess they did what they could to find the problem, but maybe they could not shut down servers and take the network down to change routers/cables because that might have had an effect on card download operations(?), so they may have thought they should keep that part running as long as it still worked. They did all they could and it was in good hands for sure; it is just that the system wasn't designed to survive failures, i.e. with essential parts staying operational even if other parts are down ... I hope they get the results done. The current situation is sort of not their fault, and it's sad they get the blame here.
I still remember only too well FIN5 1995 and how all our computers (30) froze. Poor me, in charge of finish line 4, was the only one who had enough common sense to reboot my PC right away and get it back. Because of that, all lines 1-5 were directed to my finish for the next 10 minutes and I got really busy, with no time to type in from the backup papers the runners the system had missed while it was frozen. Haven't yelled "reboot your computers, you fools" that loud ever since. Luckily we got away without a major disaster, just a 30 min delay in getting results at one point.
Yes, check-in with writing to a local file first (and then inserting into the database, with the possibility to work offline if there is a problem with the database connection) is definitely the only way to go. Then you have your backup.
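Something like this is what I mean by local file first; a minimal sketch, not anyone's actual check-in app. The class name, the JSON-lines journal and the `send` callable (whatever posts one record to the database or server) are all assumptions for illustration. In a real app `sync()` would be called from a background thread or timer, and the synced counter would have to be saved to disk or the uploads made idempotent, since here it lives only in memory.

```python
import json
import os
from pathlib import Path


class LocalFirstCheckin:
    """Write every (bib, SI card) pair to a local journal before any network I/O."""

    def __init__(self, journal_path, send):
        self.journal = Path(journal_path)
        self.send = send   # hypothetical callable: posts one record, raises OSError on failure
        self.synced = 0    # journal lines already accepted by the server (in memory only)

    def check_in(self, bib, si_card):
        """Append the pair to the local file; never touches the network."""
        record = {"bib": bib, "si_card": si_card}
        with self.journal.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # ask the OS to put the pair on disk before returning
        return record

    def sync(self):
        """Upload unsynced records in order; stop at the first failure.

        Returns the number of records still waiting to be uploaded.
        """
        if not self.journal.exists():
            return 0
        lines = self.journal.read_text(encoding="utf-8").splitlines()
        for line in lines[self.synced:]:
            try:
                self.send(json.loads(line))
            except OSError:   # ConnectionError and TimeoutError are subclasses
                break         # server or network down: keep the record for later
            self.synced += 1
        return len(lines) - self.synced
```

The point of the design is that check-in speed depends only on the local disk; a slow or dead database just makes the backlog grow until the next successful sync.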
The only thing I heard about the cause is that writing to the database started to get really slow. I guess HW is a good candidate, but it could also be a DB which was not properly set up/initialized (hopefully they had done their testing...).
From 25-manna web site: "... we have succeeded in reconstructing the competition based on data from the control units that were in the forest. Without him we would not have gotten this far this quickly."
Why did they have to use data from the forest units? Does this mean even the download stations did not store data in a local file/backup, and when the system wasn't responsive even punch data wasn't stored anywhere, except maybe in the SI readers' memory (is there such? maybe that memory got full?). I'd guess that's the case.
I wonder what they were thinking when they designed the system. Maybe the original focus was on smaller events where the readers' memory (if there is such) was big enough to work as a backup, and events where cards were registered in advance along with the entry. In events like that it would work just fine. And this system was now just extended/ported to support a 25-manna style relay without seeing or thinking about the bigger picture much.