The past few weeks have been a frustrating time for many users of the ARRL’s Logbook of The World (LoTW) QSO confirmation system. The problems began with a 3-1/2 day outage during the first week of November — a particularly busy time of the year for log uploads. When the system was brought back on line, a large queue of logs awaiting processing developed and processing times lengthened well past normal expectations.
Another, much more subtle problem cropped up at about the same time: some logs began disappearing from the queue, apparently at random. Users had been advised at the time the log was uploaded that it was “queued for processing,” and so they were understandably upset when the log was not processed, even after the several-day delay that most logs were experiencing.
Because of its random nature, it took the ARRL IT staff a while to figure out what was happening. When LoTW was designed more than a decade ago, long before the present IT staff was here, an assumption was made as to how many logs could possibly be in the queue at a given time. The assumption was based on users uploading their most recent QSOs perhaps once a week or once a month. The environment in which LoTW now operates is quite different from that assumption, in that many users now upload logs with small numbers of QSOs in them, almost in real time. This creates a much larger number of separate logs.
When a log is uploaded, it is identified by a file name that is assigned by the user. Because there is no way to avoid duplication of file names that are assigned in this fashion, the LoTW system renames each file. Because of the unusual processing delay — combined with the dramatic increase in the number of submitted logs — the system began to run out of unique identifiers for the log files. This resulted in a file sometimes being renamed with an identifier that had already been assigned to a log that was still in the queue, causing the earlier log to be overwritten.
Once the problem was identified, designing a fix was relatively easy. It should be in place by 2359 UTC November 28. Because the number of overwritten logs is relatively small, we have decided to keep the system available for use, even though this may result in a few more logs being lost until the fix is in place.
We apologize for the inconvenience that users have experienced, and especially for being unable to explain what was happening until now. We want to emphasize that no data from processed logs has been lost. That data is secure and backed up. If you have had a log disappear after it was “queued for processing,” the solution is to upload the log file again, preferably after the bug fix is in place. We will announce when that occurs. [source]
Originally we were told, “No data has [sic] been lost, and everyone’s records are intact.” Now we find out that data has indeed been lost, due to the extraordinarily poor practice of reusing “unique” identifiers. How can a competent developer even consider reusing a “unique” identifier?
A further question leaps to my mind, and it will certainly leap to the minds of all developers: what was the limit of the identifier? Did someone really use an unsigned short, or perhaps even a signed short, to provide a unique identifier in a queue? (Assuming a 16-bit short, that caps the identifier at 65,535 or 32,767, respectively.) I doubt very much that we will have an answer to this question, especially considering that we aren’t even given an honest statement about the possible lost data.
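To make the failure mode concrete, here is a minimal sketch of how a wrapping identifier can silently overwrite a log that is still waiting in a queue. Everything in it is an assumption for illustration: the ARRL has not published the real identifier scheme or its limit, so the 16-bit counter, the `enqueue_logs` function, and the file names are all hypothetical.

```python
# Hypothetical illustration only: LoTW's real identifier scheme is not public.
ID_SPACE = 2 ** 16  # assumed: a 16-bit unsigned counter (65,536 distinct values)


def enqueue_logs(names, id_space=ID_SPACE):
    """Rename each upload with a wrapping counter.

    Returns (queue, overwritten), where queue maps identifier -> file name
    and overwritten lists the unprocessed logs that were clobbered.
    """
    queue = {}
    overwritten = []
    for counter, name in enumerate(names):
        ident = counter % id_space  # wraps back to 0 once the space is exhausted
        if ident in queue:
            # The earlier log was never processed, and it is now gone.
            overwritten.append(queue[ident])
        queue[ident] = name
    return queue, overwritten


# Nothing is ever processed here, which models a badly backed-up queue.
uploads = [f"log{i}.tq8" for i in range(ID_SPACE + 3)]
queue, lost = enqueue_logs(uploads)
print(len(queue))  # 65536
print(lost)        # ['log0.tq8', 'log1.tq8', 'log2.tq8']
```

As long as logs are processed and removed faster than the counter wraps, nothing collides. A long backlog combined with many small uploads is what makes the wraparound destructive, and that matches the conditions the ARRL describes.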
I upload to Logbook after every operating session. More than once in the past year, I have noticed that one of my files was apparently “lost in the ether.” I attributed this to a PEBKAC error and reuploaded the file after a day or two of not seeing my QSL counter increase. Earlier this year, I saw that a particular DXpedition had apparently uploaded its log to LoTW, yet even after a few days only one of my QSO’s with them was in Logbook, while I had two listed with Club Log (and confirmed with the online QSL request). I generated a new .tq8 file, it was processed in seconds, and I got my second 7O6T QSL.
At this point, I took a very lazy way out: I reuploaded the whole log. I would hope that the ARRL would make available a download of all your QSO records in ADIF, as this would be trivial to compare to your log. Anyway, I found that after my log was reprocessed, to my shock, I had a few additional QSO’s and QSL’s, even an all-time new one.
The number of QSO’s was so small, and spread over such a long period of time, that I again attributed them to user error. Perhaps during a few late nights I exported one less row from my log than I meant to, or perhaps I marked a record as uploaded that I did not include in my export. I wonder how many people will make the effort to go back and compare their logs to what is reflected in Logbook. I also wonder how many “surprises” there will be.
I would urge people not to do what I did and upload their entire log. We’ve certainly learned that Logbook isn’t particularly robust or quick at processing logs. A much better course of action is to begin by comparing the number of QSO’s in your log to the number displayed by Logbook. The best course of action would be for the ARRL to release a tool with which an end user could download an ADIF of all QSO’s in their account, compare it with an ADIF of their log, and get an output ADIF of all QSO’s that are in the log but not in LoTW. The following utility seems like a good start:
http://www.rickmurphy.net/lotwquery.htm
Then simply use the ADIF comparison tool you prefer.
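If you don’t have a comparison tool handy, the idea is simple enough to sketch. The following is a rough, unofficial example that assumes you already have two ADIF files as text: one exported from your logging program and one downloaded from LoTW. The function names (`parse_adif`, `qso_key`, `missing_from_lotw`) and the sample records are made up for illustration. It matches QSO’s on exact call, band, mode, date, and time to the minute, which is stricter than LoTW’s own matching rules, so treat the output as a list of candidates to check by hand.

```python
import re

# An ADIF field looks like <NAME:LENGTH>value or <NAME:LENGTH:TYPE>value.
FIELD = re.compile(r"<([A-Za-z0-9_]+):(\d+)(?::[^>]*)?>")


def parse_adif(text):
    """Return a list of dicts, one per QSO record, from ADIF text."""
    header_end = re.search(r"<eoh>", text, re.IGNORECASE)
    if header_end:
        text = text[header_end.end():]  # drop the optional file header
    records = []
    for chunk in re.split(r"<eor>", text, flags=re.IGNORECASE):
        record, pos = {}, 0
        while True:
            match = FIELD.search(chunk, pos)
            if not match:
                break
            length = int(match.group(2))
            value = chunk[match.end():match.end() + length]
            record[match.group(1).upper()] = value.strip()
            pos = match.end() + length
        if record:
            records.append(record)
    return records


def qso_key(record):
    """Assumed matching key: call, band, mode, date, and time (HHMM)."""
    return (
        record.get("CALL", "").upper(),
        record.get("BAND", "").upper(),
        record.get("MODE", "").upper(),
        record.get("QSO_DATE", ""),
        record.get("TIME_ON", "")[:4],
    )


def missing_from_lotw(log_text, lotw_text):
    """Return records present in the local log but absent from the LoTW file."""
    uploaded = {qso_key(r) for r in parse_adif(lotw_text)}
    return [r for r in parse_adif(log_text) if qso_key(r) not in uploaded]


# Made-up sample data; TEST1 and TEST2 are placeholder call signs.
local_log = (
    "<CALL:5>TEST1<BAND:3>20M<MODE:2>CW<QSO_DATE:8>20120101<TIME_ON:4>1201<EOR>\n"
    "<CALL:5>TEST2<BAND:3>15M<MODE:3>SSB<QSO_DATE:8>20120102<TIME_ON:4>0930<EOR>\n"
)
lotw_file = (
    "Sample download <PROGRAMID:4>LoTW<EOH>\n"
    "<call:5>TEST1<band:3>20M<mode:2>CW<qso_date:8>20120101<time_on:6>120100<eor>\n"
)
for qso in missing_from_lotw(local_log, lotw_file):
    print(qso["CALL"], qso["BAND"], qso["MODE"], qso["QSO_DATE"], qso["TIME_ON"])
```

Anything this prints is a QSO worth re-signing and uploading on its own, rather than resubmitting the whole log.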
I can understand, and perhaps even forgive, this bug, and I hope all Logbook users can as well. However, I am shocked by the decision to keep Logbook online with the knowledge that even “occasionally” data were going missing. This decision could not have come at a worse time: think of how busy Logbook must be with all the contests and end-of-the-year uploads. Or think of the amazing efforts of the PT0S team to upload to Logbook while on the rocks. Are all those QSO’s accounted for?
Sure, people would have raised quite the tempest in the teacup that is online message boards had Logbook been down for an additional week or two while this critical error was resolved. But what fury will there be once hams realize that they might not be able to trust uploads made when Logbook was just a bit busy? Or the justified rage that Logbook was kept online EVEN though logs were being lost AND the bug was identified? We all deserve an explanation of why Logbook was kept online when the ARRL knew not only that there was a critical bug that causes data to be lost, but also the precise conditions (high load, long queues) that cause Logbook to overwrite files, and thus lose data.
The WPX Award does not work either.
With 6 confirmed QSO’s with 9M2 from 1998 to 2011, I have nothing marked in my award, to give one example among many.