Operating 666a
In the final quarter of 2023 I launched my little side project 666a, a service that sends email notifications about work environment filings to union reps and people like that. It took a few months of work to build something shippable, and I've been refining it ever since to make it more reliable. I thought it'd be nice to share a glimpse of what that refinement work looks like. This is gonna be a technical one, so apologies to anyone who preferred the union propaganda.
How 666a works
The beating heart of 666a is a daily cron job that checks for new public filings on the Work Environment Authority's website and sends email notifications whenever it finds one matching a user's subscription criteria.
Filings are published with a one-day delay, which means Monday's filings become available on Tuesday, and so on. So each day, 666a scans the filings from the previous day.
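To make that daily loop a bit more concrete, here's a minimal plain-Ruby sketch of its two steps: picking which date to scan, and matching the scanned filings against subscriptions. It assumes a subscription is just an email address plus an organisationsnummer; the `Filing` and `Subscription` structs and both helper methods are hypothetical stand-ins, not 666a's actual models.

```ruby
require "date"

# Hypothetical record types for illustration only; 666a's real models aren't shown in this post.
Filing = Struct.new(:org_number, :filed_on, :title, keyword_init: true)
Subscription = Struct.new(:email, :org_number, keyword_init: true)

# Filings appear with a one-day delay, so a run on `today` scans yesterday's date.
def date_to_scan(today = Date.today)
  today - 1
end

# Collect matching filings per subscriber email. Subscribers with no
# matches are left out entirely, so nobody gets an empty email.
def notifications_for(filings, subscriptions)
  subscriptions.each_with_object({}) do |sub, acc|
    matching = filings.select { |f| f.org_number == sub.org_number }
    next if matching.empty?

    (acc[sub.email] ||= []).concat(matching)
  end
end
```

Grouping by email means someone who subscribes to several employers still gets their matches gathered under one key, which lends itself to sending one email per person rather than one per filing.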
Backdated filings and "document lag"
The complicating factor here is that not all of Monday's filings are actually available on Tuesday. In fact, some of them might not even be filed until the following Friday. This is a phenomenon I've been calling "document lag". The Work Environment Authority isn't an API: it's an organisation powered by people, interactions, emails, and even good old-fashioned physical mail. That makes for a much more socially constructed model of time than I'm used to in my everyday tech work, where machine-generated timestamps with millisecond accuracy are the norm.
I didn't spot this nuance until after launch. I take the responsibility of operating this service really really seriously, and I'm not comfortable with the idea of anyone missing out on important information because of a detail like that. So I've been slowly tweaking the design of the system to meet this requirement.
Manually triggered backfill
My first move, once I understood the system was missing some of the data, was to start manually triggering the scan job from time to time. I'd been very deliberate about designing the jobs around this kind of versatility, so a bit of manual intervention was easy and didn't require any code changes. This is more or less what I'd run in the Rails console:
WorkEnvironment::DayJob.perform_now("2023-11-04")
Doing this by hand every day was labour-intensive, but it was a nice way to feel out the scope of the problem. It took about a week to get a clear sense of how far back filings were typically being backdated.
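When a manual backfill needs to cover more than one day, a small loop over a date range saves some typing. This is a hypothetical helper, not something from 666a; the block is where the real call would go, e.g. `WorkEnvironment::DayJob.perform_now`.

```ruby
require "date"

# Hypothetical helper: run the given block once per date in an inclusive range,
# passing each date as a "%Y-%m-%d" string like the manual call above.
# Returns the list of date strings that were processed.
def backfill_range(from, to)
  (Date.parse(from)..Date.parse(to)).map do |day|
    date_string = day.strftime("%Y-%m-%d")
    yield date_string
    date_string
  end
end
```

In a Rails console that might look like `backfill_range("2023-11-01", "2023-11-04") { |d| WorkEnvironment::DayJob.perform_now(d) }`, which scans four days in one go.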
Scheduled backfill
Based on that informal feel for the typical lag between document filing dates and their actual first appearance, I began scheduling a bunch of backfill jobs like these.
# Every day at 14:00, rescan the filings officially dated two days ago
scheduler.cron("0 14 * * *") do
  WorkEnvironment::DayJob.perform_later(
    2.days.ago.strftime("%Y-%m-%d")
  )
end
These have been running nicely since early December. For most days, the average document lag is only about a day or two. So except for the occasional rogue document filed 30 days after its official filing date, this was mission accomplished.
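A stack of near-identical scheduled jobs can also be collapsed into one job looping over several offsets. As a sketch, assuming a hypothetical list of lags to rescan (only the two-day job appears above; the other offsets are made up), a stdlib-only helper can compute the date strings, roughly what `n.days.ago.strftime("%Y-%m-%d")` gives in Rails:

```ruby
require "date"

# Hypothetical offsets (in days) to rescan each afternoon; illustrative only.
BACKFILL_OFFSETS = [2, 3, 7].freeze

# Date strings for each offset, counted back from `today`.
def scheduled_backfill_dates(today = Date.today, offsets = BACKFILL_OFFSETS)
  offsets.map { |n| (today - n).strftime("%Y-%m-%d") }
end
```

Inside the `scheduler.cron("0 14 * * *")` block, each returned string could then be passed to `WorkEnvironment::DayJob.perform_later`. One caveat: `Date.today` uses the server's local time zone, while Rails' `2.days.ago` uses the configured `Time.zone`, so the two could disagree if those differ.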
Christmas
The arrival of Christmas unlocked a new level of this puzzle. Suddenly, my daily document count graph flatlined. Everybody had buggered off to watch Donald Duck, and even the steady everyday drumbeat of the asbestos removal company filings fell silent.
This would be no big deal, except that by this point I'd also backfilled the data well into 2022, and the final week of the December 2022 graph looked nothing like this.
I immediately suspected that this was another case of document lag. In other words, everyone was off for Christmas, and when they got back they would backdate all the filings that were missed during the holidays.
Quantifying document lag
The only way to find out if my document lag theory was the right explanation was to ignore the problem, enjoy Christmas, and wait and see. It probably didn't matter anyway: I don't think many world-changing public filings are dropping during the Christmas season.
I did want some way to visualise the issue once the backdated filings began to trickle in, though. So I created a new chart displaying the document lag for each day since launch. The bar height here represents the average number of days between that date and the actual first appearance of the documents officially filed on it.
The Christmas spike represents the fact that the documents officially filed on e.g. December 27th were in fact not filed until January 3rd when people began returning to work.
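The number behind each bar is simple arithmetic. Here's a minimal sketch, assuming each document is recorded with its official filing date and the date it was first seen; the `Observation` struct and `average_lag_by_date` are hypothetical names. It counts the normal one-day publishing delay as lag too, which may or may not match how the chart defines it.

```ruby
require "date"

# Hypothetical record: official filing date vs. the date the document first appeared.
Observation = Struct.new(:filed_on, :first_seen_on)

# Average document lag in days for each official filing date (one bar per date).
def average_lag_by_date(observations)
  averages = observations.group_by(&:filed_on).transform_values do |group|
    lags = group.map { |o| (o.first_seen_on - o.filed_on).to_i }
    lags.sum.fdiv(lags.size)
  end
  averages.sort.to_h # oldest date first, like a chart's x-axis
end
```

For example, two documents officially dated December 27th that first appear on January 3rd and January 4th have lags of 7 and 8 days, so that day's bar would sit at 7.5.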
Next Steps
I think the scale of this phenomenon will roughly double for Midsummer, so that's what I want to be prepared for next. Once the dust settles from the Christmas period, I'm planning to take the maximum document lag measured in that week, double it, and then wire up the scheduler to backfill that far back every single day. I want Midsummer to run smoothly even if I'm too busy to do any manual interventions in the moment.
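That plan boils down to a couple of lines of arithmetic. A sketch with hypothetical method names, assuming the measured lags are available as a list of day counts (possibly fractional, if they're averages):

```ruby
require "date"

# Window size: the worst lag seen (rounded up to whole days) times a safety factor.
# An empty list yields a window of 0 days.
def backfill_window_days(lags_in_days, safety_factor: 2)
  (lags_in_days.max || 0).ceil * safety_factor
end

# Every date to rescan on a given day, newest first, from yesterday back
# through the whole window (so yesterday's regular scan is included too).
def dates_to_rescan(today, window_days)
  (1..window_days).map { |n| today - n }
end
```

If the worst lag measured over Christmas turned out to be 8 days, this would give a 16-day window, meaning each daily run rescans the 16 most recent dates.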
By the way, if you're a tech worker in Sweden and the events of the past year or so have got you thinking some new thoughts about labour relations in our industry, you might wanna check out 666a more closely. You can grab your employer's organisationsnummer from Bolagsverket's company search tool and then subscribe to get notified (for free) about their correspondence with the Work Environment Authority. I'm also hosting web versions of the Swedish government's unofficial English translations of the Work Environment Act, the Co-Determination Act and the Employment Protection Act, which can be useful resources to know about depending on how 2024 unfolds at your job. Check it out!