Postmortem of last night's downtime

Last night was an absolute mess, and I spent about two hours fixing it. I was starting the upgrade to Ubuntu 20.04 and was making a backup before going any further. Then Sidekiq (the background job processor) got thousands of jobs piled onto it, so everything was horribly slow and started breaking. During that time the database died as well, forcing me to restore from the backup I had just made. By then the site was back up, but I still had to wait hours for the queue to clear before re-enabling email. 0/10, would not do again. In other news, the upgrade eventually succeeded.

Thank you for your work!

interesting