Scheduling and End-User Transactional Pages

Postmortem - Scheduling Service Outage During Maintenance Window

Postmortem

Impacted Service

Visitors to studio scheduling pages saw ‘unavailable’ messages during the incident. During this time, studios’ customers were unable to sign up for classes.

What Happened

At 08:00 UTC, maintenance on our caching cluster began as scheduled. Amazon Web Services (AWS) had told us it would take one or two minutes at most, but the maintenance ran much longer than anticipated, prolonging what should have been a brief blip in service. Our crew was standing by at the scheduled maintenance time, ready to step in if needed.

Resolution

Rather than wait for AWS to finish its maintenance, we created a temporary cluster to bring service back up.

We apologize for any inconvenience this may have caused; we endeavor to keep our platform both robust and highly available. Going forward, we’ll implement more proactive safeguards against unexpected delays of this kind.

Resolved
Assessed

This issue was opened retrospectively.