On the first week of April, we experienced our first Simple OKR outage. We had several hours of downtime during that week, which prevented our customers from completing their day-to-day activities related to OKRs.
What is worse, is that we had no idea that our systems were having performance issues. We experienced downtime during hours when everyone in the US was asleep. As a company that provides service to the people across the globe and in different time zones, such lack of visibility into system performance issues is unacceptable. We noticed these issues on April 4th and fixed them immediately; however, we were terrible at communicating about this to everyone who uses Simple OKR. This is an awful customer experience, and I am sorry you had to witness it. We fucked up.
We are going to dedicate time and resources to invest more into monitoring, infrastructure and alerting to ensure that our systems are always healthy and available to our users. We are also going to setup processes and communication channels for informing users when we are experiencing issues.
Thank you for staying with us.