Intermittent Service Issues
Incident Report for ReadMe
Postmortem
  • What happened?

On Friday afternoon, October 25th, ReadMe was targeted by two overlapping Denial-of-Service attacks, first intermittently and then persistently. We believe one to be malicious and the other accidental. We are in the final stages of implementing our pre-existing plans to move CDN delivery and SSL certificates to CloudFlare, which will ensure that an outage of this magnitude does not occur again.

  • Who was affected?

All ReadMe projects, including our own (docs.readme.com).

  • For how long?

The first attack lasted from 9am to 6pm PDT; the second from 1pm to 7:30pm PDT.

  • How will you prevent this from happening in the future?

A few months ago we started the process of migrating to CloudFlare for CDN delivery and SSL certificate duties. This will enable us to absorb these influxes of traffic and attacks, and will let us set traffic-blocking rules without requiring a production deployment of our infrastructure. Once the final steps of this migration are complete, we will not experience this level of service disruption again.

Attack #1 Details (Malicious)

  • A large volume of requests was coming in to an invalid log endpoint on ReadMe’s docs site.
  • This was causing load on ReadMe’s backend as that page has to do quite a few database reads.
  • We put our docs behind a CDN and added two firewall rules (a conceptual sketch of these rules follows this list):

    • One to block anything that looks like that URL
    • Another to block the originating IP of those requests
  • Together, these rules blocked over 2 million requests in a couple of hours.
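
The sketch below illustrates what those two rules do conceptually, expressed as request-filtering middleware. It assumes an Express/TypeScript stack, which the report does not confirm, and the endpoint pattern and IP address are placeholders rather than the real values.

```typescript
import express, { Request, Response, NextFunction } from "express";

// Placeholder values for illustration only; the real endpoint path and
// attacking IP are not disclosed in the report.
const BLOCKED_PATH_PATTERN = /^\/api\/logs/;   // "anything that looks like that URL"
const BLOCKED_IPS = new Set(["203.0.113.42"]); // the originating IP of the attack

// Mirrors the two firewall rules: reject matching URLs and requests from the
// blocked IP before they trigger any expensive database reads.
function attackFilter(req: Request, res: Response, next: NextFunction): void {
  if (BLOCKED_IPS.has(req.ip ?? "") || BLOCKED_PATH_PATTERN.test(req.path)) {
    res.status(403).send("Forbidden");
    return;
  }
  next();
}

const app = express();
app.use(attackFilter);
app.get("/", (_req, res) => res.send("docs"));
app.listen(3000);
```

In practice the rules were applied at the CDN/firewall layer rather than in application code, so matching requests never reached ReadMe’s servers at all.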

Attack #2 Details (Accidental)

  • A very large customer was hotlinking to an asset on their documentation site.
  • This resulted in a huge influx of traffic, far greater than we typically deal with.
  • Our SSL servers were flooded with connections, affecting the performance and reliability of other documentation sites.
  • We took the following steps to mitigate this:

    • Added 4 more production SSL instances to handle the traffic
    • Added explicit rules to our nginx configuration to block those URLs from ever reaching our backend
    • Added IP-based rate limiters within our application, which we’re still tuning to make sure they do not affect valid traffic (a simplified sketch follows this list)
  • We blocked the hub and communicated with the customer.
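
As a rough illustration of the application-level rate limiting described above, the sketch below shows a naive fixed-window, per-IP limiter. It again assumes an Express/TypeScript stack, and the window size and request limit are made-up values rather than ReadMe’s actual thresholds, which the report notes are still being tuned.

```typescript
import express, { Request, Response, NextFunction } from "express";

// Hypothetical limits for illustration; the real thresholds are still being tuned.
const WINDOW_MS = 60_000;  // 1-minute window
const MAX_REQUESTS = 300;  // max requests per IP per window

// Naive in-memory fixed-window counter keyed by client IP.
const counters = new Map<string, { count: number; windowStart: number }>();

function ipRateLimiter(req: Request, res: Response, next: NextFunction): void {
  const ip = req.ip ?? "unknown";
  const now = Date.now();
  const entry = counters.get(ip);

  // Start a fresh window for this IP if none exists or the old one expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(ip, { count: 1, windowStart: now });
    return next();
  }

  entry.count += 1;
  if (entry.count > MAX_REQUESTS) {
    res.status(429).send("Too Many Requests");
    return;
  }
  next();
}

const app = express();
app.use(ipRateLimiter);
app.get("/", (_req, res) => res.send("docs"));
app.listen(3000);
```

A single-process in-memory counter like this is only a starting point; if the application runs across multiple instances, the counts would need to live in a shared store (such as Redis) to be meaningful across servers.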

Posted Oct 29, 2019 - 11:54 PDT

Resolved
The issue has been fully resolved. A post-mortem will be published on Monday.
Posted Oct 25, 2019 - 20:29 PDT
Update
We are continuing to work on a fix for this issue.
Posted Oct 25, 2019 - 17:37 PDT
Identified
The source of the issue has been identified and we are working on multiple solutions to mitigate the malicious traffic. We've made good progress and are fully committed to resolving this as soon as possible.
Posted Oct 25, 2019 - 17:28 PDT
Investigating
Due to a DDoS attack being perpetrated against a selection of our servers, customers are facing intermittent connection issues with the ReadMe service. We are currently working on a fix and should have a solution shortly.
Posted Oct 25, 2019 - 13:41 PDT
This incident affected: ReadMe Hubs.