Why a rebuild?

Let's start this series with why we decided to rebuild the platform from scratch. After all, a full rewrite is generally seen in the software industry as a difficult, if not hopeless, endeavour.

Introduction

Netscape 6.0 is finally going into its first public beta. There never was a version 5.0. The last major release, version 4.0, was released almost three years ago. Three years is an awfully long time in the Internet world. During this time, Netscape sat by, helplessly, as their market share plummeted.

These are the opening lines of "Things You Should Never Do, Part I", a now-famous article by Joel Spolsky that recounts the downfall of Netscape as it embarked on a quest to completely rewrite its browser. It was a monumental, drawn-out failure that has since served as a cautionary tale to everyone in the software industry: you do not rewrite from scratch.

Instead, you should always favor incremental migration, moving features into a new system bit by bit and diverting live traffic onto it as you go.

The breach

MangaDex v5 had been in development for at least a year before the end of v3. As these things go, fixing and improving the live system always took priority over developing its future replacement. The already limited volunteer time of the staff was stretched thin, and progress was dreadfully slow.

However, in March 2021 MangaDex was breached, allowing an unknown attacker to gain administrative access to the platform. In a sad twist of irony, this happened right after we had gone through a large effort to harden our security at the operating system level and improve the performance of the site as a whole. It certainly was a soul-crushing fall right after a high point.

The thing is, security is very much an asymmetric war. The defender has to worry about a myriad of attack vectors, while the attacker needs only a single vulnerability to get in. That vulnerability can be anything from a temporary misconfiguration to an outdated dependency or piece of third-party software. Keeping track of the multiple vulnerabilities published every day is difficult, and applying the required updates or changes in a timely fashion ranges from difficult to sometimes impossible without specific application-level changes.

Following the writings of security researchers like @GossiTheDog quickly shows how widespread attacks are, and how fast attackers move from the disclosure of a vulnerability to automated scanning of the internet for vulnerable instances, and then to exploiting them.

Unfortunately, defenders only have some very limited ways to efficiently protect themselves. These ways generally boil down to a couple of principles:

  1. Security by design: Disable and lock down every unused feature, command, input and access. Instead of needing to remember to protect against them, you should actively have to allow what is needed for normal operation.
  2. Defense in depth: In the case of an intrusion, make it as slow and painful as possible for an attacker to leverage their access, and ideally impossible to move from one compromised host to another (i.e. lateral movement).
  3. Observability: You should (ideally) be able to detect, or at least trace in detail, the actions of an attacker once you are aware of an intrusion.
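
To make the first and third principles a little more concrete, here is a minimal Python sketch of a default-deny permission check that also leaves an audit trail. It is an illustration only and not the code MangaDex runs: the `Policy` class, the `authorize` function, and the role and action names are all hypothetical.

```python
import logging
from dataclasses import dataclass

# Dedicated audit logger, so security decisions are not drowned in other logs.
audit_log = logging.getLogger("audit")


@dataclass(frozen=True)
class Policy:
    """Hypothetical allowlist of (role, action) pairs.

    Anything not listed here is denied: access has to be granted
    explicitly rather than blocked explicitly.
    """
    allowed: frozenset = frozenset()

    def is_allowed(self, role: str, action: str) -> bool:
        return (role, action) in self.allowed


def authorize(policy: Policy, user: str, role: str, action: str) -> bool:
    """Check an action against the allowlist and record the decision."""
    ok = policy.is_allowed(role, action)
    # Every decision is logged, allowed or denied, so that the actions
    # of an account can be traced after the fact.
    audit_log.info(
        "user=%s role=%s action=%s decision=%s",
        user, role, action, "allow" if ok else "deny",
    )
    return ok


# Example policy with made-up roles and actions.
POLICY = Policy(allowed=frozenset({
    ("reader", "chapter:read"),
    ("uploader", "chapter:read"),
    ("uploader", "chapter:upload"),
}))
```

With this policy, `authorize(POLICY, "alice", "uploader", "chapter:upload")` returns `True`, while a reader attempting the same action, or any role or action that was never listed, gets `False`. Defense in depth is not covered here, since it is mostly a matter of network and host isolation rather than application code.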

And MangaDex v3 failed at all of these.

The platform had grown organically, and while it boasted an impressive feature set, it was not engineered for any of them. Security relied on checks scattered in various places, lateral movement and the planting of persistent malicious code were possible, and observability was near impossible due to the horrific signal-to-noise ratio of the logs, when there were logs at all.

This left us with essentially no choice but a rewrite if we wanted to respect the trust our users put in us on one hand, and sleep at night without the fear of waking up to a nightmare on the other hand.

The aftermath

True to Joel's words, just getting out a first beta of the API took many weeks, and opening up the website to user content again took multiple months. And here we are, almost 6 months later, with a large number of features, some very much critical, still missing.

Had MangaDex been a company, this whole affair would most certainly have been a death sentence. However, in a "blessing and a curse" kind of way, we aren't one: the entire staff is volunteer-based and the site is self-funded.

This means that progress is subject to staff dedicating evenings, weekends and time off their day job to the project, but it also means that aside from servers (which can be cut back on if unneeded) we do not have recurring costs.

All hope is not lost, however. The state of v5 with regard to performance and security is not even comparable to v3. It has been engineered for security from the ground up, and this means we can carry on confidently, even if more slowly, towards restoring the platform to an ideal state.

MangaDex lives on and will keep living on.
