Hashing passwords on the server-side, a dead end?
Are we willingly making our servers vulnerable to denial of service? After a reminder about hash algorithms and the reasons for their use, let’s discuss their cost for our infrastructure.
I recently realized that these algorithms were much more complex to implement than expected. Beware, don’t jump to conclusions: hashing passwords is still the most effective way to (not) store them today. But it seems that we should prepare for the next step.
We all have developed applications that identify their users through a login form that asks for a password. It is up to us, devs, to store these passwords in a database when creating or updating user accounts. It’s also a classic example: password databases leak on the Internet — either inadvertently or maliciously.
The threat that these leaks represent is twofold:
- a malicious party can perform actions on your site by pretending to be someone else, thus harming both legitimate users and your business;
- too often people reuse the same password on several sites, so that a leak from site A.com can jeopardize site B.com.
Even for the most minor applications, you don’t want to be responsible for the consequences of these leaks.
The mitigation is known: you should never store clear-text passwords in a database. Since decryption keys also leak, you should not store encrypted passwords in a database either.
Instead, passwords should be mangled by a cryptographic hash function, and only their fingerprints should be stored.
In PHP, the hash() function can be used to compute fingerprints for several hash algorithms:
echo hash('md5', '123456'); // e10adc3949ba59abbe56e057f20f883e
echo hash('sha256', '123456'); // 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92
The core property of a cryptographic hash function is that it must be computationally infeasible to find two different inputs that produce the same output. This property is so critical that as soon as a single collision is found, the algorithm is considered broken and must be deprecated. This has already happened to MD5 and SHA-1; SHA-256 still resists cryptanalysis.
Conversely, given a particular output, it must be computationally infeasible to recover the corresponding password. Thanks to this property, we can store the fingerprints in our databases: in case of a leak, the original passwords still cannot be retrieved.
Obviously, “impossible” is only a matter of time and means: if I ask you to recover the password behind the hash e10adc3949ba59abbe56e057f20f883e, the answer is written just above: it is 123456. There are databases of pre-computed fingerprints, called rainbow tables, which can instantly recover any password of fewer than about 10 characters.
There are also dedicated chips capable of computing fingerprints at extraordinary speed, so that a few hours, days or weeks of computation are enough to test billions of combinations. Recovering a password then becomes only a question of financial means.
In the face of these threats, best practices have improved.
To guard against rainbow table attacks, hashing must be combined with a salt. This is a random component that allows the password to be virtually lengthened and ensures that two people with the same password have different fingerprints. In PHP:
$salt = random_bytes(8);
$h = bin2hex($salt).hash('sha256', $salt.$password);
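To check a login attempt against such a fingerprint, you read the salt back from the stored value and recompute the hash. A minimal sketch (the verify() helper is hypothetical, matching the 8-byte-salt scheme above):

```php
// Verification for the salt-prefix scheme above: the first 16 hex characters
// are the 8-byte salt, the rest is the SHA-256 fingerprint.
function verify(string $password, string $stored): bool
{
    $salt = hex2bin(substr($stored, 0, 16));
    $expected = bin2hex($salt).hash('sha256', $salt.$password);
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($stored, $expected);
}
```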
With Symfony, this configuration will give an equivalent result:
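A sketch, assuming a security.yaml and an App\Entity\User user class (option names follow Symfony’s message-digest encoder; check the reference for your version):

```yaml
# config/packages/security.yaml — sketch, names are assumptions
security:
    encoders:
        App\Entity\User:
            algorithm: sha256
            encode_as_base64: false
            iterations: 1
```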
To counter brute-force attacks allowed by specialized processors, it has become common and recommended to apply the hash algorithm several times:
// Conceptually: hash('sha256', hash('sha256', ..., hash('sha256', $salt.$password)...))
$h = $salt.$password;
for ($i = 0; $i < 1000; $i++) { // e.g. 1000 iterations
    $h = hash('sha256', $h);
}
$h = bin2hex($salt).$h;
The PBKDF2 standard formalizes this practice, with a recommended minimum of 1000 iterations, intended to be increased as CPUs become more powerful. In Symfony, with the default values:
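As a sketch (the class name and values are assumptions; option names per the Symfony security reference):

```yaml
# config/packages/security.yaml — sketch
security:
    encoders:
        App\Entity\User:
            algorithm: pbkdf2
            hash_algorithm: sha512
            iterations: 1000
```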
More recently, and still in fashion, the bcrypt algorithm reinforced this practice by integrating the number of iterations into the heart of the hash logic. Since PHP 5.5, the password_hash() function supports bcrypt natively:
echo password_hash('123456', PASSWORD_BCRYPT, ['cost' => 10]);
The returned hash is different each time because it is automatically salted. The larger the cost parameter, the more expensive the hash is to compute. Compared to PBKDF2, bcrypt only offers a linear advantage: it is the better choice, but the difference is not decisive.
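Because the salt, cost and algorithm are all embedded in the returned string, verification needs no extra parameters. A quick sketch:

```php
// Two hashes of the same password differ (random salts), yet both verify.
$h1 = password_hash('123456', PASSWORD_BCRYPT, ['cost' => 10]);
$h2 = password_hash('123456', PASSWORD_BCRYPT, ['cost' => 10]);
var_dump($h1 === $h2);                    // bool(false): different salts
var_dump(password_verify('123456', $h1)); // bool(true)
```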
As the use of bcrypt becomes more widespread, specialized processors appear, and the cost of brute-force password cracking decreases.
All the algorithms mentioned so far have one thing in common: their cost is mostly measured in CPU time. Designing a more powerful computing unit is “enough” to increase the speed of fingerprint calculations. However, there is another limited resource in our computers: RAM and its corollary, memory bandwidth.
Dedicated FPGA or ASIC circuits are indeed excellent in raw computing, but when it comes to bandwidth and memory access, their performance becomes “classic” again, at a comparable financial investment.
This is why a new class of algorithms has emerged, whose winner is called Argon2. Where previous algorithms only take a number of iterations as a parameter, Argon2 also lets you control the memory required to compute a fingerprint. It comes in three variants: Argon2d is designed to withstand vector computing units such as GPUs, Argon2i is optimized against side-channel attacks, and Argon2id is a combination of both.
echo password_hash('123456', PASSWORD_ARGON2ID);
The m=65536 in this fingerprint means that 65536 KiB, i.e. 64 MiB, of memory were needed to compute it, and it is not possible to consume less memory to verify it.
By hashing passwords with Argon2, we thus regain the advantage over any computing power that will be financially accessible in the near future, even governmental resources, since that is precisely what these algorithms were designed to resist.
I invite you to read the OWASP recommendations to dig a little deeper into the subject.
Implementation in Symfony
I guess there aren’t many of us today who encode passwords with Argon2. I also imagine that one day soon, Argon2 will be replaced by a new, more resistant algorithm. But how can we adopt a newer algorithm when our database contains fingerprints calculated with previous versions?
The only way is to have the password in clear text and apply the new hash function to it. We will therefore have to wait for a user to log on to our site, verify their password using the old algorithm, and take the opportunity to update the password’s fingerprint.
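Outside any framework, this upgrade-on-login flow can be sketched with PHP’s native API (checkAndUpgrade() is a hypothetical helper; persisting the new hash is left out):

```php
// Verify against the stored hash; if it was produced with an outdated
// algorithm or weaker costs, re-hash the clear-text password now.
function checkAndUpgrade(string $password, string &$storedHash): bool
{
    if (!password_verify($password, $storedHash)) {
        return false;
    }
    if (password_needs_rehash($storedHash, PASSWORD_DEFAULT)) {
        $storedHash = password_hash($password, PASSWORD_DEFAULT);
        // ...persist $storedHash to the database here.
    }
    return true;
}
```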
This is the process that is allowed since Symfony 4.4: by implementing PasswordUpgraderInterface in your UserProvider, you can update the fingerprint when a user logs in.
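A minimal sketch of such a provider (class and property names are assumptions, and the Doctrine plumbing is elided):

```php
use Symfony\Component\Security\Core\User\PasswordUpgraderInterface;
use Symfony\Component\Security\Core\User\UserInterface;

class UserRepository implements PasswordUpgraderInterface
{
    public function upgradePassword(UserInterface $user, string $newEncodedPassword): void
    {
        // Called by Symfony right after a successful login with an outdated hash.
        $user->setPassword($newEncodedPassword);
        // ...persist the user (e.g. flush the entity manager).
    }
}
```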
To be able to verify both old and new fingerprints, you need to use a MigratingPasswordEncoder, which is done either by explicitly specifying the old and new algorithm:
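For example (the class name is an assumption; migrate_from is the Symfony 4.4 option):

```yaml
# config/packages/security.yaml — sketch
security:
    encoders:
        App\Entity\User:
            algorithm: argon2id
            migrate_from:
                - bcrypt # legacy hashes are still verified with bcrypt
```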
or by leaving the choice of the best algorithm to Symfony, depending on the installed PHP runtime:
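As a sketch (class name is an assumption):

```yaml
# config/packages/security.yaml — sketch
security:
    encoders:
        App\Entity\User:
            algorithm: auto # picks the strongest algorithm available
```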
If your passwords need to be interoperable with other applications, you will need to use the first strategy. But if you want to always use the latest algorithm, choose “auto”.
When using Argon2, you may decide to set its cost parameters yourself: MigratingPasswordEncoder will then recalculate the fingerprints with the new CPU and memory costs. My recommendation would be to let Symfony set these parameters for you, unless you are able to compute them optimally during deployment.
Because that’s the challenge with Argon2: you have to choose cost parameters adapted to your infrastructure, but high enough to generate fingerprints resistant to the power of dedicated processors. These settings cannot be made statically, they must evolve over time: either by successive updates of Symfony, or by another automation of your own.
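With the native API, explicit costs look like this (the values are purely illustrative, not a recommendation):

```php
// Argon2id with explicit costs — tune these to your own hardware.
$hash = password_hash('123456', PASSWORD_ARGON2ID, [
    'memory_cost' => 1 << 16, // in KiB: 64 MiB
    'time_cost'   => 4,       // number of iterations
    'threads'     => 1,       // degree of parallelism
]);
```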
If you choose too weak parameters, your users’ passwords will be exposed. But if you choose too high parameters, your infrastructure becomes vulnerable to denial of service attacks.
Note that Argon2 has a third parameter p which controls the level of parallelism allowed during the computation: p=1 means the fingerprint cannot be computed with several threads; p=N means it can be computed with at most N threads. The strictest value for this parameter is therefore 1. It just so happens that PHP is not multi-threaded, and that libsodium — a quality implementation of Argon2 used in PHP and elsewhere — does not allow any value other than 1.
Resisting denial of service attacks
By default, Argon2 is configured to require 64 MiB to verify a password. In PHP, this means that if you have sized your FPM server for 50 concurrent processes, you need a little more than 3 GiB (50 × 64 MiB), on top of the memory needed to run the OS and those 50 PHP scripts.
Since each check must also consume CPU resources, the parameters are calibrated to take between 0.5 and 1 second to calculate.
As you can see, it’s quite easy to make such an application unavailable: you only need to launch 50 connections at the same time and it won’t be able to serve the rest of the traffic.
Choosing another algorithm won’t solve the problem — they are all designed to consume resources. On the contrary, it would expose your users’ passwords.
Solutions are not trivial to implement:
First of all, it is your responsibility to make sure you’ve provisioned enough memory. If your servers do not have enough, at best a SodiumException will be thrown, at worst the script will be blocked waiting for available memory.
A first proposal to work around the issue is to limit the number of connection attempts per IP range. This can protect against simple DoS attacks but won’t help in the face of a distributed one.
Another effective strategy is to restrict hash computations to a limited number of PHP processes, which naturally means queuing the excess requests. This keeps the application’s other routes responsive even when password checks arrive in large numbers, and de facto also bounds the memory problem.
In practice, this can be achieved by setting up a dedicated FPM pool, with Apache or Nginx configured to route POST requests on the login page to this pool. Setting up your PHP application in this way is not common, yet it is the only way to pass scalability tests without sagging.
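As a sketch, with Nginx this could look like the following (the socket path and pool name are assumptions; the dedicated pool would set a low pm.max_children in its FPM configuration, and this example routes every request to /login, not only POSTs):

```nginx
# Route the login page to a dedicated FPM pool so password hashing
# can never exhaust the main pool's workers.
location = /login {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_pass unix:/run/php/fpm-auth.sock; # pool with pm.max_children = 4
}
```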
Note that this problem is not specific to PHP. A Node.js application would have the same issue — it could even be worse, since computing hashes is a blocking operation if one is not careful, whereas the whole architecture of Node.js relies on asynchronous, non-blocking I/O.
Personally, I find this state of the art unsatisfactory. We can always delegate identity management to a third-party service, such as Google Connect, Facebook Connect, Amazon Cognito, Auth0, or a self-hosted equivalent such as Keycloak… But wouldn’t it be possible to imagine connection protocols that delegate the resource-consuming part to the client?
This is the problem addressed by the PAKE protocols, and more particularly the Augmented-PAKE or aPAKE ones. If I had to choose one, SPAKE2+EE seems the most promising. However, I’m not familiar with any of them and I haven’t found any implementations that we could use in the short term, have you?
Last but not least, we could consider dropping the need for a password altogether with WebAuthn. There’s even a Symfony bundle to do so!
That’s where the state-of-the-art ends on my side, and where innovation begins. It’s up to us all now, please keep me posted with your findings!