RFC for a Secure Unserialization Mechanism in PHP
[edit] This discussion has indirectly resulted in a new serialization mechanism added to PHP 7.4. This post is now obsolete.
PHP serialization/unserialization has several drawbacks ^1.
On the serialization side, the Serializable interface:
- breaks references inside serialized data structures;
- delegates the responsibility of the serialization format to its implementations, to the detriment of optimized formats such as the one igbinary provides.
On the unserialization side:
- security exploits have been demonstrated when using unserialize() on user-submitted data;
- serialized strings referencing missing classes create placeholder objects of type __PHP_Incomplete_Class, which behave in an unusual manner and, most importantly, break the semantics of the original structure.
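To make the placeholder behaviour concrete, here is a minimal sketch. The class name Missing is hypothetical and is deliberately not defined anywhere, and the default ini configuration (no unserialize callback) is assumed.

```php
<?php
// Serialized object of a class that is not defined in this script.
// "Missing" is a hypothetical class name chosen for illustration.
$payload = 'O:7:"Missing":1:{s:2:"id";i:42;}';

$value = unserialize($payload);

// The engine substitutes a placeholder object instead of failing.
$placeholderClass = get_class($value);

// The original type is lost: the placeholder is not an instance of Missing.
$isMissing = $value instanceof Missing;

// The placeholder remembers the original class name, so serializing it again
// yields the original string, but the object cannot be used as a Missing.
$roundTrip = serialize($value);
```

Code that expects a Missing instance receives an object of an unrelated type, which is how the semantics of the original structure get broken.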
The root of these security issues is that creating objects out of serialized strings can lead to code execution, namely of the callable defined by the unserialize_callback_func ini setting and/or of the __wakeup(), Serializable::unserialize() and/or __destruct() methods. The first three are part of the typical unserialization lifecycle: a security issue caused by them would be the responsibility of their authors. But __destruct() is much nastier: authors usually don't think of it as an attack vector and thus fail to implement the needed safety measures (which could, for example, consist of throwing an exception in a __wakeup() method).
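The destructor problem can be sketched as follows. TempFile is a hypothetical class, and the destructor only records the path it would have deleted, so the example is safe to run.

```php
<?php
// Hypothetical class: cleans up a temporary file when the object goes away.
final class TempFile
{
    /** @var string[] Paths the destructor would have deleted (recorded, not unlinked). */
    public static $deleted = [];

    public $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function __destruct()
    {
        // A real implementation would call unlink($this->path) here.
        self::$deleted[] = $this->path;
    }
}

// Attacker-controlled string: the constructor never runs, so $path is arbitrary.
$payload = 'O:8:"TempFile":1:{s:4:"path";s:11:"/etc/passwd";}';

$object = unserialize($payload);
unset($object); // last reference dropped: __destruct() runs with the forged path
```

Nothing in TempFile looks like unserialization code, which is why its author is unlikely to have guarded it.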
To mitigate these security issues, the unserialize() function has supported an allowed_classes option since PHP 7.0. Thanks to it, Serializable implementations can filter the allowed classes in the subgraph managed by the objects that implement the interface. This feature is only a mitigation because not all use cases know all the possible classes beforehand.
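As an illustration of the allowed_classes option, the sketch below uses two hypothetical classes, Money and Gadget, and allows only the first one.

```php
<?php
// Hypothetical classes for illustration.
final class Money
{
    public $amount = 0;
}

final class Gadget
{
    public $command = '';
}

$payload = serialize([new Money(), new Gadget()]);

// Only Money may be instantiated; any other class becomes a placeholder object.
$values = unserialize($payload, ['allowed_classes' => [Money::class]]);

$first = get_class($values[0]);
$second = get_class($values[1]);
```

The Gadget entry comes back as a __PHP_Incomplete_Class placeholder, so none of its methods, destructor included, can run. The caller still has to know the full list of legitimate classes up front.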
Proposal
- handle a new __serialize(): array method, replacing __sleep() and Serializable::serialize() when implemented;
- serialize the returned array using a new S: type (e.g. for an object of class Foo whose __serialize() method returns [123]: S:3:"Foo":a:1:{i:0;i:123;});
- forbid using C: or O: for classes implementing __serialize();
- handle a new __unserialize(array $data, array $nested_objects): void method, replacing __wakeup() and Serializable::unserialize() when implemented;
- have $data set to the unserialized value;
- for validation purposes, have $nested_objects contain the list of all objects in $data, excluding those already inspected by nested implementations of __unserialize();
- have the unserialize() function handle a new validation_callback option that would accept a $nested_objects argument with the same semantics as above;
- have the PHP engine disable any destructors found in the unserialized value whenever the unserialize() function throws any Throwable or terminates the script execution (alternatively, if disabling destructors is not technically possible, the engine should empty all properties of unserialized objects).
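The method pair can be tried out with the mechanism that eventually shipped in PHP 7.4. The sketch below is an approximation of the proposal, not the proposal itself: it assumes the single-argument __unserialize(array $data) form, so there is no $nested_objects argument, and the engine writes such objects with the O: prefix rather than a new S: type. The class body is illustrative only.

```php
<?php
// Hypothetical value object using the single-argument __unserialize() form
// available since PHP 7.4 (an assumption; the proposal above takes two arguments).
final class Foo
{
    private $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    public function value(): int
    {
        return $this->value;
    }

    // Return plain data; the engine, not this class, picks the wire format.
    public function __serialize(): array
    {
        return [$this->value];
    }

    // Validate before restoring state; throwing makes unserialize() fail.
    public function __unserialize(array $data): void
    {
        if (count($data) !== 1 || !is_int($data[0] ?? null)) {
            throw new UnexpectedValueException('Invalid payload for Foo');
        }
        $this->value = $data[0];
    }
}

$copy = unserialize(serialize(new Foo(123)));
```

Because the class only exchanges arrays with the engine, the same code would keep working if the engine or an extension changed the serialization format.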
Expected benefits
- fixing compatibility with soft and hard references;
- moving the responsibility of the serialization format outside of the userland serialization steps;
- same or higher validation capabilities for the unserialized objects/classes;
- ability to reject __PHP_Incomplete_Class instances independently from the unserialize_callback_func ini setting;
- higher security by not calling destructors on any early termination of unserialize().
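The first benefit can be observed with the array-based mechanism available since PHP 7.4, used here as a stand-in for the proposal. Leaf and Box are hypothetical classes; the point of the sketch is that two containers sharing one object still share it after a round trip.

```php
<?php
// Hypothetical classes for illustration.
final class Leaf
{
    public $label = 'shared';
}

final class Box
{
    public $item;

    public function __construct(Leaf $item)
    {
        $this->item = $item;
    }

    public function __serialize(): array
    {
        return ['item' => $this->item];
    }

    public function __unserialize(array $data): void
    {
        $this->item = $data['item'];
    }
}

$shared = new Leaf();

// Both boxes point at the same Leaf instance before serialization.
$copy = unserialize(serialize([new Box($shared), new Box($shared)]));

// Identity check: do the restored boxes still share a single Leaf?
$sameInstance = $copy[0]->item === $copy[1]->item;
```

Since the engine serializes the returned arrays itself, it can track objects it has already seen across the whole graph.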
Extra considerations
The global unserialize_callback_func ini setting and the related __PHP_Incomplete_Class objects could be left unchanged. But we could also take this RFC as an opportunity to make enabling the validation_callback option also disable them and always throw a specific type of Throwable instead.
As described before ^2, having __serialize() and __unserialize() be magic methods has a distinct backward compatibility advantage. For this reason, this RFC doesn't mention any new interface that implementations should use.
Instead, the PHP engine should have a rule that checks that both methods are defined at the same time (implementing only one of them would make no sense) and that they have the expected signature.
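The pairing rule could be approximated in userland as below. The helper name hasConsistentSerializationMethods is hypothetical, and the sketch only checks that both methods are present or both absent; it does not inspect signatures, which the engine rule described above would also do.

```php
<?php
// Hypothetical helper: true when a class defines both magic methods or neither.
function hasConsistentSerializationMethods(string $class): bool
{
    return method_exists($class, '__serialize') === method_exists($class, '__unserialize');
}

final class Complete
{
    public function __serialize(): array
    {
        return [];
    }

    public function __unserialize(array $data): void
    {
    }
}

// Defines only one half of the pair, which the rule is meant to reject.
final class Lopsided
{
    public function __serialize(): array
    {
        return [];
    }
}

$completeOk = hasConsistentSerializationMethods(Complete::class);
$lopsidedOk = hasConsistentSerializationMethods(Lopsided::class);
$plainOk = hasConsistentSerializationMethods(stdClass::class);
```

A check like this could run in a test suite or a static analysis step until the engine enforces the rule itself.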
Originally published at gist.github.com.
Retweets welcomed at https://twitter.com/nicolasgrekas/status/1033043739671977985
See discussion at https://www.reddit.com/r/PHP/comments/9a7cyb/rfc_for_a_secure_unserialization_mechanism_in_php/