UUID
The only ID you need?
ARKADIUSZ KONDAS
Lead Software Architect
@ Proget Sp. z o.o.
Zend Certified Engineer
Code Craftsman
Ultra Runner
@ ArkadiuszKondas
Zend Certified Architect
php-ai/php-ml
Genesis
Database
node (entity)
edge (relation)
Primary key
Main role: to provide a stable, indexable reference to an entity.
- semantic / natural: extracted from the entity's attributes (one or more fields)
- technical / surrogate: completely unrelated to the fields of its entity, constructed when the entity is inserted into the DB
Auto-increment / serial
> CREATE TABLE customer (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
> INSERT INTO customer (name) VALUES ('John');
> SELECT * FROM customer;
id | name
----+------
1 | John
(1 row)
> SELECT currval('customer_id_seq'::regclass);
currval
---------
1
(1 row)
Why can it be a problem?
- Information disclosure: with a URL like /api/user/123, all it takes to learn your user count is to create an account and look at the link
- Entity enumeration: easy to scrape all your entities (just start from 1 and ++)
- Non-uniqueness across tables (think ON DELETE CASCADE)
Why can it be a problem?
Database round trip
$customer = new Customer('John', 'Snow');
$orm->persist($customer);
$orm->flush();
$customer->id(); // the id exists only after flush() has round-tripped to the database
Workarounds
- You can configure your sequence to start at an arbitrary point
- Have an increment bigger than one
- Share the sequence between tables
Workarounds
Obfuscation
hidden secret key is required (salt)
zackkitzmiller/tiny-php
$tiny = new \ZackKitzmiller\Tiny('5SX0...M3so9bfzZpuUenKtP74QNH6B');
echo $tiny->to(5);
// E
1 | Z4UrtW |
3456 | gPUasb |
the salt can be guessed from about 30 hashes!
UUID FTW
U U ID
universally
unique
identifier
RFC 4122
A specific variant with 5 versions
U U
universally
unique
Possible values (version 4, 122 random bits): \(2^{122}\)
Collision: about 2.71 quintillion random version 4 UUIDs must be generated to reach a 50% probability of at least one collision. That is 1 billion UUIDs per second for about 85 years.
Stored at 16 bytes each, that many UUIDs would take about 45 EB.
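The collision figure can be reproduced with the usual birthday-bound approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))). The sketch below is only a back-of-the-envelope check; the function name is made up for this example, and 122 random bits is assumed (version 4).

```php
<?php
// Hypothetical helper: how many random IDs are needed before the
// probability of at least one collision reaches $p (birthday bound).
function uuidsForCollisionProbability(float $p, int $randomBits = 122): float
{
    $space = 2.0 ** $randomBits;                      // number of possible values
    return sqrt(2.0 * $space * log(1.0 / (1.0 - $p))); // log() is the natural log
}

$n = uuidsForCollisionProbability(0.5);
$years = $n / 1e9 / (365.25 * 24 * 3600);             // at 1 billion UUIDs per second

printf("%.2e UUIDs, roughly %.0f years\n", $n, $years);
```

The result lands at about 2.7 quintillion UUIDs and a little over 85 years, in line with the numbers on the slide.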
Unsigned 128-bit integer
133187334300416338637111672389176371521
32 hexadecimal (base 16) digits
6432f291-9a6b-42ba-8e73-9e7cb327a941
Anatomy
6432f291-9a6b-42ba-8e73-9e7cb327a941
time_low
time_mid
time_hi_and_version
clock_seq_hi_and_reserved
clock_seq_low
node
version
variant
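A minimal sketch of reading those fields straight from the string form, without any library. The function name and the array keys are invented for this example; the version is the first hex digit of the third group, and the variant sits in the top bits of the fourth group.

```php
<?php
// Hypothetical helper: split a UUID string into its RFC 4122 fields.
function describeUuid(string $uuid): array
{
    [$timeLow, $timeMid, $timeHiAndVersion, $clockSeq, $node]
        = explode('-', strtolower($uuid));

    // Version: the most significant 4 bits of time_hi_and_version.
    $version = (int) hexdec($timeHiAndVersion[0]);

    // Variant: the top two bits of clock_seq_hi_and_reserved are "10" for RFC 4122.
    $clockSeqHi = (int) hexdec(substr($clockSeq, 0, 2));
    $variant = ($clockSeqHi & 0xc0) === 0x80 ? 'RFC 4122' : 'other';

    return [
        'time_low' => $timeLow,
        'time_mid' => $timeMid,
        'time_hi_and_version' => $timeHiAndVersion,
        'clock_seq' => $clockSeq,
        'node' => $node,
        'version' => $version,
        'variant' => $variant,
    ];
}

print_r(describeUuid('6432f291-9a6b-42ba-8e73-9e7cb327a941'));
```

For the UUID on the slide this reports version 4 and the RFC 4122 variant.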
ramsey/uuid
Version 1: Time based
use Ramsey\Uuid\Uuid;
for ($i=0; $i<10; $i++) {
echo Uuid::uuid1()->toString() . PHP_EOL;
}
06d0af36-bfc2-11e7-a118-02424c396740
06d0f6e4-bfc2-11e7-a58f-02424c396740
06d0f7a2-bfc2-11e7-9265-02424c396740
06d0f824-bfc2-11e7-a503-02424c396740
06d0f932-bfc2-11e7-937b-02424c396740
06d0f9aa-bfc2-11e7-b47a-02424c396740
06d0fa22-bfc2-11e7-9630-02424c396740
06d0fa90-bfc2-11e7-a5d2-02424c396740
06d0fafe-bfc2-11e7-967e-02424c396740
06d0fb6c-bfc2-11e7-9e54-02424c396740
Version 3 & 5: Name based
$namespace = Uuid::uuid3(Uuid::NAMESPACE_DNS, 'phpce.eu');
echo Uuid::uuid3($namespace, '/about');
echo Uuid::uuid3($namespace, '/contact');
da099c12-57c5-38bb-a5ca-7a9f0a6f6578
a0f6e622-41b5-3608-b858-8dc231da5c01
Hashing algorithm:
- MD5 -> Version 3
- SHA1 -> Version 5
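What the name-based versions do under the hood can be sketched in plain PHP: hash the namespace bytes followed by the name, keep the first 16 bytes, then stamp the version and variant bits. This is an illustration, not the ramsey/uuid implementation; the function name is made up for this example.

```php
<?php
// The DNS namespace UUID defined by RFC 4122.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Hypothetical helper: build a version 3 (MD5) or version 5 (SHA-1) UUID.
function nameBasedUuid(string $namespace, string $name, int $version): string
{
    $namespaceBytes = hex2bin(str_replace('-', '', $namespace));
    $hash = $version === 3
        ? md5($namespaceBytes . $name, true)
        : sha1($namespaceBytes . $name, true);

    $bytes = substr($hash, 0, 16);                                   // SHA-1 is truncated to 128 bits
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | ($version << 4));      // version nibble
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);                 // RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo nameBasedUuid(NAMESPACE_DNS, 'python.org', 3), PHP_EOL;
echo nameBasedUuid(NAMESPACE_DNS, 'python.org', 5), PHP_EOL;
```

The same namespace and name always give the same UUID, which is the whole point of versions 3 and 5.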
Version 4: Random
use Ramsey\Uuid\Uuid;
for ($i=0; $i<10; $i++) {
echo Uuid::uuid4()->toString() . PHP_EOL;
}
c9cb5802-fbc2-4e63-9812-160d6b14af32
7a7e6dea-7c0d-4173-a3c6-5a9f0386e9ad
9af361b5-f0a6-4caa-af7d-ab01a8b8c0fe
95b2a611-b1f7-4496-a911-532e408b8fec
adf887f4-d07c-4149-80a9-a6adf0406490
4bd03549-de3b-49dd-a784-a32388bb777d
843ef345-1632-4960-a0d3-0967b981de99
3fa23ff3-6b80-4a65-bcc2-5a8021cf3fe0
a47b0437-8e7e-43a1-8783-c747080920b4
bdc2334e-f1cb-4c4d-a94d-4f8fe30f1d56
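For comparison, a library-free sketch of a version 4 UUID, assuming PHP 7+ so that random_bytes() is available. The function name is hypothetical; in real code the library call above is the simpler choice.

```php
<?php
// Hypothetical helper: 16 random bytes with version and variant bits fixed.
function randomUuid4(): string
{
    $bytes = random_bytes(16);                        // CSPRNG, PHP >= 7
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo randomUuid4(), PHP_EOL;
```

Six of the 128 bits are fixed, which leaves the 122 random bits mentioned earlier.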
Version 2: DCE Security
RFC 4122 does not explicitly define it
not implemented
ramsey/uuid-console
➜ composer global require ramsey/uuid-console
➜ uuid generate --help
Usage:
generate [options] [--] [<version>] [<namespace>] [<name>]
Arguments:
version The UUID version to generate. Supported are version "1", "3", "4" and "5". [default: 1]
namespace For version 3 or 5 UUIDs, the namespace to create a UUID for. May be either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500").
name For version 3 or 5 UUIDs, the name to create a UUID for. The name is a string of arbitrary length.
Options:
-c, --count=COUNT Generate count UUIDs instead of just a single one. [default: 1]
--comb For version 4 UUIDs, uses the COMB strategy to generate the random data.
-g, --guid
ramsey/uuid-console
➜ uuid decode 2bdaf386-c012-11e7-8d9b-02424c396740
========= ========== ========================================
encode: STR: 2bdaf386-c012-11e7-8d9b-02424c396740
INT: 58293663835462406327300766704462292800
decode: variant: RFC 4122
version: 1 (time and node based)
content: time: 2017-11-02T21:09:54+00:00
clock: 3483 (usually random)
node: 02:42:4c:39:67:40
========= ========== ========================================
➜ uuid decode ffab8f50-4510-4c7f-af8f-e0a09f43a868
========= ========== =================================================
encode: STR: ffab8f50-4510-4c7f-af8f-e0a09f43a868
INT: 339843928432163496190993563305967331432
decode: variant: RFC 4122
version: 4 (random data based)
content: ff:ab:8f:50:45:10:4c:7f:af:8f:e0:a0:9f:43:a8:68
(no semantics: random data only)
========= ========== =================================================
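The time shown for the version 1 UUID above can be recovered by hand: reassemble the 60-bit timestamp (100-nanosecond ticks since 1582-10-15) and shift it to the Unix epoch. A rough sketch, assuming 64-bit PHP integers; the function name is made up for this example.

```php
<?php
// Number of 100-ns ticks between 1582-10-15 and 1970-01-01.
const GREGORIAN_TO_UNIX_TICKS = 0x01B21DD213814000;

// Hypothetical helper: Unix timestamp (whole seconds) of a version 1 UUID.
function uuid1UnixTime(string $uuid): int
{
    [$timeLow, $timeMid, $timeHiAndVersion] = explode('-', $uuid);

    // Drop the version nibble, then order the parts high -> mid -> low.
    $ticks = (int) hexdec(substr($timeHiAndVersion, 1) . $timeMid . $timeLow);

    return intdiv($ticks - GREGORIAN_TO_UNIX_TICKS, 10000000);
}

echo gmdate('Y-m-d\TH:i:s', uuid1UnixTime('2bdaf386-c012-11e7-8d9b-02424c396740')), PHP_EOL;
```

This truncates to whole seconds, so it can differ by one second from a tool that rounds.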
Real life example
public function createAction()
{
// ...
$id = (string) UUID::generate();
$this->getContainer()->get('command_bus')->handle(
new CreateCustomer(
$id,
'Proget Sp. z o.o.',
'ul. Warszawska 28 43-430 Bielsko-Biała',
'Jan Kowalski',
'info@proget.pl',
'+48 33 822 14 85'
)
);
// ...
}
Uuid:uuid4() collisions
Uuid:uuid4() collisions #80
private static function generateBytes($length)
{
if (self::hasOpensslRandomPseudoBytes()) {
return openssl_random_pseudo_bytes($length);
}
$bytes = '';
for ($i = 1; $i <= $length; $i++) {
$bytes = chr(mt_rand(0, 255)) . $bytes;
}
return $bytes;
}
Uuid:uuid4() collisions #80
OpenSSL’s default random number generator mixes in the PID, which provides a certain degree of fork safety. However, once the PIDs wrap, new children will start to produce the same random sequence as previous children which had the same PID.
Solution
- replace openssl_random_pseudo_bytes() with random_bytes()
- PHP >= 7: random_bytes() is built in
- PHP < 7: paragonie/random_compat (PHP 5.x polyfill for random_bytes() and random_int() from PHP 7)
Level: advanced
COMB
- Combines a version 4 UUID with a timestamp
- Replaces the least significant bytes of the node field with the current timestamp
- Tries to compensate for the reduced clustering in database indexes
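The idea can be sketched without the library: take 10 random bytes and append a 6-byte timestamp where the node field would be. This is an illustration under assumptions (millisecond resolution, hypothetical function name), not a copy of the CombGenerator shipped with ramsey/uuid.

```php
<?php
// Hypothetical helper: version 4 UUID whose last 6 bytes hold a timestamp.
// $millis can be injected to make the result reproducible in tests.
function combUuid(?int $millis = null): string
{
    $millis = $millis ?? (int) floor(microtime(true) * 1000);

    // 10 random bytes + 48-bit timestamp = 16 bytes.
    $bytes = random_bytes(10) . hex2bin(sprintf('%012x', $millis & 0xffffffffffff));
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo combUuid(), PHP_EOL;
```

UUIDs generated later carry a larger trailing group, so an index that sorts on those bytes receives them in roughly increasing order.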
COMB
$factory = new UuidFactory();
$generator = new CombGenerator(
$factory->getRandomGenerator(),
$factory->getNumberConverter()
);
$factory->setRandomGenerator($generator);
$combUuid = $factory->uuid4();
Override Time Generator
$factory = new UuidFactory();
$factory->setTimeGenerator(
new PeclUuidTimeGenerator()
);
$uuid = $factory->uuid1();
use pecl-uuid as time generator
Override Random Generator
$randomLibFactory = new \RandomLib\Factory();
$randomLibGenerator = $randomLibFactory
->getHighStrengthGenerator();
$uuidFactory = new UuidFactory();
$uuidFactory->setRandomGenerator(
new RandomLibAdapter($randomLibGenerator)
);
$uuid = $uuidFactory->uuid4();
use ircmaxell/random-lib library as random generator
More customization
- Generators:
  - RandomGeneratorInterface
  - TimeGeneratorInterface
- Codecs:
  - CodecInterface
- Converters:
  - NumberConverterInterface
  - TimeConverterInterface
- Providers:
  - NodeProviderInterface
  - TimeProviderInterface
UUID in Database
UUIDs in PostgreSQL
> select '74546161-06ea-42be-a27d-f070461557e1'::uuid;
uuid
--------------------------------------
74546161-06ea-42be-a27d-f070461557e1
(1 row)
CREATE EXTENSION "uuid-ossp";
-- or CREATE EXTENSION "pgcrypto"; and use gen_random_uuid() as the default
CREATE TABLE customer (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(255) NOT NULL
);
UUIDs in MySQL
BINARY(16)
Uuid::uuid4()->getBytes();
$factory = new UuidFactory();
$codec = new OrderedTimeCodec(
$factory->getUuidBuilder()
);
$factory->setCodec($codec);
Uuid::setFactory($factory);
Generate an optimized version 1 (time-based) UUID object
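What "optimized" means here, roughly: the time fields of a version 1 UUID are stored high-to-low so that the 16 bytes sort chronologically in a BINARY(16) column. A sketch of that rearrangement, in the spirit of the Percona article linked later; the function name and the two sample UUIDs are made up for this example.

```php
<?php
// Hypothetical helper: reorder time_low-time_mid-time_hi into
// time_hi-time_mid-time_low and return the 16 raw bytes.
function orderedTimeBytes(string $uuid): string
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', $uuid);

    return hex2bin($timeHi . $timeMid . $timeLow . $clockSeq . $node);
}

// Two made-up version 1 UUIDs: $earlier was generated just before time_low wrapped.
$earlier = 'ffffffff-bfc2-11e7-a118-02424c396740';
$later   = '00000000-bfc3-11e7-a118-02424c396740';

var_dump(strcmp($earlier, $later) > 0);                                     // text order is wrong
var_dump(strcmp(orderedTimeBytes($earlier), orderedTimeBytes($later)) < 0); // byte order is right
```

The plain string form puts the fastest-changing part first, which is what scatters inserts across the index.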
Performance
[Chart, time: horizontal axis is number of inserts x 25,000; vertical axis is time taken in seconds]
[Chart, size: horizontal axis is number of inserts x 25,000; vertical axis is data size in MB]
Performance
- MySQL: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
- SQL Server 2000: http://www.informit.com/articles/printerfriendly/25862
- PostgreSQL: http://www.cybertec.at/int4-vs-int8-vs-uuid-vs-numeric-performance-on-bigger-joins/
+ 10%
ramsey/uuid-doctrine
When UUID is not what you need
- no mutable state, no technical primary key
- strong storage constraints (2 to 4 times more space)
- you don't expose the primary key
- big database with a big number of relations
Best of Both = Integers Internal, UUIDs External
Alternatives
ulid
https://github.com/alizain/ulid
cuid
https://github.com/ericelliott/cuid
Tailor made
CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
shard_id int := 5;
BEGIN
SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c
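The bit layout of that function (41 bits of milliseconds since a custom epoch, 13 bits of shard id, 10 bits of sequence) can be mirrored in PHP to see how an ID is packed and unpacked. A sketch only, assuming 64-bit integers; the function names are invented, and the epoch constant is the one from the SQL above.

```php
<?php
const OUR_EPOCH_MS = 1314220021721; // custom epoch from the SQL function above

// Hypothetical helper: pack time, shard and sequence into one 64-bit integer.
function composeId(int $nowMillis, int $shardId, int $sequence): int
{
    $id = ($nowMillis - OUR_EPOCH_MS) << 23; // 41 bits of time in the high part
    $id |= ($shardId & 0x1fff) << 10;        // 13 bits of shard id
    $id |= $sequence % 1024;                 // 10 bits of per-shard sequence

    return $id;
}

// Hypothetical helper: the inverse of composeId().
function decomposeId(int $id): array
{
    return [
        'millis' => ($id >> 23) + OUR_EPOCH_MS,
        'shard' => ($id >> 10) & 0x1fff,
        'sequence' => $id & 0x3ff,
    ];
}

$id = composeId((int) floor(microtime(true) * 1000), 5, 42);
print_r(decomposeId($id));
```

Because the time occupies the high bits, such IDs sort by creation time while still fitting in a bigint column.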
Summary
- Unique across every table, every database, every server
- Allows easy merging of records from different databases
- You can generate IDs anywhere, instead of having to round trip to the database
- Most replication scenarios require GUID columns anyway
- It is a whopping 4 times larger than the traditional 4-byte index value
- Cumbersome to debug (where userid='BAE7DF4-DDF-3RG-5TY3E3RF456AS10')
- The generated UUIDs should be partially sequential for best performance
Other languages
Java, Scala (native)
java.util.UUID.randomUUID()
Node.js (Install uuid from npm)
var uuid = require("uuid");
uuid.v4();
Haskell (Install uuid from hackage)
uuid :: IO UUID
uuid = nextRandom
Q&A
Thanks for listening
@ ArkadiuszKondas
https://slides.com/arkadiuszkondas
https://joind.in/talk/959bf
UUID - The only ID you need?
By Arkadiusz Kondas
What is UUID and what are its successive versions? How do you start using UUID in PHP? Can my database handle UUID? What is UUID suitable for, and what is it not? Is UUID the only ID I need? I will try to answer these and other questions in a presentation devoted exclusively to universally unique identifiers.