UUID

The only ID you need?

ARKADIUSZ KONDAS

Lead Software Architect
@ Proget Sp. z o.o.

Zend Certified Engineer

Code Craftsman

Ultra Runner

@ ArkadiuszKondas

Zend Certified Architect

php-ai/php-ml

Genesis

Database

node (entity)

edge (relation)

Primary key

Main role: to provide a stable, indexable reference to an entity.

  • semantic / natural
    extracted from the entity's attributes (one or more fields)
  • technical / surrogate
    completely unrelated to the fields of its entity
    constructed when the entity is inserted in the DB.

Auto-increment / serial

> CREATE TABLE customer (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);
> INSERT INTO customer (name) VALUES ('John');
> SELECT * FROM customer;
 id | name 
----+------
  1 | John
(1 row)
> SELECT currval('customer_id_seq'::regclass);
 currval 
---------
       1
(1 row)

Why can it be a problem?

  • Information disclosure

    In url: /api/user/123
    all it takes to learn your user count is to create an account and look at the link
     
  • Entity enumerations
    easy to scrape all your entities (just start from 1 and ++)
     
  • Non-uniqueness across tables
     

ON DELETE CASCADE

Why can it be a problem?

Database round trip

$customer = new Customer('John', 'Snow');

$orm->persist($customer);
$orm->flush();

$customer->id();

Workarounds

  • You can configure your sequence to start at an arbitrary point
     
  • Have an increment bigger than one
     
  • Share the sequence between tables

Workarounds

Obfuscation

hidden secret key is required (salt)

zackkitzmiller/tiny-php

$tiny = new \ZackKitzmiller\Tiny('5SX0...M3so9bfzZpuUenKtP74QNH6B');

echo $tiny->to(5);
// E
1 Z4UrtW
3456 gPUasb

Workarounds

Obfuscation

the salt can be guessed from about 30 hashes !!!

UUID FTW

Universally Unique IDentifier

RFC 4122

A specific variant with 5 versions

Universally unique

Possible values: \(2^{122}\)

Collision: about 2.71 quintillion random version 4 UUIDs must be generated to reach a 50% probability of at least one collision. That is 1 billion UUIDs per second for about 86 years.

Storing that many UUIDs at 16 bytes each takes roughly 45 EB
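
The collision figure above follows from the birthday-bound approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))). The sketch below only reproduces that arithmetic; the function name `uuidsForCollisionProbability` is made up for this illustration and is not part of any library.

```php
<?php
// Birthday-bound approximation: n ≈ sqrt(2 * N * ln(1 / (1 - p))),
// where N is the number of possible values and p the target collision probability.
function uuidsForCollisionProbability(float $probability, int $randomBits = 122): float
{
    $possibleValues = 2.0 ** $randomBits; // version 4 has 122 random bits

    return sqrt(2.0 * $possibleValues * log(1.0 / (1.0 - $probability)));
}

$count = uuidsForCollisionProbability(0.5);
$years = $count / 1e9 / (365.25 * 24 * 3600); // at 1 billion UUIDs per second

printf("UUIDs needed for a 50%% collision chance: %.3e\n", $count);
printf("Years at 1 billion UUIDs per second: %.1f\n", $years);
```

The result is an approximation in floating point, which is precise enough for an order-of-magnitude argument like this one.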

Unsigned 128-bit integer

133187334300416338637111672389176371521

32 hexadecimal (base 16) digits

6432f291-9a6b-42ba-8e73-9e7cb327a941
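
The two notations above are the same 128-bit number written in base 10 and base 16. A minimal sketch of the conversion in plain PHP follows, assuming no bcmath or gmp extension is available; `hexToDecimal` is a hypothetical helper that does schoolbook multiplication on an array of decimal digits.

```php
<?php
// Convert an arbitrarily long hexadecimal string to a decimal string.
// PHP integers are 64-bit at most, so the 128-bit value is kept as digits.
function hexToDecimal(string $hex): string
{
    $digits = [0]; // decimal digits, least significant first

    foreach (str_split(strtolower($hex)) as $char) {
        // Multiply the running value by 16 and add the next hex digit.
        $carry = hexdec($char);
        foreach ($digits as $i => $digit) {
            $value = $digit * 16 + $carry;
            $digits[$i] = $value % 10;
            $carry = intdiv($value, 10);
        }
        while ($carry > 0) {
            $digits[] = $carry % 10;
            $carry = intdiv($carry, 10);
        }
    }

    return implode('', array_reverse($digits));
}

$uuid = '6432f291-9a6b-42ba-8e73-9e7cb327a941';
echo hexToDecimal(str_replace('-', '', $uuid)), PHP_EOL;
```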

Anatomy

6432f291-9a6b-42ba-8e73-9e7cb327a941

6432f291      time_low
9a6b          time_mid
42ba          time_hi (first hex digit = version, here 4)
8e73          clock_seq_hi_and_res (top bits = variant) and clock_seq_low
9e7cb327a941  node
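
The field layout can be read straight from the string form. The following is a small sketch, not ramsey/uuid code: `uuidFields` is a hypothetical helper that splits the five groups and extracts the version nibble and the variant bits as RFC 4122 places them.

```php
<?php
function uuidFields(string $uuid): array
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', strtolower($uuid));

    // The variant lives in the most significant bits of the fourth group;
    // the bit pattern 10x marks the RFC 4122 variant.
    $firstClockNibble = hexdec($clockSeq[0]);

    return [
        'time_low' => $timeLow,
        'time_mid' => $timeMid,
        'time_hi_and_version' => $timeHi,
        'clock_seq' => $clockSeq,
        'node' => $node,
        // The version is the first hex digit of the third group.
        'version' => hexdec($timeHi[0]),
        'is_rfc4122_variant' => ($firstClockNibble & 0b1100) === 0b1000,
    ];
}

print_r(uuidFields('6432f291-9a6b-42ba-8e73-9e7cb327a941'));
```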

ramsey/uuid

Version 1: Time based

use Ramsey\Uuid\Uuid;

for ($i=0; $i<10; $i++) {
    echo Uuid::uuid1()->toString() . PHP_EOL;
}
06d0af36-bfc2-11e7-a118-02424c396740
06d0f6e4-bfc2-11e7-a58f-02424c396740
06d0f7a2-bfc2-11e7-9265-02424c396740
06d0f824-bfc2-11e7-a503-02424c396740
06d0f932-bfc2-11e7-937b-02424c396740
06d0f9aa-bfc2-11e7-b47a-02424c396740
06d0fa22-bfc2-11e7-9630-02424c396740
06d0fa90-bfc2-11e7-a5d2-02424c396740
06d0fafe-bfc2-11e7-967e-02424c396740
06d0fb6c-bfc2-11e7-9e54-02424c396740
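
Every value in this list shares the same node (02424c396740) and nearly the same timestamp, because a version 1 UUID is built from the clock and the machine. As a rough sketch of how those fields can be read back, assuming 64-bit PHP integers, the hypothetical helper `decodeUuid1` below reassembles the 60-bit timestamp (100-nanosecond ticks since 1582-10-15) and converts it to a Unix time.

```php
<?php
// Number of 100-nanosecond intervals between 1582-10-15 (the UUID epoch)
// and 1970-01-01 (the Unix epoch).
const GREGORIAN_TO_UNIX_OFFSET = 0x01B21DD213814000;

function decodeUuid1(string $uuid): array
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', strtolower($uuid));

    // Drop the version nibble, then reassemble the timestamp from its
    // three fields: high bits first, low bits last.
    $timestamp = hexdec(substr($timeHi, 1) . $timeMid . $timeLow);
    $unixSeconds = intdiv($timestamp - GREGORIAN_TO_UNIX_OFFSET, 10000000);

    return [
        'time' => gmdate('Y-m-d\TH:i:s\Z', $unixSeconds),
        'clock_seq' => hexdec($clockSeq) & 0x3fff, // strip the two variant bits
        'node' => implode(':', str_split($node, 2)),
    ];
}

print_r(decodeUuid1('06d0af36-bfc2-11e7-a118-02424c396740'));
```

The seconds are truncated rather than rounded, so a tool that rounds may show a value one second later.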

Version 3 & 5:
Name based

use Ramsey\Uuid\Uuid;

$namespace = Uuid::uuid3(Uuid::NAMESPACE_DNS, 'phpce.eu');
echo Uuid::uuid3($namespace, '/about') . PHP_EOL;
echo Uuid::uuid3($namespace, '/contact') . PHP_EOL;
da099c12-57c5-38bb-a5ca-7a9f0a6f6578
a0f6e622-41b5-3608-b858-8dc231da5c01

Hashing algorithm:

  • MD5 -> Version 3
  • SHA1 -> Version 5
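
Name-based UUIDs are deterministic: the same namespace and name always give the same result. A minimal sketch of the RFC 4122 procedure in plain PHP follows; `nameBasedUuid` and the `NAMESPACE_DNS` constant defined here are illustrative names, not the ramsey/uuid API.

```php
<?php
// Namespace UUID for DNS names, as listed in RFC 4122.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

function nameBasedUuid(string $namespace, string $name, int $version): string
{
    if ($version !== 3 && $version !== 5) {
        throw new InvalidArgumentException('Only versions 3 and 5 are name based');
    }

    // Hash the namespace bytes followed by the name.
    $namespaceBytes = hex2bin(str_replace('-', '', $namespace));
    $hash = $version === 3
        ? md5($namespaceBytes . $name, true)
        : sha1($namespaceBytes . $name, true);

    // Keep the first 16 bytes, then stamp the version and variant bits.
    $bytes = substr($hash, 0, 16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | ($version << 4));
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo nameBasedUuid(NAMESPACE_DNS, 'phpce.eu', 5), PHP_EOL;
```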

Version 4: Random

use Ramsey\Uuid\Uuid;

for ($i=0; $i<10; $i++) {
    echo Uuid::uuid4()->toString() . PHP_EOL;
}
c9cb5802-fbc2-4e63-9812-160d6b14af32
7a7e6dea-7c0d-4173-a3c6-5a9f0386e9ad
9af361b5-f0a6-4caa-af7d-ab01a8b8c0fe
95b2a611-b1f7-4496-a911-532e408b8fec
adf887f4-d07c-4149-80a9-a6adf0406490
4bd03549-de3b-49dd-a784-a32388bb777d
843ef345-1632-4960-a0d3-0967b981de99
3fa23ff3-6b80-4a65-bcc2-5a8021cf3fe0
a47b0437-8e7e-43a1-8783-c747080920b4
bdc2334e-f1cb-4c4d-a94d-4f8fe30f1d56

Version 2:
DCE Security

RFC 4122 does not explicitly define it

not implemented

ramsey/uuid-console

➜ composer global require ramsey/uuid-console

➜ uuid generate --help
Usage:
  generate [options] [--] [<version>] [<namespace>] [<name>]

Arguments:
  version               The UUID version to generate. Supported are version "1", "3", "4" and "5". [default: 1]
  namespace             For version 3 or 5 UUIDs, the namespace to create a UUID for. May be either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500").
  name                  For version 3 or 5 UUIDs, the name to create a UUID for. The name is a string of arbitrary length.

Options:
  -c, --count=COUNT     Generate count UUIDs instead of just a single one. [default: 1]
      --comb            For version 4 UUIDs, uses the COMB strategy to generate the random data.
  -g, --guid    

ramsey/uuid-console

➜ uuid decode 2bdaf386-c012-11e7-8d9b-02424c396740
 ========= ========== ======================================== 
  encode:   STR:       2bdaf386-c012-11e7-8d9b-02424c396740    
            INT:       58293663835462406327300766704462292800  
  decode:   variant:   RFC 4122                                
            version:   1 (time and node based)                 
            content:   time:  2017-11-02T21:09:54+00:00        
                       clock: 3483 (usually random)            
                       node:  02:42:4c:39:67:40                
 ========= ========== ======================================== 
➜  uuid decode ffab8f50-4510-4c7f-af8f-e0a09f43a868
 ========= ========== ================================================= 
  encode:   STR:       ffab8f50-4510-4c7f-af8f-e0a09f43a868             
            INT:       339843928432163496190993563305967331432          
  decode:   variant:   RFC 4122                                         
            version:   4 (random data based)                            
            content:   ff:ab:8f:50:45:10:4c:7f:af:8f:e0:a0:9f:43:a8:68  
                       (no semantics: random data only)                 
 ========= ========== ================================================= 

Real life example


    public function createAction()
    {

        // ...

        $id = (string) UUID::generate();

        $this->getContainer()->get('command_bus')->handle(
            new CreateCustomer(
                $id,
                'Proget Sp. z o.o.',
                'ul. Warszawska 28 43-430 Bielsko-Biała',
                'Jan Kowalski',
                'info@proget.pl',
                '+48 33 822 14 85'
            )
        );

        // ...

    }

Uuid::uuid4() collisions

Uuid::uuid4() collisions #80

private static function generateBytes($length)
{
    if (self::hasOpensslRandomPseudoBytes()) {
        return openssl_random_pseudo_bytes($length);
    }

    $bytes = '';
    for ($i = 1; $i <= $length; $i++) {
        $bytes = chr(mt_rand(0, 255)) . $bytes;
    }

    return $bytes;
}

Uuid::uuid4() collisions #80

OpenSSL’s default random number generator mixes in the PID, which provides a certain degree of fork safety. However, once the PIDs wrap, new children will start to produce the same random sequence as previous children which had the same PID.

Solution

Replace openssl_random_pseudo_bytes() with random_bytes()

PHP >= 7: random_bytes() is built in

PHP < 7: paragonie/random_compat

PHP 5.x polyfill for random_bytes() and random_int() from PHP 7
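
To show how little is needed once random_bytes() is available, here is a minimal version 4 generator in plain PHP. It is a sketch under the assumption of PHP 7 or later (or the polyfill), and `randomUuid4` is a hypothetical name, not the ramsey/uuid implementation.

```php
<?php
function randomUuid4(): string
{
    // 16 bytes from the operating system's CSPRNG.
    $bytes = random_bytes(16);

    // Overwrite the 6 fixed bits: version 4 and the RFC 4122 variant.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo randomUuid4(), PHP_EOL;
```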

Level: advanced

COMB

  • Combined UUID version 4 with timestamp
  • Replaces the least significant bytes of the node field with the current timestamp

  • Tries to compensate for the reduced clustering in database indexes
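
The idea can be sketched in a few lines of plain PHP. This is an illustration only, not the ramsey/uuid CombGenerator: `combUuid4` is a hypothetical function, and the choice of a millisecond timestamp stored in the last 6 bytes is an assumption made for the example.

```php
<?php
function combUuid4(?int $millis = null): string
{
    $millis = $millis ?? (int) floor(microtime(true) * 1000);

    // 10 random bytes, then the low 48 bits of the timestamp (big-endian)
    // in the position of the node field.
    $bytes = random_bytes(10) . substr(pack('J', $millis), 2);

    // Version 4 and RFC 4122 variant bits, as for any random UUID.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo combUuid4(), PHP_EOL;
```

UUIDs generated later get a larger trailing group, so a database that compares those bytes first (or a codec that moves them to the front) keeps new rows close together in the index.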

COMB

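use Ramsey\Uuid\Generator\CombGenerator;
use Ramsey\Uuid\UuidFactory;

$factory = new UuidFactory();
// Wrap the default random generator so part of each UUID carries a timestamp
$generator = new CombGenerator(
    $factory->getRandomGenerator(),
    $factory->getNumberConverter()
);
$factory->setRandomGenerator($generator);

$combUuid = $factory->uuid4();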

Override Time Generator

use Ramsey\Uuid\Generator\PeclUuidTimeGenerator;
use Ramsey\Uuid\UuidFactory;

$factory = new UuidFactory();
$factory->setTimeGenerator(
    new PeclUuidTimeGenerator()
);
$uuid = $factory->uuid1();

use pecl-uuid as time generator

Override Random Generator

use Ramsey\Uuid\Generator\RandomLibAdapter;
use Ramsey\Uuid\UuidFactory;

$randomLibFactory = new \RandomLib\Factory();
$randomLibGenerator = $randomLibFactory
    ->getHighStrengthGenerator();

$uuidFactory = new UuidFactory();
$uuidFactory->setRandomGenerator(
    new RandomLibAdapter($randomLibGenerator)
);

$uuid = $uuidFactory->uuid4();

use ircmaxell/random-lib library as random generator

More customization

  • Generators:
    • RandomGeneratorInterface
    • TimeGeneratorInterface
  • Codecs:
    • CodecInterface
  • Converters:
    • NumberConverterInterface
    • TimeConverterInterface
  • Providers:
    • NodeProviderInterface
    • TimeProviderInterface

UUID
in
Database

UUIDs in PostgreSQL

> select '74546161-06ea-42be-a27d-f070461557e1'::uuid;
                 uuid                 
--------------------------------------
 74546161-06ea-42be-a27d-f070461557e1
(1 row)
CREATE EXTENSION "uuid-ossp";
-- or CREATE EXTENSION "pgcrypto"; and use gen_random_uuid() as the default


CREATE TABLE customer (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  name VARCHAR(255) NOT NULL
);

UUIDs in MySQL

BINARY(16)

Uuid::uuid4()->getBytes();
use Ramsey\Uuid\Codec\OrderedTimeCodec;
use Ramsey\Uuid\Uuid;
use Ramsey\Uuid\UuidFactory;

$factory = new UuidFactory();
$codec = new OrderedTimeCodec(
    $factory->getUuidBuilder()
);
$factory->setCodec($codec);
Uuid::setFactory($factory);

Generate an optimized version 1 (time-based) UUID object
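
What "optimized" means here is a reordering of the time fields so that the 16 stored bytes sort by creation time. A minimal sketch of that rearrangement in plain PHP, assuming a version 1 UUID as input; `orderedTimeBytes` is a hypothetical helper, not the codec's own code.

```php
<?php
// Move the most significant time bits to the front:
// time_hi, then time_mid, then time_low, then clock sequence and node.
function orderedTimeBytes(string $uuid1): string
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', strtolower($uuid1));

    return hex2bin($timeHi . $timeMid . $timeLow . $clockSeq . $node);
}

$bytes = orderedTimeBytes('06d0af36-bfc2-11e7-a118-02424c396740');
echo bin2hex($bytes), PHP_EOL; // value to store in a BINARY(16) column
```

With the standard layout the fastest-changing bits (time_low) come first, which scatters consecutive inserts across the index; after the swap, consecutive UUIDs differ mostly in their trailing bytes.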

Performance

[Chart: insert time. Horizontal axis: number of inserts x 25,000. Vertical axis: time taken in seconds.]

Performance

[Chart: data size. Horizontal axis: number of inserts x 25,000. Vertical axis: data size in MB.]

Performance

  • MySQL
    https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
     
  • SQL Server 2000
    http://www.informit.com/articles/printerfriendly/25862
     
  • PostgreSQL
    http://www.cybertec.at/int4-vs-int8-vs-uuid-vs-numeric-performance-on-bigger-joins/

+ 10%

ramsey/uuid-doctrine

When UUID is not what you need


  • no mutable state, no technical primary key
     
  • strong storage constraints (2 to 4 times more space)
     
  • you don't expose the primary key
     
  • big database with a big number of relations

Best of Both = Integers Internal, UUIDs External

Alternatives

ulid

https://github.com/alizain/ulid

cuid

https://github.com/ericelliott/cuid

Tailor made

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;  -- custom epoch in milliseconds
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;                  -- logical shard this function belongs to
BEGIN
    -- 10 bits: per-shard sequence, wrapped at 1024
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    -- 41 bits of time, 13 bits of shard id, 10 bits of sequence
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;

https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c
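
The same bit layout can be sketched in PHP to make the packing easier to follow. This is only an illustration of the arithmetic, assuming 64-bit integers; `nextId` and `decodeId` are hypothetical names, and the sequence value is passed in rather than read from a database.

```php
<?php
const OUR_EPOCH_MILLIS = 1314220021721; // custom epoch used in the SQL function

// Pack time, shard id and sequence into a single 64-bit integer.
function nextId(int $nowMillis, int $shardId, int $sequenceValue): int
{
    $id = ($nowMillis - OUR_EPOCH_MILLIS) << 23; // time in the high bits
    $id |= $shardId << 10;                       // 13 bits of shard id
    $id |= $sequenceValue % 1024;                // 10 bits of sequence

    return $id;
}

// Reverse the packing to recover the three parts.
function decodeId(int $id): array
{
    return [
        'millis' => ($id >> 23) + OUR_EPOCH_MILLIS,
        'shard_id' => ($id >> 10) & 0x1fff,
        'sequence' => $id & 0x3ff,
    ];
}

$id = nextId((int) floor(microtime(true) * 1000), 5, 1025);
print_r(decodeId($id));
```

Because the time occupies the most significant bits, IDs sort by creation time, and the shard id can be read back from any ID without a lookup.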

Summary


  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • You can generate IDs anywhere, instead of having to round trip to the database
  • Most replication scenarios require GUID columns anyway
  • It is a whopping 4 times larger than the traditional 4-byte index value
  • Cumbersome to debug: where userid='BAE7DF4-DDF-3RG-5TY3E3RF456AS10'
  • The generated UUIDs should be partially sequential for best performance 

Other languages

Java, Scala (native)

java.util.UUID.randomUUID

Node.js (Install uuid from npm)

var uuid = require("uuid");
uuid.v4();

Haskell (Install uuid from hackage)

uuid :: IO UUID
uuid = nextRandom

Q&A

Thanks for listening

@ ArkadiuszKondas

https://slides.com/arkadiuszkondas

https://joind.in/talk/959bf