UUID

The only ID you need?

ARKADIUSZ KONDAS

Lead Software Architect
@ Proget Sp. z o.o.

Zend Certified Engineer

Code Craftsman

Ultra Runner

@ ArkadiuszKondas

Zend Certified Architect

php-ai/php-ml

Genesis

Database

node (entity)

edge (relation)

Primary key

Main role: to provide a stable, indexable reference to an entity.

  • semantic / natural
    extracted from the entity's attributes (one or more fields)
  • technical / surrogate
    completely unrelated to the fields of its entity
    constructed when the entity is inserted in the DB.

Auto-increment / serial

> CREATE TABLE customer (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);
> INSERT INTO customer (name) VALUES ('John');
> SELECT * FROM customer;
 id | name 
----+------
  1 | John
(1 row)
> SELECT currval('customer_id_seq'::regclass);
 currval 
---------
       1
(1 row)

Why can it be a problem?

  • Information disclosure

    In url: /api/user/123
    all it takes to learn your user count is to create an account and look at the link
     
  • Entity enumerations
    easy to scrape all your entities (just start from 1 and ++)
     
  • Non uniqueness across tables
     

ON DELETE CASCADE

Why can it be a problem?

Database round trip

$customer = new Customer('John', 'Snow');

$orm->persist($customer);
$orm->flush();

$customer->id();

Workarounds

  • You can configure your sequence to start at an arbitrary point
     
  • Have an increment bigger than one
     
  • Share the sequence between tables

Workarounds

Obfuscation

hidden secret key is required (salt)

zackkitzmiller/tiny-php

$tiny = new \ZackKitzmiller\Tiny('5SX0...M3so9bfzZpuUenKtP74QNH6B');

echo $tiny->to(5);
// E
1 Z4UrtW
3456 gPUasb

Workarounds

Obfuscation

the salt can be guessed from about 30 hashes !!!

UUID FTW

U     U    ID

universally

unique

identifier

RFC 4122

A specific variant with 5 versions

U     U    

universally

unique

Possible values: \(2^{122}\)

Collision: about 2.71 quintillion random version 4 UUIDs must be generated to reach a 50% probability of at least one collision. That means generating 1 billion UUIDs per second for about 85 years.

Storing that many UUIDs at 16 bytes each would take about 45 EB.
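The collision figure above follows from the birthday-bound approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))). The sketch below evaluates it for the 122 random bits of a version 4 UUID. The function name and the use of floating-point arithmetic are illustrative choices, so the printed values are approximations that land close to the rounded numbers quoted on the slide.

```php
<?php
// Birthday-bound approximation: how many random values must be drawn from a
// space of 2^bits before the chance of at least one collision reaches $p.
function uuidsForCollisionProbability(float $p, int $randomBits = 122): float
{
    $space = 2.0 ** $randomBits; // a float, since 2^122 does not fit in an int
    return sqrt(2 * $space * log(1 / (1 - $p)));
}

$n = uuidsForCollisionProbability(0.5);
$years = $n / 1e9 / (365.25 * 24 * 3600); // at 1 billion UUIDs per second
$exabytes = $n * 16 / 1e18;               // at 16 bytes per UUID

printf("UUIDs needed for a 50%% collision chance: %.2e\n", $n);
printf("Years at 1 billion UUIDs per second: %.0f\n", $years);
printf("Storage at 16 bytes each: %.1f EB\n", $exabytes);
```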

Unsigned 128-bit integer

133187334300416338637111672389176371521

32 hexadecimal (base 16) digits

6432f291-9a6b-42ba-8e73-9e7cb327a941

Anatomy

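6432f291-9a6b-42ba-8e73-9e7cb327a941

  • 6432f291: time_low
  • 9a6b: time_mid
  • 42ba: time_hi_and_version (the leading 4 is the version)
  • 8e73: clock_seq_hi_and_reserved + clock_seq_low (the top bits of 8e encode the variant)
  • 9e7cb327a941: node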
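To make the anatomy concrete, here is a minimal sketch in plain PHP that splits a canonical UUID string into the RFC 4122 fields named above and reads the version and variant bits. The function name `uuidFields` and the returned array keys are illustrative and not part of any library.

```php
<?php
// Splits a canonical 8-4-4-4-12 UUID string into its RFC 4122 fields.
function uuidFields(string $uuid): array
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', strtolower($uuid));

    // The version is the most significant 4 bits of time_hi_and_version.
    $version = hexdec($timeHi[0]);

    // The variant is encoded in the top bits of clock_seq_hi_and_reserved;
    // the bit pattern 10xxxxxx marks the RFC 4122 variant.
    $clockSeqHi = hexdec(substr($clockSeq, 0, 2));
    $variant = ($clockSeqHi & 0xC0) === 0x80 ? 'RFC 4122' : 'other';

    return [
        'time_low' => $timeLow,
        'time_mid' => $timeMid,
        'time_hi_and_version' => $timeHi,
        'clock_seq' => $clockSeq,
        'node' => $node,
        'version' => $version,
        'variant' => $variant,
    ];
}

print_r(uuidFields('6432f291-9a6b-42ba-8e73-9e7cb327a941'));
```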

ramsey/uuid

Version 1: Time based

use Ramsey\Uuid\Uuid;

for ($i=0; $i<10; $i++) {
    echo Uuid::uuid1()->toString() . PHP_EOL;
}
06d0af36-bfc2-11e7-a118-02424c396740
06d0f6e4-bfc2-11e7-a58f-02424c396740
06d0f7a2-bfc2-11e7-9265-02424c396740
06d0f824-bfc2-11e7-a503-02424c396740
06d0f932-bfc2-11e7-937b-02424c396740
06d0f9aa-bfc2-11e7-b47a-02424c396740
06d0fa22-bfc2-11e7-9630-02424c396740
06d0fa90-bfc2-11e7-a5d2-02424c396740
06d0fafe-bfc2-11e7-967e-02424c396740
06d0fb6c-bfc2-11e7-9e54-02424c396740

Version 3 & 5:
Name based

$namespace = Uuid::uuid3(Uuid::NAMESPACE_DNS, 'phpce.eu');
echo Uuid::uuid3($namespace, '/about') . PHP_EOL;
echo Uuid::uuid3($namespace, '/contact') . PHP_EOL;
da099c12-57c5-38bb-a5ca-7a9f0a6f6578
a0f6e622-41b5-3608-b858-8dc231da5c01

Hashing algorithm:

  • MD5 -> Version 3
  • SHA1 -> Version 5
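Name-based UUIDs are deterministic: the same namespace and name always produce the same identifier. The sketch below shows the idea in plain PHP, without ramsey/uuid: hash the namespace bytes followed by the name, keep the first 16 bytes, then overwrite the version and variant bits. The function name `uuidFromName` is hypothetical, and this is a simplified illustration rather than the library's implementation.

```php
<?php
// The DNS namespace UUID defined in RFC 4122.
const NAMESPACE_DNS = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';

// Builds a version 3 (MD5) or version 5 (SHA-1) name-based UUID.
function uuidFromName(string $namespace, string $name, int $version): string
{
    $namespaceBytes = hex2bin(str_replace('-', '', $namespace));
    $hash = $version === 3
        ? md5($namespaceBytes . $name, true)
        : sha1($namespaceBytes . $name, true);

    $bytes = substr($hash, 0, 16); // SHA-1 yields 20 bytes, only 16 are used
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | ($version << 4)); // set version
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);            // set variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$namespace = uuidFromName(NAMESPACE_DNS, 'phpce.eu', 5);
echo uuidFromName($namespace, '/about', 5) . PHP_EOL;
```

Because the result depends only on its inputs, calling the function twice with the same arguments returns the same UUID, which makes it useful for deriving stable IDs from URLs or other natural keys.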

Version 4: Random

use Ramsey\Uuid\Uuid;

for ($i=0; $i<10; $i++) {
    echo Uuid::uuid4()->toString() . PHP_EOL;
}
c9cb5802-fbc2-4e63-9812-160d6b14af32
7a7e6dea-7c0d-4173-a3c6-5a9f0386e9ad
9af361b5-f0a6-4caa-af7d-ab01a8b8c0fe
95b2a611-b1f7-4496-a911-532e408b8fec
adf887f4-d07c-4149-80a9-a6adf0406490
4bd03549-de3b-49dd-a784-a32388bb777d
843ef345-1632-4960-a0d3-0967b981de99
3fa23ff3-6b80-4a65-bcc2-5a8021cf3fe0
a47b0437-8e7e-43a1-8783-c747080920b4
bdc2334e-f1cb-4c4d-a94d-4f8fe30f1d56

Version 2:
DCE Security

RFC 4122 does not explicitly define it

not implemented

ramsey/uuid-console

➜ composer global require ramsey/uuid-console

➜ uuid generate --help
Usage:
  generate [options] [--] [<version>] [<namespace>] [<name>]

Arguments:
  version               The UUID version to generate. Supported are version "1", "3", "4" and "5". [default: 1]
  namespace             For version 3 or 5 UUIDs, the namespace to create a UUID for. May be either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500").
  name                  For version 3 or 5 UUIDs, the name to create a UUID for. The name is a string of arbitrary length.

Options:
  -c, --count=COUNT     Generate count UUIDs instead of just a single one. [default: 1]
      --comb            For version 4 UUIDs, uses the COMB strategy to generate the random data.
  -g, --guid    

ramsey/uuid-console

➜ uuid decode 2bdaf386-c012-11e7-8d9b-02424c396740
 ========= ========== ======================================== 
  encode:   STR:       2bdaf386-c012-11e7-8d9b-02424c396740    
            INT:       58293663835462406327300766704462292800  
  decode:   variant:   RFC 4122                                
            version:   1 (time and node based)                 
            content:   time:  2017-11-02T21:09:54+00:00        
                       clock: 3483 (usually random)            
                       node:  02:42:4c:39:67:40                
 ========= ========== ======================================== 
➜  uuid decode ffab8f50-4510-4c7f-af8f-e0a09f43a868
 ========= ========== ================================================= 
  encode:   STR:       ffab8f50-4510-4c7f-af8f-e0a09f43a868             
            INT:       339843928432163496190993563305967331432          
  decode:   variant:   RFC 4122                                         
            version:   4 (random data based)                            
            content:   ff:ab:8f:50:45:10:4c:7f:af:8f:e0:a0:9f:43:a8:68  
                       (no semantics: random data only)                 
 ========= ========== ================================================= 

Real life example


    public function createAction()
    {

        // ...

        $id = (string) UUID::generate();

        $this->getContainer()->get('command_bus')->handle(
            new CreateCustomer(
                $id,
                'Proget Sp. z o.o.',
                'ul. Warszawska 28 43-430 Bielsko-Biała',
                'Jan Kowalski',
                'info@proget.pl',
                '+48 33 822 14 85'
            )
        );

        // ...

    }

Uuid::uuid4() collisions #80

private static function generateBytes($length)
{
    if (self::hasOpensslRandomPseudoBytes()) {
        return openssl_random_pseudo_bytes($length);
    }

    $bytes = '';
    for ($i = 1; $i <= $length; $i++) {
        $bytes = chr(mt_rand(0, 255)) . $bytes;
    }

    return $bytes;
}

Uuid::uuid4() collisions #80

OpenSSL’s default random number generator mixes in the PID, which provides a certain degree of fork safety. However, once the PIDs wrap, new children will start to produce the same random sequence as previous children which had the same PID.

Solution

Replace openssl_random_pseudo_bytes() with random_bytes()

  • PHP >= 7: random_bytes() is built in
  • PHP < 7: paragonie/random_compat, a PHP 5.x polyfill for random_bytes() and random_int() from PHP 7
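As an illustration of why the random source matters, a version 4 UUID is nothing more than 16 random bytes with 6 bits overwritten. The following sketch builds one directly on `random_bytes()`. The function name `uuid4` is illustrative, and this is not the ramsey/uuid implementation.

```php
<?php
// Generates a random (version 4) UUID from the CSPRNG behind random_bytes().
function uuid4(): string
{
    $bytes = random_bytes(16);

    // Set the 4 version bits to 0100 (version 4).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the 2 variant bits to 10 (RFC 4122).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // Format the 32 hex digits as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid4() . PHP_EOL;
```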

Level: advanced

COMB

  • Combined UUID version 4 with timestamp
  • Replaces the least significant bytes of the node field with the current timestamp

  • Tries to compensate for the reduced clustering in database indexes
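The COMB idea can be sketched in plain PHP as follows. This is an assumption-laden illustration, not the ramsey/uuid CombGenerator: here the whole 6-byte node field is overwritten with a millisecond timestamp, and both that width and the unit are choices made for this example. It also assumes a 64-bit PHP build, which `pack('J', ...)` requires.

```php
<?php
// Builds a version 4 UUID whose node field carries a timestamp, so that
// values generated later compare greater on their trailing bytes.
function combUuid(?int $millis = null): string
{
    $millis = $millis ?? (int) floor(microtime(true) * 1000);

    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant

    // pack('J') gives 8 big-endian bytes; keep the low 6 (48 bits).
    $timestamp = substr(pack('J', $millis), 2);

    // Keep the first 10 random bytes, replace the 6-byte node field.
    $bytes = substr($bytes, 0, 10) . $timestamp;

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo combUuid() . PHP_EOL;
```

Whether a trailing or a leading timestamp clusters better depends on how the database compares UUID values, so the byte position is worth checking against the target engine.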

COMB

$factory = new UuidFactory();
$generator = new CombGenerator(
    $factory->getRandomGenerator(),
    $factory->getNumberConverter()
);
$factory->setRandomGenerator($generator);

$combUuid = $factory->uuid4();

Override Time Generator

$factory = new UuidFactory();
$factory->setTimeGenerator(
    new PeclUuidTimeGenerator()
);
$uuid = $factory->uuid1();

Uses pecl-uuid as the time generator

Override Random Generator

$randomLibFactory = new \RandomLib\Factory();
$randomLibGenerator = $randomLibFactory
    ->getHighStrengthGenerator();

$uuidFactory = new UuidFactory();
$uuidFactory->setRandomGenerator(
    new RandomLibAdapter($randomLibGenerator)
);

$uuid = $uuidFactory->uuid4();

Uses the ircmaxell/random-lib library as the random generator

More customization

  • Generators:
    • RandomGeneratorInterface
    • TimeGeneratorInterface
  • Codecs:
    • CodecInterface
  • Converters:
    • NumberConverterInterface
    • TimeConverterInterface
  • Providers:
    • NodeProviderInterface
    • TimeProviderInterface

UUID
in
Database

UUIDs in PostgreSQL

> select '74546161-06ea-42be-a27d-f070461557e1'::uuid;
                 uuid                 
--------------------------------------
 74546161-06ea-42be-a27d-f070461557e1
(1 row)
CREATE EXTENSION "uuid-ossp";
-- or CREATE EXTENSION "pgcrypto"; (which provides gen_random_uuid() instead of uuid_generate_v4())


CREATE TABLE customer (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  name VARCHAR(255) NOT NULL
);

UUIDs in MySQL

BINARY(16)

Uuid::uuid4()->getBytes();
$factory = new UuidFactory();
$codec = new OrderedTimeCodec(
    $factory->getUuidBuilder()
);
$factory->setCodec($codec);
Uuid::setFactory($factory);

Generate an optimized version 1 (time-based) UUID object
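To show what ends up in a BINARY(16) column, the sketch below converts a UUID string to 16 raw bytes and back, and rearranges a version 1 UUID so that the high-order time bits come first. The function names are hypothetical, and the rearrangement is a simplified reading of the ordered-time idea rather than the codec's actual code.

```php
<?php
// Converts a canonical UUID string to its 16-byte binary form.
function uuidToBinary(string $uuid): string
{
    return hex2bin(str_replace('-', '', $uuid));
}

// Converts 16 raw bytes back to the canonical 8-4-4-4-12 string.
function binaryToUuid(string $bytes): string
{
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Reorders a version 1 UUID as time_hi, time_mid, time_low, clock_seq, node,
// so that the stored bytes grow roughly with the generation time.
function uuid1ToOrderedBinary(string $uuid): string
{
    [$timeLow, $timeMid, $timeHi, $clockSeq, $node] = explode('-', $uuid);
    return hex2bin($timeHi . $timeMid . $timeLow . $clockSeq . $node);
}

$uuid = '06d0af36-bfc2-11e7-a118-02424c396740';
echo strlen(uuidToBinary($uuid)) . PHP_EOL;           // 16
echo bin2hex(uuid1ToOrderedBinary($uuid)) . PHP_EOL;  // 11e7bfc206d0af36a11802424c396740
```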

Performance

Chart (time): horizontal axis is the number of inserts x 25,000, vertical axis is the time taken in seconds.

Performance

Chart (size): horizontal axis is the number of inserts x 25,000, vertical axis is the data size in MB.

Performance

  • MySQL
    https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/
     
  • SQL Server 2000
    http://www.informit.com/articles/printerfriendly/25862
     
  • PostgreSQL
    http://www.cybertec.at/int4-vs-int8-vs-uuid-vs-numeric-performance-on-bigger-joins/

+ 10%

ramsey/uuid-doctrine

When UUID is not what you need


  • no mutable state, no technical primary key
     
  • strong storage constraints (2 to 4 times more space)
     
  • you don't expose the primary key
     
  • a big database with a large number of relations

Best of Both = Integers Internal, UUIDs External

Alternatives

ulid

https://github.com/alizain/ulid

cuid

https://github.com/ericelliott/cuid

Tailor made

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
    $$ LANGUAGE PLPGSQL;

https://engineering.instagram.com/sharding-ids-at-instagram-1cf5a71e5a5c
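The bit layout used by the function above can be mirrored in PHP to see how the three parts share one 64-bit integer: the milliseconds since a custom epoch go in the high bits, 13 bits hold the shard ID, and 10 bits hold the sequence. The names `nextId` and `decodeId` are illustrative, and the timestamp and sequence are passed in as arguments here instead of coming from the database.

```php
<?php
// Custom epoch taken from the PL/pgSQL function above (milliseconds).
const OUR_EPOCH = 1314220021721;

// Packs timestamp, shard ID and sequence into one 64-bit integer.
function nextId(int $nowMillis, int $shardId, int $sequence): int
{
    $id = ($nowMillis - OUR_EPOCH) << 23; // time in the high bits
    $id |= $shardId << 10;                // 13 bits of shard ID
    $id |= $sequence % 1024;              // 10 bits of sequence
    return $id;
}

// Splits an ID back into its three components.
function decodeId(int $id): array
{
    return [
        'millis' => ($id >> 23) + OUR_EPOCH,
        'shard' => ($id >> 10) & 0x1FFF,
        'sequence' => $id & 0x3FF,
    ];
}

$id = nextId(OUR_EPOCH + 1, 5, 1);
echo $id . PHP_EOL; // 8393729
print_r(decodeId($id));
```

Because the time component occupies the most significant bits, IDs sort by creation time while still being generated independently on each shard.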

Summary


  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • You can generate IDs anywhere, instead of having to round trip to the database
  • Most replication scenarios require GUID columns anyway
  • It is a whopping 4 times larger than the traditional 4-byte index value
  • Cumbersome to debug: where userid = 'BAE7DF4-DDF-3RG-5TY3E3RF456AS10'
  • The generated UUIDs should be partially sequential for best performance 

Other languages

Java, Scala (native)

java.util.UUID.randomUUID

Node.js (Install uuid from npm)

var uuid = require("uuid");
uuid.v4();

Haskell (Install uuid from hackage)

uuid :: IO UUID
uuid = nextRandom

Q&A

Thanks for listening

@ ArkadiuszKondas

https://slides.com/arkadiuszkondas

https://joind.in/talk/959bf

UUID - The only ID you need?

By Arkadiusz Kondas


What is a UUID, and what are its successive versions? How do you start using UUIDs in PHP? Can my database handle UUIDs? What is a UUID suitable for, and what is it not? Is a UUID the only ID I need? I will try to answer these and other questions in a presentation devoted exclusively to universally unique identifiers.
