HOW TRACKING WORKS AND HOW IT'S USED

Daniel Coloma

"I have nothing to hide... so I don't care about my privacy"

HAVE YOU EVER HEARD...

HOW CAN WE MAKE USERS REALIZE THAT PRIVACY IS A FUNDAMENTAL RIGHT?

DISCLAIMER

HAVE YOU EVER WONDERED...


Why is elpais.com showing me an ad for the Lego game I wanted to give my son for Christmas?

... I never read any news about Lego on elpais.com, and I already bought it!

... THE ANSWER IS IN YOUR PERSONAL DATA

HAVE YOU EVER WONDERED...

Why is booking.com offering me this rate for this hotel?

... and my friend Peter is getting a cheaper rate for exactly the same hotel, same room, same dates!

... THE ANSWER IS IN YOUR PERSONAL DATA

AN EXAMPLE OF WHAT HAPPENS BEHIND THE CURTAINS

ONLINE ADVERTISING

WHO DECIDES WHICH ADS SHOULD BE SHOWN TO YOU?

THE ADS TO BE SHOWN ARE USUALLY DECIDED BY SOME OTHER COMPANIES CALLED AD-EXCHANGE BROKERS

NOT REALLY

HOW DO THE BROKERS DECIDE WHICH ADS YOU SHOULD BE SHOWN?

AD-EXCHANGE BROKERS RUN AUCTIONS

WHO ARE THE BIDDERS?

THE ADVERTISERS (THOUGH NOT DIRECTLY)

BUT WHAT IS THE GOOD BEING AUCTIONED?

THE SPACE IN THE NEWS WEB SITE?

THEY BID FOR THE USERS WATCHING THE AD!

 

THEY BID FOR YOU!

BUT NOBODY BIDS FOR AN EMPTY CANVAS

AUTHOR, TITLE, YEAR, OWNERS, CONDITION

SO THEY NEED TO KNOW WHO YOU ARE

WHO YOU ARE IS NOT YOUR NAME

A BIG SET OF DATA ABOUT YOU, TOGETHER WITH GOOD ALGORITHMS, MAY PROVIDE AN EXCELLENT PICTURE OF YOU

SO THEY PROFILE YOU

  1. HOW INFORMATION ABOUT YOU IS COLLECTED 

  2. WHAT TYPE OF INFORMATION IS IT

  3. WHO IS COLLECTING IT AND HOW IT'S EXCHANGED

  4. WHY DO THEY DO IT? WHAT IS THE PURPOSE? ... AND HOW DOES IT AFFECT YOU?

  5. WHAT ARE THE CHALLENGES TO RAISE AWARENESS?

WHAT ARE WE GOING TO LEARN TODAY?

1 - HOW IS INFORMATION ABOUT ME COLLECTED?

Before creating a Facebook account, Peter wants to check Facebook's privacy policy. So he goes to Google and searches for "facebook data policy". THE FIRST SUGGESTED ENTRY IS: https://www.facebook.com/policy.php

A simple story... (I)

He opens THE link (https://www.facebook.com/policy.php), READS FACEBOOK'S POLICY and decides NOT TO CREATE an account

A simple story... (II)

Later on, he wants to look up some information about cancer (HIS FATHER HAS JUST BEEN DIAGNOSED WITH CANCER) in a health forum, and he opens: http://salud.ccm.net/forum/cancer-8

A simple story... (III)

A simple story... (IV)

Peter THINKS Facebook doesn't know anything about him... is he right?

No, he is not! FACEBOOK IS PROFILING HIM

When Peter visited Facebook's policy page, Facebook "took the opportunity" to set some cookies on his computer

  • A random identifier for the browser is created and stored in a cookie scoped to the Facebook root domain, i.e. the cookie is sent every time a resource is retrieved from facebook.com or any of its subdomains.

  • The cookies contain additional info, such as the first and last Facebook pages visited, etc.

  • Facebook has started to profile Peter

When he later read the health forum, a Facebook plugin was loaded. As the plugin is hosted on Facebook domains, the cookies were sent back to Facebook.

  • The profile is enriched:

    • The URL he just visited is added to his browsing history.

    • The referrer URL too (how he found this forum).

    • If a "Like" button is present, the page he would be liking if he pressed it.
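The mechanism Peter ran into can be reduced to a toy model: one random ID, stored in a cookie the first time the tracker's domain is contacted, and sent back on every later request to that domain, whichever site triggered it. A minimal Python sketch, assuming a simplified tracker that only sees requests to its own domain; the class name `ThirdPartyTracker`, the dict standing in for the browser's cookie jar, and passing the page URL and referrer as plain arguments are all illustrative assumptions, not how Facebook's servers actually work.

```python
import uuid
from collections import defaultdict
from typing import Optional


class ThirdPartyTracker:
    """Toy third-party tracker (hypothetical). It only sees requests made to its
    own domain, e.g. when a page embeds its plugin or pixel."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        # browser ID -> list of (page URL, referrer) visits seen by the tracker
        self.profiles: defaultdict[str, list[tuple[str, Optional[str]]]] = defaultdict(list)

    def handle_request(self, cookie_jar: dict[str, str], page_url: str,
                       referrer: Optional[str] = None) -> str:
        """Process one embedded-resource request coming from page_url.

        cookie_jar stands in for the browser's cookies, keyed by domain: the
        browser attaches the tracker's cookie to every request to that domain.
        """
        browser_id = cookie_jar.get(self.domain)
        if browser_id is None:
            # First contact: mint a random identifier and "set" it as a cookie
            # scoped to the tracker's root domain.
            browser_id = uuid.uuid4().hex
            cookie_jar[self.domain] = browser_id
        # Every request carries the same ID back, so visits to unrelated sites
        # accumulate in one profile.
        self.profiles[browser_id].append((page_url, referrer))
        return browser_id


if __name__ == "__main__":
    tracker = ThirdPartyTracker("facebook.com")
    peter = {}  # Peter's cookie jar, empty before his first visit
    tracker.handle_request(peter, "https://www.facebook.com/policy.php", "https://www.google.com/")
    tracker.handle_request(peter, "http://salud.ccm.net/forum/cancer-8")
    print(tracker.profiles[peter["facebook.com"]])
```

Note that Peter never logs in anywhere: the random ID alone is enough to link the policy page and the cancer forum in one profile.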

Maria WAS so worried about her privacy that she never visited a Facebook page...

BUT she is pregnant and visited prenatal.com

GUESS WHAT? Facebook is profiling her!

MARIA STARTS BEING PROFILED

When Maria visited the Prenatal web page, it loaded resources from pixel.facebook.com. In its response, Facebook "took the opportunity" to set some cookies on her computer

When she later visits any website that loads resources from a Facebook domain, the cookies will be sent back to Facebook

HER PROFILE IS CONTINUOUSLY ENRICHED

HOW MANY OF THE TOP 1 MILLION SITES INCLUDE FACEBOOK PLUGINS?

FACEBOOK THIRD PARTY CONTENT IS PRESENT IN 35% OF THE 1 MILLION MOST VISITED WEBSITES

But I heard I can opt-out!

http://www.youronlinechoices.eu/

What do you think happens afterwards?

CLICK HERE

  • COOKIES ARE NOT REMOVED

  • A NEW COOKIE IS SET

  • INFO IS STILL BEING SENT TO FACEBOOK

REMEMBER: THIS IS JUST FOR NON-FACEBOOK USERS

I DON'T HAVE TIME TO TALK ABOUT WHAT HAPPENS TO FACEBOOK USERS

So... what if I disable cookies or remove them?

Very smart...


But do you think you are smarter than the trackers?

When tracking companies noticed that many users blocked cookies, they came up with alternatives

ALTERNATIVE 1 - "FLASH COOKIES"

A tracking technology that is more resilient than HTTP cookies and gives users less control.

"RESPAWNING": KEEPING COOKIES ALIVE


An exact copy of the browser cookies is kept in sync in Flash cookies. Every time a cookie is added to the browser, a copy is created in the Flash cookies repository

"RESPAWNING": ALWAYS KEEP ONE COPY

REMOVE COOKIES?


Even if the user removes the cookies from his browser, a copy still exists in the Flash Cookies repository


When cookie removal is detected, the cookies ARE rebuilt using THE exact copy available in the Flash cookies


"RESPAWNING": A ZOMBIE COOKIE

REMOVE COOKIES?

RESPAWN!
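Respawning boils down to "mirror, then restore". A minimal Python sketch, assuming two plain dicts stand in for the browser's cookie jar and the Flash cookie repository; the class name `RespawningTracker` and the cookie name `uid` are hypothetical.

```python
import uuid


class RespawningTracker:
    """Toy model of cookie respawning (hypothetical names throughout).

    browser_cookies: what the user sees and can clear in the browser.
    backup_store: a second, separately managed storage area standing in for
    Flash cookies, which clearing browser cookies does not touch.
    """

    COOKIE_NAME = "uid"  # hypothetical cookie name

    def __init__(self, browser_cookies: dict[str, str], backup_store: dict[str, str]) -> None:
        self.browser_cookies = browser_cookies
        self.backup_store = backup_store

    def on_page_load(self) -> str:
        uid = self.browser_cookies.get(self.COOKIE_NAME)
        if uid is None:
            # Cookie removal detected (or first visit): rebuild the cookie from
            # the backup copy, or mint a new ID if there is no backup either.
            uid = self.backup_store.get(self.COOKIE_NAME) or uuid.uuid4().hex
            self.browser_cookies[self.COOKIE_NAME] = uid
        # Keep the backup copy in sync with the browser cookie.
        self.backup_store[self.COOKIE_NAME] = uid
        return uid


if __name__ == "__main__":
    cookies, flash = {}, {}
    tracker = RespawningTracker(cookies, flash)
    before = tracker.on_page_load()
    cookies.clear()  # the user removes browser cookies; the Flash copy survives
    after = tracker.on_page_load()
    print(before == after)  # True: the "zombie" cookie is back
```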

ALTERNATIVE 2 - "EVERCOOKIES"

Use AT THE SAME TIME all the technologies AVAILABLE to store information in YOUR browser: HTTP cookies, IndexedDB, Local Storage, etc.

An exact copy of the browser cookies is kept in sync in different storage locations: Flash cookies, IndexedDB, Local Storage, ETags

"RESPAWNING" IMPROVED!

IF JUST A SINGLE ONE REMAINS, IT CAN BE USED TO RESPAWN THE REST
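The evercookie idea generalizes respawning to any number of storage locations. A minimal Python sketch, assuming each storage mechanism is modelled as a plain dict; the store labels and the function name are hypothetical, and a real script would talk to a different browser API for each store.

```python
import uuid

# Hypothetical labels for the storage mechanisms named above; in a real browser
# each is a separate API, here each is just a dict.
STORE_NAMES = ["http_cookie", "flash_cookie", "indexeddb", "local_storage", "etag"]


def evercookie_get_or_set(stores: dict[str, dict[str, str]], key: str = "uid") -> str:
    """Return the tracking ID and respawn it into every store that lost it."""
    # Take the first surviving copy (a real script might vote if copies disagree).
    uid = next((store[key] for store in stores.values() if key in store), None)
    if uid is None:
        uid = uuid.uuid4().hex  # nothing survived: treat as a new visitor
    # Write the ID back everywhere, so one surviving copy restores all the others.
    for store in stores.values():
        store[key] = uid
    return uid


if __name__ == "__main__":
    stores = {name: {} for name in STORE_NAMES}
    uid = evercookie_get_or_set(stores)
    for name in STORE_NAMES[:-1]:
        stores[name].clear()  # wipe everything except the ETag copy
    print(evercookie_get_or_set(stores) == uid)  # True
```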


ENABLE TRACKING PROTECTION

(ONLY IN FIREFOX)

WHAT CAN I DO?

USE ADDITIONAL TOOLS

SLIGHTLY DIFFERENT, BUT ALL OF THEM BASED ON "CUTTING" TRAFFIC TO TRACKERS
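"Cutting traffic to trackers" usually means checking each outgoing request against a list of known tracker domains and dropping the matches. A minimal Python sketch under that assumption; the blocklist entries are placeholder domains, not taken from any real tool's list, and real tools add many more rules.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real tools ship large, curated lists of tracker domains.
TRACKER_DOMAINS = {"tracker.example", "ads.example"}


def is_tracker(url: str, blocklist: set[str] = TRACKER_DOMAINS) -> bool:
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of listed domains match
    # but look-alikes such as "nottracker.example" do not.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def filter_requests(urls: list[str], blocklist: set[str] = TRACKER_DOMAINS) -> list[str]:
    """Keep only the requests that do not go to a listed tracker."""
    return [u for u in urls if not is_tracker(u, blocklist)]


if __name__ == "__main__":
    page_requests = [
        "https://www.newspaper.com/app.js",
        "https://pixel.tracker.example/p.gif",
    ]
    print(filter_requests(page_requests))  # ['https://www.newspaper.com/app.js']
```

Matching on label boundaries (rather than a plain "ends with" string check) is what keeps unrelated domains that merely share a suffix from being blocked.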

THE "TRADITIONAL" TRACKING LANDSCAPE

Top third parties on the top 1 million sites (ACCORDING TO ALEXA)

TRACKERS ARE EVOLVING

STATEFUL TRACKING: requires storing info on your computer

STATELESS TRACKING: NO NEED TO STORE ANYTHING ON YOUR COMPUTER

JUST IN CASE ONE DAY ALL THE USERS ENABLE TRACKING PROTECTION

(TRACKING PROTECTION TOOLS ARE NOT VERY EFFECTIVE YET AGAINST STATELESS TECHNIQUES)

"fingerprinting"

Look for ways to uniquely identify your browser

Canvas Fingerprinting

The web page renders an image in a hidden canvas. If the image is designed in a smart way, its hash is unique per device/browser

Font Fingerprinting

Draw text in multiple fonts IN A HIDDEN CANVAS AND MEASURE the onscreen dimensions of the font glyphs. FONT GLYPHS ARE AFFECTED BY SO MANY FACTORS THAT THEY PROVIDE A UNIQUE WAY TO IDENTIFY YOUR BROWSER/COMPUTER
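One well-known variant of font fingerprinting detects which fonts are installed by comparing text dimensions against generic fallback fonts: if a font is missing, the browser falls back and the measured size matches the fallback's. In a browser the measurement would come from the hidden canvas; in this minimal Python sketch it is passed in as a function so the logic can run anywhere. The fallback list, the `'Font', fallback` stack format and the made-up measurements are assumptions for illustration.

```python
from typing import Callable

# measure(font_stack) -> (width, height) of a fixed test string rendered with a
# CSS-style font stack. In a browser this comes from hidden canvas/text
# measurements; here it is a parameter.
Measure = Callable[[str], tuple[float, float]]

FALLBACKS = ["monospace", "sans-serif", "serif"]


def detect_fonts(candidates: list[str], measure: Measure) -> list[str]:
    """Return the candidate fonts whose rendering differs from the fallback baseline."""
    baseline = {fb: measure(fb) for fb in FALLBACKS}
    detected = []
    for font in candidates:
        # If the font is missing, "'Font', fallback" renders in the fallback and
        # its size equals the baseline; any difference means the font exists.
        if any(measure(f"'{font}', {fb}") != baseline[fb] for fb in FALLBACKS):
            detected.append(font)
    return detected


if __name__ == "__main__":
    # Made-up measurements: only "Arial" is "installed" on this fake machine.
    base = {"monospace": (120.0, 17.0), "sans-serif": (100.0, 18.0), "serif": (98.0, 18.0)}
    installed = {"Arial": (101.0, 18.0)}

    def fake_measure(stack: str) -> tuple[float, float]:
        first = stack.split(",")[0].strip().strip("'")
        if first in installed:
            return installed[first]
        return base[stack.split(",")[-1].strip()]

    print(detect_fonts(["Arial", "Comic Sans MS"], fake_measure))  # ['Arial']
```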

Audio CONTEXT Fingerprinting

The web page CREATES AN AUDIO CONTEXT AND REQUESTS THE PROCESSING OF A SILENT SIGNAL. THE HASH OF THE PROCESSED SIGNAL IS UNIQUE PER BROWSER/DEVICE

WebRTC Fingerprinting

USE WEBRTC TO DISCOVER YOUR LOCAL IP ADDRESS WITHOUT ANY SPECIAL PERMISSION
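Whatever the individual techniques, their results are typically reduced to a single identifier by hashing them together. A minimal Python sketch; the signal names and values are invented for illustration, and in a real script the canvas and audio entries would themselves be summaries of the rendered image or processed signal computed in the browser.

```python
import hashlib
import json


def fingerprint(signals: dict) -> str:
    """Combine browser signals into one stable identifier (SHA-256 hex digest)."""
    # Canonical JSON (sorted keys, fixed separators): the same signals always
    # serialize to the same bytes, whatever order they were collected in.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values standing in for the techniques described above.
    signals = {
        "canvas_hash": "3f1c9a",        # hash of the hidden-canvas image
        "fonts": ["Arial", "Calibri"],  # result of font probing
        "audio_sum": 35.1,              # summary of the processed audio signal
        "local_ip": "192.168.1.34",     # obtained through WebRTC
        "screen": [1920, 1080],
    }
    # Same digest on every visit, as long as none of the signals change.
    print(fingerprint(signals))
```

Nothing is stored on the user's computer here: the identifier is recomputed from the browser itself on each visit, which is why deleting cookies does not help.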

THE "FINGERPRINTING" TRACKING LANDSCAPE

SHARE OF SITES USING EACH FINGERPRINTING TECHNIQUE, BY SITE RANK

RANK INTERVAL   CANVAS   FONT    WEBRTC
[0, 1K)         5.10%    2.50%   0.60%
[1K, 10K)       3.91%    1.98%   0.42%
[10K, 100K)     2.45%    0.86%   0.19%
[100K, 1M)      1.31%    0.25%   0.06%

FINGERPRINTING TECHNIQUES ARE MORE FREQUENT ON THE MOST VISITED SITES

TRACKERS ARE EVOLVING (EVEN MORE)

CROSS-DEVICE TRACKING

IP Address: 163.63.1.0 (9AM-6PM weekdays)
IP Address: 22.68.136.129 (early morning, evenings, weekends)

PROBABILISTIC MATCHING


DETERMINISTIC MATCHING
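The two matching strategies can be illustrated with a minimal Python sketch. Probabilistic matching is modelled here as the overlap (Jaccard similarity) between the (IP address, time window) contexts in which two devices are seen; deterministic matching as two devices signing in to the same account. The scoring rule, the 0.5 threshold and the device data (reusing the IP addresses above, plus a made-up third one) are illustrative assumptions, not any vendor's algorithm.

```python
from itertools import combinations

Observation = tuple[str, str, str]  # (device_id, ip_address, time_window)


def probabilistic_matches(observations: list[Observation], threshold: float = 0.5):
    """Link devices whose (IP, time window) contexts overlap enough."""
    contexts: dict[str, set[tuple[str, str]]] = {}
    for device, ip, window in observations:
        contexts.setdefault(device, set()).add((ip, window))
    matches = []
    for a, b in combinations(sorted(contexts), 2):
        shared = contexts[a] & contexts[b]
        score = len(shared) / len(contexts[a] | contexts[b])  # Jaccard similarity
        if score >= threshold:
            matches.append((a, b, round(score, 2)))
    return matches


def deterministic_matches(logins: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Link devices that signed in to the same account: (device_id, account_id) pairs."""
    devices_by_account: dict[str, set[str]] = {}
    for device, account in logins:
        devices_by_account.setdefault(account, set()).add(device)
    return {acct: devs for acct, devs in devices_by_account.items() if len(devs) > 1}


if __name__ == "__main__":
    obs = [
        ("laptop", "163.63.1.0", "weekday 9AM-6PM"),
        ("laptop", "22.68.136.129", "evenings/weekends"),
        ("phone", "163.63.1.0", "weekday 9AM-6PM"),
        ("phone", "22.68.136.129", "evenings/weekends"),
        ("tablet", "22.68.136.129", "evenings/weekends"),
        ("tablet", "10.1.2.3", "weekday 9AM-6PM"),
    ]
    print(probabilistic_matches(obs))  # [('laptop', 'phone', 1.0)]
    print(deterministic_matches([("laptop", "peter"), ("phone", "peter"), ("tablet", "maria")]))
```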

TRACKING TECHNIQUES HAVE OUTPACED THE TRACKING PROTECTION METHODS

CONCLUSION #1

2 - WHAT INFORMATION IS BEING COLLECTED ABOUT ME?

LOCATION DATA

WiFi

GPS

CARRIER

IP ADDRESS

TECHNICAL DATA

Operating System

Web Browser

Screen Resolution

Hardware Manufacturer

Installed Plugins

BEHAVIOURAL DATA

Browsing History

Ads Seen / Clicked

Search Queries

Purchasing History

Social Media

Referrals

Browsing Habits

DEMOGRAPHIC DATA

ADDRESS

ZIP CODE

NAME

AGE

GENDER

BUT THOSE ARE JUST SOME INGREDIENTS

THEY CAN INFER A LOT MORE ABOUT YOU BY COMBINING THEM IN A SMART WAY

INCOME LEVEL

ETHNIC INFORMATION

HEALTH SITUATION

POLITICAL TENDENCIES

ARE YOU SURE THEY CAN INFER ALL THESE THINGS ABOUT ME?

YES! HAVE A LOOK AT THE FACEBOOK AD-CAMPAIGN MANAGER

THE AMOUNT OF INFORMATION GATHERED ABOUT YOU IS HUGE

CONCLUSION #2

AS ARE THE THINGS THAT CAN BE INFERRED FROM IT

3 - WHO IS COLLECTING THAT INFORMATION AND HOW DOES IT FLOW?

WHAT YOU PERCEIVE WHEN VISITING A NEWS WEBSITE (www.newspaper.com)

IN 200 MS YOU GET THE INFORMATION FROM THE WEBSITE, AND SOME ADS APPEAR MIXED WITH THE CONTENT

BUT WHAT IS GOING ON DURING THAT TIME?

You visit a news site (www.newspaper.com)

1 - YOU TYPE THE URL OF YOUR FAVOURITE NEWS SITE

Apart from rendering the news website, your browser sends an "ad-tag" to an AD-EXCHANGE the publisher has an agreement with

2 - THE WEBSITE IS RENDERED + YOUR BROWSER SENDS AN "AD-TAG"

AD-EXCHANGES are a kind of marketplace for advertisements. They sell the empty space on sites on behalf of publishers

The AD-EXCHANGE knows that there is ad-space up for bid... but, most importantly, it can now retrieve your cookies. The cookies contain the ID the ad-exchange assigned to you the first time you "visited" it, plus extra info: your profile

3 - THE AD-EXCHANGE RETRIEVES COOKIES FROM YOUR COMPUTER AND CHECKS WHO YOU ARE

The AD-EXCHANGE sends an "ad-call" to DEMAND-SIDE-PLATFORMS: "You have an opportunity to advertise to a user with this profile and ID"

4 - THE AD-EXCHANGE LOOKS FOR POTENTIAL ADVERTISERS FOR YOUR PROFILE

DEMAND-SIDE-PLATFORMS are mediators between the advertisers and the ad-exchanges. They receive campaigns from advertisers, together with the criteria for selecting impressions.

All DEMAND-SIDE-PLATFORM candidates retrieve their cookies from your computer so they can also complete the profile they have about you and link it to your ID

5 - THE DEMAND-SIDE-PLATFORMS READ THEIR COOKIES FROM YOUR COMPUTER

DEMAND-SIDE-PLATFORMS request extra information about you from one or more DATA-BROKERS

6 - THE DEMAND-SIDE-PLATFORMS LOOK FOR EXTRA INFORMATION FROM DATA-BROKERS

DATA-BROKERS are companies that sell user profiles and market analysis. They use their knowledge to put users in buckets such as "urban and eco-friendly"

DEMAND-SIDE-PLATFORMS perform cookie-matching with all the info they have about you and decide how much they can bid. They correlate their ID/profile with the ad-exchange's ID/profile and with the extra info obtained from data-brokers.

Bids: $0.10, $0.09, $0.09

7 - THEY USE ALL THE INFORMATION ABOUT YOU TO DECIDE HOW MUCH THEY CAN OFFER
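Cookie matching (often called cookie syncing) is how a demand-side-platform links the ad-exchange's ID for you to its own. One common pattern: the exchange makes your browser load a DSP URL carrying the exchange's ID as a parameter; the browser attaches the DSP's own cookie to that request, so the DSP can record the pair. A minimal Python sketch under that assumption; the `exchange_uid` parameter, the `dsp.example` endpoint and the class and function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sync_url(dsp_endpoint: str, exchange_uid: str) -> str:
    """URL the exchange makes your browser load (e.g. via a redirect or pixel)."""
    return f"{dsp_endpoint}?{urlencode({'exchange_uid': exchange_uid})}"


class DemandSidePlatform:
    def __init__(self) -> None:
        self.match_table: dict[str, str] = {}  # exchange ID -> DSP's own ID
        self.profiles: dict[str, dict] = {}    # DSP ID -> what the DSP knows

    def handle_sync(self, url: str, dsp_cookie: str) -> None:
        """The exchange's ID arrives in the URL, the DSP's own ID in its cookie."""
        exchange_uid = parse_qs(urlsplit(url).query)["exchange_uid"][0]
        self.match_table[exchange_uid] = dsp_cookie

    def enrich_bid_request(self, exchange_uid: str, exchange_profile: dict) -> dict:
        """Merge the exchange's profile with the DSP's own, if the IDs were matched."""
        dsp_id = self.match_table.get(exchange_uid)
        own = self.profiles.get(dsp_id, {}) if dsp_id else {}
        return {**own, **exchange_profile}


if __name__ == "__main__":
    dsp = DemandSidePlatform()
    dsp.profiles["dsp-42"] = {"interests": ["lego"]}
    dsp.handle_sync(build_sync_url("https://dsp.example/sync", "ax-7"), dsp_cookie="dsp-42")
    print(dsp.enrich_bid_request("ax-7", {"age_range": "30-39"}))
    # {'interests': ['lego'], 'age_range': '30-39'}
```

Once the pair is recorded, every later ad-call about "ax-7" can be enriched with everything the DSP already knew about "dsp-42".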

The AD-EXCHANGE checks all the offers from the DEMAND-SIDE-PLATFORMS and assigns the space to the one with the highest bid

Winning bid: $0.10

8 - THE AD-EXCHANGE ASSIGNS THE SPACE TO THE HIGHEST BID
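The selection step itself is simple. A minimal Python sketch using the three bids from the example; the `Bid` class and `pick_winner` function are hypothetical, and the sketch only picks the winner: the slides do not say what price the winner actually pays (second-price rules, where the winner pays roughly the runner-up's bid, have been common in real-time bidding).

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Bid:
    dsp: str
    amount: Decimal  # offer for this single impression, in dollars


def pick_winner(bids: list[Bid]) -> Optional[Bid]:
    """Assign the ad space to the highest bid; ties go to the earliest bid."""
    if not bids:
        return None
    return max(bids, key=lambda bid: bid.amount)


if __name__ == "__main__":
    bids = [Bid("DSP-A", Decimal("0.10")), Bid("DSP-B", Decimal("0.09")), Bid("DSP-C", Decimal("0.09"))]
    print(pick_winner(bids))  # Bid(dsp='DSP-A', amount=Decimal('0.10'))
```

`Decimal` is used instead of `float` so that money amounts compare exactly.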

The winning DEMAND-SIDE-PLATFORM places one ad from its advertisers at www.newspaper.com

9 - THE WINNING DEMAND-SIDE-PLATFORM PLACES AN AD IN YOUR BROWSER

  1. You visit a news site (www.newspaper.com)
  2. Apart from rendering the website, your browser sends an "ad-tag" to the ad-exchange
  3. The AD-EXCHANGE knows that there is ad-space up for bid... but, most importantly, it can now retrieve your cookies. The cookies contain the ID the ad-exchange assigned to you the first time you "visited" it, plus extra info: your profile
  4. The ad-exchange sends an "ad-call": "You have an opportunity to advertise to a user with this profile and ID"
  5. All DEMAND-SIDE-PLATFORM candidates retrieve their cookies from your computer
  6. They request extra information about you from DATA-BROKERS
  7. They perform cookie-matching with all the info they have about you and decide how much they can bid ($0.10, $0.09, $0.09)
  8. The AD-EXCHANGE checks all the offers and assigns the space to the Demand-Side-Platform with the highest bid ($0.10)
  9. The winning Demand-Side-Platform places one ad from its advertisers at www.newspaper.com

THE WHOLE "SIMPLIFIED" FLOW

THROUGHOUT THE PROCESS, MANY COMPANIES GET INFORMATION ABOUT YOU BY RETRIEVING THEIR COOKIES AND BY EXCHANGING AND MATCHING INFORMATION

MANY COMPANIES ARE LOOKING AT EVERYTHING YOU DO ONLINE

«A site is not one company any more. A site is tens of hundreds of companies all knowing where you are and what you’re looking at.»

AND THIS IS JUST A SIMPLIFIED VIEW

... LET'S HAVE A LOOK AT THE EVOLUTION

2011 - 150 Companies (marketing technology landscape)

2016 - 3500 Companies (marketing technology landscape)

NUMBER OF COMPANIES IN MARKETING TECHNOLOGY
2011: 150
2012: 350
2014: 1500
2016: 3500

CONCLUSION #3

THE NUMBER OF PLAYERS TRACKING US IS BIG AND GROWING. THE ECOSYSTEM WORKS IN SUCH A WAY THAT THEY ARE ENCOURAGED TO SHARE WHAT THEY KNOW ABOUT USERS.

4 - WHY DO THEY COLLECT THAT INFORMATION? HOW DO THEY USE IT?

TO MAKE DECISIONS

ADVERTISEMENT

CREDIT SCORE

RECRUITING

PRICE QUOTATION

SEARCH RESULTS

DECISIONS ARE MADE IN THE DARK:

HOW CAN WE BE SURE THEY ARE FAIR?

RISK OF WRONG DECISIONS

What if the data you have about me is wrong?

RISK OF MANIPULATION

What if the ad not only shows content they think is relevant to me, but also shows it to me in a way that exploits "my vulnerabilities" (impulsive, cautious, etc.)?

RISK OF HIDDEN DISCRIMINATION

People are biased, and so are the algorithms they create.

For instance, it was found that Google displayed ads about high-income jobs to men more often than to women.

RISK OF PRICE DISCRIMINATION

Could I be quoted a higher price just because I use a Mac or because my income is higher?

RISK OF FILTER BUBBLE

THEY TELL YOU ONLY WHAT YOU WANT TO HEAR

A DETAILED LOOK AT FACEBOOK

97% OF ITS REVENUE COMES FROM ADS

BUT THE COST OF EVERY AD IS GOING DOWN

THE MONEY FACEBOOK MAKES FROM AN AD-CLICK IS 1000 TIMES MORE THAN FROM A MERE AD-IMPRESSION

  • GET MORE USERS WATCHING ADS

  • INCREASE NUMBER OF CLICKS PER AD

  • INCREASE VALUE OF SOME AUDIENCES

HOW CAN FACEBOOK MAKE MORE MONEY EVERY YEAR?

  • INTERNET.ORG

  • DEEP TRACKING AND SEGMENTATION

HOW CAN FACEBOOK MAKE MORE MONEY EVERY YEAR?

CONCLUSION #4

MANY DECISIONS THAT AFFECT MY EVERYDAY LIFE ARE BASED ON THE DATA ABOUT ME THAT IS BEING ACCUMULATED AND EXCHANGED

5 - KEY CHALLENGES TO RAISE AWARENESS

EVERY TIME WE BROWSE THE WEB, MANY COMPANIES COLLECT ALL KINDS OF DATA ABOUT US

DATA RACE

ASYMMETRIC RACE

What they know about me

What I know about them

NO TRANSPARENCY = DANGER OF UNFAIR DECISIONS

NO TRANSPARENCY = NO INCENTIVE TO COMPETE ON PRIVACY-FRIENDLY SERVICES

It AFFECTS EVERYONE, but FEW PEOPLE have any INSIGHT into it

3.17 BILLION INTERNET USERS

WHAT IF PEOPLE WERE TRACKED OFFLINE AS THEY ARE ONLINE?

1 - IMAGINE YOU WALK INTO A BIG SHOPPING MALL

2 - YOU ARE GREETED BY A MAN WHO TELLS YOU THAT HE WILL FOLLOW YOU AROUND TO RECORD WHICH SHOPS YOU ENTER, WHO YOU MEET AND WHAT YOU DO IN GENERAL

3 - BUT RELAX! HE'S GOING TO STAY AT A DISTANCE SO YOU WON'T NOTICE

4 - HE TELLS YOU THEY DO THIS BECAUSE THEY WANT TO PROVIDE YOU WITH A BETTER SERVICE NEXT TIME

5 - AND BECAUSE OF THAT, THEY NEED TO GET SOME ADDITIONAL INFO FROM YOU SO THEY CAN RECOGNISE YOU NEXT TIME YOU VISIT

6 - BUT RELAX! I DON'T NEED YOUR NAME TO RECOGNISE YOU, JUST A BIT OF INFORMATION ABOUT YOU

7 - AND BY THE WAY, I MIGHT EXCHANGE INFORMATION WITH OTHER MALLS SO YOU CAN GET AN EVEN BETTER SERVICE

WOULD YOU GO ON SHOPPING?

OR WOULD YOU LOOK FOR A PLACE WHERE YOU ARE LEFT ALONE?

the problem is that users are not fully aware of this

and...  "users don't care about privacy"


YET

THINGS ARE ALREADY CHANGING

AND Telefónica IS COMMITTED & HELPING!

MANY OF THE FINDINGS I'VE USED TODAY WERE DISCOVERED WITH TOOLS SPONSORED BY THE DTL (DATA TRANSPARENCY LAB)

WE ARE NOT GOING TO COMMERCIALIZE CUSTOMERS' INFORMATION: WE ARE GOING TO GIVE THAT INFORMATION BACK TO THEM.

DATA BELONG TO CUSTOMERS

three levels of awareness

  1. Are users aware that some services are free because those services monetize their online activity?
  2. Are they aware of how much they are worth to those services?
  3. Are they aware of why they are worth that much to them?
    • How much data can be collected about them, and how it is collected
    • How much information can be inferred about them from that data
    • How that information can be used

KEY CHALLENGE

HOW TO RAISE AWARENESS OF SUCH A COMPLICATED ISSUE?

let's see how big that challenge is through some examples and demos...

facebook data valuation tool (FDVT)

FDVT: end-user tool

  • It's an estimate that is good enough for users
  • Does the app provide enough value to keep them engaged?
  • What could we provide beyond the money information?

FDVT: AGGREGATED INFORMATION

  • We can get the CPC (cost per click) and the CPM (cost per thousand impressions) for multiple audiences with multiple parameters (e.g. the highest CPC in the UK is for users in their 40s and in Spain for users in their 30s, the difference between men's and women's CPC...)
  • We can also observe trends over time (e.g. the evolution of CPM/CPC for Democrats/Republicans during the election)
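CPM and CPC translate into a per-user value with simple arithmetic: each impression is worth CPM/1000 and each click is worth the CPC. A minimal Python sketch, assuming the value is just the sum of those two terms; it illustrates the arithmetic and is not necessarily the FDVT's exact formula, and the numbers in the example are made up.

```python
def estimated_ad_value(impressions: int, clicks: int, cpm: float, cpc: float) -> float:
    """Rough revenue generated by one user's ad activity, in the CPM/CPC currency."""
    # CPM is quoted per 1000 impressions, so one impression is worth cpm / 1000.
    return impressions * cpm / 1000 + clicks * cpc


if __name__ == "__main__":
    # Hypothetical month: 2000 ads seen, 3 clicked, CPM $5.00, CPC $0.50
    print(estimated_ad_value(2000, 3, cpm=5.0, cpc=0.5))  # 11.5
```

With these made-up figures, 2000 impressions are worth $10.00 while just 3 clicks add $1.50, which is why increasing the number of clicks per ad matters so much to the platforms.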

BUT I'M PRETTY SURE WE CAN DO MUCH MORE THAN THIS!!!!

revealing and controlling mobile privacy leaks (RECON)

RECON: END-USER TOOL

  • Apart from the poor UI... how could we raise awareness of what is going on, rather than just giving users raw information about it?

RECON: AGGREGATED INFORMATION

privacy census

privacy census: THE DATA

https://webtransparency.cs.princeton.edu/webcensus/

WHAT CAN WE DO WITH ALL THAT INFO?

Sites that perform fingerprinting, 3rd parties used, type of sites, traffic, country, etc.

privacy census: UI TOOLS

KEY CHALLENGES

  • Risk of scaring people
  • Complex information:
    • Connections across trackers
    • Too much information
    • Includes personal aspects
    • Very technical information
    • We cannot be sure what has been inferred about users (we can only guess, for instance, why an ad has been shown)
  • Need to find metaphors

THANKS!
