HOW TRACKING WORKS AND HOW IT'S USED
Daniel Coloma

"I have nothing to hide... so I don't care about my privacy"
HAVE YOU EVER HEARD...

HOW CAN WE MAKE USERS REALIZE THAT PRIVACY IS A FUNDAMENTAL RIGHT?
DISCLAIMER
HAVE YOU EVER WONDERED...
Why is elpais.com showing me an ad for the Lego game I wanted to give my son for Christmas?
... I never read any news about Lego on elpais.com and I already bought it!
... THE ANSWER IS IN YOUR PERSONAL DATA
HAVE YOU EVER WONDERED...
Why is booking.com offering me this rate for this hotel?
... and my friend Peter is getting a cheaper rate for exactly the same hotel, same room, same dates!
... THE ANSWER IS IN YOUR PERSONAL DATA
AN EXAMPLE OF WHAT HAPPENS BEHIND THE CURTAINS

ONLINE ADVERTISING
WHO DECIDES WHICH ADS SHOULD BE SHOWN TO YOU?

THE ADS TO BE SHOWN ARE USUALLY DECIDED BY SOME OTHER COMPANIES CALLED AD-EXCHANGE BROKERS
NOT REALLY

HOW DO THE BROKERS DECIDE WHICH ADS YOU SHOULD BE SHOWN?


AD-EXCHANGE BROKERS RUN AUCTIONS
WHO ARE THE BIDDERS?

THE ADVERTISERS (AND NOT DIRECTLY)

BUT WHAT IS THE GOOD BEING AUCTIONED?

THE SPACE IN THE NEWS WEB SITE?

THEY BID FOR THE USERS WATCHING THE AD!

THEY BID FOR YOU!

BUT NOBODY BIDS FOR AN EMPTY CANVAS
AUTHOR
TITLE
YEAR
OWNERS
CONDITION
SO THEY NEED TO KNOW WHO YOU ARE
WHO YOU ARE IS NOT YOUR NAME

A BIG SET OF DATA ABOUT YOU, TOGETHER WITH GOOD ALGORITHMS, MAY PROVIDE AN EXCELLENT PICTURE OF YOU
SO THEY PROFILE YOU
WHAT ARE WE GOING TO LEARN TODAY?
- HOW INFORMATION ABOUT YOU IS COLLECTED
- WHAT TYPE OF INFORMATION IT IS
- WHO IS COLLECTING IT AND HOW IT'S EXCHANGED
- WHY DO THEY DO IT? WHAT IS THE PURPOSE? ... AND HOW DOES IT AFFECT YOU?
- WHAT ARE THE CHALLENGES TO RAISE AWARENESS?
1 - HOW IS INFORMATION ABOUT ME COLLECTED?
A simple story... (I)
Before creating a Facebook account, Peter wants to check Facebook's privacy policy. So he goes to Google and searches for "facebook data policy". The first suggested entry is: https://www.facebook.com/policy.php

A simple story... (II)
He opens the link (https://www.facebook.com/policy.php), reads Facebook's policy and decides not to create an account.

A simple story... (III)
Later on he wants to check some information about cancer (his father has just been diagnosed with cancer) in a health forum, and he opens: http://salud.ccm.net/forum/cancer-8

A simple story... (IV)
Peter thinks Facebook doesn't know anything about him... is he right?

No, he is not! FACEBOOK IS PROFILING HIM
When Peter visited the Facebook policy page, Facebook "took the opportunity" to set some cookies on his computer
- A random identifier for the browser is created and stored in a cookie scoped to the Facebook root domain, i.e. the cookie will be sent every time a resource is retrieved from facebook.com.
- The cookies contain additional info such as the first and last Facebook pages visited, etc.
- Facebook has started to profile Peter
When he later read the health forum, a Facebook plugin was loaded. As the plugin is hosted on Facebook domains, the cookies are sent back to Facebook.
- The profile is enriched:
  - The URL he just visited is added to his browsing history.
  - The referrer URL too (how he found this forum).
  - If a "Like" button is present, the page he would like if he pressed it.
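Peter's story can be replayed as a small simulation. This is only a sketch of the mechanism described above, not Facebook's actual implementation: the `ThirdPartyTracker` class, the `browser_id` cookie name and the in-memory profile store are all hypothetical.

```python
import uuid
from collections import defaultdict
from urllib.parse import urlparse


class ThirdPartyTracker:
    """Toy model of a server whose resources are embedded on many sites."""

    COOKIE_NAME = "browser_id"  # hypothetical cookie name

    def __init__(self):
        # browser_id -> list of (page_url, referrer) pairs seen so far
        self.profiles = defaultdict(list)

    def handle_request(self, cookies, page_url, referrer=None):
        """Serve one resource and return the cookies to set in the response."""
        browser_id = cookies.get(self.COOKIE_NAME)
        if browser_id is None:
            # First contact: mint a random identifier for this browser.
            browser_id = uuid.uuid4().hex
        # Every later request carries the same cookie, so the visit is
        # appended to the same profile.
        self.profiles[browser_id].append((page_url, referrer))
        return {self.COOKIE_NAME: browser_id}

    def sites_seen(self, browser_id):
        return [urlparse(page).netloc for page, _ in self.profiles[browser_id]]


tracker = ThirdPartyTracker()
cookie_jar = {}  # Peter's cookies for the tracker's domain

# 1. Peter reads the policy page on the tracker's own site.
cookie_jar.update(tracker.handle_request(
    cookie_jar, "https://www.facebook.com/policy.php",
    referrer="https://www.google.com/"))

# 2. Later, a forum page loads a plugin hosted on the tracker's domain.
cookie_jar.update(tracker.handle_request(
    cookie_jar, "http://salud.ccm.net/forum/cancer-8"))

peter_id = cookie_jar[ThirdPartyTracker.COOKIE_NAME]
print(tracker.sites_seen(peter_id))  # ['www.facebook.com', 'salud.ccm.net']
```

Peter never created an account, yet both visits end up under one identifier because the browser returns the same cookie on every request to the tracker's domain.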
Maria was so worried about her privacy that she never visited a Facebook page...

But she is pregnant and visited prenatal.com
GUESS WHAT? Facebook is profiling her!
MARIA STARTS BEING PROFILED
When Maria visited the Prenatal web page, it loaded resources from pixel.facebook.com. In response, Facebook "took the opportunity" to set some cookies on her computer
When she later visits any website that loads resources from a Facebook domain, the cookies will be sent back to Facebook
HER PROFILE IS CONTINUOUSLY ENRICHED
HOW MANY SITES IN THE TOP 1 MILLION INCLUDE FACEBOOK PLUGINS?
FACEBOOK THIRD PARTY CONTENT IS PRESENT IN 35% OF THE 1 MILLION MOST VISITED WEBSITES
But I heard I can opt-out!

http://www.youronlinechoices.eu/
WHAT DO YOU THINK HAPPENS AFTERWARDS?

CLICK HERE
- COOKIES ARE NOT REMOVED
- A NEW COOKIE IS SET
- INFO IS STILL BEING SENT TO FACEBOOK

REMEMBER: THIS IS JUST FOR NON-FACEBOOK USERS
I DON'T HAVE TIME TO TALK ABOUT WHAT HAPPENS TO FACEBOOK USERS

So... what if I disable cookies or remove them?
Very smart...
But do you think you are smarter than the trackers?
When tracking companies detected that many users blocked cookies, they came up with alternatives

ALTERNATIVE 1 - "FLASH COOKIES"
A tracking technology that is more resilient than HTTP cookies and gives users less control.

"RESPAWNING": KEEPING COOKIES ALIVE
[Diagram: browser cookies mirrored into Flash cookies]
An exact copy of the browser cookies is kept in sync in Flash cookies. Every time a cookie is added to the browser, a copy is created in the Flash cookies repository
"RESPAWNING": ALWAYS KEEP ONE COPY
REMOVE COOKIES?
Even if the user removes the cookies from their browser, a copy still exists in the Flash cookies repository
When cookie removal is detected, the cookies are rebuilt from the exact copy available in the Flash cookies
"RESPAWNING": A ZOMBIE COOKIE
REMOVE COOKIES?
RESPAWN!
ALTERNATIVE 2 - "EVERCOOKIES"

Use, at the same time, all the technologies available to store information in your browser: HTTP cookies, IndexedDB, Local Storage, etc.
An exact copy of the browser cookies is kept in sync in different storage locations:
- Browser cookies
- Flash cookies
- IndexedDB
- Local Storage
- ETags
"RESPAWNING" IMPROVED!
IF JUST A SINGLE ONE REMAINS, IT CAN BE USED TO RESPAWN THE REST
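The respawning logic is simple enough to sketch in a few lines. The toy `Browser` class and the `sync_identifier` function below are hypothetical stand-ins for real browser storage APIs and a real tracking script; they only illustrate the rule "if a single copy survives, restore all the others".

```python
class Browser:
    """Toy browser with several independent storage locations."""

    STORES = ("http_cookie", "flash_cookie", "indexed_db", "local_storage", "etag")

    def __init__(self):
        self.storage = {name: None for name in self.STORES}

    def clear(self, *names):
        """Simulate the user wiping some of the storage locations."""
        for name in names:
            self.storage[name] = None


def sync_identifier(browser, new_id):
    """What a (hypothetical) tracking script would run on every page load."""
    survivors = [value for value in browser.storage.values() if value is not None]
    # Reuse any surviving copy; mint new_id only if every store is empty.
    identifier = survivors[0] if survivors else new_id
    # Respawn: write the identifier back into every store.
    for name in browser.storage:
        browser.storage[name] = identifier
    return identifier


browser = Browser()
first = sync_identifier(browser, "id-123")

# The user clears everything except the cached ETag.
browser.clear("http_cookie", "flash_cookie", "indexed_db", "local_storage")
second = sync_identifier(browser, "id-456")

print(first, second)  # id-123 id-123
```

The second call ignores the fresh `id-456` because one old copy was still present, so the user keeps the same identifier despite having removed four of the five stores.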
WHAT CAN I DO?
ENABLE TRACKING PROTECTION (ONLY IN FIREFOX)

USE ADDITIONAL TOOLS

SLIGHTLY DIFFERENT, BUT ALL OF THEM BASED ON "CUTTING" TRAFFIC TO TRACKERS
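"Cutting" traffic usually comes down to checking each outgoing request against a list of known tracker domains. The sketch below shows that idea only; the blocklist entries and the `should_block` helper are hypothetical, and real tools use much larger lists and more elaborate matching rules.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; real tools ship lists with thousands of entries.
BLOCKLIST = {"tracker.example", "ads.example"}


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def should_block(request_url, page_url, blocklist=BLOCKLIST):
    """Block requests to listed domains when they are third-party."""
    request_host = urlparse(request_url).hostname or ""
    page_host = urlparse(page_url).hostname or ""
    for domain in blocklist:
        is_listed = host_matches(request_host, domain)
        is_first_party = host_matches(page_host, domain)
        if is_listed and not is_first_party:
            return True
    return False


# A news page trying to load a tracking pixel: the request is cut.
print(should_block("https://pixel.tracker.example/p.gif", "https://news.example/"))
# Visiting the listed site directly is first-party traffic: allowed.
print(should_block("https://tracker.example/home", "https://www.tracker.example/"))
```

Because the request never leaves the browser, the tracker receives neither the cookie nor the URL of the page being read.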
THE "TRADITIONAL" TRACKING LANDSCAPE
Top third parties on the top 1 million sites (ACCORDING TO ALEXA)
TRACKERS ARE EVOLVING
STATEFUL TRACKING: REQUIRES STORING INFO ON YOUR COMPUTER
STATELESS TRACKING: NO NEED TO STORE ANYTHING ON YOUR COMPUTER
JUST IN CASE ONE DAY ALL THE USERS ENABLE TRACKING PROTECTION
(TRACKING PROTECTION TOOLS ARE NOT VERY EFFECTIVE YET AGAINST STATELESS TECHNIQUES)
"fingerprinting"
Look for ways to uniquely identify your browser

Canvas Fingerprinting

The web page renders an image in a hidden canvas. If the image is defined in a smart way, its hash is unique per device/browser
Font Fingerprinting
The web page shows text in multiple fonts in a hidden canvas and measures the on-screen dimensions of the font glyphs. Font glyphs are affected by so many factors that they are a unique way to identify your browser/computer

Audio Context Fingerprinting
The web page creates an audio context and requests the processing of a silent signal. The hash of the processed signal is unique per browser/device
WebRTC Fingerprinting
The web page uses WebRTC to discover your local IP address without any special permission
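Whatever the signal, the final step of these techniques is the same: the measurements are combined and hashed into an identifier that can be recomputed on every visit, so nothing needs to be stored on the device. The sketch below assumes the measurements have already been collected in the browser; the field names and values are made up for illustration.

```python
import hashlib
import json


def fingerprint(signals):
    """Hash a dict of observed browser signals into a short identifier."""
    # Canonical JSON so the same signals always yield the same bytes.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical measurements taken on two visits by the same browser.
visit_monday = {
    "canvas_hash": "9f2c41",
    "font_widths": [412, 398, 455],
    "audio_hash": "17be03",
    "local_ip": "192.168.1.20",
}
# Same signals, collected in a different order.
visit_friday = dict(reversed(list(visit_monday.items())))

# A different browser renders the canvas slightly differently.
other_browser = dict(visit_monday, canvas_hash="5d80aa")

print(fingerprint(visit_monday) == fingerprint(visit_friday))   # True
print(fingerprint(visit_monday) == fingerprint(other_browser))  # False
```

Clearing cookies does not help here: the identifier is derived from how the device behaves, not from anything saved on it.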

THE "FINGERPRINTING" TRACKING LANDSCAPE
RANK INTERVAL | CANVAS | FONT | WEBRTC |
---|---|---|---|
[0,1K) | 5.10% | 2.50% | 0.60% |
[1K, 10K) | 3.91% | 1.98% | 0.42% |
[10K, 100K) | 2.45% | 0.86% | 0.19% |
[100K, 1M) | 1.31% | 0.25% | 0.06% |
FINGERPRINTING TECHNIQUES ARE MORE FREQUENT ON THE MOST VISITED SITES


TRACKERS ARE EVOLVING (EVEN MORE)

CROSS-DEVICE TRACKING


PROBABILISTIC MATCHING
IP Address: 163.63.1.0 (9AM-6PM weekdays)
IP Address: 22.68.136.129 (early morning, evenings, weekends)
DETERMINISTIC MATCHING
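The two matching styles can be contrasted in a short sketch. The slides only name the two approaches, so the details here are assumptions: deterministic matching is modelled as two devices sharing the same login, and probabilistic matching as a high overlap between the IP addresses each device has used. The device records, the `login` field and the 0.5 threshold are hypothetical.

```python
def ip_overlap(ips_a, ips_b):
    """Jaccard similarity between the sets of IPs used by two devices."""
    a, b = set(ips_a), set(ips_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_devices(dev_a, dev_b, threshold=0.5):
    """Return how two devices were linked, or None if they were not."""
    login_a, login_b = dev_a.get("login"), dev_b.get("login")
    # Deterministic (assumed): both devices were seen with the same account.
    if login_a is not None and login_a == login_b:
        return "deterministic"
    # Probabilistic: the devices keep showing up behind the same IPs.
    if ip_overlap(dev_a["ips"], dev_b["ips"]) >= threshold:
        return "probabilistic"
    return None


laptop = {"ips": ["163.63.1.0", "22.68.136.129"], "login": None}
phone = {"ips": ["163.63.1.0", "22.68.136.129"], "login": None}
tablet = {"ips": ["22.68.136.129"], "login": "user@example.com"}
work_pc = {"ips": ["163.63.1.0"], "login": "user@example.com"}
stranger = {"ips": ["198.51.100.7"], "login": None}

print(match_devices(laptop, phone))     # probabilistic
print(match_devices(tablet, work_pc))   # deterministic
print(match_devices(laptop, stranger))  # None
```

The laptop and the phone are linked without any login, simply because both appear at the office IP during working hours and at the home IP the rest of the time.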









CONCLUSION #1
TRACKING TECHNIQUES HAVE OUTPACED THE TRACKING PROTECTION METHODS
2 - WHAT INFORMATION IS BEING COLLECTED ABOUT ME?
LOCATION DATA
WiFi
GPS
CARRIER
IP ADDRESS

TECHNICAL DATA
Operating System
Web Browser
Screen Resolution
Hardware Manufacturer
Installed Plugins

BEHAVIOURAL DATA
Browsing History
Ads Seen / Clicked
Search Queries
Purchasing History
Social Media
Referrals
Browsing Habits

DEMOGRAPHIC DATA
ADDRESS
ZIP CODE
NAME
AGE
GENDER

BUT THOSE ARE JUST SOME INGREDIENTS
THEY CAN INFER A LOT MORE ABOUT YOU BY COMBINING THEM IN A SMART WAY
INCOME LEVEL
ETHNIC ORIGIN
HEALTH SITUATION
POLITICAL LEANINGS

ARE YOU SURE THEY CAN INFER ALL THESE THINGS ABOUT ME?

YES! HAVE A LOOK AT THE FACEBOOK AD-CAMPAIGN MANAGER



CONCLUSION #2
THE AMOUNT OF INFORMATION GATHERED ABOUT YOU IS HUGE, AS WELL AS THE THINGS THAT CAN BE INFERRED THANKS TO IT

3 - WHO IS COLLECTING THAT INFORMATION AND HOW DOES IT FLOW?
www.newspaper.com


WHAT YOU PERCEIVE WHEN VISITING A NEWS WEB SITE

IN 200 MSEC YOU GET THE INFORMATION FROM THE WEB SITE, WITH SOME ADS MIXED IN WITH THE CONTENT

BUT WHAT IS GOING ON DURING THAT TIME?
1 - YOU TYPE THE URL OF YOUR FAVOURITE NEWS SITE
You visit a news site (www.newspaper.com)


2 - THE WEBSITE IS RENDERED + YOUR BROWSER SENDS AN "AD-TAG"
Apart from rendering the news website, your browser sends an "ad-tag" to an AD-EXCHANGE the publisher has an agreement with
AD-EXCHANGES are a kind of marketplace for advertisements. They sell the empty space on sites on behalf of publishers


3 - THE AD-EXCHANGE RETRIEVES COOKIES FROM YOUR COMPUTER AND CHECKS WHO YOU ARE
The AD-EXCHANGE knows that there is ad space up for bidding... but most importantly, it can now retrieve your cookies. The cookies contain the ID the ad-exchange assigned to you the first time you "visited" it, plus extra info: your profile

4 - THE AD-EXCHANGE LOOKS FOR POTENTIAL ADVERTISERS FOR YOUR PROFILE
The AD-EXCHANGE sends an "ad-call" to DEMAND-SIDE-PLATFORMS: "You have an opportunity to advertise to a user with this Profile and ID"
DEMAND-SIDE-PLATFORMS are mediators between the advertisers and the ad-exchanges. They receive campaigns from advertisers, along with the criteria for selecting impressions.





5 - THE DEMAND-SIDE-PLATFORMS READ THEIR COOKIES FROM YOUR COMPUTER
All DEMAND-SIDE-PLATFORM candidates retrieve their cookies from your computer so they can also complete the profile they have about you and link it to your ID






6 - THE DEMAND-SIDE-PLATFORMS LOOK FOR EXTRA INFORMATION FROM DATA BROKERS
DEMAND-SIDE-PLATFORMS request extra information about you from one or more DATA-BROKERS
DATA-BROKERS are companies that sell user profiles and market analysis. They use their knowledge to put users in buckets such as "urban and eco-friendly"








7 - THE DEMAND-SIDE-PLATFORMS USE ALL THE INFORMATION ABOUT YOU TO DECIDE HOW MUCH THEY CAN OFFER
DEMAND-SIDE-PLATFORMS perform cookie-matching with all the info they have about you and decide how much they can bid. They correlate their ID/Profile with the Ad-Exchange ID/Profile and the extra info obtained from Data Brokers.
Example bids: $0.10, $0.09, $0.09








8 - THE AD-EXCHANGE ASSIGNS THE SPACE TO THE HIGHEST BID
The AD-EXCHANGE checks all the offers from the DEMAND-SIDE-PLATFORMS and assigns the space to the one with the highest bid ($0.10)








9 - THE WINNING DEMAND-SIDE-PLATFORM PLACES AN AD IN YOUR BROWSER
The winning DEMAND-SIDE-PLATFORM places one ad from its advertisers at www.newspaper.com

THE WHOLE "SIMPLIFIED" FLOW
1 - You visit a news site (www.newspaper.com)
2 - Apart from rendering the website, your browser sends an "ad-tag" to the ad-exchange
3 - The AD-EXCHANGE knows that there is ad space up for bidding... but most importantly, it can now retrieve your cookies. The cookies contain the ID the ad-exchange assigned to you the first time you "visited" it, plus extra info: your profile
4 - The ad-exchange sends an "ad-call": "You have an opportunity to advertise to a user with this Profile and ID"
5 - All DEMAND-SIDE-PLATFORM candidates retrieve their cookies from your computer
6 - They request extra information about you from DATA-BROKERS
7 - They perform cookie-matching with all the info they have about you and decide how much they can bid (example bids: $0.10, $0.09, $0.09)
8 - The AD-EXCHANGE checks all the offers and assigns the space to the Demand-Side-Platform with the highest bid
9 - The winning Demand-Side-Platform places one ad from its advertisers at www.newspaper.com
IN THE WHOLE PROCESS MANY COMPANIES GET INFORMATION ABOUT YOU BY RETRIEVING THEIR COOKIES AND EXCHANGING AND MATCHING INFORMATION
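Steps 4 to 8 can be condensed into a toy auction. This is a sketch under simplifying assumptions, not a real bidding protocol: the `DemandSidePlatform` class, the match table that maps the exchange's ID to the platform's own ID, the interest labels and the "one extra cent per matching interest" pricing rule are all hypothetical. Bids are in cents and were chosen to reproduce the $0.10 / $0.09 / $0.09 example from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DemandSidePlatform:
    name: str
    base_bid_cents: int                                # bid for an unknown user
    campaign_interests: set                            # what its advertisers target
    match_table: dict = field(default_factory=dict)    # exchange ID -> own ID
    profiles: dict = field(default_factory=dict)       # own ID -> known interests

    def bid(self, exchange_user_id, broker_segments):
        # Cookie matching: translate the exchange's ID into our own ID.
        own_id = self.match_table.get(exchange_user_id)
        # Enrich our own profile with the segments bought from data brokers.
        known = self.profiles.get(own_id, set()) | broker_segments
        # Assumed pricing rule: one extra cent per interest the campaign wants.
        return self.base_bid_cents + len(known & self.campaign_interests)


def run_auction(exchange_user_id, dsps, broker_segments):
    """Collect one bid per platform and pick the highest."""
    bids = {dsp.name: dsp.bid(exchange_user_id, broker_segments) for dsp in dsps}
    winner = max(bids, key=bids.get)
    return winner, bids


dsps = [
    DemandSidePlatform("dsp_a", 8, {"lego", "parent"},
                       match_table={"exch-42": "a-7"},
                       profiles={"a-7": {"lego"}}),
    DemandSidePlatform("dsp_b", 8, {"travel", "parent"}),
    DemandSidePlatform("dsp_c", 9, {"cars"}),
]

winner, bids = run_auction("exch-42", dsps, broker_segments={"parent"})
print(winner, bids)  # dsp_a {'dsp_a': 10, 'dsp_b': 9, 'dsp_c': 9}
```

The platform that already knew the user (through its match table and its own profile) can afford the highest bid, which is why every participant has an incentive to collect and exchange as much data as possible.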

MANY COMPANIES ARE LOOKING

AT EVERYTHING YOU DO ONLINE
«A site is not one company any more. A site is tens of hundreds of companies all knowing where you are and what you’re looking at.»
AND THIS IS JUST A SIMPLIFIED VIEW
... LET'S HAVE A LOOK AT THE EVOLUTION

Marketing technology landscape: 2011 - 150 companies
Marketing technology landscape: 2016 - 3500 companies
NUMBER OF COMPANIES IN MARKETING TECHNOLOGY
YEAR | COMPANIES |
---|---|
2011 | 150 |
2012 | 350 |
2014 | 1500 |
2016 | 3500 |
CONCLUSION #3
THE NUMBER OF PLAYERS TRACKING US IS BIG AND GROWING. THE ECOSYSTEM WORKS IN SUCH A WAY THAT THEY ARE ENCOURAGED TO SHARE WHAT THEY KNOW ABOUT USERS.
4 - WHY DO THEY COLLECT THAT INFORMATION? HOW DO THEY USE IT?
TAKE DECISIONS
ADVERTISEMENT
CREDIT SCORE
RECRUITING
PRICE QUOTATION
SEARCH RESULTS
DECISIONS ARE TAKEN IN THE DARK:
HOW CAN WE BE SURE THEY ARE FAIR?

RISK OF WRONG DECISIONS
What if the data you have about me is wrong?

RISK OF MANIPULATION
What if the ad not only shows content they think is relevant to me, but is also presented in a way that exploits "my vulnerabilities" (impulsive, cautious, etc.)?




RISK OF HIDDEN DISCRIMINATION
People are biased, and so are the algorithms they create.
For instance, it was found that Google displayed ads about high-income jobs to men more often than to women.






RISK OF PRICE DISCRIMINATION
Can I get a higher price just because I use a Mac or because my income is higher?







RISK OF FILTER BUBBLE
YOU ARE TOLD ONLY WHAT YOU WANT TO HEAR








A DETAILED LOOK AT FACEBOOK
97%
OF ITS REVENUE COMES FROM ADS
BUT THE COST OF EVERY AD IS GOING DOWN

THE MONEY FACEBOOK MAKES FROM AN AD CLICK IS 1000 TIMES BIGGER THAN FROM JUST AN AD IMPRESSION
HOW CAN FACEBOOK MAKE MORE MONEY EVERY YEAR?
- GET MORE USERS WATCHING ADS
- INCREASE NUMBER OF CLICKS PER AD
- INCREASE VALUE OF SOME AUDIENCES
HOW CAN FACEBOOK MAKE MORE MONEY EVERY YEAR?
- INTERNET.ORG
- DEEP TRACKING AND SEGMENTATION
CONCLUSION #4
MANY DECISIONS THAT AFFECT ME IN MY EVERYDAY LIFE ARE BASED ON THE DATA ABOUT ME THAT IS BEING ACCUMULATED AND EXCHANGED
5 - KEY CHALLENGES TO RAISE AWARENESS
EVERY TIME I BROWSE THE WEB, MANY COMPANIES ARE COLLECTING DATA ABOUT ME

DATA RACE
ASYMMETRIC RACE
What they know about me
What I know about them
NO TRANSPARENCY = DANGER OF UNFAIR DECISIONS

NO TRANSPARENCY = NO INCENTIVE TO COMPETE ON PRIVACY-FRIENDLY SERVICES

IT AFFECTS EVERYONE BUT FEW PEOPLE HAVE ANY INSIGHT INTO IT
3.17 BILLION INTERNET USERS

WHAT IF PEOPLE WERE TRACKED OFFLINE AS THEY ARE ONLINE?
1 - IMAGINE YOU GO INTO A BIG SHOPPING MALL
2 - YOU ARE GREETED BY A MAN WHO TELLS YOU THAT HE WILL FOLLOW YOU AROUND TO RECORD WHICH SHOPS YOU ENTER, WHO YOU MEET AND WHAT YOU DO IN GENERAL
3 - BUT RELAX! HE'S GOING TO STAY AT A DISTANCE SO YOU WON'T NOTICE
4 - HE TELLS YOU THEY DO THIS BECAUSE THEY WANT TO PROVIDE YOU A BETTER SERVICE NEXT TIME
5 - AND BECAUSE OF THAT, THEY NEED TO GET SOME ADDITIONAL INFO FROM YOU SO THEY CAN RECOGNISE YOU NEXT TIME YOU VISIT
6 - BUT RELAX! I DON'T NEED YOUR NAME TO RECOGNISE YOU, JUST A BIT OF INFORMATION ABOUT YOU
7 - AND BY THE WAY, I MIGHT EXCHANGE INFORMATION WITH OTHER MALLS SO YOU CAN GET AN EVEN BETTER SERVICE
WOULD YOU GO ON SHOPPING?

OR WOULD YOU LOOK FOR A PLACE WHERE YOU ARE LEFT ALONE?
THE PROBLEM IS THAT USERS ARE NOT FULLY AWARE OF THIS
AND... "USERS DON'T CARE ABOUT PRIVACY"... YET
THINGS ARE ALREADY CHANGING

AND TELEFÓNICA IS COMMITTED & HELPING!

MANY OF THE FINDINGS I'VE USED TODAY HAVE BEEN DISCOVERED WITH TOOLS SPONSORED BY THE DTL
WE ARE NOT GOING TO COMMERCIALIZE CUSTOMERS' INFORMATION: WE ARE GOING TO GIVE THAT INFORMATION BACK TO THEM.
DATA BELONGS TO CUSTOMERS

THREE LEVELS OF AWARENESS
- Are they aware that some services are free because they monetize users' online activity?
- Are they aware of how much they are worth to those services?
- Are they aware of why they are worth that much to them?
  - How much data the services can get about them and how it is collected
  - How much information can be inferred about them based on that data
  - How that information can be used
KEY CHALLENGE
HOW TO RAISE AWARENESS OF SUCH A COMPLICATED ISSUE?
Let's see how big that challenge is via some examples and demos...

FACEBOOK DATA VALUATION TOOL (FDVT)
FDVT: end-user tool

- The estimate is good enough for users
- Does the app provide enough value to keep them engaged?
- What could we provide beyond the money information?
FDVT: AGGREGATED INFORMATION
- We can get the CPC and the CPM for multiple audiences with multiple parameters (e.g. Highest CPC in UK is in the 40s and in Spain in the 30s, difference between men/women CPC...)
- We can do this to observe trends over time (e.g. evolution of CPM/CPC for democrats/republicans during election time)
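To make the CPM and CPC figures concrete, here is a sketch of how a per-user money estimate could be derived from them. It assumes the standard definitions (CPM is the price of 1000 impressions, CPC the price of one click); the `estimated_value` function and the numbers are illustrative and do not describe how FDVT itself computes its estimate.

```python
def estimated_value(impressions, clicks, cpm, cpc):
    """Estimated ad revenue generated by one user, in the currency of cpm/cpc.

    cpm: price advertisers pay per 1000 impressions for this audience
    cpc: price advertisers pay per click for this audience
    """
    return impressions / 1000 * cpm + clicks * cpc


# Hypothetical session: 200 ads seen, 3 clicked, CPM of 5.00 and CPC of 0.40.
value = estimated_value(impressions=200, clicks=3, cpm=5.00, cpc=0.40)
print(round(value, 2))  # 2.2
```

In this made-up session the three clicks contribute more than the 200 impressions, which shows why increasing clicks per ad matters so much to the platform.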


BUT I'M PRETTY SURE WE CAN DO MUCH MORE THAN THIS!!!!
REVEALING AND CONTROLLING MOBILE PRIVACY LEAKS (RECON)

RECON: END-USER TOOL

- Apart from the poor UI... how could we raise awareness about what is going on, rather than just giving users raw information about it?
RECON: AGGREGATED INFORMATION

PRIVACY CENSUS


PRIVACY CENSUS: THE DATA
https://webtransparency.cs.princeton.edu/webcensus/
WHAT CAN WE DO WITH ALL THAT INFO?
Sites that perform fingerprinting, 3rd parties used, type of sites, traffic, country, etc.
PRIVACY CENSUS: UI TOOLS


KEY CHALLENGES
- Risk of scaring people
- Complex information:
  - Connections across trackers
  - Too much information
  - Includes personal aspects
  - Very technical information
- We are not sure about the aspects inferred about them (we can only guess, for instance, why an ad has been shown)
- Need to find metaphors
THANKS!
UX Series