Introduction to
DEEP WEB
presented by:
Hamid Salehian
Have You Ever Seen
THE ICEBERG?
Deep Web
- The part of the World Wide Web that is not discoverable by standard search engines (Google, Bing, Yahoo)
- Not visible, not traceable, not monitored
- Access and interact with web data anonymously and without being tracked.
- This is achieved through special encryption software like Tor
- Cannot be indexed
- Difficult to work out who is behind the sites
- 400 to 550 times larger than the visible web
Surface Web
- Indexed by conventional search engines
- Also called clearnet, visible web, indexed web




How Big ....
- 550 billion documents
- Google has identified 1.2 billion documents
- An Internet search typically covers about 0.03% (roughly 1 in 3,000) of the available content.
- The Deep Web contains 7,500 terabytes of information, compared to 19 terabytes of information in the Surface Web.
* These figures are only estimates
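The size comparison can be checked with a little arithmetic. This is only a sketch that reuses the figures quoted on this slide; the variable names are made up for the illustration, and the inputs are estimates, so the results are rough as well.

```python
# Figures quoted on the slide (all of them are estimates).
deep_web_tb = 7_500               # terabytes of information in the Deep Web
surface_web_tb = 19               # terabytes of information in the Surface Web
searched_fraction = 0.03 / 100    # 0.03% of available content

# How many times larger the Deep Web is, by volume of data.
size_ratio = deep_web_tb / surface_web_tb

# "1 page in N" form of the searched fraction.
one_in_n = 1 / searched_fraction

print(f"Deep Web is about {size_ratio:.0f} times larger by volume")
print(f"A search covers about 1 page in {one_in_n:.0f}")
```

By volume the ratio comes out near 395, close to the lower end of the "400 to 550 times" range, and 0.03% is about 1 in 3,333, which the slide rounds to 1 in 3,000.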
Not Visible... Why?
First we have to know how a search engine works
Search Engines
• A spider (crawler) seeks out webpages by following hyperlinks from one page to another, adding each page to its catalog
• A program called an indexer then reads these webpages and creates an index, storing the URL and important content of each webpage.
• Each search engine has its own ranking algorithm that returns results based on their relevance to the user’s specified keywords or phrases.
• To be discovered, a webpage must be static and linked to other pages.
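The crawl-then-index steps above can be sketched in a few lines. This is a toy model, not how any real search engine is built: the `WEB` dictionary, the `a.example` URLs, and the function names are all invented for the illustration, and ranking is left out entirely.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("welcome to the home page", ["a.example/news"]),
    "a.example/news": ("deep web news today", ["a.example/home", "a.example/about"]),
    "a.example/about": ("about this site", []),
    "a.example/hidden": ("unlinked deep web report", []),  # no page links here
}

def crawl(seed):
    """Follow hyperlinks breadth-first from the seed; return pages in visit order."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Indexer: map each word to the set of URLs whose text contains it."""
    index = {}
    for url in urls:
        for word in WEB[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the URLs that contain every query word, sorted for stable output."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

catalog = crawl("a.example/home")
index = build_index(catalog)
print(catalog)
print(search(index, "deep web"))
```

The page `a.example/hidden` also contains "deep web", but no other page links to it, so the crawler never reaches it and the search returns only `a.example/news`.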
Reasons for Non-Indexing
- Some sites are not linked to by other pages and therefore cannot be discovered by crawlers
- Some sites require registration and login (password-protected resources)
- Some sites require authentication before the actual content can be accessed
- The design of some webpages makes them difficult to index
- Content generated with JavaScript (for example via Ajax) may not be understood by robots
- Darknet content
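The JavaScript point in the list above can be illustrated with a small sketch. It assumes a simple crawler that only parses the static HTML it downloads and never runs scripts; the `PAGE` content and the `LinkExtractor` class are invented for the example. Some modern crawlers do execute JavaScript, so this shows the limitation rather than the behaviour of any particular search engine.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page: one ordinary link, one link created only when the script runs.
PAGE = """
<html><body>
  <a href="/catalog">Catalog</a>
  <script>
    // Ajax-style code: this link exists only after the script has executed.
    var a = document.createElement("a");
    a.href = "/search-results?q=deep";
    document.body.appendChild(a);
  </script>
</body></html>
"""

parser = LinkExtractor()
parser.feed(PAGE)
print(parser.links)
```

Only `/catalog` is found. The link to `/search-results?q=deep` is built by the script at run time, so a crawler that does not execute JavaScript never sees it.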
Darknet content
• Certain content is intentionally hidden from the regular Internet, accessible only with special software, such as Tor, I2P, or other darknet software.
• The darknets which constitute the Darknet content include small, friend-to-friend, peer-to-peer networks, as well as large, popular networks like Freenet, I2P, and Tor, operated by public organizations and individuals.
Tor (The Onion Router)
- Tor is free software and an open network.
- Tor protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world
- It prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location.
- Works using the onion routing technique

History
Researchers at the U.S. Naval Research Laboratory released an early version of Tor. It was originally designed to protect the identity of American operatives and dissidents in repressive countries like China.
MILNET (Military Network) was the name given to the part of the ARPANET internetwork designated for unclassified United States Department of Defense traffic.
Onion Routing
- Onion routing encrypts and decrypts data typically 3 or more separate times, once for each Tor node it passes through on the way to the destination via the path given by the Tor directory server.
- It does this using the public key of the router (Tor relay), which only the router's private key can decrypt.
- No single router knows the entire network path from source to destination.
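The layering described above can be sketched as a toy model. Everything here is an assumption made for illustration: the relay names, the keys, and the `wrap`/`peel` helpers are hypothetical, and the XOR "cipher" is a stand-in for the public-key encryption the slide mentions. It is not secure and is not how Tor itself is implemented.

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256-derived keystream. NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical relays, each with its own secret key (stand-in for a private key).
RELAY_KEYS = {"entry": b"key-entry", "middle": b"key-middle", "exit": b"key-exit"}

def wrap(message: str, path: list, destination: str) -> bytes:
    """Build the onion from the inside out: the last relay's layer is innermost."""
    hops = path[1:] + [destination]
    payload = message.encode()
    for relay, next_hop in reversed(list(zip(path, hops))):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = keystream_xor(RELAY_KEYS[relay], layer)
    return payload

def peel(relay: str, onion: bytes):
    """A relay removes only its own layer and learns only the next hop."""
    layer = json.loads(keystream_xor(RELAY_KEYS[relay], onion))
    return layer["next"], bytes.fromhex(layer["data"])

path = ["entry", "middle", "exit"]
onion = wrap("hello Bob", path, "Bob")

hop_log = []
for relay in path:
    next_hop, onion = peel(relay, onion)
    hop_log.append((relay, next_hop))

print(hop_log)
print(onion.decode())
```

Each relay in `hop_log` learns only the name of the next hop. Only the exit relay sees that the destination is Bob, and it does not know the message came from Alice.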


[Diagram: Alice and Bob]
The Dark Side of The Moon
Deep Web
Ease of use: Research is more complex due to the absence of indexing of the content.
Speed: Slower to access than surface Web information.
Cybercrime: Difficulty tracking down criminals. Activities range from the sale of illegal drugs and weapons to hacking services, the hiring of contract killers, etc.






The Bright Side
Deep Web
"there is always something positive..."
Privacy: Avoid statistical analysis by switching to a new circuit about every 10 minutes (anonymity)
Security: An observer cannot tell which connections a machine initiates as a user and which it relays as a node, which makes monitoring of communications much harder
Information: Greater scope. Access to private content and information (Government Security Info) archived in searchable databases. But is this info ethical?...
Freedom of speech and information: A way for people living under oppressive or restrictive regimes to reveal the truth.
WikiLeaks is an international, online, non-profit, journalistic organisation which publishes secret information, news leaks, and classified media from anonymous sources.


Questions?
Resources
- Bergman, Michael K., "The Deep Web: Surfacing Hidden Value". The Journal of Electronic Publishing, August 2001.
- Wright, Alex, "Exploring a 'Deep Web' That Google Can't Grasp". The New York Times, February 23, 2009.
- http://www.nytimes.com/2009/02/23/technology/internet/23search.html?th&emc=th
- Alpert, Jesse; Hajaj, Nissan, "We knew the web was big...". 2008.
- http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html
- He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan, "Accessing the Deep Web: A Survey". Communications of the ACM (CACM), May 2007.
Thank You
- visit slides.com/hsarena/deepweb