WWW, HTTP and Crawlers

What is the Internet?

A global system of interconnected computer networks.

What is a network?

A system of hardware components and "computers" that can exchange information with one another.

SWITCH

HUB

ROUTER

PC, smartphone, ...

IP address

DNS - Domain Name System

http://www.abv.bg/ == 194.153.145.104
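
The name-to-address mapping above can be reproduced with Python's standard `socket` module. This is a minimal sketch: the helper name `resolve` is an assumption made for illustration, and the address returned for `www.abv.bg` today may differ from the one on the slide, since DNS records change over time.

```python
import socket


def resolve(hostname: str) -> str:
    """Return one IPv4 address for hostname, as reported by the system resolver."""
    # gethostbyname performs the lookup; if the input is already a dotted-quad
    # IPv4 address, it is returned unchanged.
    return socket.gethostbyname(hostname)


if __name__ == "__main__":
    # Requires network access; the result depends on the current DNS records.
    print(resolve("www.abv.bg"))
```

Note that DNS resolves host names only: the `http://` scheme and the trailing `/` are part of the URL, not of the name being looked up.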

The big picture

ROUTING DEMO

The bigger picture

OSI Model

OSI vs TCP/IP

WWW - World Wide Web

- ONE common protocol (HTTP)

- ONE common language (HTML)

- ONE common program to understand them (web browser)

HTTP(S) - Hypertext Transfer Protocol

A single, common protocol for transferring documents (HTML pages).

HTTP Methods
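
The method is the first word of every request and tells the server what the client wants to do: `GET` reads a resource, `POST` submits data, `PUT` replaces a resource, `DELETE` removes it, and `HEAD` asks for the headers only. The sketch below shows how a minimal HTTP/1.1 request is laid out as text; `build_request` is a hypothetical helper written for this example, not part of any library.

```python
def build_request(method: str, host: str, path: str = "/") -> str:
    """Build the text of a minimal HTTP/1.1 request without a body."""
    lines = [
        f"{method.upper()} {path} HTTP/1.1",  # request line: method, path, version
        f"Host: {host}",                      # Host is mandatory in HTTP/1.1
        "Connection: close",
    ]
    # Every header line ends with CRLF; an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_request("get", "www.abv.bg"))
```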

HTTP Header

HTTP/2 200 
server: nginx
date: Wed, 22 May 2019 11:28:06 GMT
content-type: text/html;charset=UTF-8
vary: Accept-Encoding
content-language: en-US
strict-transport-security: max-age=31536000; includeSubDomains
pragma: no-cache
cache-control: no-cache
x-frame-options: SAMEORIGIN
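
A response head like the one above is plain text: a status line followed by `name: value` pairs. As a rough sketch of how a client might read it, the hypothetical helper `parse_response_head` below splits it into the protocol version, the status code, and a dictionary of headers. It assumes each header name appears only once; a repeated header would overwrite the earlier value.

```python
def parse_response_head(raw: str):
    """Split a response head into (version, status_code, headers)."""
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    version, status = lines[0].split()[:2]       # e.g. "HTTP/2 200"
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), headers


RAW = """HTTP/2 200
server: nginx
date: Wed, 22 May 2019 11:28:06 GMT
content-type: text/html;charset=UTF-8
"""

if __name__ == "__main__":
    version, status, headers = parse_response_head(RAW)
    print(version, status, headers["content-type"])
```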

HTTP Body

<!DOCTYPE html>
<html>
  <body>

    <p>My first paragraph.</p>

  </body>
</html>

CURL

Python `requests`
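
A minimal sketch of fetching a page with the third-party `requests` library (installed with `pip install requests`). The helper names `prepare_get` and `fetch` and the `demo-crawler/0.1` User-Agent string are assumptions made for this example; the URL is the one used earlier in the slides.

```python
import requests


def prepare_get(url: str) -> requests.PreparedRequest:
    """Build (but do not send) a GET request so that it can be inspected."""
    request = requests.Request("GET", url, headers={"User-Agent": "demo-crawler/0.1"})
    return request.prepare()


def fetch(url: str) -> requests.Response:
    """Send the prepared GET request and fail loudly on 4xx/5xx responses."""
    with requests.Session() as session:
        response = session.send(prepare_get(url), timeout=10)
    response.raise_for_status()
    return response


if __name__ == "__main__":
    # Requires network access.
    response = fetch("http://www.abv.bg/")
    print(response.status_code, response.headers.get("content-type"))
    print(response.text[:200])  # the first 200 characters of the HTTP body
```

For one-off calls, `requests.get(url, timeout=10)` does the same in a single line; the two-step version is shown only to make the method, URL, and headers visible before anything is sent.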

BeautifulSoup
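
Once the HTML body has been downloaded, a crawler needs to pull text and links out of it. The sketch below uses the third-party `beautifulsoup4` package (`pip install beautifulsoup4`) on a small page modelled on the HTTP Body slide; the two `<a>` links, the `example.com` address, and the helper `extract_links` are invented for the example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

HTML = """<!DOCTYPE html>
<html>
  <body>
    <p>My first paragraph.</p>
    <a href="/news">News</a>
    <a href="http://example.com/">Elsewhere</a>
  </body>
</html>"""


def extract_links(html: str, base_url: str) -> list[str]:
    """Return an absolute URL for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")  # the parser bundled with Python
    # urljoin turns relative links such as "/news" into absolute ones.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    soup = BeautifulSoup(HTML, "html.parser")
    print(soup.p.get_text())                            # text of the first <p>
    print(extract_links(HTML, "http://www.abv.bg/"))    # links to visit next
```

A crawler repeats these two steps in a loop: fetch a page, extract its links, and queue the ones it has not visited yet.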

By Hack Bulgaria
