Lecture 8:

Network. TCP/IP and OSI models. Network in the kernel. Interfaces and examples.

Version: 3

System programming

Education

Lecture plan

  • History
  • Switching types
  • Network packet
  • OSI model
  • TCP/IP model
  • In the kernel
  • Public C API and examples

IPC

Anonymous

int
pipe2(int pipefd[2], int flags);

void *
mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Named, XSI standard

int
semget(key_t key, int nsems, int semflg);

int
msgget(key_t key, int msgflg);

int 
shmget(key_t key, size_t size, int shmflg);

Named, POSIX standard

int
mkfifo(const char *pathname, mode_t mode);

sem_t *
sem_open(const char *name, int oflag, mode_t mode, unsigned int value);

IPC. UNIX domain sockets [1]

int
socket(int domain, int type, int protocol);

int
bind(int sockfd, const struct sockaddr *addr,
     socklen_t addrlen);

int
listen(int sockfd, int backlog);

int
connect(int sockfd, const struct sockaddr *addr,
        socklen_t addrlen);

int
accept(int sockfd, struct sockaddr *addr,
       socklen_t *addrlen);

Creation of a socket with a given domain, protocol, type

Bind socket to a name

Listen for incoming connections

Connect to an already named and listening socket. No need to call bind() then

If bind() + listen() were called, the only thing the socket can do is to accept clients using accept()
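The call sequence on this slide can be tried inside a single process. The sketch below is only an illustration: the socket path and the function name `unix_stream_demo` are hypothetical, and it relies on connect() to a listening UNIX socket completing before accept() is called, because the pending connection waits in the listen backlog.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical socket path, chosen only for this sketch. */
#define DEMO_PATH "/tmp/lecture8_stream.sock"

/*
 * Run the server and the client side in one process. Returns the
 * byte received on the server side, or -1 on error.
 */
int
unix_stream_demo(void)
{
	char out = 'x', in = 0;
	int result = -1, peer = -1;
	struct sockaddr_un addr;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	assert(sizeof(DEMO_PATH) <= sizeof(addr.sun_path));
	strcpy(addr.sun_path, DEMO_PATH);
	unlink(DEMO_PATH); /* Remove a stale socket file, if any. */

	int server = socket(AF_UNIX, SOCK_STREAM, 0);
	int client = socket(AF_UNIX, SOCK_STREAM, 0);
	if (server == -1 || client == -1)
		goto end;
	/* Server: give the socket a name and start listening. */
	if (bind(server, (struct sockaddr *) &addr, sizeof(addr)) != 0)
		goto end;
	if (listen(server, 1) != 0)
		goto end;
	/* Client: connect to the name, no bind() needed. */
	if (connect(client, (struct sockaddr *) &addr, sizeof(addr)) != 0)
		goto end;
	/* Server: accept() returns the server-side half of the pair. */
	peer = accept(server, NULL, NULL);
	if (peer == -1)
		goto end;
	if (write(client, &out, 1) == 1 && read(peer, &in, 1) == 1)
		result = in;
end:
	if (peer != -1)
		close(peer);
	if (client != -1)
		close(client);
	if (server != -1)
		close(server);
	unlink(DEMO_PATH);
	return result;
}
```

Calling `unix_stream_demo()` from main() should return the character `'x'` that travelled from the client descriptor to the accepted one.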

IPC. UNIX domain sockets [2]

Usage

Connection to a named socket

int fd = socket();
connect(fd, remote_addr);
/* Ready to read/write fd. */

Creation of a named socket without connection

int fd = socket();
bind(fd, addr);
/** Ready to read/write fd. */

Creation of a named socket with connection

int fd = socket();
bind(fd, addr);
listen(fd);
while(1) {
        int remote_fd = accept(fd);
        /*
         * Ready to read/write
         * remote_fd.
         */
}

connect() creates a paired socket on the server side, and the pair can interact just like a pair made by socketpair()

int fd2 = socket();
bind(fd2, addr2);
/** Ready to read/write fd2. */

read/write

send/recv

sendto/recvfrom

Only datagram sockets work without connect(), and the destination address has to be specified manually for each packet
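A minimal sketch of the connectionless case, assuming UNIX datagram sockets: the receiver is bound to a name, and the sender passes that name to every sendto() call instead of calling connect(). The socket path and the function name `unix_dgram_demo` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical socket path, chosen only for this sketch. */
#define DGRAM_PATH "/tmp/lecture8_dgram.sock"

/*
 * Send one datagram to a named socket without connect(). Returns the
 * size of the received datagram, or -1 on error.
 */
ssize_t
unix_dgram_demo(char *buf, size_t buf_size)
{
	const char msg[] = "ping";
	ssize_t received = -1;
	struct sockaddr_un addr;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	assert(sizeof(DGRAM_PATH) <= sizeof(addr.sun_path));
	strcpy(addr.sun_path, DGRAM_PATH);
	unlink(DGRAM_PATH); /* Remove a stale socket file, if any. */

	int receiver = socket(AF_UNIX, SOCK_DGRAM, 0);
	int sender = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (receiver == -1 || sender == -1)
		goto end;
	/* The receiver only needs a name: no listen(), no accept(). */
	if (bind(receiver, (struct sockaddr *) &addr, sizeof(addr)) != 0)
		goto end;
	/* The destination address travels with each datagram. */
	if (sendto(sender, msg, sizeof(msg), 0, (struct sockaddr *) &addr,
		   sizeof(addr)) != (ssize_t) sizeof(msg))
		goto end;
	received = recvfrom(receiver, buf, buf_size, 0, NULL, NULL);
end:
	if (receiver != -1)
		close(receiver);
	if (sender != -1)
		close(sender);
	unlink(DGRAM_PATH);
	return received;
}
```

The message includes its terminating zero byte, so a successful call returns 5 and leaves the string "ping" in the buffer.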

DARPA and ARPANET

1963

The idea appears

1965

First network communication

1990

Popularisation of networks and shutdown of ARPANET

Joseph Carl Robnett Licklider and his work "Intergalactic computer network"

Start of ARPANET at DARPA, the Defense Advanced Research Projects Agency

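Circuit switching

[Figure: a network with links labelled used/total capacity (0/100, 0/200, 0/70, 0/50); a 50-unit transfer occupies every link along its single route (50/100, 50/200, 50/70, 50/50)]

  • A connection establishment stage is always needed
  • Data follows one route
  • Was used in telephony; there were attempts to adapt it to computer networks
  • Channel capacity stays occupied even when no data is sent
  • Not resilient to failures

How it works

Result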

Packet switching

  • Split data into packets
  • Each packet finds its own route

[Figure: the same network with links labelled used/total capacity; a 25-unit transfer is split into 10-unit and 15-unit flows that take different routes (for example 25/100, 10/50, 15/100)]

  • Scalability
  • Resilience to failures

How it works

Result
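The idea of splitting data into packets that travel independently can be sketched in a few lines. The packet layout below (`struct packet` with a sequence number and a tiny payload) is a hypothetical illustration, not a real protocol: because every packet carries its position, the receiver can rebuild the message even if the packets arrive in a different order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PAYLOAD_SIZE = 4 };

/* Hypothetical packet layout for this sketch, not a real protocol. */
struct packet {
	unsigned seq;		/* Position of the packet in the message. */
	size_t size;		/* Number of used payload bytes. */
	char payload[PAYLOAD_SIZE];
};

/* Split data into packets. Returns the number of packets made. */
size_t
split_into_packets(const char *data, size_t size, struct packet *out,
		   size_t max_packets)
{
	size_t count = (size + PAYLOAD_SIZE - 1) / PAYLOAD_SIZE;
	assert(count <= max_packets);
	for (size_t i = 0; i < count; ++i) {
		size_t offset = i * PAYLOAD_SIZE;
		size_t left = size - offset;
		out[i].seq = (unsigned) i;
		out[i].size = left < PAYLOAD_SIZE ? left : PAYLOAD_SIZE;
		memcpy(out[i].payload, data + offset, out[i].size);
	}
	return count;
}

/*
 * Reassemble packets that arrived in an arbitrary order. Each payload
 * is placed according to its sequence number. Returns the total size.
 */
size_t
reassemble(const struct packet *packets, size_t count, char *out)
{
	size_t total = 0;
	for (size_t i = 0; i < count; ++i) {
		memcpy(out + packets[i].seq * PAYLOAD_SIZE,
		       packets[i].payload, packets[i].size);
		total += packets[i].size;
	}
	return total;
}
```

Real protocols add much more on top of this (loss detection, duplicates, routing metadata); the sketch only shows why a per-packet sequence number makes independent routes possible.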

Packet

Physical environment metadata

Routing metadata

Error correction metadata

User data

<html>...</html>

File size, blocks ...

Game data ...

User space

Kernel space

Applications, web servers, games, file managers, email

Delivery reliability protocols

Protocols for receiving packets and forwarding them further into the network

Protocols for interaction with wires and radio

Payload

Service header

Service header

Service header

Packet assembly

[Diagram: the application passes user data down and it is split into Packet 1 and Packet 2. Each layer below (delivery reliability subsystem, routing subsystem, network device driver) prepends its own header, so every packet ends up with three headers]
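The layer-by-layer wrapping can be imitated with a plain byte buffer. This is a sketch under simplifying assumptions: the three-byte "headers" are placeholders rather than real protocol headers, and the names `wrap` and `build_packet` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Prepend a header to the packet stored at the start of buf. Returns
 * the new packet size. The buffer must have room for the header.
 */
size_t
wrap(char *buf, size_t packet_size, size_t buf_size, const void *header,
     size_t header_size)
{
	assert(packet_size + header_size <= buf_size);
	/* Shift the current packet to make room in front of it. */
	memmove(buf + header_size, buf, packet_size);
	memcpy(buf, header, header_size);
	return packet_size + header_size;
}

/*
 * Pass user data down through three layers. Each layer treats what it
 * got from above as opaque payload and adds its own placeholder header.
 */
size_t
build_packet(char *buf, size_t buf_size, const char *data, size_t data_size)
{
	assert(data_size <= buf_size);
	memcpy(buf, data, data_size);
	size_t size = data_size;
	size = wrap(buf, size, buf_size, "[T]", 3); /* Delivery reliability. */
	size = wrap(buf, size, buf_size, "[N]", 3); /* Routing. */
	size = wrap(buf, size, buf_size, "[L]", 3); /* Device driver. */
	return size;
}
```

For the payload "data" the resulting bytes are `[L][N][T]data`: the header added last is the outermost one, and it is the first one the receiving side strips.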

Network models. OSI

OSI - Open System Interconnection

7 levels, listed from top to bottom:

  1. Application
  2. Representation (usually called Presentation)
  3. Session
  4. Transport
  5. Network
  6. Channel (usually called Data link)
  7. Physical

OSI. Application level [1]

Web pages

Movies

Documents

Database network interface

Tasks:

  • data creation
  • interaction with a user

Details:

knows about data meaning, purpose

Protocols:

FTP, SMTP, DNS, HTTP

OSI. Application level [2]

Example of an application level protocol - format of a response for an SQL request in Tarantool DBMS

+----------------------------------------------+
| IPROTO_BODY: {                               |
|     IPROTO_METADATA: [                       |
|         {                                    |
|             IPROTO_FIELD_NAME: string,       |
|             IPROTO_FIELD_TYPE: number,       |
|             IPROTO_FIELD_FLAGS: number,      |
|         },                                   |
|         ...                                  |
|     ],                                       |
|                                              |
|     IPROTO_SQL_INFO: {                       |
|         SQL_INFO_ROW_COUNT: number,          |
|         SQL_INFO_LAST_ID: number,            |
|         ...                                  |
|     },                                       |
|                                              |
|     IPROTO_DATA: [                           |
|         tuple/scalar,                        |
|         ...                                  |
|     ]                                        |
| }                                            |
+----------------------------------------------+

OSI. Application level [3]

Example of an HTTP request

:authority: clc.stackoverflow.com
:method: GET
:path: /markup.js?...
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: no-cache
cookie: prov=702db90b-56ab-53f7-3894-c3733607b954;...
pragma: no-cache
referer: https://stackoverflow.com/questions/...
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X ...

OSI. Representation level [1]

<xml>

</xml>
<html>

</html>
--
- 'YAML'
...
{
  "json":
}
0xa7 MsgPack

Tasks:

  • pack application level data, format it
  • encryption

Details:

  • no information about data meaning
  • knows about data format, structure

Languages, formats:

JSON, XML, YAML, HTML, MessagePack
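As a small illustration of what "knows about data format, structure" means, here is a sketch of one MessagePack encoding rule, the "fixstr" format for strings of up to 31 bytes. In the `0xa7 MsgPack` example on this slide, 0xa7 is exactly such a marker: 0xa0 combined with the length 7. The function name `msgpack_encode_fixstr` is hypothetical and the sketch covers only this one format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encode a short string in the MessagePack "fixstr" format: one byte
 * 0xa0 | length, followed by the string bytes without a terminating
 * zero. Only lengths up to 31 fit. Returns the encoded size.
 */
size_t
msgpack_encode_fixstr(const char *str, unsigned char *out, size_t out_size)
{
	size_t len = strlen(str);
	assert(len <= 31);
	assert(out_size >= len + 1);
	out[0] = (unsigned char) (0xa0 | len);
	memcpy(out + 1, str, len);
	return len + 1;
}
```

The encoder never looks at what the string means, only at its length and bytes, which is the level of knowledge this layer is described as having.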

OSI. Representation level [2]

 JSON response representation from GitHub API

{
  "action": "opened",
  "issue": {
    "url": "https://api.github.com/repos/octocat/Hello-World/issues/1347",
    "number": 1347,
    ...
  },
  "repository" : {
    "id": 1296269,
    "full_name": "octocat/Hello-World",
    "owner": {
      "login": "octocat",
      "id": 1,
      ...
    },
    ...
  },
  "sender": {
    "login": "octocat",
    "id": 1,
    ...
  }
}

OSI. Representation level [3]

 XML response representation from opengis

<WFS_Capabilities xmlns="http://www.opengis.net/wfs" version="1.0.0">
  <Service>
    <Name> Oracle WFS </Name>
    <Title> Oracle Web Feature Service </Title>
    <Abstract> Web Feature Service maintained by Oracle </Abstract>
    <OnlineResource>http://localhost:8888/SpatialWS-</OnlineResource>
  </Service>
  <Capability>
  <GetCapabilities>
    <DCPType>
      <HTTP>
        <Get onlineResource="http://localhost:8888/SpatialWS-"/>
      </HTTP>
    </DCPType>
    <DCPType>
      <HTTP>
        <Post onlineResource="http://localhost:8888/SpatialWS-"/>
      </HTTP>
    </DCPType>
  </GetCapabilities>
  </Capability>
</WFS_Capabilities>

OSI. Session level

Reliability level selection

Authentication

Is a connection needed?

Timers

Tasks:

  • connection method selection
  • authentication
  • reliability level selection
  • timeout selection

Details:

Implements almost nothing except authentication. Decides whether a connection needs to be established, and with which settings

Protocols:

PAP, CHAP, SSL

?

OSI. First three levels

Application

Representation

Session

What does each level do? What does it know about the data?

Application: creates data, knows its meaning.

Representation: formats data, knows only its structure.

Session: chooses link settings, performs authentication. Sees data as just bytes.

1 point

OSI. Transport level

Reliability: ordering, duplicates, losses, congestion

Fragmentation and batching

Virtual connection establishment

Operates by packets or by a contiguous byte stream

Tasks:

  • virtual connection establishment (if needed)
  • virtualize packets as a contiguous byte stream (if needed)
  • different degrees of delivery reliability

Details:

  • the structure of the data does not matter, only its size is important
  • how the data is delivered and where the recipient is physically located do not matter

Protocols:

TCP, UDP, SCTP, RDP
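The difference between "operates by packets" and "a contiguous byte stream" can be observed locally. The sketch below uses socketpair() with UNIX sockets as a stand-in for UDP and TCP, which is an assumption made to keep the example self-contained; the function name `first_recv_size` is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send "ab" and "cd" as two separate messages through a local socket
 * pair of the given type, then do a single recv() into a large buffer.
 * Returns the size of that first recv(), or -1 on error.
 */
ssize_t
first_recv_size(int type)
{
	int fds[2];
	char buf[16];
	ssize_t rc = -1;
	assert(type == SOCK_DGRAM || type == SOCK_STREAM);
	if (socketpair(AF_UNIX, type, 0, fds) != 0)
		return -1;
	if (send(fds[0], "ab", 2, 0) == 2 && send(fds[0], "cd", 2, 0) == 2)
		rc = recv(fds[1], buf, sizeof(buf), 0);
	close(fds[0]);
	close(fds[1]);
	return rc;
}
```

With SOCK_DGRAM the first recv() returns exactly one message, 2 bytes. With SOCK_STREAM message boundaries are not preserved: the call may return anything from 1 to 4 bytes, and typically returns both messages glued together.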

OSI. Network level [1]

Routing

Foreign packet forwarding

Coping with some types of overloads

Tasks:

  • routing
  • foreign packet forwarding
  • coping with network overloads (TTL)
  • networks fragmentation

Details:

  • no connection concept, only packet switching. A connection is a transport level concept
  • no reliability at all. A packet is forgotten once it is sent
  • operates on datagrams

Protocols:

IP, DDP, ICMP, RIP, EGP

OSI. Network level [2]

Networks are hierarchical

Network level is for interaction of subnetworks

OSI. Channel level [1]

Routing in one network

Foreign packet forwarding in one network

Tasks:

  • routing in one subnetwork
  • foreign packet forwarding
  • coping with network overloads (TTL)

Details:

  • no division into subnetworks. It is used inside one small subnetwork
  • no IP addresses
  • operates on frames

Protocols:

Ethernet, MPLS, PPP, Token Ring

Coping with some types of overloads

OSI. Channel level [2]

Channel level work area

Network level work area

OSI. Physical level [1]

Signal modulation

Interaction with environment

Coping with collisions

Tasks:

  • send a sequence of bits via a wire or radio
  • avoid collisions with other transmitters
  • fix noise errors, environmental influence artifacts

Details:

  • transmit bit by bit
  • lots of mathematics and physics

Technologies:

Bluetooth, Wi-Fi, OTN, Ethernet cable, USB

Checking and fixing of errors

OSI. Physical level [2]

Channel level work area

Physical level work area

OSI. Summary

Application

Representation

Session

Transport

Network

Channel

Physical

Data

Network models. TCP/IP

4 levels:

  1. Application
  2. Transport
  3. Network
  4. Channel

TCP/IP

OSI

7 levels:

  1. Application
  2. Representation
  3. Session
  4. Transport
  5. Network
  6. Channel
  7. Physical

Network in the kernel [1]

struct tcphdr {
	__be16	source;
	__be16	dest;
	__be32	seq;
	__be32	ack_seq;
	__u16	doff:4,
		res1:4,
		cwr:1,
		ece:1,
		urg:1,
		ack:1,
		psh:1,
		rst:1,
		syn:1,
		fin:1;
	__be16	window;
	__sum16	check;
	__be16	urg_ptr;
};
struct iphdr {
	__u8	version:4,
  		ihl:4;
	__u8	tos;
	__be16	tot_len;
	__be16	id;
	__be16	frag_off;
	__u8	ttl;
	__u8	protocol;
	__sum16	check;
	__be32	saddr;
	__be32	daddr;
};
struct ethhdr {
	unsigned char h_dest[ETH_ALEN];
	unsigned char h_source[ETH_ALEN];
	__be16	      h_proto;
};

Packing: data is wrapped into headers layer by layer, like an onion
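One concrete piece of header work is the `check` field of `struct iphdr`. A minimal sketch of how such an Internet checksum is computed over an IPv4 header is given below. It is written from the general description of the algorithm (one's complement sum of 16-bit words), not taken from the kernel sources, and the function name `ip_header_checksum` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Checksum of an IPv4 header: the one's complement of the one's
 * complement sum of all 16-bit big-endian words of the header. The
 * checksum field itself must be zero while the sum is computed.
 */
uint16_t
ip_header_checksum(const uint8_t *header, size_t size)
{
	/* An IPv4 header always has an even number of bytes. */
	assert(size % 2 == 0);
	uint32_t sum = 0;
	for (size_t i = 0; i < size; i += 2)
		sum += ((uint32_t) header[i] << 8) | header[i + 1];
	/* Fold the carries back into the low 16 bits. */
	while ((sum >> 16) != 0)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t) ~sum;
}
```

The receiver runs the same sum over the header including the stored checksum; a header that was not damaged on the way gives zero after the final inversion.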

Packets

Network in the kernel [2]

User space

send(socket, data, data_size, 0);

Kernel space

int
tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);

int
ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
		      __be32 saddr, __be32 daddr,
		      struct ip_options_rcu *opt);

int
dnet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
	        u16 value);

API [1]

int
socket(int domain, int type, int protocol);

AF_UNIX - local socket, visible only on this machine

AF_INET - network sockets based on IPv4

AF_PACKET - "raw" network sockets, protocol assembling is in user code

API [2]

int
socket(int domain, int type, int protocol);

SOCK_DGRAM - UDP, packet size is limited, no reliability guarantees

SOCK_STREAM - TCP, data as a byte stream instead of packets, has delivery guarantees and connections

SOCK_SEQPACKET - SCTP, like TCP, but data is represented as packets of limited size

SOCK_RDM - RDP, like SCTP, but order of delivery is not guaranteed

API [3]

int
socket(int domain, int type, int protocol);

0 - default

IPPROTO_SCTP

IPPROTO_IP

IPPROTO_TCP

IPPROTO_RAW

IPPROTO_UDP

API [4]

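#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

void try_protocol(int type, int protocol, const char *protocol_name)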
{
	int sock = socket(AF_INET, type, protocol);
	if (sock == -1) {
		printf("%s: error = %s\n", protocol_name, strerror(errno));
	} else {
		printf("%s: success\n", protocol_name);
		close(sock);
	}
}

void try_type(int type, const char *type_name)
{
	printf("\nTry %s type\n", type_name);
	try_protocol(type, IPPROTO_TCP, "TCP");
	try_protocol(type, IPPROTO_IP, "IP");
	try_protocol(type, IPPROTO_SCTP, "SCTP");
	try_protocol(type, IPPROTO_RAW, "RAW");
	try_protocol(type, IPPROTO_UDP, "UDP");
}

int main()
{
	try_type(SOCK_DGRAM, "DGRAM");
	try_type(SOCK_RAW, "RAW");
	try_type(SOCK_STREAM, "STREAM");
	return 0;
}

API [5]

$> gcc 1_socket_protocol.c

$> ./a.out
Try DGRAM type
TCP: error = Protocol wrong type for socket
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol wrong type for socket
UDP: success

Try RAW type
TCP: success
IP: success
SCTP: success
RAW: success
UDP: success

Try STREAM type
TCP: success
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol wrong type for socket
UDP: error = Protocol wrong type for socket

Mac (output above)

$> gcc 1_socket_protocol.c

$> ./a.out
Try DGRAM type
TCP: error = Protocol not supported
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol not supported
UDP: success

Try RAW type
TCP: success
IP: error = Protocol not supported
SCTP: success
RAW: success
UDP: success

Try STREAM type
TCP: success
IP: success
SCTP: success
RAW: error = Protocol not supported
UDP: error = Protocol not supported

Linux (output above)

API [6]

int
bind(int sockfd, const struct sockaddr *addr,
     socklen_t addrlen);
struct sockaddr {
        sa_family_t sa_family;
        char sa_data[14];
};
struct sockaddr_in {
        sa_family_t sin_family;
        in_port_t sin_port;
        struct in_addr sin_addr;
};
struct sockaddr_un {
        sa_family_t sun_family;
        char sun_path[108];
};
struct sockaddr_nl {
        sa_family_t nl_family;
        unsigned short nl_pad;
        pid_t nl_pid;
        __u32 nl_groups;
};

AF_UNIX

AF_INET

API [7]

struct sockaddr_in {
        sa_family_t sin_family;
        in_port_t sin_port;
        struct in_addr sin_addr;
};

struct in_addr {
        uint32_t s_addr;
};

AF_INET

Transport level destination address - a port, 2 bytes

Network level destination address, IP address, 4 bytes:

xxx.xxx.xxx.xxx

Port and address in network byte order, big-endian

uint32_t
htonl(uint32_t hostlong);

uint16_t
htons(uint16_t hostshort);

uint32_t
ntohl(uint32_t netlong);

uint16_t
ntohs(uint16_t netshort);
int
inet_aton(const char *cp, struct in_addr *inp);

in_addr_t
inet_addr(const char *cp);

in_addr_t
inet_network(const char *cp);

char *
inet_ntoa(struct in_addr in);

struct in_addr
inet_makeaddr(in_addr_t net, in_addr_t host);
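A short sketch of how these helpers are typically combined to fill a `struct sockaddr_in`. The function name `make_ipv4_addr` is hypothetical; the calls themselves (htons(), inet_aton()) are the ones listed on this slide.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill an IPv4 socket address from a dotted string and a port given
 * in host byte order. Returns 0 on success, -1 if the string is not
 * a valid IPv4 address.
 */
int
make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
	assert(ip != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	/* The port is stored in network byte order (big-endian). */
	out->sin_port = htons(port);
	/* inet_aton() returns 0 on failure and non-zero on success. */
	if (inet_aton(ip, &out->sin_addr) == 0)
		return -1;
	return 0;
}
```

After `make_ipv4_addr("127.0.0.1", 12345, &addr)` the port bytes in memory are 0x30 0x39 on any host, and the address bytes are 127, 0, 0, 1, regardless of the host's own byte order.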

API [8]

int
getaddrinfo(const char *node, const char *service,
            const struct addrinfo *hints,
            struct addrinfo **res);

void
freeaddrinfo(struct addrinfo *res);

const char *
gai_strerror(int errcode);

To convert a domain name into an address for bind()/connect()

API [8]. Address search example

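#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

int
main(void)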
{
	struct addrinfo *addr, *iter;
	int rc = getaddrinfo("yandex.ru", NULL, NULL, &addr);
	if (rc != 0) {
		printf("Error = %s\n", gai_strerror(rc));
		return -1;
	}
	printf("Families: inet = %d, inet6 = %d\n", AF_INET, AF_INET6);
	printf("Socket types: dgram = %d, stream = %d, raw = %d\n", SOCK_DGRAM,
	       SOCK_STREAM, SOCK_RAW);
	printf("Protocols: tcp = %d, udp = %d\n\n", IPPROTO_TCP, IPPROTO_UDP);
	for (iter = addr; iter != NULL; iter = iter->ai_next) {
		printf("family = %d, socktype = %d, protocol = %d",
		       iter->ai_family, iter->ai_socktype, iter->ai_protocol);
		if (iter->ai_family == AF_INET) {
			char buf[128];
			struct sockaddr_in *tmp =
				(struct sockaddr_in *)iter->ai_addr;
			inet_ntop(AF_INET, &tmp->sin_addr, buf, sizeof(buf));
			printf(", ip = %s", buf);
		}
		printf("\n");
	}
	freeaddrinfo(addr);
	return 0;
}

Get address list head

Print every address

Elements are linked by ai_next field

Each element has an address of a socket listening there, and some other description fields

struct addrinfo {
        int ai_flags;
        int ai_family;
        int ai_socktype;
        int ai_protocol;
        socklen_t ai_addrlen;
        struct sockaddr *ai_addr;
        char *ai_canonname;
        struct addrinfo *ai_next;
};

API [9]. Address search example

$> gcc 2_getaddrinfo.c

$> ./a.out
Families: inet = 2, inet6 = 10
Socket types: dgram = 2, stream = 1, raw = 3
Protocols: tcp = 6, udp = 17

family = 2, socktype = 2, protocol = 17, ip = 77.88.55.80
family = 2, socktype = 3, protocol = 0, ip = 77.88.55.80
family = 2, socktype = 1, protocol = 6, ip = 5.255.255.88
family = 2, socktype = 2, protocol = 17, ip = 5.255.255.88
family = 2, socktype = 3, protocol = 0, ip = 5.255.255.88
family = 2, socktype = 1, protocol = 6, ip = 77.88.55.77
family = 2, socktype = 2, protocol = 17, ip = 77.88.55.77
family = 2, socktype = 3, protocol = 0, ip = 77.88.55.77
family = 2, socktype = 1, protocol = 6, ip = 5.255.255.80
family = 2, socktype = 2, protocol = 17, ip = 5.255.255.80
family = 2, socktype = 3, protocol = 0, ip = 5.255.255.80
family = 10, socktype = 1, protocol = 6
family = 10, socktype = 2, protocol = 17
family = 10, socktype = 3, protocol = 0

API [10]. Client example

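#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(int argc, const char **argv)
{
	/* Host and port are taken from the command line. */
	if (argc < 3) {
		printf("usage: %s <host> <port>\n", argv[0]);
		return -1;
	}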
	int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock == -1) {
		printf("error = %s\n", strerror(errno));
		return -1;
	}
	struct addrinfo *addr;
	struct addrinfo filter;
	memset(&filter, 0, sizeof(filter));
	filter.ai_family = AF_INET;
	filter.ai_socktype = SOCK_STREAM;
	int rc = getaddrinfo(argv[1], argv[2], &filter, &addr);
	if (rc != 0) {
		printf("addrinfo error = %s\n", gai_strerror(rc));
		close(sock);
		return -1;
	}
	if (addr == NULL) {
		/* Nothing to free: the list is empty. */
		printf("server not found\n");
		close(sock);
		return -1;
	}

Create a TCP socket

Try to find a server address by its name

Instead of iteration through the list it is possible to set a filter and find a needed address right away

	rc = connect(sock, addr->ai_addr, addr->ai_addrlen);
	freeaddrinfo(addr);
	if (rc != 0) {
		printf("connect error = %s\n", strerror(errno));
		close(sock);
		return -1;
	}
	int number;
	while (scanf("%d", &number) > 0) {
		if (send(sock, &number, sizeof(number), 0) == -1) {
			printf("error = %s\n", strerror(errno));
			continue;
		}
		printf("Sent %d\n", number);
		number = 0;
		int rc = recv(sock, &number, sizeof(number), 0);
		if (rc == 0) {
			printf("Closed connection\n");
			break;
		}
		if (rc == -1)
			printf("error = %s\n", strerror(errno));
		else
			printf("Received %d\n", number);
	}
	close(sock);
	return 0;
}

The rest is no different from UNIX sockets
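The client above sends an `int` with a single send() and reads the reply with a single recv(). Since TCP is a byte stream, either call may transfer fewer bytes than requested, and the two machines may disagree on byte order. A hedged sketch of helpers that deal with both issues follows; the names `send_all`, `recv_all`, `send_number` and `recv_number` are hypothetical, and signal interruption (EINTR) is deliberately not handled.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send exactly size bytes. Returns 0 on success, -1 on error. */
int
send_all(int sock, const void *data, size_t size)
{
	const char *pos = data;
	while (size > 0) {
		ssize_t rc = send(sock, pos, size, 0);
		if (rc == -1)
			return -1;
		pos += rc;
		size -= (size_t) rc;
	}
	return 0;
}

/*
 * Receive exactly size bytes. Returns 0 on success, -1 on error or
 * if the peer closed the connection before all bytes arrived.
 */
int
recv_all(int sock, void *data, size_t size)
{
	char *pos = data;
	while (size > 0) {
		ssize_t rc = recv(sock, pos, size, 0);
		if (rc <= 0)
			return -1;
		pos += rc;
		size -= (size_t) rc;
	}
	return 0;
}

/* Send a 32-bit number in network byte order. */
int
send_number(int sock, int32_t number)
{
	uint32_t wire = htonl((uint32_t) number);
	return send_all(sock, &wire, sizeof(wire));
}

/* Receive a 32-bit number sent by send_number(). */
int
recv_number(int sock, int32_t *number)
{
	uint32_t wire;
	assert(number != NULL);
	if (recv_all(sock, &wire, sizeof(wire)) != 0)
		return -1;
	*number = (int32_t) ntohl(wire);
	return 0;
}
```

For a 4-byte value on a local connection a short read is unlikely, so the lecture code works in practice; the loops matter once messages get larger or the network gets slower.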

API [11]. Server example

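#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

void *
worker_f(void *arg)
{
	printf("New client created\n");
	/* The descriptor was passed by value inside the pointer. */
	int client_sock = (int) (intptr_t) arg;
	while(1) {
		int buffer = 0;
		ssize_t size = recv(client_sock, &buffer, sizeof(buffer), 0);
		if (size == -1) {
			printf("error = %s\n", strerror(errno));
			/* Stop serving instead of retrying forever. */
			break;
		}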
		if (size == 0) {
			printf("Closed connection\n");
			break;
		}
		printf("Received %d\n", buffer);
		buffer++;
		if (send(client_sock, &buffer, sizeof(buffer), 0) == -1)
			printf("error = %s\n", strerror(errno));
		else
			printf("Sent %d\n", buffer);
	}
	close(client_sock);
	return NULL;
}

Serving a client is exactly the same as it was with UNIX sockets

Read a number, increase it

Send back

int
main(int argc, const char **argv)
{
	int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock == -1) {
		printf("error = %s\n", strerror(errno));
		return -1;
	}
	struct sockaddr_in addr;
	/* Zero the structure so that no field is left uninitialized. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(12345);
	inet_aton("127.0.0.1", &addr.sin_addr);

	if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
		printf("bind error = %s\n", strerror(errno));
		return -1;
	}
	if (listen(sock, 128) == -1) {
		printf("listen error = %s\n", strerror(errno));
		return -1;
	}

Create a TCP socket

Instead of getaddrinfo() it is possible to fill the address manually

The rest is no different from UNIX sockets

	pthread_attr_t attr;
	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	while(1) {
		pthread_t worker_thread;
		int client_sock = accept(sock, NULL, NULL);
		if (client_sock == -1) {
			printf("error = %s\n", strerror(errno));
			continue;
		}
		int rc = pthread_create(&worker_thread, &attr, worker_f,
					(void *) (intptr_t) client_sock);
		if (rc != 0) {
			printf("error = %s\n", strerror(rc));
			close(client_sock);
		}
	}
	pthread_attr_destroy(&attr);
	close(sock);
	return 0;
}

Everything is the same as it was with UNIX sockets. A new client is accepted and moved to its own thread

API [12]. Interaction

Server

$> gcc 3_server.c -o server
$> ./server
New client created
Received 1
Sent 2
Closed connection

Client

$> gcc 3_client.c -o client
$> ./client 127.0.0.1 12345
1
Sent 1
Received 2
^C
$>

API [13]. Socket options

int
getsockopt(int sockfd, int level, int optname,
           void *optval, socklen_t *optlen);

int
setsockopt(int sockfd, int level, int optname,
           const void *optval, socklen_t optlen);

int
fcntl(int fd, int cmd, ... /* arg */ );

SO_KEEPALIVE - periodically send empty packets to check that the recipient is still alive, reachable, and has not disconnected

SO_REUSEADDR - ignore graceful attempts to send remaining data after socket close, so the address can be bound again right away

API [14]. Socket options

SO_REUSEPORT

Identifier of each socket in the kernel:

{protocol, src_ip, src_port, dst_ip, dst_port}

The option allows these fields to be the same for multiple live sockets

O_NONBLOCK - IO operations do not block

API [15]

int enable = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int));
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &enable, sizeof(int));
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(int));

int flags = fcntl(sockfd, F_GETFL, 0);
fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
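The effect of these calls can be checked directly. In the sketch below the function names are hypothetical, and a local socket pair stands in for a network connection: one helper reads an option back with getsockopt(), the other shows that recv() on a non-blocking socket with no data fails immediately instead of waiting.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_REUSEADDR and read it back. Returns 1 if set, -1 on error. */
int
enable_reuseaddr(int fd)
{
	int enable = 1, value = 0;
	socklen_t len = sizeof(value);
	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable,
		       sizeof(enable)) != 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) != 0)
		return -1;
	return value != 0;
}

/* Switch a descriptor to non-blocking mode. Returns 0 or -1. */
int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Read from a non-blocking socket while no data is available.
 * Returns errno of the failed recv(), 0 if recv() unexpectedly
 * succeeded, -1 if the setup failed.
 */
int
nonblocking_recv_errno(void)
{
	int fds[2];
	char byte;
	int result = -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
		return -1;
	if (set_nonblocking(fds[0]) == 0) {
		ssize_t rc = recv(fds[0], &byte, 1, 0);
		result = rc == -1 ? errno : 0;
	}
	close(fds[0]);
	close(fds[1]);
	return result;
}
```

The expected errno is EAGAIN (or EWOULDBLOCK, which is the same value on many systems); the program is then free to do other work and retry later.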

Summary

The network is hierarchical and segmented. Addresses are unique within one subnetwork. Communication between networks relies on address multiplexing and temporary address "leasing".

The global network uses IP (Internet Protocol) addressing. Each IP address is 4 bytes (version 4) or 16 bytes (version 6).

The Internet uses packet switching. Each packet takes its own route to the destination through a chain of routers and switches.

Protocols such as TCP (reliable byte stream, connections), UDP (unreliable size-limited blocks of data, no connections), and most others are built on top of IP.

Code uses the network via sockets. They have the same API as UNIX domain sockets, just with different parameters.

Conclusion

Next time:


Advanced IO. Non-blocking IO operations. File locks. Multiplexing: select, poll, kqueue.

System programming 8

By Vladislav Shpilevoy

Network. Short history from ARPANET. Canonical OSI model, real TCP/IP model, protocol stack. Network implementation in the kernel. User space interface socket(), connect(), close(). TCP and UDP.
