Vladislav Shpilevoy
Database C developer at Tarantool. Backend C++ developer at VirtualMinds.
Lecture 8:
Network. TCP/IP and OSI models. Network in the kernel. Interfaces and examples.
Version: 3
System programming
Anonymous
int
pipe2(int pipefd[2], int flags);
void *
mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Named, XSI standard
int
semget(key_t key, int nsems, int semflg);
int
msgget(key_t key, int msgflg);
int
shmget(key_t key, size_t size, int shmflg);
Named, POSIX standard
int
mkfifo(const char *pathname, mode_t mode);
sem_t *
sem_open(const char *name, int oflag, mode_t mode, unsigned int value);
int
socket(int domain, int type, int protocol);
int
bind(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
int
listen(int sockfd, int backlog);
int
connect(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
int
accept(int sockfd, struct sockaddr *addr,
socklen_t *addrlen);
Creation of a socket with a given domain, protocol, type
Bind socket to a name
Listen for incoming connections
Connect to an already named and listening socket. No need to call bind() then
If bind() + listen() were called, the only thing the socket can do is to accept clients using accept()
Usage
int fd = socket();
connect(fd, remote_addr);
/* Ready to read/write fd. */
Connection to a named socket
Creation of a named socket without connection
int fd = socket();
bind(fd, addr);
/** Ready to read/write fd. */
Creation of a named socket with connection
int fd = socket();
bind(fd, addr);
listen(fd);
while(1) {
int remote_fd = accept(fd);
/*
* Ready to read/write
* remote_fd.
*/
}
Connect() creates a paired socket on the server side, and this pair can interact just like socketpair()
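The bind() + listen() + accept() and connect() sequences above can be tried in a single process. This is a minimal sketch, assuming the loopback address 127.0.0.1 and a kernel-chosen port; the helper name loopback_roundtrip() is hypothetical and is not part of the lecture code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: pass one byte through a TCP connection on the
 * loopback interface. Returns the byte read on the accepted side, or
 * -1 on any error.
 */
int
loopback_roundtrip(unsigned char byte)
{
	int result = -1, peer = -1;
	unsigned char in = 0;
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int server = socket(AF_INET, SOCK_STREAM, 0);
	int client = socket(AF_INET, SOCK_STREAM, 0);

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* Port 0: the kernel picks a free one. */

	if (server != -1 && client != -1 &&
	    /* Server side: name the socket and start listening. */
	    bind(server, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    listen(server, 1) == 0 &&
	    /* Learn which port the kernel has chosen. */
	    getsockname(server, (struct sockaddr *)&addr, &len) == 0 &&
	    /* Client side: no bind(), just connect() to the name. */
	    connect(client, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		assert(len == sizeof(addr));
		/* accept() returns the server-side end of the pair. */
		peer = accept(server, NULL, NULL);
		if (peer != -1 && send(client, &byte, 1, 0) == 1 &&
		    recv(peer, &in, 1, 0) == 1)
			result = in;
	}
	if (peer != -1)
		close(peer);
	if (client != -1)
		close(client);
	if (server != -1)
		close(server);
	return result;
}
```

Calling loopback_roundtrip(42) should give back 42 when loopback networking is available. The listening socket itself is never read or written; only the descriptor returned by accept() is.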
int fd2 = socket();
bind(fd2, addr2);
/** Ready to read/write fd2. */
read/write
send/recv
sendto/recvfrom
Only datagram (packet) sockets work without connect(); the destination address must then be specified manually for each packet, for example with sendto()
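A possible illustration of packet sockets without connect(): the sender names the destination in every sendto() call, and the receiver learns the source from recvfrom(). This is only a sketch, assuming UDP over loopback; udp_loopback_send() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send msg as one UDP datagram over loopback and
 * receive it into out as a zero-terminated string. Returns the
 * datagram size, or -1 on error.
 */
ssize_t
udp_loopback_send(const char *msg, char *out, size_t out_size)
{
	ssize_t rc = -1;
	struct sockaddr_in addr, from;
	socklen_t len = sizeof(addr), from_len = sizeof(from);
	int rx = socket(AF_INET, SOCK_DGRAM, 0);
	int tx = socket(AF_INET, SOCK_DGRAM, 0);

	assert(out != NULL && out_size > 0);
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* Port 0: the kernel picks a free one. */

	if (rx != -1 && tx != -1 &&
	    /* The receiver needs a name, the sender does not. */
	    bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(rx, (struct sockaddr *)&addr, &len) == 0 &&
	    /* No connect(): the destination goes with the packet. */
	    sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&addr,
		   sizeof(addr)) != -1) {
		rc = recvfrom(rx, out, out_size - 1, 0,
			      (struct sockaddr *)&from, &from_len);
		if (rc >= 0)
			out[rc] = '\0';
	}
	if (rx != -1)
		close(rx);
	if (tx != -1)
		close(tx);
	return rc;
}
```

Unlike a stream, each sendto() here produces exactly one message on the receiving side, and a datagram longer than the receive buffer would be truncated.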
1963
The idea appears
1965
First network communication
1990
Networks become popular; ARPANET is shut down
Joseph Carl Robnett Licklider and his work "Intergalactic Computer Network"
Start of ARPANET at DARPA (Defense Advanced Research Projects Agency)
[Figure: network diagrams with per-link load labels in the form used/capacity (for example 50/100, 25/70, 10/50), each shown in two panels: "How it works" and "Result"]
Physical environment metadata
Routing metadata
Error correction metadata
User data
<html>...</html>
File size, blocks ...
Game data ...
User space
Kernel space
Applications, web servers, games, file managers, email
Delivery reliability protocols
Protocols for packet receipt and forwarding further to the network
Protocols for interaction with wires and the radio environment
[Figure: a packet is a payload (user data) preceded by several service headers. The application hands user data down; the delivery reliability subsystem splits it into Packet 1 and Packet 2 and adds a header to each; the routing subsystem adds a second header; by the time the packets reach the network device driver, each carries three headers.]
OSI - Open Systems Interconnection
7 levels:
Web pages
Movies
Documents
Database network interface
Tasks:
Details:
knows about data meaning, purpose
Protocols:
FTP, SMTP, DNS, HTTP
Example of an application level protocol - format of a response for an SQL request in Tarantool DBMS
+----------------------------------------------+
| IPROTO_BODY: { |
| IPROTO_METADATA: [ |
| { |
| IPROTO_FIELD_NAME: string, |
| IPROTO_FIELD_TYPE: number, |
| IPROTO_FIELD_FLAGS: number, |
| }, |
| ... |
| ], |
| |
| IPROTO_SQL_INFO: { |
| SQL_INFO_ROW_COUNT: number, |
| SQL_INFO_LAST_ID: number, |
| ... |
| }, |
| |
| IPROTO_DATA: [ |
| tuple/scalar, |
| ... |
| ] |
| } |
+----------------------------------------------+
Example of an HTTP request
:authority: clc.stackoverflow.com
:method: GET
:path: /markup.js?...
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: no-cache
cookie: prov=702db90b-56ab-53f7-3894-c3733607b954;...
pragma: no-cache
referer: https://stackoverflow.com/questions/...
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X ...
[Figure: sample snippets in XML, HTML, YAML, JSON, and MessagePack (0xa7 "MsgPack")]
Tasks:
Details:
Languages, formats:
JSON, XML, YAML, HTML, MessagePack
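To show what "knows only the structure" can mean for a binary format, here is a small sketch of how MessagePack encodes a short string: one type byte 0xa0 | length followed by the raw bytes (the "fixstr" family, for lengths up to 31). The function name encode_fixstr() is hypothetical and is not taken from any MessagePack library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: encode a zero-terminated string as a
 * MessagePack fixstr. Returns the number of bytes written, or 0 if
 * the string is longer than 31 bytes or does not fit into out.
 */
size_t
encode_fixstr(const char *str, uint8_t *out, size_t out_size)
{
	assert(str != NULL && out != NULL);
	size_t len = strlen(str);
	if (len > 31 || out_size < len + 1)
		return 0;
	/* Upper 3 bits 101 mark a fixstr, lower 5 bits hold the length. */
	out[0] = (uint8_t)(0xa0 | len);
	memcpy(out + 1, str, len);
	return len + 1;
}
```

For the 7-byte string "MsgPack" the type byte is 0xa0 | 7 = 0xa7, and the whole encoding takes 8 bytes. The encoder does not care what the string means, only how long it is.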
JSON response representation from GitHub API
{
"action": "opened",
"issue": {
"url": "https://api.github.com/repos/octocat/Hello-World/issues/1347",
"number": 1347,
...
},
"repository" : {
"id": 1296269,
"full_name": "octocat/Hello-World",
"owner": {
"login": "octocat",
"id": 1,
...
},
...
},
"sender": {
"login": "octocat",
"id": 1,
...
}
}
XML response representation from opengis
<WFS_Capabilities xmlns="http://www.opengis.net/wfs" version="1.0.0">
<Service>
<Name> Oracle WFS </Name>
<Title> Oracle Web Feature Service </Title>
<Abstract> Web Feature Service maintained by Oracle </Abstract>
<OnlineResource>http://localhost:8888/SpatialWS-</OnlineResource>
</Service>
<Capability>
<GetCapabilities>
<DCPType>
<HTTP>
<Get onlineResource="http://localhost:8888/SpatialWS-"/>
</HTTP>
</DCPType>
<DCPType>
<HTTP>
<Post onlineResource="http://localhost:8888/SpatialWS-"/>
</HTTP>
</DCPType>
</GetCapabilities>
</Capability>
</WFS_Capabilities>
Reliability level selection
Authentication
Is a connection needed?
Timers
Tasks:
Details:
Implements almost nothing except authentication. Decides whether a connection needs to be established, and with which settings
Protocols:
PAP, CHAP, SSL
?
Application
Representation
Session
What does each level do? What does it know about the data?
Application: creates data, knows its meaning.
Representation: formats data, knows only its structure.
Session: chooses link settings, performs authentication. Sees data as just bytes.
1 point
Reliability: order, duplicates, losses, overloads
Fragmentation and batching
Virtual connection establishment
Operates by packets or by a contiguous byte stream
Tasks:
Details:
Protocols:
TCP, UDP, SCTP, RDP
Routing
Foreign packet forwarding
Coping with some types of overloads
Tasks:
Details:
Protocols:
IP, DDP, ICMP, RIP, EGP
Networks are hierarchical
Network level is for interaction of subnetworks
Routing in one network
Foreign packet forwarding in one network
Tasks:
Details:
Protocols:
Ethernet, MPLS, PPP, Token Ring
Coping with some types of overloads
Channel level work area
Network level work area
Signal modulation
Interaction with environment
Coping with collisions
Tasks:
Details:
Technologies:
Bluetooth, Wi-Fi, OTN, Ethernet cable, USB
Checking and fixing of errors
Channel level work area
Physical level work area
Application
Presentation (representation)
Session
Transport
Network
Data link (channel)
Physical
Data
TCP/IP: 4 levels
OSI: 7 levels
struct tcphdr {
__be16 source;
__be16 dest;
__be32 seq;
__be32 ack_seq;
__u16 doff:4,
res1:4,
cwr:1,
ece:1,
urg:1,
ack:1,
psh:1,
rst:1,
syn:1,
fin:1;
__be16 window;
__sum16 check;
__be16 urg_ptr;
};
struct iphdr {
__u8 version:4,
ihl:4;
__u8 tos;
__be16 tot_len;
__be16 id;
__be16 frag_off;
__u8 ttl;
__u8 protocol;
__sum16 check;
__be32 saddr;
__be32 daddr;
};
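The check field of the IP header holds a 16-bit one's complement checksum of the header (the algorithm described in RFC 1071). A minimal sketch of that computation is below; inet_checksum() is a hypothetical name, and the result is returned as a plain number rather than in wire byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's complement checksum over a byte buffer,
 * treating it as a sequence of big-endian 16-bit words. An odd
 * trailing byte is padded with a zero byte. Meant for header-sized
 * buffers, so the 32-bit accumulator cannot overflow.
 */
uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)
		sum += (uint32_t)data[i] << 8;
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For the bytes 00 01 f2 03 f4 f5 f6 f7 the folded sum is 0xddf2, so the checksum is 0x220d. A receiver that sums the same data together with the checksum gets 0xffff, whose complement is 0.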
struct ethhdr {
unsigned char h_dest[ETH_ALEN];
unsigned char h_source[ETH_ALEN];
__be16 h_proto;
};
Packing: each level wraps the data into its own header, like the layers of an onion
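The wrapping can be sketched with a toy frame builder. Everything here is an illustration: toy_encapsulate() and the TOY_* constants are hypothetical, the headers are filled with marker letters instead of real fields, and only the sizes (14 bytes for Ethernet, 20 for IP and 20 for TCP without options) follow the structs above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header sizes without options; the names are hypothetical. */
enum { TOY_ETH_LEN = 14, TOY_IP_LEN = 20, TOY_TCP_LEN = 20 };

/*
 * Hypothetical helper: lay out [ETH][IP][TCP][payload] in frame.
 * Returns the total frame length, or 0 if frame is too small.
 */
size_t
toy_encapsulate(const void *payload, size_t len, uint8_t *frame,
		size_t frame_size)
{
	size_t total = TOY_ETH_LEN + TOY_IP_LEN + TOY_TCP_LEN + len;
	uint8_t *pos = frame;

	assert(payload != NULL && frame != NULL);
	if (total > frame_size)
		return 0;
	/* Outermost layer first: this is what goes onto the wire. */
	memset(pos, 'E', TOY_ETH_LEN);
	pos += TOY_ETH_LEN;
	memset(pos, 'I', TOY_IP_LEN);
	pos += TOY_IP_LEN;
	memset(pos, 'T', TOY_TCP_LEN);
	pos += TOY_TCP_LEN;
	/* The user data sits in the very middle of the onion. */
	memcpy(pos, payload, len);
	return total;
}
```

A 2-byte payload becomes a 56-byte frame: 54 bytes of headers for 2 bytes of data, which is one reason why sending many tiny packets is expensive.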
Packets
int
tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
send(socket, data, data_size);
int
ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
__be32 saddr, __be32 daddr,
struct ip_options_rcu *opt);
int
dnet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
u16 value);
User space
Kernel space
int
socket(int domain, int type, int protocol);
AF_UNIX - local socket, visible only on this machine
AF_INET - network sockets based on IPv4
AF_PACKET - "raw" network sockets, protocol assembling is in user code
int
socket(int domain, int type, int protocol);
SOCK_DGRAM - UDP, packet size is limited, no reliability guarantees
SOCK_STREAM - TCP, data as a byte stream instead of packets, has delivery guarantees and connections
SOCK_SEQPACKET - SCTP, like TCP, but data is represented as packets of limited size
SOCK_RDM - RDP, like SCTP, but order of delivery is not guaranteed
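"Byte stream instead of packets" has a practical consequence for SOCK_STREAM: the borders of individual send() calls are not preserved, and one recv() may return less than requested. A common way to cope is a loop like the sketch below; recv_all() is a hypothetical helper name, and EINTR handling is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read exactly size bytes from a stream socket.
 * Returns the number of bytes read (less than size only if the peer
 * closed the connection), or -1 on error.
 */
ssize_t
recv_all(int fd, void *buf, size_t size)
{
	size_t done = 0;

	assert(buf != NULL);
	while (done < size) {
		ssize_t rc = recv(fd, (char *)buf + done, size - done, 0);
		if (rc == -1)
			return -1;
		if (rc == 0)
			break; /* The peer has closed the connection. */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}
```

If the peer does send(fd, "ab", 2, 0) and then send(fd, "cd", 2, 0), a call recv_all(fd, buf, 4) returns the 4 bytes "abcd" regardless of how the kernel split or merged them on the way.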
int
socket(int domain, int type, int protocol);
0 - default
IPPROTO_SCTP
IPPROTO_IP
IPPROTO_TCP
IPPROTO_RAW
IPPROTO_UDP
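#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

void try_protocol(int type, int protocol, const char *protocol_name)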
{
int sock = socket(AF_INET, type, protocol);
if (sock == -1) {
printf("%s: error = %s\n", protocol_name, strerror(errno));
} else {
printf("%s: success\n", protocol_name);
close(sock);
}
}
void try_type(int type, const char *type_name)
{
printf("\nTry %s type\n", type_name);
try_protocol(type, IPPROTO_TCP, "TCP");
try_protocol(type, IPPROTO_IP, "IP");
try_protocol(type, IPPROTO_SCTP, "SCTP");
try_protocol(type, IPPROTO_RAW, "RAW");
try_protocol(type, IPPROTO_UDP, "UDP");
}
int main()
{
try_type(SOCK_DGRAM, "DGRAM");
try_type(SOCK_RAW, "RAW");
try_type(SOCK_STREAM, "STREAM");
return 0;
}
$> gcc 1_socket_protocol.c
$> ./a.out
Try DGRAM type
TCP: error = Protocol wrong type for socket
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol wrong type for socket
UDP: success
Try RAW type
TCP: success
IP: success
SCTP: success
RAW: success
UDP: success
Try STREAM type
TCP: success
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol wrong type for socket
UDP: error = Protocol wrong type for socket
Mac (output above)
$> gcc 1_socket_protocol.c
$> ./a.out
Try DGRAM type
TCP: error = Protocol not supported
IP: success
SCTP: error = Protocol not supported
RAW: error = Protocol not supported
UDP: success
Try RAW type
TCP: success
IP: error = Protocol not supported
SCTP: success
RAW: success
UDP: success
Try STREAM type
TCP: success
IP: success
SCTP: success
RAW: error = Protocol not supported
UDP: error = Protocol not supported
Linux (output above)
int
bind(int sockfd, const struct sockaddr *addr,
socklen_t addrlen);
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
};
struct sockaddr_in {
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
};
struct sockaddr_un {
sa_family_t sun_family;
char sun_path[108];
};
struct sockaddr_nl {
sa_family_t nl_family;
unsigned short nl_pad;
pid_t nl_pid;
__u32 nl_groups;
};
AF_UNIX
AF_INET
struct sockaddr_in {
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
};
struct in_addr {
uint32_t s_addr;
};
AF_INET
Transport level destination address - a port, 2 bytes
Network level destination address, IP address, 4 bytes:
xxx.xxx.xxx.xxx
Port and address in network byte order, big-endian
uint32_t
htonl(uint32_t hostlong);
uint16_t
htons(uint16_t hostshort);
uint32_t
ntohl(uint32_t netlong);
uint16_t
ntohs(uint16_t netshort);
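A small sketch of what network byte order means for a port number. The helper port_to_wire() is hypothetical; it only exposes the two bytes that htons() produces, which are the same on little-endian and big-endian hosts.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a port into out exactly as it would lie
 * in sockaddr_in.sin_port and on the wire.
 */
void
port_to_wire(uint16_t port, unsigned char out[2])
{
	assert(out != NULL);
	uint16_t net = htons(port);
	/* Copy the raw bytes to look at the in-memory layout. */
	memcpy(out, &net, sizeof(net));
}
```

Port 12345 is 0x3039, so the wire bytes are 0x30 followed by 0x39: the most significant byte goes first. ntohs() performs the reverse conversion, and ntohs(htons(x)) == x on any host.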
int
inet_aton(const char *cp, struct in_addr *inp);
in_addr_t
inet_addr(const char *cp);
in_addr_t
inet_network(const char *cp);
char *
inet_ntoa(struct in_addr in);
struct in_addr
inet_makeaddr(in_addr_t net, in_addr_t host);
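These functions are typically combined to fill a sockaddr_in from a human-readable address. A possible sketch, with the hypothetical helper name make_ipv4_addr(); it uses inet_aton() from the list above and htons() for the port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build an AF_INET address from a dotted string
 * and a port in host byte order. Returns 0 on success, -1 if the
 * string is not a valid IPv4 address.
 */
int
make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
	assert(ip != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(port);
	/* inet_aton() returns 0 for an invalid address string. */
	return inet_aton(ip, &out->sin_addr) != 0 ? 0 : -1;
}
```

After make_ipv4_addr("127.0.0.1", 12345, &addr) the structure can be cast to struct sockaddr * and passed to bind() or connect(). The address is stored in network byte order, so ntohl(addr.sin_addr.s_addr) gives 0x7f000001.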
int
getaddrinfo(const char *node, const char *service,
const struct addrinfo *hints,
struct addrinfo **res);
void
freeaddrinfo(struct addrinfo *res);
const char *
gai_strerror(int errcode);
To convert a domain name into an address for bind()/connect()
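#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

int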
main()
{
struct addrinfo *addr, *iter;
int rc = getaddrinfo("yandex.ru", NULL, NULL, &addr);
if (rc != 0) {
printf("Error = %s\n", gai_strerror(rc));
return -1;
}
printf("Families: inet = %d, inet6 = %d\n", AF_INET, AF_INET6);
printf("Socket types: dgram = %d, stream = %d, raw = %d\n", SOCK_DGRAM,
SOCK_STREAM, SOCK_RAW);
printf("Protocols: tcp = %d, udp = %d\n\n", IPPROTO_TCP, IPPROTO_UDP);
for (iter = addr; iter != NULL; iter = iter->ai_next) {
printf("family = %d, socktype = %d, protocol = %d",
iter->ai_family, iter->ai_socktype, iter->ai_protocol);
if (iter->ai_family == AF_INET) {
char buf[128];
struct sockaddr_in *tmp =
(struct sockaddr_in *)iter->ai_addr;
inet_ntop(AF_INET, &tmp->sin_addr, buf, sizeof(buf));
printf(", ip = %s", buf);
}
printf("\n");
}
freeaddrinfo(addr);
return 0;
}
Get address list head
Print every address
Elements are linked by ai_next field
Each element holds a socket address that can be passed to bind()/connect(), plus the family, socket type, and protocol that this address fits
struct addrinfo {
int ai_flags;
int ai_family;
int ai_socktype;
int ai_protocol;
socklen_t ai_addrlen;
struct sockaddr *ai_addr;
char *ai_canonname;
struct addrinfo *ai_next;
};
$> gcc 2_getaddrinfo.c
$> ./a.out
Families: inet = 2, inet6 = 10
Socket types: dgram = 2, stream = 1, raw = 3
Protocols: tcp = 6, udp = 17
family = 2, socktype = 2, protocol = 17, ip = 77.88.55.80
family = 2, socktype = 3, protocol = 0, ip = 77.88.55.80
family = 2, socktype = 1, protocol = 6, ip = 5.255.255.88
family = 2, socktype = 2, protocol = 17, ip = 5.255.255.88
family = 2, socktype = 3, protocol = 0, ip = 5.255.255.88
family = 2, socktype = 1, protocol = 6, ip = 77.88.55.77
family = 2, socktype = 2, protocol = 17, ip = 77.88.55.77
family = 2, socktype = 3, protocol = 0, ip = 77.88.55.77
family = 2, socktype = 1, protocol = 6, ip = 5.255.255.80
family = 2, socktype = 2, protocol = 17, ip = 5.255.255.80
family = 2, socktype = 3, protocol = 0, ip = 5.255.255.80
family = 10, socktype = 1, protocol = 6
family = 10, socktype = 2, protocol = 17
family = 10, socktype = 3, protocol = 0
8_net/3_client.c [1]
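#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(int argc, const char **argv)
{
if (argc < 3) {
printf("usage: %s <host> <port>\n", argv[0]);
return -1;
}
int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);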
if (sock == -1) {
printf("error = %s\n", strerror(errno));
return -1;
}
struct addrinfo *addr;
struct addrinfo filter;
memset(&filter, 0, sizeof(filter));
filter.ai_family = AF_INET;
filter.ai_socktype = SOCK_STREAM;
int rc = getaddrinfo(argv[1], argv[2], &filter, &addr);
if (rc != 0) {
printf("addrinfo error = %s\n", gai_strerror(rc));
close(sock);
return -1;
}
if (addr == NULL) {
printf("server not found\n");
freeaddrinfo(addr);
close(sock);
return -1;
}
Create a TCP socket
Try to find a server address by its name
Instead of iteration through the list it is possible to set a filter and find a needed address right away
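The filter approach can be wrapped into a small function. This is a sketch with a hypothetical name, resolve_ipv4_tcp(); it assumes that the first entry of the filtered list is good enough, which is what the client code does as well.

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve host and service into the first
 * IPv4/TCP address. Returns 0 on success, or a getaddrinfo() error
 * code suitable for gai_strerror().
 */
int
resolve_ipv4_tcp(const char *host, const char *service,
		 struct sockaddr_in *out)
{
	struct addrinfo hints, *res;

	assert(out != NULL);
	memset(&hints, 0, sizeof(hints));
	/* The filter: only IPv4 addresses usable with stream sockets. */
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	int rc = getaddrinfo(host, service, &hints, &res);
	if (rc != 0)
		return rc;
	assert(res != NULL && res->ai_addrlen == sizeof(*out));
	memcpy(out, res->ai_addr, sizeof(*out));
	freeaddrinfo(res);
	return 0;
}
```

Numeric strings work too and need no DNS lookup: resolve_ipv4_tcp("127.0.0.1", "12345", &addr) yields the loopback address with port 12345 already in network byte order.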
8_net/3_client.c [2]
rc = connect(sock, addr->ai_addr, addr->ai_addrlen);
freeaddrinfo(addr);
if (rc != 0) {
printf("connect error = %s\n", strerror(errno));
close(sock);
return -1;
}
int number;
while (scanf("%d", &number) > 0) {
if (send(sock, &number, sizeof(number), 0) == -1) {
printf("error = %s\n", strerror(errno));
continue;
}
printf("Sent %d\n", number);
number = 0;
int rc = recv(sock, &number, sizeof(number), 0);
if (rc == 0) {
printf("Closed connection\n");
break;
}
if (rc == -1)
printf("error = %s\n", strerror(errno));
else
printf("Received %d\n", number);
}
close(sock);
return 0;
}
The rest is no different from UNIX sockets
8_net/3_server.c [1]
void *
worker_f(void *arg)
{
printf("New client created\n");
/* The descriptor is passed by value inside the pointer; intptr_t is from <stdint.h>. */
int client_sock = (int)(intptr_t) arg;
while(1) {
int buffer = 0;
ssize_t size = recv(client_sock, &buffer, sizeof(buffer), 0);
if (size == -1) {
printf("error = %s\n", strerror(errno));
/* A failed recv() is unlikely to recover; stop serving this client. */
break;
}
if (size == 0) {
printf("Closed connection\n");
break;
}
printf("Received %d\n", buffer);
buffer++;
if (send(client_sock, &buffer, sizeof(buffer), 0) == -1)
printf("error = %s\n", strerror(errno));
else
printf("Sent %d\n", buffer);
}
close(client_sock);
return NULL;
}
Client serving is exactly the same as it was with UNIX sockets
Read a number, increase it
Send back
8_net/3_server.c [2]
int
main(int argc, const char **argv)
{
int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
if (sock == -1) {
printf("error = %s\n", strerror(errno));
return -1;
}
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(12345);
inet_aton("127.0.0.1", &addr.sin_addr);
if (bind(sock, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
printf("bind error = %s\n", strerror(errno));
return -1;
}
if (listen(sock, 128) == -1) {
printf("listen error = %s\n", strerror(errno));
return -1;
}
Create a TCP socket
Instead of getaddrinfo() it is possible to fill the address manually
The rest is no different from UNIX sockets
8_net/3_server.c [3]
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
while(1) {
pthread_t worker_thread;
int client_sock = accept(sock, NULL, NULL);
if (client_sock == -1) {
printf("error = %s\n", strerror(errno));
continue;
}
/* Pass the descriptor by value; intptr_t is from <stdint.h>. */
int rc = pthread_create(&worker_thread, &attr, worker_f,
(void *)(intptr_t) client_sock);
if (rc != 0) {
printf("error = %s\n", strerror(rc));
close(client_sock);
}
}
pthread_attr_destroy(&attr);
close(sock);
return 0;
}
Everything is the same as it was with UNIX sockets. A new client is accepted and moved to its own thread
$> gcc 3_server.c -o server
$> ./server
$> gcc 3_client.c -o client
$> ./client 127.0.0.1 12345
New client created
1
Sent 1
Received 2
Received 1
Sent 2
^C
$>
Closed connection
int
getsockopt(int sockfd, int level, int optname,
void *optval, socklen_t *optlen);
int
setsockopt(int sockfd, int level, int optname,
const void *optval, socklen_t optlen);
int
fcntl(int fd, int cmd, ... /* arg */ );
SO_KEEPALIVE
Periodically send empty packets to check that the peer is still alive, reachable, and has not disconnected
SO_REUSEADDR
Allow bind() to an address even if a previously closed socket with that address still lingers in the kernel, gracefully trying to send its remaining data
SO_REUSEPORT
Identifier of each socket in the kernel:
{protocol, src_ip, src_port, dst_ip, dst_port}
The option allows these fields to be the same for multiple live sockets
O_NONBLOCK
No blocking on IO operations
int enable = 1;
setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int));
setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &enable, sizeof(int));
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(int));
int flags = fcntl(sockfd, F_GETFL, 0);
fcntl(sockfd, F_SETFL, flags | O_NONBLOCK);
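The same calls with error checking, as a sketch. The function name make_nonblocking_reusable() is hypothetical; it applies SO_REUSEADDR and O_NONBLOCK to an existing socket and reports the first failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: enable SO_REUSEADDR and switch the socket to
 * non-blocking mode. Returns 0 on success, -1 on error (errno is set
 * by the failed call).
 */
int
make_nonblocking_reusable(int fd)
{
	int enable = 1;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &enable,
		       sizeof(enable)) != 0)
		return -1;
	/* Keep the flags that are already set, add one more. */
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -1;
	return 0;
}
```

After this call a recv() on a socket with no pending data does not block: it returns -1 right away with errno set to EAGAIN or EWOULDBLOCK. The options can be read back with getsockopt() and fcntl(fd, F_GETFL, 0).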
The network is hierarchical and segmented. Inside one subnetwork the addresses are unique. For communication between networks, addresses are multiplexed and temporarily "rented".
The global network uses IP (Internet Protocol) addressing. Each IP address is 4 bytes (version 4) or 16 bytes (version 6).
The Internet is packet-switched. Each packet takes its own route to the destination through a chain of routers and switches.
Protocols such as TCP (reliable byte stream, connections), UDP (unreliable, size-limited blocks of data, no connections), and most of the others are built on top of IP.
Code uses the network via sockets. They have the same API as UNIX domain sockets and only take different parameters.
Lectures: slides.com/gerold103/decks/sysprog_eng
Next time:
Advanced IO. Non-blocking IO operations. File blocks. Multiplexing: select, poll, kqueue.
By Vladislav Shpilevoy
Network. Short history from ARPANET. Canonical OSI model, real TCP/IP model, protocol stack. Network implementation in the kernel. User space interface socket(), connect(), close(). TCP and UDP.