Network Program Control Flow and Data Structures in the Linux Kernel

(from the BSD API to the transport layer)

Overview

User applications use the BSD Socket Interface to access network services.  The BSD Interface lets two or more application programs communicate with each other.  Programs can use the file system (AF_UNIX) or the network (AF_INET) as the communication medium.  When AF_INET is used, user data is handled by various network layers and then sent out onto the physical network.  At the receiving side, the data travels upward through the network layers to the receiving user application.

User data passes through several layers of Linux kernel code on its way to the network.  Generally these are:

BSD Layer  (/net/socket.c)

INET Layer (/net/ipv4/af_inet.c)

TRANSPORT Layer
                                TCP (/net/ipv4/tcp*.c)
                                UDP (/net/ipv4/udp.c)
                                XTP (/net/ipv4/xtp*.c)

NETWORK Layer
                                IP  (/net/ipv4/ip*.c)

?Routing?

DATA LINK Layer
                                ??

Drivers
            Ethernet Drivers
  


BSD Layer

The BSD Interface provides the following system calls to the user level application:

socket()
bind()
listen()
accept()
connect()
send()
sendto()
sendmsg()
recv()
recvfrom()
recvmsg()
getpeername()
getsockopt()
setsockopt()
shutdown()

close() [not a socket-specific call, but it is used to terminate a BSD socket]
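The calls above can be exercised from user space.  The following is a minimal sketch, assuming a Linux/POSIX system with a loopback interface; the helper name bsd_roundtrip is hypothetical.  It walks one TCP connection over 127.0.0.1 through socket(), bind(), listen(), connect(), accept(), send(), shutdown(), recv() and close(), and the comments note which sys_xxx wrapper each call enters.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg from a client socket to a server socket over 127.0.0.1 and read
 * it back on the server side.  outlen must be > 0.  Returns the number of
 * bytes received, or -1 on error. */
ssize_t bsd_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    ssize_t got = -1;
    int lfd = -1, cfd = -1, afd = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                   /* kernel picks a free port */

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)     /* sys_socket */
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)   /* sys_bind */
        goto out;
    if (listen(lfd, 1) < 0)                              /* sys_listen */
        goto out;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)   /* learn the port */
        goto out;

    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) /* sys_connect */
        goto out;
    if ((afd = accept(lfd, NULL, NULL)) < 0)             /* sys_accept */
        goto out;

    if (send(cfd, msg, strlen(msg), 0) < 0)              /* sys_send */
        goto out;
    shutdown(cfd, SHUT_WR);                              /* sys_shutdown: no more sends */
    got = recv(afd, out, outlen - 1, MSG_WAITALL);       /* sys_recv: read until EOF */
    if (got >= 0)
        out[got] = '\0';
out:
    if (afd >= 0) close(afd);                            /* sock_close */
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return got;
}
```

For example, bsd_roundtrip("hello", buf, sizeof buf) should return 5 and leave "hello" in buf.  Binding to port 0 lets the kernel choose a free port, which getsockname() then reports so the client knows where to connect.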

The BSD layer is coded in the /net/socket.c module.  Each system call has a wrapper of the form sys_xxx, where xxx is the BSD system call.  Generally these BSD functions simply pass the request on to functions in the next layer, INET.  They take user-level arguments and translate them to kernel-level arguments.  Most BSD functions translate the file descriptor to the SOCKET data structure, do minor error checking, and then call the corresponding inet_xxx procedure through the socket's operations table (sock->ops).

The SOCKET data structure contains general state information and pointers to INET functions.  All network (AF_INET) related details are below this layer.
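A rough user-space model of this split: the SOCKET holds generic state plus a table of function pointers, and the BSD-level wrapper only checks its arguments and calls through the table.  All names here (demo_socket, demo_proto_ops, demo_inet_bind, demo_sys_bind) are hypothetical simplifications, not the kernel's actual struct socket or struct proto_ops.

```c
struct demo_socket;

/* Hypothetical stand-in for the per-family operations table. */
struct demo_proto_ops {
    int (*bind)(struct demo_socket *sock, int port);
    int (*listen)(struct demo_socket *sock, int backlog);
};

/* Hypothetical stand-in for the SOCKET structure. */
struct demo_socket {
    int state;                          /* general state information */
    int port;                           /* filled in by the INET layer */
    int backlog;
    const struct demo_proto_ops *ops;   /* pointers to INET functions */
};

static int demo_inet_bind(struct demo_socket *sock, int port)
{
    sock->port = port;
    return 0;
}

static int demo_inet_listen(struct demo_socket *sock, int backlog)
{
    sock->backlog = backlog;
    return 0;
}

static const struct demo_proto_ops demo_inet_ops = {
    .bind = demo_inet_bind,
    .listen = demo_inet_listen,
};

/* The BSD-level wrapper never touches AF_INET details; it only checks
 * arguments and calls through sock->ops. */
int demo_sys_bind(struct demo_socket *sock, int port)
{
    if (sock == 0 || port < 0)          /* minor error checking */
        return -1;
    return sock->ops->bind(sock, port);
}
```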
 

sys_socket(family, type, protocol)

        sock_create(family,type,protocol,&sock)            ~~ allocates the socket data structure
                    - does various checks
                    sock=sock_alloc()                             ~~ creates the socket ds
                    net_families[family]->create(sock,protocol)   ~~ points to inet_create()

        get_fd(sock->inode)                                        ~~ sets up the fd that the user app uses
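The net_families[family]->create() step is a table lookup keyed by address family.  The sketch below assumes a simplified registry; demo_net_families, demo_sock_register and demo_sock_create are hypothetical names (in the kernel the table is filled in by sock_register()), and only the dispatch pattern is meant to match.

```c
#include <sys/socket.h>   /* AF_UNIX, AF_INET */

#define DEMO_NPROTO 64    /* hypothetical table size */

struct demo_socket { int family; int protocol; };

/* Hypothetical stand-in for a per-family entry with a create() hook. */
struct demo_net_proto_family {
    int family;
    int (*create)(struct demo_socket *sock, int protocol);
};

static const struct demo_net_proto_family *demo_net_families[DEMO_NPROTO];

/* Each address family registers itself once (cf. sock_register()). */
int demo_sock_register(const struct demo_net_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= DEMO_NPROTO)
        return -1;
    demo_net_families[ops->family] = ops;
    return 0;
}

static int demo_inet_create(struct demo_socket *sock, int protocol)
{
    sock->family = AF_INET;
    sock->protocol = protocol;
    return 0;
}

const struct demo_net_proto_family demo_inet_family = { AF_INET, demo_inet_create };

/* sock_create(): check the family, then dispatch to its create() hook. */
int demo_sock_create(int family, int protocol, struct demo_socket *sock)
{
    if (family < 0 || family >= DEMO_NPROTO || demo_net_families[family] == 0)
        return -1;                       /* unsupported (unregistered) family */
    return demo_net_families[family]->create(sock, protocol);
}
```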
 

sys_bind(fd, umyaddr, addrlen)

        sock=sockfd_lookup(fd, err)                          ~~ finds socket data structure via fd
        sock->ops->bind(sock,address,addrlen)      ~~ points to inet_bind
 

sys_listen(fd, backlog)

        sock=sockfd_lookup(fd, err)                          ~~ finds socket data structure via fd
        sock->ops->listen(sock,backlog)                  ~~ points to inet_listen
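Linux kernels of this generation also limit the backlog in sys_listen() before calling the INET layer.  A minimal sketch of that clamp, assuming SOMAXCONN has its traditional value of 128 and using the hypothetical name demo_clamp_backlog; exact details varied between kernel versions.

```c
#define DEMO_SOMAXCONN 128   /* assumed value; SOMAXCONN was traditionally 128 */

/* Clamp a listen() backlog.  Casting to unsigned makes a negative backlog
 * look huge, so it is clamped to the maximum as well. */
int demo_clamp_backlog(int backlog)
{
    if ((unsigned int)backlog > DEMO_SOMAXCONN)
        backlog = DEMO_SOMAXCONN;
    return backlog;
}
```

Because of the unsigned comparison, a negative backlog is replaced by the maximum rather than being passed down as a negative number.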
 

sys_accept(fd, upeer_sockaddr, upeer_addrlen)

        sock=sockfd_lookup(fd, err)
        newsock=sock_alloc()
        sock->ops->dup(newsock, sock)                     ~~ points to inet_dup; creates a duplicate socket entry
        newsock->ops->accept(sock, newsock, flags)    ~~ points to inet_accept;
        newsock=socki_lookup(inode)                        ~~ lookup via inode
        get_fd(inode)
        newsock->ops->getname(newsock, address, &len)    ~~ points to inet_getname; gets the address of the accepted socket and passes it back to the app
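The accept path can be modelled in user space as follows.  A new socket is allocated and given the listener's operations (the dup step).  The first pending connection is then taken, and the peer's address is reported.  This is a hypothetical sketch; demo_socket, demo_sys_accept and the fixed-size pending array are illustrations.  A real accept() blocks, or fails with EAGAIN, when nothing is pending instead of returning NULL.

```c
#include <stdlib.h>

struct demo_socket;
struct demo_ops { int (*getname)(struct demo_socket *s, unsigned int *addr); };

struct demo_socket {
    const struct demo_ops *ops;
    unsigned int peer;            /* peer address of an accepted socket */
    unsigned int pending[8];      /* listener only: queued peer addresses */
    int npending;
};

static int demo_getname(struct demo_socket *s, unsigned int *addr)
{
    *addr = s->peer;
    return 0;
}

const struct demo_ops demo_inet_ops = { demo_getname };

/* Mirrors the sys_accept() steps: alloc, dup, accept, getname. */
struct demo_socket *demo_sys_accept(struct demo_socket *listener, unsigned int *peer_out)
{
    struct demo_socket *newsock;
    int i;

    if (listener->npending == 0)
        return NULL;                          /* a real accept() would block */
    newsock = calloc(1, sizeof(*newsock));    /* sock_alloc() */
    if (newsock == NULL)
        return NULL;
    newsock->ops = listener->ops;             /* dup: same protocol operations */
    newsock->peer = listener->pending[0];     /* accept: take the oldest pending peer */
    for (i = 1; i < listener->npending; i++)  /* FIFO: shift the rest forward */
        listener->pending[i - 1] = listener->pending[i];
    listener->npending--;
    newsock->ops->getname(newsock, peer_out); /* report the peer to the caller */
    return newsock;
}
```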
 

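sys_connect(fd, upeer_sockaddr, upeer_addrlen)

        sock=sockfd_lookup(fd, err)
        move address to kernel space
        sock->ops->connect(sock, address, addrlen, flags)    ~~ points to inet_stream_connect (TCP) or inet_dgram_connect (UDP)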

sys_send(fd, buff, len, flags)

        sockfd_lookup()
        sock_sendmsg()         ~~ see sys_sendmsg()
 

sys_sendto(fd,buff,len,flags,addr,addrlen)

        sockfd_lookup()
        move addr to kernel space and assign it to msghdr
        sock_sendmsg()         ~~ see sys_sendmsg()
 

sys_sendmsg(fd,&msg,flags)

        copy data from user to kernel space
        sockfd_lookup()
        sock_sendmsg()
                sock->ops->sendmsg(sock,msg,size)    ~~ points to inet_sendmsg
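sys_send() and sys_sendto() both end in sock_sendmsg() because a plain buffer is just a msghdr with a single iovec.  The user-space sketch below builds that msghdr by hand and passes it to sendmsg().  send_as_sendmsg and demo_pair_roundtrip are hypothetical helper names, and an AF_UNIX datagram socketpair is used only so that the example needs no network.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Equivalent of send(fd, buf, len, flags): one iovec, no destination. */
ssize_t send_as_sendmsg(int fd, const void *buf, size_t len, int flags)
{
    struct iovec iov;
    struct msghdr msg;

    iov.iov_base = (void *)buf;   /* the iovec field is not const */
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = NULL;          /* sendto() would put the address here */
    msg.msg_namelen = 0;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    return sendmsg(fd, &msg, flags);
}

/* Send through a connected AF_UNIX datagram pair and read it back.
 * outlen must be > 0. */
ssize_t demo_pair_roundtrip(const char *text, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send_as_sendmsg(sv[0], text, strlen(text), 0) >= 0) {
        n = recv(sv[1], out, outlen - 1, 0);
        if (n >= 0)
            out[n] = '\0';
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```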
 

sys_recv()

sys_recvfrom()

sys_recvmsg()

sys_getpeername()

sys_getsockopt()

 
sys_setsockopt()

 

sys_shutdown()

        sock=sockfd_lookup(fd, err)
        sock->ops->shutdown(sock,how)        ~~ points to inet_shutdown
        sockfd_put(sock)
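From user space, shutdown() with SHUT_WR closes only the sending direction: the peer reads end-of-file while data can still flow the other way.  This sketch assumes a POSIX system and uses an AF_UNIX socketpair to stay self-contained.  For a TCP socket, the same call reaches tcp_shutdown(), which sends a FIN.  demo_half_close is a hypothetical name.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if, after shutdown(SHUT_WR) on one end, the peer sees EOF
 * while the reverse direction still works; 0 otherwise; -1 on setup error. */
int demo_half_close(void)
{
    int sv[2];
    char c = 0;
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (shutdown(sv[0], SHUT_WR) == 0 &&        /* close our sending side */
        read(sv[1], &c, 1) == 0 &&              /* peer sees end-of-file */
        write(sv[1], "x", 1) == 1 &&            /* peer can still send */
        read(sv[0], &c, 1) == 1 && c == 'x')    /* and we can still receive */
        ok = 1;
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```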
 

sock_close()


INET Layer

The INET layer is coded in the /net/ipv4/af_inet.c module.  Each inet function performs some minor housekeeping and then calls the transport-specific function via the prot data structure (sk->prot).

The SOCK data structure is the main structure for the transport layer.  However, it contains (via unions) or points to other data structures that are more transport specific.  For example, SOCK can point to a tcp_opt data structure, which holds more detailed information about a TCP connection.  For XTP it points to an xtp_ctxt, where all XTP context data is kept.  SOCK and the other transport-specific data structures are in a state of flux as the code gradually moves toward an even more modular structure.  (For example, SOCK still holds much IP-level data.)
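The relationship between SOCK, its prot table and the transport-specific data can be sketched as a struct with a protocol pointer and a union.  Everything here is hypothetical and heavily simplified (demo_sock, demo_proto, and the snd_nxt and key fields).  Only the overall shape mirrors the description above: generic fields, a union of per-transport data, and a dispatch table.  The DEMO_RCV_SHUTDOWN and DEMO_SEND_SHUTDOWN values are assumed to mirror the kernel's RCV_SHUTDOWN and SEND_SHUTDOWN bits.

```c
#define DEMO_RCV_SHUTDOWN  1
#define DEMO_SEND_SHUTDOWN 2

struct demo_sock;

/* Hypothetical stand-in for the prot table: one per transport. */
struct demo_proto {
    const char *name;
    void (*shutdown)(struct demo_sock *sk, int how);
};

struct demo_tcp_opt  { unsigned int snd_nxt; };   /* TCP-specific detail */
struct demo_xtp_ctxt { unsigned int key; };       /* XTP context data */

struct demo_sock {
    int shutdown;                      /* generic state */
    const struct demo_proto *prot;     /* transport dispatch table */
    union {                            /* transport-specific data */
        struct demo_tcp_opt  af_tcp;
        struct demo_xtp_ctxt af_xtp;
    } tp_pinfo;
};

static void demo_tcp_shutdown(struct demo_sock *sk, int how)
{
    sk->shutdown |= how;
    if (how & DEMO_SEND_SHUTDOWN)
        sk->tp_pinfo.af_tcp.snd_nxt++; /* a FIN consumes one sequence number */
}

const struct demo_proto demo_tcp_prot = { "tcp", demo_tcp_shutdown };

/* inet_shutdown(): housekeeping checks, then call through sk->prot. */
int demo_inet_shutdown(struct demo_sock *sk, int how)
{
    if (how < DEMO_RCV_SHUTDOWN || how > (DEMO_RCV_SHUTDOWN | DEMO_SEND_SHUTDOWN))
        return -1;                     /* various checks */
    sk->prot->shutdown(sk, how);
    return 0;
}
```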
 
 

  inet_create(sock, protocol)


inet_bind()

inet_listen()

inet_dgram_connect(sock,uaddr,addrlen,flags)

inet_stream_connect(sock,uaddr,addrlen,flags)

inet_accept(sock,newsock,flags)

inet_dup(newsock, oldsock)


inet_release(sock)

        sk=sock->sk
        if (sock->state != SS_UNCONNECTED)
                sock->state=SS_DISCONNECTING
        sk->state_change(sk)
        // if lingering (SO_LINGER) is enabled, timeout is nonzero; otherwise timeout=0
        sock->sk=NULL
        sk->socket=NULL
        sk->prot->close(sk,timeout)        ~~ points to transport xxx_close; each transport protocol must have a xxx_close function [TCP: tcp_close, XTP: xtp_close]
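The release sequence can be modelled as follows.  In this hypothetical sketch, demo_socket and demo_sock stand in for the SOCKET and SOCK structures, and prot_close stands in for sk->prot->close.  The state_change() callback is omitted, and the linger timeout is reduced to 0 or 1 for illustration.

```c
#include <stddef.h>

enum demo_sock_state { DEMO_SS_UNCONNECTED, DEMO_SS_CONNECTED, DEMO_SS_DISCONNECTING };

struct demo_sock;
struct demo_socket { enum demo_sock_state state; struct demo_sock *sk; };

struct demo_sock {
    struct demo_socket *socket;
    int linger;                                      /* SO_LINGER enabled? */
    long closed_with_timeout;                        /* records what close received */
    void (*prot_close)(struct demo_sock *sk, long timeout); /* transport xxx_close */
};

void demo_xxx_close(struct demo_sock *sk, long timeout)
{
    sk->closed_with_timeout = timeout;
}

void demo_inet_release(struct demo_socket *sock)
{
    struct demo_sock *sk = sock->sk;
    long timeout;

    if (sk == NULL)
        return;
    if (sock->state != DEMO_SS_UNCONNECTED)
        sock->state = DEMO_SS_DISCONNECTING;
    timeout = sk->linger ? 1 : 0;     /* nonzero only when lingering */
    sock->sk = NULL;                  /* break the SOCKET <-> SOCK links */
    sk->socket = NULL;
    sk->prot_close(sk, timeout);      /* hand off to the transport */
}
```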
 

inet_getname(sock, uaddr, uaddrlen, peer)

        // peer selects which address is returned: the remote peer's or our own
        if (peer)
                // get addr info from the socket you are connected to
        else
                // get your own (local) addr info
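The peer flag corresponds to the two user-level calls getpeername() (peer set) and getsockname() (peer clear).  Below is a minimal user-space check, assuming a loopback interface; demo_names_match is a hypothetical name.  The client's getpeername() should report the same address and port as the listener's getsockname().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the client's peer address equals the listener's local
 * address and the client's own address is on loopback, else 0. */
int demo_names_match(void)
{
    struct sockaddr_in srv, peer, local;
    socklen_t len;
    int lfd, cfd, ok = 0;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto out;
    len = sizeof(srv);
    if (bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 || listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&srv, &len) < 0)    /* peer == 0 */
        goto out;
    if (connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    len = sizeof(peer);
    if (getpeername(cfd, (struct sockaddr *)&peer, &len) < 0)   /* peer == 1 */
        goto out;
    len = sizeof(local);
    if (getsockname(cfd, (struct sockaddr *)&local, &len) < 0)  /* peer == 0 */
        goto out;

    ok = peer.sin_port == srv.sin_port &&
         peer.sin_addr.s_addr == srv.sin_addr.s_addr &&
         local.sin_addr.s_addr == htonl(INADDR_LOOPBACK);
out:
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return ok;
}
```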
 

inet_poll(sock, polltable *wait)

        sk->prot->poll(sock,wait)
 

inet_setsockopt(sock, level, optname, optval, optlen)

        sk->prot->setsockopt(sk, level, optname, optval, optlen)
 

inet_getsockopt(sock, level, optname, optval, optlen)

        sk->prot->getsockopt(sk, level, optname, optval, optlen)
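An option at the IPPROTO_TCP level is intended for the transport protocol rather than the generic socket code.  The following is a minimal user-space sketch, assuming Linux headers that define TCP_NODELAY; demo_nodelay_roundtrip is a hypothetical name.  It sets the option and reads it back.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set TCP_NODELAY to `on` and return the value getsockopt() reports,
 * or -1 on error. */
int demo_nodelay_roundtrip(int on)
{
    int fd, val = -1;
    socklen_t len = sizeof(val);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0 ||
        getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        val = -1;
    close(fd);
    return val;
}
```

On Linux, getsockopt() reports a nonzero value after the option is enabled and 0 after it is disabled.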


inet_shutdown(sock, how)

        sk=sock->sk
        // various checks
        sk->prot->shutdown(sk, how)
 
 

sock_init_data(sock,sk)

        initialize:    sk->receive_queue
                            sk->write_queue
                            sk->back_log
                            sk->error_queue
        init_timer(sk->timer)
        --various sk members
        -- various callbacks
 
 

 


TRANSPORT Layer

There are three main traditional transport protocols: TCP, UDP, and RAW.  The Xpress Transport Protocol (XTP) is the latest addition to the Linux kernel.


Xpress Transport Protocol (XTP)

xtp_init_sock()

       // sets service type, error control, rate control, flow control

       // sets xtp timers

       // sets ???

xtp_close(sk, timeout)


 
xtp_connect(sk,uaddr,addrlen)

xtp_accept(sk,flags)



  Transmission Control Protocol (TCP) 

TCP provides a reliable connection-oriented service.


   tcp_v4_init_sock(sk)

        // mostly initializes sk->tp_pinfo.af_tcp; also sets:
        sk->priority=1
        sk->state=TCP_CLOSE
        sk->max_unacked=2048
        sk->max_ack_backlog=SOMAXCONN
        sk->mtu=576
        sk->mss=536
        sk->dummy_th.ack=1
        sk->dummy_th.doff=sizeof(struct tcphdr) >> 2
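The MTU and MSS defaults above are related by simple arithmetic.  The MSS is the MTU minus a 20-byte IPv4 header and a 20-byte TCP header (both without options), so 576 - 20 - 20 = 536.  Similarly, doff holds the TCP header length in 32-bit words, so sizeof(struct tcphdr) >> 2 is 20 / 4 = 5.  A small sketch of both calculations follows; the demo_ names are hypothetical.

```c
/* Minimum (option-free) header sizes, in bytes. */
#define DEMO_IPV4_HDR_LEN 20
#define DEMO_TCP_HDR_LEN  20

/* MSS = MTU minus the IP and TCP headers. */
int demo_mss_for_mtu(int mtu)
{
    return mtu - DEMO_IPV4_HDR_LEN - DEMO_TCP_HDR_LEN;
}

/* doff counts the TCP header length in 32-bit words, hence ">> 2". */
int demo_tcp_doff(int tcp_hdr_bytes)
{
    return tcp_hdr_bytes >> 2;
}
```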

tcp_v4_connect(sk,uaddr,uaddrlen)

tcp_accept(sk,flags)

tcp_shutdown(sk, how)


        sk->shutdown |= SEND_SHUTDOWN
        if (tcp_close_state(sk))
                tcp_send_fin(sk)
        release_sock(sk)
                start_bh_atomic()
                ....
                end_bh_atomic()
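tcp_close_state() decides, from the current state, which state the socket enters when the application stops sending and whether a FIN must go out.  The table below is a sketch based on the close transitions in the RFC 793 state diagram, not a copy of the kernel's table; the enum and function names are hypothetical.  Entries for states the diagram does not cover (for example FIN_WAIT1 or TIME_WAIT) are assumptions: the state is left unchanged, except that TIME_WAIT is treated as closed.

```c
/* TCP states used by this sketch (names follow RFC 793). */
enum demo_tcp_state {
    DEMO_ESTABLISHED, DEMO_SYN_SENT, DEMO_SYN_RECV, DEMO_FIN_WAIT1,
    DEMO_FIN_WAIT2, DEMO_TIME_WAIT, DEMO_CLOSE, DEMO_CLOSE_WAIT,
    DEMO_LAST_ACK, DEMO_LISTEN, DEMO_CLOSING
};

struct demo_transition { enum demo_tcp_state next; int send_fin; };

/* Next state when the application closes its sending side, and whether
 * a FIN must be sent. */
static const struct demo_transition demo_close_table[] = {
    [DEMO_ESTABLISHED] = { DEMO_FIN_WAIT1, 1 },
    [DEMO_SYN_SENT]    = { DEMO_CLOSE,     0 },
    [DEMO_SYN_RECV]    = { DEMO_FIN_WAIT1, 1 },
    [DEMO_FIN_WAIT1]   = { DEMO_FIN_WAIT1, 0 },
    [DEMO_FIN_WAIT2]   = { DEMO_FIN_WAIT2, 0 },
    [DEMO_TIME_WAIT]   = { DEMO_CLOSE,     0 },
    [DEMO_CLOSE]       = { DEMO_CLOSE,     0 },
    [DEMO_CLOSE_WAIT]  = { DEMO_LAST_ACK,  1 },
    [DEMO_LAST_ACK]    = { DEMO_LAST_ACK,  0 },
    [DEMO_LISTEN]      = { DEMO_CLOSE,     0 },
    [DEMO_CLOSING]     = { DEMO_CLOSING,   0 },
};

/* Updates *state; returns 1 if the caller should send a FIN. */
int demo_close_state(enum demo_tcp_state *state)
{
    struct demo_transition t = demo_close_table[*state];
    *state = t.next;
    return t.send_fin;
}
```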
 

tcp_close(sk, timeout)


        if (sk->state==TCP_LISTEN)
                tcp_set_state(sk, TCP_CLOSE)
                tcp_close_pending(sk)
                release_sock(sk)
                sk->prot->unhash(sk)
                return
        if (!sk->dead)
                sk->state_change(sk)

        // flush receive_queue
        if (tcp_close_state(sk,1)==1)
                tcp_send_fin(sk)
        if (timeout)
                ....
                sleep
                ....
        if ( ...TCP_FIN_WAIT2  [zombie]  )
                set time
        sk->dead=1
        if (sk->state==TCP_CLOSE)
                sk->prot->unhash(sk)

tcp_v4_destroy_sock(sk)

 


User Datagram Protocol (UDP)

UDP provides an unreliable, connectionless service.
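Because UDP is connectionless, a sender can address each datagram individually with sendto(), and no connect() or accept() step is needed.  Below is a minimal sketch over the loopback interface; demo_udp_once is a hypothetical name.  On a real network the datagram could be lost, which this loopback test does not exercise.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one datagram to a freshly bound receiver on 127.0.0.1 without any
 * connection set-up.  outlen must be > 0.  Returns bytes received or -1. */
ssize_t demo_udp_once(const char *text, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    ssize_t n = -1;
    int rfd, sfd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks */

    rfd = socket(AF_INET, SOCK_DGRAM, 0);
    sfd = socket(AF_INET, SOCK_DGRAM, 0);
    if (rfd >= 0 && sfd >= 0 &&
        bind(rfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(rfd, (struct sockaddr *)&addr, &len) == 0 &&    /* learn the port */
        sendto(sfd, text, strlen(text), 0, (struct sockaddr *)&addr, sizeof(addr)) >= 0) {
        n = recvfrom(rfd, out, outlen - 1, 0, NULL, NULL);
        if (n >= 0)
            out[n] = '\0';
    }
    if (sfd >= 0) close(sfd);
    if (rfd >= 0) close(rfd);
    return n;
}
```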

udp_connect(sk,uaddr,addrlen)

udp_close(sk, timeout)

destroy_sock(sk)

        del_from_prot_list(sk)
        dst_release(sk->dst_cache)
        sk_free(sk)
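dst_release() drops the socket's reference to its cached route rather than freeing the route outright, since other sockets may share the same entry.  Below is a hypothetical sketch of the teardown order using a plain integer reference count.  demo_dst, demo_sock and demo_destroy_sock are illustrative names; the kernel uses its own atomic counters.

```c
#include <stdlib.h>

/* Hypothetical stand-ins: a shared route cache entry with a refcount. */
struct demo_dst  { int refcnt; };
struct demo_sock { struct demo_dst *dst_cache; int on_prot_list; };

/* Drop one reference; the entry lives on while others still hold it. */
void demo_dst_release(struct demo_dst *dst)
{
    if (dst != NULL)
        dst->refcnt--;
}

/* Same order as destroy_sock(): unlink, drop the route, then free. */
void demo_destroy_sock(struct demo_sock *sk)
{
    sk->on_prot_list = 0;             /* del_from_prot_list(sk) */
    demo_dst_release(sk->dst_cache);  /* dst_release(sk->dst_cache) */
    sk->dst_cache = NULL;
    free(sk);                         /* sk_free(sk) */
}
```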

 

 


NETWORK Layer

The TRANSPORT layer uses the services of the NETWORK layer to send its data onto the network.  The NETWORK layer protocol is usually IP (Internet Protocol).

ip_route_connect()