Monday, February 23, 2009

Normal Startup

Although our TCP example is small (about 150 lines of code for the two main functions, str_echo, str_cli, readline, and writen), it is essential that we understand how the client and server start, how they end, and most importantly, what happens when something goes wrong: the client host crashes, the client process crashes, network connectivity is lost, and so on. Only by understanding these boundary conditions, and their interaction with the TCP/IP protocols, can we write robust clients and servers that can handle these conditions.

We first start the server in the background on the host linux.

linux % tcpserv01 &
[1] 17870

When the server starts, it calls socket, bind, listen, and accept, blocking in the call to accept. (We have not started the client yet.) Before starting the client, we run the netstat program to verify the state of the server's listening socket.

linux % netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:9877 *:* LISTEN

Here we show only the heading lines of the output, plus the line that we are interested in. This command shows the status of all sockets on the system, which can be a lot of output. We must specify the -a flag to see listening sockets.

The output is what we expect. A socket is in the LISTEN state with a wildcard for the local IP address and a local port of 9877. netstat prints an asterisk for an IP address of 0 (INADDR_ANY, the wildcard) or for a port of 0.
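The startup sequence that leads to this LISTEN line (socket, bind to the wildcard address, listen) can be sketched as follows. This is a minimal illustration, not the tcpserv01 source: the helper name open_listener and the backlog value of 5 are assumptions made for the example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the original program): create a TCP socket
 * bound to the wildcard address and put it in the LISTEN state.
 * Returns the listening descriptor, or -1 on error. */
int open_listener(unsigned short port)
{
    struct sockaddr_in servaddr;
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    if (listenfd < 0)
        return -1;

    memset(&servaddr, 0, sizeof servaddr);
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY); /* wildcard: netstat prints "*" */
    servaddr.sin_port = htons(port);              /* 9877 in the text */

    /* Backlog of 5 is an assumed value for this sketch. */
    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof servaddr) < 0 ||
        listen(listenfd, 5) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd; /* a following accept(listenfd, NULL, NULL) would block */
}
```

Calling open_listener(9877) and then accept on the returned descriptor would leave the process blocked with the socket in the LISTEN state, which is the situation netstat reports above.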

We then start the client on the same host, specifying the server's IP address of 127.0.0.1 (the loopback address). We could also have specified the server's normal (nonloopback) IP address.

linux % tcpcli01 127.0.0.1

The client calls socket and connect, the latter causing TCP's three-way handshake to take place. When the three-way handshake completes, connect returns in the client and accept returns in the server. The connection is established. The following steps then take place:

  1. The client calls str_cli, which will block in the call to fgets, because we have not typed a line of input yet.

  2. When accept returns in the server, it calls fork and the child calls str_echo. This function calls readline, which calls read, which blocks while waiting for a line to be sent from the client.

  3. The server parent, on the other hand, calls accept again, and blocks while waiting for the next client connection.


We have three processes, and all three are asleep (blocked): client, server parent, and server child.


We purposely list the client step first, and then the server steps, because of the order in which the two calls return when the three-way handshake completes: connect returns when the second segment of the handshake is received by the client, but accept does not return until the third segment of the handshake is received by the server, one-half of the RTT after connect returns.
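The client side of the handshake (socket followed by connect) can be sketched as follows. This is a minimal illustration rather than the tcpcli01 source, and the helper name connect_to is an assumption made for the example.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the original program): open a TCP
 * connection to a dotted-decimal IPv4 address and port.
 * Returns the connected descriptor, or -1 on error. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in servaddr;
    int sockfd;

    memset(&servaddr, 0, sizeof servaddr);
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &servaddr.sin_addr) != 1)
        return -1;              /* not a valid IPv4 address string */

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        return -1;

    /* connect blocks until the handshake succeeds or fails. */
    if (connect(sockfd, (struct sockaddr *)&servaddr, sizeof servaddr) < 0) {
        close(sockfd);
        return -1;
    }
    return sockfd;
}
```

With the server from the text running, connect_to("127.0.0.1", 9877) would correspond to the command line tcpcli01 127.0.0.1. Note that connect can succeed while the server is still on its way back to accept, since the kernel completes the handshake on the listening socket's behalf.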


We purposely run the client and server on the same host because this is the easiest way to experiment with client/server applications. Since we are running the client and server on the same host, netstat now shows two additional lines of output, corresponding to the TCP connection:

linux % netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:9877 localhost:42758 ESTABLISHED
tcp 0 0 localhost:42758 localhost:9877 ESTABLISHED
tcp 0 0 *:9877 *:* LISTEN

The first of the ESTABLISHED lines corresponds to the server child's socket, since the local port is 9877. The second of the ESTABLISHED lines is the client's socket, since the local port is 42758. If we were running the client and server on different hosts, the client host would display only the client's socket, and the server host would display only the two server sockets.
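The mirrored Local Address and Foreign Address columns can also be observed from inside a program. The following sketch, assuming an IPv4 TCP socket, uses getsockname and getpeername to report the two ports of a connection; the helper name endpoint_ports is hypothetical.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: report the local and foreign port numbers of a
 * connected IPv4 TCP socket, in host byte order.
 * Returns 0 on success, -1 on error (for example, a socket with no peer). */
int endpoint_ports(int fd, unsigned short *local, unsigned short *foreign)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    *local = ntohs(addr.sin_port);      /* netstat's "Local Address" port */

    len = sizeof addr;
    if (getpeername(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    *foreign = ntohs(addr.sin_port);    /* netstat's "Foreign Address" port */

    return 0;
}
```

Called on the server child's connected socket, this would report a local port of 9877 and the client's ephemeral port as the foreign port; called on the client's socket, the two values are swapped. On the listening socket getpeername fails, because that socket has no peer.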

We can also use the ps command to check the status and relationship of these processes.

linux % ps -t pts/6 -o pid,ppid,tty,stat,args,wchan
PID PPID TT STAT COMMAND WCHAN
22038 22036 pts/6 S -bash wait4
17870 22038 pts/6 S ./tcpserv01 wait_for_connect
19315 17870 pts/6 S ./tcpserv01 tcp_data_wait
19314 22038 pts/6 S ./tcpcli01 127.0 read_chan

(We have used very specific arguments to ps to only show us the information that pertains to this discussion.) In this output, we ran the client and server from the same window (pts/6, which stands for pseudo-terminal number 6). The PID and PPID columns show the parent and child relationships. We can tell that the first tcpserv01 line is the parent and the second tcpserv01 line is the child since the PPID of the child is the parent's PID. Also, the PPID of the parent is the shell (bash).

The STAT column for all three of our network processes is "S," meaning the process is sleeping (waiting for something). When a process is asleep, the WCHAN column specifies the condition. Linux prints wait_for_connect when a process is blocked in either accept or connect, tcp_data_wait when a process is blocked on socket input or output, or read_chan when a process is blocked on terminal I/O. The WCHAN values for our three network processes therefore make sense.
