Course Content#
What is a socket? What does network programming do?
- Understand the TCP/IP five-layer model and the OSI seven-layer model
- Analogy
- Socket — Courier
- Transport Layer — Courier Company: TCP — Some Feng Courier Company, UDP — Some Tong Courier Company
- Transportation Road — Internet
- Communication Address — IP
——Transport Layer Protocol——#
Analogy of a courier company
Developers can only choose between TCP and UDP and tune the chosen protocol's parameters
TCP#
Transmission Control Protocol; connection-oriented, reliable data transmission protocol
- Connection: Three-way handshake [See Additional Knowledge Points for details]
- Nature of reliability: Acknowledgment and retransmission [Requires sequence number]
- If lost, it will be retransmitted
- [PS] Both parties save some variables describing their states
- Header Format
- Source Port Number: From which port it is sent; Destination Port Number: To which port it is sent
- Different ports correspond to different applications
- If the computer is compared to a building, the port number is the room number in the building
- The IP address is provided by the IP layer
- Sequence Number: The position of this segment's first data byte in the sender's byte stream; Acknowledgment Number: The sequence number the receiver expects next from the other party
- Header Length: Measured in 32-bit words [4 bytes each], so the minimum value 5 means a 20-byte header
- Flag bits [the four below matter most]
- ACK: Acknowledgment
- RST: Reset connection [abort an existing connection or refuse a connection attempt]
- SYN: Establish request [Used in the first two handshakes of the three-way handshake]
- FIN: Close connection [Used in the first and third handshakes of the four-way handshake, and can also carry some data, see Additional Knowledge Points for details]
- Window Size: Tells the other party how much more data can be sent, used to suppress the sending rate of the other party
- Checksum: Verifies that the segment arrived intact. A corrupted segment is silently discarded; since no ACK is sent for it, the sender retransmits after a timeout
- [PS]
- Most of this machinery exists for reliability
- Real-world courier companies cannot offer the same guarantee: a lost parcel is unique and cannot be sent again, whereas data can be copied and retransmitted
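The header fields listed above can be read out of a raw segment with a few shifts. The following is a minimal sketch, assuming the standard 20-byte fixed header layout; the names `tcp_fields`, `tcp_parse`, `be16`, and `be32` are hypothetical helpers made up for illustration, not part of any system API.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header. */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04, TCP_ACK = 0x10 };

/* Hypothetical container for the fields discussed in the notes. */
struct tcp_fields {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    unsigned header_len;   /* in bytes: the 4-bit header length field times 4 */
    uint8_t  flags;
    uint16_t window;
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t be16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the fixed part of a TCP header. Returns 0, or -1 if `len` < 20. */
int tcp_parse(const uint8_t *h, size_t len, struct tcp_fields *out) {
    if (len < 20) return -1;
    out->src_port   = be16(h);
    out->dst_port   = be16(h + 2);
    out->seq        = be32(h + 4);
    out->ack        = be32(h + 8);
    out->header_len = (unsigned)(h[12] >> 4) * 4;
    out->flags      = h[13] & 0x3f;
    out->window     = be16(h + 14);
    return 0;
}
```

A first-handshake segment would decode with `flags == TCP_SYN` and, when it carries no options, `header_len == 20`.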
UDP#
User Datagram Protocol; connectionless, unreliable data transmission protocol
- Connectionless: No handshake required
- Unreliable: The sender does not check whether the other party received the data
- Advantages: Flexible, low cost
- Header Format
- Much simpler compared to TCP
——Socket——#
Analogy: a courier, but each courier serves only a single task
The interface between a process and the transport layer; a process must hand its network data to the transport layer for delivery
【Life and Death】#
Socket: Create Socket#
- int socket(int domain, int type, int protocol);
- domain: Address family
- AF_INET, corresponds to IPv4 [commonly used]
- AF_INET6, corresponds to IPv6
- type: Socket type
- SOCK_STREAM, corresponds to byte stream [TCP]
- SOCK_DGRAM, corresponds to datagram [UDP]
- protocol: Transport protocol
- The domain and type usually determine the protocol uniquely; for example, AF_INET with SOCK_STREAM implies IPPROTO_TCP
- [PS] When only one protocol is possible, 0 can be passed to select it
- Return Value: File descriptor
- Returns -1 on error
- A socket is also a file: in UNIX, everything is a file
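The parameters above can be tried out directly. This is a small sketch, not the course's own code: `open_ipv4_socket` and `socket_type` are hypothetical helper names, and `SO_TYPE` is used only to confirm which kind of socket the kernel created.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Create an IPv4 socket of the given type (SOCK_STREAM or SOCK_DGRAM).
 * Protocol 0 lets the kernel pick the only match: TCP or UDP respectively. */
int open_ipv4_socket(int type) {
    int fd = socket(AF_INET, type, 0);
    if (fd == -1) perror("socket");   /* socket() returns -1 on error */
    return fd;
}

/* Ask the kernel for the type of an existing socket; returns -1 on error. */
int socket_type(int fd) {
    int type = 0;
    socklen_t len = sizeof(type);
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1) return -1;
    return type;
}
```

Because the return value is an ordinary file descriptor, it is released with `close` like any other file.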
Close: Close Connection#
- int close(int fd);
- Four-way handshake [See Additional Knowledge Points for details]
- Both ends need to call close; the side that calls it sends a FIN, and recv on the other end then returns 0
【Service】#
Bind: Bind IP and Port#
Normally needed only on the receiving (server) side
- int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
- sockfd: File descriptor
- addr: IP address and port
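- Binding IP: Receives only data addressed to that IP address [an address of the local machine]
- If left unspecified [INADDR_ANY], data arriving on any of the machine's addresses is received
- Useful on a host that sits between an intranet and the internet: binding to a single interface works as a simple firewall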
- Binding Port: Serves which port [A total of $2^{16}=65536$ ports]
- addrlen: Address length
- Return Value: Success, 0; otherwise, -1
+ Related Structures: sockaddr, sockaddr_in#
sockaddr
- sa_family: Address family, generally AF_INET, corresponding to IPv4
- sa_data: Contains both IP address and port
- ❗ Inconvenient to fill in directly; fill in the friendlier sockaddr_in below 👇, then cast it with (struct sockaddr *)
sockaddr_in
- sin_port: Port number [Requires network byte order, see below]
- sin_addr: IP address
- sin_addr is itself a structure, in_addr
- in_addr stores a 32-bit unsigned integer [s_addr] in network byte order; the inet_addr function converts a dotted-decimal string to that integer
- Dotted decimal representation [string form] is more convenient for people
- inet_ntoa does the reverse
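Filling in a `sockaddr_in` usually looks like the sketch below. `make_ipv4_addr` is a hypothetical helper name. It uses `inet_pton` rather than the `inet_addr` mentioned above, a deliberate swap: `inet_pton` reports invalid input with a distinct return value, whereas `inet_addr` signals errors with a value that is also a valid address (255.255.255.255).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill an IPv4 socket address from a dotted-decimal string and a port given
 * in host byte order. Returns 0 on success, -1 if `ip` is not valid IPv4. */
int make_ipv4_addr(const char *ip, unsigned short port, struct sockaddr_in *out) {
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);              /* port must be in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1) return -1;
    return 0;
}
```

The filled structure is then passed to `bind` or `connect` as `(struct sockaddr *)&addr` together with `sizeof(addr)`.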
+ Host Byte Order & Network Byte Order#
- Host Byte Order: Big-endian or little-endian, depending on the CPU
- Little-endian machines are the most common; the low-order byte is stored at the lowest memory address
- Network Byte Order: Big-endian. A 4-byte value is sent starting with its most significant byte [bits 0~7 in the RFC bit numbering] and ending with its least significant byte [bits 24~31]
- Integer byte order conversion functions
- htonl: Converts 32-bit host byte order to network byte order
- htons: Converts 16-bit host byte order to network byte order
- ntohl, ntohs: the reverse conversions
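The effect of the conversion functions can be observed by looking at the bytes in memory. The sketch below is illustrative only; `host_is_little_endian` and `network_bytes32` are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the least significant byte first. */
int host_is_little_endian(void) {
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);    /* look at the byte at the lowest address */
    return first == 1;
}

/* Copy the in-memory bytes of htonl(value) into out[0..3]. */
void network_bytes32(uint32_t value, uint8_t out[4]) {
    uint32_t net = htonl(value);
    memcpy(out, &net, sizeof(net));
}
```

For the value `0x0A0B0C0D`, `network_bytes32` yields the bytes `0A 0B 0C 0D` in that order on any host, which is exactly the order in which they travel on the wire.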
Listen: Set to Listening State#
Switches the socket from active (the default) to passive [bind the port first]
- int listen(int sockfd, int backlog);
- Note: On Linux, the second parameter [backlog] is really the length of the complete queue described below
- ① The TCP connection process has two queues
- Incomplete Queue: The client sends a SYN, the server responds with SYN+ACK, at this point the server is in the SYN_RECV state, and the connection is in the incomplete queue
- Complete Queue: After the client responds with ACK, both sides are in the ESTABLISHED state, at this point the connection is transferred from the incomplete queue to the complete queue
- 👉 When the server calls accept, it removes the connection from the complete queue
- ② Notes: Set an appropriate backlog; the server should accept new connections as soon as possible
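Putting socket, bind, and listen together gives a helper along the following lines. This is a sketch under assumptions, not the course's tcp_server.c: `listen_on` is a hypothetical name, and it binds to `INADDR_ANY` so that connections arriving on any local address are accepted.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to every local IPv4 address on `port` and put it
 * into the listening state. Returns the descriptor, or -1 on error. */
int listen_on(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, backlog) == -1) {
        close(fd);                 /* do not leak the descriptor on failure */
        return -1;
    }
    return fd;
}
```

For example, `listen_on(8888, 20)` would serve port 8888 with a backlog of 20; passing port 0 asks the kernel to choose any free port, which is handy for experiments.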
Accept: Accept Connection#
Produces a new courier [the listening socket can go on to accept further connections]
- int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
- ① The passed sockfd must have been processed by socket(), bind(), listen()
- ② addr is an output parameter used to store the client address
- Return Value
- On success, returns a new sockfd, the original sockfd can still be used to accept
- On failure, returns -1
- [PS] Generally, the new sockfd is closed after use; the socket in listen state is not closed
【Client】#
Connect: Establish Connection#
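Used on an active socket, which can be connected to at most one peer
- int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);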
- Unlike accept:
- sockfd does not need to be processed by bind(), listen()
- Will not return a new socket
⭐ Connect and accept are a pair, executed on the client and server respectively, during which the three-way handshake is completed
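The client half of that pair might be sketched as follows. `connect_to` is a hypothetical helper name; note that it returns the very descriptor it created, in contrast to accept, which hands back a new one.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to ip:port over TCP. Returns a connected descriptor, or -1. */
int connect_to(const char *ip, unsigned short port) {
    struct sockaddr_in server;
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* no bind() or listen() needed */
    if (fd == -1) return -1;
    if (connect(fd, (struct sockaddr *)&server, sizeof(server)) == -1) {
        close(fd);
        return -1;
    }
    return fd;   /* the same socket: connect() does not create a new one */
}
```

When the peer is a listening socket, `connect` can return before the server has called `accept`: the kernel completes the handshake and parks the connection in the complete queue until it is accepted.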
【Transmission】#
Send: Send Data#
Essentially the same as write
- ssize_t send(int sockfd, const void *buf, size_t len, int flags);
- ❗ sendto takes two extra parameters, dest_addr and addrlen, and is used for UDP
- Because no connection is established, the destination IP and port must be given on every call
- flags is generally set to 0
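One detail worth keeping in mind: like write, send on a stream socket may accept fewer bytes than requested, so robust code loops until everything has been handed over. A minimal sketch, with `send_all` as a hypothetical helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all `len` bytes have been handed to the kernel.
 * Returns 0 on success, -1 on error (errno is left as set by send). */
int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                              /* skip what was already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

For a text message the call would be `send_all(fd, msg, strlen(msg))`.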
Recv: Receive Data#
Essentially the same as read
- ssize_t recv(int sockfd, void *buf, size_t len, int flags);
- When the other party closes the connection, the return value is 0
- ❗ recvfrom takes two extra parameters, src_addr and addrlen, and is used for UDP
- src_addr receives the address of the sender
- Blocking by default
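The UDP variants can be exercised inside a single process over the loopback interface. The sketch below is illustrative: `udp_loopback` is a hypothetical function, and the receiver binds to port 0 so the kernel picks a free port.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `msg` from one UDP socket to another over loopback and report the
 * sender's port as seen by recvfrom(). Returns bytes received, or -1. */
ssize_t udp_loopback(const char *msg, char *out, size_t outlen,
                     unsigned short *sender_port) {
    struct sockaddr_in addr, src;
    socklen_t len = sizeof(addr), srclen = sizeof(src);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx == -1 || tx == -1) goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* kernel picks a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &len) == -1) goto done;

    /* No connection exists, so the destination travels with every datagram. */
    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) == -1) goto done;

    /* src is filled in with the address the datagram came from. */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&src, &srclen);
    if (n >= 0 && sender_port) *sender_port = ntohs(src.sin_port);
done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return n;
}
```

The sending socket is never bound explicitly; the kernel assigns it an ephemeral port on the first `sendto`, and that is the port `recvfrom` reports.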
——Additional——#
Kill#
Send a signal to a process
- man 2 kill
- Prototype
- int kill(pid_t pid, int sig); takes a process ID and a signal number
- Description
- pid can take several forms [a positive PID, 0, -1, or a negative process-group ID]
- Every form is subject to existence and permission checks
- Return Value
- 0, success; -1, error
- kill -l lists the available signals
- The list is numbered 1 to 64 on Linux [the exact set depends on the system]
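A small experiment shows kill from C. In this sketch, `kill_child_with` is a hypothetical helper: it forks a child that only waits, checks that the child exists with signal 0, delivers the requested signal, and reports which signal ended the child.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that waits, terminate it with `sig`, and return the number of
 * the signal that ended it (or -1 on error or if it did not die by signal). */
int kill_child_with(int sig) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        for (;;) pause();                   /* child: sleep until a signal arrives */
    }
    if (kill(pid, 0) == -1) return -1;      /* signal 0: existence/permission check only */
    if (kill(pid, sig) == -1) return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

This only returns when `sig` actually terminates the child, so it suits signals whose default action is termination, such as SIGKILL or SIGTERM.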
Signal#
Signal handling method
- man signal
- Prototype
- Needs to define a function of type sighandler_t
- Description
- Its behavior varies with UNIX versions
- There are three types of handlers: ignore, default, custom
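- A custom handler follows the mousetrap principle on some systems: once the trap has caught one mouse it is no longer set, so the next mouse may get away
- On systems that reset the disposition to the default when the handler runs, the handler has to be installed again [by calling signal once more]
- Return Value
- The previous handler on success; SIG_ERR on error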
Code Demonstration#
Server#
tcp_server.h
- Create a courier in a listening state on the specified port
tcp_server.c
- Read according to the sequence number
- Note: head.h includes the socket-related header files; they are listed in the man pages and not repeated here
1.server.c
- Accept can obtain the client's address information
- Create a child process dedicated to data transmission
- Check for errors at every step
- Handling of disconnection (FIN, recv returns 0)
- Sending and receiving strategies differ
- Send as much as possible, receive as much as possible
- Send uses strlen, recv uses sizeof
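The disconnection handling described above typically takes the shape of a loop that ends when recv returns 0. The following is a sketch of such a loop, not the course's 1.server.c; `echo_until_closed` is a hypothetical name, and it echoes data back only to have something observable.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Echo everything received on `fd` back to the sender until the peer closes
 * its side (recv returns 0). Returns total bytes echoed, or -1 on error. */
long echo_until_closed(int fd) {
    char buf[512];
    long total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);   /* receive: up to the buffer size */
        if (n == 0) return total;                    /* peer sent FIN: stop */
        if (n == -1) return -1;
        if (send(fd, buf, (size_t)n, 0) != n) return -1;   /* send: exactly what arrived */
        total += n;
    }
}
```

In a forking server, the child process would run a loop like this on the descriptor returned by accept and close it once the loop ends.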
Client#
tcp_client.h
- Actively connect to the specified IP [dotted decimal IPv4 string] and port
tcp_client.c
- Fill in the sockaddr_in structure from the input parameters
1.client.c
- Added signal capture
- Use of bzero to initialize buff variable
Effect Display#
- Left: Server, Right: Client [Can be multi-user]
- Establish connection, address capture, data transmission, disconnection
- Use netstat to check the listening status of the port
- Run it with the -alnt options
- [PS] Port 8888 must be opened in the security group settings of the cloud host console
Additional Knowledge Points#
- IP: a best-effort delivery service. The flip side is that it is unreliable [packets can be lost in transit]
Three-way Handshake, Four-way Handshake#
- Three-way handshake [SYN, ACK]
- First handshake: The client sends a SYN packet to the server [the client enters SYN_SENT state, waiting for server confirmation]
- Second handshake: The server receives it, must confirm the client, sets an ACK, and also sets a SYN, i.e., SYN+ACK packet [the server moves from LISTEN to SYN_RECV state]
- Third handshake: The client receives the server's SYN+ACK packet and sends an ACK confirmation packet to the server. After sending, the client enters ESTABLISHED state, and the server also enters ESTABLISHED state after receiving ACK
- Note: Each acknowledgment number is the sequence number being acknowledged plus one [SYN and FIN each consume one sequence number]
- Four-way handshake, i.e. four waves [FIN, ACK]
- First wave: Assume the client wants to close the connection, the client sends a FIN packet, indicating that it has no data to send [can still receive data at this time] [the client enters FIN_WAIT_1 state]
- Second wave: The server replies with an ACK packet, indicating that it has received the client's request to close the connection, but it still needs to prepare to close the connection [the server enters CLOSE_WAIT state]
- After the client receives this ACK, it enters FIN_WAIT_2 state, waiting for the server to close the connection
- Third wave: When the server is ready to close the connection, it sends a FIN to the client [the server enters LAST_ACK state, waiting for the client's confirmation]
- Fourth wave: The client receives the close request from the server and sends an ACK packet [the client enters TIME_WAIT state, waiting for the possible timeout retransmission of the FIN packet, waiting for 2 MSL time]
- After the server receives this ACK, it closes the connection and enters CLOSED state
- After waiting 2 MSL without receiving another FIN from the server, the client concludes that the server has closed normally, closes the connection itself, and enters CLOSED state; if a retransmitted FIN does arrive, it sends the ACK again and restarts the wait
- Reference Three-way Handshake and Four-way Handshake — Blog [Note: In the fourth wave, the client waits for the timeout retransmission of the FIN instead of ACK]
Additional: Meaning of 2 MSL#
How TIME_WAIT is triggered, what role it plays, what drawbacks it has in programming, and how to solve it?
- Cause: During TCP's four-way handshake, the client receives the server's FIN in the third wave, sends the final ACK as the fourth wave, and then enters TIME_WAIT state
- At this point, the client needs to wait for two maximum segment lifetimes (Maximum segment lifetime, MSL) before entering CLOSED state
- Reasons for existence
- ① Prevent delayed segments
- Each TCP segment carries a sequence number, which is what TCP's reliability is built on
- To ensure that segments of a new TCP connection are not confused with segments of an old connection that are still in flight, the connection has to wait at least as long as a stray segment can survive in the network, which is the MSL
- Thus preventing delayed segments from being received by other TCP connections using the same source address, source port, destination address, and destination port
- ② Ensure connection closure
- If the client does not wait long enough and its final ACK is lost, then when it opens a new TCP connection to the server:
- The server, having not received the ACK message, still considers the current connection valid
- When the client sends a SYN to request a new handshake, it receives an RST from the server, and the connection establishment is aborted
- Therefore, it is necessary to ensure that the remote TCP connection is correctly closed, i.e., waiting for the passive closing party to receive the ACK message corresponding to the FIN
- Programming impact
- In high-concurrency scenarios, it is easy to have too many TIME_WAIT
- With an MSL of 60 s, TIME_WAIT lasts 2 minutes, which is costly when a connection may carry only a few seconds of traffic [the actual value is implementation-dependent; Linux uses a fixed 60 s TIME_WAIT]
- Solutions
- Both rely on TCP timestamps, which record when a packet was sent and when the last packet was received
- Combined with two Linux kernel parameters
- Reuse [net.ipv4.tcp_tw_reuse]: Lets the side that actively closed reuse a connection in TIME_WAIT state when it initiates a new connection to the same peer
- Recycle [net.ipv4.tcp_tw_recycle]: The kernel reclaims TIME_WAIT connections quickly, waiting only one RTO [retransmission timeout]; it misbehaves behind NAT and was removed in Linux 4.12
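On the programming side, the most common encounter with TIME_WAIT is a restarted server whose bind fails with "Address already in use". The usual remedy there is the SO_REUSEADDR socket option, which is a different mechanism from the two kernel parameters above. A minimal sketch, with `enable_reuseaddr` as a hypothetical helper name:

```c
#include <sys/socket.h>

/* Allow bind() to reuse a local address that still has old connections in
 * TIME_WAIT. Call after socket() and before bind().
 * Returns 0 on success, -1 on error. */
int enable_reuseaddr(int fd) {
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

The option does not shorten TIME_WAIT; it only lets a new listening socket bind to the port while old connections are still waiting.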
Socket Programming in C Language#
- Server: socket, sockaddr[_in], bind, listen; accept, send/recv; close
- Client: socket, sockaddr[_in], connect; send/recv; close
- Stream socket based on TCP, datagram socket based on UDP
- The UDP server also needs to bind IP and port, but does not need to listen, using sendto and recvfrom to send and receive information
- sockaddr[_in]: Structure that saves socket information, use [_in] to fill in information, then convert to sockaddr
- The server needs two sockets: one that listens, and one returned by accept for each client that connects
Input kaikeba.com and press Enter#
-> To establish a TCP connection, what happens between the local machine sending the first request packet and receiving the first response packet?
- [Macro level] DNS👉TCP connection [Application layer, Transport layer, Network layer, Data link layer]👉Server processes request👉Returns response result
- DNS
- Local hosts, local DNS resolver cache
- Local DNS
- Iterative/Recursive: Root DNS server, Top-level DNS, Authoritative DNS
- Until the domain name corresponding IP is found
- TCP connection
- Application layer: Sends HTTP request — request method, URL, HTTP version
- Transport layer: Three-way handshake with the server
- Network layer: ARP protocol queries the MAC address corresponding to the IP. If within a local area network, directly sends requests based on MAC address; otherwise, uses the routing table to find the next hop address, then accesses the corresponding MAC address
- Data link layer: Ethernet protocol
- Broadcast: Sends requests to all machines in a local area network, comparing MAC addresses
- Web server
- Parses user requests, knows which resource files need to be scheduled, and calls database information to return to the browser client
- Returns response result
- Generally, there will be an HTTP status code, such as 200, 301, 404, etc. Through this status code, we can know whether the server-side processing is normal and understand the specific error
- ⭐ Recommended video: TCP-IP Explained (2000) — Youtube
- [Mainly from the IP layer]
- Involved objects: TCP packets, ICMP ping packets, UDP packets, the Ping of Death, routers, router switches...
- General process
- Local: Encapsulate packets, local transmission, local router selection, switch selection, proxy check, firewall check, local transmission, router selection
- ——> Network transmission —>
- Response end: Firewall check [supervising ports], proxy checks request packets, returns corresponding information to the request end, same as the above local process [encapsulate packets, ..., router selection]
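The DNS step of the walkthrough above can be reproduced from C with getaddrinfo, which consults the local hosts file and the configured resolver. The sketch below is illustrative; `resolve_ipv4` is a hypothetical helper that returns only the first IPv4 result.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve `host` to its first IPv4 address, written to `out` as a
 * dotted-decimal string. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen) {
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;          /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;    /* we intend to use TCP */
    if (getaddrinfo(host, NULL, &hints, &res) != 0 || res == NULL) return -1;

    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    const char *s = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return s ? 0 : -1;
}
```

A buffer of `INET_ADDRSTRLEN` bytes is large enough for any IPv4 result; the resolved address can then be used to fill a sockaddr_in for connect.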
Port Reuse Related#
Can one port be bound to different services simultaneously?
- Yes. Incoming data is matched to a socket by the five-tuple {Transport protocol, Source IP, Source port, Destination IP, Destination port}
- For example:
- Using TCP and UDP transport protocols to listen on the same port, receiving data does not affect each other, no conflict
- Similarly, accept generates a new socket, still using the same port
- accept produces many different sockets that all keep the same destination [server] IP and port; only the source [client] IP and port differ [port reuse]
- [PS] A TCP socket exchanges data only with another TCP socket
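The TCP/UDP case can be checked with a short experiment. In this sketch, `bind_loopback` and `tcp_udp_share_port` are hypothetical helpers: a TCP socket takes a kernel-chosen port, a UDP socket then binds the same number, and a second TCP socket is expected to be refused.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a socket of `type` to 127.0.0.1:port (0 = kernel picks).
 * Returns the descriptor or -1; stores the bound port if asked. */
static int bind_loopback(int type, unsigned short port, unsigned short *bound) {
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, type, 0);
    if (fd == -1) return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    if (bound) *bound = ntohs(addr.sin_port);
    return fd;
}

/* Returns 1 if TCP and UDP could share one port number while a second TCP
 * bind was refused, 0 otherwise, -1 if the first bind failed. */
int tcp_udp_share_port(void) {
    unsigned short port = 0;
    int tcp = bind_loopback(SOCK_STREAM, 0, &port);
    if (tcp == -1) return -1;
    int udp = bind_loopback(SOCK_DGRAM, port, NULL);    /* same number, other protocol */
    int tcp2 = bind_loopback(SOCK_STREAM, port, NULL);  /* same protocol: should fail */
    int ok = (udp != -1) && (tcp2 == -1);
    if (udp != -1) close(udp);
    if (tcp2 != -1) close(tcp2);
    close(tcp);
    return ok;
}
```

The result can be 0 if another program happens to hold the same UDP port already, so treat it as an experiment rather than a guarantee.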
Socket Relationship Between Parent and Child Processes#
How the socket in a child process forked from the parent relates to the parent's socket
- They are the same, corresponding to the same file
- When data arrives, whichever of the two processes receives the data first has that data, and the other process continues to wait
- Therefore the child should generally not keep resources it does not need; for example, it can call close on a socket inherited from the parent
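One consequence of the shared file is that the peer sees the connection close only after every copy of the descriptor has been closed. The sketch below demonstrates this with socketpair (a local AF_UNIX stream pair standing in for a TCP connection, an assumption made so the example runs in one program); `eof_after_all_copies_closed` is a hypothetical name.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork(), parent and child both hold copies of sv[0] and sv[1].
 * The child writes "hi" on sv[1] and exits; the parent reads on sv[0].
 * Returns the result of the parent's last recv() (0 means end of stream),
 * or -1 on setup error. *first_read gets the result of the first recv(). */
int eof_after_all_copies_closed(char *out, size_t outlen, ssize_t *first_read) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        close(sv[0]);               /* child does not need the reading end */
        send(sv[1], "hi", 2, 0);
        close(sv[1]);
        _exit(0);
    }
    close(sv[1]);                   /* parent drops its copy of the writing end */
    *first_read = recv(sv[0], out, outlen, 0);
    waitpid(pid, NULL, 0);
    ssize_t last = recv(sv[0], out, outlen, 0);   /* 0 once all copies are closed */
    close(sv[0]);
    return (int)last;
}
```

If the parent skipped its `close(sv[1])`, the last `recv` would block forever, because one copy of the writing end would still be open.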
Tips#
- In system and network programming, check every place where an error can occur
- Signal knowledge expansion: Implement your own sleep function
- Remember to pass every related source file [*.c] to the compiler