Bo2SS

3 Multiprocessing

Course Content

What is a Process

  • A process is the in-memory image of a program: a running instance of a program, and a complex collection of state
    • It includes allocated memory space, user information, group information, permissions, resources used, running code, open files, etc.
  • For comparison
    • ① What is a program
      • A program is a compiled executable binary file, stored on disk
        • It is an ordinary file with execute (x) permission
      • A collection of programs is an application
    • ② What is a thread
      • A thread is an ordered sequence of instructions to be executed by the CPU
      • A process may consist of one or more threads, which can execute instructions concurrently
    • [PS] A process is the basic unit of resource allocation by the operating system, while a thread is the basic unit of CPU scheduling

fork

Create a child process [Process Interface]

  • man fork
  • Prototype + Description
    • pid_t fork(void); [#include <unistd.h>]
    • Return value [Type: pid_t]: Process ID
    • A new process [child process] is created by copying the process that called fork [parent process], and the parent and child processes run in independent memory spaces
      • At the time of fork, both have the same memory contents; afterwards, memory writes, file mappings (mmap), and unmappings (munmap) performed by one process do not affect the other
      • A page is actually copied only when one of the processes writes to it [Copy-on-Write concept]
        • Until then, both processes share the same physical pages
    • The main differences between parent and child are as follows:
      • The child has its own unique PID, which does not match the ID of any existing process
      • The child's parent PID [getppid] is the PID of the parent
      • The child does not inherit the parent's memory locks [mlock, mlockall]
      • The child's resource usage [getrusage] and CPU time counters [times] are reset to 0
      • The child's set of pending signals is initially empty, and it does not inherit semaphore adjustments, record locks, timers, or outstanding asynchronous I/O operations
  • Return value
    • Success: Returns the child's PID in the parent process, returns 0 in the child process
      • This return value is how the parent learns the child's PID, while the child can always obtain the parent's PID via getppid
    • Failure: Returns -1 in the parent, no child process is created, and errno is set

wait

Wait for process state changes

  • man wait
  • Prototype
    • pid_t wait(int *wstatus); [#include <sys/wait.h>]
    • wstatus [int *]: Output parameter that receives the child's status information
      • For example, the value the child returned from main or passed to exit
      • Macros are needed to decode it, such as WIFEXITED(wstatus); see Code Demonstration > wait > 2
  • Description
    • Wait object: The children of the calling process
    • State changes: the child terminated, was stopped by a signal, or was resumed by a signal
      • Plain wait reports only terminated children; waitpid with WUNTRACED or WCONTINUED also reports stops and resumes
    • When a child has terminated,
      • Calling wait allows the system to release the resources associated with the child
      • Otherwise [if wait is never called], the terminated child remains a zombie process [👇]
        • The dead child has not been reaped by its parent, so its process-table entry is not released
        • Zombies can be seen in top: the "zombie" count on the Tasks line refers to zombie processes
    • If a child has already changed state, wait returns immediately
      • Otherwise, it blocks until a child changes state or a signal handler interrupts the call
  • Return value
    • Returns the PID of the terminated child or -1 [Error, sets errno]

exec family

Execute a file [Everything is a file]

  • man exec
  • Prototype
    • The family has several variants: execl, execlp, execle, execv, execvp, and execvpe [a GNU extension], declared in <unistd.h>
  • Description
    • It will replace the current process image with a brand new process image
      • [Let the child have a whole new world]
    • The first parameter is always the name of the file to be executed
      • path: the pathname of the file; PATH is not searched
      • file: if it contains a slash, it is treated as a pathname; otherwise it is searched for in the directories listed in the PATH environment variable
    • The entire family can be summarized as: "exec + l/v + p/e/pe"
      • arg: the arguments passed to the new program, beginning with argv[0]
      • l-list: each argument is passed as a separate function parameter [Parameter passing method]
        • By convention, arg0 should be the name of the file being executed
        • The list must be terminated by (char *) NULL
      • v-vector: the arguments are passed as an array of pointers to strings [Parameter passing method]
        • The array must be terminated by a null pointer
      • p-path: if the file name contains no slash, it is searched for in the directories listed in PATH
        • This duplicates the shell's command lookup
      • e-env: the caller specifies the new program's environment explicitly via envp
        • An array of "NAME=value" strings, terminated by a null pointer
  • Return value
    • exec returns only if an error occurs; the return value is then -1 and errno is set

flock

Apply or remove an advisory lock on an open file

[Essentially to protect data]

  • man 2 flock
  • Prototype + Description
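    • int flock(int fd, int operation); [#include <sys/file.h>]
    • The lock applies to the open file referred to by the file descriptor fd
    • operation is mainly one of three values
      • LOCK_SH: Shared lock; more than one process may hold a shared lock on the same file at the same time
      • LOCK_EX: Exclusive lock
        • Only one process may hold an exclusive lock on a file at a time; while it holds the lock, others cannot lock the file
        • Example: a single restroom used by many people; while one person is inside, everyone else must wait
      • LOCK_UN: Remove an existing lock held by this process
  • Return value
    • 0 on success; -1 on error, with errno set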

Code Demonstration

fork

1. The Buffer Is Copied: Line Buffering

  • [Screenshot: source code]
  • Output result
    • [Screenshot: program output]
    • ❗ Why is suyelus printed twice after entering it, even though no output function is called after fork?
    • 【Fact】 The child receives a copy of the whole process, but it starts running at the point where fork returns; it does not re-execute the code before fork
    • 【Key】 The stdio buffer was copied as well, and it still holds the unflushed suyelus
      • The printf call has no newline, and stdout is line-buffered when connected to a terminal, so the buffer is not flushed after line 13 executes
      • In this program, the buffer is flushed only when a process exits, so the parent and the child each flush their own copy
    • [PS] In zsh, it may only output suyelus once, possibly due to zsh optimization? In bash, there are two

2. Parent and Child Processes are Independent

  • [Screenshot: source code]
  • Output result
    • [Screenshot: program output]
    • ❗ Does the parent process always execute first?
      • Not necessarily. Parent and child are independent processes, and which one runs first is decided by the kernel scheduler
      • In practice, the parent often runs first: the kernel gives each process a time slice, and the parent's slice has usually not run out right after it creates the child
  • [PS]
    • Process 1 [PID 1] is the init process, and all other user processes descend from it
    • Unlike the human world, the first process in the computer world is always alive: it adopts orphaned processes and collects their remains when they exit

3. Create 10 child processes and print their own serial numbers

The 10 child processes are siblings: they all share the same parent

  • [Screenshot: source code]
  • Without the break on line 18
    • Every child also keeps looping and forking, producing 2^10 = 1024 processes in total: 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> ... -> 2^10
    • Count the running parent and child processes: ps -ef | grep -v grep | grep Ten | wc -l
      • [Ten is the executable program name]
  • Sleep durations do not add up
    • While one process sleeps, the scheduler runs the others, so the processes sleep concurrently and the total wait is only about 10s
  • Each child gets its own copy of i at the moment of fork; later changes to i in the parent do not affect it

wait

1. Create Zombie Processes

  • [Screenshot: source code]
  • If the parent never calls wait to reap a terminated child, the child becomes a zombie process
  • There are several ways to see zombie processes [run the program in the background first: ./a.out &]
    • With ps: look for processes marked <defunct> or with state Z
    • [Screenshot: ps output]
    • With top
    • [Screenshot: top output]
    • pstree shows the lineage of a zombie process
    • [Screenshot: pstree output]
  • [PS] A zombie cannot be killed directly because it has already terminated; to get rid of it, kill its parent. The parent's parent is zsh, which reaps the parent, and the orphaned zombie is adopted by init [PID 1], which reaps it

2. Retrieve the Exit Status of the Child Process

  • [Screenshot: source code]
  • The program outputs the following after running for about 2s:
  • [Screenshot: program output]
  • ❗ Why does the child process return 1, while the parent process gets a status of 256 from wait?
    • Only the low 16 bits of the int status are used; 256 👉 binary 1 0000 0000, so bit 8 (counting from 0) is 1 and all other bits are 0
    • Refer to the following figure [Linux-UNIX System Programming Manual (Volume 1), the Chinese edition of The Linux Programming Interface, Section 26.1.3]: for a normal exit, the exit status is stored in bits 8 to 15 and bits 0 to 7 are 0, so exit status 1 becomes 1 << 8 = 256, and the problem is answered
    • [Figure: layout of the wait status value]
    • The man page provides macros for inspecting the status
    • [Screenshot: man page list of status macros]
    • WEXITSTATUS(wstatus) extracts the exit status; it is meaningful only when WIFEXITED(wstatus) is true
    • In the source code, each macro corresponds to the following bit operations
    • [Screenshot: macro definitions]
    • Therefore, decode the status with the appropriate macro before printing it

exec family

【Replace with a brand new process】

  • [Screenshot: source code]
  • On line 17, the child process is replaced by a brand new program [vim], so the code after it is never executed unless exec fails
    • Calling exec directly after fork is cheap: with Copy-on-Write, fork does not copy the parent's memory pages up front, so nothing is copied only to be discarded by exec [a real copy occurs only when memory is written]
  • wait(NULL) collects the remains of the child [NULL: the exit status is not needed]
  • The second argument of execlp becomes argv[0] of the new program; it can be any string, but it is more meaningful when it matches the first argument
    • The experiment below demonstrates this
    • Replace the exec call on line 17 with the following version, in which the second argument is an arbitrary name
    • [Screenshot: modified exec call]
    • The source file test.c used to build the executable Test is as follows:
      • [Screenshot: test.c]
      • It prints the value of argv[0]
    • The results of running the original and the modified code are as follows:
      • [Screenshot: program output]
      • The second argument of execlp shows up in the new program's argv[0]

Additional Knowledge Points

  • In a while(1){} loop, calling sleep in the loop body is much more CPU-friendly
    • Otherwise, the busy loop keeps a CPU core at full utilization while doing no useful work, which can make the machine run hot
  • pstree shows the parent-child relationships between processes; -p also displays PIDs
  • Viewing zombie processes: ps, ps -aux, ps -ef, and top all work
  • Deadlock: two or more execution units each hold a resource while waiting for a resource held by another, so none of them can ever proceed
  • Synchronization in computing is different from synchronization in everyday life
    • It does not mean performing the same operation at the same time
    • Rather, it means the order of events is determined, with a causal relationship between them

Points for Consideration

Tips

  • du [-h]: Show the disk usage of the current directory and all its subdirectories [-h: human-readable sizes]
  • For multiprocess programs, piping the output through more displays the output of different processes separately
  • Recommended movie: "Her" 2013
    • A love story between a silicon-based life form and many carbon-based life forms, involving high-concurrency concepts
    • Douban | Baidu Cloud, extraction code: 8pic
