Course Content#
File Operations#
- The previously learned cp, mv, and cat commands all involve reading and writing files.
- cp: read → write
- mv: read → write → delete [when moving across file systems; within one file system it is only a rename]
- cat: read → write [to standard output]
- How are these steps implemented?
【Low-level operations based on file descriptors】
open#
Open or create a file [related calls: openat, creat]
- man 2 open【Focus on function prototype and description】
- Prototype
- Return value int: file descriptor or -1
- Common file descriptors: 0-stdin, 1-stdout, 2-stderr
- -1: an error occurred, and errno is set [perror can print it, see the code demonstration]
- flags: file opening mode
- [PS] No need to memorize the header files; the man page lists them
- Description
- System call: a request to the kernel to perform an operation that the process is not allowed to perform itself
- If the file passed to open does not exist, open may create it [when O_CREAT is set in flags]
- O_CREAT
- A flag for the open function
- By convention in C, an all-uppercase name indicates a macro
- The underlying value is an int used as a bitmask
- 32 bits, so up to 32 independent flags, one per bit
- Flags are set with bitwise OR, tested with AND, and toggled with XOR
- File descriptor
- A small, non-negative integer that later system calls [read, write...] use to refer to the open file
- The return value is always the lowest-numbered descriptor not currently open in the process
- This hints at how many files are open [if open returns 1000, descriptors 0 to 999 are all in use]
- After opening a file, the file offset starts at the beginning of the file
- Open file description
- Each call to open creates a new open file description, which is an entry in the system-wide table of open files
- It records the file offset and the file status flags
- [PS] The file descriptor is a reference to an open file description and is not affected if the pathname is later removed or changed
- ⭐flags
- Must include exactly one of the access modes O_RDONLY, O_WRONLY, O_RDWR
- Flags are combined using bitwise OR
- O_CREAT create
- O_TRUNC truncate
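- O_DIRECT direct IO
- Direct IO: data moves straight between the user-space buffer and the device, bypassing the kernel page cache where possible [it tries to transfer data synchronously but does not give the guarantees of O_SYNC]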
- Buffered IO
- The buffer is flushed when: ① enough data has accumulated; ② a fixed time has elapsed
- Writing a single character 'a' does not reach the disk immediately; batching writes reduces cost
- But buffered data can be lost in a power outage
- [PS]
- A disk is accessed in whole blocks; a file system block is commonly 4 KiB
- Therefore, the disk is also called a block device
- Similar to how printf output reaches stdout [line buffering, when stdout is a terminal]
- On a newline or at normal program exit, the buffer is flushed automatically
- When the buffer is full, it is flushed automatically
- The fflush function flushes it manually
- O_NONBLOCK non-blocking IO
- Blocking
- For example: scanf must wait for input on standard input before the program can continue
- Disadvantage: the process can do nothing else while it waits
- Non-blocking
- The call returns immediately instead of waiting
- Disadvantages
- Requires frequent polling, which also wastes resources
- Or requires some mechanism to monitor readiness, which adds technical cost
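- O_TMPFILE create an unnamed temporary file
- The file has no name in the file system and is discarded when its last file descriptor is closed [at the latest when the process ends], unless it is given a name first
- Similar in purpose to files in the system's temporary folder /tmp, but cleaned up automatically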
- ❗ Before reading and writing files at a low level, you need to call the open function to obtain a file descriptor
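The flag handling and the "lowest free descriptor" rule described in this section can be sketched as follows. This is a minimal illustration, not the course's own code: the helper names `has_flag` and `open_for_write` and the permission value 0644 are assumptions made for the example.

```c
#include <fcntl.h>   /* open, O_* flags */
#include <stdio.h>   /* perror */
#include <unistd.h>  /* close, unlink (for callers) */

/* Hypothetical helper: returns 1 if every bit of `flag` is set in `flags`.
 * Only meaningful for single-bit flags such as O_CREAT; the access modes
 * are not single bits (O_RDONLY is 0 on Linux). */
int has_flag(int flags, int flag)
{
    return (flags & flag) == flag;
}

/* Hypothetical helper: open `path` for writing, creating it if needed and
 * truncating it if it already exists.
 * Returns the new file descriptor, or -1 with errno set. */
int open_for_write(const char *path)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;  /* combined with bitwise OR */

    /* With O_CREAT, the third argument supplies the permission bits. */
    int fd = open(path, flags, 0644);
    if (fd == -1)
        perror("open");                        /* prints errno as text */
    return fd;
}
```

Closing a descriptor and calling open again would be expected to hand back the same number, since open always picks the lowest descriptor that is free.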
read#
Read data through the file descriptor
- man 2 read
- Prototype
- Return value ssize_t: number of bytes read or -1
- A name ending in _t is generally a typedef
- Guess: an alias of one of the basic types, possibly long long, possibly int
- Find the underlying type step by step with ctags: ctrl + ] jumps to a definition, ctrl + o jumps back
- Answer: int [on a 32-bit system]; long int [on a 64-bit system]
- [PS] On 32-bit Linux, long int has the same size as int [4 bytes]
- buf, count: read up to count bytes of data into buf each time
- Description + Return value
- Attempts to read up to count bytes into the buffer
- Cases where fewer than count bytes are read: the call was interrupted by a signal; fewer than count bytes are available
- After each successful read of num [≤ count] bytes, the file offset [like a pointer] advances by num
- If the file offset is at EOF [no data to read], the function returns 0
- count
- If count is 0, read may still detect errors; if it detects none, it returns 0
- If count is greater than SSIZE_MAX [the maximum value of ssize_t], the result is implementation-defined [POSIX.1 standard]
- Return value
- ≤ count
- Returns -1 on error, and errno will be set
- [PS] ERRORS
- EAGAIN
- The file descriptor [a file or a socket] has been marked O_NONBLOCK and the read would block, so read returns -1 instead of waiting
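The return-value rules above (short reads, 0 at EOF, -1 with errno) suggest calling read in a loop. The sketch below shows one way to do that; the function name `read_full` is an assumption for this example, and retrying on EINTR is a common convention rather than something the notes prescribe.

```c
#include <errno.h>
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical helper: keep reading until `cap` bytes are stored or EOF.
 * Returns the total number of bytes read, or -1 on error (errno is set). */
ssize_t read_full(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)                /* EOF: nothing left to read */
            break;
        if (n == -1) {
            if (errno == EINTR)    /* interrupted by a signal: try again */
                continue;
            return -1;             /* real error, EAGAIN included */
        }
        total += (size_t)n;        /* short read: the offset has advanced by n */
    }
    return (ssize_t)total;
}
```

On a non-blocking descriptor a caller would typically treat EAGAIN as "try later" rather than as a failure; this sketch simply reports it.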
write#
Write data through the file descriptor
- man 2 write
- Prototype
- Very similar to read
- Description + Return value
- Very similar to read
- Cases where fewer than count bytes are written: insufficient space on the device; a system resource limit; interrupted by a signal
- If O_APPEND [append] was set when the file was opened
- The file offset is moved to the end of the file before each write, so every write appends
- Otherwise, writing starts at the current file offset [the beginning, right after open] and overwrites existing data
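A short sketch of the two points above: retrying partial writes, and the difference between opening with and without O_APPEND. The helper names (`write_all`, `append_to`, `overwrite_start`) and the mode 0644 are assumptions made for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly `len` bytes, retrying after partial
 * writes and EINTR. Returns 0 on success, -1 on error (errno is set). */
int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;   /* fewer bytes than requested: keep going */
    }
    return 0;
}

/* Open with O_APPEND: every write lands at the end of the file. */
int append_to(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}

/* Open without O_APPEND or O_TRUNC: the offset starts at 0, so the write
 * overwrites the first `len` bytes and leaves the rest untouched. */
int overwrite_start(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

For instance, appending "abc" and then "def" to an empty file and then calling `overwrite_start` with "XY" would leave "XYcdef".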
close#
Close a file descriptor
- man 2 close
- Prototype
- Closes the file descriptor, which can then be reused
- [PS]
- Any record locks the process holds on the file are removed
- Special cases
- If this was the last file descriptor referring to the open file description, the resources of that description are released
- If the file has also been unlinked and this was the last reference to it, the file is deleted
- Do not worry about what the kernel specifically does for now
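The special case of an unlinked file can be observed from user space without knowing the kernel internals. The sketch below is an illustration under assumptions: `usable_after_unlink` is a hypothetical name, and the file is created with mode 0600 just for the demonstration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, remove its name while it is still
 * open, and check that the data stays reachable through the descriptor.
 * Returns 1 if it does, 0 if not, -1 on a setup or close error. */
int usable_after_unlink(const char *path)
{
    char c = 0;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (unlink(path) == -1) {      /* the name is gone from the directory */
        close(fd);
        return -1;
    }

    /* The open file description still keeps the file alive. */
    int ok = write(fd, "x", 1) == 1
          && lseek(fd, 0, SEEK_SET) == 0
          && read(fd, &c, 1) == 1
          && c == 'x';

    /* Closing the last descriptor drops the last reference to the file. */
    if (close(fd) == -1)
        return -1;
    return ok;
}
```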
【Standard file operations, based on file pointers】
<stdio.h>
fopen#
Open a file through a stream
- man fopen
- Prototype
- Return value FILE *: file pointer
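- FILE is a type name; it is said to have originally been a macro, and the uppercase spelling is kept for compatibility
- mode
- Type is const char *, not int
- Description
- Opens the file and associates a stream with it
- [PS] Stream: a sequence of bytes, such as data sent over the network [byte stream] or a file stream <type: FILE *>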
- mode
- r / r+: read / read-write
- Stream is positioned at the beginning of the file
- w / w+: write / read-write
- Stream is positioned at the beginning of the file
- If the file exists, truncate the file [opening clears the original data]
- If the file does not exist, create the file
- a / a+: append / read and append
- Writes always go to the end of the file; for a+, glibc starts reading at the beginning of the file [POSIX does not specify the initial read position]
- Creates the file if it does not exist
- +: adds the other direction, so the stream is open for both reading and writing
- [PS]
- b: may appear at the end of the mode string or between the two characters; it marks binary files but has no effect on Linux
- ❓ A newly created file gets mode 0666 as modified by the process's umask value
- Return value
- On success, returns the file pointer
- On error, returns NULL and sets errno
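The effect of the mode strings can be checked with a small experiment. This is a sketch, not the course's code: `put_text` and `get_text` are hypothetical helper names, and `get_text` assumes `cap` is at least 1.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `path` using fopen mode `mode`
 * ("w" truncates first, "a" appends). Returns 0 on success, -1 on failure. */
int put_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {              /* fopen returns NULL and sets errno */
        perror("fopen");
        return -1;
    }
    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}

/* Hypothetical helper: read up to cap - 1 bytes of `path` into `out` and
 * NUL-terminate it. Returns the number of bytes read, or -1. */
long get_text(const char *path, char *out, size_t cap)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(out, 1, cap - 1, fp);
    out[n] = '\0';
    fclose(fp);
    return (long)n;
}
```

Writing "old" with "w", then "+new" with "a", would be expected to leave "old+new"; a further "w" write replaces the whole content.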
fread, fwrite#
Binary stream IO
- man fread / fwrite
- fread: read nmemb items, each size bytes long, from stream into ptr
- fwrite: write nmemb items, each size bytes long, from ptr to stream
- Return value size_t: number of items read / written [success]
- [size_t is unsigned; ssize_t is its signed counterpart]
- On error or early EOF 👉 0 ≤ return value < nmemb
- ❗ Therefore, the return value alone cannot distinguish EOF from an error; use feof and ferror to tell them apart
- [PS] When size is 1, the return value equals the number of bytes transferred
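A minimal sketch of a fread loop that applies the rule above: the loop stops on a zero return, and ferror decides whether that was EOF or a failure. The function name `count_bytes` and the 512-byte buffer are choices made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in `fp`.
 * Returns the byte count once EOF is reached, or -1 on a read error. */
long count_bytes(FILE *fp)
{
    char buf[512];
    long total = 0;
    size_t n;

    /* size is 1, so the return value is a byte count */
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
        total += (long)n;

    /* fread returned 0: the return value alone does not say why */
    if (ferror(fp))
        return -1;
    return total;   /* no error, so the loop ended at EOF */
}
```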
fclose#
Flush the stream and close the underlying file descriptor
- man fclose
- Flushing the stream actually calls fflush
- Return value
- 0 [success]
- EOF [failure], and errno is set
- Undefined behavior [if an invalid pointer, or one already passed to fclose, is given]
- ⭐ Standard IO is buffered IO by default [stderr is the exception]
- The library collects data in a user-space buffer; the data only reaches the file when the buffer is handed to the kernel through a system call such as write
- ❓ Standard IO is more suitable for text [user], while low-level IO is more suitable for binary files
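The user-space buffer can be made visible by checking the file size before and after fflush. The sketch below assumes a regular file and a write much smaller than the buffer; `file_size` and `buffered_write_demo` are hypothetical names.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of `path` as the kernel sees it, or -1 if stat fails. */
long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical demo: write 5 bytes through a fully buffered stream and
 * record the file size before and after fflush.
 * Returns 0 on success, -1 on failure. */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Ask for full buffering; must happen before any other stream operation. */
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }

    fputs("hello", fp);
    *before = file_size(path);   /* data is still in the stdio buffer */
    fflush(fp);                  /* hand the buffer to the kernel */
    *after = file_size(path);

    return fclose(fp) == EOF ? -1 : 0;
}
```

Under these assumptions `before` would be 0 and `after` 5, because nothing is passed to write until the flush.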
Directory Operations#
Directories are essentially files too [in early Unix they could be opened and read directly]
opendir#
- man opendir
- Return value DIR *: directory stream pointer or NULL
- The directory stream is placed at the first entry of the directory by default
- Returns NULL and sets errno on error
readdir#
- man readdir
- Return value struct dirent *: directory entry or NULL
- Pointer to the next directory entry [structure] in the directory stream
- Main fields of the structure: d_ino, d_name
- [PS]
- Each call returns the next entry
- d_off: the same value telldir would return, playing a role similar to ftell()
- This offset is an opaque position within the directory, not a byte count in the usual sense
- ftell() returns the current value of a stream's file position indicator
- NULL [at the end of the directory stream or when an error occurs; errno is set only on error]
closedir#
- Close the directory
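Put together, the three calls form a simple loop. The sketch below is one possible shape, with `list_dir` as a hypothetical name; skipping "." and ".." is a choice made for the example, and errno is cleared before each readdir call because NULL means either "end of directory" or "error".

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the inode number and name of every entry in
 * `path` except "." and "..". Returns the number printed, or -1 on error. */
int list_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }

    int count = 0;
    int saved = 0;
    for (;;) {
        errno = 0;                           /* so NULL can be interpreted */
        struct dirent *entry = readdir(dir);
        if (entry == NULL) {
            saved = errno;                   /* 0 means end of directory */
            break;
        }
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        printf("%lu %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }

    closedir(dir);
    if (saved != 0) {
        errno = saved;
        perror("readdir");
        return -1;
    }
    return count;
}
```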
Basic Idea of Implementing ls -al#
- ls -al effect
- The information needed includes: file permissions, link count, username, group name, file size, modification time, file name
- Idea
- readdir()
- man readdir
- Read each file in the directory
- Can obtain the file name
- stat(), lstat()
- man 2 stat
- Get file information from a file path: fills in a stat structure
- Can obtain file permissions, hard link count, uid, gid, file size, modification time
- Refer to the EXAMPLE in the man page, which uses lstat
- Difference between lstat() and stat()
- lstat() reports on a symbolic link itself instead of following it to the file it points to
- getpwuid()
- man getpwuid
- Get the passwd structure for a uid
- Can obtain the corresponding username
- getgrgid
- man getgrgid
- Get the group structure for a gid
- Can obtain the corresponding group name
- To implement the lookup yourself → read these files and split each line on ':'
- User information: /etc/passwd
- Group information: /etc/group
- Other Details
- Color
- Sorting
- The number of columns plain ls displays changes with the terminal width
- Get terminal size
- Refer to ioctl, man ioctl
- The column layout can be found by brute force, binary search, or successive approximation
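A sketch of the terminal-size query, using the TIOCGWINSZ request of ioctl. The column calculation is a deliberate simplification that gives every column the same width (longest name plus two spaces), which is an assumption for this example; real ls sizes each column separately. Both function names are hypothetical.

```c
#include <sys/ioctl.h>  /* ioctl, TIOCGWINSZ, struct winsize */
#include <unistd.h>     /* STDOUT_FILENO */

/* Terminal width in columns, or `fallback` if stdout is not a terminal. */
int terminal_columns(int fallback)
{
    struct winsize ws;

    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1 || ws.ws_col == 0)
        return fallback;
    return ws.ws_col;
}

/* Simplified layout: how many equally wide columns fit in `width`
 * when the longest name has `longest_name` characters. */
int columns_that_fit(size_t longest_name, int width)
{
    int col_width = (int)longest_name + 2;   /* two spaces between columns */
    int n = width / col_width;
    return n > 0 ? n : 1;                    /* always show at least one */
}
```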
Code Demonstration#
Low-level File Operations#
- ⭐ See comments for details, focus on usage
- ❗ Avoiding garbled output
- Leave one position at the end of the string buffer for '\0'
- Read at most sizeof(buff) - 1 bytes
- The last read usually returns fewer than 512 bytes, so stale bytes left over from the previous read must not be printed
- Method ①: clear the buffer before each read with memset(buff, 0, sizeof(buff))
- Method ②: always terminate the data just read, buff[nread] = '\0'
- [PS] When learning the system's higher-level commands, you do not need to pay too much attention to these
- perror prints a system error message
- man 3 perror
- Prototype
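- open, fopen and other calls set errno when an error occurs
- Description
- Prints to stderr a message describing the current value of errno, that is, the last error from a system call or library function
- s is printed first as a prefix; it usually contains the name of the function that failed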
- Build a common header file folder to store commonly used header files
- head.h
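The points in this demonstration (a 512-byte buffer, one byte reserved for '\0', terminating after every read, perror on failure) can be reconstructed roughly as below. This is not the original demonstration code: the name `cat_file` is an assumption, and the function only suits text files because it prints each chunk as a C string.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical reconstruction: print a text file to stdout.
 * Returns the number of bytes read, or -1 on error. */
long cat_file(const char *path)
{
    char buff[512];
    ssize_t nread;
    long total = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");               /* prefix plus the errno message */
        return -1;
    }

    /* Read at most sizeof(buff) - 1 bytes so there is room for '\0'. */
    while ((nread = read(fd, buff, sizeof(buff) - 1)) > 0) {
        buff[nread] = '\0';           /* method 2: terminate what was just read */
        printf("%s", buff);
        total += nread;
    }
    if (nread == -1)
        perror("read");

    close(fd);
    return nread == -1 ? -1 : total;
}
```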
Standard File Operations#
- A buffer declared with an initializer inside the loop is re-initialized on every iteration
- nread [the return value of fread] is a size_t, which is never negative, so it cannot distinguish EOF from an error
Standard IO is Buffered IO#
- The first "Hello world" appears immediately because stderr is not buffered
- The second "Hello world" goes to stdout and would normally not appear until sleep finishes, but either of the following makes it appear immediately 👇
- Manually flush the buffer: fflush
- Output a newline [when stdout is a terminal]
- The sleep function is declared in unistd.h
Additional Knowledge Points#
- ulimit -a shows the limit on the number of files a process can open [the "open files" entry]
- The default limit on open files per process is commonly 1024
- Exceeding it does not crash the system; open simply fails with -1 and errno set to EMFILE
- [PS]
- Memory is another resource that can run out
- Be a responsible program: manually close / free, output error logs
- Standard output is line-buffered when it refers to a terminal [fully buffered otherwise]; stderr is unbuffered
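The per-process limit can also be read, and safely provoked, from C. The sketch below uses getrlimit and setrlimit with RLIMIT_NOFILE; the function names and the choice of /dev/null as the file to open are assumptions for the example, and `errno_at_limit` only accepts small limits so that its fixed-size array is large enough.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/* Soft limit on open descriptors for this process, or -1 on error. */
long max_open_files(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;
    return (long)rl.rlim_cur;
}

/* Hypothetical demo: temporarily lower the soft limit to `limit`, open
 * /dev/null until open() fails, then close everything and restore the limit.
 * Returns the errno of the failing open, or -1 on setup failure. */
int errno_at_limit(int limit)
{
    struct rlimit old, tmp;
    int fds[64];
    int n = 0;
    int err = 0;

    if (limit < 1 || limit >= 64)           /* keep within the fds array */
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &old) == -1)
        return -1;
    tmp = old;
    tmp.rlim_cur = (rlim_t)limit;
    if (setrlimit(RLIMIT_NOFILE, &tmp) == -1)
        return -1;

    while (n < 64) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd == -1) {                     /* the process survives; open fails */
            err = errno;
            break;
        }
        fds[n++] = fd;
    }

    while (n > 0)                           /* be a responsible program */
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &old);
    return err;
}
```

With a limit of 16, the failing call would be expected to report EMFILE ("Too many open files").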
Points for Thought#
- ❓ Does saving a file immediately write to disk?
- Refer to "Are file edits in Linux directly saved into disk?" on StackExchange
Tips#
- In vim, Shift + K opens the man page for the word under the cursor
- Recommended copy and translate software: CopyTranslator
- Online documentation for man pages: die.net