5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

Definition: A sequential collection of the same type of elements of a fixed size
Initialization
- int a[100] = {2, 3}; // a[0] = 2, a[1] = 3
- int a[100] = {0}; // Array clearing operation: initializes every element to 0
  - Actually only defines the 0th element as 0, but causes other elements to be automatically initialized to 0
Others
- Starts from index 0
- Size of array space: total size of all variables
- Storage space is contiguous
  - int a[3]

Address	0123	4567	89[10][11]
Element	a[0]	a[1]	a[2]

- The arrays defined above are all static arrays
  - Not recommended operation: int n; scanf("%d", &n); int arr(n);
- Index-value mapping
  - Given the index, the time efficiency of accessing the value is O(1)

Prime Sieve#

Core Idea: Use prime numbers to eliminate all composite numbers

Learn as an algorithm framework, with many variations
Prime numbers: only have 1 and itself as factors
- 1 is neither a prime nor a composite number
- 2 is the smallest prime number
- Prime numbers must be odd (except for 2), but odd numbers are not necessarily prime
Utilizing array structure. An array can represent three types of information:
- Index
- Value mapped by the index
- Sign of the value
Algorithm measurement standards and performance
- Space complexity: O(N), an array of size N is created for N numbers
- Time complexity: O(N * logN), O(N * loglogN)--optimized version, approaching O(N)
  - Not O(N) due to duplicate marking
  - The base of the logarithm is not important; base 2 is already small
  - How to calculate: amortization idea, calculating expected value based on time complexity and probability...
Marking: Prime, 0; Composite, 1
- Utilizing two states of the array: index → number, value → mark
Prime sieve extension
- Find the minimum and maximum prime factors of all numbers within a range
- For example, for 12: 2, 3

Linear Sieve#

Start from Euler's 7th problem
- Find the 10001st prime number
- Can use enumeration method O(N * sqrt(N)), prime sieve
- Estimate how large the 10001st prime number will be?
  - Rule: The range of primes ranked within 200,000 will not exceed 20 times their rank
Thoughts from the prime sieve
- How many times is 6 marked? 2 times: 2, 3
- How many times is 30 marked? 3 times: 2, 3, 5
- If a number N's prime factorization contains m different primes, how many times is N marked? m times
- Can it be marked only once? 👇
Linear sieve
- Both space and time are O(n)
- Use an integer M to mark composite number N
- The relationship between M and N satisfies four properties
  1. The smallest prime factor of N is p
  2. N = p * M
  3. ❗ p must be less than or equal to the smallest prime factor in M
    - Similar to the principle of capturing Meng Huo, only mark on the last occasion
  4. Use M * P' (the set of all primes not larger than the smallest prime of M) to mark N1, N2, ...

[i and iii are somewhat equivalent]

Code
- Core idea: N = M (↑ as large as possible) * p (↓ as small as possible)
  - Enumerate M, gradually increase p until p meets the termination condition: property iii
  - Compared to M, p is better controlled!
- The key to avoiding duplication is property iii
- Note: Although there are two layers of for loops, the time complexity is O(N) because of the break in the second layer for loop, the entire array is traversed once
- The difference from optimizing the initial condition of the prime sieve to i * i is? The prime sieve still has duplicate marking situations

Binary Search Method#

Sequential search → binary search: O(N) → O(logN)
Key premise: The search sequence must be monotonic
Code
- Mainly demonstrates three parts: ① Binary search in an array ② Binary search in a function ③
- For the implementation part of binary search in a function
  - Should be able to unify binary_search_f_i and binary_search_f_d into one function
  - Focus on later learning, can you identify the return type of the passed function pointer?
  - However, there are differences in the implementation part, so the significance should not be great
- The commented code in the main function tests the results of binary search in arrays and functions, as follows:
- - Using a function for binary search can be more flexible
- Double type equality: use the difference less than EPSL to judge
  - Remember to #undef EPSL
- Details are in the comments
  - Learning order: binary_search 👉 binary_search_f_i 👉 binary_search_f_d
Recursive Version Code
- The parameters of the function differ from the iterative method, and the search boundaries need to be determined

Newton Iteration Method#

Purpose: To solve for the value of x when f(x) = 0
Conditions: The function must be differentiable; convergence
Iteration Formula: x2 = x1 - f(x1) / f'(x1)
Core idea: Each iteration brings x2 closer to the required value of x until the absolute value of f(x) is smaller than EPSL
Code
- ⭐ Remember to take the absolute value for the loop termination condition!
- Is x = n / 2.0 this initial value fixed?
  - No
  - This value is an imagined x closest to the solution, which is fine on both sides of the solution
  - However, it cannot be 0, as the derivative would be 0, leading to a division by 0
- Wrap the Newton function to obtain the my_sqrt function
- This is not binary search! Binary search requires monotonicity~

Preprocessing Commands#

Commands starting with #
- #include <stdio.h> → During preprocessing, the stdio.h file is copied to the current location unchanged
Macro Definitions

Define symbolic constants: constant → symbol.

#define PI 3.1415926
#define MAX_N 1000

Convenient for batch modification of values

Foolish expressions

#define MAX(a, b) (a) > (b) ? (a) : (b)
#define S(a, b) a * b

Foolish eg. #define S(a, b) a * b
- S(2 + 3, 4 + 5) → Simple replacement → 2 + 3 * 4 + 5 is equivalent to 2 + (3 * 4) + 5
- No calculations are done during the replacement!

Define code segments (advanced)

#define P(a) { \
    printf("%d\n", a); \
}

Macros can only accept one line of definition, and a newline requires the use of a backslash '' (continuation character)
Predefined macros (system encapsulated)
- The two ends of the macro are two underscores
- DATA, TIME: The date and time at pre-compilation, which do not change without preprocessing
  - Can achieve functions similar to fingerprint recognition? Pre-stored relationships?
- Non-standard macros: Some environments may not define them, and different environments may have different standards, so portability is poor
  - How to handle this? Conditional compilation can be used 👇
Conditional Compilation
- #ifdef DEBUG
  - Explanation: Whether the DEBUG macro is defined
  - Note: DEBUG must be a macro, not a variable!
    - Macros take effect after preprocessing, while variables take effect after compilation
- #ifndef DEBUG Whether not defined
- #if MAX_N == 5 Whether the macro MAX_N equals 5
  - Can be used to check the user's mobile version in games
- #elif MAX_N == 4
- #else
- Must end with #endif**!!
  - 【Coding Specification】
  - Similar to va_start, va_end
Simple Compilation Process
- C Source Code .c
👇 Preprocessing Stage (Pre-compilation) (gcc -E): Includes header files, macro replacements, and other preprocessing commands, replacing all # commands
- Compile Source Code .c
👇 Syntax checking, compilation (gcc –S)
- Assembly Source Code .s, .asm
👇 Assembly (gcc -c)
- Object File .o (under Linux) or .obj (under Windows)
👇 Linking (gcc)
- Executable Program a.out (under Linux) or a.exe (under Windows) (binary file)
PS: Finally, there is a loading process, mainly to establish the mapping relationship between the executable file, virtual address, and physical address

Strings#

Character Arrays
- Definition: char str[size];
- Initialization

char str[] = "hello world";
char str[size] = {'h', 'e', 'l', 'l', 'o'};

Line1
- The size in [] can be omitted, and the system will calculate it automatically
- The system will automatically add a terminator '\0', the size of the character array: 11+1
Line2
- Size must be at least 6, considering the terminator '\0' occupies one position
  - Extra positions are filled with '\0'
- If size is not added, '\0' must be added at the end of the array, as the system will not add it automatically
'\0'
- A termination marker
- Corresponds to 0 at the lower level: false
- '\0' can serve as a termination condition, such as: for loop reading a string
Is there a '\0' after the character? No, it is a characteristic of strings
Related Operations
- Header File: <string.h>
- strlen(str)
  - Returns the index of character '\0', length does not include '\0'
  - PS: sizeof() considers the actual memory occupied by the variable, including '\0', measured in bytes
- strcmp(str1, str2);
  - Compares ASCII values in dictionary order
    - Starts from the first position
  - Return value is the difference (0 if equal)
- strcpy(dest, src);
- The end markers for comparison and copying: look for character '\0', what if '\0' is lost? Hence there are
- Safer comparisons and copies: strncmp, strncpy
  - For cases where '\0' might be accidentally lost
  - Compare and copy at most n bytes
- Memory Related
  - memset(str, c, n)
  - Not limited to string operations, str corresponds to an address
    - Here is an example of operations on arrays
    - memset(arr, 0, sizeof(arr))
      - Initializes, clears to 0
    - memset(arr, 1, sizeof(arr))
      - This does not set every position to 1, because it operates on each byte, while int occupies 4 bytes
      - 0000...01000...001000....0001...
    - memset(arr, -1, sizeof(arr))
      - It is indeed -1, because -1 in 1 byte is 11111111
Header File: <stdio.h>
The combination of both can perform format concatenation: one for input, one for output
The str1 in scanf is analogous to terminal input, which is replaced with string input

In-Class Exercises#

Implement a BUG-free MAX macro#

Idea: ① Calculate correct output manually 👉 ② First write the simplest implementation with BUG and have friendly output display 👉 ③ Find and solve problems sequentially for erroneous outputs
① Correct output as shown
② First write with BUG, and implement friendly output display

#define MAX(a, b) a > b ? a : b
#define P(func) { \
    printf("%s = %d\n", #func, func); \
}

Line2-4: Friendly output display
- #func directly outputs its string representation
- Not adding # means calling its value
③ Find and solve problems sequentially for erroneous outputs
- (The order of solving BUGs is also important, but here the order of BUGs is good)
- At this point, check the erroneous situation
- Errors caused by macros can be seen in the preprocessed file
  - gcc -E *.c outputs the code after macro replacement
- First error correction
  - Solution: You can use parentheses to increase the precedence in the red box

#define MAX(a, b) (a > b ? a : b)

- Second error correction
  - The fourth result is 2, as shown in the above image
  - Note the conditional operator '?:'
    - Its precedence is very low: lower than '>'
    - When there are multiple '?:', its associativity is from right to left
  - Solution: Use parentheses for each variable to increase precedence

#define MAX(a, b) ((a) > (b) ? (a) : (b))

- Third error correction
  - a++ ran twice
    - It should first take a for comparison, then perform a+1
  - Solution: Assign the pre-increment value to a variable; define it as an expression generated by a code segment

#define MAX(a, b) ({\
    __typeof(a) _a = (a);\
    __typeof(b) _b = (b);\
    _a > _b ? _a : _b;\
})

Does __typeof(a++) execute?
- When used in an expression, it does not execute, it just retrieves its type
({}) expression
- The value is the last expression's value in parentheses, and the data type is that of the last expression
- Refer to C language's use of parentheses and braces in conjunction with && compound statements-CSDN
- Removing {} or () will not work!
  - () can only contain comma expressions
    - The comma expression takes the value of the last expression
  - {} is a code segment, which has no return value
    - If directly defined as a macro function
    - Directly printf output would lose the essence of MAX returning the maximum value
    - ❓ Use return to return value
      - Needs declaration
      - But using an expression to represent the return value is smarter
Final Version Code

Result is all correct!

Implement LOG macro printing#

Code
- When delivering work, to omit log information, conditional compilation can be used
  - Use "gcc -DDEBUG" during compilation to pass in the DEBUG macro definition
  - Define an empty macro replacement after #else
- Variadic Macros
  - Variadic '...' just needs a name. eg. args...
- Usage of "#" and "##" in macros
  - "#" : Take string
  - "##" : Concatenate
    - Solve the case where the log macro input is an empty character: If empty, the comma after frm will be automatically removed, refer to the precompiled result 👇
    - Refer to Text Replacement Macros-cppreference
    - The difference between ab and a##b
      - ab will not substitute a and b before concatenation
      - The parameters corresponding to a and b that need to be concatenated can be undefined
        
        The preprocessor will concatenate these undefined variable names, and the defined variable name after concatenation is sufficient

Highlights Notes#

Code Demonstration#

Prime Sieve#

Code

Dual logic, control indentation
- if continue
The front part of the prime array can simultaneously count and store (used to mark overall)
- Initially, prime[0] and prime[1] are not used, left idle
- There will be no overtaking issue: even numbers are definitely composite, and prime numbers appear at least every other time
The initial condition of the second for loop can be optimized
- Initial Condition: i * 2 → i * i
- Time efficiency: O(Nlogn)→O(Nlognlogn)
- Because composite numbers smaller than i * i must have been marked by numbers smaller than i
  - This reduces the number of duplicate markings in the earlier part, but not completely
  - To completely avoid duplicate marking, a linear sieve must be used
- Danger point: i * i grows exponentially, more likely to overflow the maximum value that int can store
  - Can a judgment be added in advance? But it may lead to greater time consumption
  - Change to a data type with a higher upper limit to store i * i

Arrays#

Code

Output
- Details 👇

Declaration and Initialization#

Programming Tip: Open 5 more positions to avoid overflow, which can reduce the bug rate

#define max_n 100
int arr[max_n + 5];

The variable space in functions is all defined in the stack area
- The stack area can be understood as a badminton tube, last in first out
- Default stack size: 8MB≈8000000Byte (bytes)
Arrays declared inside functions may be uninitialized and need manual initialization
- = {0};
- scanf input

for (int i = 0; i < max_n; i++) {
scanf("%d",arr + i);
}

- arr + i is equivalent to &arr[i]& cannot be omitted!
Arrays declared outside functions
- The system will automatically clear them to 0, similar to the = {0} operation
- The size that can be allocated is limited by the computer's memory, essentially unlimited
Arrays defined by malloc, calloc, realloc are in the heap area, even if defined inside functions

Space Occupation and Address#

The control character corresponding to sizeof() is %lu, unsigned long
The array name arr represents the address of the first element of the array, which is &arr[0]

printf("%p %p\n", arr, &arr[0]); // Address values are the same

Address + 1 offsets how many bytes depend on the element byte size represented by the address

Parameter Passing#

Condition: The external and function parameter representations must be consistent
- One-dimensional array: int arr[100];
  - void func(int *num)
- Two-dimensional array: int arr2[100][100];
  - void func2(int **num) will warn
    - num + 1 and arr + 1 have inconsistent representations
    - num is a pointer to an int pointer
    - The address difference between num and num + 1 is the size of one pointer, which is 8 bytes in a 64-bit operating system
    - The address difference between arr and arr + 1 is the size of one one-dimensional array, which is 4 * 100 bytes
  - void func2(int (*num)[100]) is possible
- Three-dimensional array: int arr3[100][100][32]
  - void func3(int (*num)[100][32]) is possible
  - (*num)[100][32] can be written as num[][100][32]
    - (*num) and num[] have the same representation, but they are essentially different
❓ Is q pointing to nil? That is 0x0

Additional Knowledge Points#

An array can represent three types of information
int arr[100]; // When arr is passed as a parameter to a function, it is the address of the first element of the array
Is 1 negated to -2? Derive it
- 1: 0000...001
- Negation: 1111...110
- Sign bit remains unchanged, negation plus one gives: 1000...010
- That is -2
- Conclusion: The bitwise negation of all positive integers is their own +1's negative number
For source files containing <math.h>, add -lm during compilation, using gcc *.c -lm
Coding habit: No more than 80 characters per line
Control character "%X": Uppercase hexadecimal

Points to Ponder#

Who is more powerful, functions or macros?
- Macros can generate functions; aren't macros the father of functions?

Tips#

Modify the format for automatic completion of header files in vim: add spaces
- Enter the ~/.vimrc file and modify the corresponding format in SetTitle()
Explore macro expansion and nesting yourself
Refer to the reference book chapters 8.1, 8.4, and chapter 15

Bo2SS

5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

Prime Sieve#

Linear Sieve#

Binary Search Method#

Newton Iteration Method#

Preprocessing Commands#

Strings#

In-Class Exercises#

Implement a BUG-free MAX macro#

Implement LOG macro printing#

Highlights Notes#

Code Demonstration#

Prime Sieve#

Arrays#

Declaration and Initialization#

Space Occupation and Address#

Parameter Passing#

Additional Knowledge Points#

Points to Ponder#

Tips#

Course Notes#