Bo2SS

Bo2SS

5 Arrays and Preprocessing Commands

Course Content#

Definition and Initialization of Arrays#

  • Definition: A sequential collection of the same type of elements of a fixed size
  • Initialization
    • int a[100] = {2, 3}; // a[0] = 2, a[1] = 3
    • int a[100] = {0}; // Array clearing operation: initializes every element to 0
      • Actually only defines the 0th element as 0, but causes other elements to be automatically initialized to 0
  • Others
    • Starts from index 0
    • Size of array space: total size of all variables
    • Storage space is contiguous
      • int a[3]
Address0123456789[10][11]
Elementa[0]a[1]a[2]
    • The arrays defined above are all static arrays
      • Not recommended operation: int n; scanf("%d", &n); int arr(n);
    • Index-value mapping
      • Given the index, the time efficiency of accessing the value is O(1)

Prime Sieve#

Core Idea: Use prime numbers to eliminate all composite numbers

  • Learn as an algorithm framework, with many variations
  • Prime numbers: only have 1 and itself as factors
    • 1 is neither a prime nor a composite number
    • 2 is the smallest prime number
    • Prime numbers must be odd (except for 2), but odd numbers are not necessarily prime
  • Utilizing array structure. An array can represent three types of information:
    • Index
    • Value mapped by the index
    • Sign of the value
  • Algorithm measurement standards and performance
    • Space complexity: O(N), an array of size N is created for N numbers
    • Time complexity: O(N * logN), O(N * loglogN)--optimized version, approaching O(N)
      • Not O(N) due to duplicate marking
      • The base of the logarithm is not important; base 2 is already small
      • How to calculate: amortization idea, calculating expected value based on time complexity and probability...
  • Marking: Prime, 0; Composite, 1
    • Utilizing two states of the array: index → number, value → mark
  • Prime sieve extension
    • Find the minimum and maximum prime factors of all numbers within a range
    • For example, for 12: 2, 3

Linear Sieve#

img

  • Start from Euler's 7th problem
    • Find the 10001st prime number
    • Can use enumeration method O(N * sqrt(N)), prime sieve
    • Estimate how large the 10001st prime number will be?
      • Rule: The range of primes ranked within 200,000 will not exceed 20 times their rank
  • Thoughts from the prime sieve
    • How many times is 6 marked? 2 times: 2, 3
    • How many times is 30 marked? 3 times: 2, 3, 5
    • If a number N's prime factorization contains m different primes, how many times is N marked? m times
    • Can it be marked only once? 👇
  • Linear sieve
    • Both space and time are O(n)
    • Use an integer M to mark composite number N
    • The relationship between M and N satisfies four properties
      1. The smallest prime factor of N is p
      2. N = p * M
      3. ❗ p must be less than or equal to the smallest prime factor in M
        • Similar to the principle of capturing Meng Huo, only mark on the last occasion
      4. Use M * P' (the set of all primes not larger than the smallest prime of M) to mark N1, N2, ...

[i and iii are somewhat equivalent]

  • Code
    • img
    • Core idea: N = M (↑ as large as possible) * p (↓ as small as possible)
      • Enumerate M, gradually increase p until p meets the termination condition: property iii
      • Compared to M, p is better controlled!
    • The key to avoiding duplication is property iii
    • Note: Although there are two layers of for loops, the time complexity is O(N) because of the break in the second layer for loop, the entire array is traversed once
    • The difference from optimizing the initial condition of the prime sieve to i * i is? The prime sieve still has duplicate marking situations

Binary Search Method#

  • Sequential search → binary search: O(N) → O(logN)

  • Key premise: The search sequence must be monotonic

  • Code

    • img
    • img
    • Mainly demonstrates three parts: ① Binary search in an array ② Binary search in a function ③

    • For the implementation part of binary search in a function

      • Should be able to unify binary_search_f_i and binary_search_f_d into one function
      • Focus on later learning, can you identify the return type of the passed function pointer?
      • However, there are differences in the implementation part, so the significance should not be great
    • The commented code in the main function tests the results of binary search in arrays and functions, as follows:

    • img

      • Using a function for binary search can be more flexible
    • Double type equality: use the difference less than EPSL to judge

      • Remember to #undef EPSL
    • Details are in the comments

      • Learning order: binary_search 👉 binary_search_f_i 👉 binary_search_f_d
  • Recursive Version Code

    • img
    • The parameters of the function differ from the iterative method, and the search boundaries need to be determined

Newton Iteration Method#

  • img
  • Purpose: To solve for the value of x when f(x) = 0

  • Conditions: The function must be differentiable; convergence

  • Iteration Formula: x2 = x1 - f(x1) / f'(x1)

  • Core idea: Each iteration brings x2 closer to the required value of x until the absolute value of f(x) is smaller than EPSL

  • Code

    • img
    • ⭐ Remember to take the absolute value for the loop termination condition!

    • Is x = n / 2.0 this initial value fixed?

      • No
      • This value is an imagined x closest to the solution, which is fine on both sides of the solution
      • However, it cannot be 0, as the derivative would be 0, leading to a division by 0
    • Wrap the Newton function to obtain the my_sqrt function

    • This is not binary search! Binary search requires monotonicity~

Preprocessing Commands#

  • Commands starting with #
    • #include <stdio.h> → During preprocessing, the stdio.h file is copied to the current location unchanged
  • Macro Definitions
  1. Define symbolic constants: constant → symbol.
#define PI 3.1415926
#define MAX_N 1000
  • Convenient for batch modification of values
  1. Foolish expressions
#define MAX(a, b) (a) > (b) ? (a) : (b)
#define S(a, b) a * b
  • Foolish eg. #define S(a, b) a * b
    • S(2 + 3, 4 + 5) → Simple replacement → 2 + 3 * 4 + 5 is equivalent to 2 + (3 * 4) + 5
    • No calculations are done during the replacement!
  1. Define code segments (advanced)
#define P(a) { \
    printf("%d\n", a); \
}
  • Macros can only accept one line of definition, and a newline requires the use of a backslash '' (continuation character)

  • Predefined macros (system encapsulated)

    • img
    • The two ends of the macro are two underscores

    • DATA, TIME: The date and time at pre-compilation, which do not change without preprocessing

      • Can achieve functions similar to fingerprint recognition? Pre-stored relationships?
    • Non-standard macros: Some environments may not define them, and different environments may have different standards, so portability is poor

      • How to handle this? Conditional compilation can be used 👇
  • Conditional Compilation

    • #ifdef DEBUG
      • Explanation: Whether the DEBUG macro is defined
      • Note: DEBUG must be a macro, not a variable!
        • Macros take effect after preprocessing, while variables take effect after compilation
    • #ifndef DEBUG Whether not defined
    • #if MAX_N == 5 Whether the macro MAX_N equals 5
      • Can be used to check the user's mobile version in games
    • #elif MAX_N == 4
    • #else
    • Must end with #endif**!!
      • 【Coding Specification】
      • Similar to va_start, va_end
  • Simple Compilation Process

    • img
    • C Source Code .c

    👇 Preprocessing Stage (Pre-compilation) (gcc -E): Includes header files, macro replacements, and other preprocessing commands, replacing all # commands

    • Compile Source Code .c

    👇 Syntax checking, compilation (gcc –S)

    • Assembly Source Code .s, .asm

    👇 Assembly (gcc -c)

    • Object File .o (under Linux) or .obj (under Windows)

    👇 Linking (gcc)

    • Executable Program a.out (under Linux) or a.exe (under Windows) (binary file)

    PS: Finally, there is a loading process, mainly to establish the mapping relationship between the executable file, virtual address, and physical address

Strings#

  • Character Arrays
    • Definition: char str[size];
    • Initialization
char str[] = "hello world";
char str[size] = {'h', 'e', 'l', 'l', 'o'};
  • Line1

    • The size in [] can be omitted, and the system will calculate it automatically
    • The system will automatically add a terminator '\0', the size of the character array: 11+1
  • Line2

    • Size must be at least 6, considering the terminator '\0' occupies one position
      • Extra positions are filled with '\0'
    • If size is not added, '\0' must be added at the end of the array, as the system will not add it automatically
  • '\0'

    • A termination marker
    • Corresponds to 0 at the lower level: false
    • '\0' can serve as a termination condition, such as: for loop reading a string
  • Is there a '\0' after the character? No, it is a characteristic of strings

  • Related Operations

    • img
    • Header File: <string.h>

    • strlen(str)

      • Returns the index of character '\0', length does not include '\0'
      • PS: sizeof() considers the actual memory occupied by the variable, including '\0', measured in bytes
    • strcmp(str1, str2);

      • Compares ASCII values in dictionary order
        • Starts from the first position
      • Return value is the difference (0 if equal)
    • strcpy(dest, src);

    • The end markers for comparison and copying: look for character '\0', what if '\0' is lost? Hence there are

    • Safer comparisons and copies: strncmp, strncpy

      • For cases where '\0' might be accidentally lost
      • Compare and copy at most n bytes
    • Memory Related

      • memset(str, c, n)

      • Not limited to string operations, str corresponds to an address

        • Here is an example of operations on arrays

        • memset(arr, 0, sizeof(arr))

          • Initializes, clears to 0
        • memset(arr, 1, sizeof(arr))

          • This does not set every position to 1, because it operates on each byte, while int occupies 4 bytes
          • 0000...01000...001000....0001...
        • memset(arr, -1, sizeof(arr))

          • It is indeed -1, because -1 in 1 byte is 11111111
  • img
  • Header File: <stdio.h>

  • The combination of both can perform format concatenation: one for input, one for output

  • The str1 in scanf is analogous to terminal input, which is replaced with string input

In-Class Exercises#

Implement a BUG-free MAX macro#

  • img
  • Idea: ① Calculate correct output manually 👉 ② First write the simplest implementation with BUG and have friendly output display 👉 ③ Find and solve problems sequentially for erroneous outputs

  • ① Correct output as shown

  • ② First write with BUG, and implement friendly output display

#define MAX(a, b) a > b ? a : b
#define P(func) { \
    printf("%s = %d\n", #func, func); \
}
  • Line2-4: Friendly output display
    • #func directly outputs its string representation
    • Not adding # means calling its value
  • ③ Find and solve problems sequentially for erroneous outputs
    • (The order of solving BUGs is also important, but here the order of BUGs is good)

    • At this point, check the erroneous situation

    • img
    • Errors caused by macros can be seen in the preprocessed file

      • gcc -E *.c outputs the code after macro replacement
    • First error correction

      • img
      • Solution: You can use parentheses to increase the precedence in the red box
#define MAX(a, b) (a > b ? a : b)
    • Second error correction
      • img
      • The fourth result is 2, as shown in the above image

      • Note the conditional operator '?:'

        • Its precedence is very low: lower than '>'

        • When there are multiple '?:', its associativity is from right to left

        • img
      • Solution: Use parentheses for each variable to increase precedence

#define MAX(a, b) ((a) > (b) ? (a) : (b))
    • Third error correction
      • img
      • a++ ran twice

        • It should first take a for comparison, then perform a+1
      • Solution: Assign the pre-increment value to a variable; define it as an expression generated by a code segment

#define MAX(a, b) ({\
    __typeof(a) _a = (a);\
    __typeof(b) _b = (b);\
    _a > _b ? _a : _b;\
})
  • Does __typeof(a++) execute?

    • When used in an expression, it does not execute, it just retrieves its type
  • ({}) expression

    • The value is the last expression's value in parentheses, and the data type is that of the last expression

    • Refer to C language's use of parentheses and braces in conjunction with && compound statements-CSDN

    • img
    • Removing {} or () will not work!

      • () can only contain comma expressions
        • The comma expression takes the value of the last expression
      • {} is a code segment, which has no return value
        • If directly defined as a macro function
        • Directly printf output would lose the essence of MAX returning the maximum value
        • ❓ Use return to return value
          • Needs declaration
          • But using an expression to represent the return value is smarter
  • Final Version Code

  • img

​ Result is all correct!

  • img

Implement LOG macro printing#

  • img
  • img
  • Code

    • img
    • When delivering work, to omit log information, conditional compilation can be used

      • Use "gcc -DDEBUG" during compilation to pass in the DEBUG macro definition
      • Define an empty macro replacement after #else
    • Variadic Macros

      • Variadic '...' just needs a name. eg. args...
    • Usage of "#" and "##" in macros

      • "#" : Take string
      • "##" : Concatenate
        • Solve the case where the log macro input is an empty character: If empty, the comma after frm will be automatically removed, refer to the precompiled result 👇

        • img
        • Refer to Text Replacement Macros-cppreference

        • img
        • The difference between ab and a##b

          • ab will not substitute a and b before concatenation
          • The parameters corresponding to a and b that need to be concatenated can be undefined
            • The preprocessor will concatenate these undefined variable names, and the defined variable name after concatenation is sufficient

Highlights Notes#

Code Demonstration#

Prime Sieve#

Code

  • img
  • Dual logic, control indentation

    • if continue
  • The front part of the prime array can simultaneously count and store (used to mark overall)

    • Initially, prime[0] and prime[1] are not used, left idle
    • There will be no overtaking issue: even numbers are definitely composite, and prime numbers appear at least every other time
  • The initial condition of the second for loop can be optimized

    • Initial Condition: i * 2 → i * i
    • Time efficiency: O(Nlogn)→O(Nlognlogn)
    • Because composite numbers smaller than i * i must have been marked by numbers smaller than i
      • This reduces the number of duplicate markings in the earlier part, but not completely
      • To completely avoid duplicate marking, a linear sieve must be used
    • Danger point: i * i grows exponentially, more likely to overflow the maximum value that int can store
      • Can a judgment be added in advance? But it may lead to greater time consumption
      • Change to a data type with a higher upper limit to store i * i

Arrays#

Code

  • img
  • Output

    • img
    • Details 👇

Declaration and Initialization#

  • Programming Tip: Open 5 more positions to avoid overflow, which can reduce the bug rate

#define max_n 100
int arr[max_n + 5];

  • The variable space in functions is all defined in the stack area
    • The stack area can be understood as a badminton tube, last in first out
    • Default stack size: 8MB≈8000000Byte (bytes)
  • Arrays declared inside functions may be uninitialized and need manual initialization
    • = {0};
    • scanf input

for (int i = 0; i < max_n; i++) {
scanf("%d",arr + i);
}

    • arr + i is equivalent to &arr[i]& cannot be omitted!
  • Arrays declared outside functions
    • The system will automatically clear them to 0, similar to the = {0} operation
    • The size that can be allocated is limited by the computer's memory, essentially unlimited
  • Arrays defined by malloc, calloc, realloc are in the heap area, even if defined inside functions

Space Occupation and Address#

  • The control character corresponding to sizeof() is %lu, unsigned long
  • The array name arr represents the address of the first element of the array, which is &arr[0]

printf("%p %p\n", arr, &arr[0]); // Address values are the same

  • Address + 1 offsets how many bytes depend on the element byte size represented by the address

Parameter Passing#

  • Condition: The external and function parameter representations must be consistent
    • One-dimensional array: int arr[100];
      • void func(int *num)
    • Two-dimensional array: int arr2[100][100];
      • void func2(int **num) will warn
        • num + 1 and arr + 1 have inconsistent representations
        • num is a pointer to an int pointer
        • The address difference between num and num + 1 is the size of one pointer, which is 8 bytes in a 64-bit operating system
        • The address difference between arr and arr + 1 is the size of one one-dimensional array, which is 4 * 100 bytes
      • void func2(int (*num)[100]) is possible
    • Three-dimensional array: int arr3[100][100][32]
      • void func3(int (*num)[100][32]) is possible
      • (*num)[100][32] can be written as num[][100][32]
        • (*num) and num[] have the same representation, but they are essentially different
  • ❓ Is q pointing to nil? That is 0x0

Additional Knowledge Points#

  • An array can represent three types of information
  • int arr[100]; // When arr is passed as a parameter to a function, it is the address of the first element of the array
  • Is 1 negated to -2? Derive it
    • 1: 0000...001
    • Negation: 1111...110
    • Sign bit remains unchanged, negation plus one gives: 1000...010
    • That is -2
    • Conclusion: The bitwise negation of all positive integers is their own +1's negative number
  • For source files containing <math.h>, add -lm during compilation, using gcc *.c -lm
  • Coding habit: No more than 80 characters per line
  • Control character "%X": Uppercase hexadecimal

Points to Ponder#

  • Who is more powerful, functions or macros?
    • Macros can generate functions; aren't macros the father of functions?

Tips#

  • Modify the format for automatic completion of header files in vim: add spaces
    • Enter the ~/.vimrc file and modify the corresponding format in SetTitle()
  • Explore macro expansion and nesting yourself
  • Refer to the reference book chapters 8.1, 8.4, and chapter 15

Course Notes#

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.