Course Content#
Sorting#
- Four Quadrant Principle: Stable/Unstable, Internal/External
- Stable: The relative position of identical elements remains unchanged before and after sorting
 - Internal: Requires loading all data into memory for sorting
 
 
[Consider sorting from smallest to largest]
Stable Sorting#
- Insertion (Internal)
- Front: Sorted area; Back: Unsorted area
 - The numbers at the back find their position in the front to insert
- Compare and swap one by one to achieve the effect of [insertion]
 - Not directly swapping with the insertion position, as this would disrupt the order of the sorted area
 
 - Average time complexity: O(N^2)
 
 - Bubble (Internal)
- Front: Unsorted area; Back: Sorted area
 - Move from the first element backward, find the maximum element and place it at the front of the sorted area and the end of the unsorted area
 - Perform N - 1 rounds of bubbling, adding one element to the sorted area each round
- [Optimization] If a round of bubbling does not perform any swap operations, it can be terminated directly
 
 - Average time complexity: O(N^2)
- Best time complexity: O(N) ← already sorted
 - Worst time complexity: O(N^2) ← completely reversed
 
 - Insertion sort can be understood as reverse bubbling
 
 - Merge (External)
- Introduction: Merging two sorted arrays (of lengths m and n) into one sorted array
- Time complexity: O(m + n)
 
 - First divide and conquer until pairs can be compared, then merge the sorted arrays in pairs
 - Time complexity (best, worst, average): O(NlogN)
- A total of logN layers, with a merging time of O(N) per layer
 
 - [External Sorting] 2-way merge (above), k-way merge (multi-way merge, using heaps)
- Refer to External Merge Sort - Wikipedia, the sorting method for each route can be arbitrary, ultimately merging
 - Whether this is external sorting depends on whether you want or need it, but it can be done
 
 
 - Introduction: Merging two sorted arrays (of lengths m and n) into one sorted array
 
Unstable Sorting#
- Selection (Internal)
- Front: Sorted area; Back: Unsorted area
 - Each round selects the smallest element from the unsorted area and swaps it with the first element of the unsorted area
- There may be cases of swapping itself with itself, so the XOR function cannot be used for swapping
 
 - Unstable: For example, 22'1, after the first swap, 2 moves behind 2', resulting in 12'2
 - Time complexity (best, worst, average): O(N^2)
 - Generally better than bubble sort, the number of comparisons is similar, but bubble sort has too many swaps
 
 - Quick Sort (Internal)
- [Pivot, partition]
- Take the first element as the pivot
 - [Tail pointer] first looks for elements less than the pivot from the back to place in the position of the first element (which is empty), [Head pointer] then looks for values greater than the pivot from the front to place in the newly emptied position, looping
 - Finally, the head and tail pointers coincide, pointing to an empty position, where the pivot is placed
 - Then perform the above operation on the left and right parts of the pivot
 
 - Time complexity: O(NlogN)
- When the first element is chosen as the pivot in a reverse order sequence, it degenerates into selection sort, O(N^2)
 
 - [Optimization]
- Randomly select the pivot
 - Reduce the use of recursion, use loops instead
 - Swap only after finding a value to swap on both sides
 
 
 - [Pivot, partition]
 
Searching#
- Naive Binary Search
- Premise: Monotonic sequence
 - If the value to be searched exists multiple times in the sequence, binary search cannot distinguish which one is found
 
 - Special Binary Search
- 11110000
- Introduce a virtual head pointer with an index of -1
 - Mid calculation needs to add 1 to avoid infinite loops, such as in the case of 10
 
 - 00001111
- Introduce a virtual tail pointer with an index of n [array range from 0 to n - 1]
 
 - Refer to “Interview and Written Test Algorithms” - 2. Binary Search Topic
 - Here, an additional concept of virtual pointer is introduced, which reflects the situation when a value cannot be found by changing the search boundaries [-1 ~ n -1 or 0 ~ n]
 
 - 11110000
 - Ternary Search
- 
 - Solves the problem of [finding the extreme points of concave and convex functions]
 - Divides the scale of the original problem into one-third
 - Update strategy: Retain the smaller value element and the interval of the other end; stopping condition: |m1 - m2| < ESPL
 - Time complexity: O(log[3]N), slightly slower than binary search O(log[2]N)
 
 - 
 
Hash Table#
- A [data structure] used for searching
 - Structure definition
- Requires a contiguous space to store elements
 - Consistent with sequential lists, element types can vary
 
 - Structure operations
- Hash function: Can map any type of element to an integer value [array index]
- Then a certain value can be stored at the corresponding array index
 - Array indices can only be integers
 
 - Conflict handling methods
- Open addressing [commonly used]: Look for empty positions ahead, such as quadratic probing... but can easily lead to data clustering issues
 - Rehashing: Also known as hashing method, using multiple hash functions
 - Chaining method [commonly used]: Use a linked list structure to store elements at each position
 - Establish a common overflow area: Place conflicting elements in a specific area
 
 
 - Hash function: Can map any type of element to an integer value [array index]
 - Time complexity for searching: Approaches O(1)
- Other time-consuming factors: Hash function conversion, chaining method [searching linked list elements] or other conflict handling methods
 - Only arrays and sequential lists are O(1)
 
 - The space utilization rate of hash tables is generally 50-90%
- Hash functions will always have conflicts, so space needs to be reserved when opening
 - When the utilization rate reaches 70%, it can be used industrially; fewer conflicts lead to higher utilization, and better hash functions
 
 
Code Demonstration#
Stable Sorting#
- 
 - 
 - 
 - 
See comments and red box markings for details
 - 
To ensure the stability of stable sorting, pay attention to the case of ==, do not swap [or ensure the original order]!
 - 
[Note] When calling the TEST macro, the latter array is num, which is an array obtained by copying arr in the macro definition!
 
Unstable Sorting#
V1.0 Version: Selection Sort & Ordinary Quick Sort#
- 
 - 
 - 
Focus on the quick sort algorithm, which has many optimization points
- [Self-optimization] The conditions num[] > z or num[] < z on lines 46 and 48 can add == conditions, so that when encountering equal elements, there will be no swap, which is unnecessary and can help ensure algorithm stability [although quick sort is unstable]
 - [Points to optimize] Pivot selection, recursion → loop, swapping method
 
 
V2.0 Version: Optimized Quick Sort#
- 
 - 
Implements the three points mentioned above for optimization
 - 
Pay attention to several boundary ==: x <= y, num[] < z, num[] > z
- ❓ How to understand the judgment of ==? See the following thinking point 2
 
 - 
Lines 39 and 40: Sort the right interval, and after sorting, reduce the right boundary
 
Speed Comparison of Two Versions of Quick Sort#
- Test both random sequences and reverse sequences
- Random sequence of size 200,000: The difference between the two is minimal
 - Reverse sequence of size 100,000: The difference is significant
- 
 - For a reverse sequence of 200,000, the original version of quick sort will cause a stack overflow
 
 - 
 
 
Binary Search#
- 
 - 
The use of virtual pointers can avoid returning false answers [0 or n - 1] when there is no answer [000000]
 - 
In the case of 11110000, when calculating mid, add 1 to avoid falling into an infinite loop, such as in the case of 10
 - 
For details, see “Interview and Written Test Algorithms” - 2. Binary Search Topic
 
Establishing a Hash Table for Strings#
- 
Hash function: BKDR-Hash, conflict handling method: Chaining method
 - 
 - 
 - 
The head insertion method is used when inserting values
 - 
The utilization rate of the hash table structure cannot reach 100%, so the allocated space needs to be doubled
 - 
Using calloc to allocate hash table space can ensure that the head address of each position's linked list is set to null, ensuring safety
 - 
Hash functions can be designed by oneself
 
Thinking Points#
- [My understanding] To ensure the stability of stable sorting, pay attention to the case of ==, do not swap [or ensure the original order]!
 - ❓ Thoughts on the boundaries in the optimized version of quick sort
- 
 - Similarly, according to the ordinary version of quick sort to optimize boundary judgments, the following optimization ideas have problems, while the ordinary version can be implemented
- Lines 32 - 33: Changing num[] < z and num[] > z to <= and >= will cause [segmentation fault or infinite loop]
- For num[] <= z [infinite loop]
- Example: 2 1, with the pivot value of 2, the left pointer x keeps moving to the right end, going out of bounds, while the right pointer y remains unchanged, line 27 while (l < r) holds, x returns to l, resulting in an infinite loop
 
 - For num[] >= z [segmentation fault - stack overflow]
- Example: 1 2, with the pivot value of 1, the right pointer y keeps moving to the left end, becoming -1, while the left pointer x and r remain unchanged, line 39 quick_sort(num, x, r) keeps recursively calling, eventually causing a stack overflow
 
 - Analysis of the reason: The ordinary version takes the pivot value out, while the optimized version keeps the pivot value inside, which needs to be considered
 
 - For num[] <= z [infinite loop]
 - Lines 32 - 34: Changing x <= y to x < y will also cause [segmentation fault]
- Example: 2 3, with the pivot value of 2, at this point x and y point to 2 and 3 respectively, after looping, both x and y point to 2, while x and r remain unchanged, line 39 quick_sort(num, x, r) keeps recursively calling, eventually causing a stack overflow
 - Analysis of the reason: x and y need to move apart, otherwise it resembles the special binary search 111000 not +1 situation
 
 - Summary: Both x and y need to move; if x does not move, it leads to stack overflow, and if y does not move, it leads to infinite loop
 - [Personal understanding, welcome to discuss]
 
 - Lines 32 - 33: Changing num[] < z and num[] > z to <= and >= will cause [segmentation fault or infinite loop]
 
 - 
 - Ternary Search - Practice Problem
- 
 - First determine the boundaries: INT32_MIN ~ INT32_MAX
 - Then follow the steps of ternary search
 - Boundary considerations
- Estimate based on the formula - b / 2a, but it seems a bit suspicious
 - Determine the range based on the two solutions of the quadratic function = 0, but it complicates things
 - The efficiency of ternary search is at the logN level, and the size of the boundary range does not significantly affect it
 
 
 - 
 
Tips#
- 
 - Top Ten Classic Sorting Algorithms (Animated Demonstration) - cnblogs
 - [Top Ten Classic Sorting Algorithms Animation and Analysis, just watch me] (https://www.cxyxiaowu.com/2026.html) - Learn algorithms in five minutes
 
Course Notes#
- Preview heap and priority queue, as well as the content of disjoint sets in advance