Course Content#
Sorting#
- Four Quadrant Principle: Stable/Unstable, Internal/External
- Stable: The relative position of identical elements remains unchanged before and after sorting
- Internal: Requires loading all data into memory for sorting
[Consider sorting from smallest to largest]
Stable Sorting#
- Insertion (Internal)
- Front: Sorted area; Back: Unsorted area
- The numbers at the back find their position in the front to insert
- Compare and swap one by one to achieve the effect of [insertion]
- Not directly swapping with the insertion position, as this would disrupt the order of the sorted area
- Average time complexity: O(N^2)
- Bubble (Internal)
- Front: Unsorted area; Back: Sorted area
- Move from the first element backward, find the maximum element and place it at the front of the sorted area and the end of the unsorted area
- Perform N - 1 rounds of bubbling, adding one element to the sorted area each round
- [Optimization] If a round of bubbling does not perform any swap operations, it can be terminated directly
- Average time complexity: O(N^2)
- Best time complexity: O(N) ← already sorted
- Worst time complexity: O(N^2) ← completely reversed
- Insertion sort can be understood as reverse bubbling
- Merge (External)
- Introduction: Merging two sorted arrays (of lengths m and n) into one sorted array
- Time complexity: O(m + n)
- First divide and conquer until pairs can be compared, then merge the sorted arrays in pairs
- Time complexity (best, worst, average): O(NlogN)
- A total of logN layers, with a merging time of O(N) per layer
- [External Sorting] 2-way merge (above), k-way merge (multi-way merge, using heaps)
- Refer to External Merge Sort - Wikipedia, the sorting method for each route can be arbitrary, ultimately merging
- Whether this is external sorting depends on whether you want or need it, but it can be done
- Introduction: Merging two sorted arrays (of lengths m and n) into one sorted array
Unstable Sorting#
- Selection (Internal)
- Front: Sorted area; Back: Unsorted area
- Each round selects the smallest element from the unsorted area and swaps it with the first element of the unsorted area
- There may be cases of swapping itself with itself, so the XOR function cannot be used for swapping
- Unstable: For example, 22'1, after the first swap, 2 moves behind 2', resulting in 12'2
- Time complexity (best, worst, average): O(N^2)
- Generally better than bubble sort, the number of comparisons is similar, but bubble sort has too many swaps
- Quick Sort (Internal)
- [Pivot, partition]
- Take the first element as the pivot
- [Tail pointer] first looks for elements less than the pivot from the back to place in the position of the first element (which is empty), [Head pointer] then looks for values greater than the pivot from the front to place in the newly emptied position, looping
- Finally, the head and tail pointers coincide, pointing to an empty position, where the pivot is placed
- Then perform the above operation on the left and right parts of the pivot
- Time complexity: O(NlogN)
- When the first element is chosen as the pivot in a reverse order sequence, it degenerates into selection sort, O(N^2)
- [Optimization]
- Randomly select the pivot
- Reduce the use of recursion, use loops instead
- Swap only after finding a value to swap on both sides
- [Pivot, partition]
Searching#
- Naive Binary Search
- Premise: Monotonic sequence
- If the value to be searched exists multiple times in the sequence, binary search cannot distinguish which one is found
- Special Binary Search
- 11110000
- Introduce a virtual head pointer with an index of -1
- Mid calculation needs to add 1 to avoid infinite loops, such as in the case of 10
- 00001111
- Introduce a virtual tail pointer with an index of n [array range from 0 to n - 1]
- Refer to “Interview and Written Test Algorithms” - 2. Binary Search Topic
- Here, an additional concept of virtual pointer is introduced, which reflects the situation when a value cannot be found by changing the search boundaries [-1 ~ n -1 or 0 ~ n]
- 11110000
- Ternary Search
-
- Solves the problem of [finding the extreme points of concave and convex functions]
- Divides the scale of the original problem into one-third
- Update strategy: Retain the smaller value element and the interval of the other end; stopping condition: |m1 - m2| < ESPL
- Time complexity: O(log[3]N), slightly slower than binary search O(log[2]N)
-
Hash Table#
- A [data structure] used for searching
- Structure definition
- Requires a contiguous space to store elements
- Consistent with sequential lists, element types can vary
- Structure operations
- Hash function: Can map any type of element to an integer value [array index]
- Then a certain value can be stored at the corresponding array index
- Array indices can only be integers
- Conflict handling methods
- Open addressing [commonly used]: Look for empty positions ahead, such as quadratic probing... but can easily lead to data clustering issues
- Rehashing: Also known as hashing method, using multiple hash functions
- Chaining method [commonly used]: Use a linked list structure to store elements at each position
- Establish a common overflow area: Place conflicting elements in a specific area
- Hash function: Can map any type of element to an integer value [array index]
- Time complexity for searching: Approaches O(1)
- Other time-consuming factors: Hash function conversion, chaining method [searching linked list elements] or other conflict handling methods
- Only arrays and sequential lists are O(1)
- The space utilization rate of hash tables is generally 50-90%
- Hash functions will always have conflicts, so space needs to be reserved when opening
- When the utilization rate reaches 70%, it can be used industrially; fewer conflicts lead to higher utilization, and better hash functions
Code Demonstration#
Stable Sorting#
-
-
-
-
See comments and red box markings for details
-
To ensure the stability of stable sorting, pay attention to the case of ==, do not swap [or ensure the original order]!
-
[Note] When calling the TEST macro, the latter array is num, which is an array obtained by copying arr in the macro definition!
Unstable Sorting#
V1.0 Version: Selection Sort & Ordinary Quick Sort#
-
-
-
Focus on the quick sort algorithm, which has many optimization points
- [Self-optimization] The conditions num[] > z or num[] < z on lines 46 and 48 can add == conditions, so that when encountering equal elements, there will be no swap, which is unnecessary and can help ensure algorithm stability [although quick sort is unstable]
- [Points to optimize] Pivot selection, recursion → loop, swapping method
V2.0 Version: Optimized Quick Sort#
-
-
Implements the three points mentioned above for optimization
-
Pay attention to several boundary ==: x <= y, num[] < z, num[] > z
- ❓ How to understand the judgment of ==? See the following thinking point 2
-
Lines 39 and 40: Sort the right interval, and after sorting, reduce the right boundary
Speed Comparison of Two Versions of Quick Sort#
- Test both random sequences and reverse sequences
- Random sequence of size 200,000: The difference between the two is minimal
- Reverse sequence of size 100,000: The difference is significant
-
- For a reverse sequence of 200,000, the original version of quick sort will cause a stack overflow
-
Binary Search#
-
-
The use of virtual pointers can avoid returning false answers [0 or n - 1] when there is no answer [000000]
-
In the case of 11110000, when calculating mid, add 1 to avoid falling into an infinite loop, such as in the case of 10
-
For details, see “Interview and Written Test Algorithms” - 2. Binary Search Topic
Establishing a Hash Table for Strings#
-
Hash function: BKDR-Hash, conflict handling method: Chaining method
-
-
-
The head insertion method is used when inserting values
-
The utilization rate of the hash table structure cannot reach 100%, so the allocated space needs to be doubled
-
Using calloc to allocate hash table space can ensure that the head address of each position's linked list is set to null, ensuring safety
-
Hash functions can be designed by oneself
Thinking Points#
- [My understanding] To ensure the stability of stable sorting, pay attention to the case of ==, do not swap [or ensure the original order]!
- ❓ Thoughts on the boundaries in the optimized version of quick sort
-
- Similarly, according to the ordinary version of quick sort to optimize boundary judgments, the following optimization ideas have problems, while the ordinary version can be implemented
- Lines 32 - 33: Changing num[] < z and num[] > z to <= and >= will cause [segmentation fault or infinite loop]
- For num[] <= z [infinite loop]
- Example: 2 1, with the pivot value of 2, the left pointer x keeps moving to the right end, going out of bounds, while the right pointer y remains unchanged, line 27 while (l < r) holds, x returns to l, resulting in an infinite loop
- For num[] >= z [segmentation fault - stack overflow]
- Example: 1 2, with the pivot value of 1, the right pointer y keeps moving to the left end, becoming -1, while the left pointer x and r remain unchanged, line 39 quick_sort(num, x, r) keeps recursively calling, eventually causing a stack overflow
- Analysis of the reason: The ordinary version takes the pivot value out, while the optimized version keeps the pivot value inside, which needs to be considered
- For num[] <= z [infinite loop]
- Lines 32 - 34: Changing x <= y to x < y will also cause [segmentation fault]
- Example: 2 3, with the pivot value of 2, at this point x and y point to 2 and 3 respectively, after looping, both x and y point to 2, while x and r remain unchanged, line 39 quick_sort(num, x, r) keeps recursively calling, eventually causing a stack overflow
- Analysis of the reason: x and y need to move apart, otherwise it resembles the special binary search 111000 not +1 situation
- Summary: Both x and y need to move; if x does not move, it leads to stack overflow, and if y does not move, it leads to infinite loop
- [Personal understanding, welcome to discuss]
- Lines 32 - 33: Changing num[] < z and num[] > z to <= and >= will cause [segmentation fault or infinite loop]
-
- Ternary Search - Practice Problem
-
- First determine the boundaries: INT32_MIN ~ INT32_MAX
- Then follow the steps of ternary search
- Boundary considerations
- Estimate based on the formula - b / 2a, but it seems a bit suspicious
- Determine the range based on the two solutions of the quadratic function = 0, but it complicates things
- The efficiency of ternary search is at the logN level, and the size of the boundary range does not significantly affect it
-
Tips#
-
- Top Ten Classic Sorting Algorithms (Animated Demonstration) - cnblogs
- [Top Ten Classic Sorting Algorithms Animation and Analysis, just watch me] (https://www.cxyxiaowu.com/2026.html) - Learn algorithms in five minutes
Course Notes#
- Preview heap and priority queue, as well as the content of disjoint sets in advance