Unit 4: Searching and Sorting
1. Searching
Searching is the process of finding a particular element (the "key") within a collection of elements (like an array or list).
Linear Search
This is the simplest search algorithm. It sequentially checks each element in the list until the target key is found or the end of the list is reached.
- Algorithm:
  - Start from the first element (index 0).
  - Compare the current element with the key.
  - If they match, return the current index (element found).
  - If they don't match, move to the next element.
  - If the end of the list is reached, return -1 (element not found).
- Prerequisite: None. The list does not need to be sorted.
- Time Complexity:
- Best Case: O(1) (Key is the first element).
- Worst Case: O(n) (Key is the last element or not present).
- Average Case: O(n).
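The steps above can be sketched as a short Python function (a minimal version; it returns the index of the first match):

```python
def linear_search(arr, key):
    """Scan arr left to right; return the index of key, or -1 if absent."""
    for i, value in enumerate(arr):
        if value == key:
            return i          # found: best case O(1) when key is first
    return -1                 # reached the end: worst case O(n)
```

Note that the list needs no preprocessing: linear search works on unsorted data, which is exactly the trade-off against binary search below.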
Binary Search
This is a much more efficient search algorithm, but it has one major prerequisite: the list must be sorted.
It works on the "Divide and Conquer" principle.
- Algorithm (Iterative):
  - Set low = 0 and high = n - 1 (where n is the number of elements).
  - Loop while low <= high:
    - Calculate mid = floor((low + high) / 2).
    - Compare the element at arr[mid] with the key:
      - If arr[mid] == key, return mid (element found).
      - If arr[mid] < key, the key must be in the right half. Set low = mid + 1.
      - If arr[mid] > key, the key must be in the left half. Set high = mid - 1.
  - If the loop finishes (low > high), return -1 (element not found).
- Prerequisite: The array must be sorted.
- Time Complexity:
- Best Case: O(1) (Key is the middle element).
- Worst Case: O(log n) (Key is at the end or not present).
- Average Case: O(log n).
Common Mistake: A common bug is integer overflow when calculating the midpoint. mid = low + (high - low) / 2 is a safer way to write mid = (low + high) / 2 for very large arrays.
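Putting the iterative algorithm and the safe midpoint together, a minimal Python sketch (Python integers cannot overflow, but the form is kept for consistency with C/Java):

```python
def binary_search(arr, key):
    """Iterative binary search on a sorted list. Returns an index or -1."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = low + (high - low) // 2   # overflow-safe form of (low + high) // 2
        if arr[mid] == key:
            return mid                  # found
        elif arr[mid] < key:
            low = mid + 1               # key must be in the right half
        else:
            high = mid - 1              # key must be in the left half
    return -1                           # low > high: not present
```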
Comparison of Linear and Binary Search

| Feature | Linear Search | Binary Search |
|---|---|---|
| Prerequisite | None (works on unsorted data) | Array must be sorted |
| Best Case | O(1) | O(1) |
| Worst Case | O(n) | O(log n) |
| Average Case | O(n) | O(log n) |
2. Sorting
Sorting is the process of arranging elements in a collection in a specific order (e.g., ascending or descending).
Key Sorting Concepts (In-Place vs. Out-of-Place, Stable vs. Unstable)
- In-Place Sort: An algorithm that sorts the data using only a small, constant amount of extra memory (O(1) space). It modifies the original array directly. (e.g., Bubble, Selection, Insertion, Quick Sort).
- Out-of-Place Sort: An algorithm that requires extra memory (e.g., O(n) space) to hold temporary data. (e.g., Merge Sort).
- Stable Sort: A sorting algorithm is stable if it preserves the relative order of equal-valued elements.
Example: If you have (5, 'apple') and (5, 'banana') and you sort by number, a stable sort will guarantee that 'apple' comes before 'banana' in the output. An unstable sort might swap them.
Stable: Bubble, Insertion, Merge Sort.
Unstable: Selection, Quick Sort.
Bubble Sort
The simplest sorting algorithm. It repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. This "bubbles" the largest elements to the end.
- Algorithm:
  - Outer loop from i = 0 to n-2.
  - Inner loop from j = 0 to n-i-2.
    - Compare arr[j] and arr[j+1].
    - If arr[j] > arr[j+1], swap them.
(Can be optimized by adding a flag to stop if no swaps occur in an inner loop pass, meaning the array is already sorted).
- Time Complexity:
- Best Case: O(n) (if optimized and array is already sorted).
- Worst/Average Case: O(n²)
- Space: O(1) (In-place).
- Stable: Yes.
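The loops above, including the early-exit flag optimization, can be sketched in Python:

```python
def bubble_sort(arr):
    """Bubble sort, in place, with the swapped-flag optimization."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - i - 1):          # last i elements are already in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                     # full pass with no swaps: sorted
            break                           # this gives the O(n) best case
    return arr
```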
Selection Sort
This algorithm divides the list into a sorted sublist (at the beginning) and an unsorted sublist.
- Algorithm:
  - Outer loop from i = 0 to n-2. (i is the boundary of the sorted sublist.)
  - Assume min_index = i.
  - Inner loop from j = i+1 to n-1 to find the minimum element in the unsorted sublist.
    - If a smaller element is found, update min_index.
  - After the inner loop, swap arr[i] with arr[min_index].
- Time Complexity: O(n²) (Best, Worst, and Average). It always performs (n-1) passes.
- Space: O(1) (In-place).
- Stable: No (by default).
Selection Sort is useful when memory writes are "expensive," as it performs the minimum possible number of swaps (O(n) swaps).
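A minimal Python sketch of the algorithm; note there is at most one swap per outer-loop pass:

```python
def selection_sort(arr):
    """Selection sort, in place: at most one swap per pass (O(n) swaps total)."""
    n = len(arr)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):           # find the minimum of the unsorted part
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr
```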
Insertion Sort
Builds the final sorted array one item at a time. It's like sorting a hand of playing cards. It takes an element from the unsorted part and "inserts" it into its correct position in the sorted part.
- Algorithm:
  - Outer loop from i = 1 to n-1. (The element at arr[i] is the key to be inserted.)
  - Store key = arr[i].
  - Inner loop (j = i - 1) moves backward (j >= 0).
    - While arr[j] > key, shift the element: arr[j+1] = arr[j] and decrement j.
  - When the loop stops, insert the key: arr[j+1] = key.
- Time Complexity:
- Best Case: O(n) (If the array is already sorted).
- Worst/Average Case: O(n²)
- Space: O(1) (In-place).
- Stable: Yes.
Insertion sort is very efficient for small datasets and for datasets that are almost sorted.
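The shifting-and-inserting steps translate directly into Python:

```python
def insertion_sort(arr):
    """Insertion sort, in place: grow a sorted prefix one element at a time."""
    for i in range(1, len(arr)):
        key = arr[i]                        # next element to insert
        j = i - 1
        while j >= 0 and arr[j] > key:      # shift larger elements right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                    # drop key into its slot
    return arr
```

On an already-sorted input the inner while loop never runs, which is where the O(n) best case comes from.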
Merge Sort
A "Divide and Conquer" algorithm. It's one of the most efficient sorting algorithms.
- Algorithm:
- Divide: Recursively divide the array into two halves until you have 1-element arrays (which are trivially sorted).
- Conquer: Merge the sorted subarrays back together. The merge function is the core:
- It takes two sorted subarrays.
- It creates a temporary array.
- It compares the first elements of both subarrays, copies the smaller one to the temp array, and advances that subarray's pointer.
- It repeats this until one subarray is empty, then copies the rest of the other subarray.
- Finally, it copies the merged (and sorted) data from the temp array back into the original array.
- Time Complexity: O(n log n) (Best, Worst, and Average).
- Space: O(n) (Out-of-place, for the temporary array).
- Stable: Yes.
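A compact recursive sketch (this version returns a new list rather than copying back into the original array, which makes the O(n) extra space explicit):

```python
def merge_sort(arr):
    """Merge sort: recursively split, then merge two sorted halves."""
    if len(arr) <= 1:                       # 1-element arrays are trivially sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    merged, i, j = [], 0, 0                 # temporary array for the merge step
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:             # <= (not <) keeps the sort stable
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])                 # one side is empty: copy the rest
    merged.extend(right[j:])
    return merged
```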
Quick Sort
Another "Divide and Conquer" algorithm. It is often the fastest comparison-based sort in practice.
- Algorithm:
- Divide: Pick an element from the array as the pivot (e.g., first, last, or random).
- Conquer: Partition the array. Reorder the array so that all elements smaller than the pivot are to its left, and all elements greater are to its right. The pivot is now in its final sorted position.
- Recurse: Recursively apply Quick Sort to the two subarrays (the one to the left of the pivot and the one to the right).
- Time Complexity:
- Best/Average Case: O(n log n).
- Worst Case: O(n²) (Occurs if the pivot is always the smallest or largest element, e.g., on an already-sorted array).
- Space: O(log n) (In-place sort, but uses O(log n) stack space for recursion).
- Stable: No.
The worst case (O(n²)) is a major drawback. This can be mitigated by choosing a good pivot, such as a random element or the "median-of-three."
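An in-place sketch using the Lomuto partition scheme with a random pivot (one of the mitigations mentioned above; median-of-three would work similarly):

```python
import random

def quick_sort(arr, low=0, high=None):
    """Quick Sort, in place, with a randomized pivot."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)       # pivot lands in its final position
        quick_sort(arr, low, p - 1)         # recurse on the left subarray
        quick_sort(arr, p + 1, high)        # recurse on the right subarray
    return arr

def partition(arr, low, high):
    """Lomuto partition: random pivot avoids the O(n^2) sorted-input worst case."""
    r = random.randint(low, high)
    arr[r], arr[high] = arr[high], arr[r]   # move random pivot to the end
    pivot = arr[high]
    i = low - 1                             # boundary of the "smaller than pivot" zone
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1
```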
Comparison of Sorting Techniques

| Algorithm | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No |
Key Takeaway:
- For small or nearly sorted lists: Use Insertion Sort.
- For guaranteed O(n log n) performance and stability: Use Merge Sort.
- For the fastest average-case performance: Use Quick Sort (with a good pivot strategy).