Top-K Elements (Heap / QuickSelect)
Top-K Elements (Heap / QuickSelect)
Problem Scenario / Typical Use Case
Imagine you are analyzing sales data for thousands of products, and you want to find the top 5 best-selling items.
Naively sorting the entire dataset is inefficient when you only need a few elements. The Top-K Elements pattern provides techniques to efficiently extract the largest or smallest k elements without full sorting.
Basic Idea
There are two main approaches:
- Heap-Based Approach: Use a min-heap of size k to maintain the top k elements while iterating through the dataset.
- QuickSelect (Partition-Based): Use a divide & conquer selection method to find the k-th largest element and partition the array.
Typical operations include:
- Finding top/bottom performers (sales, scores, metrics).
- Real-time leaderboards.
- K largest or smallest numbers in streams or arrays.
Advantages
- Efficiency: Reduces the need for full sorting; heaps achieve (O(n \log k)), QuickSelect achieves (O(n)) on average.
- Memory Controlled: Min-heap size is limited to k, saving space for large datasets.
- Versatility: Works for static arrays and dynamic streams with small adjustments.
Limitations / Considerations
- Heap approach requires log k time per insertion, which may be costly for very large k.
- QuickSelect has O(n²) worst-case if pivot selection is poor.
- For dynamic datasets (frequent insertions/deletions), heaps are preferable over QuickSelect.
Practical Examples / Thought Triggers
- Top 5 Best-Selling Products: Maintain a min-heap to track current top sellers.
- Leaderboard Scores: Extract top k scores efficiently in real-time.
- K Smallest/Largest Elements in Streams: Combine heap with sliding window techniques.
Thought Triggers:
- Is the dataset static or dynamic? This affects choice between heap and QuickSelect.
- How large is k relative to n? Small k favors heap, large k may favor sorting.
- Can this combine with hashmaps for counting frequencies before extracting top k?
Implementation Example
Here’s a Python example using a min-heap to find the top k elements:
import heapq
def top_k_elements(arr, k):
min_heap = []
for num in arr:
if len(min_heap) < k:
heapq.heappush(min_heap, num)
else:
if num > min_heap[0]:
heapq.heapreplace(min_heap, num)
return min_heap
# Example usage
sales = [100, 200, 150, 400, 250, 50]
top_3 = top_k_elements(sales, 3)
print(top_3)
# Output: [200, 400, 250] (heap order may vary)
This demonstrates the heap-based top-k extraction, maintaining a running collection of the largest elements efficiently.
Related Articles
- Kth Largest/Smallest Elements: QuickSelect is often used for these problems.
- Hashmaps & Frequency Counting: Useful when top-k is determined by frequency rather than value.
- Heap Fundamentals: Understanding min-heap and max-heap operations is essential.