Hashmaps & Frequency Counting
Hashmaps & Frequency Counting
Problem Scenario / Typical Use Case
Imagine you are analyzing word occurrences in a large document, and you need to find the most frequent words.
Iterating over the list multiple times to count occurrences is inefficient. Hashmaps & Frequency Counting provide an efficient way to track counts and frequencies in a single pass.
Basic Idea
A hashmap (dictionary) stores key-value pairs, where the key is the element of interest and the value is its count or frequency.
Typical operations include:
- Counting occurrences of numbers, characters, or words.
- Detecting duplicates or missing elements.
- Grouping elements by frequency or type.
Common techniques:
- Single-pass counting.
- Combining hashmap with heaps or sorting to extract top-k elements.
- Using hashmap for lookups to reduce time complexity from O(n²) to O(n).
Advantages
- Efficiency: Frequency counting in O(n) time.
- Versatility: Works for arrays, strings, or more complex objects.
- Supports Complex Queries: Can combine with top-k extraction, sliding windows, or pattern detection.
Limitations / Considerations
- Extra Space Required: O(n) in the worst case for storing counts.
- Hashmap operations rely on good hash functions; collisions can reduce efficiency.
- Order of elements is not guaranteed unless using ordered maps.
Practical Examples / Thought Triggers
- Most Frequent Words: Count frequencies and extract top-k words.
- Detect Duplicates: Quickly check if an element has appeared before.
- Sliding Window Frequency Analysis: Count elements within a moving window.
Thought Triggers:
- Do I need to maintain order while counting, or just frequency?
- Can I combine hashmap with heap or sorting for top-k queries?
- Is the dataset static or streaming? Streaming may require incremental updates.
Implementation Example
Here’s a Python example for counting word frequencies:
from collections import defaultdict
def word_frequencies(words):
freq = defaultdict(int)
for word in words:
freq[word] += 1
return freq
# Example usage
document = ["data", "science", "data", "algorithm", "science", "data"]
frequencies = word_frequencies(document)
print(frequencies)
# Output: {'data': 3, 'science': 2, 'algorithm': 1}
This demonstrates single-pass counting using a hashmap for efficient frequency tracking.
Related Articles
- Top-K Elements / Kth Largest/Smallest Elements: Combine frequency maps with heaps to find the most frequent elements.
- Sliding Window: Frequency counting can be applied in windowed analysis.
- Hashmaps Fundamentals: Understanding collisions, key-value operations, and iteration is essential.