Given a text file, the task is to find all words that are exactly ‘n’ characters long. This can be useful in text analysis, data cleaning, word filtering, or pattern-matching applications.
Example: (Myfile.txt)
Hello, how are you? This is a simple text.
Input: n=3
Output: ['how', 'are', 'you']
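To follow along, the sample file can be created first. The snippet below is a minimal sketch that assumes the file lives in the current working directory under the name Myfile.txt, as in the example; the variable names `sample` and `path` are illustrative only.

```python
from pathlib import Path

# Sample sentence taken from the example above.
sample = "Hello, how are you? This is a simple text."

# Write it to Myfile.txt in the current working directory (assumed location).
path = Path("Myfile.txt")
path.write_text(sample, encoding="utf-8")

# Read it back to confirm the content.
print(path.read_text(encoding="utf-8"))
```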
Using Regular Expressions
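Regular expressions match patterns in text. The pattern \b\w{n}\b matches a run of exactly n word characters (letters, digits, or underscores) with a word boundary on each side, so punctuation such as commas and question marks is never part of a match.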
import re
fp = "Myfile.txt"
n = 3
# Read the whole file into a single string
with open(fp, 'r') as f:
    text = f.read()
# Build the pattern, e.g. \b\w{3}\b for n = 3
pattern = r'\b\w{' + str(n) + r'}\b'
words = re.findall(pattern, text)
print(f"Words containing {n} characters:")
print(words)
Output:
Words containing 3 characters:
['how', 'are', 'you']
Explanation:
- text = f.read(): Reads entire file content.
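- pattern = r'\b\w{' + str(n) + r'}\b': Builds the pattern \b\w{n}\b, which matches words with exactly n characters. \b marks a word boundary, and \w{n} matches exactly n word characters, so only whole words of length n are matched.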
- re.findall(pattern, text): Returns a list of all matching words.
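Because \w also matches digits and underscores, tokens such as 101 or a_b count as three-character words with the pattern above. If only alphabetic words are wanted, one possible variation is sketched below. The helper name `letter_words` and the sample sentence are hypothetical and not part of the original example.

```python
import re

def letter_words(text, n):
    """Return words made of exactly n letters (hypothetical helper)."""
    # [^\W\d_] means: a word character that is neither a digit nor an
    # underscore, which leaves letters only.
    pattern = rf'\b[^\W\d_]{{{n}}}\b'
    return re.findall(pattern, text)

sample = "Hello, how are you? Room 101 has a_b tag."
print(letter_words(sample, 3))  # ['how', 'are', 'you', 'has', 'tag']
```

With the plain \w{3} pattern, the same sentence would also return '101' and 'a_b'.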
Using Split and List Comprehension
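This method splits the text on whitespace, strips punctuation from the ends of each word, and filters the words by length using a list comprehension. The stripping step matters: split() alone leaves tokens such as 'you?' intact, which would be counted as four characters.
import string
fp = "Myfile.txt"
n = 3
with open(fp, 'r') as f:
    text = f.read()
# Strip leading/trailing punctuation so "you?" is treated as "you"
w1 = [w.strip(string.punctuation) for w in text.split()]
w2 = [w for w in w1 if len(w) == n]
print(f"Words containing {n} characters:")
print(w2)
Output:
Words containing 3 characters:
['how', 'are', 'you']
Explanation:
- text.split(): Splits the text into tokens on whitespace.
- w.strip(string.punctuation): Removes punctuation from the start and end of each token.
- [w for w in w1 if len(w) == n]: Keeps only the words whose length is exactly n.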
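One thing to keep in mind with whitespace splitting is that punctuation stays attached to the tokens, which changes their length. The sketch below works on an in-memory string and compares the result with and without trimming punctuation. The helper `words_by_split` and its `strip_punct` parameter are hypothetical names introduced for illustration.

```python
import string

def words_by_split(text, n, strip_punct=True):
    """Hypothetical helper: split on whitespace, optionally trim punctuation."""
    words = text.split()
    if strip_punct:
        # Remove punctuation from both ends of each token.
        words = [w.strip(string.punctuation) for w in words]
    return [w for w in words if len(w) == n]

sample = "Hello, how are you? This is a simple text."
print(words_by_split(sample, 3, strip_punct=False))  # ['how', 'are']
print(words_by_split(sample, 3))                     # ['how', 'are', 'you']
```

Without trimming, 'you?' has four characters and is filtered out.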
Using a Generator Function
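A generator function reads the file line by line and yields each matching word as it is found, so neither the whole file nor the full result list has to be held in memory. This makes it suitable for very large files.
import string
fp = "Myfile.txt"
n = 3
def words_of_length(fp, n):
    # Yield each word of exactly n characters, one at a time
    with open(fp, 'r') as f:
        for line in f:
            for w in line.split():
                w = w.strip(string.punctuation)  # "you?" -> "you"
                if len(w) == n:
                    yield w
for w in words_of_length(fp, n):
    print(w)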
Output:
how
are
you
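Because a generator produces words lazily, a caller can stop early without scanning the rest of the file. The sketch below illustrates this with itertools.islice. It defines a generator named `words_of_length` so the block is self-contained and writes a small temporary file whose contents are made up for the demonstration.

```python
import itertools
import os
import string
import tempfile

def words_of_length(path, n):
    """Yield words of exactly n characters, reading the file line by line."""
    with open(path, 'r') as f:
        for line in f:
            for w in line.split():
                w = w.strip(string.punctuation)
                if len(w) == n:
                    yield w

# Create a throwaway file with illustrative content (an assumption).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Hello, how are you?\nThis is a simple text.\nThe cat sat.\n")
    tmp_path = tmp.name

gen = words_of_length(tmp_path, 3)
# Take only the first four matches; later lines are read only if needed.
first_four = list(itertools.islice(gen, 4))
gen.close()  # Stop the generator, which also closes the file.
os.remove(tmp_path)

print(first_four)  # ['how', 'are', 'you', 'The']
```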