Phrase Frequency Counter

Overview

  • Takes the output of the autocomplete tools as input
  • Produces a list of phrases sorted by frequency, with the number of times each phrase occurs in the input file
  • Also produces a list of the top N phrases (N = 20)
  • All of the scripts are written in Python
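The counting step can be sketched with `collections.Counter`. This is a minimal sketch, not the scripts' actual code. It assumes one phrase per line of input, because the autocomplete tools' real output format is not shown here. The function names `count_phrases`, `sorted_frequencies` and `top_n` are hypothetical, and the sample phrases are toy data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_phrases(lines: Iterable[str]) -> Counter:
    """Count each phrase, assuming one phrase per line of input."""
    counts: Counter = Counter()
    for line in lines:
        phrase = line.strip()
        if phrase:  # ignore blank lines
            counts[phrase] += 1
    return counts


def sorted_frequencies(counts: Counter) -> List[Tuple[str, int]]:
    """Every phrase with its count, most frequent first (ties alphabetical)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def top_n(counts: Counter, n: int = 20) -> List[Tuple[str, int]]:
    """The n most frequent phrases; the slides use N = 20."""
    return sorted_frequencies(counts)[:n]


if __name__ == "__main__":
    # Toy input standing in for autocomplete output (hypothetical data).
    sample = ["foo bar", "foo baz", "foo bar", "qux", "foo baz", "foo bar"]
    counts = count_phrases(sample)
    print(sorted_frequencies(counts))  # [('foo bar', 3), ('foo baz', 2), ('qux', 1)]
    print(top_n(counts, n=2))          # [('foo bar', 3), ('foo baz', 2)]
```

The top-N list is a slice of the full sorted list, so both outputs share one ordering. The alphabetical tie-break makes that ordering deterministic.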

PHP & Python

  • Two versions
  • They differ because the input data are delimited differently
  • The versions are separate and can be used independently
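Because the two versions differ in how their input is delimited, one way to picture this is a parser that takes the delimiter as a parameter. This is a sketch under assumptions. The slides do not say which delimiter each version uses, so the newline and tab values below are placeholders. `split_phrases` and `count_delimited` are hypothetical names.

```python
from collections import Counter
from typing import List


def split_phrases(raw: str, delimiter: str) -> List[str]:
    """Split raw tool output on `delimiter`, dropping empty pieces.

    The delimiter is a parameter because the versions differ in how their
    input is delimited; which delimiter each real input uses is an assumption.
    """
    pieces = (piece.strip() for piece in raw.split(delimiter))
    return [piece for piece in pieces if piece]


def count_delimited(raw: str, delimiter: str) -> Counter:
    """Phrase counts for one delimiter convention."""
    return Counter(split_phrases(raw, delimiter))


if __name__ == "__main__":
    # Hypothetical inputs: the same phrases, newline- vs. tab-delimited.
    newline_input = "foo bar\nfoo baz\nfoo bar\n"
    tab_input = "foo bar\tfoo baz\tfoo bar"
    print(count_delimited(newline_input, "\n"))
    print(count_delimited(tab_input, "\t"))
```

Once the phrases are split out, the counting is identical for both conventions. Only the splitting step changes.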

PHP

frequencycount.py

toptermfinder.py

  • Consolidates the top N phrases from every input file into a single file and finds the phrases with the most occurrences overall
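The consolidation step can be sketched as summing each phrase's counts across the per-file top-N lists and then re-ranking. This is an illustrative sketch only. It assumes each list is a sequence of `(phrase, count)` pairs. `consolidate` and the sample lists are hypothetical and do not come from toptermfinder.py itself.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Ranked = List[Tuple[str, int]]


def consolidate(top_lists: Iterable[Ranked], n: int = 20) -> Ranked:
    """Merge per-file top-N lists by summing each phrase's counts, then re-rank."""
    totals: Counter = Counter()
    for ranked in top_lists:
        for phrase, count in ranked:
            totals[phrase] += count
    # Highest combined count first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    file_a = [("foo bar", 5), ("foo baz", 3)]  # hypothetical top-N list
    file_b = [("foo baz", 4), ("qux", 2)]      # hypothetical top-N list
    print(consolidate([file_a, file_b], n=2))  # [('foo baz', 7), ('foo bar', 5)]
```

Each input is already cut off at N. A phrase that falls just outside the top N in some files therefore contributes nothing from those files, so the combined counts are lower bounds rather than exact totals.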

Python

frequencycount_forpython.py

PHP version

[Screenshots: sample input file and its output]

$ python frequencycounter.py <input> <output>
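A command line of the form `<input> <output>` could be wired up as below. This is a sketch, not the script's real code. It assumes one phrase per input line and writes `phrase<TAB>count` lines. That output layout is a guess, since the sample output is not reproduced here. `count_file`, `write_counts` and `main` are hypothetical names.

```python
import sys
from collections import Counter
from pathlib import Path
from typing import List


def count_file(input_path: Path) -> Counter:
    """Count phrases in a file, assuming one phrase per line."""
    with input_path.open(encoding="utf-8") as handle:
        return Counter(line.strip() for line in handle if line.strip())


def write_counts(counts: Counter, output_path: Path) -> None:
    """Write 'phrase<TAB>count' lines, most frequent first (assumed layout)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    with output_path.open("w", encoding="utf-8") as handle:
        for phrase, count in ranked:
            handle.write(f"{phrase}\t{count}\n")


def main(argv: List[str]) -> int:
    """Entry point mirroring `python frequencycounter.py <input> <output>`."""
    if len(argv) != 2:
        print("usage: python frequencycounter.py <input> <output>", file=sys.stderr)
        return 2
    write_counts(count_file(Path(argv[0])), Path(argv[1]))
    return 0
```

Saved as a script, this would end with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, so a wrong argument count returns exit status 2.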

PHP (continued)

[Screenshots: another test file in the same directory, and its output]

PHP

Converted top-N output (N = 20)

toptermfinder.py reads all of these top-N lists from one directory and finds the most common phrases across them

Example directory: USArest
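Reading every list in a directory such as USArest can be sketched with `pathlib`. This sketch assumes each top-N file holds `phrase<TAB>count` lines and ends in `.txt`. Both are assumptions, not details given in the slides. `read_ranked_file` and `top_terms_in_directory` are hypothetical helpers.

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def read_ranked_file(path: Path) -> Counter:
    """Parse one top-N list, assuming 'phrase<TAB>count' on each line."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            phrase, _, count = line.rpartition("\t")
            counts[phrase] += int(count)
    return counts


def top_terms_in_directory(directory: Path, n: int = 20,
                           pattern: str = "*.txt") -> List[Tuple[str, int]]:
    """Sum counts over every file matching `pattern` and return the top n."""
    totals: Counter = Counter()
    for path in sorted(directory.glob(pattern)):
        totals.update(read_ranked_file(path))  # Counter.update adds counts
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Under these assumptions, `top_terms_in_directory(Path("USArest"))` would return the 20 phrases with the highest combined counts. `rpartition` splits on the last tab, so phrases that contain spaces are kept intact.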

Python

$ python frequencycounter_forpython.py <input1> <input2> <output>

[Screenshots: sample input1 and input2 files]
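The slides do not say how input1 and input2 relate. This sketch assumes one plausible reading: the phrases from both files are pooled into one count. It also assumes one phrase per line. `parse_args`, `pooled_counts` and `main` are hypothetical names. The three positional arguments mirror the command shown above, and `argparse` prints a usage message and exits with status 2 if any of them is missing.

```python
import argparse
from collections import Counter
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Positional arguments matching `<input1> <input2> <output>`."""
    parser = argparse.ArgumentParser(
        description="Pool phrase counts from two input files (sketch).")
    parser.add_argument("input1", type=Path)
    parser.add_argument("input2", type=Path)
    parser.add_argument("output", type=Path)
    return parser.parse_args(argv)


def pooled_counts(*paths: Path) -> Counter:
    """Count phrases across all given files, one phrase per line (assumed)."""
    totals: Counter = Counter()
    for path in paths:
        with path.open(encoding="utf-8") as handle:
            totals.update(line.strip() for line in handle if line.strip())
    return totals


def main(argv: Optional[List[str]] = None) -> None:
    """Write pooled 'phrase<TAB>count' lines, most frequent first."""
    args = parse_args(argv)
    counts = pooled_counts(args.input1, args.input2)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    with args.output.open("w", encoding="utf-8") as handle:
        for phrase, count in ranked:
            handle.write(f"{phrase}\t{count}\n")
```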

Python (continued)

The output is similar to that of the PHP version.
