+1 (315) 557-6473 

Create a Program to Implement Map ADT in Java Assignment Solution.


Instructions

Objective
Write a program to implement map ADT in java.

Requirements and Specifications

Background and Introduction:
Word counting is used in many applications, including text analysis, text indexing, text compression, and cryptography. In this programming assignment, you will read a text file and discover what words appear in that file and how many times each word appears.
In this assignment, you will create a dictionary (Map), inserting each word (String) that appears in the input text file into the dictionary. The dictionary will be implemented as a Map using classes from the JCF (TreeMap and HashMap). You will then determine which words appeared in the file most frequently. You will run the code on 2 input files of different sizes and different content to see which data structure performs better in practice.
Learning Objectives:
  • Increase your familiarity with the Map ADT.
  • Conduct experiments to compare the performance of data structures and their implementations.
  • Understand how different data structures and different inputs can affect application performance.
Tasks:
Write a java assignment program named WordCount.java to perform the tasks described below. (You may create additional classes to help perform this task).
  • The program should prompt for the name of a text file, read the words in the file, and build a List of those words. The program should also ask the user how many of the most frequently occurring words they would like the program to report about. For instance, the user might choose to ask for information about the top 20 most frequent words in the file. (This first part of your program is not timed.)
(The next steps in your program should be timed. Repeat the next steps 10 times and capture the ‘best’ time. Notes about how to capture the run time of an algorithm are provided on Canvas.) 2) Place the words from the word List into a TreeMap of frequency counts for the words.
  • Your program should then extract the top 'N' most frequent words (the number of words requested by the user of the program) and the frequency counts for those words.
(As stated above, you should do steps 2 and 3 in a loop. Run the loop 10 times. In your code, record how long each iteration of the loop requires. In your code, save the ‘best’ (fastest) time.)
Next, repeat the above process using a HashMap instead of a TreeMap.
  • Your program must display the total number of words in the file, the top 'N' words and the count for each of those words, and the elapsed times for the TreeMap and HashMap implementations. If there are ties in the frequency counts for several words, then your program should break ties by listing the tied words alphabetically.
NOTE: You will experiment with 2 different text files to see how the performance of TreeMap and HashMap differ. As an example of a large piece of text, The Adventures of Sherlock Holmes is provided in the project archive (ASH.txt). Some other large text files were also provided for testing your program.
These files were obtained from Project Gutenburg, whose website is http://www.gutenberg.org/. You must also run your test on the largest source code file in your project.
Make the program flexible.
  1. It should be easy for a user to select any text file to process
  2. It should be easy for a user to select the size of the top ‘N’ list
Your program might prompt for the user to enter these choices OR
Your program might accept these choices as command line parameters OR
(For 5% Extra credit: Practice your GUI skills and build a simple GUI for this program. Your GUI must provide controls for the user to select the input file and the top 'N' value. If you use JFileChooser, then it should initially open in the project (where the provided sample input files are), not in some other location on the user’s computer.
Getting Started:
Download the project archive file from Canvas. Import the project into Eclipse. Rename the project by changing ‘username’ to your
What to turn in:
Turn in your project as an archive file on Canvas as you have done for previous assignments.
Turn in an executive summary on Canvas in a file called username-assignment4.txt where 'username' is your The executive summary will be a bit different this time.
Your summary must include the top 10 most frequent words in the ASH.txt file and the count for each of those words. Also include a description of how you extracted the top 10. (What was your algorithm? Which data structure(s) did you use?). The algorithm may use any classes from the JCF (not just HashMap and/or TreeMap).
Your summary must indicate the elapsed time that it took for TreeMap and for HashMap. Do NOT include the time for file IO in your empirical tests. (File IO is MUCH slower than any operations in memory.) Repeat the tests for both of your input files. Indicate the elapsed time when using TreeMap and when using HashMap. Finally, indicate any conclusions that you have reached about the relative performance of these two implementations of the Map ADT.
Grading: 15% executive summary; 60% external correctness and empirical tests; 25% internal correctness. Part of the internal correctness grade will be based on the efficiency of your algorithm for extracting the top 'N' words.
NOTE: You must design your word count program carefully so that it does not undercount the frequency of words. For instance, the following Strings should all be counted as the same word: (case and punctuation should be ignored)
  • hello
  • Hello
  • "hello"
  • hello,hello?
Can you think of other situations which might produce an undercount? If you try to handle EVERY possible condition you will have a difficult time. Perhaps as part of your code development process you could display the contents of your Map and scan it for obvious issues such as those shown above. I do NOT expect you to handle every case, but I do expect you to handle the most common cases.
I suggest that you build several small ‘test’ input files to check the correct functioning of your program.
If you would like to teach yourself something very useful and powerful to help with "String matching", then explore the use of regular expressions. (Use of regular expressions is NOT strictly required in this project, but it can make it easier to do the "String matching" and can help make your code cleaner.) The textbook we are using includes an appendix about regular expressions.
Additional links to information about regular expressions in Java:
http://docs.oracle.com/javase/tutorial/essential/regex/
Sample program output
(You may format your program output differently, but your program must provide the same information):
This program will display a list of the 'n' most frequently occurring words found in an input text file.
This program will also display the time in milliseconds
for the process of placing the words into a Map of frequency counts and then extracting the top 'n'.
The program displays the elapsed time for the algorithm using both HashMap and TreeMap.
This program will prompt for the name of a file and the size 'n' for the most frequent list.
Please enter the name of a text file to read : abc
File not found. Please try again.
Please enter the name of a text file to read : Address.txt
Word list size: 271
Please enter a number between 1 and 100 for the top 'n' list: 10
Best elapsed time to populate the HashMap and extract the top 'n': 0.22 ms. Top 10 list: that (13) the (11) we (10) here (8) to (8) a (7) and (6) can (5) for (5) have (5)
Best elapsed time to populate the TreeMap and extract the top 'n': 0.27 ms.
Top 10 list: that (13) the (11) we (10) here (8) to (8) a (7) and (6) can (5) for (5) have (5)
Source Code
package programs;
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;
import structures.HashMapWordsDictionary;
import structures.TreeMapWordsDictionary;
import structures.WordsDictionary;
public class WordCount {
/**
* Removes all leading and trailing punctuation and makes sure all words will be
* in lower case.
*
* @param word Word to sanitize
* @return Sanitized word
*/
private static String normalizeWord(String word) {
word = word.toLowerCase();
while (!word.isEmpty() && !Character.isAlphabetic(word.charAt(0)))
word = word.substring(1);
while (!word.isEmpty() && !Character.isAlphabetic(word.charAt(word.length() - 1)))
word = word.substring(0, word.length() - 1);
return word;
}
/**
* Load the words from file. We need to store in-memory because loading from
* file is slow and shouldn't be part of measuring performance.
*
* @param filename Name of file where to load words
* @return List of words
* @throws Exception Exception when failed to process file
*/
private static ArrayList loadWords(String filename) throws Exception {
ArrayList words = new ArrayList<>();
Scanner inFile = new Scanner(new File(filename));
while (inFile.hasNext()) {
String word = normalizeWord(inFile.next());
if (!word.isEmpty())
words.add(word);
}
inFile.close();
return words;
}
/**
* Test the functionality of a dictionary.
*
* @param dictionary Dictionary to be tested
* @param words List of words to load into the dictionary
* @param topN Top N words to retrieve
* @throws Exception when failed to process file
*/
private static void testWordsDictionary(WordsDictionary dictionary, ArrayList words, int topN)
throws Exception {
dictionary.clear();
// Load the words
for (String word : words)
dictionary.addWord(word);
// Get the top N words:
System.out.println("Top " + topN + " list:");
for (String word : dictionary.getTopFrequentWords(topN))
System.out.println(word);
}
/**
* Get the best time to populate and extract the top 'n' of a dictionary
*
* @param dictionary Dictionary to be measured
* @param filename Name of file where to load data
* @param topN Top N words to retrieve
* @throws Exception When failed to process file
* @return Elapsed time in milliseconds
*/
private static double measureWordsDictionary(WordsDictionary dictionary, ArrayList words, int topN)
throws Exception {
long bestElapsedTime = 0;
// Measure the process 10 times and get the best
for (int i = 0; i < 10; i++) {
dictionary.clear();
long startTime = System.currentTimeMillis();
for (String word : words)
dictionary.addWord(word);
dictionary.getTopFrequentWords(topN);
long elapsedTime = System.currentTimeMillis() - startTime;
if (i == 0 || elapsedTime < bestElapsedTime)
bestElapsedTime = elapsedTime;
}
return (double) bestElapsedTime / 1000;
}
/**
* Entry point of the program.
*
* @param args Unused arguments
*/
public static void main(String[] args) throws Exception {
Scanner in = new Scanner(System.in);
// Read input configurations
System.out.print("Please enter the name of a text file to read: ");
String filename = in.nextLine();
System.out.print("Please enter a number between 1 and 100 for the top 'n' list: ");
int topN = Integer.parseInt(in.nextLine());
if (topN < 1 || topN > 100) {
System.out.println("Error: Invalid top 'n'.");
return;
}
// Load the words in memory
ArrayList words = loadWords(filename);
System.out.println("Word list size: " + words.size());
System.out.println();
// Measure the hash map words dictionary
HashMapWordsDictionary hashMapWords = new HashMapWordsDictionary();
System.out.println("Best elapsed time to populate the HashMap and extract the top 'n': "
+ String.format("%.2f", measureWordsDictionary(hashMapWords, words, topN)) + "ms");
testWordsDictionary(hashMapWords, words, topN);
System.out.println();
// Measure the tree map words dictionary
TreeMapWordsDictionary treeMapWords = new TreeMapWordsDictionary();
System.out.println("Best elapsed time to populate the TreeMap and extract the top 'n': "
+ String.format("%.2f", measureWordsDictionary(treeMapWords, words, topN)) + "ms");
testWordsDictionary(treeMapWords, words, topN);
}
}