CIT 594 Module 4 Programming Assignment

In this assignment, you will write a program that analyzes the sentiment (positive or negative) of a sentence based on the words it contains by implementing methods that use the List, Set, and Map interfaces from the Java Collections Framework.

Learning Objectives

In completing this assignment, you will:

●    Become familiar with the methods in the java.util.List, java.util.Set, and java.util.Map interfaces

●    Continue working with abstract data types by using only the interface of an implementation

●    Apply what you have learned about how lists, sets, and maps work

●    Get a better understanding of the difference between lists and sets

●    Demonstrate that you can use lists, sets, and maps to solve real-world problems

●    Gain experience writing Java code that reads an input file

Background

Sentiment analysis is a task from the field of computational linguistics that seeks to determine the general attitude of a given piece of text. For instance, we would like to have a program that could look at the text “This assignment was joyful and a pleasure” and realize that it was a positive statement while “It made me want to pull out my hair” is negative.

For more on sentiment analysis in the context of this assignment see the supplemental document provided along with these directions.

Definitions for this assignment:

Valid Line: (in the context of reading the input corpus) a line starting with an optional sign character (- or +) and a single digit representing a valid score (integers from -2 to 2, inclusive), followed by a single whitespace character, followed by a statement.

Statement: a string that may be empty and may contain 0 or more whitespace separated tokens each of which may be a word.

Sentence: An Object of type Sentence contains a text String that is the textual statement, as well as an integer sentiment score.

Token: All of the non-whitespace characters between whitespace characters or at the beginning or end of a sentence/statement.

Word: A token whose first character is a letter. Any additional characters may be letters or any other non-whitespace character.

Letter: any character for which the method java.lang.Character.isLetter returns true.

https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isLetter-char-

Whitespace: any character for which Character.isWhitespace returns true.

https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-char-

ObservationTally: An accumulator object of the ObservationTally class (ObservationTally.java) that holds a word’s accumulated context scores from all of its appearances analyzed so far. An ObservationTally’s count is the total number of times the word has been seen so far, and its total is the sum of the sentiment scores from every occurrence of the word seen so far.
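The Word, Letter, and Token definitions above can be checked with a single test on a token's first character. The following is a minimal sketch only: the class WordCheckSketch and the helper isWord are hypothetical names used for illustration and are not part of the starter code.

```java
final class WordCheckSketch {
    // Hypothetical helper (not in the starter code): a token counts as a word
    // when it is non-empty and its first character is a letter.
    static boolean isWord(String token) {
        return token != null
                && !token.isEmpty()
                && Character.isLetter(token.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isWord("hello")); // true
        System.out.println(isWord("it's"));  // true: later characters may be any non-whitespace
        System.out.println(isWord("4ever")); // false: starts with a digit
        System.out.println(isWord("."));     // false: punctuation only
    }
}
```

Only the first character is inspected, which matches the definition that any additional characters may be letters or any other non-whitespace character.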

Getting Started

Download the starter code files: Sentence.java, ObservationTally.java and Analyzer.java. All tasks for the assignment should be written in Analyzer.java; do not modify the other two (grading will use the original versions, not yours).

You may also download reviews.txt, which should be placed in the base directory of your project, unlike the Java files, which should be placed in the source directory of your project. (Later, Analyzer.java should be uploaded to your submit folder on Codio.) This file can be used for testing your readFile method.

Activities

General note: for each activity the method should return a sensible output even if the input is invalid. For methods that return a collection of some sort, the return for bad input should just be an empty collection. For methods with numerical return types, the default return value should be 0.

Bad items in the input, such as null strings or non-word tokens in non-null strings, should be ignored, and the method should continue processing subsequent valid items.

When processing words, they should be converted to lower case to simplify case-insensitive comparison. Tokens and words should not be altered in any other way.

1. Implement Analyzer.readFile

This method takes as input a (nullable) filename, reads the given file from the filesystem, and returns a non-null List of Sentence objects parsed from the valid lines of the file in the order in which they are encountered. Invalid lines should be ignored and not entered into the output list. If the input filename is null or the file cannot be opened for reading, this method should return an empty List. For the return object you are free to select from any class that implements java.util.List.

For an explanation of how to read a file line by line see:

https://docs.oracle.com/javase/tutorial/essential/io/file.html#textfiles

Valid lines are defined above, and in regular expression syntax the exact definition of the line is:

"^(?<score>[+-]?[0-2])\\s(?<text>.*)$"

The documentation for this regular expression syntax may be found here:

https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

Note: You are not required to use regular expressions (nor are you prohibited from using them) for this assignment; the expression here is provided as a formal exact definition. If you have any questions about what is or is not valid, please test with that expression before asking. You can test regular expressions on regex101.com and in Java using jshell.

Note: the first whitespace character on the line is the separator between the score and the text. That character should not be considered part of either the score or the text. String.split using a limit of 2 (i.e. line.split("\\s", 2);) is one easy way to separate the line into those two components.

https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-int-
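As an illustration of the valid-line definition, the sketch below applies the regular expression given above with java.util.regex and compares the result with String.split using a limit of 2. The class ValidLineSketch and the helper parse are hypothetical names chosen for this example. It is one possible way to check a line, not a required approach.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ValidLineSketch {
    // The pattern from this handout, written as a Java string literal.
    static final Pattern VALID_LINE =
            Pattern.compile("^(?<score>[+-]?[0-2])\\s(?<text>.*)$");

    // Hypothetical helper: returns {score, text} for a valid line, or null otherwise.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        Matcher m = VALID_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("score"), m.group("text") };
    }

    public static void main(String[] args) {
        String line = "2 I am learning a lot .";
        String[] parts = parse(line);
        // Integer.parseInt accepts an optional leading '+' or '-'.
        int score = Integer.parseInt(parts[0]);
        System.out.println(score + " -> [" + parts[1] + "]"); // 2 -> [I am learning a lot .]
        System.out.println(parse("3 out of range") == null);  // true
        System.out.println(parse("2") == null);               // true: no separator
        // split with a limit of 2 yields the same two pieces for a valid line.
        System.out.println(Arrays.equals(parts, line.split("\\s", 2))); // true
    }
}
```

Note that only the first whitespace character is consumed as the separator, so any further leading whitespace stays in the text unchanged.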

For a valid line such as:

2 I am learning a lot .

then the score field of the Sentence object should be set to 2, and the text field should be:

I am learning a lot .

Evaluation will be based on exact String matching; do not alter the statement text.

2. Implement Analyzer.allOccurrences

This method takes as input a (nullable) List of (nullable) Sentence objects and outputs a (non-null) List containing every word, converted to lowercase, encountered in the input List in the order in which it was encountered. Null Sentence objects and invalid words (i.e. tokens not starting with a letter) should both be ignored. If the input parameter is null, the output should be an empty List. You may select any implementation of java.util.List for the output.

3. Implement Analyzer.uniqueWords

This method is identical to Analyzer.allOccurrences, except that the output should be in the form of any implementation of java.util.Set; in other words, the output will not have duplicates and insertion order need not be preserved.
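To make the difference between the List and Set outputs concrete, here is a small sketch that works on a plain String rather than on Sentence objects. The class ListVersusSetSketch and the helper words are hypothetical names, and splitting on "\\s+" is just one simple tokenizing choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ListVersusSetSketch {
    // Hypothetical helper: the lowercased tokens of the text that start with a letter.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        if (text == null) {
            return result;
        }
        for (String token : text.split("\\s+")) {
            // split can produce an empty first token when the text starts with whitespace
            if (!token.isEmpty() && Character.isLetter(token.charAt(0))) {
                result.add(token.toLowerCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = words("The dog chased the other dog .");
        // LinkedHashSet is used here only so the printed order is predictable;
        // any Set implementation removes the duplicates.
        Set<String> unique = new LinkedHashSet<>(all);
        System.out.println(all);    // [the, dog, chased, the, other, dog]
        System.out.println(unique); // [the, dog, chased, other]
    }
}
```

The List keeps every occurrence in encounter order, while the Set keeps each distinct word once.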

4. Implement Analyzer.wordTallies

This method takes as input a (nullable) List of (nullable) Sentence objects and outputs a non-null Map, whose keys are the (valid, lowercased) words in each input Sentence and whose values are the final context scores, represented by ObservationTally, for that word. If the input List of Sentences is null or empty, the method should return an empty Map. If a Sentence object in the input List is null, this method should ignore it and continue processing the remaining Sentences. You may return any implementation of java.util.Map.

Note: if a word appears in multiple sentences or multiple times in the same sentence, its corresponding ObservationTally object should accrue multiple scores, one for each occurrence.

Note: While your output keys should be lowercased, do not assume that the strings in the Sentence objects have already been converted to lowercase. Do not make any other alterations or assumptions or interpretations about the tokens or words.

Hint: if you use String.split to tokenize the text of the Sentence, keep the pattern as simple as possible. Consult the Java documentation for help with this:

●   https://docs.oracle.com/javase/8/docs/api/java/lang/String.html
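The accumulation described in this activity can be sketched with Map.computeIfAbsent. Because this document does not list the methods of ObservationTally, the example uses a hypothetical stand-in class named Tally with count and total fields. The real class's API may differ, so treat this as an illustration of the Map usage only.

```java
import java.util.HashMap;
import java.util.Map;

final class TallySketch {
    // Hypothetical stand-in for ObservationTally; the real class may differ.
    static final class Tally {
        int count; // number of times the word has been seen
        int total; // sum of the scores of those occurrences
    }

    // Adds one observation of `score` for every valid word in `text`.
    static void accumulate(Map<String, Tally> tallies, String text, int score) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty() || !Character.isLetter(token.charAt(0))) {
                continue; // not a word
            }
            // Create the tally on first sight of the word, then update it.
            Tally tally = tallies.computeIfAbsent(token.toLowerCase(), k -> new Tally());
            tally.count++;
            tally.total += score;
        }
    }

    public static void main(String[] args) {
        Map<String, Tally> tallies = new HashMap<>();
        accumulate(tallies, "Good food , good service", 2);
        accumulate(tallies, "the food was cold", -1);
        Tally food = tallies.get("food");
        System.out.println(food.count + " " + food.total); // 2 1
        Tally good = tallies.get("good");
        System.out.println(good.count + " " + good.total); // 2 4
    }
}
```

The word "good" appears twice in one sentence and is tallied twice, and "food" accrues one score from each of the two sentences.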

5. Implement Analyzer.calculateScores

This method takes a (nullable) Map from (non-null) word to (non-null) ObservationTally and outputs a non-null Map with the original word as key and the word’s average sentiment score as value. If the input Map is null, return an empty Map. You may return any implementation of java.util.Map.

For this method, use the ObservationTally’s calculateScore method to get the average sentiment score for that word from its previously recorded context scores, and then place the text of the word (as key) and calculated score (as value) in the new Map.
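The transformation from one Map to another can be sketched by iterating over entrySet. Only the method name calculateScore comes from this document; the stand-in class Tally, its constructor, and the assumption that the score is a double equal to total divided by count are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class AverageScoreSketch {
    // Hypothetical stand-in for ObservationTally; only the name calculateScore
    // is taken from the handout, the rest is assumed for illustration.
    static final class Tally {
        final int count;
        final int total;

        Tally(int count, int total) {
            this.count = count;
            this.total = total;
        }

        double calculateScore() {
            // Cast before dividing so that 1 / 2 gives 0.5 rather than 0.
            return count == 0 ? 0.0 : (double) total / count;
        }
    }

    // Builds a new map from each word to its average score.
    static Map<String, Double> averages(Map<String, Tally> tallies) {
        Map<String, Double> result = new HashMap<>();
        if (tallies == null) {
            return result;
        }
        for (Map.Entry<String, Tally> entry : tallies.entrySet()) {
            result.put(entry.getKey(), entry.getValue().calculateScore());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Tally> tallies = new HashMap<>();
        tallies.put("food", new Tally(2, 1));
        tallies.put("good", new Tally(2, 4));
        Map<String, Double> scores = averages(tallies);
        System.out.println(scores.get("food")); // 0.5
        System.out.println(scores.get("good")); // 2.0
    }
}
```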

6. Implement Analyzer.calculateSentenceScore

This method takes as input a (nullable) Map from (non-null) words to (non-null) sentiment scores as well as an arbitrary statement text and, using the sentiment scores for each word in this Map, outputs the sentiment score for the given statement text, which is the arithmetic mean score of all its (valid) words. Words in the input text that are not present in the input Map should have a default sentiment score of 0.

Note: each occurrence of a word counts towards the mean (i.e. do not filter out duplicates).

Note: you will need to tokenize/split/filter the sentence to its valid words, as you did previously.

Your calculateSentenceScore method must be case insensitive. Recall that, to ensure case insensitivity, you normalized all words by converting them to lowercase. Accordingly, for this method, you may assume that the keys given in the input Map are all lowercase words.

If the input Map is null or empty, or if the input sentence is null or empty or does not contain any valid words, this method should return 0.
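The arithmetic mean described in this activity can be worked through with Map.getOrDefault. In the sketch below, the class SentenceScoreSketch, the helper meanScore, and the choice of Double as the Map's value type are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class SentenceScoreSketch {
    // Hypothetical helper: mean score over all valid words in the text,
    // counting words that are missing from the map as 0.
    static double meanScore(Map<String, Double> scores, String text) {
        if (scores == null || scores.isEmpty() || text == null) {
            return 0.0;
        }
        double sum = 0.0;
        int words = 0;
        for (String token : text.split("\\s+")) {
            if (token.isEmpty() || !Character.isLetter(token.charAt(0))) {
                continue; // not a word, so it does not affect the mean
            }
            sum += scores.getOrDefault(token.toLowerCase(), 0.0);
            words++;
        }
        // Avoid dividing by zero when the text has no valid words.
        return words == 0 ? 0.0 : sum / words;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("good", 2.0);
        scores.put("bad", -1.0);
        // Words: good, good, movie, bad -> (2 + 2 + 0 - 1) / 4
        System.out.println(meanScore(scores, "Good good movie , bad")); // 0.75
        System.out.println(meanScore(scores, ", . !"));                 // 0.0
    }
}
```

The comma is not a word and is left out of the count, while "movie" is a word that is absent from the Map and therefore contributes 0 to the sum but still counts toward the denominator.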

General Hints

Documentation about the methods in the List, Set, and Map interfaces is available as part of the Java API docs:

●   https://docs.oracle.com/javase/8/docs/api/java/util/List.html

●   https://docs.oracle.com/javase/8/docs/api/java/util/Set.html

●   https://docs.oracle.com/javase/8/docs/api/java/util/Map.html

Refer to this documentation if you need help understanding the methods that are available to you.

In implementing this program, we recommend that you implement and test each of the six methods individually. Each method is required to tolerate partially or entirely invalid input and return valid output. We also recommend you test the entire program using the main method in Analyzer.java. Be sure to specify the name of the input file as the argument to main.

Before You Submit

Please be sure that:

●   your Analyzer class is in the default package, i.e. there is no “package” declaration at the top of the source code

●   your Analyzer class compiles and you have not changed the signatures of any of the six methods you implemented

●   you did not add other methods with the same name as the existing methods in Analyzer.java

●   you have not created any additional .java files and have not made any changes to Sentence.java or ObservationTally.java (you do not need to submit these files)

●    any new methods you added have unique names that do not conflict with the existing methods, even if the input arguments are different

●   you filled in and signed the academic integrity statement at the top of Analyzer.java

How to Submit

After you have finished implementing the Analyzer class, go to the “Module 4 Programming Assignment Submission” item and click the “Open Tool” button to go to the Codio platform.

Once you are logged into Codio, read the submission instructions in the README file. Be sure you upload your code to the “submit” folder.

To test your code before submitting, click the “Run Test Cases” button in the Codio toolbar.

As in the previous assignment, this will run some but not all of the tests that are used to grade this assignment. That is, there are “hidden tests” on this assignment!

The test cases we provide here are “sanity check” tests to make sure that you have the basic functionality working correctly, but it is up to you to ensure that your code satisfies all of the requirements described in this document. Just because your code passes all the tests when you click “Run Test Cases”, that doesn’t mean you’d get 100% if you submitted the code for grading!

When you click “Run Test Cases,” you’ll see quite a bit of output, even if all tests pass, but at the bottom of the output you will see the number of successful test cases and the number of failed test cases.

You can see the name and error messages of any failing test cases by scrolling up a little to the “Failures” section.

You must manually submit when you are done. Your code will not be automatically submitted at the deadline.

Assessment

This assignment is scored out of a total of 96 points.

●    readFile is worth 22 points. Note that some of the input files used for grading are available in the “tests” folder in Codio; the others are not made available prior to submission.

●   The remaining methods are evaluated based purely on their own implementation, and do not assume a correctly functioning readFile method.

●    allOccurrences is worth 7 points

●    uniqueWords is worth 6 points

●    wordTallies is worth 27 points

●    calculateScores is worth 12 points

●    calculateSentenceScore is worth 22 points

As noted above, the tests that are executed when you click “Run Test Cases” are not all of the tests that are used for grading. There are “hidden” tests for each of the methods described here.

After submitting your code for grading, you can go back to this assignment in Codio and view the “results.txt” file, which should be listed in the Filetree on the left. This file will describe any failing test cases.

Optional Challenges

Use try-with-resources to simplify file reading

Read the Oracle tutorial on try-with-resources:

https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html

Try to make use of this Java construct in your code, but make sure you understand it well. It’s easy to misuse, leading to unexpected behavior in your code.
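As a starting point for this challenge, the sketch below reads every line of a file inside a try-with-resources statement. The class ReadLinesSketch, the helper readLines, and the use of UTF-8 are assumptions for illustration. It returns raw lines only and does not parse them into Sentence objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ReadLinesSketch {
    // Hypothetical helper: returns the lines of the file, or an empty list
    // when the filename is null or the file cannot be opened.
    static List<String> readLines(String filename) {
        List<String> lines = new ArrayList<>();
        if (filename == null) {
            return lines;
        }
        // The reader declared in the parentheses is closed automatically when
        // the try block exits, whether normally or through an exception.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(filename), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException | InvalidPathException e) {
            // Assumption: keep whatever was read before the failure.
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints 0 unless a file with this name exists in the working directory.
        System.out.println(readLines("no-such-file.txt").size());
    }
}
```

One point to watch is that the resource is already closed by the time the catch block runs, so the reader cannot be used there.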

Use lambdas and the Stream API to simplify your code

If you’re not familiar with lambdas or streams (sometimes called sequences), this challenge will have a huge learning curve, but it will also introduce you to a completely different model of programming that is sometimes used in modern Java, and very often used in programming languages like Scala (which also compiles to the JVM), JavaScript, and others.

The high-level goal of this challenge is to re-write your code into a style that uses almost no loops or other imperative constructs by making each of the six required functions into a pipeline of aggregate operations over a stream.

When done consistently and with reasonably good style, the resulting code will likely have no more than 2 loops across the entire file. (These might be actual explicit for/while loops, or they could be .forEach() functions.)

Start here with learning about lambda expressions: https://dev.java/learn/lambda-expressions/

Then move on to learning about the Stream API: https://dev.java/learn/the-stream-api/
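To give a flavor of the stream style, here is a small sketch that turns a statement into its lowercased words without an explicit loop. The class StreamWordsSketch and the helper words are hypothetical names. The example operates on a plain String and is not a full solution to any of the six methods.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class StreamWordsSketch {
    // Hypothetical helper: a pipeline of filter and map steps over the tokens.
    static List<String> words(String text) {
        if (text == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(text.split("\\s+"))
                .filter(token -> !token.isEmpty())                    // drop the empty leading token, if any
                .filter(token -> Character.isLetter(token.charAt(0))) // keep only words
                .map(String::toLowerCase)                             // normalize case
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words("Streams are FUN , 4 sure")); // [streams, are, fun, sure]
    }
}
```

Each step describes what happens to every token, and the collect call at the end gathers the results into a List.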