Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CSCI-1200 Data Structures — Fall 2022

Homework 4 — Visual Difference Lists

In this assignment we will compute and visualize the differences between two text files.  We will use this  assignment as a chance to review and practice using the STL  list container class,  as well as Big O’ Notation and Recursion. Please carefully read the entire assignment before  beginning your implementation.

To get started, let’s study the sequence of words in these two input files:

input_original .txt

the quick brown fox jumped over the lazy dogs

input_revised .txt

the quick fox jumps over the big lazy dogs

We can see 3 differences between the files: the word“brown”has been erased early in the sentence, the word “jumped”has been replaced by the word“jumps”, and finally the word“big”has been inserted near the end of the sentence. We will represent these differences in the following file format:

output_original_revised .diff

2 ERASE

4  REPLACE  "jumps"

7 INSERT "big"

The operations above are necessary and sufficient to transform the original text into the revised text. Each operation has an integer which indicates the position in the original file where the operation is performed. Given the original file and the operations to transform original to revised, we can also prepare the inverse list of operations necessary to go the opposite way (transform the revised text to the original text):

output_revised_original .diff

2 INSERT  "brown"

3 REPLACE  "jumped"

6 ERASE

We can visualize the edits discussed above using HTML background highlighting and any modern web browser (Chrome, Firefox, Safari, Edge, etc.). The pink areas have been erased or replaced, and the green areas have been replaced or inserted. The grey regions add space to visually align the portions of the files that match.

In the example above, the unit for comparison was words separated by whitespace. But we can also do the comparison per  character as illustrated in the example below, using the first paragraph of the poem Still I Rise by Maya Angelou. These are the operations necessary to remove all of the punctuation and replace the uppercase letters with the lowercase versions:

output_still_i_rise_char .diff

0 REPLACE  "y"

33  REPLACE  "w"

49  ERASE

63  ERASE

65 REPLACE  "y"

98  REPLACE  "b"

107  ERASE

118  ERASE

120 REPLACE "i"

121  ERASE

129  ERASE

This visualization technique is very helpful in collaborative software development projects. When one member of the team proposes changes to the code base, their requested changes can be visualized for review and approval by the other team members using a visual difference.  Here is a small example where we consider the input as whole lines instead of words or characters as in the above examples.

You can see more complex code difference visualizations on GitHub, https://github.com/, which hosts many open-source software development projects, including Submitty.  Here’s a example pull request from one of the Submitty developers which changes multiple files in the code base:

https://github.com/Submitty/Submitty/pull/7128/files

Interactive and Incremental Commands for Testing and Debugging

We provide a framework of code to read and write most of the different input, output, and visualization files for this homework, but you must write several key functions to complete the program. You may not modify any of the provided code. All of your work will be in two new files, solution .h and solution .cpp. Please study all of the provided code and the example input and output files posted on the course web site.

The program accepts commands interactively from std::cin (the keyboard), but typically we will run the program by redirecting a text file of commands on the command line. See also “Redirecting Input & Output” from http://www.cs.rpi.edu/academics/courses/fall22/csci1200/programming_information.php.     Here is a example file of commands to visualize the first example in this handout:

requests .txt

compute_diff WORD input_original .txt input_revised .txt output_original_revised .diff                    render_diff WORD input_original .txt output_original_revised .diff output_original_revised .html

And here is the same example, saving more of the intermediate steps for inspection and debugging: requests_debugging .txt

compute_diff WORD input_original .txt input_revised .txt output_original_revised .diff apply_diff WORD input_original .txt output_original_revised .diff output_applied .txt  assert_same WORD input_revised .txt output_applied .txt

invert_diff WORD input_original .txt output_original_revised .diff output_inverted .diff apply_diff WORD input_revised .txt output_inverted .diff output_reapplied .txt

assert_same WORD input_original .txt output_reapplied .txt

compute_diff WORD input_revised .txt input_original .txt output_revised_original .diff assert_same_diff WORD output_inverted .diff output_revised_original .diff

render_diff WORD input_original .txt output_original_revised .diff output_original_revised .html

The main function in main .cpp will parse these commands, read the necessary input files, call several functions you will implement, and write intermediate or final results to files (functions in  input_output .cpp and render .cpp). You should make your own test cases for this assignment, and incrementally test each operation and inspect every output file. You must implement the missing functions, writing the function prototypes in a file named solution .h and implementing the body of the functions in solution .cpp.

This is how you will compile and run your program:

g++ main .cpp  input_output .cpp  render .cpp  solution .cpp  -Wall  -Wextra  -std=c++11  -o  run .out ./run .out  < requests .txt

./run .out  < requests_debugging .txt

Study the expected output files for this first example posted on the course website.

Multiple Solutions: Using Recursion to Find the Minimum Edit Distance

The most complex operation is the compute_diff function.  We recommend saving the implementation of this function for last.  Why is this a complicated problem?  There are multiple  valid solutions   – multiple different combinations of insert, erase, and replace operations that when applied to the first file result in the second file. In particular, the problem is non-trivial if two or more adjacent/neighboring chars/words/lines must be erased, inserted, or replaced as shown in the example below.  Here are the visualizations of five different, valid ways to edit the original file all resulting in the same revised2 file:

input_original .txt

the quick brown fox jumped over the lazy dogs

input_revised2 .txt

yesterday the foolish and debatably quick brown fox jumped over dogs

output_prioritize_erase .diff

0 ERASE

1 ERASE

2 ERASE

3 ERASE

4 ERASE

5 ERASE

6 ERASE

7 ERASE

8 ERASE

9 INSERT "yesterday"

9 INSERT "the"

9 INSERT "foolish"

9 INSERT "and"

9 INSERT "debatably"

9 INSERT "quick"

9 INSERT "brown"

9 INSERT "fox"

9 INSERT "jumped"

9 INSERT "over"

9 INSERT "dogs"

output_prioritize_erase .diff

output_prioritize_insert .diff

0 INSERT "yesterday"

1 INSERT "foolish"

1 INSERT "and"

1 INSERT "debatably"

6 INSERT "dogs"

6 ERASE

7 ERASE

8 ERASE

output_prioritize_replace .diff

0 REPLACE "yesterday"

1 REPLACE "the"

2 REPLACE "foolish"

3  REPLACE "and"

4 REPLACE "debatably"

5 REPLACE "quick"

6 REPLACE  "brown"

7 REPLACE "fox"

8 REPLACE  "jumped"

9 INSERT "over"

9 INSERT "dogs"

output_default .diff

0 INSERT "yesterday"

1 REPLACE "foolish"

2  REPLACE "and"

3 REPLACE "debatably"

4 REPLACE "quick"

5 REPLACE  "brown"

6 REPLACE "fox"

7 REPLACE  "jumped"

8 INSERT