CSCI-1200 Data Structures — Fall 2022 Homework 4
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
CSCI-1200 Data Structures — Fall 2022
Homework 4 — Visual Difference Lists
In this assignment we will compute and visualize the differences between two text files. We will use this assignment as a chance to review and practice using the STL list container class, as well as Big ’O’ Notation and Recursion. Please carefully read the entire assignment before beginning your implementation.
To get started, let’s study the sequence of words in these two input files:
input_original .txt
the quick brown fox jumped over the lazy dogs |
input_revised .txt
the quick fox jumps over the big lazy dogs |
We can see 3 differences between the files: the word“brown”has been erased early in the sentence, the word “jumped”has been replaced by the word“jumps”, and finally the word“big”has been inserted near the end of the sentence. We will represent these differences in the following file format:
output_original_revised .diff
2 ERASE 4 REPLACE "jumps" 7 INSERT "big" |
The operations above are necessary and sufficient to transform the original text into the revised text. Each operation has an integer which indicates the position in the original file where the operation is performed. Given the original file and the operations to transform original to revised, we can also prepare the inverse list of operations necessary to go the opposite way (transform the revised text to the original text):
output_revised_original .diff
2 INSERT "brown" 3 REPLACE "jumped" 6 ERASE |
We can visualize the edits discussed above using HTML background highlighting and any modern web browser (Chrome, Firefox, Safari, Edge, etc.). The pink areas have been erased or replaced, and the green areas have been replaced or inserted. The grey regions add space to visually align the portions of the files that match.
In the example above, the unit for comparison was words separated by whitespace. But we can also do the comparison per character as illustrated in the example below, using the first paragraph of the poem Still I Rise by Maya Angelou. These are the operations necessary to remove all of the punctuation and replace the uppercase letters with the lowercase versions:
output_still_i_rise_char .diff
0 REPLACE "y" 33 REPLACE "w" 49 ERASE 63 ERASE 65 REPLACE "y" 98 REPLACE "b" 107 ERASE 118 ERASE 120 REPLACE "i" 121 ERASE 129 ERASE |
This visualization technique is very helpful in collaborative software development projects. When one member of the team proposes changes to the code base, their requested changes can be visualized for review and approval by the other team members using a visual difference. Here is a small example where we consider the input as whole lines instead of words or characters as in the above examples.
You can see more complex code difference visualizations on GitHub, https://github.com/, which hosts many open-source software development projects, including Submitty. Here’s a example pull request from one of the Submitty developers which changes multiple files in the code base:
https://github.com/Submitty/Submitty/pull/7128/files
Interactive and Incremental Commands for Testing and Debugging
We provide a framework of code to read and write most of the different input, output, and visualization files for this homework, but you must write several key functions to complete the program. You may not modify any of the provided code. All of your work will be in two new files, solution .h and solution .cpp. Please study all of the provided code and the example input and output files posted on the course web site.
The program accepts commands interactively from std::cin (the keyboard), but typically we will run the program by redirecting a text file of commands on the command line. See also “Redirecting Input & Output” from http://www.cs.rpi.edu/academics/courses/fall22/csci1200/programming_information.php. Here is a example file of commands to visualize the first example in this handout:
requests .txt
compute_diff WORD input_original .txt input_revised .txt output_original_revised .diff render_diff WORD input_original .txt output_original_revised .diff output_original_revised .html |
And here is the same example, saving more of the intermediate steps for inspection and debugging: requests_debugging .txt
compute_diff WORD input_original .txt input_revised .txt output_original_revised .diff apply_diff WORD input_original .txt output_original_revised .diff output_applied .txt assert_same WORD input_revised .txt output_applied .txt invert_diff WORD input_original .txt output_original_revised .diff output_inverted .diff apply_diff WORD input_revised .txt output_inverted .diff output_reapplied .txt assert_same WORD input_original .txt output_reapplied .txt compute_diff WORD input_revised .txt input_original .txt output_revised_original .diff assert_same_diff WORD output_inverted .diff output_revised_original .diff render_diff WORD input_original .txt output_original_revised .diff output_original_revised .html |
The main function in main .cpp will parse these commands, read the necessary input files, call several functions you will implement, and write intermediate or final results to files (functions in input_output .cpp and render .cpp). You should make your own test cases for this assignment, and incrementally test each operation and inspect every output file. You must implement the missing functions, writing the function prototypes in a file named solution .h and implementing the body of the functions in solution .cpp.
This is how you will compile and run your program:
g++ main .cpp input_output .cpp render .cpp solution .cpp -Wall -Wextra -std=c++11 -o run .out ./run .out < requests .txt
./run .out < requests_debugging .txt
Study the expected output files for this first example posted on the course website.
Multiple Solutions: Using Recursion to Find the Minimum Edit Distance
The most complex operation is the compute_diff function. We recommend saving the implementation of this function for last. Why is this a complicated problem? There are multiple valid solutions – multiple different combinations of insert, erase, and replace operations that when applied to the first file result in the second file. In particular, the problem is non-trivial if two or more adjacent/neighboring chars/words/lines must be erased, inserted, or replaced – as shown in the example below. Here are the visualizations of five different, valid ways to edit the original file – all resulting in the same revised2 file:
input_original .txt
the quick brown fox jumped over the lazy dogs |
input_revised2 .txt
yesterday the foolish and debatably quick brown fox jumped over dogs |
output_prioritize_erase .diff
0 ERASE 1 ERASE 2 ERASE 3 ERASE 4 ERASE 5 ERASE 6 ERASE 7 ERASE 8 ERASE 9 INSERT "yesterday" 9 INSERT "the" 9 INSERT "foolish" 9 INSERT "and" 9 INSERT "debatably" 9 INSERT "quick" 9 INSERT "brown" 9 INSERT "fox" 9 INSERT "jumped" 9 INSERT "over" 9 INSERT "dogs" |
output_prioritize_erase .diff
output_prioritize_insert .diff
0 INSERT "yesterday" 1 INSERT "foolish" 1 INSERT "and" 1 INSERT "debatably" 6 INSERT "dogs" 6 ERASE 7 ERASE 8 ERASE |
output_prioritize_replace .diff
0 REPLACE "yesterday" 1 REPLACE "the" 2 REPLACE "foolish" 3 REPLACE "and" 4 REPLACE "debatably" 5 REPLACE "quick" 6 REPLACE "brown" 7 REPLACE "fox" 8 REPLACE "jumped" 9 INSERT "over" 9 INSERT "dogs" |
output_default .diff
0 INSERT "yesterday" 1 REPLACE "foolish" 2 REPLACE "and" 3 REPLACE "debatably" 4 REPLACE "quick" 5 REPLACE "brown" 6 REPLACE "fox" 7 REPLACE "jumped" 8 INSERT |
2022-10-07
Visual Difference Lists