Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

CS 480 OPERATING SYSTEMS (SPRING 2023)

Assignment 01

Due: Beginning of class, 01/26/2023

Dictionary Tree Specifications

The dictionary tree is a tree storing a set of words (or strings), with each word stored along a      tree path. Starting from the child nodes of the root node, each tree node represents a single       character (or letter) of the word, or the terminator character of the word. Each node contains a collection of child nodes representing the next set of possible characters to continue the word   from the current node. A dictionary tree constructed this way supports very efficient search and insert operations, in O(K) time with K being the length of the word searched or inserted.

For this implementation, valid word characters are:

•    alphabet characters from a to z (note: big case letters A to Z would be equivalent to a to z, all operations should be case-insensitive, as well as the autograding part)

•    the ascii apostrophe 'character.

•    the ascii hyphen - character.

•    the ascii underscore _ character.

Below is a typical representation of a dictionary tree node in a C struct type (similar data contents can be defined as member variables in a class for C++ implementation):

#define NCHILD 30 /* a-z, ', -, _, terminator of word */

typedef struct dictNode

{

// Children nodes represent mapping to possible characters

// of a word and the terminator character of a word.

// Note the C string ends with a null \0 character.           // Essentially, the index of each node in the next[] dictNode* // array is mapped to a particular valid character

// or the terminator character.

// For example, say index 0 is mapped

// to character a’, index 29 is mapped to the terminator       // character. If the next character of the word is a, a new node // would be created and assigned to next[0]. Setting next[0]    // from a null pointer to a new node means setting the next     // character of the word to a’ .

// Note all nodes in the next[] are initialized with a null // pointer.

// After setting the last node corresponding to the last      // character of the word, to terminate a word, you would      // set the child node of the next [] of last character node   // at the terminator index position to a new node, to indicate // the end of the word.

struct dictNode *next[NCHILD];

}

Note: Both the big case and small case of a letter should correspond to the same index in the child node array.


A particular character can be mapped to any index of child node, it is your preference. In the   following example, both letters ‘A’ and ‘a’ correspond to index 0, while the \0 null terminator is mapped to index 29.

Figure 1 below shows a dictionary tree example. Non null pointers in the next array are shown  as arcs to child nodes, and the end of word indicator is shown for each node.  A label that is not part of the data structure is shown at the top of the node to illustrate how one reaches that node.

next[24] (y 24)

next[7] (h7)

next[29] (\029)

Figure 1 - Dictionary tree containing a number of words, note \0 is the null terminator of a C string

Implementation of Tree operations

You would not lose points if you do not follow the proposed implementation below. However, as this assignment is assessing your skills in C/C++ programming, particularly the pointers, your implementation MUST satisfy the following implementation requirements, violating any one of them would nullify your submission:

•    You must implement the pointer tree data structure, similar to what was presented above.

•   The data portion of the tree NODE representation MUST ONLY have the childrens pointers.

•    In tree operations, particularly the insert and search and their helpers (if you have), you MUST NOT use the STL (standard template library) std::string class.

You may use std::string class in other parts of your code.

With the representation of the dictionary tree structure, you need to implement the insert operation for inserting a word to the dictionary tree and the search operation for searching the tree to count the words that start with a specific string or prefix.

Below is a proposed code design for implementing the dictionary tree data structure and its operations:

•    dictionary.h (in C++ or C)

a.    Defines the data structure of the dictionary tree node described above, and the      signatures of its operations for inserting a word to the dictionary, and for searching and counting words starting with a string.

b.   Adds other supporting helper methods if needed.

c.    Add node method signature:

• In C++, bool add(const char * wordBeingInserted = nullptr);

• In C, bool add(struct dictNode *dictnode, const char * wordBeingInserted);

• Return value (boolean) would indicate whether the word is added / inserted to the tree successfully.

• Implementation tips:

1.   This method is initially called from the root node to insert the new word.

2.    Use recursive calls (or iterative loop) to traverse the tree structure to insert the word, one character at a time starting from the first    character (you need to convert each character from the word to an index of a child node), until the \0 null terminator is inserted.

d.   Search node operation signature for finding the node (in the tree) that ends a       prefix, i.e., the node that contains the last character of the prefix being searched:

• In C++, dictNode* findEndingNodeOfAStr(const char *strBeingSearched);

• In C, dictNode* findEndingNodeOfAStr(struct dictNode *dictnode, const char *strBeingSearched);

• Implementation tips:


1.   This method is initially called on the root node to start the search.

2.    Use recursive calls (or iterative loop) to traverse the tree structure to find the string, one character at a time starting from the first       character (you need to convert each character from the word to an index of a child node). Returns the node pointer that ends the string if found; otherwise, return NULL pointer.

3.   The strBeingSearched” argument is for passing the string being searched.

e.    Count word operation signature for counting the number of words (in the tree) that start from a node:

• In C++, void countWordsStartingFromANode(int &count);

• In C, void countWordsStartingFromANode(struct dictNode *dictnode, int &count);

• Implementation tips:

1.   This method is initially called on a starting node to start the counting.

2.    Starting from the dictnode, use recursive calls (or iterative loop) to traverse the whole sub-tree from the node (including the node) to count all words that start from the node. Use the count argument passed by reference to count.

f.     For searching the tree to count the words that start with a specific string or prefix:

• First call findEndingNodeOfAStr to find node that ends the string or prefix,

• Then call countWordsStartingFromANode to count the words starting from the node returned from findEndingNodeOfAStr

•    dictionary.cpp (for C++) or dictionary.c (for C) for implementing dictionary.h.

Main program

With the above dictionary tree implementation, you would then need to write the main program to:

a.    Read words from a dictionary text file and insert them to a dictionary tree.

b.    Read words from test text file, and for each word read in: search, count, and print the number of words in the dictionary tree that start with it.

Suppose you put the main program code in a file countwords.cpp (C++) or countwords.c (C).  It should contain the main(int argc, char **argv) function. Implementation tips are below.

Note that for any of the library functions we suggest, you can read the manual page on Edoras by typing man fn_name” at the shell prompt, e.g. man strtok; or refer to https://www.kernel.org/doc/man-pages/.

Your main(int argc, char **argv) function should expect two arguments.  The first argument is the file path to the dictionary source text file, the second is for the file path to the test text file. Minimal error checking is required, fail gracefully when there are the wrong number of              arguments, or a file does not exist.


For reading a text file line by line

•    In C++, use std::ifstream and std::getline.

•    std::ifstream dictstream(filepath); // open file for parsing

•    std::string line;

•    // iterate over dictionary file line by line

•    while (std::getline(dictstream, line))

•    In C, use FILE *fp = fopen(“filename”, "r"), then use fgets(line, sizeof(line), fp) to read each line to a char buffer.

To extract all words from each line read in, you can use the strtok() function from to   parse each line buffer read from the file.  The strtok function iterates across a buffer, pulling out groups of characters separated by a list of characters called delimiters (see snippet below for an example).

You can use the following delimiter string to separate words:                                                     const char *delimiters = "\n\r !\"#$%&()*+,./0123456789:;<=>?@[\\]^`{|}~";

Example:

char *word = strtok(line_c, delimiters);

while (word != nullptr)

{

// call add method to insert word to build the dictionary tree

...

// handle error from insertion result

..

// read next word

word = strtok(NULL, delimiters);

}