COMP5421 – (Solution)

$ 20.99
Category:

Description

1 Objectives
In this assignment, you will redo Assignment 1 without getting involved with dynamic storage management. A major goal in this and the following assignments is to encourage you to leverage the C++ standard and popular libraries to do the work for you so you don’t ever have to reinvent the wheel.
Although assignment 1 can be ideally and conveniently implemented using associative containers such as std::map and std::set, this assignment requires that class IntList be replaced with std::vector<int>, class TList with std::list<Token>, and C-style text processing with std::string.

std::vector<int>

In addition to implementing the bulk of the work in this assignment, the vector<int> and list<Token> classes also provide plenty opportunities for us to learn about container iterators, which we will use in class Tokenizer to iterate through and modify the container elements.
Class Tokenizer provides the functionalities of A1’s TList class that are not directly supported by std::list<Token>, plus a few more functionalities.
2 String tokens
In assignment 1, a string token is defined as a sequence of contiguous characters excluding white-space characters such as space, tab, and newline characters.
Generalizing that definition in this assignment, we define a string token to be a sequence of contiguous characters excluding those specified in a user defined set of separator characters.
The separator characters are those that we do not want in a token in this assignment, and they are represented using a std::string object. For example, the separators in assignment 1 can be defined as ” “; that is, the new line, tab, null, and blank characters separate tokens in a text.
3 Class TokenizerApp
TokenizerApp is the name of a C++ file that contains the main function, and possibly other functions, to test drive the functionality of a Tokenizer object. It is a simple menu-driven program that displays a menu of options for the user to choose from:
Enter the name of an input file of text: input text file.txt Enter the seperator characters: ;. ,?!”=’:
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
1
2
3
4
5
6
7
8
9
10
11
12
3.1 Option A (or a)
Prints all of the input lines:
Enter your choice: a
1: Do you like green eggs and ham?
2:
3: I do not like them, Sam-I-am.
4: I do not like green eggs and ham!
5:
6: Would you like them here or there?
7:
8: I would not like them here or there.
9: I would not like them anywhere.
10:
11: I do so like green eggs and ham!
12: Thank you! Thank you, 13: Sam-I-am!
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
X – Exit
Enter your choice:
35
36
3.2 Option P (or p)
Prints all tokens in alphabetic order, similar to that in assignment 1, with one noticeable difference: the numbers inside the parentheses, each indicating the frequency of the corresponding token in the input file.
Enter your choice: p
and (3) 1 4 11
anywhere (1) 9
Do (4) 1 3 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11
here (2) 6 8
I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11
not (4) 3 4 8 9
or (2) 6 8
Sam-I-am (2) 3 13 so (1) 11
Thank (2) 12 them (4) 3 6 8 9
there (2) 6 8
Would (3) 6 8 9 you (4) 1 6 12
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
Notice that the user supplied separator characters ;. ,?!”=’: defines a very limited number of separators, allowing, for example, the strings 123 and }1+2*3{ as tokens. For a C++ source file, the user might input something like ;. ,?!”=’:|{}[]()&+-*%$#!~>^</ for the separator characters.
3.3 Option F (or f)
Prints the tokens sorted in the ascending order of their frequencies in the input file:
Enter your choice: F
anywhere (1) 9 so (1) 11
here (2) 6 8 or (2) 6 8
Sam-I-am (2) 3 13 Thank (2) 12 there (2) 6 8
and (3) 1 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11
Would (3) 6 8 9
Do (4) 1 3 4 11 not (4) 3 4 8 9 them (4) 3 6 8 9
you (4) 1 6 12
I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
Knowing that sorting a linked list in general is not a trivial task, let alone sorting the list efficiently, we are happy to know that std::list<Token> can sort the list for us if we tell it how to compare two tokens.
See here for a quick example.
3.4 Option L (or L)
Prints the tokens sorted on the length of each string tokens, with tokens of equal lengths sorted alphabetically.
Enter your choice: l
I (5) 3 4 8 9 11 Do (4) 1 3 4 11 or (2) 6 8 so (1) 11 and (3) 1 4 11 ham (3) 1 4 11 not (4) 3 4 8 9 you (4) 1 6 12 eggs (3) 1 4 11 here (2) 6 8 like (7) 1 3 4 6 8 9 11 them (4) 3 6 8 9
Thank (2) 12 Would (3) 6 8 9 green (3) 1 4 11 there (2) 6 8 Sam-I-am (2) 3 13 anywhere (1) 9
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
Again, the std::list<Token> class can sort the list for us as long as we tell it how to compare two tokens.
3.5 Option P (or p), Again
Let us note that printing the tokens sorted on some attributes must not disturb the order of the tokens in the original indexed list. In other words, selecting option P now should print the original indexed list:
Enter your choice: p
and (3) 1 4 11
anywhere (1) 9
Do (4) 1 3 4 11 eggs (3) 1 4 11 green (3) 1 4 11 ham (3) 1 4 11
here (2) 6 8
I (5) 3 4 8 9 11 like (7) 1 3 4 6 8 9 11
not (4) 3 4 8 9
or (2) 6 8
Sam-I-am (2) 3 13 so (1) 11
Thank (2) 12 them (4) 3 6 8 9
there (2) 6 8
Would (3) 6 8 9 you (4) 1 6 12
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
3.6 Option S (or s)
Prompts the user for a token to search for; if found, prints the input lines on which the token appears; otherwise, displays the message “token not found”.
Enter your choice: s
Enter the text to search for: you
1: Do you like green eggs and ham?
6: Would you like them here or there?
12: Thank you! Thank you,
Menu
======
A – Print all input lines
P – Print indexed tokens
F – Print tokens sorted on frequency
L – Print tokens sorted on length
S – Search
X – Exit
Enter your choice:
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
3.7 Option X (or x)
Enter your choice: x
Thank you for trying my program.
Goodbye.
170
171
172
173
4 Class Token
4.1 Representation
1
2
3
4
class Token
{ private:
string theText{}; // the text of this token
vector<size_t> theLineNumbers{}; // this token’s list of (non-negative) line size_t theFrequency{1}; // the frequency of this token in the input
5 numbers
6 file
We choose size t over int as the type of the line numbers, because in this application we only deal non-negative line numbers and size t represents non-negative (unsigned) integers.
The size of theLineNumbers is at least 1, because a string token must reside on some line in the input file, and every input line has a number ≥1.
4.2 Normal constructors
This is the only normal (non-special) constructor of the class; it sets the frequency to 1, and initializes the text and line number of the token being constructed with the corresponding supplied argument values.
Token(string text, size_t linenum); // a normal constructor
7
4.3 Default constructor
Since a token cannot exist without a text and associated line number, we decide to disallow default construction; so we make our decision visible to both the human reader as well as the compiler by explicitly declaring the default constructor deleted.
public:
Token() = delete; // disable default construction
8
9
4.4 Other Special Member Functions: The Big Five
Modern C++ programming style encourages that these members be explicitly either defined, deleted, or defaulted. Since Token is not involved directly in dynamic resource allocation, the compiler-generated versions of these members are most appropriate for Token objects.
// the big five: three choices (either default, delete, or define);
// avoid relying on implicit generation of special member functions
Token& operator=(const Token& rhs) = default; // copy op=
Token& operator=(Token&& rhs) = default; // move op=
Token(const Token& source) = default; // copy ctor
Token(Token&& source) = default; // move ctor
~Token() = default; // dtor
10
11
12
13
14
15
16
17
Consequently,
copy/move constructors copy/move the source token’s theFrequency, theText, and theLineNumbers members to theFrequency, theText, and theLineNumbers of the Token object being constructed. Note that these data members in turn propagate and apply copy/move construction internally to their respective members, recursively.
copy/move assignments copy/move-assign the right-hand side token’s theFrequency, theText, and theLineNumbers members to the left-hand side token theFrequency, theText, and theLineNumbers members. Note that the data members in turn propagate and apply the copy/move assignment internally to their respective members, recursively.
4.5 Comparison operations
In this assignment we use case insensitive comparison to compare the text of the tokens.
// comparison member function (returns -1, 0, +1, as in A1)
int compareIgnoreCase(const Token& t) const; // case insensitive comparison
18
19
20
See Here for an example of a case insensitive comparison function.
4.6 Other Self-Explanatory Operations
21
22
23
24
25
26
27
28
// getter members; each is doubly safe!
// since each is const, the invoking object remains intact, and,
// returning by value, each adheres to the principle of information hiding
string getTheText() const; vector<size_t> getTheLineNumberList() const; size_t getFrequency() const;
size_t getLineNumber(size_t = 1) const; // line number is 1-based
// to provide user friendly
void pushBackLineNumber(size_t lineNum); // append the suppled line number void print(ostream& sout) const; // print this token to sout
}; #endif ostream& operator<<(ostream& sout, const Token& arr);
29 interface
30
31
32
33
34
35
5 Class Tokenizer
A Tokenizer object provides a minimal set of services, allowing a typical menu-driven program to produce an interactive session as shown above.
5.1 Representation
class Tokenizer
{ private:
const string theSeparators; // the separator characters in a std::string
list<Token> theTokenList; // the list of tokens managed by this tokenizer
vector<string> theLines; // the lines in the input file
1
2
3
4
5
6
5.2 Default constructor
Since a Tokenizer cannot exist without an input file of text, we decide to disallow default construction; so we make our decision visible to both the human reader as well as the compiler by explicitly declaring the default constructor deleted.
public:
Tokenizer() = delete; // disable default constructor
7
8
5.3 Normal constructor
Tokenizer(const string& filename, const string& separators);
9
5.4 Other Special Member Functions: The Big Five
Modern C++ programming style encourages that these members be explicitly either defined, deleted, or defaulted. Since our class is not involved directly in dynamic resource allocation, the compiler-generated versions of these members are most appropriate to use.
~Tokenizer() = default; // dtor
Tokenizer(const Tokenizer&) = default; // copy ctor
Tokenizer(Tokenizer&&) = default; // move ctor
Tokenizer& operator=(const Tokenizer&) = default; // copy op=
Tokenizer& operator=(Tokenizer&&) = default; // move op=
10
11
12
13
14
5.5 Private Facilitators
private:
void ProcessTokensInLine(const string& line, size_t linenum);
5.5.1 15
16
1. To extracts the tokens from line, it delegates to member function splitLineIntoTokens, which in turn returns the extracted tokens in a std::vector<string>, say, vec.
2. To turn the string tokens in vec into Token objects and to store the resulting objects in the token list, it delegates to the member function insert, one vec element at time.
vector<string> splitLineIntoTokens(const string& line) const;
5.5.2 17
Using the separator characters, it splits the given line of text into string tokens, storing and returning them all in a std::vector of std::strings.
void insert(string text, size_t linenum);
5.5.3 18
This function is equivalent to the addSorted member function in the TList class in A1.
Specifically, it first creates a Token object using the supplied text and line number; it then compares that object against the Token objects in the token list, which are already sorted in ascending order; if found, it updates the number list of that object; otherwise, it inserts the newly created Token object into the token list.
Requirement To scan and to insert into the token list, your implementation of this member function must use list<Token>::iterators.
vector<size_t> search(const string& str)const;
5.5.4 19
Searches the token list for a token object whose text is equal (case insensitive) to that of the given string str. If found, it returns a copy of that object’s line number list; otherwise, it returns an empty number list.
void printSomeInputLines(const vector<size_t>& vec)const;
5.5.5 20
Given a std::vector of line numbers, prints the input lines corresponding to those line numbers.
5.6 Public Interface
public:
void searchAndPrint(string& str)const;
5.6.1 21
22
Searches the token list for a token object whose text is the same as the given string str; if found, it prints the input lines that contain str.
public:
void printAllInputLines()const;
5.6.2 23
24
Prints all input lines.
public:
void print(ostream& sout)const;
5.6.3 25
26
Prints the token list to the given output stream sout.
public:
void sortOnFrequecy()const;
5.6.4 27
28
Prints the token list sorted on frequencies of the tokens in the input file.
Requirements
1. Since the token list must remain intact, we need to copy it into another linear sequence and then sort that linear sequence.
Our options here are limited to std::list, std::vector, and std::forward list.
Familiar with std::list and std::vector, we take advantage of this opportunity to use std::forward list, which already comes equipped with a sort member fuction ( std::vector does not).
2. This member function must use std::forward list’s sort member.
(Hint: simply implement the compareFrequency function given in 5.7.3 below and pass it to std::forward list’s sort member function as the argument.)
void sortOnTokenLength()const;
};
5.6.5 29
30
Prints the token list sorted on the text of the tokens.. Requirements
1. Similar to 5.6.4 above, copy the token list into a std::forward list and then sort that forward list using std::forward list’s own sort member function.
2. This member function must use std::forward list’s sort member.
(Hint: simply implement the compareLength function given in 5.7.2 below and pass it to std::forward list’s sort member function.)
5.7 Free (top-level) helper functions
ostream& operator<<(ostream&, const Tokenizer&);
5.7.1 31
Writes a given token to a given stream
bool compareLength(const Token& t1, const Token& t2);
5.7.2 32
Determines whether t1 is less than t2, comparing their lengths; if they are of equal length, then determines whether t1 is less than t2, alphabetically (using std::string’s operator<).
bool compareFrequency(const Token& t1, const Token& t2);
5.7.3 33
Comparing the frequencies of t1 and t2, determines whether t1 is less than t2.
6 Deliverables
Create a a new folder that contains the files listed below, then compress (zip) your folder, and submit the compressed (zipped) folder as instructed in the course outline.
1. Header files: Token.h, and Tokenizer.h
2. Implementation files: Token.cpp, Tokenizer.cpp, and TokenizerApp.cpp
3. Input file input file A1.txt
4. Output file output file A1.txt (copy output from cmd window and paste it into this file)
5. A README.txt text file (see the course outline).
7 Grading scheme

Reviews

There are no reviews yet.

Be the first to review “COMP5421 – (Solution)”

Your email address will not be published. Required fields are marked *