CSE310 – Assignment 2 Solved

Lexical Analysis

1 Introduction
In this assignment we are going to construct a lexical analyzer. Lexical analysis is the process of scanning the source program as a sequence of characters and converting it into a sequence of tokens. A program that performs this task is called a lexical analyzer, a lexer, or a scanner. For example, if a portion of the source program contains int x=5; the scanner would convert it into a sequence of tokens like <INT><ID,x><ASSIGNOP,=><CONST_INT,5><SEMICOLON>.
After successfully(!) completing the construction of a simple symbol table, we will construct a scanner for a subset of the C language. The task will be performed using flex (Fast Lexical Analyzer Generator), a popular tool for generating scanners.
2 Tasks
You have to complete the following tasks in this assignment.
2.1 Identifying Tokens
2.1.1 Keywords
You have to identify the keywords given in Table 1 and print the corresponding token in the output file. For example, you will have to print <IF> if you find the keyword if in the source program. Keywords will not be inserted in the symbol table.
Keyword Token Keyword Token
if IF else ELSE
for FOR while WHILE
do DO break BREAK
int INT char CHAR
float FLOAT double DOUBLE
void VOID return RETURN
switch SWITCH case CASE
default DEFAULT continue CONTINUE
Table 1: Keyword List
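In a flex specification each keyword would normally get its own rule. Purely as an illustration of the mapping in Table 1, the lookup can also be sketched in plain C. The function name keyword_token is hypothetical and is not part of the assignment.

```c
#include <stddef.h>
#include <string.h>

/* Keyword/token pairs copied from Table 1. */
static const char *const KEYWORDS[][2] = {
    {"if", "IF"},           {"else", "ELSE"},
    {"for", "FOR"},         {"while", "WHILE"},
    {"do", "DO"},           {"break", "BREAK"},
    {"int", "INT"},         {"char", "CHAR"},
    {"float", "FLOAT"},     {"double", "DOUBLE"},
    {"void", "VOID"},       {"return", "RETURN"},
    {"switch", "SWITCH"},   {"case", "CASE"},
    {"default", "DEFAULT"}, {"continue", "CONTINUE"},
};

/* Hypothetical helper: returns the token name for a keyword lexeme,
   or NULL if the lexeme is not one of the keywords in Table 1. */
const char *keyword_token(const char *lexeme)
{
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(lexeme, KEYWORDS[i][0]) == 0)
            return KEYWORDS[i][1];
    }
    return NULL;
}
```

The comparison is case sensitive and matches whole lexemes only, so If and iff are not keywords and would fall through to the identifier rule.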
2.1.2 Constants
For each constant you have to print a token of the format <Type, Symbol> in the output file and insert the symbol in the symbol table.
Integer Literals: One or more consecutive digits form an integer literal. The token type will be CONST_INT. Note that + or - will not be part of an integer.
Floating Point Literals: Numbers like 3.14159, 3.14159E-10, .314159 and 314159E10 will be considered floating point constants. In this case, the token type will be CONST_FLOAT.
Character Literals: Character literals are enclosed within single quotes. There will be a single character within the single quotes, with the exception of '\n', '\t', '\\', '\'', '\a', '\f', '\r', '\b', '\v' and '\0'. For character literals the token type will be CONST_CHAR.
Note that you need to convert the detected lexeme to the actual character. For example, if you find 'a', then you need to print <CONST_CHAR,a>. That means we only need the character itself (its ASCII code), not the quote symbols around it. Similarly, you need a newline character (ASCII code 10) in your token if you detect '\n'.
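The conversion from lexeme to actual character can be sketched in C as follows. This is a minimal sketch, assuming the exceptions are the usual C escapes \n, \t, \\, \', \a, \f, \r, \b, \v and \0, and that the lexeme passed in still includes both quotes. The names escape_value and decode_char_literal are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: maps the letter after a backslash to the character
   it denotes, or returns -1 if the escape is not in the assumed list. */
int escape_value(char c)
{
    switch (c) {
    case 'n':  return '\n';
    case 't':  return '\t';
    case '\\': return '\\';
    case '\'': return '\'';
    case 'a':  return '\a';
    case 'f':  return '\f';
    case 'r':  return '\r';
    case 'b':  return '\b';
    case 'v':  return '\v';
    case '0':  return '\0';
    default:   return -1;
    }
}

/* Hypothetical helper: takes a complete character-literal lexeme, quotes
   included, and returns the code of the character it denotes.
   Returns -1 for anything that is not a well-formed single character. */
int decode_char_literal(const char *lexeme)
{
    size_t len = strlen(lexeme);

    /* Plain form: quote, one ordinary character, quote. */
    if (len == 3 && lexeme[0] == '\'' && lexeme[2] == '\''
        && lexeme[1] != '\\' && lexeme[1] != '\'')
        return (unsigned char)lexeme[1];

    /* Escaped form: quote, backslash, escape letter, quote. */
    if (len == 4 && lexeme[0] == '\'' && lexeme[1] == '\\'
        && lexeme[3] == '\'')
        return escape_value(lexeme[2]);

    return -1;
}
```

With this sketch the lexeme 'a' yields 97 and the four-character lexeme '\n' yields 10, while 'ab' and '\' are rejected and would be reported through the error rules instead.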
2.1.3 Operators and Punctuators
The operator list for the subset of C we are dealing with is given in Table 2. A token of the form <Type,Symbol> should be printed in the output file and the operator should be inserted in the symbol table.
Symbols Type
+, - ADDOP
*, /, % MULOP
++, -- INCOP
<, <=, >, >=, ==, != RELOP
= ASSIGNOP
&&, || LOGICOP
&, |, ^, <<, >> BITOP
! NOT
( LPAREN
) RPAREN
{ LCURL
} RCURL
[ LTHIRD
] RTHIRD
, COMMA
; SEMICOLON
Table 2: Operator and Punctuators List
2.1.4 Identifiers
Identifiers are names given to C entities, such as variables, functions, structures etc. An identifier can only have alphanumeric characters (a-z, A-Z, 0-9) and underscore (_). The first character of an identifier can only be a letter (a-z, A-Z) or an underscore (_). For any identifier encountered in the input file you have to print the token <ID,Symbol> and also insert it in the symbol table.
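In flex the identifier rule is usually written as a pattern such as [A-Za-z_][A-Za-z0-9_]*. The same check can be sketched in plain C. The function name is_identifier is hypothetical, and the sketch assumes the default "C" locale so that isalpha and isalnum accept only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if the lexeme has the shape of an
   identifier. It does not exclude keywords, which have that shape too. */
bool is_identifier(const char *lexeme)
{
    const unsigned char *p = (const unsigned char *)lexeme;

    /* First character: a letter or an underscore. */
    if (!(isalpha(*p) || *p == '_'))
        return false;

    /* Remaining characters: letters, digits or underscores. */
    for (p++; *p != '\0'; p++) {
        if (!(isalnum(*p) || *p == '_'))
            return false;
    }
    return true;
}
```

Because keywords also pass this check, a scanner built this way would test for keywords first and only then treat the lexeme as an ID.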
2.1.5 Strings
String literals or constants are enclosed in double quotes. A string can be single line or multi line. In a multi line string, every line except the last ends with a \ character. You have to print a token like <STRING, abc> if you find a string abc in the input file.
Strings will not be inserted in the symbol table.

"This is a single line string";
"This is a \
multiline string"

Note that, just like character literals, you need to convert the special characters to their original value. If a string contains a \n, then you need to replace those two characters with a newline character. For example, if the source program contains these eight characters:
Then the scanner should convert it into the following five characters:

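The escape conversion for strings can be sketched in C as below. As a hypothetical input, the literal "ab\ncd" is eight characters in the source including the quotes, and its body converts to five characters: a, b, a newline, c and d. The function name unescape_string is an assumption, and so are the choices to keep any unlisted escaped character as is and to leave multi line continuation unhandled.

```c
#include <stddef.h>

/* Hypothetical helper: copies the body of a string literal (the text
   between the double quotes) from src to dst, replacing each
   two-character escape with the single character it denotes.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the final '\0'. */
size_t unescape_string(const char *src, char *dst)
{
    size_t n = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        if (c == '\\' && src[i + 1] != '\0') {
            i++;                      /* consume the escape letter */
            switch (src[i]) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'a': c = '\a'; break;
            case 'f': c = '\f'; break;
            case 'r': c = '\r'; break;
            case 'b': c = '\b'; break;
            case 'v': c = '\v'; break;
            case '0': c = '\0'; break;
            /* Assumption: a backslash, a quote or any other escaped
               character stands for itself. */
            default:  c = src[i]; break;
            }
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

The returned length matters because a \0 escape places a NUL character inside the converted text, so strlen on the result would stop early.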
2.1.6 Comments
Comments can be single line or multi line. A single line comment starts with the // symbols. However, a comment started with // can be continued on the next line if the first line ends with a \ character. A multi line comment starts with /* and terminates with the characters */. If there is any comment in the input file you have to recognize it but not generate any token in the output file.

// A single line comment
// A multiple\
line comment
/** Another multiple line Comment */

2.1.7 White Space
You have to ignore all white space in the input file.
2.2 Line Count
You should count the number of lines in the source program.
2.3 Lexical Errors
You should detect lexical errors in the source program and report them along with the line number. You have to detect the following types of errors.
Too many decimal point error for character sequence like 1.2.345
Ill formed number such as 1E10.7
Invalid suffix on numeric constant or invalid prefix on identifier for character sequence like 12abcd
Multi character constant error for character sequence like 'ab'
Unfinished character such as 'a or '\'
Unfinished string
Unfinished comment
Unrecognized character
Also count the total number of errors.
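The three number-related errors can be told apart from valid constants with a single left-to-right pass. The following C sketch assumes the scanner has already collected a maximal number-like lexeme. The enum, the function names and the treatment of borderline inputs such as 3. or 12e are assumptions and are not taken from the assignment.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    NUM_CONST_INT,
    NUM_CONST_FLOAT,
    NUM_TOO_MANY_DECIMAL_POINTS,
    NUM_ILL_FORMED,
    NUM_INVALID_SUFFIX
} NumClass;

/* Advances *i past a run of decimal digits and returns how many were seen. */
static size_t skip_digits(const char *s, size_t *i)
{
    size_t start = *i;
    while (isdigit((unsigned char)s[*i]))
        (*i)++;
    return *i - start;
}

/* Hypothetical helper: classifies a number-like lexeme. */
NumClass classify_number(const char *s)
{
    size_t i = 0;
    int is_float = 0;
    size_t int_digits = skip_digits(s, &i);
    size_t frac_digits = 0;

    if (s[i] == '.') {
        is_float = 1;
        i++;
        frac_digits = skip_digits(s, &i);
        if (s[i] == '.')
            return NUM_TOO_MANY_DECIMAL_POINTS;  /* e.g. 1.2.345 */
        if (frac_digits == 0)
            return NUM_ILL_FORMED;               /* e.g. 3. (assumption) */
    }
    if (int_digits + frac_digits == 0)
        return NUM_ILL_FORMED;                   /* no digits at all */

    if (s[i] == 'E' || s[i] == 'e') {
        size_t j = i + 1;
        if (s[j] == '+' || s[j] == '-')
            j++;
        if (skip_digits(s, &j) > 0) {            /* a real exponent */
            is_float = 1;
            i = j;
            if (s[i] == '.')
                return NUM_ILL_FORMED;           /* e.g. 1E10.7 */
        }
    }

    if (s[i] == '\0')
        return is_float ? NUM_CONST_FLOAT : NUM_CONST_INT;
    if (isalpha((unsigned char)s[i]) || s[i] == '_')
        return NUM_INVALID_SUFFIX;               /* e.g. 12abcd */
    return NUM_ILL_FORMED;
}
```

In an actual flex file the same distinctions are usually expressed as separate patterns, one per error type, so that the longest match decides which action runs.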
3 Input
The input will be a text file containing a C source program. The file name will be given on the command line.
4 Output
In this assignment, there will be two output files. One is a file containing tokens. This file should be named <YourStudentID>_token.txt (for example, 1605999_token.txt). You will output all the tokens in this file.
The other file is a log file named <YourStudentID>_log.txt. In this file you will output all the actions performed by your program. For example, after detecting any lexeme except one representing white space, you will print a line of the form Line no <line_count>: Token <Token> Lexeme <Lexeme> found. For example, if you find a comment //abcd at line no 5 in your source code, you will print Line no 5: Token <Comment> Lexeme <abcd> found. Note that although you will not print any token in the corresponding token.txt file for a comment, you will print it in the log file. For any insertion into the symbol table, you will print the symbol table in the output file (only print the non empty buckets). If the symbol already exists, print an appropriate message. For any detected error print Line no 5: Corresponding error message. Print the line count and the total number of errors found at the end of the log file.
For more clarification about input and output, please refer to the sample input and output files given on Moodle. You are highly encouraged to produce output exactly like the sample.
5 Submission
All submissions will be taken via Moodle. Please follow the steps given below to submit your assignment.
1. On your local machine, create a new folder whose name is your 7 digit student id.
2. Put the lex file named <your_student_id>.l containing your code in this folder. Also put any additional C files or header files that are necessary to compile your lex file. Do not put the generated lex.yy.c file or the executable file in this folder.
3. Compress the folder into a zip file, which should be named as your 7 digit student id.
4. Submit the zip file.
6 Rules
