CS321 Solved



Lab 1
Srinibas Swain (srinibas@iiitg.ac.in)
In this part of the assignment, you will use the lexical analyser generator lex or its variant flex.
Some useful references on Lex and Yacc:
• T. Niemann, Lex & Yacc Tutorial, http://epaperpress.com/lexandyacc/
• Doug Brown, John Levine, and Tony Mason, lex and yacc (2nd edn.), O’Reilly, 2012.
• the lex manpages.
We hope you enjoy the lab and, more generally, the unit!
1 Lex
An input file to lex is, by convention, given a name ending in .l. Such a file has three parts:
• definitions,
• rules,
• C code
These are separated by lines containing a double percent sign, %%. Comments begin with /* and end with */. Any comments are ignored when lex is run on the file.
We will use flex, a fast lexical analyzer generator, to construct a lexical analyzer. flex takes the user's specification, builds a combined NFA that recognizes all the user-defined patterns, converts it to an equivalent DFA, minimizes that automaton as far as possible, and generates C code that implements it.
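To give a feel for what "C code that implements a DFA" can look like, here is a minimal hand-written sketch of a table-driven DFA with longest-match scanning. It is not flex output, and real generated tables are far larger and compressed. The token codes, state names, and the function dfa_match are all assumptions made for this illustration; the two patterns it recognizes are a run of blanks or tabs and a run of letters.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch; flex does not define them. */
enum { TOK_NONE = 0, TOK_SPACE = 1, TOK_WORD = 2 };

/* Character classes: columns of the transition table. */
enum { CC_SPACE, CC_LETTER, CC_OTHER, CC_COUNT };

/* DFA states: rows of the transition table. */
enum { S_START, S_SPACE, S_WORD, S_DEAD, S_COUNT };

static const int transition[S_COUNT][CC_COUNT] = {
    /* S_START */ { S_SPACE, S_WORD, S_DEAD },
    /* S_SPACE */ { S_SPACE, S_DEAD, S_DEAD },
    /* S_WORD  */ { S_DEAD,  S_WORD, S_DEAD },
    /* S_DEAD  */ { S_DEAD,  S_DEAD, S_DEAD },
};

/* Token reported when the DFA stops in each state (TOK_NONE = not accepting). */
static const int accepting[S_COUNT] = { TOK_NONE, TOK_SPACE, TOK_WORD, TOK_NONE };

static int char_class(unsigned char c)
{
    if (c == ' ' || c == '\t') return CC_SPACE;
    if (isalpha(c)) return CC_LETTER;
    return CC_OTHER;
}

/* Find the longest prefix of s that matches a pattern.
 * Returns the token code and stores the match length in *len;
 * returns TOK_NONE with *len == 0 if nothing matches. */
int dfa_match(const char *s, size_t *len)
{
    int state = S_START;
    int last_tok = TOK_NONE;
    size_t i = 0, last_len = 0;

    while (s[i] != '\0') {
        state = transition[state][char_class((unsigned char)s[i])];
        if (state == S_DEAD) break;
        i++;
        if (accepting[state] != TOK_NONE) {
            /* Remember the most recent accepting position (longest match). */
            last_tok = accepting[state];
            last_len = i;
        }
    }
    *len = last_len;
    return last_tok;
}
```

Calling dfa_match on the string "were go" reports a word of length 4; the caller would then advance by that length and call it again on the rest of the input.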
1.1 Nuts and bolts of flex
flex is designed for use with C and generates a scanner written in C. The scanner is specified using regular expressions for the patterns and C code for the actions. Specification files are traditionally identified by their .l extension. Invoking flex on a .l file creates lex.yy.c, a source file of hard-to-read generated C code that implements a DFA encoding all your rules and includes the code for the actions you specified. The file provides an extern function yylex() that will scan one token. You compile that C file normally, link with the lex library, and you have built a scanner! By default the scanner reads from stdin and writes to stdout.
flex is open source; on Debian-based systems such as Ubuntu it can be installed with
sudo apt-get install flex
To build and run the lexical analyzer, follow these steps:
flex myFile.l                 creates lex.yy.c containing C code for the scanner
gcc -o myScan lex.yy.c -ll    compiles the scanner, links with the lex library
./myScan                      executes the scanner, which reads from stdin
Linking with the lex library is important: it provides a simple main function that repeatedly calls yylex() until it reaches EOF. On systems where flex ships its own library, the flag may be -lfl rather than -ll.
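The loop that such a main function runs can be sketched in a few lines of C. The version below is an illustration under stated assumptions, not the library's source: drain_tokens and stub_lexer are hypothetical names, and the stub stands in for yylex() so that the fragment compiles without a generated scanner. It relies on the convention that yylex() returns 0 at end of input.

```c
/* Type of a token-grabbing function such as yylex(): no arguments, int result. */
typedef int (*lexer_fn)(void);

/* Call the lexer until it returns 0, the end-of-input convention.
 * Returns how many non-zero results (tokens) were seen. */
int drain_tokens(lexer_fn next_token)
{
    int count = 0;
    while (next_token() != 0) {
        count++;
    }
    return count;
}

/* Hypothetical stand-in for yylex(): yields three fake token codes, then 0. */
static int stub_remaining = 3;

int stub_lexer(void)
{
    return stub_remaining > 0 ? stub_remaining-- : 0;
}
```

In a real build you would pass yylex instead of stub_lexer from your own main function, in which case the library's main is not used.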
1.1.1 Structure of a flex file
flex input files are structured as follows:
%{
Declarations
%}
Definitions
%%
Rules
%%
User subroutines
1.1.2 flex global variables
The token-grabbing function yylex() takes no arguments and returns an integer. Here are some of the global variables used with flex:
• yytext is a null-terminated string containing the text of the lexeme that was last recognized as a token. This global variable is declared and managed in the lex.yy.c file. Do not modify its contents. The buffer is overwritten with each subsequent token, so copy the lexeme if you need it later.
• yyleng is an integer holding the length of the lexeme stored in yytext. This global variable is declared and managed in the lex.yy.c file.
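• yylval is the global variable conventionally used to store attributes of the token, e.g. for an integer lexeme it might store the value, and for a string literal a pointer to its characters. It is normally declared by a yacc-generated parser or by your own code, not by flex itself.
• yylloc is the global variable conventionally used to store the location (line and column) of the token. Like yylval, it is normally declared outside the scanner.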
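Because yytext is overwritten with each new token, an action that wants to keep a lexeme has to copy it first. The following is a minimal sketch of such a helper; save_lexeme is a hypothetical name, not part of flex. Inside an action it would be called with yytext and yyleng as arguments (the length may need a cast to size_t, depending on how your flex version declares yyleng).

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated, null-terminated copy of the first len
 * characters of text, or NULL if allocation fails.
 * The caller owns the copy and must free() it. */
char *save_lexeme(const char *text, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

A pointer to the copy is the kind of value that could be stored in yylval for a string literal or identifier.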
2 Example
This is the example we discussed in class. The task is to identify the following English verbs (tokens): is, am, are, was, were, go.
%{
/* very simple */
%}
%%
[\t ]+      /* ignore white space */ ;
is |
am |
are |
was |
were |
go          { printf("%s: is a verb\n", yytext); }
[a-zA-Z]+   { printf("%s: is not a verb\n", yytext); }
.|\n        { ECHO; /* normal default anyway */ }
%%
int main(void)
{
    yylex();
    return 0;
}
3 Exercise
• Write a lexical analyzer that counts the lines of each type in a source file: lines that contain code, lines that contain only comments, and blank lines.
• Extend the above analyzer to count braces, keywords, etc.

