A weekend full of Lex / Yacc
14 de junho de 2010 16 Comentários
As part of my GSoC, i have to build a plugin to Rocs read dot files (graph files from graphviz, so i start it in this weekend.
.dot files have a singular format, it’s not a xml and can have a kind of complex structure (a graph can have nodes, edges, subgraphs, edges like nodes and also graphs, can have 0 or more properties, a graph also can be a digraph and as basic types we have integers, strings (quoted and not) )
First, i need the grammar to this file format. The first one that i found is here, is a simpliffied grammar but have all the basic graph representation. A complete grammar, can be found in graphviz site but, in this first version i will use the first one, so when all the stuff is running, just will need to update the grammar.
Ok, at that moment i got the grammar, but how parse a file? at university i’m having a class about compillers and talking about predictive parser, that can be used to read this file, off course, will need a great job to write the parser and will need a lexical analizer. Until saturday’s morning that was the plan, and now i see that i will lose many days with that.
‘What a waste of time’ i thought. Someone must have gone through it and must have writeen a tool for that. With a little more of search, i found the couple lex/yacc (replaced in GNU by tools flex/bison), and what couple!
To those that don’t know what is lex/yacc:
lex generate a lexical analiser following regular expressions defined by user;
yacc generate a parser following some user’s grammar.
and we can use the results from lex as input of yacc, making all the nasty work of writing lexer/parser.
But lex/yacc are complex (and complete) tools, i will not explain how it work, but you can take a look in this tutorial , but i will show some pitfalls that i found in my way. At end i will show a lex/yacc running up CMake to generate files to be compiled by g++ compiler (lex/yacc generate a C code).
REGEX order:
To this lex input file
[a-z]+ printf("Found a word.");
graph printf("Found the 'graph' word.");
‘graph’ will be considered as a word. Lex folow the order of it input file. Exchange those 2 lines to
graph printf("Found the 'graph' word.");
[a-z]+ printf("Found a word.");
‘graph’ now is considered the graph word.
Use C++ in parser
Who use c++ for a time get adicted to it and refuses to use other language (including c ), but yacc and lex generates c codes, so how we can use our beloved C++ in parser? simple, insted of generate a file with c extension, we generate a file with cc (or cpp, c++) extension (as the same to header file generated by yacc). The content of file will remais the same, but now gcc will treat as c++ source. Just it? you may ask. Off course not! if you do it and try to compile you will got a lot of undeclared functions.
a workaroud presented in the first reference is to embrace all function declaration into an ‘extern “C” {}’ in the yacc input file (file.y), like this:
%{
#include
#include "DotParser.h" //My include with c++ code
extern "C" {
void yyerror(const char *str){
fprintf(stderr,"error: %s\n",str);
}
int yyparse(void);
int yylex(void);
int yywrap(){
return 1;
}
void myFunctionToParseFile(){
printf("Doing something!\n");
yyparse();//start parse
}
}
%}
%%
GRAMMAR
%%
and in the lex input file (file.l) also need some love:
%{
extern "C" {
int yylex(void);
}
#include "y.tab.hh"
%}
%%
[0-9]+ printf("NUMBER");
%%
Using CMake to glue all the things
Here Andy explain how to generate files with lex/yacc. I found some errors (maybe by outdated). Changes are in bold:
# Create target for the parser
ADD_CUSTOM_TARGET(FooParser echo “Creating parser.c”)
# Create custom command for flex/lex (note the outputs)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/lexer.l
COMMAND lex
ARGS -o${Foo_BINARY_DIR}/src/lexer.cc
${Foo_SOURCE_DIR}/src/lexer.l
TARGET FooParser
OUTPUTS ${Foo_BINARY_DIR}/src/lexer.cc)
# Create custom command for bison/yacc (note the DEPENDS)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/parser.y
COMMAND yacc
ARGS -d ${Foo_SOURCE_DIR}/src/parser.y
-o ${Foo_BINARY_DIR}/src/parser.cc
TARGET FooParser
DEPENDS ${Foo_BINARY_DIR}/src/lexer.cc
OUTPUTS ${Foo_BINARY_DIR}/src/parser.cc)
# Add parser.c to the list of sources
SET(Foo_SRCS ${Foo_SRCS} ${Foo_BINARY_DIR}/src/parser.cc ${Foo_BINARY_DIR}/src/lexer.cc)
# Since parser.c does not exists yet when cmake is run, mark
# it as generated
SET_SOURCE_FILES_PROPERTIES(${Foo_BINARY_DIR}/src/parser.cc GENERATED)
# Include binary directory to include lexer.c in parser.c
INCLUDE_DIRECTORIES(${Foo_BINARY_DIR}/src)
with this, now i have mylex/yacc files compiled along my sources. Now is just fill up the grammar with the OO codes and merge with rocs source. When i merge with the rocs, i will not use cmake to generate lex/yacc files, i will commit the generated files only, avoiding to user that will compile the rocs to depends of lex/yacc. I am now used this format to move from me to CMake this whole set of commands.












Maybe you can just use graphviz’s library, instead of rollout yet another parser.
I’m trying to make the plugin only dependent of KDE. With Yacc/Lex i just send the generated files, no external libs. Thnks for the tip
Hej,
is there any special reason why you don’t use the read_graphviz Method from the boost project? http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/read_graphviz.html
the special reason is that i did not know it :)
I will take a look on it, thanks
KGraphViewer uses boost::spirit (definitely not easy to understand) to parse Dot. You could have taken code from there:
svn+ssh://svn.kde.org/home/kde/trunk/extragear/graphics/kgraphviewer
oh, i forgot KGraphViewer !!!!
‘que coisa!’
Thanks for the link
Oi,
Provided you are using a recent CMake version, it has built-in Flex/Bison support :)
Check this post about “Flex with CMake”:
http://xtor.warp.es/?p=708
the reference that i use was from 2002, but was the ‘first’ that work :)
Good to know that cmake support Flex (and probably also supports the Bison)
I think I’ll take one of the other ways mentioned, but the tip is really good
Have you considered using Boost::Spirit – a completely object-oriented and pure C++ approach to generating parsers? Lex/yacc, while being able to produce C++ code, (although this C++ support is still experimental), doesn’t generate completely C++ code, it basically wraps C code into a class AFAIK.
And a more generic question – to what extent is Boost::Graph used in Rocs and your project? This is the de-facto standard for C++ graph operations, and while the code is scary, this is the future of C++, so I highly recommend to use Boost:Graph as much as possible so as to not create duplication of effort.
Boost::Spirit… never hear before, but i will take a look on it, thanks.
About Boost:graph, it’s a great tool. Tomaz, and me later, tried to change the actual representation of graphs to Boost::Graph, but it’s too complex to just represent few nodes. We are thinking to use Boost::graph (as plugin) if the user want to work in a huge graph. But for now, with small graphs, our structure is fine.
thanks for your words.
Hi!
First of all: In Bison 2.4 you can tell him to create C++-output, it works well, even Java works. Your extertn “C” method is hackish.
Second: Why not use the existing dot-Parser in KDE?
Third: Why not use a more modern parser-generator, e.g. AntLR or KDE’s KDev-PG-Qt? Both generate parse-trees for you, KDevPG-Qt even generates visitors, while AntLR provides LL(*) parsing.
The User
first: Yes, now i see that is a hackish(sice begin i don’t like this solution)
Second: I will do it :)
Third: Lex/yacc is what i know (until today). Now i know more options.
Thanks every one by feedback.
Am I the only one who first read the title of this blog post as “A weekend of sex/lace?”
Ohh, a quick glance may appear even with that :)
i fix putting some uppercases (i don’t want atract readers with that post title :) )
Take look at the CMake documentation on Flex/Bison modules usage:
http://www.cmake.org/cmake/help/cmake-2-8-docs.html#module:FindFLEX
Thanks for the link