14 de junho de 2010
por wiglot
As part of my GSoC, i have to build a plugin to Rocs read dot files (graph files from graphviz, so i start it in this weekend.
.dot files have a singular format, it’s not a xml and can have a kind of complex structure (a graph can have nodes, edges, subgraphs, edges like nodes and also graphs, can have 0 or more properties, a graph also can be a digraph and as basic types we have integers, strings (quoted and not) )
First, i need the grammar to this file format. The first one that i found is here, is a simpliffied grammar but have all the basic graph representation. A complete grammar, can be found in graphviz site but, in this first version i will use the first one, so when all the stuff is running, just will need to update the grammar.
Ok, at that moment i got the grammar, but how parse a file? at university i’m having a class about compillers and talking about predictive parser, that can be used to read this file, off course, will need a great job to write the parser and will need a lexical analizer. Until saturday’s morning that was the plan, and now i see that i will lose many days with that.
‘What a waste of time’ i thought. Someone must have gone through it and must have writeen a tool for that. With a little more of search, i found the couple lex/yacc (replaced in GNU by tools flex/bison), and what couple!
To those that don’t know what is lex/yacc:
lex generate a lexical analiser following regular expressions defined by user;
yacc generate a parser following some user’s grammar.
and we can use the results from lex as input of yacc, making all the nasty work of writing lexer/parser.
But lex/yacc are complex (and complete) tools, i will not explain how it work, but you can take a look in this tutorial , but i will show some pitfalls that i found in my way. At end i will show a lex/yacc running up CMake to generate files to be compiled by g++ compiler (lex/yacc generate a C code).
REGEX order:
To this lex input file
[a-z]+ printf("Found a word.");
graph printf("Found the 'graph' word.");
‘graph’ will be considered as a word. Lex folow the order of it input file. Exchange those 2 lines to
graph printf("Found the 'graph' word.");
[a-z]+ printf("Found a word.");
‘graph’ now is considered the graph word.
Use C++ in parser
Who use c++ for a time get adicted to it and refuses to use other language (including c ), but yacc and lex generates c codes, so how we can use our beloved C++ in parser? simple, insted of generate a file with c extension, we generate a file with cc (or cpp, c++) extension (as the same to header file generated by yacc). The content of file will remais the same, but now gcc will treat as c++ source. Just it? you may ask. Off course not! if you do it and try to compile you will got a lot of undeclared functions.
a workaroud presented in the first reference is to embrace all function declaration into an ‘extern “C” {}’ in the yacc input file (file.y), like this:
%{
#include
#include "DotParser.h" //My include with c++ code
extern "C" {
void yyerror(const char *str){
fprintf(stderr,"error: %s\n",str);
}
int yyparse(void);
int yylex(void);
int yywrap(){
return 1;
}
void myFunctionToParseFile(){
printf("Doing something!\n");
yyparse();//start parse
}
}
%}
%%
GRAMMAR
%%
and in the lex input file (file.l) also need some love:
%{
extern "C" {
int yylex(void);
}
#include "y.tab.hh"
%}
%%
[0-9]+ printf("NUMBER");
%%
Using CMake to glue all the things
Here Andy explain how to generate files with lex/yacc. I found some errors (maybe by outdated). Changes are in bold:
# Create target for the parser
ADD_CUSTOM_TARGET(FooParser echo “Creating parser.c”)
# Create custom command for flex/lex (note the outputs)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/lexer.l
COMMAND lex
ARGS -o${Foo_BINARY_DIR}/src/lexer.cc
${Foo_SOURCE_DIR}/src/lexer.l
TARGET FooParser
OUTPUTS ${Foo_BINARY_DIR}/src/lexer.cc)
# Create custom command for bison/yacc (note the DEPENDS)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/parser.y
COMMAND yacc
ARGS -d ${Foo_SOURCE_DIR}/src/parser.y
-o ${Foo_BINARY_DIR}/src/parser.cc
TARGET FooParser
DEPENDS ${Foo_BINARY_DIR}/src/lexer.cc
OUTPUTS ${Foo_BINARY_DIR}/src/parser.cc)
# Add parser.c to the list of sources
SET(Foo_SRCS ${Foo_SRCS} ${Foo_BINARY_DIR}/src/parser.cc ${Foo_BINARY_DIR}/src/lexer.cc)
# Since parser.c does not exists yet when cmake is run, mark
# it as generated
SET_SOURCE_FILES_PROPERTIES(${Foo_BINARY_DIR}/src/parser.cc GENERATED)
# Include binary directory to include lexer.c in parser.c
INCLUDE_DIRECTORIES(${Foo_BINARY_DIR}/src)
with this, now i have mylex/yacc files compiled along my sources. Now is just fill up the grammar with the OO codes and merge with rocs source. When i merge with the rocs, i will not use cmake to generate lex/yacc files, i will commit the generated files only, avoiding to user that will compile the rocs to depends of lex/yacc. I am now used this format to move from me to CMake this whole set of commands.
Gostar disso:
One blogger likes this post.
Novos Comentários: