A weekend full of Lex / Yacc

As part of my GSoC, i have to build a plugin to Rocs read dot files (graph files from graphviz, so i start it in this weekend.
.dot files have a singular format, it’s not a xml and can have a kind of complex structure (a graph can have nodes, edges, subgraphs, edges like nodes and also graphs, can have 0 or more properties, a graph also can be a digraph and as basic types we have integers, strings (quoted and not) )

First, i need the grammar to this file format. The first one that i found is here, is a simpliffied grammar but have all the basic graph representation. A complete grammar, can be found in graphviz site but, in this first version i will use the first one, so when all the stuff is running, just will need to update the grammar.

Ok, at that moment i got the grammar, but how parse a file? at university i’m having a class about compillers and talking about predictive parser, that can be used to read this file, off course, will need a great job to write the parser and will need a lexical analizer. Until saturday’s morning that was the plan, and now i see that i will lose many days with that.

‘What a waste of time’ i thought. Someone must have gone through it and must have writeen a tool for that. With a little more of search, i found the couple lex/yacc (replaced in GNU by tools flex/bison), and what couple!
To those that don’t know what is lex/yacc:
lex generate a lexical analiser following regular expressions defined by user;
yacc generate a parser following some user’s grammar.
and we can use the results from lex as input of yacc, making all the nasty work of writing lexer/parser.
But lex/yacc are complex (and complete) tools, i will not explain how it work, but you can take a look in this tutorial , but i will show some pitfalls that i found in my way. At end i will show a lex/yacc running up CMake to generate files to be compiled by g++ compiler (lex/yacc generate a C code).

REGEX order:
To this lex input file

[a-z]+ printf("Found a word.");
graph printf("Found the 'graph' word.");

‘graph’ will be considered as a word. Lex folow the order of it input file. Exchange those 2 lines to

graph printf("Found the 'graph' word.");
[a-z]+ printf("Found a word.");

‘graph’ now is considered the graph word.

Use C++ in parser
Who use c++ for a time get adicted to it and refuses to use other language (including c ), but yacc and lex generates c codes, so how we can use our beloved C++ in parser? simple, insted of generate a file with c extension, we generate a file with cc (or cpp, c++) extension (as the same to header file generated by yacc). The content of file will remais the same, but now gcc will treat as c++ source. Just it? you may ask. Off course not! if you do it and try to compile you will got a lot of undeclared functions.

a workaroud presented in the first reference is to embrace all function declaration into an ‘extern “C” {}’ in the yacc input file (file.y), like this:

%{
#include
#include "DotParser.h" //My include with c++ code

extern "C" {
void yyerror(const char *str){
fprintf(stderr,"error: %s\n",str);
}
int yyparse(void);
int yylex(void);
int yywrap(){
return 1;
}
void myFunctionToParseFile(){
printf("Doing something!\n");
yyparse();//start parse
}
}

%}
%%
GRAMMAR
%%

and in the lex input file (file.l) also need some love:

%{
extern "C" {
int yylex(void);
}
#include "y.tab.hh"

%}
%%
[0-9]+       printf("NUMBER");
%%

Using CMake to glue all the things
Here Andy explain how to generate files with lex/yacc. I found some errors (maybe by outdated). Changes are in bold:
# Create target for the parser
ADD_CUSTOM_TARGET(FooParser echo “Creating parser.c”)

# Create custom command for flex/lex (note the outputs)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/lexer.l
COMMAND lex
ARGS -o${Foo_BINARY_DIR}/src/lexer.cc
${Foo_SOURCE_DIR}/src/lexer.l
TARGET FooParser
OUTPUTS ${Foo_BINARY_DIR}/src/lexer.cc)

# Create custom command for bison/yacc (note the DEPENDS)
ADD_CUSTOM_COMMAND(
SOURCE ${Foo_SOURCE_DIR}/src/parser.y
COMMAND yacc
ARGS -d ${Foo_SOURCE_DIR}/src/parser.y
-o ${Foo_BINARY_DIR}/src/parser.cc
TARGET FooParser
DEPENDS ${Foo_BINARY_DIR}/src/lexer.cc
OUTPUTS ${Foo_BINARY_DIR}/src/parser.cc)

# Add parser.c to the list of sources
SET(Foo_SRCS ${Foo_SRCS} ${Foo_BINARY_DIR}/src/parser.cc ${Foo_BINARY_DIR}/src/lexer.cc)

# Since parser.c does not exists yet when cmake is run, mark
# it as generated
SET_SOURCE_FILES_PROPERTIES(${Foo_BINARY_DIR}/src/parser.cc GENERATED)

# Include binary directory to include lexer.c in parser.c
INCLUDE_DIRECTORIES(${Foo_BINARY_DIR}/src)

with this, now i have mylex/yacc files compiled along my sources. Now is just fill up the grammar with the OO codes and merge with rocs source. When i merge with the rocs, i will not use cmake to generate lex/yacc files, i will commit the generated files only, avoiding to user that will compile the rocs to depends of lex/yacc. I am now used this format to move from me to CMake this whole set of commands.

16 Responses to A weekend full of Lex / Yacc

  1. pinotree disse:

    Maybe you can just use graphviz’s library, instead of rollout yet another parser.

    • wiglot disse:

      I’m trying to make the plugin only dependent of KDE. With Yacc/Lex i just send the generated files, no external libs. Thnks for the tip

  2. Tobias Koenig disse:

    Hej,

    is there any special reason why you don’t use the read_graphviz Method from the boost project? http://www.boost.org/doc/libs/1_43_0/libs/graph/doc/read_graphviz.html

    • wiglot disse:

      the special reason is that i did not know it :)

      I will take a look on it, thanks

  3. Milian Wolff disse:

    KGraphViewer uses boost::spirit (definitely not easy to understand) to parse Dot. You could have taken code from there:

    svn+ssh://svn.kde.org/home/kde/trunk/extragear/graphics/kgraphviewer

    • wiglot disse:

      oh, i forgot KGraphViewer !!!!
      ‘que coisa!’
      Thanks for the link

  4. Héctor disse:

    Oi,

    Provided you are using a recent CMake version, it has built-in Flex/Bison support :)

    Check this post about “Flex with CMake”:
    http://xtor.warp.es/?p=708

    • wiglot disse:

      the reference that i use was from 2002, but was the ‘first’ that work :)

      Good to know that cmake support Flex (and probably also supports the Bison)

      I think I’ll take one of the other ways mentioned, but the tip is really good

  5. thorGT disse:

    Have you considered using Boost::Spirit – a completely object-oriented and pure C++ approach to generating parsers? Lex/yacc, while being able to produce C++ code, (although this C++ support is still experimental), doesn’t generate completely C++ code, it basically wraps C code into a class AFAIK.
    And a more generic question – to what extent is Boost::Graph used in Rocs and your project? This is the de-facto standard for C++ graph operations, and while the code is scary, this is the future of C++, so I highly recommend to use Boost:Graph as much as possible so as to not create duplication of effort.

    • wiglot disse:

      Boost::Spirit… never hear before, but i will take a look on it, thanks.

      About Boost:graph, it’s a great tool. Tomaz, and me later, tried to change the actual representation of graphs to Boost::Graph, but it’s too complex to just represent few nodes. We are thinking to use Boost::graph (as plugin) if the user want to work in a huge graph. But for now, with small graphs, our structure is fine.

      thanks for your words.

  6. The User disse:

    Hi!

    First of all: In Bison 2.4 you can tell him to create C++-output, it works well, even Java works. Your extertn “C” method is hackish.
    Second: Why not use the existing dot-Parser in KDE?
    Third: Why not use a more modern parser-generator, e.g. AntLR or KDE’s KDev-PG-Qt? Both generate parse-trees for you, KDevPG-Qt even generates visitors, while AntLR provides LL(*) parsing.

    The User

    • wiglot disse:

      first: Yes, now i see that is a hackish(sice begin i don’t like this solution)
      Second: I will do it :)
      Third: Lex/yacc is what i know (until today). Now i know more options.

      Thanks every one by feedback.

  7. Bugsbane disse:

    Am I the only one who first read the title of this blog post as “A weekend of sex/lace?”

    • wiglot disse:

      Ohh, a quick glance may appear even with that :)

      i fix putting some uppercases (i don’t want atract readers with that post title :) )

  8. Rizo disse:

    Take look at the CMake documentation on Flex/Bison modules usage:

    http://www.cmake.org/cmake/help/cmake-2-8-docs.html#module:FindFLEX

    • wiglot disse:

      Thanks for the link

Deixe uma resposta

Preencha os seus dados abaixo ou clique em um ícone para log in:

Logotipo do WordPress.com

Você está comentando utilizando sua conta WordPress.com. Sair / Alterar )

Imagem do Twitter

Você está comentando utilizando sua conta Twitter. Sair / Alterar )

Foto do Facebook

Você está comentando utilizando sua conta Facebook. Sair / Alterar )

Foto do Google+

Você está comentando utilizando sua conta Google+. Sair / Alterar )

Conectando a %s

%d blogueiros gostam disto: