Automatic Generation of Regular Expressions from Examples with Genetic Programming

posted Mar 14, 2012, 2:49 AM by Eric Medvet   [ updated Dec 10, 2012, 6:25 AM ]
  • ACM Genetic and Evolutionary Computation Conference (GECCO), 2012, Philadelphia (US)
  • Alberto Bartoli, Giorgio Davanzo, Andrea De Lorenzo, Marco Mauri, Eric Medvet, Enrico Sorio

We explore the practical feasibility of a system based on genetic programming (GP) for the automatic generation of regular expressions. The user describes the desired task by providing a set of labeled examples in the form of text lines. The system uses these examples to drive the evolutionary search towards a regular expression suitable for the specified task. Using the system should require familiarity with neither GP nor regular expression syntax. In our GP implementation, each individual represents a syntactically correct regular expression. We performed an experimental evaluation on two different extraction tasks applied to real-world datasets and obtained promising results in terms of precision and recall, even in comparison to an earlier state-of-the-art proposal.
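To make the idea of individuals as syntactically correct regular expressions concrete, here is a minimal Python sketch. It is not the implementation from the paper: the node classes (`Leaf`, `Concat`, `Repeat`), the helper `precision_recall`, and the toy examples are all hypothetical, and scoring candidates by exact-match precision and recall is an assumption made for illustration (the abstract reports precision and recall as evaluation measures, not necessarily as the fitness that guides the search). The point is only that a tree built from these nodes always renders to a valid pattern, and that labeled text lines suffice to score it.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical tree nodes. Any tree built from them renders to a valid
# regex, provided each Leaf token is itself a valid atom (e.g. r"\d").
@dataclass(frozen=True)
class Leaf:
    token: str

    def render(self) -> str:
        return self.token

@dataclass(frozen=True)
class Concat:
    left: "Node"
    right: "Node"

    def render(self) -> str:
        return self.left.render() + self.right.render()

@dataclass(frozen=True)
class Repeat:
    child: "Node"

    def render(self) -> str:
        # A non-capturing group keeps the quantifier bound to the whole child.
        return "(?:" + self.child.render() + ")+"

Node = Union[Leaf, Concat, Repeat]

# A labeled example: a text line and the substring to extract (None if nothing).
Example = Tuple[str, Optional[str]]

def precision_recall(tree: Node, examples: List[Example]) -> Tuple[float, float]:
    """Score a candidate by exact-match extraction on labeled lines (assumed metric)."""
    pattern = re.compile(tree.render())
    tp = fp = fn = 0
    for line, expected in examples:
        match = pattern.search(line)
        got = match.group(0) if match else None
        if got is not None and got == expected:
            tp += 1
        else:
            if got is not None:
                fp += 1  # extracted something, but not the labeled substring
            if expected is not None:
                fn += 1  # the labeled substring was not extracted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labeled examples (invented for this sketch).
examples: List[Example] = [
    ("order 123 shipped", "123"),
    ("call 42 now", "42"),
    ("no digits here", None),
    ("id x7", "7"),
]

good = Repeat(Leaf(r"\d"))             # renders to (?:\d)+
weak = Concat(Leaf(r"\d"), Leaf(r"\d"))  # renders to \d\d

print(good.render(), precision_recall(good, examples))
print(weak.render(), precision_recall(weak, examples))
```

In a full evolutionary loop, crossover and mutation would operate on these trees rather than on pattern strings, which is one way to ensure that every offspring is still a well-formed regular expression.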
