Publications‎ > ‎

International Journal Publications

Unveiling Evolutionary Algorithm Representation with DU Maps

posted Jul 11, 2018, 12:25 AM by Eric Medvet   [ updated Aug 1, 2018, 7:54 AM ]

Evolutionary Algorithms (EAs) have proven to be effective in tackling problems in many different domains. However, users are often required to spend a significant amount of effort in fine-tuning the EA parameters in order to make the algorithm work. In principle, visualization tools may be of great help in this laborious task, but current visualization tools are either EA-specific, and hence hardly available to all users, or too general to convey detailed information. In this work, we study the Diversity and Usage map (DU map), a compact visualization for analyzing a key component of every EA, the representation of solutions. In a single heat map, the DU map visualizes for entire runs how diverse the genotype is across the population and to which degree each gene in the genotype contributes to the solution. We demonstrate the generality of the DU map concept by applying it to six EAs that use different representations (bit and integer strings, trees, ensembles of trees, and neural networks). We present the results of an online user study about the usability of the DU map which confirm the suitability of the proposed tool and provide important insights on our design choices. By providing a visualization tool that can be easily tailored by specifying the diversity (D) and usage (U) functions, the DU map aims at being a powerful analysis tool for EAs practitioners, making EAs more transparent and hence lowering the barrier for their use.

Designing Automatically a Representation for Grammatical Evolution

posted Jul 5, 2018, 5:54 AM by Eric Medvet   [ updated Jul 17, 2018, 12:51 AM ]

A long-standing problem in Evolutionary Computation consists in how to choose an appropriate representation for the solutions. In this work we investigate the feasibility of synthesizing a representation automatically, for the large class of problems whose solution spaces can be defined by a context-free grammar. We propose a framework based on a form of meta-evolution in which individuals are candidate representations expressed with an ad hoc language that we have developed to this purpose. Individuals compete and evolve according to an evolutionary search aimed at optimizing such representation properties as redundancy, uniformity of redundancy, and locality. We assessed experimentally three variants of our framework on established benchmark problems and compared the resulting representations to human-designed representations commonly used (e.g., classical Grammatical Evolution). The results are promising as the evolved representations indeed exhibit better properties than the human-designed ones. Furthermore, the evolved representations compare favorably with the human-designed baselines in search effectiveness as well. Specifically, we select a best evolved representation as the representation with best search effectiveness on a set of learning problems and assess its effectiveness on a separate set of challenging validation problems. For each of the three proposed variants of our framework, the best evolved representation exhibits an average fitness rank on the set of validation problems that is better than the average fitness rank of the human-designed baselines on the same problems.

Evil Twins and WPA2 Enterprise: A Coming Security Disaster?

posted Dec 27, 2017, 3:21 AM by Eric Medvet   [ updated Jan 12, 2018, 12:17 AM ]

WPA2 Enterprise is a suite of protocols for secure communication in a wireless local network and has become an essential component of virtually every enterprise. In many practical deployments of this technology, a device that authenticates with username and password is at risk of leaking credentials to fraudulent access points claiming to be the enterprise network (evil twins) that may be placed virtually anywhere. While this kind of vulnerability is well known to practitioners, we believe these issues deserve a fresh look because the current technological landscape has magnified the corresponding risks. Convergence of organizations toward single sign-on architectures in which a single set of credentials unlock access to all services of the organizations, coupled with the huge diffusion of wifi-enabled personal devices which often contain enterprise credentials and that connect to wifi networks automatically, have made attacks aimed at stealing network credentials particularly attractive to attackers and hard to detect. In this paper we intend to draw the attention of the research and technological community on this important yet, in our opinion, widely underestimated risk. We also suggest a direction for investigating practical solutions able to offer stronger security without requiring any overhaul of existing protocols.

Active Learning of Regular Expressions for Entity Extraction

posted Mar 6, 2017, 1:02 AM by Eric Medvet   [ updated Mar 31, 2017, 8:35 AM ]

We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been proposed, which all require the preliminary construction of a large dataset fully annotated by the user. In this work we propose an active learning approach aimed at minimizing the user annotation effort: the user annotates only one desired extraction and then merely answers extraction queries generated by the system. During the learning process, the system digs into the input text for selecting the most appropriate extraction query to be submitted to the user in order to improve the current extractor. We construct candidate solutions with Genetic Programming and select queries with a form of querying-by-committee, i.e., based on a measure of disagreement within the best candidate solutions. All the components of our system are carefully tailored to the peculiarities of active learning with Genetic Programming and of entity extraction from unstructured text. We evaluate our proposal in depth, on a number of challenging datasets and based on a realistic estimate of the user effort involved in answering each single query. The results demonstrate high accuracy with significant savings in terms of computational effort, annotated characters and execution time over a state-of-the-art baseline.

An architecture for anonymous mobile coupons in a large network

posted Nov 17, 2016, 6:32 AM by Eric Medvet   [ updated Dec 19, 2016, 2:10 PM ]

A mobile coupon (m-coupon) can be presented with a smartphone for obtaining a financial discount when purchasing a product or service. M-coupons are a powerful marketing tool that has enjoyed a huge growth and diffusion, involving tens of millions of people each year.
We propose an architecture which may enable significant improvements over current m-coupon technology, in terms of acceptance of potential customers and of marketing actions that become feasible: the customer does not need to install any dedicated app; a m-coupon is not bound to any specific device or customer; a m-coupon may be redeemed at any store in a set of potentially many thousands of stores, without any prior arrangement between customer and store. We are not aware of any proposal with these properties.

Regex-based Entity Extraction with Active Learning and Genetic Programming

posted Aug 26, 2016, 12:49 AM by Eric Medvet   [ updated Oct 3, 2016, 2:01 AM ]

We consider the long-standing problem of the automatic generation of regular expressions for text extraction, based solely on examples of the desired behavior. We investigate several active learning approaches in which the user annotates only one desired extraction and then merely answers extraction queries generated by the system.
 The resulting framework is attractive because it is the system, not the user, which digs out the data in search of the samples most suitable to the specific learning task. We tailor our proposals to a state-of-the-art learner based on Genetic Programming and we assess them experimentally on a number of challenging tasks of realistic complexity. The results indicate that active learning is indeed a viable framework in this application domain and may thus significantly decrease the amount of costly annotation effort required.

Predicting the Effectiveness of Pattern-based Entity Extractor Inference

posted May 16, 2016, 2:21 AM by Eric Medvet   [ updated May 27, 2016, 3:31 AM ]

An essential component of any workflow leveraging digital data consists in the identification and extraction of relevant patterns from a data stream. We consider a scenario in which an extraction inference engine generates an entity extractor automatically from examples of the desired behavior, which take the form of user-provided annotations of the entities to be extracted from a dataset. We propose a methodology for predicting the accuracy of the extractor that may be inferred from the available examples. We propose several prediction techniques and analyze experimentally our proposals in great depth, with reference to extractors consisting of regular expressions. The results suggest that reliable predictions for tasks of practical complexity may indeed be obtained quickly and without actually generating the entity extractor.

Can A Machine Replace Humans In Building Regular Expressions? A Case Study

posted Mar 23, 2016, 5:05 AM by Eric Medvet   [ updated Jun 7, 2016, 2:36 AM ]

Regular expressions are routinely used in a variety of different application domains. Building a regular expression involves a considerable amount of skill, expertise and creativity. In this work we investigate whether a machine may surrogate these qualities and construct automatically regular expressions for tasks of realistic complexity. We discuss a large scale experiment involving more than 1700 users on 10 challenging tasks. We compared the solutions constructed by these users to those constructed by a tool based on Genetic Programming that we have recently developed and made publicly available. The quality of automatically-constructed solutions turned out to be similar to the quality of those constructed by the most skilled user group; and, the time for automatic construction was similar to the time required by human users.

Inference of Regular Expressions for Text Extraction from Examples

posted Mar 21, 2016, 4:48 AM by Eric Medvet   [ updated May 26, 2016, 2:57 AM ]

A large class of entity extraction tasks from text that is either semistructured or fully unstructured may be addressed by regular expressions, because in many practical cases the relevant entities follow an underlying syntactical pattern and this pattern may be described by a regular expression. In this work we consider the long-standing problem of synthesizing such expressions automatically, based solely on examples of the desired behavior.
We present the design and implementation of a system capable of addressing extraction tasks of realistic complexity. Our system is based on an evolutionary procedure carefully tailored to the specific needs of regular expression generation by examples. The procedure executes a search driven by a multiobjective optimization strategy aimed at simultaneously improving multiple performance indexes of candidate solutions while at the same time ensuring an adequate exploration of the huge solution space. We assess our proposal experimentally in great depth, on a number of challenging datasets. The accuracy of the obtained solutions seems to be adequate for practical usage and improves over earlier proposals significantly. Most importantly, our results are highly competitive even with respect to human operators. A prototype is available as a web application at

Data Quality Challenge: Toward a tool for string processing by examples

posted Jun 9, 2015, 4:25 AM by Eric Medvet   [ updated May 26, 2016, 3:47 AM ]

Many data-related activities at organizations of all sizes are concerned with low-level string processing, such as format transformation and validation, data cleaning, substring extraction and classification, and so on. Problems of this sort occur routinely in a one-off fashion as part of specific processes or activities that cannot be integrated in long-lived workflows, such as analysis of data gathered from the web or from other enterprise sources. The input stream may range from structured or semistructured data, such as database tables or spreadsheets, to unstructured data, such as text in natural language, as well as data whose structure is not available, such as a web page or a pdf invoice. These tasks are a vital ingredient of virtually every organization but are difficult to address efficiently: they are usually too simple to justify the cost and latency of a full-blown IT project, yet they are not simple enough to be solved by non-IT specialists.

1-10 of 18