
International Journal Publications

Automatic Search-and-Replace from Examples with Coevolutionary Genetic Programming

posted May 20, 2019, 12:11 AM by Eric Medvet   [ updated May 20, 2019, 12:12 AM ]

  • IEEE Transactions on Cybernetics (TCyb), 2019, to appear
  • Alberto Bartoli, Andrea De Lorenzo, Eric Medvet, Fabiano Tarlao
We describe the design and implementation of a system for executing search-and-replace text processing tasks automatically, based only on examples of the desired behavior. The examples consist of pairs describing the original string and the desired modified string. Their construction, thus, does not require any specific technical skill. The system constructs a solution to the specified task that can be used unchanged on popular existing software for text processing. The solution consists of a search pattern coupled with a replacement expression: the former is a regular expression which describes both the strings to be replaced and their portions to be reused in the latter, which describes how to build the modified strings. Our proposed system is internally based on Genetic Programming and implements a form of cooperative coevolution in which two separate populations are evolved independently, one for search patterns and the other for replacement expressions. We assess our proposal on six tasks of realistic complexity obtaining very good results, both in terms of absolute quality of the solutions and with respect to the challenging baselines considered.
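As a minimal illustration of what such a solution looks like, the sketch below pairs a search pattern with a replacement expression and checks them against example pairs. The date-rewriting task, the example strings, and the helper names `apply_solution` and `count_correct` are assumptions made for illustration; the pattern is hand-written here, not produced by the system described above.

```python
import re

# Hypothetical examples: (original string, desired modified string).
examples = [
    ("12/31/2018", "2018-12-31"),
    ("01/05/2019", "2019-01-05"),
]

# Hand-written candidate solution (an assumption, not output of the system).
# The capture groups in the search pattern mark the portions to be reused.
search_pattern = r"(\d{2})/(\d{2})/(\d{4})"
# The replacement expression rebuilds the string from the captured groups.
replacement = r"\3-\1-\2"


def apply_solution(pattern: str, repl: str, text: str) -> str:
    """Apply a (search pattern, replacement expression) pair to a string."""
    return re.sub(pattern, repl, text)


def count_correct(pattern: str, repl: str, pairs) -> int:
    """Count the example pairs that the candidate solution reproduces exactly."""
    return sum(apply_solution(pattern, repl, src) == dst for src, dst in pairs)
```

A count such as the one returned by `count_correct` is one plausible ingredient of a fitness measure for candidate solutions; the measure actually used in the paper is not specified in this abstract.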

How Phishing Pages Look Like?

posted Nov 6, 2018, 7:32 AM by Eric Medvet   [ updated Dec 4, 2018, 1:41 AM ]

Recent phishing campaigns increasingly target specific, small populations of users and have increasingly short life spans. There is thus an urgent need to develop defense mechanisms that do not rely on any form of blacklisting or reputation: there is simply no time to detect novel phishing campaigns and notify all interested organizations quickly enough. Such mechanisms should be close to browsers and based solely on the visual appearance of the rendered page. One of the major impediments to research in this area is the lack of systematic knowledge about what phishing pages actually look like. In this work we describe the technical challenges in collecting a large and diverse collection of screenshots of phishing pages and propose practical solutions. We also systematically analyze the visual similarity between phishing pages and pages of targeted organizations, from the point of view of a similarity metric that has been proposed as a foundation for visual phishing detection and from the point of view of a human operator.

Weighted Hierarchical Grammatical Evolution

posted Oct 15, 2018, 2:25 AM by Eric Medvet   [ updated Nov 8, 2018, 6:20 AM ]

Grammatical Evolution (GE) is one of the most widespread techniques in evolutionary computation. Genotypes in GE are bit strings while phenotypes are strings of a language defined by a user-provided context-free grammar (CFG). In this work, we propose a novel procedure for mapping genotypes to phenotypes that we call Weighted Hierarchical GE (WHGE). WHGE imposes a form of hierarchy on the genotype and encodes grammar symbols with a varying number of bits based on the relative expressive power of those symbols. WHGE does not impose any constraint on the overall GE framework: in particular, WHGE can handle recursive grammars, uses the classical genetic operators, and does not need any bound on the size of phenotypes to be defined in advance.
We experimentally assessed our proposal in depth on a set of challenging and carefully selected benchmarks, comparing the results to the standard GE framework as well as to two of the most significant enhancements proposed in the literature: Position-independent GE and Structured GE. Our results show that WHGE delivers very good results in terms of fitness as well as in terms of the properties of the genotype-phenotype mapping procedure.
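For readers unfamiliar with the genotype-phenotype mapping mentioned above, the following is a simplified sketch of the standard GE mapping used as a baseline, not of WHGE itself. The toy grammar, the 8-bit codon size, the wrapping limit, and the names `codons` and `ge_map` are assumptions made for illustration; in this sketch a codon is consumed at every expansion.

```python
from typing import Dict, List, Optional

# Toy context-free grammar (hypothetical): non-terminals are the dict keys.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "+", "<expr>"], ["x"], ["1"]],
}


def codons(bits: List[int], size: int = 8) -> List[int]:
    """Split the bit string into fixed-size codons, each read as an integer."""
    return [
        int("".join(map(str, bits[i:i + size])), 2)
        for i in range(0, len(bits) - size + 1, size)
    ]


def ge_map(
    bits: List[int],
    grammar: Dict[str, List[List[str]]],
    start: str = "<expr>",
    max_wraps: int = 2,
    codon_size: int = 8,
) -> Optional[str]:
    """Map a bit-string genotype to a phenotype; None if the mapping fails."""
    values = codons(bits, codon_size)
    if not values:
        return None
    symbols = [start]
    used = 0
    while True:
        # Find the leftmost non-terminal still to be expanded.
        idx = next((i for i, s in enumerate(symbols) if s in grammar), None)
        if idx is None:
            return "".join(symbols)  # only terminals left: mapping complete
        if used >= len(values) * (max_wraps + 1):
            return None  # codons exhausted even after wrapping
        options = grammar[symbols[idx]]
        # The codon value modulo the number of options selects the production.
        choice = options[values[used % len(values)] % len(options)]
        used += 1
        symbols[idx:idx + 1] = choice
```

With the three codons 0, 1, 2 the sketch derives `x+1`, whereas an all-zero genotype keeps choosing the recursive production and the mapping fails once the wrapping limit is reached.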

A Security-Oriented Analysis of Web Inclusions in the Italian Public Administration

posted Oct 15, 2018, 2:20 AM by Eric Medvet   [ updated Dec 4, 2018, 1:37 AM ]

Modern web sites serve content that browsers fetch automatically from a number of different web servers that may be placed anywhere in the world. Such content is essential for defining the appearance and behavior of a web site and is thus a potential target for attacks. Many public administrations offer services on the web; we have thus entered a world in which web sites of public interest continuously and systematically depend on web servers that may be located anywhere in the world and are potentially under the control of other governments. In this work we focus on these issues by investigating the content included by almost 10,000 web sites of the Italian Public Administration. We analyze the nature of such content, its quantity, its geographical location, and the amount of dynamic variation over time. Our analyses demonstrate that the perimeter of trust of the Italian Public Administration collectively includes countries that are well beyond the control of the Italian government, and they provide several insights useful for implementing a centralized monitoring service aimed at detecting anomalies.

Enterprise Wi-Fi: we need devices that are secure by default

posted Sep 5, 2018, 4:35 AM by Eric Medvet   [ updated May 13, 2019, 6:40 AM ]

Wireless networks have become an essential component of virtually every enterprise. The security technology for these networks (WPA2 Enterprise) was designed for a world that is very different from today’s. Basic assumptions for secure deployment of the technology are now violated systematically. As a result, Wi-Fi enabled personal devices are typically at risk of leaking single sign-on enterprise credentials everywhere, without any explicit action by their owners. It is necessary to emphasize this pervasive yet largely underestimated risk.

Unveiling Evolutionary Algorithm Representation with DU Maps

posted Jul 11, 2018, 12:25 AM by Eric Medvet   [ updated Aug 1, 2018, 7:54 AM ]

Evolutionary Algorithms (EAs) have proven to be effective in tackling problems in many different domains. However, users are often required to spend a significant amount of effort in fine-tuning the EA parameters in order to make the algorithm work. In principle, visualization tools may be of great help in this laborious task, but current visualization tools are either EA-specific, and hence hardly available to all users, or too general to convey detailed information. In this work, we study the Diversity and Usage map (DU map), a compact visualization for analyzing a key component of every EA, the representation of solutions. In a single heat map, the DU map visualizes for entire runs how diverse the genotype is across the population and to which degree each gene in the genotype contributes to the solution. We demonstrate the generality of the DU map concept by applying it to six EAs that use different representations (bit and integer strings, trees, ensembles of trees, and neural networks). We present the results of an online user study about the usability of the DU map, which confirm the suitability of the proposed tool and provide important insights on our design choices. By providing a visualization tool that can be easily tailored by specifying the diversity (D) and usage (U) functions, the DU map aims to be a powerful analysis tool for EA practitioners, making EAs more transparent and hence lowering the barrier for their use.

Designing Automatically a Representation for Grammatical Evolution

posted Jul 5, 2018, 5:54 AM by Eric Medvet   [ updated Jul 17, 2018, 12:51 AM ]

A long-standing problem in Evolutionary Computation is how to choose an appropriate representation for the solutions. In this work we investigate the feasibility of synthesizing a representation automatically, for the large class of problems whose solution spaces can be defined by a context-free grammar. We propose a framework based on a form of meta-evolution in which individuals are candidate representations expressed with an ad hoc language that we have developed for this purpose. Individuals compete and evolve according to an evolutionary search aimed at optimizing such representation properties as redundancy, uniformity of redundancy, and locality. We experimentally assessed three variants of our framework on established benchmark problems and compared the resulting representations to commonly used human-designed representations (e.g., classical Grammatical Evolution). The results are promising, as the evolved representations indeed exhibit better properties than the human-designed ones. Furthermore, the evolved representations compare favorably with the human-designed baselines in search effectiveness as well. Specifically, we select the best evolved representation as the one with the best search effectiveness on a set of learning problems and assess its effectiveness on a separate set of challenging validation problems. For each of the three proposed variants of our framework, the best evolved representation exhibits an average fitness rank on the set of validation problems that is better than the average fitness rank of the human-designed baselines on the same problems.

Evil Twins and WPA2 Enterprise: A Coming Security Disaster?

posted Dec 27, 2017, 3:21 AM by Eric Medvet   [ updated Jan 12, 2018, 12:17 AM ]

WPA2 Enterprise is a suite of protocols for secure communication in a wireless local network and has become an essential component of virtually every enterprise. In many practical deployments of this technology, a device that authenticates with username and password is at risk of leaking credentials to fraudulent access points claiming to be the enterprise network (evil twins) that may be placed virtually anywhere. While this kind of vulnerability is well known to practitioners, we believe these issues deserve a fresh look because the current technological landscape has magnified the corresponding risks. The convergence of organizations toward single sign-on architectures, in which a single set of credentials unlocks access to all services of the organization, coupled with the huge diffusion of Wi-Fi enabled personal devices that often contain enterprise credentials and connect to Wi-Fi networks automatically, has made attacks aimed at stealing network credentials particularly attractive to attackers and hard to detect. In this paper we intend to draw the attention of the research and technological community to this important yet, in our opinion, widely underestimated risk. We also suggest a direction for investigating practical solutions able to offer stronger security without requiring any overhaul of existing protocols.

Active Learning of Regular Expressions for Entity Extraction

posted Mar 6, 2017, 1:02 AM by Eric Medvet   [ updated Mar 31, 2017, 8:35 AM ]

We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been proposed, all of which require the preliminary construction of a large dataset fully annotated by the user. In this work we propose an active learning approach aimed at minimizing the user annotation effort: the user annotates only one desired extraction and then merely answers extraction queries generated by the system. During the learning process, the system searches the input text to select the most appropriate extraction query to be submitted to the user in order to improve the current extractor. We construct candidate solutions with Genetic Programming and select queries with a form of querying-by-committee, i.e., based on a measure of disagreement within the best candidate solutions. All the components of our system are carefully tailored to the peculiarities of active learning with Genetic Programming and of entity extraction from unstructured text. We evaluate our proposal in depth, on a number of challenging datasets and based on a realistic estimate of the user effort involved in answering each single query. The results demonstrate high accuracy with significant savings in terms of computational effort, annotated characters, and execution time over a state-of-the-art baseline.
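The query selection idea can be sketched as follows: each regular expression in a small committee votes on whether a candidate substring should be extracted, and the candidate on which the votes are most evenly split becomes the next query. The vote-split measure, the committee, the candidate strings, and the function names are assumptions made for illustration; the abstract does not specify the disagreement measure actually used.

```python
import re
from typing import List


def vote(pattern: str, candidate: str) -> bool:
    """A committee member votes 'extract' if it matches the whole candidate."""
    return re.fullmatch(pattern, candidate) is not None


def disagreement(committee: List[str], candidate: str) -> float:
    """Return 0.0 when all members agree and 1.0 for an even split."""
    yes = sum(vote(p, candidate) for p in committee)
    frac = yes / len(committee)
    return 1.0 - abs(2.0 * frac - 1.0)


def select_query(committee: List[str], candidates: List[str]) -> str:
    """Pick the candidate extraction on which the committee disagrees most."""
    return max(candidates, key=lambda c: disagreement(committee, c))


# Hypothetical committee of candidate extractors and candidate queries.
committee = [r"\d{4}", r"\d+", r"\d{2,4}", r"[0-9]{4}"]
candidates = ["2017", "42", "abc"]
```

Here all four members accept `2017` and all reject `abc`, so neither answer would be informative, while `42` splits the committee evenly and is the query selected by `select_query`.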

An architecture for anonymous mobile coupons in a large network

posted Nov 17, 2016, 6:32 AM by Eric Medvet   [ updated Dec 19, 2016, 2:10 PM ]

A mobile coupon (m-coupon) can be presented with a smartphone to obtain a financial discount when purchasing a product or service. M-coupons are a powerful marketing tool that has enjoyed huge growth and diffusion, involving tens of millions of people each year.
We propose an architecture which may enable significant improvements over current m-coupon technology, in terms of acceptance by potential customers and of marketing actions that become feasible: the customer does not need to install any dedicated app; an m-coupon is not bound to any specific device or customer; an m-coupon may be redeemed at any store in a set of potentially many thousands of stores, without any prior arrangement between customer and store. We are not aware of any proposal with these properties.
