Regenerate: a language generator for extended regular expressions
- 7 April 2020
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM SIGPLAN Notices
- Vol. 53 (9), 202-214
- https://doi.org/10.1145/3393934.3278133
Abstract
Regular expressions are part of every programmer’s toolbox. They are used for a wide variety of language-related tasks and there are many algorithms for manipulating them. In particular, matching algorithms that detect whether a word belongs to the language described by a regular expression are well explored, yet new algorithms appear frequently. However, there is no satisfactory methodology for testing such matchers. We propose a testing methodology which is based on generating positive as well as negative examples of words in the language. To this end, we present a new algorithm to generate the language described by a generalized regular expression with intersection and complement operators. The complement operator allows us to generate both positive and negative example words from a given regular expression. We implement our generator in Haskell and OCaml and show that its performance is more than adequate for testing.Keywords
This publication has 15 references indexed in Scilit:
- Verifying a hash table and its iterators in higher-order separation logicPublished by Association for Computing Machinery (ACM) ,2017
- Efficient enumeration of words in regular languagesTheoretical Computer Science, 2009
- Three New Algorithms for Regular Language EnumerationLecture Notes in Computer Science, 2009
- Enumerating the strings of regular languagesJournal of Functional Programming, 2004
- Partial derivatives of regular expressions and finite automaton constructionsTheoretical Computer Science, 1996
- An efficient algorithm for sequential random samplingACM Transactions on Mathematical Software, 1987
- A sentence generator for testing parsersBIT Numerical Mathematics, 1972
- Programming Techniques: Regular expression search algorithmCommunications of the ACM, 1968
- Derivatives of Regular ExpressionsJournal of the ACM, 1964
- Trie memoryCommunications of the ACM, 1960