Multiple Regression Genetic Programming

MRGP is a hybrid method that combines tree-based Genetic Programming (GP) with LASSO regression. It differs from conventional GP primarily in that it does not compare the final program output directly against the target variable, y. Instead, the outputs of all subexpressions of a program are combined linearly, with the coefficients tuned by regression against y. The output of this regression model, rather than the raw program output, is then compared to y.
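To make the idea concrete, here is a minimal Python sketch. It is not the MRGP implementation: the nested-tuple tree encoding and the function names are our own assumptions for illustration. It evaluates every subexpression of a program on each exemplar and collects the outputs as the columns of a feature matrix. A LASSO model would then be fit on those columns against y; that regression step is left out here.

```python
import math

# Hypothetical tree encoding: a variable name (str), a constant (number),
# or a tuple (operator, child, ...). Not the format used by mrgp.jar.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
    "cos": math.cos,
}

def evaluate_all(node, row, outputs):
    """Evaluate node on one exemplar and append the value of every
    subexpression (in post-order) to outputs. Returns the node's value."""
    if isinstance(node, str):
        value = row[node]
    elif isinstance(node, (int, float)):
        value = float(node)
    else:
        op, *children = node
        args = [evaluate_all(child, row, outputs) for child in children]
        value = OPS[op](*args)
    outputs.append(value)
    return value

def subexpression_features(tree, rows):
    """One row per exemplar, one column per subexpression of tree."""
    features = []
    for row in rows:
        outputs = []
        evaluate_all(tree, row, outputs)
        features.append(outputs)
    return features

# Program: (X1 * X2) + sin(X1)
tree = ("+", ("*", "X1", "X2"), ("sin", "X1"))
rows = [{"X1": 0.0, "X2": 2.0}, {"X1": 1.0, "X2": 3.0}]
features = subexpression_features(tree, rows)
# Columns (post-order): X1, X2, X1*X2, X1, sin(X1), whole program
```

In this sketch the leaf X1 appears twice as a column because it occurs twice in the tree. Whether an implementation deduplicates such columns is a design choice we do not make any claim about here.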

Multi-level parallelism provided by FlexGP

Lasso implementation

We rely on LASSO4j, an efficient implementation of LASSO by Y. Ganjisaffar. This implementation is based on the pathwise coordinate descent method introduced in the following paper:

J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
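For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of coordinate descent for LASSO with warm starts along a decreasing sequence of penalties. It is not LASSO4j's code: the function names are our own, there is no intercept or standardization, and a fixed number of sweeps stands in for a proper convergence check.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, beta=None, sweeps=200):
    """Minimize (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic
    coordinate descent. X is a list of rows; no intercept is fitted."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    # Residuals for the starting coefficients.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(sweeps):
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue  # an all-zero column cannot explain anything
            # Covariance of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def lasso_path(X, y, lambdas):
    """Solve for decreasing penalties, warm-starting each fit from the last."""
    beta, path = None, []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_cd(X, y, lam, beta)
        path.append((lam, beta))
    return path

# Toy data with two orthogonal columns; the second carries a weak signal.
X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
y = [2.0, -2.0, 0.1, -0.1]
path = lasso_path(X, y, [0.0, 0.1, 1.0])
```

On this toy data the largest penalty zeroes both coefficients, the middle one keeps only the strong column, and a zero penalty recovers the least-squares fit. This is the sparsity behavior that makes LASSO useful for selecting among many subexpressions.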

Publications

For more details of MRGP, the reader is referred to this paper:

Arnaldo, I.; Krawiec, K.; O'Reilly, UM: Multiple regression genetic programming. Proceedings of the 2014 conference on Genetic and evolutionary computation (GECCO 2014). Pages 879-886, 2014.

Please note that the paper above used a different implementation. The updated version provided in this release was used in this other publication:

Veeramachaneni, K; Arnaldo, I; Derby, O; O’Reilly, UM: FlexGP: Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems. Journal of Grid Computing. November, 2014.

Tutorial

The current release provides functionality both to perform symbolic regression on numerical datasets and to test the retrieved models. On this page we provide a quick tutorial on how to get started with MRGP.

Note: this release is only supported on Debian Linux platforms.

Step 1: Data format

Data must be provided in CSV format, where each line corresponds to an exemplar and the target value is placed in the last column. Any line or column containing labels or nominal values must be removed beforehand.
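If your dataset still contains a header line or label columns, a small script can strip them before training. The sketch below uses only Python's standard csv module; the function names are our own, and it assumes that a cell is numeric if float() accepts it. It does not reorder columns, so the target must already be the last numeric column.

```python
import csv

def is_number(text):
    """True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def clean_rows(rows):
    """Drop rows with no numeric cell (e.g. a header line) and every
    column that holds a non-numeric value in any remaining row."""
    data = [row for row in rows if row and any(is_number(c) for c in row)]
    if not data:
        return []
    width = min(len(row) for row in data)
    keep = [j for j in range(width) if all(is_number(row[j]) for row in data)]
    return [[row[j] for j in keep] for row in data]

def clean_csv(src_path, dst_path):
    """Read src_path, clean it, and write the result to dst_path."""
    with open(src_path, newline="") as src:
        cleaned = clean_rows(list(csv.reader(src)))
    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(cleaned)
    return len(cleaned)

# In-memory example: one header line and one nominal column.
raw = [
    ["name", "x1", "x2", "y"],
    ["a", "1.0", "2.0", "3.0"],
    ["b", "4.0", "5.0", "9.0"],
]
cleaned = clean_rows(raw)
```

Note that a numeric identifier column would survive this filter and be treated as a feature, so it is worth inspecting the cleaned file before training.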

Step 2: Download the mrgp.jar file from here

Step 3: Model the data

In the current release, MRGP models can only be learned from the terminal.

Running MRGP from the terminal

Model the data

$ java -jar mrgp.jar -train path_to_train_data -minutes 10

At the end of the run, the following files are generated:

pareto.txt: models forming the Pareto Front (accuracy vs model complexity).
leastComplex.txt: least complex model of the Pareto Front.
mostAccurate.txt: most accurate model of the Pareto Front.
knee.txt: model at the knee of the Pareto Front.
bestModelGeneration.txt: most accurate model per generation.
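The relationship between these files can be illustrated with a short Python sketch. It is only an illustration: the dictionary layout is hypothetical (it is not the format of the generated files), error stands in for the inverse of accuracy, and the knee rule shown here is one common heuristic, not necessarily the one MRGP applies.

```python
def pareto_front(models):
    """Return the models not dominated on (error, complexity);
    lower is better for both objectives."""
    front = []
    for m in models:
        dominated = any(
            o["error"] <= m["error"]
            and o["complexity"] <= m["complexity"]
            and (o["error"] < m["error"] or o["complexity"] < m["complexity"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m["complexity"])

def knee(front):
    """Heuristic knee (an assumption): scale both objectives to [0, 1]
    and pick the point closest to the ideal corner (0, 0)."""
    errs = [m["error"] for m in front]
    comps = [m["complexity"] for m in front]
    e_span = (max(errs) - min(errs)) or 1.0
    c_span = (max(comps) - min(comps)) or 1.0

    def squared_distance(m):
        e = (m["error"] - min(errs)) / e_span
        c = (m["complexity"] - min(comps)) / c_span
        return e * e + c * c

    return min(front, key=squared_distance)

# Made-up models for illustration only.
models = [
    {"name": "a", "error": 0.90, "complexity": 1},
    {"name": "b", "error": 0.30, "complexity": 4},
    {"name": "c", "error": 0.35, "complexity": 6},  # dominated by b
    {"name": "d", "error": 0.25, "complexity": 12},
]
front = pareto_front(models)
least_complex = min(front, key=lambda m: m["complexity"])
most_accurate = min(front, key=lambda m: m["error"])
knee_model = knee(front)
```

In this made-up example the front holds a, b, and d; a is the least complex model, d the most accurate, and b sits at the knee as the compromise between the two.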

Test the models

The MRGP learner provides functionality to obtain the MSE and MAE of the retrieved models once training has finished. To automatically test all the generated models, type:

$ cd run_folder
$ java -jar mrgp.jar -test path_to_test_data
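For reference, the two reported metrics are the mean squared error (MSE) and the mean absolute error (MAE). The Python sketch below shows the standard definitions on made-up values; it is independent of mrgp.jar, and the function names are our own.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions for illustration.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
print("MSE:", mse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```

MSE penalizes large errors more heavily than MAE, so comparing the two gives a rough indication of whether a model's error is dominated by a few badly predicted exemplars.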

Bells and whistles

Change the default parameters

To override the default parameters of the MRGP learner, append the -properties flag followed by the path to a properties file containing the desired parameters:

$ java -jar mrgp.jar -train path_to_train_data -minutes 10 -properties path_to_props_file

The following properties file example specifies the number of threads, the population size, the features that will be considered during the learning process, the functions employed to generate GP trees, the tournament selection size, and the mutation rate.

external_threads = 8
pop_size = 100
terminal_set = X1 X3 X4
function_set = + - * mydivide exp sin cos
tourney_size = 10
mutation_rate = 0.1
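When running many experiments, it can be convenient to generate the properties file and the command line from a script. The Python sketch below is one way to do so. The helper names are our own, and it only uses the flags and property keys shown above.

```python
def format_properties(params):
    """Render a dict as 'key = value' lines, one per parameter."""
    return "".join(f"{key} = {value}\n" for key, value in params.items())

def train_command(train_path, minutes, props_path=None):
    """Build the argument list for a training run of mrgp.jar."""
    cmd = ["java", "-jar", "mrgp.jar", "-train", train_path,
           "-minutes", str(minutes)]
    if props_path is not None:
        cmd += ["-properties", props_path]
    return cmd

params = {
    "external_threads": 8,
    "pop_size": 100,
    "terminal_set": "X1 X3 X4",
    "function_set": "+ - * mydivide exp sin cos",
    "tourney_size": 10,
    "mutation_rate": 0.1,
}
text = format_properties(params)
cmd = train_command("path_to_train_data", 10, "path_to_props_file")
```

Writing text to the properties file and passing cmd to subprocess.run(cmd, check=True) would then launch the run, assuming java is installed and mrgp.jar is in the working directory.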

Examples

For reports on example runs, visit our blog: FlexGP Blog

Authors and Contributors

FlexGP is a project of the Any-Scale Learning For All (ALFA) group at MIT.