MRGP learner

MRGP is a hybrid method that combines tree-based Genetic Programming (GP) with LASSO regression. It differs from conventional GP primarily in that the final program output is not compared directly against the target variable y. Instead, the outputs of all subexpressions of a program are combined linearly, with coefficients fitted by LASSO with respect to y. Then, y is compared to the output of this regression model.
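The step of treating every subexpression as a regressor can be illustrated with a small Java sketch. The `Node`, `Var`, `Add` and `Mul` types and the `build` method are hypothetical, written only for this page, and are not part of the MRGP code base. The sketch shows how a program tree becomes a design matrix with one row per exemplar and one column per subexpression, which is the input LASSO then combines linearly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: turn a GP tree into a matrix of subexpression outputs. */
public class SubexpressionMatrix {

    /** Minimal expression-tree node (hypothetical, not the MRGP API). */
    interface Node {
        double eval(double[] x);
        List<Node> children();
    }

    record Var(int index) implements Node {
        public double eval(double[] x) { return x[index]; }
        public List<Node> children() { return List.of(); }
    }

    record Add(Node left, Node right) implements Node {
        public double eval(double[] x) { return left.eval(x) + right.eval(x); }
        public List<Node> children() { return List.of(left, right); }
    }

    record Mul(Node left, Node right) implements Node {
        public double eval(double[] x) { return left.eval(x) * right.eval(x); }
        public List<Node> children() { return List.of(left, right); }
    }

    /** Pre-order traversal: the node itself, then every subtree below it. */
    static void collect(Node node, List<Node> out) {
        out.add(node);
        for (Node child : node.children()) {
            collect(child, out);
        }
    }

    /** Rows are exemplars; columns are subexpressions, root first. */
    static double[][] build(Node root, double[][] data) {
        List<Node> subexpressions = new ArrayList<>();
        collect(root, subexpressions);
        double[][] z = new double[data.length][subexpressions.size()];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < subexpressions.size(); j++) {
                z[i][j] = subexpressions.get(j).eval(data[i]);
            }
        }
        return z;
    }

    public static void main(String[] args) {
        // Program (x0 + x1) * x0 has subexpressions: whole, (x0 + x1), x0, x1, x0
        Node program = new Mul(new Add(new Var(0), new Var(1)), new Var(0));
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}};
        System.out.println(Arrays.deepToString(build(program, data)));
    }
}
```

For the program (x0 + x1) * x0, the traversal yields five columns: the whole program, x0 + x1, x0, x1, and x0 again. Repeated subexpressions produce duplicate columns; the sketch keeps them for simplicity.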

We use LASSO4j, an efficient Java implementation of LASSO by Y. Ganjisaffar. It is based on the pathwise coordinate descent method introduced in this paper:

*Friedman, J. H.; Hastie, T.; Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, February 2010.*
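To make pathwise coordinate descent concrete, here is a small self-contained Java sketch. It is not LASSO4j and does not use its API; it only illustrates the scheme described in the paper above: a soft-thresholding update for each coefficient, applied over a decreasing geometric sequence of penalties that starts at lambdaMax (where all coefficients are zero), with each solution reused as the warm start for the next. The parameter names `nLambda`, `minRatio`, `maxIter` and `tol` are our own, and for brevity the columns are centered but not scaled.

```java
import java.util.Arrays;

/**
 * Sketch of pathwise coordinate descent for the LASSO objective
 *   (1/2N) * sum_i (y_i - b0 - sum_j x_ij b_j)^2 + lambda * sum_j |b_j|.
 * Illustrative only; not the LASSO4j API.
 */
public class LassoPath {

    /** Soft-thresholding operator: sign(z) * max(|z| - gamma, 0). */
    static double softThreshold(double z, double gamma) {
        if (z > gamma) return z - gamma;
        if (z < -gamma) return z + gamma;
        return 0.0;
    }

    /** Returns {intercept, b_1, ..., b_p} for the smallest penalty on the path. */
    static double[] fitPath(double[][] x, double[] y, int nLambda, double minRatio,
                            int maxIter, double tol) {
        int n = x.length, p = x[0].length;
        double yMean = Arrays.stream(y).average().orElse(0.0);
        double[] xMean = new double[p];
        for (double[] row : x) {
            for (int j = 0; j < p; j++) xMean[j] += row[j] / n;
        }
        double[][] xc = new double[n][p];   // centered features
        double[] r = new double[n];         // residual y_c - X_c * beta (beta starts at 0)
        double[] colSq = new double[p];     // (1/N) * sum_i xc_ij^2
        for (int i = 0; i < n; i++) {
            r[i] = y[i] - yMean;
            for (int j = 0; j < p; j++) {
                xc[i][j] = x[i][j] - xMean[j];
                colSq[j] += xc[i][j] * xc[i][j] / n;
            }
        }
        // Smallest penalty for which every coefficient is zero.
        double lambdaMax = 0.0;
        for (int j = 0; j < p; j++) {
            double dot = 0.0;
            for (int i = 0; i < n; i++) dot += xc[i][j] * r[i];
            lambdaMax = Math.max(lambdaMax, Math.abs(dot / n));
        }
        double[] beta = new double[p];
        for (int k = 0; k < nLambda; k++) {
            double frac = nLambda == 1 ? 0.0 : (double) k / (nLambda - 1);
            double lambda = lambdaMax * Math.pow(minRatio, frac);
            for (int iter = 0; iter < maxIter; iter++) {
                double maxChange = 0.0;
                for (int j = 0; j < p; j++) {
                    if (colSq[j] == 0.0) continue;  // constant column stays at zero
                    // Correlation of feature j with the partial residual that excludes j.
                    double rho = 0.0;
                    for (int i = 0; i < n; i++) rho += xc[i][j] * (r[i] + xc[i][j] * beta[j]);
                    rho /= n;
                    double updated = softThreshold(rho, lambda) / colSq[j];
                    double delta = updated - beta[j];
                    if (delta != 0.0) {
                        for (int i = 0; i < n; i++) r[i] -= xc[i][j] * delta;
                        beta[j] = updated;
                    }
                    maxChange = Math.max(maxChange, Math.abs(delta));
                }
                if (maxChange < tol) break;
            }
        }
        // Undo the centering to recover the intercept.
        double[] result = new double[p + 1];
        result[0] = yMean;
        for (int j = 0; j < p; j++) {
            result[j + 1] = beta[j];
            result[0] -= beta[j] * xMean[j];
        }
        return result;
    }

    public static void main(String[] args) {
        // y = 1 + 2 * x0 exactly; x1 is irrelevant.
        double[][] x = {{0, 1}, {1, 0}, {2, 1}, {3, 0}, {4, 1}};
        double[] y = {1, 3, 5, 7, 9};
        System.out.println(Arrays.toString(fitPath(x, y, 50, 1e-4, 1000, 1e-10)));
    }
}
```

With this toy data the fit should give an intercept close to 1, a coefficient close to 2 for x0 (slightly shrunk by the small final penalty), and exactly 0 for x1. In MRGP terms, `x` would be the matrix of subexpression outputs of one program.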

For more details on MRGP, the reader is referred to this paper:

*Arnaldo, I.; Krawiec, K.; O'Reilly, UM: Multiple regression genetic programming. Proceedings of the 2014 Conference on Genetic and Evolutionary Computation (GECCO 2014), pages 879–886, 2014.*

Please note that the referenced paper used a different implementation. The version released here was used in this later publication:

*Veeramachaneni, K.; Arnaldo, I.; Derby, O.; O'Reilly, UM: FlexGP: Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems. Journal of Grid Computing, November 2014.*

The current release can both perform symbolic regression on numerical datasets and test the retrieved models. This page provides a quick tutorial on how to get started with MRGP.

Note: this release is supported only on Debian Linux platforms.

Data must be provided in CSV format, where each line corresponds to an exemplar and the target value is placed in the last column. Any additional line or column containing labels (such as a header row) or nominal values must be removed.
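The expected layout can be checked with a short Java sketch that splits each line into features (all columns but the last) and a target (the last column). The `CsvData` class is hypothetical and not part of mrgp.jar; it simply rejects anything that is not purely numeric, which is how a leftover header row or nominal column would show up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical loader illustrating the data layout MRGP expects; not part of mrgp.jar. */
public class CsvData {
    final double[][] features;  // one row per exemplar, all columns except the last
    final double[] targets;     // last column of each row

    CsvData(double[][] features, double[] targets) {
        this.features = features;
        this.targets = targets;
    }

    /** Parses numeric CSV lines; a header row or nominal value throws NumberFormatException. */
    static CsvData parse(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) continue;
            String[] fields = line.trim().split(",");
            double[] row = new double[fields.length];
            for (int k = 0; k < fields.length; k++) {
                row[k] = Double.parseDouble(fields[k].trim());
            }
            if (!rows.isEmpty() && row.length != rows.get(0).length) {
                throw new IllegalArgumentException("inconsistent number of columns: " + line);
            }
            rows.add(row);
        }
        if (rows.isEmpty() || rows.get(0).length < 2) {
            throw new IllegalArgumentException("need at least one feature column and a target column");
        }
        int n = rows.size(), p = rows.get(0).length - 1;
        double[][] features = new double[n][p];
        double[] targets = new double[n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(rows.get(i), 0, features[i], 0, p);
            targets[i] = rows.get(i)[p];
        }
        return new CsvData(features, targets);
    }

    static CsvData load(Path path) throws IOException {
        return parse(Files.readAllLines(path));
    }

    public static void main(String[] args) throws IOException {
        CsvData data = load(Path.of(args[0]));
        System.out.println(data.features.length + " exemplars, " + data.features[0].length + " features");
    }
}
```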

In the current release, MRGP models can only be learned from the terminal. For example, the following command runs the learner on the training data for 10 minutes:

```
$ java -jar mrgp.jar -train path_to_train_data -minutes 10
```

At the end of the run, the following files are generated:

- **pareto.txt**: models forming the Pareto Front (accuracy vs. model complexity).
- **leastComplex.txt**: least complex model of the Pareto Front.
- **mostAccurate.txt**: most accurate model of the Pareto Front.
- **knee.txt**: model at the knee of the Pareto Front.
- **bestModelGeneration.txt**: most accurate model per generation.
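The relation between these files can be sketched in Java: a model belongs to the Pareto Front if no other model is at least as good in both error and complexity and strictly better in one. The `Model` record and its fields are hypothetical, and how the release measures complexity or locates the knee is not described on this page, so the sketch only extracts the front and its two extremes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of Pareto Front extraction (error vs. complexity). */
public class ParetoFront {

    /** A candidate model summarized by its error and complexity (hypothetical fields). */
    record Model(String name, double error, int complexity) {}

    /** True if a is no worse than b in both objectives and strictly better in one. */
    static boolean dominates(Model a, Model b) {
        return a.error() <= b.error() && a.complexity() <= b.complexity()
                && (a.error() < b.error() || a.complexity() < b.complexity());
    }

    /** Non-dominated models, sorted by increasing complexity. */
    static List<Model> front(List<Model> models) {
        List<Model> result = new ArrayList<>();
        for (Model m : models) {
            boolean dominated = false;
            for (Model other : models) {
                if (dominates(other, m)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) result.add(m);
        }
        result.sort(Comparator.comparingInt(Model::complexity));
        return result;
    }

    public static void main(String[] args) {
        List<Model> models = List.of(
                new Model("a", 0.50, 3),
                new Model("b", 0.20, 7),
                new Model("c", 0.30, 9),   // dominated by b
                new Model("d", 0.05, 15));
        List<Model> pf = front(models);
        Model leastComplex = pf.get(0);
        Model mostAccurate = pf.stream().min(Comparator.comparingDouble(Model::error)).orElseThrow();
        System.out.println("front: " + pf);
        System.out.println("least complex: " + leastComplex.name() + ", most accurate: " + mostAccurate.name());
    }
}
```

Here model c is dropped because b is both simpler and more accurate; a and d are the least complex and most accurate ends of the front.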

Once training is finished, the MRGP learner can compute the MSE and MAE of the retrieved models. To test all the generated models automatically, run the following from the run folder:

```
$ cd run_folder
$ java -jar mrgp.jar -test path_to_test_data
```
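For reference, the two reported metrics can be written in a few lines of Java, assuming the standard definitions: MSE is the mean of the squared residuals and MAE is the mean of their absolute values. This is a sketch, not the code used inside mrgp.jar.

```java
/** Standard MSE and MAE definitions; a sketch, not the code used by mrgp.jar. */
public class ErrorMetrics {

    static void check(double[] yTrue, double[] yPred) {
        if (yTrue.length == 0 || yTrue.length != yPred.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    /** Mean squared error: (1/n) * sum_i (y_i - yhat_i)^2. */
    static double mse(double[] yTrue, double[] yPred) {
        check(yTrue, yPred);
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            double e = yTrue[i] - yPred[i];
            sum += e * e;
        }
        return sum / yTrue.length;
    }

    /** Mean absolute error: (1/n) * sum_i |y_i - yhat_i|. */
    static double mae(double[] yTrue, double[] yPred) {
        check(yTrue, yPred);
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            sum += Math.abs(yTrue[i] - yPred[i]);
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0};
        double[] yHat = {1.5, 2.0, 2.0};
        System.out.println("MSE = " + mse(y, yHat) + ", MAE = " + mae(y, yHat));
    }
}
```

In the example the residuals are -0.5, 0 and 1, so MSE = 1.25 / 3 ≈ 0.417 and MAE = 1.5 / 3 = 0.5. MSE penalizes large errors more heavily, while MAE is less sensitive to outliers.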

To change the default parameters of the MRGP learner, append the *-properties* flag followed by the path to a properties file containing the desired parameters:

```
$ java -jar mrgp.jar -train path_to_train_data -minutes 10 -properties path_to_props_file
```

The following example properties file sets the number of threads, the population size, the features considered during learning, the functions used to build GP trees, the tournament selection size, and the mutation rate.
```
external_threads = 8
pop_size = 100
terminal_set = X1 X3 X4
function_set = + - * mydivide exp sin cos
tourney_size = 10
mutation_rate = 0.1
```
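The example above uses plain `key = value` lines. Assuming the release reads this file with the standard `java.util.Properties` format (we have not confirmed this), a properties file can be sanity-checked before a long run with a sketch like the one below. The `PropsCheck` class and its `tokens` helper are hypothetical; the helper splits space-separated values such as `terminal_set` and `function_set`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for sanity-checking a properties file; not part of mrgp.jar. */
public class PropsCheck {

    /** Parses "key = value" lines with the standard java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // not expected for an in-memory reader
        }
        return props;
    }

    /** Splits a space-separated value such as terminal_set into tokens. */
    static List<String> tokens(Properties props, String key) {
        String value = props.getProperty(key, "").trim();
        return value.isEmpty() ? List.of() : Arrays.asList(value.split("\\s+"));
    }

    public static void main(String[] args) {
        String example = String.join("\n",
                "external_threads = 8",
                "pop_size = 100",
                "terminal_set = X1 X3 X4",
                "function_set = + - * mydivide exp sin cos",
                "tourney_size = 10",
                "mutation_rate = 0.1");
        Properties props = parse(example);
        int popSize = Integer.parseInt(props.getProperty("pop_size").trim());
        double mutationRate = Double.parseDouble(props.getProperty("mutation_rate").trim());
        System.out.println("pop_size=" + popSize + ", mutation_rate=" + mutationRate);
        System.out.println("terminals=" + tokens(props, "terminal_set"));
        System.out.println("functions=" + tokens(props, "function_set"));
    }
}
```

Values are trimmed before parsing because `Properties` keeps trailing whitespace in values, which would otherwise make `Integer.parseInt` fail.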

For reports, visit our blog: FlexGP Blog

FlexGP is a project of the Any-Scale Learning For All (ALFA) group at MIT.