EFS is a regression algorithm that outputs accurate, readable, nonlinear models.
Given a dataset with response variable **Y** and a vector of variables **X**, EFS generates a basis expansion model of the form

$$\hat{Y} = \beta_0 + \sum_{i=1}^{m} \beta_i \, h_i(\mathbf{X}),$$

such that the squared error $(Y - \hat{Y})^2$ is minimized.

EFS relies on the assumption that the functions $h_i$
introduced in the expression above can be optimized by evolutionary computation.
We show below a model obtained for the cooling load energy efficiency dataset available at
the UCI repository. The dataset is composed of 8 features (X1,X2,...,X8) and a continuous dependent variable:

```
33.507
- 19.201 * X1
- 0.029 * X3
+ 2.372 * X7
- 0.142 * X8
- 5.513 * (div(cube(cos X4))X1)
- 9.932 * (cos(cos(cube(sin X3))))
+ 0.023 * (square X6)
- 0.238 * (sin(square X6))
- 2.967 * (-(sin(cos X3))X5)
+ 0.006 * (* X3(* X7 X5))
+ 0.085 * (cube(exp(sin X2)))
+ 2.703 * (sqrt(sqrt(sqrt(sqrt X8))))
```
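The synthesized features are printed in prefix notation, so `(div(cube(cos X4))X1)` reads as cos(X4)^3 / X1 and `(* X3(* X7 X5))` as X3 * X7 * X5. The following Python sketch shows how such a model could be evaluated by hand for three of the terms above. The helper names (`cube`, `square`, `div`, `predict`) are hypothetical and not part of EFS, and `div` is assumed to be plain division, whereas EFS may guard against division by zero.

```python
import math

# Hypothetical helpers mirroring the operator names printed in the model.
def cube(v):
    return v ** 3

def square(v):
    return v ** 2

def div(a, b):
    # Assumption: plain division; EFS may use a protected variant.
    return a / b

# Three of the synthesized features above, as (coefficient, function) pairs.
TERMS = [
    (-5.513, lambda x: div(cube(math.cos(x["X4"])), x["X1"])),
    (0.023, lambda x: square(x["X6"])),
    (0.006, lambda x: x["X3"] * (x["X7"] * x["X5"])),
]

def predict(intercept, terms, x):
    """Basis expansion: intercept plus the weighted sum of feature values."""
    return intercept + sum(coef * feature(x) for coef, feature in terms)
```

Calling `predict(33.507, TERMS, row)` with `row` a dict mapping `"X1"` ... `"X8"` to floats returns the contribution of the intercept and these three terms only; the full model would list all twelve terms.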

# Publications

For more details on EFS, see the following paper:

*Arnaldo, I.; Veeramachaneni, K.; O'Reilly, U.-M.: Building predictive models via feature synthesis.
Proceedings of the 2015 Conference on Genetic and Evolutionary Computation (GECCO 2015). Pages 983-990, 2015.*

# Tutorial

We provide a quick tutorial to get started with EFS.

## Step 1: data format

Data must be provided in CSV format, where each line corresponds to an exemplar and the target value is placed
in the last column. Any additional line or column containing labels or nominal values must be removed beforehand.
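As an illustration, a raw file with a header row and label columns could be converted to this layout with a short Python script. This is a minimal sketch using only the standard library. The function name `to_efs_csv` and its parameters are hypothetical and not part of EFS.

```python
import csv

def to_efs_csv(src_path, dst_path, target, drop=()):
    """Write numeric rows without a header, with the target column last."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)  # consumes the header row
        # Keep every column except the target and the ones to drop, in file order.
        features = [c for c in reader.fieldnames if c != target and c not in drop]
        writer = csv.writer(dst)
        for row in reader:
            # float() raises ValueError if a nominal value is still present.
            writer.writerow([float(row[c]) for c in features + [target]])
```

For example, `to_efs_csv("raw.csv", "train.csv", target="Y2", drop=("id",))` would drop a hypothetical `id` column, move `Y2` to the end, and omit the header line.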

## Step 2: download the efs.jar file from here

## Step 3: model the data

From the terminal:

```
$ java -jar efs.jar -train path_to_train_data -minutes num_minutes
```

Note that if the *-minutes* argument is set to *0*, the LASSO linear fit is returned. The model will be stored in the file *model.txt*.
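For reference, a LASSO linear fit solves an L1-penalized least-squares problem. In the notation of Friedman et al. (cited in the Credits section), with N exemplars, feature vectors x_i, targets y_i, and a regularization weight λ, the objective is the one below. How EFS selects λ is not stated here, so treat its value as an assumption.

$$\min_{\beta_0,\,\beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2 + \lambda \, \lVert \beta \rVert_1$$

The L1 penalty drives some coefficients exactly to zero, which is why the resulting model can stay compact.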

## Step 4: test the model

EFS provides functionality to obtain the mean squared error (MSE) and mean absolute error (MAE) of the generated model. From the terminal:

```
$ java -jar efs.jar -test path_to_test_data path_to_model
```
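For clarity, the two error measures follow their standard definitions, sketched below in Python. The function names are hypothetical and this is not the EFS implementation.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

MSE penalizes large residuals more heavily than MAE, so reporting both gives a sense of whether the error is dominated by a few poorly predicted exemplars.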

# Credits

EFS uses the library LASSO4j, an efficient implementation of LASSO by Y. Ganjisaffar. This implementation is based on the pathwise coordinate descent
method introduced in this paper:

*J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via
coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.*

# Authors and Contributors

Developed by of the organization.
FlexGP is a project of the Any-Scale Learning For All (ALFA) group at MIT.