Evolutionary Feature Synthesis

An application to regression

View the Project on GitHub ignacioarnaldo/efs

EFS is a regression algorithm that outputs accurate, readable, nonlinear models. Given a dataset with response variable Y and a vector of variables X, EFS generates a basis expansion model of the form:

$\hat{Y} = \beta_0 + \sum_{i=1}^{m} \beta_i h_i(X)$,   such that   $(Y - \hat{Y})^2$   is minimized.

EFS relies on the assumption that the basis functions in the expression above can be optimized by evolutionary computation. Below is a model obtained for the cooling load target of the energy efficiency dataset, available from the UCI Machine Learning Repository. The dataset is composed of 8 features (X1, X2, ..., X8) and a continuous dependent variable:

-  19.201 * X1
-    0.029 * X3
+   2.372 * X7
-    0.142 * X8
-    5.513 * (div(cube(cos X4))X1)
-    9.932 * (cos(cos(cube(sin X3))))
+   0.023 * (square X6)
-    0.238 * (sin(square X6))
-    2.967 * (-(sin(cos X3))X5)
+   0.006 * (* X3(* X7 X5))
+   0.085 * (cube(exp(sin X2)))
+   2.703 * (sqrt(sqrt(sqrt(sqrt X8))))
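
To make the prefix notation concrete, the following Python sketch evaluates the model above for a single input row. The operator semantics are assumptions, since they are not defined here: `square` and `cube` are taken to be x^2 and x^3, `div` and `sqrt` are the plain (unprotected) operations, and `(- a b)` is read as a - b. No intercept appears in the listing, so none is added. The input row is hypothetical and not taken from the dataset.

```python
import math

def cube(v):
    return v ** 3

def square(v):
    return v ** 2

def predict(x):
    """Weighted sum of the 12 listed terms; x maps 'X1'..'X8' to floats."""
    terms = [
        (-19.201, x["X1"]),
        (-0.029, x["X3"]),
        (2.372, x["X7"]),
        (-0.142, x["X8"]),
        (-5.513, cube(math.cos(x["X4"])) / x["X1"]),           # assumed plain division
        (-9.932, math.cos(math.cos(cube(math.sin(x["X3"]))))),
        (0.023, square(x["X6"])),
        (-0.238, math.sin(square(x["X6"]))),
        (-2.967, math.sin(math.cos(x["X3"])) - x["X5"]),       # (- a b) read as a - b
        (0.006, x["X3"] * (x["X7"] * x["X5"])),
        (0.085, cube(math.exp(math.sin(x["X2"])))),
        (2.703, math.sqrt(math.sqrt(math.sqrt(math.sqrt(x["X8"]))))),
    ]
    return sum(weight * value for weight, value in terms)

# Hypothetical input row, for illustration only.
row = {"X1": 0.9, "X2": 560.0, "X3": 300.0, "X4": 130.0,
       "X5": 7.0, "X6": 3.0, "X7": 0.1, "X8": 2.0}
print(predict(row))
```

Each entry pairs a LASSO weight with one evolved feature, which is what keeps the model readable: every term can be inspected on its own.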


For more details on EFS, see the following paper:

Arnaldo, I.; Veeramachaneni, K.; O'Reilly, U.-M.: Building predictive models via feature synthesis. Proceedings of the 2015 Conference on Genetic and Evolutionary Computation (GECCO 2015), pages 983-990, 2015.


We provide a quick tutorial to get started with EFS.

Step 1: data format

Data must be provided in CSV format, where each line corresponds to an exemplar and the target values are placed in the last column. Any additional line or column containing labels or nominal values must be removed.
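
One way to prepare such a file is sketched below in Python, assuming the raw file has a single header row and that the target column and any nominal columns are known by name. The `prepare_csv` helper and the file and column names in the usage note are hypothetical.

```python
import csv

def prepare_csv(src_path, dst_path, target, drop=()):
    """Write a header-free, numeric-only CSV with the target in the last column."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)  # consumes the header row
        features = [c for c in reader.fieldnames if c != target and c not in drop]
        writer = csv.writer(dst)
        for row in reader:
            # float() raises ValueError on any remaining non-numeric cell
            writer.writerow([float(row[c]) for c in features + [target]])
```

For example, `prepare_csv("raw.csv", "train.csv", target="Y2", drop=("building_id",))` would drop the header line and the `building_id` column and move `Y2` to the end of each line.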

Step 2: download the efs.jar file from here

Step 3: model the data

From the terminal:

$ java -jar efs.jar -train path_to_train_data -minutes num_minutes

Note that if the -minutes argument is set to 0, the LASSO linear fit is returned. The model will be stored in the file model.txt.
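
If the training step is driven from a script, the command above can be wrapped as in the following Python sketch. It uses only the `-train` and `-minutes` arguments shown above; the helper names and the default jar location are assumptions.

```python
import subprocess

def efs_train_command(train_csv, minutes, jar="efs.jar"):
    """Build the argument list for the training command shown above."""
    return ["java", "-jar", jar, "-train", train_csv, "-minutes", str(minutes)]

def train(train_csv, minutes, jar="efs.jar"):
    """Run EFS; raises CalledProcessError if java exits with a non-zero status."""
    subprocess.run(efs_train_command(train_csv, minutes, jar), check=True)
```

Calling `train("train.csv", 0)` would request the LASSO linear fit described above.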

Step 4: test the model

EFS provides functionality to obtain the mean squared error (MSE) and mean absolute error (MAE) of the generated model. From the terminal:

$ java -jar efs.jar -test path_to_test_data path_to_model
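
For reference, the two error measures follow their standard definitions, sketched here in Python. This is not the EFS code itself, and the example values are made up.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```

MSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few exemplars dominate the error.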


EFS uses the library LASSO4j, an efficient implementation of LASSO by Y. Ganjisaffar. This implementation is based on the pathwise coordinate descent method introduced in this paper:

J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
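
The core of coordinate descent for LASSO can be illustrated with a short pure-Python sketch. It is not the LASSO4j code: it solves for a single penalty value `lam`, fits no intercept, and assumes the objective (1/2n)*||y - Xb||^2 + lam*||b||_1. The pathwise method repeats such a solve over a decreasing sequence of penalty values, warm-starting each solve from the previous solution. The function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    X is a list of rows, y a list of targets. No intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - Xb while b is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # all-zero column: leave b[j] at 0
            # covariance of column j with the partial residual (b[j] excluded)
            rho = sum(v * (r + v * b[j]) for v, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                # keep the residual consistent with the updated coefficient
                residual = [r - v * delta for v, r in zip(col, residual)]
                b[j] = new_bj
    return b
```

With `lam` set to 0 the update reduces to ordinary least squares, while a sufficiently large `lam` drives every coefficient to exactly zero, which is how LASSO selects among candidate features.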


Authors and Contributors

Developed by ignacioarnaldo of the flexgp organization. FlexGP is a project of the Any-Scale Learning For All (ALFA) group at MIT.