How to run FlexGP

WARNING: DO NOT FORGET TO TERMINATE THE INSTANCES AT THE END OF THE FLEXGP RUN

Data format

Data must be provided in CSV format, where each line corresponds to an exemplar and the target value is placed in the last column. Any additional line or column containing labels or nominal values (for example, a header row) must be removed.
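
The format rules above can be checked locally before any cloud instance is started. The following is a minimal Python sketch and is not part of FlexGP: the function name check_numeric_csv is hypothetical, and the sketch assumes comma-separated values with no quoting.

```python
import csv


def check_numeric_csv(lines):
    """Return a list of problems; an empty list means the data looks usable.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    problems = []
    width = None
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if width is None:
            width = len(row)  # the first data row fixes the column count
        elif len(row) != width:
            problems.append(
                f"line {row_number}: expected {width} columns, found {len(row)}"
            )
        for cell in row:
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {row_number}: non-numeric value {cell!r}")
    return problems


# Made-up rows for illustration; the last column plays the role of the target.
sample = ["7.0,0.27,6", "6.3,0.30,6", "8.1,dry,5"]
for problem in check_numeric_csv(sample):
    print(problem)  # prints: line 3: non-numeric value 'dry'
```

To check a real file, pass an open file handle instead of a list, for example check_numeric_csv(open("data/winequality-white-train.csv", newline="")).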

In this tutorial, we use the White Wine Quality dataset, which is available at the UCI Machine Learning Repository website. The formatted data used in this example can be downloaded from here: train data and test data.

Step 0: obtain an Amazon Web Services account and install the EC2 API tools

Sign up for an AWS account at aws.amazon.com.

FlexGP requires the installation of the EC2 API tools:

$ sudo apt-get install ec2-api-tools ec2-ami-tools

Step 1: download the FlexGP software and set the credentials

Create the folder from where you will run FlexGP:

$ mkdir flexgp-run
$ cd flexgp-run

Download the launch scripts, the certificates (certs) folder, the executables FlexGP.jar and mrgp-flexgp.jar, and the data folder from the GitHub repository. The folder should look as follows:

$ ls -R
.:
certs  data  FlexGP.jar  mrgp-flexgp.jar  scripts

./certs:
exportCredentials.sh

./data:
winequality-white.properties  winequality-white-test.csv  winequality-white-train.csv

./scripts:
boot_strap.sh  evogpj_config.py  evogpj_funcs.txt  evogpj_pnorms.txt  gen_terms.txt  msd_terms.txt  part_handler.py  select_data.py  split_files.py  start.sh  user_vars.sh

Edit the certs/exportCredentials.sh script to add your Amazon credentials:

$ gedit certs/exportCredentials.sh
export AWS_ACCESS_KEY=
export AWS_SECRET_KEY=

Copy your keypair to the certs folder:

$ cp path_to_your_keypair certs

Step 2: set up the FlexGP run

Edit the following variables in scripts/user_vars.sh:

ROOT: absolute path to flexgp-run
CERT: name of your certificate (minus the .pem suffix). It must match your AWS IAM user name.
TYPE: flavor of the cloud instances. Defaults to the basic t1.micro.

$ gedit scripts/user_vars.sh
ROOT=
CERT=
TYPE=t1.micro
AMI=ami-e0016c88

The AMI used by default is a public Amazon image that contains all the packages necessary to run FlexGP.

Edit the USER_VARS_SCRIPT_PATH variable in scripts/boot_strap.sh. It must contain the absolute path to the user_vars.sh script:

$ gedit scripts/boot_strap.sh
USER_VARS_SCRIPT_PATH="path_to_user_vars.sh"

Create a properties file to specify algorithmic parameters such as the crossover and mutation rates, the population size, and the maximum tree size. A working example for the white wine quality dataset is provided in data/winequality-white.properties. In particular, the external_threads property sets the number of threads used to speed up MRGP on each cloud node.

external_threads = 4
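
Before launching, it can help to confirm that the properties file parses and that external_threads holds a positive integer. The sketch below is a hedged illustration, not FlexGP code: read_properties and thread_count are hypothetical helpers, and the parser assumes simple "key = value" lines with "#" or "!" comments rather than the full Java properties syntax.

```python
def read_properties(text):
    """Parse simple 'key = value' lines into a dictionary of strings."""
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if separator:
            properties[key.strip()] = value.strip()
    return properties


def thread_count(properties):
    """Return external_threads as an integer, rejecting values below 1."""
    value = int(properties["external_threads"])
    if value < 1:
        raise ValueError("external_threads must be at least 1")
    return value


# Illustrative content only; real files hold further algorithmic parameters.
example = "# threads used by MRGP on each node\nexternal_threads = 4\n"
print(thread_count(read_properties(example)))  # prints: 4
```

For a real run, the text would come from the file passed with -p, for example open("data/winequality-white.properties").read().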

Step 3: run FlexGP

To start FlexGP, run the script boot_strap.sh with the following arguments:

-n: number of nodes to run
-p: path to properties file
-d: path to training data
-s: fraction of the training data used for learning at each node

The example below will run FlexGP with 12 nodes, on the white wine quality dataset, each node selecting 50% of the training data:

$ scripts/boot_strap.sh -n 12 -p data/winequality-white.properties -d data/winequality-white-train.csv -s .5 &> flexgp.log &
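
To make the -s option concrete, the sketch below shows one way a node could draw its own share of the training rows. It is only an illustration under stated assumptions: sample_rows is a hypothetical helper, and uniform sampling without replacement is assumed. The actual behavior of scripts/select_data.py may differ.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Return a random subset holding `fraction` of the rows (rounded down)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    rng = random.Random(seed)  # a per-node seed gives each node a different subset
    count = int(len(rows) * fraction)
    return rng.sample(rows, count)  # sampling without replacement


# Ten placeholder rows standing in for the lines of the training CSV.
rows = [f"row{i}" for i in range(10)]
node_a = sample_rows(rows, 0.5, seed=1)
node_b = sample_rows(rows, 0.5, seed=2)
print(len(node_a), len(node_b))  # prints: 5 5
```

With -s .5, every node trains on half of the exemplars, so each node works with a smaller dataset while the nodes collectively see different parts of the data.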

For a complete description of FlexGP run options, please run:

$ scripts/boot_strap.sh

After running the script, you can verify from the AWS EC2 console that 12 instances were started.

Step 4: retrieve the models

You can retrieve the models generated at the FlexGP nodes at any point during the run. First, edit the USER_VARS_SCRIPT_PATH variable in scripts/retrieve_logs.sh. It must contain the absolute path to the user_vars.sh script:

$ gedit scripts/retrieve_logs.sh
USER_VARS_SCRIPT_PATH="path_to_user_vars.sh"

Then run the script:

$ scripts/retrieve_logs.sh

Verify that the models and log files have been successfully downloaded:

$ ls -R models/
models/:
54.144.201.76  54.144.222.147  54.144.238.10  54.144.238.4  54.145.201.59  54.161.222.123  54.167.131.67  54.197.136.184  54.204.148.60  54.205.50.99  54.205.85.173  54.82.68.178

models/54.144.201.76:
bestModelGeneration.txt  evolve.log  init.log  problem.properties

...

models/54.144.222.147:
bestModelGeneration.txt  evolve.log  init.log  problem.properties
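
With many nodes, checking the listing by eye becomes tedious. The following Python sketch automates the check. It is a hypothetical helper, not part of FlexGP: find_incomplete_nodes is an invented name, and it assumes that every node folder should contain exactly the four files shown in the listing above.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED_FILES = {
    "bestModelGeneration.txt",
    "evolve.log",
    "init.log",
    "problem.properties",
}


def find_incomplete_nodes(models_dir):
    """Map each node folder name to the sorted list of files it is missing."""
    incomplete = {}
    for node_dir in sorted(Path(models_dir).iterdir()):
        if not node_dir.is_dir():
            continue  # ignore stray files next to the node folders
        present = {entry.name for entry in node_dir.iterdir()}
        missing = EXPECTED_FILES - present
        if missing:
            incomplete[node_dir.name] = sorted(missing)
    return incomplete
```

Calling find_incomplete_nodes("models") returns an empty dictionary when every node folder is complete. A node that appears in the result may simply not have produced a model yet, so retrieving the logs again later is a reasonable first step.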

Step 5: model fusion

First, we select the 10 models with the smallest error on the training data and store them in the file finalModels.txt. The command is as follows:

$ java -jar mrgp-flexgp.jar -getFinalModels path_to_models secondsThreshold path_to_fusion_data

In the example below, we filter the models obtained within the first ten minutes (600 seconds) of the run:

$ java -cp mrgp-flexgp.jar main.SRLearnerMenuManager -getFinalModels models 600 data/winequality-white-train.csv
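
The selection logic described in this step can be summarized in a few lines of Python. This is a sketch of the idea only, not the code inside mrgp-flexgp.jar: select_final_models is a hypothetical function, the (model, elapsed_seconds, training_error) tuple layout is an assumption, and the candidate values are made up for illustration.

```python
def select_final_models(candidates, seconds_threshold, keep=10):
    """Keep models found within the time threshold, lowest training error first.

    `candidates` is an iterable of (model, elapsed_seconds, training_error).
    """
    in_time = [c for c in candidates if c[1] <= seconds_threshold]
    return sorted(in_time, key=lambda c: c[2])[:keep]


# Made-up candidates, not results from an actual run.
candidates = [
    ("model_a", 120.0, 0.61),
    ("model_b", 590.0, 0.55),
    ("model_c", 700.0, 0.40),  # found after the threshold, so it is dropped
]
best = select_final_models(candidates, seconds_threshold=600)
print([name for name, _, _ in best])  # prints: ['model_b', 'model_a']
```

Note that model_c has the lowest error but is excluded because it was found after the 600-second threshold, which is the effect of the secondsThreshold argument in the command above.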

Second, we fuse the extracted models. The fused model is stored in fusedModel.txt:

$ java -jar mrgp-flexgp.jar -fusedStats path_to_final_models path_to_fusion_data path_to_testing_data

Following our example, we run:

$ java -cp mrgp-flexgp.jar main.SRLearnerMenuManager -fusedStats finalModels.txt data/winequality-white-train.csv data/winequality-white-test.csv

FUSING MODELS:

TESTING FUSED MODEL:
MSE fused Model: 0.545935884870968
MAE fused Model: 0.5798478607270816
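
For reference, the two reported figures are the mean squared error (MSE) and the mean absolute error (MAE) of the fused model's predictions on the test data. The Python sketch below shows the standard definitions of both metrics. The function names and the sample values are illustrative, and the sketch does not reproduce how mrgp-flexgp.jar computes them internally.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def mean_absolute_error(targets, predictions):
    """Average of the absolute differences between targets and predictions."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Made-up quality scores and predictions, for illustration only.
targets = [6.0, 5.0, 7.0]
predictions = [5.5, 5.0, 6.0]
print(mean_squared_error(targets, predictions))
print(mean_absolute_error(targets, predictions))  # prints: 0.5
```

Because the MSE squares each error, it penalizes large deviations more heavily than the MAE, so comparing the two gives a rough sense of whether the errors are dominated by a few large misses.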

Step 6: terminate the instances

Terminate the instances from the AWS EC2 console. In our example, 12 instances need to be terminated.