WARNING: DO NOT FORGET TO TERMINATE THE INSTANCES AT THE END OF THE FLEXGP RUN
Data must be provided in CSV format, where each line corresponds to an exemplar and the target value is placed in the last column. Any additional lines or columns containing labels or nominal values must be removed.
In this tutorial, we use the White Wine Quality dataset, which is available at the UCI Machine Learning Repository website. The formatted data used in this example can be downloaded from here: train data and test data.
Sign up for an AWS account at aws.amazon.com.
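The format requirement above can be checked locally before any instance is started. The following is a minimal Python sketch, not part of FlexGP: the helper name `check_flexgp_csv` is hypothetical, and it only verifies that every row has the same number of columns and that every cell parses as a number.

```python
import csv
import io


def check_flexgp_csv(lines):
    """Return the column count if every row is numeric and rectangular.

    Hypothetical helper, not part of FlexGP. `lines` is any iterable of
    CSV text lines (an open file works). Raises ValueError otherwise.
    """
    width = None
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            raise ValueError(f"line {row_number}: empty line")
        if width is None:
            width = len(row)  # the first row fixes the expected width
        elif len(row) != width:
            raise ValueError(
                f"line {row_number}: expected {width} columns, got {len(row)}"
            )
        for cell in row:
            try:
                float(cell)
            except ValueError:
                # A header, label, or nominal value ends up here.
                raise ValueError(
                    f"line {row_number}: non-numeric value {cell!r}"
                ) from None
    if width is None:
        raise ValueError("file is empty")
    return width


# Usage with in-memory data; for a real file, pass open(path, newline="").
print(check_flexgp_csv(io.StringIO("7.0,0.27,6\n6.3,0.30,5\n")))
```

A header row such as `acidity,sugar,quality` would be rejected with a "non-numeric value" error, which is the case the paragraph above warns about.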
FlexGP requires the installation of the EC2 API tools:
$ sudo apt-get install ec2-api-tools ec2-ami-tools
Create the folder from where you will run FlexGP:
$ cd
$ mkdir flexgp-run
$ cd flexgp-run
Download the launch scripts, the certificates (certs) folder, the executables FlexGP.jar and mrgp-flexgp.jar, and the data folder from the GitHub repository. The folder should look like this:
$ ls -R
.:
certs data FlexGP.jar mrgp-flexgp.jar scripts

./certs:
exportCredentials.sh

./data:
winequality-white.properties winequality-white-test.csv winequality-white-train.csv

./scripts:
boot_strap.sh evogpj_config.py evogpj_funcs.txt evogpj_pnorms.txt gen_terms.txt msd_terms.txt part_handler.py select_data.py split_files.py start.sh user_vars.sh
Edit the exportCredentials.sh script in the certs folder with your Amazon credentials:
$ gedit certs/exportCredentials.sh
export AWS_ACCESS_KEY=
export AWS_SECRET_KEY=
Copy your keypair to the certs folder:
$ cp path_to_your_keypair certs
Edit the following variables in scripts/user_vars.sh:
$ gedit scripts/user_vars.sh
ROOT=
CERT=
TYPE=t1.micro
AMI=ami-e0016c88
The AMI used by default is a public Amazon image that contains all the packages necessary to run FlexGP.
Edit the USER_VARS_SCRIPT_PATH variable in scripts/boot_strap.sh. It must contain the absolute path to the user_vars.sh script:
$ gedit scripts/boot_strap.sh
USER_VARS_SCRIPT_PATH="path_to_user_vars.sh"
Create a properties file to specify algorithmic parameters such as crossover and mutation rates, population size, and maximum tree size. A working example for the white wine quality dataset can be found here. In particular, the property external_threads sets the number of threads used to speed up MRGP on each cloud node.
external_threads = 4
To start FlexGP, run the script boot_strap.sh with the following arguments:
The example below runs FlexGP on 12 nodes (-n) with the white wine quality properties file (-p) and training data (-d), with each node selecting 50% of the training data (-s .5):
$ scripts/boot_strap.sh -n 12 -p data/winequality-white.properties -d data/winequality-white-train.csv -s .5 &> flexgp.log &
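To illustrate what "each node selecting 50% of the training data" means, here is a minimal Python sketch of drawing a random fraction of rows. It is an illustration only: the function name `sample_rows` is hypothetical, and sampling uniformly without replacement is an assumption, since this tutorial does not describe how FlexGP's own scripts pick the rows.

```python
import random


def sample_rows(rows, fraction, seed=None):
    """Return a random subset holding `fraction` of `rows`.

    Hypothetical helper, not FlexGP code. Assumes uniform sampling
    without replacement; `rows` must be a sequence such as a list.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    rng = random.Random(seed)  # a seed makes the draw reproducible
    count = max(1, round(len(rows) * fraction))
    return rng.sample(rows, count)


# Usage: two "nodes" with different seeds see different halves of the data.
data = list(range(10))
print(sample_rows(data, 0.5, seed=1))
print(sample_rows(data, 0.5, seed=2))
```

Because every node draws its own subset, the nodes train on overlapping but different views of the data, which is what makes fusing their models worthwhile later on.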
For a complete description of FlexGP run options, please run:
After running the script, you can verify from the AWS EC2 console that 12 instances were started.
You can retrieve the models generated at the FlexGP nodes at any point during the run. First edit the USER_VARS_SCRIPT_PATH variable in scripts/retrieve_logs.sh. It must contain the absolute path to the user_vars.sh script:
$ gedit scripts/retrieve_logs.sh
USER_VARS_SCRIPT_PATH="path_to_user_vars.sh"
Then run the script:
Verify that the models and log files have been successfully downloaded:
$ ls -R models/
models/:
220.127.116.11 18.104.22.168 22.214.171.124 126.96.36.199 188.8.131.52 184.108.40.206 220.127.116.11 18.104.22.168 22.214.171.124 126.96.36.199 188.8.131.52 184.108.40.206

models/220.127.116.11:
bestModelGeneration.txt evolve.log init.log problem.properties
...
models/18.104.22.168:
bestModelGeneration.txt evolve.log init.log problem.properties
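With many nodes, scanning the listing by eye is error-prone. The check can be sketched in Python as below. The function name `find_incomplete_nodes` is hypothetical, and the sketch assumes that every node folder should contain the same four files shown in the listing above.

```python
import os

# File names taken from the listing above; assumed identical for every node.
EXPECTED_FILES = (
    "bestModelGeneration.txt",
    "evolve.log",
    "init.log",
    "problem.properties",
)


def find_incomplete_nodes(models_dir):
    """Map each node folder under `models_dir` to the expected files it lacks.

    Hypothetical helper, not part of FlexGP. Folders that contain all
    expected files are left out of the result.
    """
    incomplete = {}
    for entry in sorted(os.listdir(models_dir)):
        node_dir = os.path.join(models_dir, entry)
        if not os.path.isdir(node_dir):
            continue  # ignore stray files next to the node folders
        missing = [
            name
            for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(node_dir, name))
        ]
        if missing:
            incomplete[entry] = missing
    return incomplete


if __name__ == "__main__" and os.path.isdir("models"):
    for node, missing in find_incomplete_nodes("models").items():
        print(f"{node}: missing {', '.join(missing)}")
```

An empty result means every node folder holds all four files. A node that appears in the result may simply not have written its first model yet, so rerunning the retrieval later is a reasonable first step.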
First, we select the 10 models with the smallest error on the training data and store them in the file finalModels.txt. The command is as follows:
$ java -jar mrgp-flexgp.jar -getFinalModels path_to_models secondsThreshold path_to_fusion_data
In the example below, we filter the models obtained within the first ten minutes (600 seconds) of the run:
$ java -cp mrgp-flexgp.jar main.SRLearnerMenuManager -getFinalModels models 600 data/winequality-white-train.csv
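Conceptually, this selection is a time filter followed by a sort on training error. The Python sketch below shows that logic only. The `ModelRecord` layout and the function `select_final_models` are hypothetical and say nothing about how mrgp-flexgp.jar actually stores or scores models.

```python
from typing import NamedTuple


class ModelRecord(NamedTuple):
    """Hypothetical record for one model found during the run."""

    elapsed_seconds: float  # time since the start of the run
    training_error: float   # error on the fusion/training data
    expression: str         # textual form of the model


def select_final_models(records, seconds_threshold, keep=10):
    """Keep the `keep` lowest-error models found within the time threshold."""
    eligible = [r for r in records if r.elapsed_seconds <= seconds_threshold]
    return sorted(eligible, key=lambda r: r.training_error)[:keep]


# Usage with made-up records: only models found within 600 s are considered.
records = [
    ModelRecord(120.0, 0.71, "model_a"),
    ModelRecord(540.0, 0.58, "model_b"),
    ModelRecord(900.0, 0.49, "model_c"),  # best error, but found too late
]
for record in select_final_models(records, 600):
    print(record.expression, record.training_error)
```

Raising the threshold trades a longer wait for access to models found later in the run, which typically have lower training error.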
Second, we fuse the extracted models. The fused model is stored in fusedModel.txt:
$ java -jar mrgp-flexgp.jar -fusedStats path_to_final_models path_to_fusion_data path_to_testing_data
Following our example, we run:
$ java -cp mrgp-flexgp.jar main.SRLearnerMenuManager -fusedStats finalModels.txt data/winequality-white-train.csv data/winequality-white-test.csv
FUSING MODELS:
TESTING FUSED MODEL:
MSE fused Model: 0.545935884870968
MAE fused Model: 0.5798478607270816
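The two figures reported for the fused model are the mean squared error (MSE) and the mean absolute error (MAE) on the test data. For reference, the standard definitions can be sketched in Python as follows. This illustrates the metrics themselves, not the code inside mrgp-flexgp.jar, and the function names are hypothetical.

```python
def mean_squared_error(targets, predictions):
    """Average of squared differences between targets and predictions."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def mean_absolute_error(targets, predictions):
    """Average of absolute differences between targets and predictions."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Usage with made-up quality scores and predictions.
targets = [5, 6, 7]
predictions = [5, 7, 5]
print(mean_squared_error(targets, predictions))
print(mean_absolute_error(targets, predictions))
```

MSE penalizes large errors more heavily than MAE because each difference is squared before averaging.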