A brief guide to the FCUBE project

This is a non-comprehensive guide to the FCUBE project code. If you would like to know more about the project or continue its development, please contact us by email at iarnaldo@mit.edu.

FCUBE's web application

We provide a Tomcat application, hosted on the FCUBE server, that lets users interact with FCUBE through a web interface. The web application wraps FCUBE's functionality and is meant as an example of what can be done to facilitate the use of the FCUBE platform. However, the current release does not exploit all the features of FCUBE, so we encourage users to build their own applications according to their own needs.

The web application also serves as an example of how the FCUBE Java executable should be used and how the project environment (folder structure) should be set up. See the folder /usr/share/tomcat/ on an FCUBE server instance for the required folder structure.

FCUBE's server and client code

The code of the FCUBE project is composed of:

A single Java executable with a command-line interface that, on the server side, allows you to:

generate data splits (check instructions here)
deploy an FCUBE learning algorithm on Amazon EC2
retrieve the learned models
filter and fuse the learned models

The same executable is used, on the FCUBE instance side, to sample the training data and the parameters of the learners (factoring).

The executables of the stand-alone learning algorithms integrated in FCUBE

Below, we describe in detail the command-line interface of the FCUBE server, that is, the commands needed to deploy learners and to retrieve and fuse models. This functionality has only been tested in Debian environments and relies on the AWS CLI tools, which must be installed and configured with your own access and secret keys.
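Before running any of the commands below, it can help to check that the AWS tooling is in place. The following is a minimal sketch, assuming that "AWS CLI tools" refers to the `aws` executable of the current AWS command-line interface and that the keys are stored with `aws configure` (keys supplied only through environment variables would not be detected). The helper name `aws_cli_status` is hypothetical and not part of FCUBE.

```python
import shutil
import subprocess


def aws_cli_status():
    """Return (installed, configured) for the AWS CLI on this machine.

    Assumption: the relevant tool is the `aws` executable and the access
    key is stored in the active profile via `aws configure`.
    """
    aws = shutil.which("aws")
    if aws is None:
        return False, False
    # `aws configure get <key>` exits with a non-zero status when the key
    # is not set in the active profile.
    result = subprocess.run(
        [aws, "configure", "get", "aws_access_key_id"],
        capture_output=True,
        text=True,
    )
    configured = result.returncode == 0 and bool(result.stdout.strip())
    return True, configured


installed, configured = aws_cli_status()
print(f"aws installed: {installed}, access key configured: {configured}")
```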

Step 0: learning strategy

The learning strategy is specified in a file that is sent to all the FCUBE instances deployed within an FCUBE run.

There are two built-in parameters for data management:

data_sample_rate: fraction of the data sampled by each node (for example, 0.1 means 10%)
variable_sample_rate: fraction of the variables sampled by each node (for example, 0.25 means 25%)
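To make the effect of these two parameters concrete, here is a minimal sketch of what sampling rows and variables at a given rate could look like. It assumes that the rates are fractions between 0 and 1, that the data is a list of equal-length rows holding only explanatory variables, and that sampling is uniform without replacement. The function name is hypothetical, and FCUBE's own sampler may work differently.

```python
import random


def sample_rows_and_variables(rows, data_sample_rate, variable_sample_rate, seed=None):
    """Return a random subset of rows restricted to a random subset of columns.

    Assumption: both rates are fractions in (0, 1]; at least one row and
    one variable are always kept when the input is non-empty.
    """
    rng = random.Random(seed)
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    k_rows = max(1, round(n_rows * data_sample_rate)) if n_rows else 0
    k_vars = max(1, round(n_vars * variable_sample_rate)) if n_vars else 0
    # Sample indices without replacement and keep the original ordering.
    row_idx = sorted(rng.sample(range(n_rows), k_rows))
    var_idx = sorted(rng.sample(range(n_vars), k_vars))
    subset = [[rows[i][j] for j in var_idx] for i in row_idx]
    return subset, var_idx


# Toy data: 20 rows with 4 variables each.
data = [[r * 10 + c for c in range(4)] for r in range(20)]
subset, kept_vars = sample_rows_and_variables(data, 0.2, 0.75, seed=1)
print(len(subset), "rows,", len(kept_vars), "variables kept:", kept_vars)
```

Each node would draw its own subset, so different nodes train on different views of the same data.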

This file can also be used to specify ranges of values for learner-specific parameters such as learning rate, crossover rate, etc. For detailed information on how to specify parameter ranges, please check this documentation. Below is a commented example for our GPFunction learner (a learner inspired by Evolutionary Computation):


fixed data fcube/higgs/train
fixed threads 2

data_sample_rate float discreteSet default ( 1 ) { 0.1 ; 0.2 }
variable_sample_rate float discreteSet default ( 1 ) { 0.25 ; 0.75 ; 1 }
false_negative_weight float range default ( 0.5 ) [ 0.4 : 0.05 : 0.6 ]
xover_op string discreteSet default ( SPUCrossover ) { SPUCrossover ; KozaCrossover }
pop_size int discreteSet default ( 1000 ) { 1000 ; 1500 ; 2000 }

factoredParams {data_sample_rate, variable_sample_rate, xover_op, pop_size}

The path to the data and the number of threads are declared as fixed parameters. The built-in data management parameters (data_sample_rate and variable_sample_rate), as well as the crossover operator (xover_op) and the population size (pop_size), are each assigned a discrete set of choices. The learner-specific false negative weight (false_negative_weight) is assigned a range of possible values. Finally, the last line lists the parameters that will be factored, that is, those for which a value is stochastically selected from the possible choices. Since false_negative_weight is not in that list, it is the only parameter set to its default value (0.5).
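The factoring described above can be sketched in a few lines of Python. This is an illustration only, not FCUBE's parser: it handles just the constructs that appear in the example file, and it assumes that a range is written as [ min : step : max ] and that factored values are drawn uniformly. The names `parse_options` and `draw_configuration` are hypothetical.

```python
import random
import re

OPTIONS = """\
fixed data fcube/higgs/train
fixed threads 2

data_sample_rate float discreteSet default ( 1 ) { 0.1 ; 0.2 }
variable_sample_rate float discreteSet default ( 1 ) { 0.25 ; 0.75 ; 1 }
false_negative_weight float range default ( 0.5 ) [ 0.4 : 0.05 : 0.6 ]
xover_op string discreteSet default ( SPUCrossover ) { SPUCrossover ; KozaCrossover }
pop_size int discreteSet default ( 1000 ) { 1000 ; 1500 ; 2000 }

factoredParams {data_sample_rate, variable_sample_rate, xover_op, pop_size}
"""

PARAM_RE = re.compile(
    r"^(\w+)\s+(float|int|string)\s+(discreteSet|range)\s+"
    r"default\s*\(\s*(\S+)\s*\)\s*([\[{])(.*)[\]}]\s*$"
)
CASTS = {"float": float, "int": int, "string": str}


def parse_options(text):
    """Split an options file into fixed values, parameter specs, and factored names."""
    fixed, params, factored = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixed "):
            _, name, value = line.split(None, 2)
            fixed[name] = value
        elif line.startswith("factoredParams"):
            inside = line[line.index("{") + 1 : line.rindex("}")]
            factored = [p.strip() for p in inside.split(",") if p.strip()]
        else:
            match = PARAM_RE.match(line)
            if match is None:
                raise ValueError(f"unrecognized line: {line!r}")
            name, type_, kind, default, _, body = match.groups()
            cast = CASTS[type_]
            if kind == "discreteSet":
                choices = [cast(v.strip()) for v in body.split(";")]
            else:
                # Assumption: a range reads [ min : step : max ], both ends included.
                low, step, high = (float(v) for v in body.split(":"))
                count = int(round((high - low) / step))
                choices = [cast(round(low + i * step, 10)) for i in range(count + 1)]
            params[name] = {"default": cast(default), "choices": choices}
    return fixed, params, factored


def draw_configuration(fixed, params, factored, rng):
    """Pick a random choice for factored parameters and the default for the rest."""
    config = dict(fixed)
    for name, spec in params.items():
        if name in factored:
            config[name] = rng.choice(spec["choices"])
        else:
            config[name] = spec["default"]
    return config


fixed, params, factored = parse_options(OPTIONS)
config = draw_configuration(fixed, params, factored, random.Random(0))
print(config)
```

Calling `draw_configuration` once per node with a different random generator would give each node its own combination of factored values, while false_negative_weight stays at 0.5 everywhere.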

Step 1: deploy a learner from the repository

Command:

$ java -jar fcube.jar -deploy gpfunction -n 40 -minutes 60 -key_name nachokey -options ruletree_factoring.options -flavor ec2_instance_type

where:

-deploy: specify learner name
-n: number of nodes
-minutes: duration in minutes (60 in the example above), presumably the time budget of the run
-key_name: name of EC2/OpenStack keypair
-options: path to the parameter options file
-flavor: flavor (instance type) of the FCUBE workers
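When the deploy step is driven from a script, building the argument list programmatically avoids quoting mistakes. The sketch below only assembles the command shown above; the function name `build_deploy_command` is hypothetical, and the flags are exactly those of the example invocation.

```python
import shlex


def build_deploy_command(learner, nodes, minutes, key_name, options_path, flavor,
                         jar="fcube.jar"):
    """Assemble the argument list for the FCUBE deploy step."""
    return [
        "java", "-jar", jar,
        "-deploy", learner,
        "-n", str(nodes),
        "-minutes", str(minutes),
        "-key_name", key_name,
        "-options", options_path,
        "-flavor", flavor,
    ]


cmd = build_deploy_command(
    "gpfunction", 40, 60, "nachokey",
    "ruletree_factoring.options", "ec2_instance_type",
)
# shlex.join (Python 3.8+) renders the list as a shell-safe command line.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` on a machine where fcube.jar and the AWS tools are available.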

Step 2: retrieve models

Command:

$ java -jar fcube.jar -retrieve mostAccurate.txt -keypairPath certs/nachokey.pem -learners gpfunction

where:

-retrieve: model file
-keypairPath: path to EC2 keypair
-learners: name of the learner(s) deployed in the run
-options: path to the parameter options file (omitted in the example above)

Step 3: filter and fuse models

Command:

$ java -jar fcube.jar -filter-fuse higgs-alfa_9.csv higgs-alfa_10.csv -model mostAccurate.txt -fnweight 0.47 -learners gpfunction

where:

-filter-fuse: path to fusion training set and path to test set
-model: model file
-fnweight: false negative error weight
-learners: name of the learner(s) deployed in the run
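Steps 2 and 3 are typically run one after the other once the learners have finished. The following sketch chains the two commands shown above and stops at the first failure. The helper names are hypothetical, the flags are taken verbatim from the examples, and `dry_run=True` only prints the commands, so nothing is executed unless you opt in.

```python
import subprocess

JAR = ["java", "-jar", "fcube.jar"]


def retrieve_command(model_file, keypair_path, learners):
    """Argument list for the retrieve step (Step 2)."""
    return JAR + ["-retrieve", model_file,
                  "-keypairPath", keypair_path,
                  "-learners", learners]


def filter_fuse_command(fusion_train, test_set, model_file, fn_weight, learners):
    """Argument list for the filter-and-fuse step (Step 3)."""
    return JAR + ["-filter-fuse", fusion_train, test_set,
                  "-model", model_file,
                  "-fnweight", str(fn_weight),
                  "-learners", learners]


def run_steps(commands, dry_run=True):
    """Run the commands in order; with dry_run, only print them."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step fails,
            # so later steps never run on incomplete results.
            subprocess.run(cmd, check=True)


steps = [
    retrieve_command("mostAccurate.txt", "certs/nachokey.pem", "gpfunction"),
    filter_fuse_command("higgs-alfa_9.csv", "higgs-alfa_10.csv",
                        "mostAccurate.txt", 0.47, "gpfunction"),
]
run_steps(steps)
```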