Python package

pony_gp is an implementation of Genetic Programming(GP), see e.g. http://geneticprogramming.com. The purpose of pony_gp is to describe how the GP algorithm works. The intended use is for teaching. The aim is to allow the developer to quickly start using and developing. The design is supposed to be simple, self contained and use core python libraries.

Run

Find a equation for the input given an output.

python pony_gp.py --config=configs.ini

Example output:

Reading: fitness_cases.csv headers: ['# x0', 'x1', 'y'] exemplars:121
GP settings:
{'arities': {'+': 2, '*': 2, '/': 2, '-': 2, 'x0': 0, 'x1': 0, '0.0': 0, '1.0': 0}, 'constants': [0.0, 1.0], 'population_size': 4, 'max_depth': 5, 'elite_size': 2, 'generations': 2, 'tournament_size': 3, 'seed': 0, 'crossover_probability': 0.8, 'mutation_probability': 0.2, 'fitness_cases': [[-3.0, 4.0], [-2.0, 3.0], [-4.0, 3.0], [-5.0, -3.0], [5.0, -3.0], [0.0, -1.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, -3.0], [-4.0, -2.0], [-1.0, -2.0], [5.0, 1.0], [-5.0, -1.0], [-1.0, 3.0], [4.0, 5.0], [-2.0, 1.0], [3.0, 1.0], [-3.0, 0.0], [-1.0, -4.0], [0.0, 3.0], [3.0, -3.0], [0.0, 1.0], [5.0, -2.0], [2.0, 1.0], [1.0, 3.0], [4.0, 4.0], [0.0, -4.0], [-1.0, 1.0], [-4.0, 4.0], [-5.0, 4.0], [-2.0, 0.0], [-4.0, 1.0], [-3.0, 3.0], [2.0, 5.0], [-2.0, -4.0], [2.0, -2.0], [0.0, 4.0], [0.0, -5.0], [1.0, 4.0], [5.0, 0.0], [-5.0, 5.0], [4.0, 3.0], [5.0, 2.0], [3.0, 2.0], [2.0, -1.0], [-5.0, 2.0], [-3.0, -2.0], [2.0, 2.0], [4.0, -5.0], [3.0, 4.0], [-1.0, 2.0], [-4.0, -5.0], [-5.0, -4.0], [3.0, 0.0], [-2.0, -5.0], [-3.0, -1.0], [5.0, 5.0], [-2.0, 2.0], [4.0, 1.0], [-5.0, -5.0], [4.0, -2.0], [-3.0, -4.0], [-4.0, -1.0], [1.0, 2.0], [-3.0, 2.0], [-5.0, 3.0], [4.0, 0.0], [3.0, -1.0], [-3.0, 1.0], [-3.0, 5.0], [1.0, -4.0], [2.0, 3.0], [2.0, -3.0], [1.0, -3.0], [5.0, -4.0], [1.0, 5.0], [-2.0, 4.0], [5.0, -5.0], [-5.0, 0.0], [2.0, -5.0], [1.0, -2.0], [1.0, 1.0], [4.0, -4.0], [-1.0, -5.0]], 'test_train_split': 0.7, 'config': 'configs.ini', 'verbose': None, 'symbols': {'arities': {'+': 2, '*': 2, '/': 2, '-': 2, 'x0': 0, 'x1': 0, '0.0': 0, '1.0': 0}, 'terminals': ['x0', 'x1', '0.0', '1.0'], 'functions': ['+', '*', '/', '-']}, 'targets': [25.0, 13.0, 25.0, 34.0, 34.0, 1.0, 4.0, 1.0, 13.0, 20.0, 5.0, 26.0, 26.0, 10.0, 41.0, 5.0, 10.0, 9.0, 17.0, 9.0, 18.0, 1.0, 29.0, 5.0, 10.0, 32.0, 16.0, 2.0, 32.0, 41.0, 4.0, 17.0, 18.0, 29.0, 20.0, 8.0, 16.0, 25.0, 17.0, 25.0, 50.0, 25.0, 29.0, 13.0, 5.0, 29.0, 13.0, 8.0, 41.0, 25.0, 5.0, 41.0, 41.0, 9.0, 29.0, 10.0, 50.0, 8.0, 17.0, 50.0, 20.0, 25.0, 17.0, 5.0, 13.0, 34.0, 16.0, 10.0, 10.0, 34.0, 17.0, 13.0, 13.0, 10.0, 41.0, 26.0, 20.0, 50.0, 25.0, 29.0, 5.0, 2.0, 32.0, 26.0]}
Generation:0 Duration: 0.0016 fit_ave:-572.76+-25.137 size_ave:2.00+-1.000 depth_ave:0.50+-0.500 max_size:3 max_depth:1 max_fit:-530.166667 best_solution:{'genome': ['1.0'], 'fitness': -530.1666666666666}
Generation:1 Duration: 0.0035 fit_ave:-530.17+-0.000 size_ave:1.00+-0.000 depth_ave:0.00+-0.000 max_size:1 max_depth:0 max_fit:-530.166667 best_solution:{'genome': ['1.0'], 'fitness': -530.1666666666666}
Best solution on train data:{'genome': ['1.0'], 'fitness': -530.1666666666666}
Best solution on test data:{'genome': ['1.0'], 'fitness': -487.1081081081081}

If you wish to, change the paramaters from the configs.ini file to your desired paramaters or allow it to remain at its default values.

The input with their respective output is in the file fitness_case.csv. The exemplars are generated from y = x0^2 + x1^2 from range [-5,5]

Requirements

Python >= 3.6

Usage

usage: pony_gp.py [-h] [-p POPULATION_SIZE] [-m MAX_DEPTH] [-e ELITE_SIZE]
                  [-g GENERATIONS] [--ts TOURNAMENT_SIZE] [-s SEED]
                  [--cp CROSSOVER_PROBABILITY] [--mp MUTATION_PROBABILITY]
                  [--fc FITNESS_CASES] [--tts TEST_TRAIN_SPLIT] --config
                  CONFIG [--verbose]

optional arguments:
  -h, --help            show this help message and exit
  -p POPULATION_SIZE, --population_size POPULATION_SIZE
                        Population size is the number of individual solutions
  -m MAX_DEPTH, --max_depth MAX_DEPTH
                        Max depth of tree. Partly determines the search space
                        of the solutions
  -e ELITE_SIZE, --elite_size ELITE_SIZE
                        Elite size is the number of best individual solutions
                        that are preserved between generations
  -g GENERATIONS, --generations GENERATIONS
                        Number of generations. The number of iterations of the
                        search loop.
  --ts TOURNAMENT_SIZE, --tournament_size TOURNAMENT_SIZE
                        Tournament size. The number of individual solutions
                        that are compared when determining which solutions are
                        inserted into the next generation(iteration) of the
                        search loop
  -s SEED, --seed SEED  Random seed. For replication of runs of the EA. The
                        search is stochastic and and replication of the
                        results are guaranteed the random seed
  --cp CROSSOVER_PROBABILITY, --crossover_probability CROSSOVER_PROBABILITY
                        Crossover probability, [0.0,1.0]. The probability of
                        two individual solutions to be varied by the crossover
                        operator
  --mp MUTATION_PROBABILITY, --mutation_probability MUTATION_PROBABILITY
                        Mutation probability, [0.0, 1.0]. The probability of
                        an individual solutions to be varied by the mutation
                        operator
  --fc FITNESS_CASES, --fitness_cases FITNESS_CASES
                        Fitness cases filename. The exemplars of input and the
                        corresponding out put used to train and test
                        individual solutions
  --tts TEST_TRAIN_SPLIT, --test_train_split TEST_TRAIN_SPLIT
                        Test-train data split, [0.0,1.0]. The ratio of fitness
                        cases used for training individual solutions
  --config CONFIG       Config file in Python INI format. Overridden by CLI-
                        arguments.
  --verbose, -v         Verbose printing

Output

Runs for generations

Individual Statistics

Initial tree nr: number nodes:number of nodes in tree max_depth:max tree depth tree: symbols in tree

Generation Statistics

Generation:generation number, duration:evaluation time, fit_ave:average fitness of the generation, size_ave:average number of nodes in the genearation amongst all solutions, depth_ave:average max_tree depth, max_size: maximum number of nodes, max_depth: maximum depth, max_fit: maximum fitnessm best_solution:{‘genome’: individual formula/tree, ‘fitness’`: fitness of genome}

Best Solution Statistics

Best solution on train data:{'genome': individual formula/tree, 'fitness': fitness of genome}
Best solution on test data:{'genome':individual formula/tree, 'fitness':fitness of genome}

Test

Run

python test_pony_gp.py