**Technical Approaches 2013**

ALFA explores scalable and efficient ways to combine different techniques to solve hard optimization and design problems in domains of high complexity. The unrelenting growth of complexity, data generation and ever-tighter performance deadlines in these domains has given rise to challenges that are unyielding to statistical or kernel-based machine learning techniques, convex optimization, or evolutionary algorithms when these are used alone or in serial implementations. These techniques need to be inter-operable and efficiently scalable so that they are effective at solving big data applications on the cloud.

**Contents**

- Scalable Machine Learning Frameworks

- Cloud-based Genetic Programming for Scalable Machine Learning

- Cloud Scaling Evolutionary Algorithms (GP, PSO, CMA-ES, GA)

#### Scalable Machine Learning Frameworks

We experiment with different distributed architectures for supporting large scale machine learning. Our LAMPS and DCAP modules within SCALE support distribution of serial machine learning investigations on a cloud via a server-client architecture. FlexGP is completely decentralized and thus has only clients. The clients and their learning algorithm tasks cascade out on cloud instances and communicate via a gossip-based P2P protocol. ECStar runs on a volunteer compute network. See our publications for more information.

#### Cloud-based Genetic Programming for Scalable Machine Learning

We regard genetic programming (GP) as a means of non-linear dimensionality reduction. GP dimension-reducers are valuable because they are relatively straightforward arithmetic functions of input variables. This makes them readable. Finding the reducers is called **symbolic regression**. It's *regression* because training examples are executed through a reducer and GP searches for a reducer of improved accuracy, with an optimization objective of minimizing the loss between the reducer's predictions and the actual responses of the training examples. GP maps this loss into a bias for its multi-point, stochastic search. It's called *symbolic* to convey that GP is relatively flexible about the model structure (or representation) space it admits for possible reducers: you can select the arithmetic operations (and even provide higher-order symbolic operations) GP uses. GP's variational methods during its search allow it to identify non-linear or linear relationships, as may be indicated by the exemplars. This is another important way in which GP is valuable: it seeks the best structure while not making any assumption about it.
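The search described above can be sketched in a few dozen lines. The following is a minimal, illustrative toy (not FlexGP or any ALFA system): expression trees over a small arithmetic basis, mean-squared-error loss against training examples, and a mutation-plus-selection loop. All names (`random_tree`, `symbolic_regression`, etc.) and parameter choices are hypothetical.

```python
import random

# Hypothetical toy sketch of GP symbolic regression -- not any ALFA system.
# The operator basis is freely chosen, which is what makes the search "symbolic".
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    """Grow a random expression tree over {x, constants, OPS}."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.uniform(-1, 1)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Execute a training example through a candidate reducer."""
    if tree == 'x':
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return tree  # constant leaf

def loss(tree, examples):
    """Mean squared error between predictions and actual responses."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in examples) / len(examples)

def mutate(tree, depth=2):
    """Subtree mutation: replace a random node with a fresh subtree."""
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return random_tree(depth)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left, depth), right)
    return (op, left, mutate(right, depth))

def symbolic_regression(examples, pop_size=50, generations=30):
    """Multi-point stochastic search: loss biases which trees survive."""
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: loss(t, examples))
        survivors = pop[:pop_size // 2]   # selection: keep the fittest half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return min(pop, key=lambda t: loss(t, examples))

random.seed(0)
train = [(x / 10, (x / 10) ** 2 + x / 10) for x in range(-10, 11)]  # y = x^2 + x
best = symbolic_regression(train)
print(loss(best, train))
```

Note that nothing here fixes the model structure in advance: a linear or non-linear reducer can emerge purely from which trees survive selection.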

##### Symbolic Regression by Genetic Programming

We employ, devise and advance methods that use GP symbolic regression for machine learning, e.g. classification and regression, while always thinking about how to design GP so that it scales with both computation and data. Our FlexGP project has a system layer enabling it to run robustly and elastically, in a completely decentralized manner, on the cloud. FlexGP's machine learning layer, circa 2013, has an instantiation of genetic programming symbolic regression.

#### Cloud Scaling Evolutionary Algorithms (GP, PSO, CMA-ES, GA)

ALFA **scales** and **enhances** evolutionary algorithms. See our publications on FlexGP, ECStar and wind turbine layout optimization (where we distribute CMA-ES or use a generative approach) for more details. Evolutionary algorithms include: Genetic Algorithms (GAs), Genetic Programming (GP), Evolutionary Strategies (ES), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Bayesian Optimization Algorithm (BOA). When PSO, ACO and BOA are included, the family is frequently (and arguably more aptly) called bio-inspired algorithms.

Genetic programming, genetic algorithms and evolutionary strategies draw inspiration from the fundamental principles of population adaptation through inheritance, selection and genetic variation in neo-Darwinian evolution. PSO and ACO draw inspiration from collective natural behavior of swarms and ants. BOA is a pragmatic re-casting of a GA which substitutes evolutionary variation with statistical inference.
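Those three principles — inheritance, selection and genetic variation — can be made concrete with a minimal genetic algorithm on the classic OneMax toy problem (maximize the number of 1-bits in a string). This is a hypothetical sketch for illustration only; the function names and parameters are our own choices, not any ALFA implementation.

```python
import random

# Hypothetical minimal GA on OneMax, illustrating the neo-Darwinian loop.
def fitness(bits):
    """OneMax: count the 1s -- a standard toy GA objective."""
    return sum(bits)

def crossover(a, b):
    """Inheritance: child takes a prefix from one parent, suffix from the other."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    """Genetic variation: flip each bit with small probability."""
    return [1 - b if random.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=40, pop_size=30, generations=60):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:2]                    # elitism: best individuals persist
        parents = pop[:pop_size // 2]      # selection: fitter half reproduces
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

random.seed(1)
best = genetic_algorithm()
print(fitness(best))
```

Swapping the bitstring representation for expression trees, and crossover/mutation for tree-aware variants, turns this same loop into genetic programming.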

GAs, BOA and ACO typically address problems with discrete variables. ES and PSO address continuous optimization. GP stands apart: it automatically identifies "executable" programs (more usually functions or small program modules). This allows it to solve problems from a wide spectrum: automated discovery, knowledge mining, non-linear modeling without a pre-supposed model structure (aka symbolic regression), automatic heuristic derivation, code conversion, code testing, etc.
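To illustrate the continuous-optimization side of the family, here is a minimal PSO sketch minimizing the sphere function. It is a hypothetical example under standard textbook update rules (inertia `w`, cognitive pull `c1`, social pull `c2`); the names and settings are assumptions, not drawn from any ALFA system.

```python
import random

# Hypothetical minimal particle swarm optimizer on a continuous objective.
def sphere(x):
    """Sum of squares; the global optimum is at the origin."""
    return sum(v * v for v in x)

def pso(dim=3, n_particles=20, iterations=100, w=0.7, c1=1.5, c2=1.5):
    # Each particle carries a position, a velocity, and its personal best.
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=sphere)
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity blends inertia, pull toward the personal best,
                # and pull toward the swarm's best-known point.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if sphere(pos[i]) < sphere(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=sphere)
    return gbest

random.seed(2)
solution = pso()
print(sphere(solution))
```

Note the contrast with the discrete algorithms above: PSO's variation operator is a real-valued velocity update rather than bit flips or subtree swaps, which is why it suits continuous domains.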