Past Projects

Artificial Adversarial Intelligence

RIVALS: Evolutionary Adversarial Dynamics in Cyber Networks


Our RIVALS framework uses competitive coevolutionary algorithms to pit adversaries against one another and direct the resulting arms race. We evolve P2P network defense strategies against DDoS attacks. Our goal is to identify robust network configurations that support mission completion despite an ongoing DDoS attack.
Publication   Blog

AVAIL: Isolation vs Contagion
Isolation is an effective defense because borders prevent flow between compartmentalized units of a system or network. In networks, this is termed segmentation. In systems requiring complex access policies, software-defined perimeters (SDPs) underpin the combinations in which people, resources and privileges are controlled. In the network context, we investigate where to place isolation-based defensive measures to minimize malware contagion and maximize device availability. In the SDP context, we investigate topologies of access control flows.

DARK HORSE: Deception vs Reconnaissance
Once an adversary has compromised a network endpoint, they perform internal network reconnaissance to execute a plan of attack. One way to protect against internal reconnaissance is to camouflage the network to delay the attacker. We exploit the flexibility of software-defined networks to alter the network view (without loss of function) and to place decoys (honeypots) on the network that trap and slow down reconnaissance.


Machine Learning in Adversarial Settings

Black Box Minimax

RECKLESS addresses black-box (gradient-free) minimax problems in a coevolutionary setup. In our first contribution, with guarantees from Danskin’s theorem, we employ an Evolutionary Strategy as a stochastic estimator for the descent direction. We validate our proposed estimator with a collection of black-box minimax problems. Its performance is comparable with its coevolutionary counterparts and favorable for high-dimensional problems.
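The idea of using an evolution strategy as a stochastic estimator of the descent direction can be illustrated with a minimal sketch. This is not the RECKLESS implementation: the antithetic Gaussian estimator, the alternating inner-ascent/outer-descent loop, the toy objective, and every name and hyperparameter below (es_gradient, minimax, lr, sigma, n_samples) are assumptions chosen for illustration. Following Danskin's theorem, the outer step differentiates f with respect to x at the approximate inner maximizer y.

```python
import random


def es_gradient(f, x, sigma, n_samples, rng):
    """Antithetic Gaussian-smoothing estimate of the gradient of f at x.

    Uses only function evaluations, so f can be a black box.
    """
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([xi + sigma * e for xi, e in zip(x, eps)])
        minus = f([xi - sigma * e for xi, e in zip(x, eps)])
        scale = (plus - minus) / (2.0 * sigma * n_samples)
        for i in range(dim):
            grad[i] += scale * eps[i]
    return grad


def minimax(f, x0, y0, outer=200, inner=5, lr=0.2, sigma=0.1,
            n_samples=20, seed=0):
    """Approximate min over x of max over y of f(x, y) without gradients."""
    rng = random.Random(seed)
    x, y = list(x0), list(y0)
    for _ in range(outer):
        # Inner loop: a few ES ascent steps on y to approximate the maximizer.
        for _ in range(inner):
            gy = es_gradient(lambda v: f(x, v), y, sigma, n_samples, rng)
            y = [yi + lr * g for yi, g in zip(y, gy)]
        # Outer step: descend on x with y held at the approximate maximizer.
        gx = es_gradient(lambda u: f(u, y), x, sigma, n_samples, rng)
        x = [xi - lr * g for xi, g in zip(x, gx)]
    return x, y


def toy_objective(x, y):
    """Hypothetical test problem: max over y is at y = x, min over x at x = 1."""
    return (sum((xi - 1.0) ** 2 for xi in x)
            - sum((yi - xi) ** 2 for xi, yi in zip(x, y)))
```

Calling minimax(toy_objective, [3.0, -2.0], [0.0, 0.0]) should drive both x and y toward the saddle point at 1 in each coordinate; the step size and sample count would need retuning for other objectives.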





GIGABEATS: Data Science for Medical Sensor Data

Our projects tap into machine learning to interpret and exploit repositories holding waveform data, e.g., arterial blood pressure, ECG, and EEG.
Website   Publication   Publication

BeatDB is a scalable machine learning framework that allows researchers to build predictive models through physiological waveform mining and analysis. One of its strengths is its flexibility: users can define features to track, conditions to detect, and filters to apply with short user-defined scripts. BeatDB is integrated with Amazon Web Services (AWS), allowing users to run computations in parallel in the cloud. Because of these features, BeatDB lets researchers and scientists cut down on the time needed for prediction studies and data processing without sacrificing any of the data-specific parameterization possible with custom (and often single-use) scripts.

Trajectories Like Mine
In data-driven precision medicine, fast yet accurate prediction of acute and critical events from sensor time series is crucial, especially in intensive care units. In such settings promptness is essential, so if a task can be completed dramatically faster, a slight decrease in accuracy is often acceptable. To address these challenges, we are developing techniques for scalable patient record retrieval and event prediction based on locality-sensitive hashing (LSH). LSH offers significantly faster query times than linear, exhaustive k-nearest neighbor search while keeping accuracy in a competitive range. Prediction based on LSH is essentially a two-step process: first, quickly retrieving "patients like me", the approximate nearest neighbors of the query of interest, using LSH; and second, extrapolating from the information of those nearest neighbors to make a prediction.
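The two-step process (retrieve approximate neighbors, then extrapolate) can be sketched as follows. This is an illustrative sketch only, not the project's code: it assumes random-hyperplane hashing with a single hash table, fixed-length feature vectors per patient, and prediction by averaging neighbor labels. The class and function names (HyperplaneLSH, predict) are hypothetical.

```python
import random
from collections import defaultdict


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class HyperplaneLSH:
    """Single-table LSH using the signs of random projections as the hash key."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = defaultdict(list)

    def _key(self, x):
        # One bit per hyperplane: which side of the plane x falls on.
        return tuple(sum(p * v for p, v in zip(plane, x)) >= 0
                     for plane in self.planes)

    def add(self, x, label):
        self.buckets[self._key(x)].append((x, label))

    def query(self, x, k=3):
        # Step 1: only the query's own bucket is scanned, not the whole dataset.
        candidates = self.buckets.get(self._key(x), [])
        return sorted(candidates, key=lambda c: _sq_dist(c[0], x))[:k]


def predict(index, x, k=3):
    """Step 2: extrapolate from the retrieved neighbors (here, a label mean)."""
    neighbors = index.query(x, k)
    if not neighbors:
        return None  # empty bucket: no approximate neighbors were found
    return sum(label for _, label in neighbors) / len(neighbors)
```

A single table can miss true neighbors that fall in adjacent buckets; practical LSH setups typically use several tables and trade recall against query time through the number of bits.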

OSaaS is an integrative machine learning and econometric framework that allows researchers to build prediction and causation models from observational data. It seeks to aid decision making in contexts where the ground truth is not well known and the effectiveness of one's interventions is uncertain. By leveraging big data in different domains (health, transportation, commercial), OSaaS seeks to make explicit and transparent the modeling choices made by researchers that ultimately inform key decisions (which patient receives which treatment and when, transportation infrastructure and policy decisions, insurance pricing).

Sponsored by  Research North America



Big Data

Energy Efficient Data Centers

Scalable EDA-GP
We are interested in genetic programming algorithm designs that execute heritable genetic variation at the population level rather than the individual level. Such algorithms are called estimation-of-distribution GP (EDA-GP). The key idea is to express the genetic dependencies of a fit subset of the current generation as a multivariate distribution, which can be resampled in the quest for better solutions. State-of-the-art EDA-GP algorithms use a prototype tree, which limits their scalability. We are investigating how local patterns can replace and improve upon a prototype tree.

A secondary goal is to investigate how taking a multivariate distribution view of GP offers insight into its evolutionary dynamics.
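The estimate-then-resample loop can be illustrated with a deliberately simplified sketch. It is not EDA-GP: it works on fixed-length bitstrings rather than program trees and uses a univariate (independent per-bit) distribution rather than a multivariate one, in the style of the univariate marginal distribution algorithm. The OneMax fitness (count of ones), the function name, and all parameter values are assumptions made for illustration.

```python
import random


def umda_onemax(n_bits=20, pop_size=60, n_select=20, generations=30, seed=0):
    """Univariate EDA on OneMax: fit a distribution to the elite, resample."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # per-bit probability of sampling a 1
    best = None
    for _ in range(generations):
        # Sample a whole population from the current distribution.
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        pop.sort(key=sum, reverse=True)  # fitness = number of ones
        if best is None or sum(pop[0]) > sum(best):
            best = pop[0]
        # Re-estimate the distribution from the fit subset only.
        elite = pop[:n_select]
        probs = [
            # Clamp so that no bit value becomes impossible to sample.
            min(0.95, max(0.05, sum(ind[i] for ind in elite) / n_select))
            for i in range(n_bits)
        ]
    return best, probs
```

Variation here happens only through the distribution, never by mutating or recombining individuals, which is the population-level view described above. Capturing dependencies between positions would require replacing the per-bit probabilities with a joint model.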

Website    Publication


Human Data Interaction

The goal of the "Human Data Interaction" project is to develop methods at the intersection of data science, machine learning, and large-scale interactive systems to answer a rather simple question: why does it take so long to process, analyze, and derive insights from data?

We are developing the basic building blocks required for learning a model, called MLBlocks. These include multiple parametrized ways to represent data and to define and apply constructs that compare and contrast an entity in the data (student, patient, car, etc.) against itself or others, leading up to forming variables, selecting models, and ultimately building and analyzing predictive models.

Feature Factory
A collaborative, interactive, crowdsourced feature discovery solution.

Deep Mining
Tuning the machine learning pipeline.

A database-like interface for computer vision pipelines.

The Data Science Machine
The Data Science Machine is an end-to-end software system that is able to automatically develop predictive models from relational data.





Wind Energy

Machine Learning for Aiding Wind Farm Development

We are developing machine learning approaches that provide powerful, scalable statistical tools to the wind energy community. These include modeling methods such as Bayesian networks, copula-based dependence modeling, and Gaussian processes, as well as optimization approaches based on sampling and generative methods. We work closely with AWS Truepower. We have identified three areas where machine learning and information technology can help improve wind system performance.

Wind Resource Assessment
Sparse data inadequately expresses comprehensive wind speed and directional properties of the site. Sparsity exacerbates the measure-correlate-prediction problem because, in addition to possible complex correlation between met towers and the site, there is missing data. To examine this problem, we collected wind speed and direction data from 14 airports in reasonable proximity of each other including Logan International Airport. We adaptively binned the wind data by using a Particle Swarm Optimization algorithm that selected varying direction intervals and bin sizes.

Wind Farm Turbine Optimization
We are currently developing an accurate, efficient, and parallelizable optimization algorithm for the layout of hundreds, and then 1000, turbines. Efficient and accurate optimization is challenged by large numbers of turbines, large farm areas, constraints on feasible sitings, and expensive wake models that scale nonlinearly with each additional turbine. The algorithm could be incorporated as an "optimizer" component choice in a layout tool such as OpenWind.

Wind Farm Power Cable Route Design
We present some initial work towards automating the design of cabling layouts for large-scale wind farms. We build a problem model that incorporates the relevant real-world constraints, and then decompose the problem into three layers: the circuit, the substation, and the full farm. When there is a single cable type, the circuit and substation layers map to graph problems (the uncapacitated and capacitated minimum spanning tree).
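The uncapacitated minimum spanning tree mentioned above can be sketched with Prim's algorithm. This is a minimal illustration, not the project's model: it assumes straight-line (Euclidean) cable lengths, a single cable type, no capacity limits or terrain constraints, and that the first point is the substation. The function name cable_mst is hypothetical.

```python
import math


def cable_mst(points):
    """Prim's algorithm over Euclidean distances.

    points[0] is treated as the substation; returns (edges, total_length),
    where each edge is a pair of indices into points.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost into the tree
    parent = [-1] * n       # tree node that provides that cheapest connection
    if n:
        best[0] = 0.0
    edges, total = [], 0.0
    for _ in range(n):
        # Pick the cheapest node not yet connected.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
            total += best[u]
        # Relax connection costs for the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges, total
```

The capacitated variant, where each circuit can carry only a limited number of turbines, is a harder problem and is not covered by this sketch.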




Systems and Machine Learning

Autotuning in PetaBricks, ZetaBricks, D-TEC

We provide machine learning expertise to our collaborators in the PetaBricks project in the areas of online, sideline, and offline autotuning. PetaBricks research is led by Prof. S. Amarasinghe, who leads the Commit group. Together we are developing lightweight machine learning algorithms capable of running PetaBricks programs fast and efficiently (with respect to power or varying required accuracy) on exascale architectures.


Data Mining Virtual Machines for Resource Optimization

Consolidation with virtual machines allows cost savings and offers energy savings and efficiency. However, the challenge is to maximize consolidation while honoring service commitments (often called SLAs or Service Level Agreements). Our technical approaches address this challenge in ways that exploit modeling, forecasting with genetic programming and reinforcement-based machine learning algorithms.

Resource Allocation in Virtual Machines for Energy-Efficient Data Centers and Clouds
The aim of the project is to create a framework able to improve energy efficiency in Data Centers while respecting constraints related to quality of service. The proposed approach consists of four phases: monitoring, evaluation, adaptation, and updating.

Application Counter Intelligence
We worked on predicting impending overload of a virtual machine's resources in order to provide sufficient notice to mitigate the situation in a timely way by reallocating resource types which respond at appropriate timescales. This project was supported by 


Meta-Optimization: Improving Compilation with Genetic Programming

We used genetic programming to automatically generate application-specific and general compiler priority functions. These functions are known as the "Achilles Heel" because compiler designers typically develop them by hand and test them on problem instances that rapidly drift out of date. Our priority functions worked in the context of hyperblock scheduling and register allocation. A PowerPoint from a PLDI presentation is available as a PDF.


Multi-Objective Optimization Algorithm Design

We are investigating how design knowledge can easily be elicited from an expert designer and exploited by an algorithm that returns to the designer a suite of Pareto-optimal (i.e., non-dominated) designs. These designs present different tradeoffs with respect to multiple objectives and allow the designer or control algorithm to choose between them. The choice can be updated according to the current critical performance specifications. The technical challenge is to efficiently explore the space of possible solutions with scalable techniques that accommodate high dimensionality and multiple objectives.
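What "non-dominated" means can be made concrete with a short sketch. It assumes every objective is to be minimized and that each design is represented by a tuple of objective values; the function names and the quadratic-time filter are illustrative choices, not the project's algorithm.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one.

    All objectives are assumed to be minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(designs):
    """Return the designs that no other design dominates (O(n^2) comparisons)."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]
```

For example, among the designs (1, 5), (2, 3), (3, 4), (4, 1) and (5, 5), the point (3, 4) is dominated by (2, 3) and (5, 5) is dominated by (1, 5), so the front consists of (1, 5), (2, 3) and (4, 1). The pairwise filter becomes expensive for large design sets, which is part of the scalability challenge described above.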


Hybrid Machine-Learning and Optimization

Convex optimization techniques such as geometric programming and semi-definite programming are powerful tools for design and optimization. However, they require the design problem to be modeled with a specific formulation, such as a posynomial/monomial objective or constraint, or a sum-of-squares objective. This is often not straightforward to accomplish accurately.




Other Projects

Hierarchical Genetic Algorithms for Parallelization of Sparse Matrix Algebra

High-performance computing on a multicore processor demands efficient parallelization. While dense matrices can be efficiently distributed among cores without concern for inter-chip transport costs, sparse matrix algebra requires consideration of data distribution and transport costs. In collaboration with Lincoln Labs, we have teamed a hierarchical GA with a fine-grained computation model. The GAs (inner and outer) adaptively determine an efficient processor mapping for sparse matrix multiplication with respect to data processing and transport costs.



We are developing scalable algorithms for a variety of NP-hard problems in networks. These problems emerge in ad hoc wireless networks and sensor networks. We have designed distributed algorithms for network coding.


Support Vector Machines: Performance Analysis

Support Vector Machines are an example of a recently developed machine learning algorithm that has rapidly been adopted by a wide range of application programmers as a means of classifying and performing data regression.


Analog Reconfigurable Systems

Model-free methods such as evolutionary algorithms allow reconfigurable systems to adapt or self-tune based solely on performance feedback. Analog reconfigurable systems have potential payoffs in two areas.


Adaptive Resource Allocation

Computer architecture and application complexity is rapidly increasing. With the adoption of multi-core processors for desktop computing, workloads are less predictable because applications are more complex in terms of thread parallelism and diverse computation demands. Decentralized adaptive strategies within the operating system or runtime system are a potentially scalable solution for handling this complexity. We investigate computational economic mechanisms that allow individual software components to introspect on performance and adapt their run-time resource requests as they would in a marketplace of sellers and consumers.


Evolvable Hardware

We have developed an evolvable hardware testbench named GRACE. GRACE's software component includes an evolutionary algorithm that generates sized analog circuit topologies. Evolved circuit designs are tested directly in silicon: each is dynamically configured on a Field Programmable Analog Array, then exercised with an input signal while its output behaviour is captured and evaluated. GRACE is extensible. We plan to pursue using a highly complex reconfigurable circuit environment to evolve complex circuits such as an ADC.



The ALFA Group was formerly the Evolutionary Design and Optimization Group