
Artificial Adversarial Intelligence


RIVALS: Evolutionary Adversarial Dynamics in Cyber Networks




Our RIVALS framework uses competitive coevolutionary algorithms to pit adversaries against one another and to direct the resulting arms race. We evolve defense strategies for peer-to-peer (P2P) networks under DDoS attacks. Our goal is to identify robust network configurations that support mission completion despite an ongoing DDoS attack.

Publication   Blog
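To make the idea of a coevolutionary arms race concrete, here is a toy sketch in Python. It is not the RIVALS implementation, and every name in it is hypothetical: the network is just a set of node indices, a defender genome is a fixed-budget set of hardened nodes, an attacker genome is a fixed-budget set of flooded nodes, and a node goes down when it is flooded but not hardened. Each population is ranked by its mean payoff against the whole opposing population, the better half survives, and mutation refills the rest.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

N_NODES = 20     # hypothetical network size
DEF_BUDGET = 5   # hypothetical number of nodes a defender can harden
ATK_BUDGET = 5   # hypothetical number of nodes an attacker can flood

Genome = FrozenSet[int]


def random_genome(budget: int, rng: random.Random) -> Genome:
    """A genome is a fixed-size set of node indices (hardened or flooded)."""
    return frozenset(rng.sample(range(N_NODES), budget))


def mutate(genome: Genome, rng: random.Random) -> Genome:
    """Swap one chosen node for an unchosen one, so the budget is preserved."""
    out = rng.choice(sorted(genome))
    new = rng.choice([n for n in range(N_NODES) if n not in genome])
    return (genome - {out}) | {new}


def nodes_down(defense: Genome, attack: Genome) -> int:
    """A node goes down when it is flooded and not hardened."""
    return len(attack - defense)


def evolve_step(pop: List[Genome], opponents: List[Genome],
                payoff: Callable[[Genome, Genome], float],
                rng: random.Random) -> List[Genome]:
    """Rank by mean payoff against every opponent, keep the top half, refill by mutation."""
    def fitness(g: Genome) -> float:
        return sum(payoff(g, o) for o in opponents) / len(opponents)

    elite = sorted(pop, key=fitness, reverse=True)[: len(pop) // 2]
    children = [mutate(rng.choice(elite), rng) for _ in range(len(pop) - len(elite))]
    return elite + children


def coevolve(generations: int = 30, pop_size: int = 10,
             seed: int = 0) -> Tuple[List[Genome], List[Genome]]:
    rng = random.Random(seed)
    defenders = [random_genome(DEF_BUDGET, rng) for _ in range(pop_size)]
    attackers = [random_genome(ATK_BUDGET, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Defenders minimize downed nodes; attackers maximize them.
        defenders = evolve_step(defenders, attackers, lambda d, a: -nodes_down(d, a), rng)
        attackers = evolve_step(attackers, defenders, lambda a, d: nodes_down(d, a), rng)
    return defenders, attackers
```

Because each side is scored only against the current opposing population, fitness in this sketch is relative: a defense that looks strong in one generation can be overtaken once the attackers adapt.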

AVAIL: Isolation vs Contagion

Isolation is an effective defense because borders prevent flow between the compartmentalized units of a system or network. In networks, this is termed segmentation. In systems requiring complex access policies, software-defined perimeters (SDPs) govern how people, resources, and privileges are combined and controlled. In the network context, we investigate where to place isolation-based defensive measures so as to minimize malware contagion and maximize device availability. In the SDP context, we investigate topologies of access-control flows.
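A minimal sketch of isolation-based placement, under strong simplifying assumptions: malware spreads deterministically across every unblocked link (worst-case contagion), a defensive measure simply blocks one link, and availability is the number of devices the malware cannot reach from a given foothold. The greedy placement rule and all function names are hypothetical illustrations, not the AVAIL method.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, List[int]]   # undirected graph as symmetric adjacency lists
Edge = FrozenSet[int]          # an undirected link {u, v}


def infected(adj: Graph, source: int, blocked: Set[Edge]) -> Set[int]:
    """Worst-case contagion: malware crosses every unblocked link (breadth-first search)."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and frozenset((u, v)) not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


def availability(adj: Graph, source: int, blocked: Set[Edge]) -> int:
    """Number of devices the malware cannot reach from its foothold."""
    return len(adj) - len(infected(adj, source, blocked))


def greedy_isolation(adj: Graph, source: int, budget: int) -> Set[Edge]:
    """Block, one at a time, the link whose removal most increases availability."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    blocked: Set[Edge] = set()
    for _ in range(budget):
        remaining = edges - blocked
        if not remaining:
            break
        best = max(remaining, key=lambda e: availability(adj, source, blocked | {e}))
        blocked.add(best)
    return blocked
```

For example, with two three-device segments joined by a single bridge link and the foothold in the first segment, blocking the bridge keeps the whole second segment available. A one-link look-ahead like this can miss placements where two links must be cut together; it is kept greedy here only for brevity.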



DARK HORSE: Deception vs Reconnaissance

Once adversaries have compromised a network endpoint, they perform internal network reconnaissance to plan their attack. One way to protect against internal reconnaissance is to camouflage the network and delay the attacker. We exploit the flexibility of software-defined networks to alter the attacker's view of the network without loss of function, and we place decoys (honeypots) on the network to trap and slow down reconnaissance.



Machine Learning in Adversarial Settings

RECKLESS: Black-Box Minimax

RECKLESS addresses black-box (gradient-free) minimax problems in a coevolutionary setup. In our first contribution, backed by guarantees from Danskin's theorem, we employ an Evolution Strategy as a stochastic estimator of the descent direction. We validate the proposed estimator on a collection of black-box minimax problems: its performance is comparable to that of its coevolutionary counterparts and favorable on high-dimensional problems.
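The role of Danskin's theorem can be illustrated with a small sketch. Informally, if y* is the unique maximizer of L(x, ·), then the gradient of max_y L(x, y) equals the gradient of L(·, y*) at x, so an Evolution Strategy that only queries L can estimate both the inner ascent direction and the outer descent direction. The antithetic Gaussian-smoothing estimator, the step sizes, and the toy objective below are assumptions for illustration, not the RECKLESS algorithm.

```python
import random
from typing import Callable, List, Tuple

Vec = List[float]


def es_gradient(f: Callable[[Vec], float], x: Vec, sigma: float,
                n_pairs: int, rng: random.Random) -> Vec:
    """Antithetic Gaussian-smoothing estimate of grad f(x) from function values only."""
    g = [0.0] * len(x)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + sigma * e for xi, e in zip(x, eps)])
        f_minus = f([xi - sigma * e for xi, e in zip(x, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * n_pairs)
        g = [gi + scale * e for gi, e in zip(g, eps)]
    return g


def es_minimax(L: Callable[[Vec, Vec], float], x: Vec, y: Vec,
               steps: int = 100, inner_steps: int = 20, lr: float = 0.05,
               sigma: float = 0.1, n_pairs: int = 10, seed: int = 0) -> Tuple[Vec, Vec]:
    """Black-box min_x max_y L(x, y) using ES estimates for both players."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Inner loop: approximate y*(x) = argmax_y L(x, y) by ES gradient ascent.
        for _ in range(inner_steps):
            gy = es_gradient(lambda v: L(x, v), y, sigma, n_pairs, rng)
            y = [yi + lr * gi for yi, gi in zip(y, gy)]
        # Danskin: grad_x max_y L(x, y) = grad_x L(x, y*) when y* is the unique maximizer.
        gx = es_gradient(lambda u: L(u, y), x, sigma, n_pairs, rng)
        x = [xi - lr * gi for xi, gi in zip(x, gx)]
    return x, y


def toy_L(x: Vec, y: Vec) -> float:
    """Toy saddle objective ||x||^2 + 2 x.y - ||y||^2: max over y is 2||x||^2, minimized at x = 0."""
    return sum(a * a + 2 * a * b - b * b for a, b in zip(x, y))
```

On the toy objective the saddle point is x = y = 0, and a call such as `es_minimax(toy_L, [1.0, -2.0], [0.0, 0.0])` should move both players toward it. On real black-box problems, `sigma` and `n_pairs` trade estimator variance against query cost.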



LIPIZZANER: GANs and Competitive Coevolution

GANs are difficult to train because of convergence pathologies such as mode collapse and discriminator collapse. Lipizzaner, an open-source software system, allows machine learning programmers to train GANs in a distributed and robust way by using spatially distributed populations of generators and discriminators. Experiments on theoretical problems and image datasets demonstrate its improved performance and scalability.

Publication   Blog
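The spatial distribution of populations can be pictured with a small grid sketch. Here we assume a toroidal grid where each cell sees itself plus its four von Neumann neighbors; a cell gathers the individuals in that neighborhood and keeps the best one according to some fitness score. The grid shape, the neighborhood definition, and the `neighborhood` and `select_best` helpers are assumptions for illustration, and the actual GAN training step is left out.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")
Cell = Tuple[int, int]


def neighborhood(cell: Cell, rows: int, cols: int) -> List[Cell]:
    """Center cell plus its N, S, W, E neighbors, wrapping around the grid edges (torus)."""
    r, c = cell
    return [
        (r, c),
        ((r - 1) % rows, c),
        ((r + 1) % rows, c),
        (r, (c - 1) % cols),
        (r, (c + 1) % cols),
    ]


def select_best(grid: Dict[Cell, T], cell: Cell, rows: int, cols: int,
                fitness: Callable[[T], float]) -> T:
    """Gather the sub-population visible from `cell` and return its fittest member."""
    sub_population = [grid[n] for n in neighborhood(cell, rows, cols)]
    return max(sub_population, key=fitness)
```

In this kind of cellular layout, neighborhoods overlap, so a strong individual selected in one cell becomes visible to adjacent cells in the next round without any global synchronization.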


Bayesian Optimization for Nash Equilibrium

Game theory has emerged as a powerful framework for modeling a wide range of multi-agent scenarios. Many algorithmic solutions require discrete, finite games whose payoffs have a closed-form specification. In contrast, many real-world applications require modeling with continuous or discrete action spaces and black-box utility functions, where payoff information is available only as empirical (often expensive and/or noisy) observations of strategy profiles. Few tools exist for solving this class of expensive, black-box continuous games. In this project, we investigate methods to find equilibria for such games in a sequential decision-making framework using Bayesian Optimization.

Publication   Blog  


Malware & Software Vulnerabilities

SLEIPNIR: Adversarial Machine Learning

Model-based malware detectors, such as SVMs and neural networks, are vulnerable to so-called adversarial examples: modest changes to detectable malware that allow the resulting malware to evade detection. In this project, we develop methods that generate functionality-preserving adversarial malware examples in the binary domain. We also develop a method to adversarially harden malware detectors for binary representations under a bit-setting constraint. Furthermore, we investigate visual hallmarks of robust generalization, that is, good performance against unseen attacks.

Publication   Publication   Blog 
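A bit-setting constraint means the attacker may only flip 0 bits to 1 (for example, adding an unused feature such as an import), which is one way to keep the binary functional. For a linear detector on binary features, the worst-case perturbation under that constraint has a closed form: set the unset bits with the most negative weights. The sketch below uses it as the inner maximization of adversarial training for a logistic-regression detector. The linear model, the flip budget `k`, and all names are assumptions for illustration; for neural detectors the inner step has no closed form and must be approximated.

```python
import math
from typing import List, Sequence, Tuple


def logit(w: Sequence[float], b: float, x: Sequence[int]) -> float:
    """Linear detector score; positive means 'malware'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def bit_set_attack(w: Sequence[float], x: Sequence[int], k: int) -> List[int]:
    """Worst case for a linear detector when only 0->1 flips are allowed (at most k):
    set the unset bits whose weights are most negative."""
    candidates = sorted((i for i, xi in enumerate(x) if xi == 0 and w[i] < 0),
                        key=lambda i: w[i])
    x_adv = list(x)
    for i in candidates[:k]:
        x_adv[i] = 1
    return x_adv


def adversarial_train(data: List[Tuple[List[int], int]], k: int,
                      epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Logistic regression where each malware sample is replaced by its worst-case
    bit-set variant before every update (inner max, then outer min)."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if y == 1:  # only malware is perturbed; the attacker wants it scored benign
                x = bit_set_attack(w, x, k)
            g = sigmoid(logit(w, b, x)) - y  # derivative of log-loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b
```

On a two-sample toy set where a benign-looking bit is available to the attacker, the hardened detector should still flag the malware sample after the attacker sets that bit.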

Automated Vulnerability and Malware Detection in Source Code

We are developing machine learning approaches that detect bugs and vulnerabilities and classify malicious code. We combine techniques from traditional program analysis with machine learning to spot malicious and vulnerable programs in the wild. We are working with two programming languages: PowerShell and Solidity. To date, we have shown that our neural and graph-based program representations provide informative features for detecting program properties that signal bugs. We are presently considering featureless deep learning methods.






MOOC Data Analytics

MOOC Learner Project: Advancing Learning Behavioral Analytics through Data Science

Each time a learner interacts with an e-learning system, it is possible to capture a record of their engagement. Data comprising mouse clicks, video controls, problem responses, programming, collaborations, and discussions then becomes available to learning science. MLP's goal is to tap into the immense potential of this data to provide insights into how students learn and how instructors can teach effectively. The challenge is to provide technology and develop new approaches that transform this fundamentally different set of observations into actionable knowledge.

Website   Publication   Publication




Clinical Medicine


Website   Publication   Publication



A scalable machine learning framework that allows researchers to build predictive models through physiological waveform mining and analysis. One of its strengths is its flexibility: users can define features to track, conditions to detect, and filters to apply with short user-defined scripts. BeatDB is integrated with Amazon Web Services (AWS), allowing users to run computations in parallel in the cloud. As a result, BeatDB lets researchers and scientists cut the time needed for prediction studies and data processing without sacrificing the parameterization and data specificity possible with custom (and often single-use) scripts.

"Patients Like Me" for Precision Medicine (PM2)

In data-driven precision medicine, fast yet accurate prediction of acute and critical events from sensor time series is crucial, especially in intensive care units. In such a setting promptness is essential, so if a task can be completed dramatically faster, a slight decrease in accuracy is often acceptable. To address these challenges, we are developing techniques for scalable patient record retrieval and event prediction based on locality-sensitive hashing (LSH). Compared with linear, exhaustive k-nearest-neighbor search, LSH offers significantly faster queries while keeping accuracy in a competitive range. Prediction based on LSH is essentially a two-step process: first, quickly retrieve "patients like me", the approximate nearest neighbors of the query of interest, using LSH; second, extrapolate from the information of those nearest neighbors to make a prediction.
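The two-step process can be sketched with random-hyperplane LSH (SimHash) for cosine similarity. The description above does not specify the hash family, the number of tables, or how neighbors are combined, so those choices, the `HyperplaneLSH` class, and the majority-vote `predict` rule are assumptions for illustration. Step one ranks only the records that share a bucket with the query, instead of scanning every stored record.

```python
import math
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class HyperplaneLSH:
    """Random-hyperplane index: vectors on the same side of every hyperplane
    in a table share a bucket, so similar vectors tend to collide."""

    def __init__(self, dim: int, n_bits: int = 8, n_tables: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables: List[Dict[Tuple[bool, ...], List[int]]] = [
            defaultdict(list) for _ in range(n_tables)]
        self.records: List[Tuple[Vec, Any]] = []

    @staticmethod
    def _key(planes: List[List[float]], v: Vec) -> Tuple[bool, ...]:
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def add(self, v: Vec, outcome: Any) -> None:
        idx = len(self.records)
        self.records.append((v, outcome))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(idx)

    def query(self, v: Vec, k: int) -> List[Tuple[Vec, Any]]:
        """Step 1: collect records sharing a bucket in any table, then rank only those."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, v), []))
        ranked = sorted(candidates, key=lambda i: cosine(v, self.records[i][0]), reverse=True)
        return [self.records[i] for i in ranked[:k]]


def predict(index: HyperplaneLSH, v: Vec, k: int = 5) -> Optional[Any]:
    """Step 2: extrapolate from the approximate neighbors, here by majority vote."""
    neighbors = index.query(v, k)
    if not neighbors:
        return None
    outcomes = [o for _, o in neighbors]
    return max(set(outcomes), key=outcomes.count)
```

More bits per table make buckets smaller and queries faster but miss more true neighbors; more tables recover some of those misses at the cost of memory. That is the speed-versus-accuracy trade-off the paragraph above describes.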


An integrative machine learning and econometric framework that allows researchers to build prediction and causation models drawing on observational data. OSaaS seeks to support decision making in contexts where the ground truth is not well known and the effectiveness of one's interventions is uncertain. By leveraging big data in different domains (health, transportation, commercial), OSaaS seeks to make explicit and transparent the modeling choices made by researchers that ultimately inform key decisions (which patient receives which treatment and when, transportation infrastructure and policy decisions, insurance pricing).




STEALTH Tax Project


Website   Publication   Publication   Videos


STEALTH: Understanding the Relationship between Tax Non-Compliance and Tax Law

The project's goal is to develop technology that enables the discovery of non-compliant partnership transaction patterns. Through the use of partnerships and other "flow-through entities", taxpayers underreported more than $91 billion of income annually between 2006 and 2009 (GAO-14-453), and the trend shows little sign of stopping.

STEALTH (Simulating Tax Evasion And Law Through Heuristics)

STEALTH allows us to identify sequences of financial transactions around partnerships that accomplish the same economic purpose but have different tax consequences. Using a robust co-optimization and artificial intelligence modeling approach, it learns observables that indicate the presence of non-compliant behavior.




FlexGP Project

FlexGP: Flexible ML with Genetic Programming

Website   Publication   Code



A cloud-based platform for generating transparent non-linear models for large-scale regression problems.

Website   Publication


A data-parallel approach to building ensembles of classifiers.

Website   Publication

Feature Learning

Evolutionary Feature Synthesis (EFS) generates accurate, readable, nonlinear features for tabular data.

Website   Publication
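A heavily simplified generate-and-select loop conveys the flavor of feature synthesis: keep a small pool of named features, create offspring by applying squaring, absolute value, and pairwise products, and keep the candidates most correlated with the target. The operator set, the correlation-based fitness, the pool size, and the `synthesize` function are assumptions for illustration; EFS's actual operators and selection step may differ.

```python
import math
from typing import Dict, List

Column = List[float]

UNARY = {"sq": lambda a: a * a, "abs": abs}  # hypothetical operator set


def pearson(a: Column, b: Column) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def synthesize(columns: Dict[str, Column], y: Column,
               generations: int = 2, pool_size: int = 4) -> Dict[str, Column]:
    """Grow a pool of readable, named features; survivors are the most correlated with y."""
    pool = dict(columns)
    for _ in range(generations):
        offspring = dict(pool)  # parents compete with their offspring
        names = list(pool)
        for n in names:
            for op, f in UNARY.items():
                offspring[f"{op}({n})"] = [f(v) for v in pool[n]]
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                offspring[f"{a}*{b}"] = [u * v for u, v in zip(pool[a], pool[b])]
        ranked = sorted(offspring, key=lambda n: abs(pearson(offspring[n], y)), reverse=True)
        pool = {n: offspring[n] for n in ranked[:pool_size]}
    return pool
```

With a target equal to `x0 * x1`, the product feature appears in the first generation and survives selection because its correlation with the target is 1, and its name keeps the synthesized feature readable.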



Past Projects