Projects


Contents

Artificial Adversarial Intelligence

    Machine Learning in Adversarial Settings

Understanding Programs and Programming

Data Analytics

 

 

 

Artificial Adversarial Intelligence

 

 

CICADA: Coevolutionary Intelligent COAs for Adversarial Decisions Against Allies

CICADA is designed to learn effective new Red Force brigade behaviors, expressed as novel tactics, techniques and procedures (TTPs), for force-on-force wargames at the strategic, operational, and tactical levels.

Starting Fall 2020, this project has openings for UROPs and M.Engs.

This project is in collaboration with   Perspecta Labs  sponsored by  The Defense Advanced Research Projects Agency (DARPA) 

 

 

 

Adaptive Attacks Through Planning

With the recent advent of systematic collections of knowledge about cyber attacks and cybersecurity, the use of structured attack information for AI planning in the cybersecurity space has become a possibility. This project is applying AI planning techniques to develop an attack planner that executes actual cyber attacks on an emulated range.

This project is sponsored by SimSpace Corporation

 

 

 

WILEE: Agent-Based Threat Detection and Adaptive Collection for Cyber Hunting at Scale

It is imperative in cybersecurity to research and develop techniques that generate a prioritized set of threats from the adversary’s perspective. This project develops algorithms that genetically perturb threat implementations to improve detection accuracy and eliminate model overfitting. It also investigates how to efficiently access public sources of threat and vulnerability knowledge and exploit this knowledge for improved adversarial analysis.

Starting Fall 2020, this project has openings for UROPs and M.Engs.

This project is in collaboration with  Perspecta Labs  sponsored by  The Defense Advanced Research Projects Agency (DARPA) 

 

 

 

Application of Coevolutionary Algorithms for an Asymmetric Cyber Game

This research project will develop a novel machine learning (ML) approach using a coevolutionary algorithm that is integrated with an Artificial Intelligence planner. The resulting system will be applied to a use case in the form of a cyber game wherein it will assume the roles of two automated game players that compete against each other. The cyber game will investigate defense strategies for unknown adversary behavior.

Starting Fall 2020, this project has openings for UROPs and M.Engs.

This project is sponsored by   US Air Force Research Laboratory

 

 

 

LIPIZZANER: GANs and Competitive Coevolution

GANs are difficult to train due to convergence pathologies such as mode and discriminator collapse. Lipizzaner, an open source software system, allows machine learning programmers to train GANs in a distributed and robust way by using spatially distributed generator and discriminator populations. Theoretical problems and image data sets demonstrate the improved performance and scalability.
Website   Publication   Publication   Blog   Blog   Lipizzaner Twitter
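As a rough illustration of what "spatially distributed populations" can look like, the sketch below assumes the generator/discriminator pairs sit on a toroidal grid and that each cell interacts with its four adjacent cells. The grid layout, the five-cell neighborhood and the function name `toroidal_neighborhood` are assumptions made for this example; they are not taken from the Lipizzaner code base.

```python
from typing import List, Tuple


def toroidal_neighborhood(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the cell itself plus its north, south, west and east neighbors.

    Indices wrap around at the grid edges, so every cell has exactly
    four neighbors regardless of where it sits on the grid.
    """
    return [
        (row, col),                # the cell's own generator/discriminator pair
        ((row - 1) % rows, col),   # north (wraps to the last row)
        ((row + 1) % rows, col),   # south
        (row, (col - 1) % cols),   # west (wraps to the last column)
        (row, (col + 1) % cols),   # east
    ]


# Cells whose models the top-left cell of a 3x3 grid would train against.
corner_cells = toroidal_neighborhood(0, 0, rows=3, cols=3)
```

In a scheme like this, each cell would train its generator against the discriminators of its neighborhood (and vice versa), so good models spread gradually across the grid instead of one pair dominating the whole population.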

This project is funded by  Systems That Learn at CSAIL

 

 

 

Machine Learning in Adversarial Settings

SignHunter

In SignHunter, we present a novel black-box adversarial attack algorithm with state-of-the-art model evasion rates for query efficiency under L-1 and L-infinity metrics. SignHunter exploits a sign-based, rather than a magnitude-based, gradient estimation approach. This shifts the gradient estimation from continuous to binary black-box optimization. SignHunter adaptively constructs queries to estimate the gradient, one query relying upon the previous, rather than re-estimating the gradient each step with random query construction. Its reliance on sign bits yields a smaller memory footprint and it requires neither hyperparameter tuning nor dimensionality reduction.
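The idea of estimating gradient signs adaptively can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes a divide-and-conquer schedule in which progressively smaller chunks of a sign vector are flipped and a flip is kept only when the black-box loss increases. The names `sign_hunter_sketch` and `linear_loss` are hypothetical.

```python
from typing import Callable, List


def sign_hunter_sketch(
    loss: Callable[[List[float]], float],
    x: List[float],
    eps: float,
    max_queries: int,
) -> List[int]:
    """Search for a sign vector s in {-1, +1}^n that increases loss(x + eps * s)."""
    n = len(x)
    signs = [1] * n  # initial guess: every gradient coordinate is positive

    def query(candidate: List[int]) -> float:
        # One black-box query: the loss at the perturbed input.
        return loss([xi + eps * si for xi, si in zip(x, candidate)])

    best = query(signs)
    queries = 1
    chunk = n
    while queries < max_queries:
        for start in range(0, n, chunk):
            if queries >= max_queries:
                break
            stop = min(start + chunk, n)
            for i in range(start, stop):  # flip one chunk of sign bits
                signs[i] = -signs[i]
            value = query(signs)
            queries += 1
            if value > best:
                best = value              # keep the flip: it builds on earlier queries
            else:
                for i in range(start, stop):
                    signs[i] = -signs[i]  # revert the flip
        if chunk == 1:
            break
        chunk = (chunk + 1) // 2          # refine: halve the chunk size
    return signs


def linear_loss(point: List[float]) -> float:
    # Toy stand-in for a model's loss; its gradient is simply `weights`.
    weights = [3.0, -1.0, 2.0, -4.0, 0.5, -2.0, 1.0, -1.0]
    return sum(w * p for w, p in zip(weights, point))


estimated = sign_hunter_sketch(linear_loss, [0.0] * 8, eps=0.1, max_queries=100)
```

On this toy linear loss the search recovers the sign of every gradient coordinate. Only the sign vector and the best loss seen so far are stored, which is where the small memory footprint comes from.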

Publication   Video

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

ZO-Min-Max

ZO-Min-Max provides a principled optimization framework integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just one-step descent/ascent update. The proposed framework has a sub-linear convergence rate under mild conditions and scales gracefully with problem size.
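A minimal sketch of the two ingredients, under stated assumptions: the gradient estimator below averages finite differences along random Gaussian directions, the feasible set is a simple box, and the objective is a toy saddle-point problem. The function names, step sizes and the objective are all chosen for illustration and are not taken from the publication.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def zo_gradient(func: Callable[[Vector], float], point: Vector,
                mu: float, num_dirs: int, rng: random.Random) -> Vector:
    """Estimate a gradient from function values only (no derivatives)."""
    base = func(point)
    grad = [0.0] * len(point)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in point]            # random direction
        shifted = [p + mu * ui for p, ui in zip(point, u)]
        slope = (func(shifted) - base) / mu                  # finite difference
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_dirs
    return grad


def project_box(point: Vector, low: float, high: float) -> Vector:
    """Project onto the box [low, high]^d by clipping each coordinate."""
    return [min(max(p, low), high) for p in point]


def zo_min_max(f: Callable[[Vector, Vector], float], x: Vector, y: Vector,
               steps: int = 2000, lr: float = 0.05, mu: float = 1e-3,
               num_dirs: int = 5, seed: int = 0) -> Tuple[Vector, Vector]:
    """Alternate one projected descent step in x with one ascent step in y."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx = zo_gradient(lambda v: f(v, y), x, mu, num_dirs, rng)
        x = project_box([xi - lr * gi for xi, gi in zip(x, gx)], -2.0, 2.0)
        gy = zo_gradient(lambda v: f(x, v), y, mu, num_dirs, rng)
        y = project_box([yi + lr * gi for yi, gi in zip(y, gy)], -2.0, 2.0)
    return x, y


def toy_objective(x: Vector, y: Vector) -> float:
    # Convex in x, concave in y; the saddle point is x_i = 0.8, y_i = 0.4.
    return (sum((xi - 1.0) ** 2 for xi in x)
            + sum(xi * yi for xi, yi in zip(x, y))
            - sum(yi ** 2 for yi in y))


x_opt, y_opt = zo_min_max(toy_objective, [0.0, 0.0], [0.0, 0.0])
```

Each iteration uses only `num_dirs + 1` function queries per player and a single descent or ascent update, which mirrors the query-efficient, one-step structure described above.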

Publication

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

Bayesian Optimization for Nash Equilibrium

Game theory has emerged as a powerful framework for modeling a large range of multi-agent scenarios. Many algorithmic solutions require discrete, finite games with payoffs that have a closed-form specification. In contrast, many real-world applications require modeling with continuous/discrete action spaces and black-box utility functions where payoff information is available only in the form of empirical (often expensive and/or noisy) observations of strategy profiles. Few tools exist for solving the class of expensive, black-box continuous games. In this project, we investigate methods to find equilibria for such games in a sequential decision-making framework using Bayesian Optimization.
Publication   Blog

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

SLEIPNIR: Adversarial Machine Learning

Model-based malware detectors, such as SVMs and neural networks, are vulnerable to so-called adversarial examples: modest changes to detectable malware that allow the resulting malware to evade detection. In this project, we develop methods capable of generating functionality-preserving adversarial malware examples in the binary domain. We develop a method to adversarially harden malware detectors for binary representations under a bit-setting constraint. Furthermore, we investigate visual hallmarks of robust generalization: good performance against unseen attacks.
Publication   Publication   Blog
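To make the bit-setting constraint concrete, the sketch below assumes a binary feature vector (for example, indicators of imported functions) and a hypothetical linear detector. The attack may only switch bits from 0 to 1, on the assumption that adding features does not break the malware's functionality, whereas removing them might. The detector, the weights and the function name `greedy_bit_set_attack` are illustrative and are not the methods from the publications.

```python
from typing import List, Tuple


def greedy_bit_set_attack(weights: List[float], bias: float,
                          features: List[int], budget: int) -> Tuple[List[int], float]:
    """Greedily set 0-bits to 1 to push a linear detector's score to <= 0.

    A score above 0 means "malicious". Bits that are already 1 are never
    cleared, which is the bit-setting constraint.
    """
    adversarial = list(features)
    score = bias + sum(w * x for w, x in zip(weights, adversarial))
    # Unset bits with negative weight, most benign-looking first.
    candidates = sorted(
        (i for i, x in enumerate(adversarial) if x == 0 and weights[i] < 0),
        key=lambda i: weights[i],
    )
    for i in candidates[:budget]:
        if score <= 0:
            break  # already classified as benign
        adversarial[i] = 1
        score += weights[i]
    return adversarial, score


# Hypothetical detector weights and a malware sample with bits 0 and 2 set.
detector_weights = [2.0, -1.5, 1.0, -0.5, -3.0]
malware = [1, 0, 1, 0, 0]
evasive, final_score = greedy_bit_set_attack(detector_weights, 0.5, malware, budget=3)
```

Adversarial hardening would then add examples such as `evasive` to the detector's training data, labeled as malicious.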

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

Automated Vulnerability and Malware Detection in Source Code

We are developing machine learning approaches that detect bugs and vulnerabilities and can classify malicious code. We combine techniques from traditional program analysis with machine learning to spot malicious and vulnerable programs in the wild. We are working with two programming languages: PowerShell and Solidity. To date, we have shown that our neural and graph-based representations of programs provide informative features to detect program properties signaling bugs. We are presently considering featureless deep learning methods.
Publication   Publication

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

Building Robustness with a Simple Method for Generating Black-box Adversarial Attacks for Models of Code

The challenge of adversarial examples, i.e., imperceptible perturbations in the input that result in misclassification, is now well known to the machine learning research community, especially in the image and natural language domains. However, generating adversarial examples in the source code domain poses additional problems. Unlike in vision and natural language, source code perturbations must adhere to strict semantic constraints so that the perturbed program retains the functional meaning of the original code. We are working on a simple and efficient black-box method for generating state-of-the-art adversarial examples on models of code.

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

 

Understanding Programs and Programming

For a variety of applications, we need systems that understand code and that are able to represent the meaning of code in a way that suits an intended purpose. This leads to us working on deep learning representations of code and collaborating with neuroscientists to understand how humans understand code.

 

 

Genetic Programming and Program Synthesis

Our goal is to develop novel genetic programming extensions that help to automate the software sustainment loop. We will combine Genetic Programming with Symbolic Solving in a novel way to automate program adaptation in response to changes in software requirements or the environment in which software runs. The challenge will be to share the advantages of each method and use one to complement the other. We envision Genetic Programming evolving structurally correct programs assembled from program pieces based on a context-free grammar and type checking.
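The role a context-free grammar plays in keeping evolved programs structurally correct can be sketched as below. The tiny arithmetic grammar, the depth limit and the function name `derive` are assumptions made for illustration; the project's actual grammar, type checking and symbolic solving are not shown.

```python
import random
from typing import Dict, List

# A toy grammar: every derivation is a well-formed arithmetic expression.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"], ["<const>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x"], ["y"]],
    "<const>": [["1"], ["2"]],
}


def derive(symbol: str, rng: random.Random, depth: int, max_depth: int) -> List[str]:
    """Expand a grammar symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, drop self-recursive productions so the
        # derivation is guaranteed to terminate.
        productions = [p for p in productions if symbol not in p]
    tokens: List[str] = []
    for child in rng.choice(productions):
        tokens.extend(derive(child, rng, depth + 1, max_depth))
    return tokens


# One random individual for an initial population.
program = " ".join(derive("<expr>", random.Random(1), depth=0, max_depth=4))
```

Because individuals are only ever built by expanding grammar rules, variation operators that swap or regenerate subtrees rooted at the same nonterminal cannot produce syntactically invalid programs.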

Starting Fall 2020, this project has openings for UROPs and M.Engs.

This project is in collaboration with   Perspecta Labs   sponsored by   The Defense Advanced Research Projects Agency (DARPA)  IDAS program

 

 

 

Neural Bases of Comprehending Computer Programs

In an attempt to understand why we introduce bugs into the programs we write, we have studied how the human brain functions when we read and understand computer programs. We studied whether comprehending programming languages activates the same regions of the brain as those recruited when comprehending natural languages. We found in our study that the brain regions for natural and programming languages are different. This opens up an exciting area: exploring whether the bugs we introduce can be localized to activity in specific parts of the brain responsible for code processing.

Publication

This project is sponsored by  MIT-IBM Watson AI Lab

 

 

 

Deep Code Representation for Bug Detection

As domain-specific languages such as Solidity for Ethereum have emerged to program blockchain distributed ledgers, domain-specific bugs and vulnerabilities have reciprocally arisen. We propose a machine learning approach to designing classifiers that can flag specific lines of programs containing such vulnerabilities. This classification task calls for reasoning beyond what tokens are in a line to reasoning about how each token lies within a control context (e.g. loop) and how its meaning depends on its preceding definition. We are developing a distributed representation in latent feature space to express these properties while ensuring that lines of similar meaning have similar features.

This project is sponsored by  MIT-IBM Watson AI Lab  previously funded by  FinTech at CSAIL

 

 

 

 

Data Analytics

 

COVID Risk Forecasting System For Building Pedestrians

We are developing an agent-based modeling system of COVID-19 contagion as driven by the common use of campus buildings. Our system will be integrated and combined with several other COVID-19 simulation models, each modeling a different aspect of contagion or COVID-19 epidemiology and using a variety of technical approaches. The aim of this larger system is to provide an ensemble prediction of the impact that different reopening strategies have on the community prevalence of COVID-19, and to minimize the health risks these strategies may pose to MIT personnel.

Website

This project is funded by  MIT Quest for Intelligence

 

 

 

MOOC Learner Project: Advancing Learning Behavioral Analytics through Data Science

Each time a learner interacts with an e-learning system, it is possible to capture a record of their engagement. Data comprising mouse clicks, video controls, problem responses, programming, collaborations and discussions then become available to learning science. MLP’s goal is to tap into the immense potential of this data to provide insights into how students learn and how instructors can effectively teach. The challenge is to provide technology and develop new approaches that transform this fundamentally different set of observations into actionable knowledge.
Website

An Open Learning Design, Data Analytics and Visualization Framework for E-Learning
In a continuation of previous work, we mature the end-to-end software workflow that encompasses data curation and machine learning enabled analytics by maintenance and extensions such as student programming trajectories. We conduct learning analytics research on the teaching and learning of computational thinking and computer science.
Publication   Publication

This project is in collaboration with and sponsored by  The Hong Kong University of Science and Technology  under the  HKUST MIT Research Alliance Consortium

 

 

 

Real-time Modeling of Network Activity in the Developing Brain

Understanding how brain cells form functional networks during early life is key to understanding information processing in the developing brain and how this processing is disrupted in neurodevelopmental disorders. Neuroscientists can observe the communication between brain cells in real time using two-photon calcium imaging and/or microelectrode arrays. Yet with existing methods, the time needed to analyze these large datasets can limit our ability to study network development over time. We use machine learning to accelerate this analysis pipeline and to enhance both the signal extraction and feature selection. Multiple opportunities exist for students to apply their interests in machine learning and (real!) neural networks to our analysis pipeline.

This project is in collaboration with   Neuronal Oscillations Group   at   University of Cambridge

 

 

 

 


Past Projects

 

Tax Non-Compliance

STEALTH (Simulating Tax Evasion and Law Through Heuristics)

STEALTH allows us to identify sequences of financial transactions around partnerships that accomplish the same economic purpose but differ in their tax consequences. By applying a robust co-optimization and artificial intelligence modeling approach, it learns observables that indicate the presence of non-compliant behavior.
Website

STEALTH: Understanding the Relationship Between Tax Non-compliance and Tax Law
Through the use of partnerships and other "flow-through entities", taxpayers underreported more than $91 billion of income annually between 2006 and 2009 (GAO-14-453), and the trend shows little sign of stopping. Our goal is to develop technology that enables the discovery of non-compliant partnership transaction patterns.
Publication   Publication   Videos

Sponsored by  Mitre   September 2012 - August 2015.

 

 

 

Flexible Machine Learning with Genetic Programming

FlexGP

The goal of the FlexGP project is scalable machine learning using genetic programming (GP). Genetic programming is a mature, robust multi-point search technique (inspired by evolution) that supports readable and flexibly specified learning representations, which can readily express linear or non-linear data relationships. It is well suited to parallelization and machine learning, and it has a strong record in real-world domains.
Website   Code

FlexGP:
A cloud based platform for generating transparent non-linear large scale regression problems.
Website   Publication

FCUBE
A data-parallel approach to building ensembles of classifiers.
Website   Publication

Feature Learning
Evolutionary Feature Synthesis (EFS) generates accurate, readable, nonlinear features for tabular data.
Website   Publication

 

 

More Past Projects

ALFA's research would not be possible without the past and ongoing support of our industry sponsors.
The views, opinions and positions expressed by ALFA Group and on this site are theirs alone, and do not necessarily reflect the views, opinions or positions of their sponsors.