
Artificial Adversarial Intelligence


Black Box Minimax

In Sign-Hunter, we present a novel black-box adversarial attack algorithm that achieves state-of-the-art model evasion rates and query efficiency under the L-1 and L-infinity metrics. It uses a sign-based, rather than magnitude-based, gradient estimation approach, which shifts gradient estimation from continuous to binary black-box optimization. It constructs queries adaptively, each query building on the previous one, rather than re-estimating the gradient at every step from randomly constructed queries. Its reliance on sign bits yields a smaller memory footprint, and it requires neither hyperparameter tuning nor dimensionality reduction.
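
As a rough illustration of sign-based, adaptive query construction, the sketch below guesses a sign vector, flips progressively smaller chunks of it, and keeps a flip only when the queried loss increases. It is a minimal sketch under stated assumptions, not the published Sign-Hunter implementation: the function name `sign_hunter_sketch`, the parameters `delta` and `max_queries`, and the halving schedule are all illustrative choices.

```python
def sign_hunter_sketch(loss, x, delta, max_queries):
    """Estimate the gradient sign of `loss` at `x` by adaptive bit flipping.

    Hypothetical helper for illustration; `loss` is a black box that maps a
    list of floats to a float.
    """
    n = len(x)
    signs = [1] * n  # current guess of the gradient sign vector

    def query(s):
        # One black-box query: loss at the point perturbed along the sign vector.
        return loss([xi + delta * si for xi, si in zip(x, s)])

    best = query(signs)
    queries = 1
    chunk = n  # flip the whole vector first, then halve the chunk size
    while chunk >= 1 and queries < max_queries:
        for start in range(0, n, chunk):
            if queries >= max_queries:
                break
            trial = list(signs)
            for i in range(start, min(start + chunk, n)):
                trial[i] = -trial[i]
            value = query(trial)
            queries += 1
            if value > best:  # keep the flip only if the loss went up
                signs, best = trial, value
        chunk //= 2
    return signs, queries


# Toy black box: a linear loss whose true gradient is `weights`.
weights = [3.0, -1.0, 2.0, -5.0]
linear_loss = lambda z: sum(w * zi for w, zi in zip(weights, z))
estimated_signs, used_queries = sign_hunter_sketch(
    linear_loss, [0.0, 0.0, 0.0, 0.0], delta=0.1, max_queries=100
)
```

Each query reuses the sign vector kept from the previous one, so no random directions or step-size schedules are needed, and the only state carried between queries is one sign per input dimension.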

ZO-Min-Max provides a principled optimization framework that integrates a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method. The former requires only a small number of function queries, and the latter needs just a one-step descent/ascent update. The proposed framework has a sub-linear convergence rate under mild conditions and scales gracefully with problem size.
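
The combination described above can be sketched on a toy saddle-point problem. This is a minimal sketch under assumptions, not the ZO-Min-Max reference code: the two-point random-direction estimator, the box constraint `[-1, 1]`, the objective `toy_objective`, and all parameter values (`mu`, `lr`, `q`, `steps`, `seed`) are illustrative.

```python
import math
import random


def zo_gradient(f, z, mu, rng, q=4):
    """Average of q two-point random-direction estimates of grad f at z."""
    d = len(z)
    base = f(z)
    grad = [0.0] * d
    for _ in range(q):
        # Draw a direction uniformly from the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]
        shifted = f([zi + mu * ui for zi, ui in zip(z, u)])
        scale = d * (shifted - base) / (mu * q)
        grad = [gi + scale * ui for gi, ui in zip(grad, u)]
    return grad


def project_box(z, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(hi, max(lo, zi)) for zi in z]


def zo_min_max(f, x, y, steps=500, lr=0.05, mu=1e-3, seed=0):
    """Alternate one projected descent step in x and one ascent step in y."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx = zo_gradient(lambda v: f(v, y), x, mu, rng)
        x = project_box([xi - lr * gi for xi, gi in zip(x, gx)])
        gy = zo_gradient(lambda v: f(x, v), y, mu, rng)
        y = project_box([yi + lr * gi for yi, gi in zip(y, gy)])
    return x, y


def toy_objective(x, y):
    """Convex in x, concave in y, with its saddle point at the origin."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    return 0.5 * dot(x, x) + dot(x, y) - 0.5 * dot(y, y)


x_star, y_star = zo_min_max(toy_objective, [0.9, -0.7], [-0.8, 0.6])
```

Only function values of `toy_objective` are ever queried; with `q=4` directions, each descent or ascent step costs five evaluations regardless of how the objective is implemented.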



LIPIZZANER: GANs and Competitive Coevolution

GANs are difficult to train due to convergence pathologies such as mode and discriminator collapse. Lipizzaner, an open source software system, allows machine learning programmers to train GANs in a distributed and robust way by using spatially distributed generator and discriminator populations. Theoretical problems and image data sets demonstrate the improved performance and scalability.
Publication   Publication   Blog
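
One way to picture spatially distributed populations is a grid in which each cell trains its own generator and discriminator and exchanges them only with adjacent cells. The sketch below computes such overlapping neighbourhoods. It is an illustration under assumptions, not Lipizzaner's API: the wrap-around grid, the five-cell neighbourhood, and the function names are hypothetical.

```python
def grid_neighbourhood(row, col, rows, cols):
    """Cells whose models take part in training at (row, col).

    Assumes a wrap-around grid and a five-cell neighbourhood:
    the cell itself plus its north, south, west and east neighbours.
    """
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [((row + dr) % rows, (col + dc) % cols) for dr, dc in offsets]


def build_neighbourhoods(rows, cols):
    """Map every grid cell to its neighbourhood."""
    return {
        (r, c): grid_neighbourhood(r, c, rows, cols)
        for r in range(rows)
        for c in range(cols)
    }


neighbourhoods = build_neighbourhoods(3, 3)
```

Because neighbourhoods overlap, a good generator or discriminator can spread across the grid over successive generations while each cell communicates with only a few others, which is what makes this kind of layout suited to distributed training.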



Bayesian Optimization for Nash Equilibrium

Game theory has emerged as a powerful framework for modeling a large range of multi-agent scenarios. Many algorithmic solutions require discrete, finite games with payoffs that have a closed-form specification. In contrast, many real-world applications require modeling with continuous/discrete action spaces and black-box utility functions where payoff information is available only in the form of empirical (often expensive and/or noisy) observations of strategy profiles. Few tools exist for solving the class of expensive, black-box continuous games. In this project, we investigate methods to find equilibria for such games in a sequential decision-making framework using Bayesian Optimization.
Publication   Blog  


Malware & Software Vulnerabilities

SLEIPNIR: Adversarial Machine Learning

Model-based malware detectors, such as SVMs and neural networks, are vulnerable to so-called adversarial examples: modest changes to detectable malware that allow the resulting malware to evade detection. In this project, we develop methods capable of generating functionally preserved adversarial malware examples in the binary domain. We develop a method to adversarially harden malware detectors for binary representations under a bit-setting constraint. Furthermore, we investigate visual hallmarks of robust generalization: good performance against unseen attacks.
Publication   Publication   Blog
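
A bit-setting constraint can be read as follows: an adversarial perturbation may turn feature bits on but never turn them off, so everything the original malware relies on stays in place. The sketch below applies one such constrained step. It is a minimal sketch under assumptions, not the SLEIPNIR implementation: `bit_setting_step` is a hypothetical name, and `gradient` stands in for the detector's loss gradient with respect to the input, which the caller must supply.

```python
def bit_setting_step(x, gradient):
    """One sign-gradient step that may set bits (0 -> 1) but never clear them.

    `x` is a binary feature vector (list of 0/1 ints); `gradient` is a
    caller-supplied list of floats of the same length (an assumption here).
    """
    # Propose setting every bit whose gradient says the loss would increase.
    proposal = [1 if g > 0 else 0 for g in gradient]
    # OR with the original keeps all bits that were already set.
    return [xi | pi for xi, pi in zip(x, proposal)]


original = [1, 0, 0, 1]
example_gradient = [-0.5, 0.3, -0.2, 0.1]
adversarial = bit_setting_step(original, example_gradient)
```

The bitwise OR is what enforces the constraint: even where the gradient suggests clearing a bit, as in the first position here, the original bit survives.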



Automated Vulnerability and Malware Detection in Source Code

We are developing machine learning approaches that detect bugs and vulnerabilities and can classify malicious code. We combine techniques from traditional program analysis with machine learning to spot malicious and vulnerable programs in the wild. We are working with two programming languages: PowerShell and Solidity. To date, we have shown that our neural and graph-based representations of programs provide informative features for detecting program properties that signal bugs. We are presently considering featureless deep learning methods.
Publication   Publication



Instruction, Command Line or Script Malware Detection with Machine Learning

This project extends our previous MIT-IBM collaboration, in which we worked with binary Portable Executable files and the algorithmic design of adversarially robust malware detectors. Here we tackle “code” malware: malware in the form of instructions, command lines or scripts that are executed. Code malware is challenging to detect because attackers have access to obfuscation and use subtle, nuanced code manipulations, so detection depends on a well-suited code representation. We will develop theory and a novel deep learning approach to code representation that allows us to build competent code malware detectors that are adversarially robust to evasion attacks facilitated by obfuscation.




MOOC Data Analytics

MOOC Learner Project: Advancing Learning Behavioral Analytics through Data Science

Each time a learner interacts with an e-learning system, it is possible to capture a record of their engagement. Data comprising mouse clicks, video controls, problem responses, programming, collaborations and discussions then becomes available to learning science. The goal of the MOOC Learner Project (MLP) is to tap into the immense potential of this data to provide insights into how students learn and how instructors can teach effectively. The challenge is to provide technology and develop new approaches that transform this fundamentally different set of observations into actionable knowledge.

An Open Learning Design, Data Analytics and Visualization Framework for E-Learning
Continuing previous work, we are maturing the end-to-end software workflow that encompasses data curation and machine-learning-enabled analytics, both by maintaining it and by extending it, for example with student programming trajectories. We conduct learning analytics research on the teaching and learning of computational thinking and computer science.
Publication   Publication

This project is sponsored by    under the 




Past Projects


Tax Non-Compliance

STEALTH (Simulating Tax Evasion and Law Through Heuristics)

STEALTH allows us to identify sequences of financial transactions around partnerships that accomplish the same economic purpose but differ in their tax consequences. By applying a robust co-optimization and artificial intelligence modeling approach, it learns observables that indicate the presence of non-compliant behavior.

STEALTH: Understanding the Relationship Between Tax Non-compliance and Tax Law
Through the use of partnerships and other "flow-through entities", taxpayers underreported more than $91 billion of income annually between 2006 and 2009 (GAO-14-453), and the trend shows little sign of stopping. Our goal is to develop technology that enables the discovery of non-compliant partnership transaction patterns.
Publication   Publication   Videos

Sponsored by  September 2012 - August 2015.




Flexible Machine Learning with Genetic Programming


The goal of the FlexGP project is scalable machine learning using genetic programming (GP). Genetic programming is a mature, robust multi-point search technique, inspired by evolution, that supports readable and flexibly specified learning representations, which can readily express linear or non-linear data relationships. It is well suited to parallelization and machine learning, and it has a strong record in real-world domains.
Website   Code

A cloud-based platform for generating transparent, non-linear, large-scale regression problems.
Website   Publication

A data-parallel approach to building ensembles of classifiers.
Website   Publication

Feature Learning
Evolutionary Feature Synthesis (EFS) generates accurate, readable, nonlinear features for tabular data.
Website   Publication



More Past Projects

ALFA's research would not be possible without the past and ongoing support of our industry sponsors.
The views, opinions and positions expressed by ALFA Group and on this site are theirs alone, and do not necessarily reflect the views, opinions or positions of their sponsors.