Overview
I am a postdoc researcher at Microsoft Research in New York. After this postdoc, I will join the University of British Columbia in Vancouver as an assistant professor. I completed my PhD at Columbia University under the supervision of Roxana Geambasu, Augustin Chaintreau, and Daniel Hsu. My research interests span broad areas of computer science, including systems, privacy, security, statistics, causal inference, and machine learning. My ongoing work focuses on enabling the promises of ML-driven ecosystems without imposing undue risks on individuals. I also spend time rock climbing and love to read on topics ranging from economics to history and ecology.
Research
My research addresses the new system challenges and opportunities introduced by the data and artificial intelligence revolutions. While data-driven systems can yield social and economic benefits, they also introduce new security and privacy threats, and their opacity can undermine users’ trust. To address these challenges, I design, implement, and evaluate rigorous, theory-backed systems that are practical and provide provable guarantees of security, privacy, and statistical soundness. To provide these guarantees, my system designs leverage theory from statistics, machine learning, causal inference, and differential privacy.
I am fortunate to work with many great collaborators, including Riley Spahn, Vaggelis Atlidakis, and Brian Goodchild from Columbia, and Siddhartha Sen, Amit Sharma, and Alex Slivkins from MSR.
Projects
- Certified Defense Against Adversarial Examples
Enormous progress in fitting accurate, complex models, such as deep neural networks (DNNs), over large amounts of data has led to their widespread adoption, including in safety- and security-critical applications where robustness against adversarial behavior is crucial. Unfortunately, in recent years it has become increasingly clear that DNNs are vulnerable to a variety of attacks, including adversarial example attacks, in which the adversary finds small perturbations to correctly classified inputs that cause a DNN to produce erroneous predictions.
Based on a novel connection between robustness to adversarial examples and differential privacy, a cryptographically inspired formalism from the privacy domain, I proposed PixelDP, the first certifiably robust defense against adversarial examples that scales to large, real-world DNNs and datasets (such as Google’s Inception network for ImageNet) and applies broadly to arbitrary model types.
PixelDP also enables a firewall-like security architecture, in which a small model is prepended to an existing, already trained one to make it more robust. Such an architecture is common in traditional software systems but novel for ML workloads.
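To give a flavor of the mechanism, the sketch below shows the PixelDP idea in a heavily simplified form: Gaussian noise calibrated to the attack norm bound is added before the forward pass, noisy predictions are averaged, and the prediction is certified when the top expected score dominates the runner-up by a margin derived from the differential privacy guarantee. The function, its parameters, and the `model` argument are illustrative placeholders, not the released implementation, which operates inside the network and uses confidence intervals on the Monte Carlo estimates.

```python
import numpy as np

def certified_predict(model, x, attack_norm_bound=0.1, eps=1.0, delta=1e-5,
                      n_samples=100, seed=0):
    """PixelDP-style robust prediction, simplified for illustration.

    `model` is assumed to map an input array to a vector of class scores
    in [0, 1] (e.g., softmax outputs).
    """
    rng = np.random.default_rng(seed)
    # Gaussian mechanism: with the noise placed at the input, the L2
    # sensitivity is the attack norm bound we want to certify against.
    sigma = attack_norm_bound * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    # Monte Carlo estimate of the expected (noisy) class scores.
    scores = np.mean(
        [model(x + rng.normal(0.0, sigma, size=x.shape)) for _ in range(n_samples)],
        axis=0,
    )
    sorted_scores = np.sort(scores)
    top, runner_up = sorted_scores[-1], sorted_scores[-2]

    # Certification check in the spirit of the paper's robustness condition;
    # the real system bounds the estimation error instead of using plain means.
    certified = top > np.exp(2.0 * eps) * runner_up + (1.0 + np.exp(eps)) * delta
    return int(np.argmax(scores)), certified
```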
- New Protection Abstractions for ML Ecosystems
Challenging the common practice in both the private and public sectors of collecting vast quantities of personal information, I built new data protection abstractions better suited to ML workloads than traditional ones. These abstractions cleanly separate the historical data, which is summarized in protected feature models, from the currently used data, which is minimized in size and time span. This approach minimizes the exposure of sensitive data to hackers and malicious employees.
Specifically, I built Pyramid, a data management system that leverages training set minimization techniques to reduce data exposure in ML applications. Pyramid uses count-based featurization to summarize past data before it is archived in cold storage. The counts, kept differentially private, are used with a small number of recent observations, called the hot data, to train ML models. Using this technique, as well as system mechanisms to reduce the impact of differentially private noise, Pyramid comes within 4% of previous models' accuracy while training on, and thus exposing, less than 1% of the raw data.
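As a rough illustration of count-based featurization (a toy simplification, not Pyramid's actual API; the class and parameter names are hypothetical), past observations can be summarized into per-value label counts that are released with Laplace noise and later used as scalar features:

```python
import numpy as np
from collections import defaultdict

class NoisyCountTable:
    """Toy count table for count-based featurization with Laplace noise."""

    def __init__(self, epsilon=1.0, seed=0):
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        # counts[value] = [count(label=0), count(label=1)]
        self.counts = defaultdict(lambda: np.zeros(2))

    def update(self, feature_value, label):
        # Summarize one historical observation before it is archived.
        self.counts[feature_value][int(label)] += 1

    def featurize(self, feature_value):
        # Release a noisy conditional label frequency for this value.
        # Laplace noise with scale 1/epsilon protects the individual counts.
        noisy = self.counts[feature_value] + self.rng.laplace(0.0, 1.0 / self.epsilon, size=2)
        noisy = np.clip(noisy, 0.0, None)
        return noisy[1] / (noisy.sum() + 1e-9)  # noisy estimate of P(label=1 | value)
```

In Pyramid, summaries of this kind are what remains accessible once the raw historical observations move to cold storage; models are then trained on the small hot window of recent data augmented with these noisy features.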
- Data Use Transparency Infrastructure
To add transparency to data uses on the Web, I am building a series of scalable, generic, and reliable tools to detect data flows within and across web services.
My initial system, XRay, offers a first system design and the theoretical building blocks to detect the use of digital personal data for targeting and personalization. The key insight in XRay is to infer targeting by correlating user inputs (such as searches, emails, or locations) with service outputs (such as ads, recommendations, or prices), based on observations obtained from user profiles populated with different subsets of the inputs. My latest tool, Sunlight, leverages rigorous statistical methods to determine the causes of online targeting at scale and with quantified statistical confidence.
I used my tools to run large-scale studies of online ad targeting. Among other findings, I identified strong evidence of targeting on sensitive personal information, such as religion and sexual orientation, and on sensitive financial information that should not be targeted according to Google’s own privacy FAQ.
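The sketch below conveys the differential correlation intuition behind XRay in toy form (the data layout, names, and scoring are hypothetical stand-ins for the actual algorithm): an input is implicated as the likely targeting cause when the ad appears mostly in the shadow profiles that contain that input.

```python
def infer_targeting(profiles, ad_seen):
    """Toy differential-correlation scoring in the spirit of XRay.

    `profiles` maps a profile id to the set of inputs (e.g., emails) that
    profile was populated with; `ad_seen` maps a profile id to whether the
    ad of interest appeared in it.
    """
    inputs = set().union(*profiles.values())
    scores = {}
    for candidate in inputs:
        with_it = [p for p in profiles if candidate in profiles[p]]
        without_it = [p for p in profiles if candidate not in profiles[p]]
        rate_with = sum(ad_seen.get(p, False) for p in with_it) / max(len(with_it), 1)
        rate_without = sum(ad_seen.get(p, False) for p in without_it) / max(len(without_it), 1)
        # An input that explains the targeting should make the ad appear
        # mostly in the profiles that contain it.
        scores[candidate] = rate_with - rate_without
    best = max(scores, key=scores.get)
    return best, scores
```

Sunlight builds on the same intuition but replaces heuristic scoring with hypothesis tests validated on held-out profiles and corrected for multiple comparisons, which is what supports targeting claims with statistical confidence at scale.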
Publications
- Mathias Lécuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu. "Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform" (SOSP'19) [PDF]
- Mathias Lécuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana. "Certified Robustness to Adversarial Examples with Differential Privacy" (S&P'19) [PDF][Code]
- Mathias Lécuyer, Riley B. Spahn, Roxana Geambasu, Tzu-Kuo Huang, and Siddhartha Sen. "Enhancing Selectivity in Big Data" (Invited paper, S&P Magazine, 2018) [PDF]
- Mathias Lécuyer, Joshua Lockerman, Lamont Nelson, Siddhartha Sen, Amit Sharma, and Aleksandrs Slivkins. "Harvesting Randomness to Optimize Distributed Systems" (HotNets'17) [PDF]
- Mathias Lécuyer, Riley B. Spahn, Roxana Geambasu, Tzu-Kuo Huang, and Siddhartha Sen. "Pyramid: Enhancing selectivity in big data protection with count featurization" (S&P'17) [PDF][Long Version][Website]
- Mathias Lécuyer, Max Tucker, Augustin Chaintreau. "Improving the transparency of the sharing economy" (WWW'17) [PDF][Data][Blog post]
- Mathias Lécuyer, Riley Spahn, Giannis Spiliopoulos, Augustin Chaintreau, Roxana Geambasu, and Daniel Hsu. "Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence" (CCS'15) [PDF][Website][The Economist, Slate]
- Nicolas Viennot, Mathias Lécuyer, Jonathan Bell, Roxana Geambasu, and Jason Nieh. "Synapse: New Data Integration Abstractions for Agile Web Application Development" (EuroSys'15) [PDF][Website]
- Mathias Lécuyer, Guillaume Ducoffe, Francis Lan, Andrei Papancea, Theofilos Petsios, Riley Spahn, Augustin Chaintreau, and Roxana Geambasu. "XRay: Increasing the Web's Transparency with Differential Correlation" (USENIX Security'14) [PDF][Website][NYT Bits, MIT Technology Review]
Talks
- Security, Privacy, and Transparency Guarantees for Machine Learning Systems - UBC (March 2019).
- Security, Privacy, and Transparency Guarantees for Machine Learning Systems - UCLA (March 2019).
- Security, Privacy, and Transparency Guarantees for Machine Learning Systems - MSR NY (February 2019).
- Certified Robustness to Adversarial Examples with Differential Privacy - UW Security Seminar (October 2018).
- Certified Robustness to Adversarial Examples with Differential Privacy - Berkeley Security Seminar (October 2018).
- Certified Robustness to Adversarial Examples with Differential Privacy - Stanford Security Lunch (June 2018).
- Harvesting Randomness for Counterfactual Evaluation of Systems - Stanford NetSeminar (June 2018).
- Certified Robustness to Adversarial Examples with Differential Privacy - Google Brain (June 2018).
Service
- Program Committees: Eurosys 2020, USENIX Security 2020, SoCC 2019, Systems for ML workshop 2018.
- Reviewer: Journal of Machine Learning Research 2019, ACM Transactions on Internet Technology 2016.