Research and Industry
My research focuses broadly on data-oriented systems and the way they drive computing. This spans topics in database systems, distributed computing,
data visualization, machine learning and programming languages.
More information on current and past research here.
In 2012 I co-founded Trifacta,
which delivers a cloud service and software system for end-user data wrangling.
Trifacta is based on prior research in my group [1, 2, 3]. You can download Trifacta Wrangler free.
I advise companies including Dell EMC, Captricity and Datometry.
- Approximation and Interaction: A Progressive's View. Keynote, NSF ACAIA, 2017. [PPTX]
- Ground: A Data Context Service. CIDR 2017. [pdf]
- People, Computers, and the Hot Mess of Real Data. Keynote, ACM KDD 2016. [pdf]
- Progressive Systems, LinkedIn NYC 2015. [pdf]
- Dancing Calmly with the Devil, Keynote, ACM SoCC 2014. [pdf]
- Of Rocket Ships and Washing Machines: Data Technology for People, Keynote, Strata 2012. [video, 10:46]
- The CALM Theorem: Positive Directions for Distributed Computing, Distinguished Lecture, UCLA; Keynote IEEE/ACM ASE, 2013. [pdf]
- Keep CALM and Query On, RICON 2012, UCSD 2013, UCR 2013. [pdf] [video, 49:24].
- Consistency Analysis in Bloom: A CALM and Collected Approach, CIDR 2011. [.pptx], [.pdf]
- The Declarative Imperative: Experiences and Conjectures in Distributed Logic. Keynote, ACM PODS, 2010. [.key.zip], [pdf], [video]
- MAD Skills: New Practices for Big Data. VLDB, 2009. [pptx], [pdf]
- Quantitative Data Cleaning for Large Databases. Keynote, QDB, 2009. [.key.zip], [pdf]
- Bricolage: Data at Play. Keynote, ICDM 2007. [.key.zip] [.mov] [pdf]
- The Marvelous Structure of Reality. Keynote, WebDB 2003. [PDF], [.mov]
- Readings in Database Systems, 5th Edition. With M. Stonebraker and P. Bailis. [redbook.io]
- Anna: A KVS For Any Scale. With C. Wu, J. M. Faleiro and Y. Lin. ICDE 2018. [pdf]
- Ground: A Data Context Service. With V. Sreekanti, J. Gonzalez et al. CIDR 2017. [pdf]
- Scalable Atomic Visibility with RAMP Transactions. With P. Bailis, A. Fekete, A. Ghodsi, and I. Stoica. TODS 2016. [pdf]
- ReStream: Accelerating Backtesting and Stream Replay with Serial-Equivalent Parallel Processing. With J. Schleier-Smith, E.T. Krogen. SoCC 2016. [pdf]
- Predictive Interaction for Data Transformation. With J. Heer and S. Kandel. CIDR 2015. [pdf]
- Edelweiss: Automatic Storage Reclamation for Distributed Programming. With N. Conway, P. Alvaro and E. Andrews. VLDB 2014. [pdf]
- Blazes: Coordination Analysis for Distributed Programs. With P. Alvaro, N. Conway, and D. Maier. ICDE, 2014. [pdf]
- Logic and Lattices for Distributed Programming. With W. R. Marczak, P. Alvaro, N. R. Conway, and D. Maier. SoCC, 2012. [pdf]
- Enterprise Data Analysis and Visualization: An Interview Study. With S. Kandel, A. Paepcke and J. Heer. IEEE VAST, 2012. [pdf]
- Searching for Jim Gray: a technical overview (with D. L. Tennenhouse on behalf of a large team of volunteers). Commun. ACM 54(7), 2011. [pdf]
- Wrangler: Interactive Visual Specification of Data Transformation Scripts (with S. Kandel, A. Paepcke, and J. Heer). CHI 2011. [PDF]
- Data in the First Mile (with K. Chen and T. Parikh). CIDR 2011 [PDF].
- Consistency Analysis in Bloom: a CALM and Collected Approach (with P. Alvaro, N. Conway, and W.R. Marczak). CIDR 2011. [PDF]
- The Declarative Imperative: Experiences and Conjectures in Distributed Logic. SIGMOD Record 39:1, Sep. 2010. [pdf]
- Declarative Networking (with B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, P. Maniatis, R. Ramakrishnan, T. Roscoe and I. Stoica). Research Highlights, CACM 52(11), 2009. [Intro by Peter Druschel] [pdf].
- Quantitative Data Cleaning for Large Databases. White paper, United Nations Economic Commission for Europe, February, 2008. [PDF]
- Architecture of a Database System. (with M. Stonebraker and J. Hamilton). Foundations and Trends in Databases 1(2). [PDF]
- Implementing Declarative Overlays. (with B. T. Loo, T. Condie, P. Maniatis, T. Roscoe, and I. Stoica). In 20th SOSP, 2005. [PDF]
- TinyDB: An Acquisitional Query Processing System for Sensor Networks. (with S. Madden, M. Franklin, and Wei Hong). ACM TODS. [PDF]
- Model-Driven Data Acquisition in Sensor Networks (with A. Deshpande, C. Guestrin, S. Madden and W. Hong.) VLDB 2004
- Commencement Address. Computer Science, College of Letters and Science, UC Berkeley, May 26, 2002. [pdf]
- On a Model of Indexability and its Bounds for Range
E. Koutsoupias, D. Miranker, C. Papadimitriou, and V. Samoladas). JACM 49(1) (2002). [pdf]
- Potter's Wheel: An Interactive Data Cleaning System
Raman). VLDB 2001. [PDF]
- Eddies: Continuously Adaptive Query Processing (with
SIGMOD 2000. [PDF]
- Interactive Data Analysis with CONTROL (with many
Computer, August 1999. [PDF]
- Generalized Search Trees for Database Systems (with J.
and A. Pfeffer.) VLDB 1995. [PS]