
Joseph M. Hellerstein

Jim Gray Professor of Computer Science
EECS Computer Science Division
UC Berkeley

Selected slide presentations...

The Cloud Goes BOOM: Data-Centric Programming for Datacenters. Distinguished Lecture, Cornell, 2009. Discussion of the BOOM effort to make it Orders Of Magnitude easier to build Orders Of Magnitude bigger systems. Focus on BOOM Analytics, an Overlog reimplementation of Hadoop with many added features. [Apple Keynote], [pdf].

MAD Skills: New Practices for Big Data. VLDB, 2009. Magnetic, Agile and Deep Analytics: new approaches to handling Big Data, with a case study at Fox Audience Network over Greenplum. Discussion of warehouse design philosophy, scalable statistical methods, and engine requirements. [pptx], [pdf]

Quantitative Data Cleaning for Large Databases. Keynote, QDB Workshop, 2009. A survey of basic concepts in Robust Statistics, techniques to scale them up to large datasets, and implications for improving data entry forms. [Apple Keynote], [pdf].

Declarative Networking: What is Next. Distinguished Lecture, Johns Hopkins, Fall 2007. (Based on an earlier version given as the keynote for NetDB 2006.) An introduction to Declarative Networking, and a discussion of how it can evolve from networking per se to a host of new distributed applications, including intelligent systems. [video from Washington], [Apple Keynote], [Quicktime], [PDF]

Bricolage: Data at Play. Keynote, ICDM 2007. A largely non-technical discussion of the emerging Collaborative Data Analysis area, typified by Swivel and IBM Many Eyes, with a tilt toward topics of interest to Data Mining Researchers. [.key.zip] [.mov] [pdf]

Querying and Routing: Data-Centric Forays into Networking. Distinguished Lecture, UMass and UBC, 2004. Discusses connections between networking and database research, and research we're exploring at the seams, including sensornet query processing (e.g. TinyDB and BBQ) and p2p query processing (e.g. PIER and PHI). [PPT] [6up PDF]

Tutorial: Architectures and Algorithms for Internet-Scale (p2p) Data Management. VLDB 2004. A tutorial on p2p research targeted at the database community. Includes a fairly detailed intro to DHTs, a discussion of query processing on DHTs, and some discussion of storage, security, and other issues. [PPT] [Powerpoint Show] [2up PDF] [3up PDF]

Many Eyes: Projections for Sensor Networks. Keynote talk at MDM 2004. A tasting of technical challenges and societal questions for sensor networks. Focus on the multi-layer optimization problem of computing different classes of functions over sensornet topologies. Also summarizes recent discussions on legal implications of sensornet privacy, joint with Boalt School of Law at UC Berkeley. Apple Keynote format (.key.sit), Quicktime (reduced size)

The Marvelous Structure of Reality. Keynote talk at WebDB 2003. Highlights the false dichotomy between techniques for structured and unstructured data, based in part on analogies from Structuralist and Post-Structuralist philosophy and art. Argues that the main methodological distinction between IR and DB is not about the amount of structure, but about whether the structure is "found" or "engineered". Suggests that a healthy new direction is structured queries over new sources of "found" data, including sensor networks and the Internet's infrastructure. Apple Keynote format (.key.sit), PDF, Quicktime (reduced size).

Query Processing and Networking Infrastructures. A 2-day tutorial and workshop given at MIT, September 2002. Day 1 was a crash course on traditional query processing in relational databases and text retrieval systems. Day 2 was a more speculative discussion of resonances between query processing and data movement in networks, including an overview of our research at Berkeley on the Telegraph, TinyDB, and PIER projects. Slides are available; please do not copy without permission. Day 1, ppt; Day 2, ppt.

Adaptive Dataflow: A Database/Networking Convergence. Given at the Stanford database seminar 12/7/2001, this talk makes the case that database and networking research are converging, and gives examples from work in the Telegraph group at Berkeley. powerpoint.

We Lose. Impromptu diatribe cooked up at HPTS 2001, explaining why the DBMS community has won few of the hearts and minds of the grassroots software world, and some things we can do to be (and have) more fun. powerpoint, PDF

Search and Query: An {Over, Re}view. Invited talk to the National Academy of Science Committee on Internet Navigation and the Domain Name System: Technical Alternatives and Policy Implications, July 2001. As a reductio ad absurdum, it shows how much one could put into DNS with query processing technology a la Telegraph. Also makes the point that "query" means much more than "search". powerpoint.

Online Query Processing. An overview of the CONTROL project and Online QP. Versions of this talk were given at MIT (1/03), Polytechnic (11/02), Harvard (10/02), OGI/PSU (1999), and the Internet Archive (1999). Distinguished lecture at University of Virginia, 1999. powerpoint. Based in large part on the tutorial below with additional depth and missing breadth.

Online Query Processing: A Tutorial. Joint with Peter Haas, SIGMOD 2001. powerpoint, PDF

Content Integration for E-Commerce. SIGMOD 2001. A description of the problems and some solutions in integrating information for E-Commerce. Also includes evangelism for research issues in the area, especially those surrounding semi-automatic interactive tools. Though partly a historical artifact of the Late Internet Boom Era, the technical meat is still highly relevant (content integration is a perennial problem, not an Internet fad), and the product described was still being sold by PeopleSoft at last check. PPT

Potter's Wheel: An Interactive Data Cleaning System (with Vijayshankar Raman). Slides from VLDB 2001. Describes some of the salient aspects of Potter's Wheel. pdf

Adaptive Dataflow: Rivers and Eddies. Distinguished Lecture, University of Toronto, September 2000. An early presentation of the linkages between adaptive query processing and routing. powerpoint and HTML.

Endeavour Project status reports. Overview and status talks for the Telegraph project, given at Endeavour retreats.

Eddies: Continuously Adaptive Query Processing. An eddy is an adaptive router that can serve many of the roles of a traditional query optimizer, but can do so in an online fashion -- learning data distributions and performance characteristics and adapting to them continuously. Talk at SIGMOD 2000. powerpoint. See also the conference paper.

A Crystal Ball for Data-Intensive Processing. PowerPoint slides of an early talk on the CONTROL project, given at Microsoft, UC Berkeley, and Tel Aviv University, c. 1998.  

On the Analysis of Indexing Schemes. PowerPoint slides of a talk at PODS '97 on Indexability Theory.

GiST: A Generalized Search Tree for Database Systems. A talk given at Hebrew University in Jerusalem, Tel Aviv University, UC Berkeley, Brown University, and IBM Almaden Research Center. An extended version of a talk given at VLDB '95. [PS] [PDF]