Overview
This course, offered for the first time in the Spring 1999 semester,
will survey the brave new world of the web as viewed from the perspective
of database systems. The web today is a terrific resource for global information
and resource sharing: your parents can look at your home page from across
the globe, you can read today's news articles from your hometown newspaper
while sitting in your apartment in Berkeley, you can buy books and CDs
at a discount from web sites with impressively large inventories, and you
can book travel arrangements without ever interacting with a human. But,
to really succeed, you have to have a URL. Sure, you can use a web search
engine, but much more powerful search tools, and probably more information
structure, are going to be required in order to fully realize the potential
of the web as a world-wide database system (as opposed to a world-wide
heap of mostly unstructured documents). Moreover, today's database systems
(which are largely based on 15-year-old system architectures) are not ideally
suited to the web – they can handle most of the required data types, and
new types can be defined for those that aren't covered, but they weren't
designed to handle the imprecision, statelessness, or large volumes of
requests that a popular web server must be prepared to cope with.
In a keynote speech at the ACM OOPSLA conference several years
ago, Alan Kay said "HTML is what happens when you let physicists play with
computers." We believe that (like all problems!) the web, in many ways,
is a database problem. We also believe that there is much interesting database
research to be done in order to make the world a better place for locating
and utilizing data in a web context. In this course, we will survey the
research literature in the vicinity of the web/database crossroads, examining
a variety of web-related problems and some of their initial solutions.
The goal of the course is two-fold: First, we (the instructors as well
as the students) should all leave the course with a much better understanding
of the web/database problem space and what's been done thus far. Second,
we will hopefully develop a number of new and interesting ideas for relevant
database research along the way.
Reading List
Adding Objects to Database Systems
- M. Carey et al, O-O, What Have They Done to DB2?, IBM Research Report RJ-10132, IBM Almaden Research Center, October 1998.
- R. Ramakrishnan and J. Hellerstein, Object-Database Systems, Chapter 21 of Database Management Systems (by R. Ramakrishnan), WCB/McGraw-Hill, 1998, pp. 614-645.
- P. Bernstein et al, The Asilomar Report on Database Research, ACM SIGMOD Record 27(4), pp. 74-80. (*)
The Web for Database Dummies
- T. Berners-Lee et al, The World-Wide Web, CACM 37(8), August 1994, pp. 76-82.
- HyperText Transfer Protocol, in The Web Developer's Virtual Library, http://www.stars.com/Internet/Protocols/HTTP/article.html.
- J. Hu and D. Schmidt, JAWS: A Framework for High-Performance Web Servers, submitted for publication, http://siesta.cs.wustl.edu/~schmidt/JAWS.ps.gz.
- A. Fox et al, Cluster-Based Scalable Network Services, Proc. 1997 ACM Symp. on Operating System Principles, 1997, pp. 78-91. (*)
Database-Based Web Sites
- T. Nguyen and V. Srinivasan, Accessing Relational Databases from the World Wide Web, Proc. 1996 ACM SIGMOD Conf., pp. 529-540.
- P. Atzeni et al, Design and Maintenance of Data-Intensive Web Sites, Proc. 1998 EDBT Conf., pp. 436-450.
- M. Fernandez et al, Catching the Boat with Strudel: Experiences with a Web-Site Management System, Proc. 1998 ACM SIGMOD Conf., pp. 414-425. (*)
Web Searching and Indexing
- C. Faloutsos, Access Methods for Text, ACM Comp. Surveys 17(1), March 1985, pp. 49-74.
- V. Gudivada et al, Information Retrieval on the World Wide Web, IEEE Internet Computing 1(5), September/October 1997, pp. 58-68.
- L. Gravano and Y. Papakonstantinou, Mediating and Metasearching on the Internet, Data Engineering 21(2), June 1998, pp. 28-36.
- S. Prasad and A. Rajaraman, Virtual Database Technology, XML, and the Evolution of the Web, Data Engineering 21(2), June 1998, pp. 48-52. (*)
Heterogeneous Information Systems
- M. Tork Roth and P. Schwarz, Wrap It, Don't Scrap It! A Wrapper Architecture for Legacy Data Sources, Proc. 1997 VLDB Conf., pp. 266-275.
- A. Tomasic et al, Scaling Heterogeneous Databases and the Design of Disco, Proc. IEEE International Conf. on Distributed Computing Systems, May 1996, pp. 449-457.
- H. Garcia-Molina et al, The TSIMMIS Approach to Mediation: Data Models and Languages, Journal of Intelligent Information Systems 8(2), March/April 1997, pp. 117-132. (*)
Querying Semi-Structured Data
- A. Mendelzon, Querying the World Wide Web, International Journal on Digital Libraries, 1(1), April 1997, pp. 54-67.
- S. Abiteboul et al, The Lorel Query Language for Semistructured Data, International Journal on Digital Libraries, 1(1), April 1997, pp. 68-88.
- J. McHugh et al, Lore: A Database Management System for Semistructured Data, SIGMOD Record 26(3), September 1997. (*)
XML Promises and Reality
- R. Khare and A. Rifkin, XML -- A Door to Automated Web Applications, IEEE Internet Computing 1(4), July/August 1997, pp. 78-87. (*)
- J. Bosak, XML, Java, and the Future of the Web, http://metalab.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html, March 1997.
- R. Lander, Introduction to XML, http://pdbeam.uwaterloo.ca/~rlander/XML/intro_xml.html, November 1998.
- J. Robie, What is the Document Object Model?, http://www.w3.org/TR/REC-DOM-Level-1/introduction.html, October 1998.
Heterogeneous Distributed Query Processing
- M. Stonebraker et al, Mariposa: A Wide-Area Distributed Database System, VLDB Journal 5(1), January 1996, pp. 48-63.
- L. Amsaleg et al, Cost-Based Query Scrambling for Initial Delays, Proc. 1998 ACM SIGMOD Conf., pp. 130-141.
- L. Haas et al, Optimizing Queries Across Diverse Data Sources, Proc. 1997 VLDB Conf., pp. 276-285. (*)
Data Dissemination Approaches
- D. Goldberg et al, Using Collaborative Filtering to Weave an Information Tapestry, CACM 35(12), December 1992, pp. 61-70.
- T. Yan and H. Garcia-Molina, The SIFT Information Dissemination System, ACM Trans. on Database Systems, to appear.
- M. Franklin and S. Zdonik, "Data in Your Face": Push Technology in Perspective, Proc. 1998 ACM SIGMOD Conf., pp. 516-519. (*)
Database Issues in E-Commerce
- D. Tygar, Atomicity in Electronic Commerce, Proc. ACM Symposium on Principles of Distributed Computing, May 1996, pp. 8-26.
- M. Kumar and S. Feldman, Internet Auctions, Third USENIX Workshop on Electronic Commerce, August 1998, pp. 49-60.
- Y. Bakos, The Emerging Role of Electronic Marketplaces on the Internet, CACM 41(8), August 1998, pp. 35-42. (*)
Caching Web Data
- A. Chankhunthod et al, A Hierarchical Internet Object Cache, Proc. 1996 USENIX Technical Conf., January 1996.
- S. Venkataraman et al, Memory Management for Scalable Web Data Servers, Proc. 1997 IEEE International Conf. on Data Engineering, pp. 510-519.
- P. Cao and C. Liu, Maintaining Strong Cache Consistency in the World Wide Web, IEEE Trans. on Computers 47(4), April 1998, pp. 445-457. (*)
Wrap-Up
- D. Ritter, The Middleware Muddle, ACM SIGMOD Record 27(4), December 1998, pp. 86-93. (*)