Using Collaborative Filtering to Weave an Information Tapestry
David Goldberg, David Nichols, Brian M. Oki, Douglas Terry
Xerox Palo Alto Research Center

Problems of current mail systems
- Think about any newsgroup you subscribe to:
  - hundreds of new postings every day
  - many of them are off topic
  - many more are not personally interesting to you
- Finding articles of interest is time-consuming.

Solution: Collaborative Filtering
- Record people's reactions to the documents they read; these reactions are called annotations.
- Based on other people's feedback, a filtering process can be constructed so that you read only those articles that are interesting to you.
- A step beyond content-based filtering: consider not only the document's contents but also people's reactions.

Tapestry architecture

Indexer
- Understands the formats of various types of documents; one indexing program corresponds to one type of document (e.g., the format of NetNews articles differs from that of articles in the New York Times).
- Extracts indexed fields from each document and stores them in the database.

Document and Annotation Stores
- Documents must be immutable because of the continuous semantics supported by the filterer; WORM disks can be used.
- Documents are never deleted, so large disk storage is needed.
- Attributes are extensible and can be set-valued, so several relational tables have to be provided.

Appraisers
- Further classify and organize messages based on priorities, on which filter query selected them, or on any predicate you specify.
- They are kept on the client side: running only over the contents of the little box, instead of over the incoming document stream, improves performance.

Interaction with the Tapestry service
- Using the Tapestry browser is preferable but not required; you can continue to use your favorite mail reader.
- The Tapestry browser keeps only document identifiers, because the document store is immutable.
- A deleted message still exists in the document store.

Mechanisms of retrieving documents

TQL: Tapestry Query Language
- Advantages over SQL:
  - Supports an extensible set of fields in a document.
  - Supports sets.
  - Easy to use, because it is specialized.
- Disadvantages compared with SQL:
  - Complicates the implementation: TQL has to be converted to SQL before execution, because Tapestry is built on top of a commercial database that supports only SQL.

Common document fields and their types

Annotations
- Annotations are separate complex objects; they are not treated as additional document fields.
- The field "msg" in an annotation object links it to its document.
- The field "type" in an annotation object defines which complex object it refers to; each type of annotation has its own structure.

Example of TQL

Filterer: Continuous Semantics
- Problems with periodic execution:
  - Most of the retrieved messages overlap with those of the previous execution.
  - Unpredictable behavior: consider the query in the previous slide (assume every condition is satisfied once the message arrives).

Filterer: Continuous Semantics (continued)
- Guarantee: every user with the same filter query should see the same result, independent of time.
- Solution: continuous semantics. The result of a filter query is the set of data that would be returned if the query were executed at every instant in time.

Filterer: Implementation
- Monotone query:
  - Definition: a query whose result set is non-decreasing over time.
  - Property: continuous semantics is guaranteed by periodically executing the monotone query.
  - Implication: document and annotation stores have to be immutable.
- Incremental query: a query that returns only the new results in a time interval.

Filterer: Implementation (continued)
- Step 1: Query Transformation in TQL

Example of Query Transformation

Discussions
- The monotone query transformation creates a mismatch between what the user expects and the actual result set.
- The immutability of the document and annotation stores means inflexibility.
- Many relational tables mean more join operations, so the query optimizer is critical for good performance.
- Security issues are not addressed.
- Complexity of the design: TQL is layered on top of a relational database.

Tapestry architecture (diagram labels): Indexer, Document store, Filterer, Annotation store, Little Box, Remailer,
Appraiser, Tapestry Browser, Appraiser, Mail Reader, Documents; region labels: Server, Client.

Mechanisms of retrieving documents (diagram labels): Document store, Document arrived, Filter Queries, Appraisers, ad hoc queries, Browser.

Common document fields and their types (table):

  Document field     Field type
  to                 set of strings
  date               date
  sender             string
  cc                 set of strings
  subject            string
  newsgroups         set of strings
  in-reply-to        set of documents
  words              set of strings
  ts (timestamp)     time

Example of TQL
Select all messages sent to 'Joe' and 'Mike', whose subject field or body contains the word 'CS294-7', to which neither of them has sent a reply, and which have been endorsed by somebody.

  m.to = {'Joe', 'Mike'} AND
  (m.subject LIKE '%CS294-7%' OR m.words = {'CS294-7'}) AND
  NOT EXISTS (mreply: (mreply.sender = 'Joe' OR mreply.sender = 'Mike') AND
              mreply.in_reply_to = {m}) AND
  EXISTS (a: a.type = 'endorsement' AND a.msg = m)
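To make the example query concrete, here is a minimal Python sketch that evaluates the same predicate over in-memory data. It is an illustration only, not Tapestry's implementation: the `Message` and `Annotation` classes, the `matches` function, and the sample data are all hypothetical. It also assumes that a comparison such as `m.to = {'Joe', 'Mike'}` on a set-valued field means "the field contains all the listed values", which is how the slide's prose reads ("the body contains the word").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for an indexed Tapestry document."""
    id: str
    to: frozenset
    sender: str
    subject: str
    words: frozenset
    in_reply_to: frozenset = frozenset()  # ids of the messages replied to


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an annotation object."""
    type: str
    msg: str  # id of the annotated message


TARGETS = frozenset({"Joe", "Mike"})


def matches(m, messages, annotations):
    # m.to = {'Joe', 'Mike'}  (assumed to mean: contains both names)
    addressed = TARGETS <= m.to
    # m.subject LIKE '%CS294-7%' OR m.words = {'CS294-7'}
    on_topic = "CS294-7" in m.subject or "CS294-7" in m.words
    # EXISTS (mreply: sender is Joe or Mike AND mreply.in_reply_to = {m})
    replied = any(r.sender in TARGETS and m.id in r.in_reply_to
                  for r in messages)
    # EXISTS (a: a.type = 'endorsement' AND a.msg = m)
    endorsed = any(a.type == "endorsement" and a.msg == m.id
                   for a in annotations)
    return addressed and on_topic and not replied and endorsed


messages = [
    Message("m1", frozenset({"Joe", "Mike", "Ann"}), "Ann",
            "CS294-7 reading list", frozenset({"tapestry", "filtering"})),
    Message("m2", frozenset({"Joe", "Mike"}), "Ann",
            "lunch", frozenset({"CS294-7", "pizza"})),
    Message("m3", frozenset({"Joe"}), "Ann",
            "CS294-7 project", frozenset({"project"})),
    Message("r1", frozenset({"Ann"}), "Joe",
            "Re: lunch", frozenset({"ok"}), frozenset({"m2"})),
]
annotations = [Annotation("endorsement", mid) for mid in ("m1", "m2", "m3")]

selected = [m.id for m in messages if matches(m, messages, annotations)]
print(selected)
```

In this sample data, m2 is excluded because Joe replied to it and m3 because it was not sent to Mike, so only m1 is selected.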
[Figure for "Filterer: Continuous Semantics": timeline with the events "message arrives" and "Joe replies"; rows "User A sees:" and "User B sees:" with the values No, No, No, No, Yes; the outcome is labeled "Inconsistent".]

Filterer: Implementation (continued), diagram labels:
- Filter Query -> Monotone Query -> Incremental Query
- Step 2: Query Translation: TQL -> SQL
- Step 3: Query Optimization: SQL -> Query optimizer -> stored procedure (maintained in the database)

Example of Query Transformation
Consider the query in slide 13 (Example of TQL). Filter Query -> Monotone Query:

  m.to = {'Joe', 'Mike'} AND
  (m.subject LIKE '%CS294-7%' OR m.words = {'CS294-7'}) AND
  m.ts + [2 weeks] <= now() AND
  NOT EXISTS (mreply: (mreply.sender = 'Joe' OR mreply.sender = 'Mike') AND
              mreply.in_reply_to = {m} AND
              mreply.ts <= m.ts + [2 weeks]) AND
  EXISTS (a: a.type = 'endorsement' AND a.msg = m)

Note: the meaning is slightly different from the original query. It returns messages that neither Joe nor Mike has replied to within 2 weeks.

Example of Query Transformation (continued)
Consider the query in the previous slide. Monotone Query -> Incremental Query (from last_t to now()):

  m.to = {'Joe', 'Mike'} AND
  (m.subject LIKE '%CS294-7%' OR m.words = {'CS294-7'}) AND
  m.ts + [2 weeks] <= now() AND
  (last_t < m.ts + [2 weeks] AND m.ts + [2 weeks] <= now()) AND
  NOT EXISTS (mreply: (mreply.sender = 'Joe' OR mreply.sender = 'Mike') AND
              mreply.in_reply_to = {m} AND
              mreply.ts <= m.ts + [2 weeks]) AND
  EXISTS (a: a.type = 'endorsement' AND a.msg = m)

Slide annotation: "This line can be eliminated." This presumably refers to the standalone condition m.ts + [2 weeks] <= now(), which is implied by the interval condition that follows it.
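The relationship between the monotone and the incremental query can be sketched in Python. This is a simplified illustration under stated assumptions, not the Tapestry filterer: the names `Message`, `visible`, `monotone`, and `incremental` are hypothetical, timestamps are plain day counts, and the subject and endorsement conditions are left out so that only the two-week window logic remains.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW = 14  # days, standing in for "[2 weeks]"
REPLIERS = frozenset({"Joe", "Mike"})


@dataclass(frozen=True)
class Message:
    """Hypothetical document record; ts is the arrival day."""
    id: str
    ts: int
    to: frozenset
    sender: str
    in_reply_to: Optional[str] = None


def visible(store, now):
    # The store is append-only: at time `now` we see everything that arrived so far.
    return [m for m in store if m.ts <= now]


def unanswered(m, docs):
    # NOT EXISTS (mreply: ... AND mreply.ts <= m.ts + [2 weeks])
    deadline = m.ts + WINDOW
    return not any(r.in_reply_to == m.id and r.sender in REPLIERS
                   and r.ts <= deadline for r in docs)


def monotone(store, now):
    # Full result set at time `now`; it can only grow as time passes.
    docs = visible(store, now)
    return {m.id for m in docs
            if REPLIERS <= m.to and m.ts + WINDOW <= now
            and unanswered(m, docs)}


def incremental(store, last_t, now):
    # Only the results that became true in the interval (last_t, now].
    docs = visible(store, now)
    return {m.id for m in docs
            if REPLIERS <= m.to and last_t < m.ts + WINDOW <= now
            and unanswered(m, docs)}


store = [
    Message("m1", 0, REPLIERS, "Ann"),
    Message("m2", 3, REPLIERS, "Ann"),
    Message("m3", 5, REPLIERS, "Ann"),
    Message("r1", 10, frozenset({"Ann"}), "Joe", in_reply_to="m2"),   # in time
    Message("r2", 25, frozenset({"Ann"}), "Mike", in_reply_to="m3"),  # too late
]

delivered = []
last_t = float("-inf")
for now in (7, 14, 21, 28):  # periodic execution
    delivered.extend(sorted(incremental(store, last_t, now)))
    last_t = now
print(delivered)
```

In this sample run, the union of the incremental results equals the monotone result at the final time, and no message is delivered twice. A late reply such as r2 does not retract m3, which is the "slightly different meaning" noted for the monotone query.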

Slides: University of California at Berkeley (http://db.cs.berkeley.edu/topics)