Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems

Table-table join (materialized view maintenance)

Consider the Twitter timeline example that we discussed in “Describing Load” on page 11. We said that when a user wants to view their home timeline, it is too expensive to iterate over all the people the user is following, find their recent tweets, and merge them.
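To make that cost concrete, here is a minimal sketch of the read-time approach the paragraph above rejects. It assumes hypothetical in-memory dicts: `following` (user → set of followee ids) and `tweets_by_sender` (user → that user's tweets, newest first). The `Tweet` type and `read_time_timeline` function are illustrative names, not from the book. Every read touches one list per followee, so the work grows with the number of accounts the reader follows.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    sender_id: int
    timestamp: int


def read_time_timeline(user_id, following, tweets_by_sender, limit=20):
    """Build a home timeline at read time by merging followees' tweets.

    following:        user_id -> set of followee ids (hypothetical layout)
    tweets_by_sender: sender_id -> list of Tweet, newest first
    """
    # One input list per followee: the work grows with how many users are followed.
    per_followee = [tweets_by_sender.get(f, []) for f in following.get(user_id, set())]
    # heapq.merge requires each input already sorted in the same (descending) order.
    merged = heapq.merge(*per_followee, key=lambda t: t.timestamp, reverse=True)
    return list(islice(merged, limit))
```

The timeline cache described next moves this merge from read time to write time.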

Instead, we want a timeline cache: a kind of per-user “inbox” to which tweets are written as they are sent, so that reading the timeline is a single lookup. Materializing and maintaining this cache requires the following event processing:

  • When user u sends a new tweet, it is added to the timeline of every user who is following u.
  • When a user deletes a tweet, it is removed from all users’ timelines.
  • When user u1 starts following user u2, recent tweets by u2 are added to u1’s timeline.
  • When user u1 unfollows user u2, tweets by u2 are removed from u1’s timeline.

To implement this cache maintenance in a stream processor, you need streams of events for tweets (sending and deleting) and for follow relationships (following and unfollowing). The stream process needs to maintain a database containing the set of followers for each user so that it knows which timelines need to be updated when a new tweet arrives [86].

Another way of looking at this stream process is that it maintains a materialized view for a query that joins two tables (tweets and follows), something like the following:

    SELECT follows.follower_id AS timeline_id,
           array_agg(tweets.* ORDER BY tweets.timestamp DESC)
    FROM tweets
    JOIN follows ON follows.followee_id = tweets.sender_id
    GROUP BY follows.follower_id

The join of the streams corresponds directly to the join of the tables in that query. The timelines are effectively a cache of the result of this query, updated every time the underlying tables change.[iii]

[iii] If you regard a stream as the derivative of a table, as in Figure 11-6, and regard a join as a product of two tables uv, something interesting happens: the stream of changes to the materialized join follows the product rule (uv)' = u'v + uv'. In words: any change of tweets is joined with the current followers, and any change of followers is joined with the current tweets [49, 50].
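The four kinds of event processing described in this section can be sketched as a small, single-process stream processor. This is a minimal in-memory sketch under stated assumptions, not the book's implementation: the class `TimelineMaintainer`, its handler names, and the `recent_limit` cutoff for "recent tweets" are hypothetical, and a real system would consume the tweet and follow streams from a log and keep this state in a durable store. It keeps the followers index the text calls for, plus each sender's tweets, which the follow and unfollow handlers need. Each tweet event is joined with the current followers and each follow event with the current tweets, mirroring the product rule in the footnote. `recompute` (also hypothetical) evaluates the same join as the SQL query from scratch, for comparison.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    sender_id: int
    timestamp: int


class TimelineMaintainer:
    """Hypothetical stream processor that keeps per-user timeline caches current."""

    def __init__(self, recent_limit=100):
        self.recent_limit = recent_limit           # assumed cutoff for "recent tweets"
        self.followers = defaultdict(set)          # followee_id -> {follower_id}
        self.tweets_by_sender = defaultdict(dict)  # sender_id -> {tweet_id: Tweet}
        self.timelines = defaultdict(dict)         # timeline owner -> {tweet_id: Tweet}

    # Changes to the tweets table are joined with the *current* followers (u'v).
    def on_tweet_sent(self, tweet):
        self.tweets_by_sender[tweet.sender_id][tweet.tweet_id] = tweet
        for follower_id in self.followers[tweet.sender_id]:
            self.timelines[follower_id][tweet.tweet_id] = tweet

    def on_tweet_deleted(self, sender_id, tweet_id):
        self.tweets_by_sender[sender_id].pop(tweet_id, None)
        # Only current followers of the sender can have this tweet cached.
        for follower_id in self.followers[sender_id]:
            self.timelines[follower_id].pop(tweet_id, None)

    # Changes to the follows table are joined with the *current* tweets (uv').
    def on_follow(self, follower_id, followee_id):
        self.followers[followee_id].add(follower_id)
        recent = sorted(
            self.tweets_by_sender[followee_id].values(),
            key=lambda t: t.timestamp,
            reverse=True,
        )[: self.recent_limit]
        for tweet in recent:
            self.timelines[follower_id][tweet.tweet_id] = tweet

    def on_unfollow(self, follower_id, followee_id):
        self.followers[followee_id].discard(follower_id)
        timeline = self.timelines[follower_id]
        for tweet_id in self.tweets_by_sender[followee_id]:
            timeline.pop(tweet_id, None)

    def read_timeline(self, user_id):
        # One dictionary lookup; the sort is only to present newest first.
        return sorted(
            self.timelines[user_id].values(), key=lambda t: t.timestamp, reverse=True
        )


def recompute(follows, tweets):
    """Evaluate the SQL join + GROUP BY from scratch, for comparison."""
    view = defaultdict(list)
    for follower_id, followee_id in follows:
        for tweet in tweets:
            if tweet.sender_id == followee_id:
                view[follower_id].append(tweet)
    return {
        user: sorted(ts, key=lambda t: t.timestamp, reverse=True)
        for user, ts in view.items()
    }
```

Replaying a sequence of events into `TimelineMaintainer` and calling `recompute` on the final follows and tweets should yield the same timelines, provided `recent_limit` is at least the number of tweets any sender has. A smaller limit is a deliberate divergence: the follow handler then copies only the most recent tweets, as the bullet list describes.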