Data locality for queries
A document is usually stored as a single continuous string, encoded as JSON, XML, or a binary variant thereof (such as MongoDB’s BSON). If your application often needs to access the entire document (for example, to render it on a web page), there is a performance advantage to this storage locality. If data is split across multiple tables, as in Figure 2-1, multiple index lookups are required to retrieve it all, which may require more disk seeks and take more time.
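To make the difference concrete, here is a minimal sketch that counts lookups under the two layouts. In-memory dictionaries stand in for indexed tables and for a document store; the table names, field names, and sample record are all hypothetical, and a real database would incur disk seeks rather than dictionary accesses.

```python
import json

# Hypothetical normalized layout: one dict per "table", keyed by user_id.
users = {1: {"first_name": "Ada", "last_name": "Example"}}
positions = {1: [{"job_title": "Engineer", "organization": "Example Corp"}]}
education = {1: [{"school_name": "Example University"}]}

# Hypothetical document layout: the whole profile stored as one encoded string.
documents = {
    1: json.dumps({
        "first_name": "Ada",
        "last_name": "Example",
        "positions": [{"job_title": "Engineer", "organization": "Example Corp"}],
        "education": [{"school_name": "Example University"}],
    })
}


def load_profile_normalized(user_id):
    """Assemble a profile from three tables; returns (profile, lookups)."""
    lookups = 0
    profile = dict(users[user_id])
    lookups += 1
    profile["positions"] = positions.get(user_id, [])
    lookups += 1
    profile["education"] = education.get(user_id, [])
    lookups += 1
    return profile, lookups


def load_profile_document(user_id):
    """Fetch and decode one stored document; returns (profile, lookups)."""
    return json.loads(documents[user_id]), 1


normalized_profile, normalized_lookups = load_profile_normalized(1)
document_profile, document_lookups = load_profile_document(1)
```

Both functions return the same profile, but the normalized version needs one lookup per table while the document version needs a single fetch.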
The locality advantage only applies if you need large parts of the document at the same time. The database typically needs to load the entire document, even if you access only a small portion of it, which can be wasteful on large documents. On updates to a document, the entire document usually needs to be rewritten; only modifications that don’t change the encoded size of a document can easily be performed in place. For these reasons, it is generally recommended that you keep documents fairly small and avoid writes that increase the size of a document. These performance limitations significantly reduce the set of situations in which document databases are useful.
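The size constraint on in-place updates can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a document is stored as compact UTF-8 JSON and that an update can be applied in place only when the encoded byte length is unchanged. The helper names are hypothetical, and real storage engines use their own encodings and rules.

```python
import json


def encoded_size(doc):
    """Byte length of the document when encoded as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def can_update_in_place(old_doc, new_doc):
    """Toy rule: an in-place update is possible only if the size is unchanged."""
    return encoded_size(old_doc) == encoded_size(new_doc)


original = {"name": "Ada", "visits": 41}
same_size = {"name": "Ada", "visits": 42}              # one digit swapped for another
grown = {"name": "Ada", "visits": 42, "tags": ["new"]}  # extra field adds bytes

fits_in_place = can_update_in_place(original, same_size)
needs_rewrite = not can_update_in_place(original, grown)
```

Under this assumption, changing `visits` from 41 to 42 leaves the encoded size untouched, whereas adding the `tags` field grows the document and would force a rewrite.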
It’s worth pointing out that the idea of grouping related data together for locality is not limited to the document model. For example, Google’s Spanner database offers the same locality properties in a relational data model, by allowing the schema to declare that a table’s rows should be interleaved (nested) within a parent table. Oracle allows the same, using a feature called multi-table index cluster tables. The column-family concept in the Bigtable data model (used in Cassandra and HBase) has a similar purpose of managing locality.
We will also see more on locality in Chapter 3.