Formats for Encoding Data

Programs usually work with data in (at least) two different representations:

  1. In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for efficient access and manipulation by the CPU (typically using pointers).
  2. When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes (for example, a JSON document). Since a pointer wouldn’t make sense to any other process, this sequence-of-bytes representation looks quite different from the data structures that are normally used in memory.[1] [2]

Thus, we need some kind of translation between the two representations. The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshalling), and the reverse is called decoding (parsing, deserialization, unmarshalling).
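
The round trip between the two representations can be sketched in a few lines. This is only an illustration, assuming JSON as the byte format and Python's standard json module; the sample record and the helper names encode and decode are hypothetical and chosen for this example.

```python
import json

# Hypothetical in-memory record: a dict that refers to a nested list.
record = {"user": "alice", "interests": ["databases", "hiking"]}


def encode(obj) -> bytes:
    """Encode (serialize) an in-memory object into a self-contained byte sequence."""
    return json.dumps(obj).encode("utf-8")


def decode(data: bytes):
    """Decode (parse) a byte sequence back into in-memory data structures."""
    return json.loads(data.decode("utf-8"))


encoded = encode(record)   # bytes, suitable for a file or a network socket
decoded = decode(encoded)  # a fresh dict built from those bytes

# The content survives the round trip, but the decoded value is a new
# object: no pointer from the original structure is carried in the bytes.
assert decoded == record
assert decoded is not record
```

The byte sequence carries only the values, so the receiving process rebuilds its own data structures at whatever memory addresses it chooses.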

  • [1] With the exception of some special cases, such as certain memory-mapped files or when operating directly on compressed data (as described in “Column Compression” on page 97).
  • [2] Note that encoding has nothing to do with encryption. We don’t discuss encryption in this book.