Patterns in data storage

My focus in this chapter is on how technological advances are causing ever more dramatic problems for the survival of stored information and data. It is therefore interesting to contrast how information was stored before computer storage became widespread with how it is stored at present. Analyses of data storage in 1986 show that it was totally different in character from the present day.

First, storage was then almost entirely analogue. Of the total data stored each year, an estimated 25 per cent was music on vinyl discs and cassettes, ~13 per cent was photographs on prints or negatives, and ~60 per cent was video on videocassettes. Since then, the quantity stored has grown exponentially, by at least a factor of ten every decade (i.e. by the present day, at least a thousand-fold increase, and still rising). Most obviously, the pattern has shifted dramatically: probably no more than about 5 per cent is still in analogue format, while digital tapes account for ~10 per cent, DVDs and similar discs for ~20 per cent, and hard disks for ~40 per cent. In other words, we now rely almost entirely on computer-accessible data stores.
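The thousand-fold figure is simply compound growth: a factor of ten per decade, applied over roughly three decades, gives 10 × 10 × 10 = 1000. The short Python sketch below shows that arithmetic. It assumes a constant factor of ten per decade (the text says "at least", so this is a lower bound) and about three decades between the 1986 estimates and the present day; the function name `growth_factor` is purely illustrative.

```python
def growth_factor(per_decade: int, decades: int) -> int:
    """Total multiplication of stored data after `decades` decades of steady growth.

    Illustrative assumption: growth is exactly `per_decade` times every decade.
    """
    return per_decade ** decades


# Assumption: about three decades separate the 1986 estimates from the present day.
for d in range(4):
    print(f"after {d} decade(s): x{growth_factor(10, d):,}")
```

Each extra decade adds another factor of ten, which is why the total climbs so quickly from x10 to x1,000.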

Over this period, computer storage passed through a succession of disc and tape formats of ever-increasing capacity, from, say, 250 kilobyte floppy discs up to current terabyte storage devices. For those who are embarrassed to ask, the prefixes kilo-, mega-, giga-, and tera- simply mean a thousand, a million, a thousand million, and a million million. Very roughly, the range of disc storage from the 1980s to the present is equivalent to scaling from an area the size of the top of a golf tee to an area the size of five football pitches. (I find this easier to imagine than just the numbers.) The progress is impressive, but the only easy way forward has been to make all the former methods obsolete. Typically, each type of storage medium has vanished within a decade as capacity and performance increased.
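To put a number on that span, the following Python sketch compares a 250 kilobyte floppy disc with a terabyte drive. It is a minimal sketch under stated assumptions: decimal prefixes exactly as defined above, and a drive of exactly 1 terabyte. Real drives vary in size, and some older media were quoted in binary units (1 kilobyte = 1024 bytes), which would shift the result slightly. The variable names are illustrative.

```python
# Decimal prefixes as defined in the text:
# kilo = a thousand, mega = a million, giga = a thousand million, tera = a million million.
KILO, MEGA, GIGA, TERA = 10**3, 10**6, 10**9, 10**12

floppy_bytes = 250 * KILO  # a 250 kilobyte floppy disc
drive_bytes = 1 * TERA     # assumption: a drive of exactly 1 terabyte

# How many floppy discs' worth of data fit on one such drive.
ratio = drive_bytes // floppy_bytes
print(f"One terabyte holds about {ratio:,} floppy discs' worth of data")
```

A factor of several million is hard to picture as a bare number, which is why an area comparison can help.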

The overall effect is that few of us have equipment that can read any of the information stored on the early systems. This is not a passing phase but an ongoing problem, because replacing old formats with new ones is the only sustainable way to keep expanding data storage.

Many companies advertise that the solution to this difficulty is to keep all our files at some vast remote site, euphemistically called ‘the cloud’. The potential benefit is clear: if the cloud operators continuously updated the formats of our stored data, we might avoid obsolete equipment and inaccessible files. However, this would require the cloud company to have access to our files, which has several clear disadvantages: (a) the files may be confidential; (b) the files may be encrypted; (c) the files may be corrupted in the process; (d) experience in running cloud stores is limited, as the concept has not existed for long. Consequently, we have no way of knowing which cloud companies will survive or, if they are taken over, whether the existing arrangements and contracts will continue. Another potential difficulty is that if we are paying for the storage (and eventually we will be, even if free storage is currently offered as a loss leader), then when we die, or our company goes bankrupt, the payments will stop and the entire store may be deleted, or at least become inaccessible without payment.

There have already been court cases in which people wished to share data in their cloud (in one case, a vast music collection), but the company not only refused but implied that the data would not be available after the owner's death. So cloud data is not like the family silver passed from one generation to the next, nor like a bundle of love letters tied with ribbon that can pass to the next generation (and to historians). Instead, death may result in the loss of everything in the cloud. Thus the cloud is an excellent step in terms of storage space, but a disaster in every other respect. If your will and bonds are stored in the same way, then they could equally vanish into the clouds.

Another very significant point is that cloud storage is useful only if we can reach it, which will not be possible during power cuts or Internet traffic jams. The final problem is that, as will be discussed in the chapter on computer crime and cyberwar, a determined government, terrorist, or misanthrope could destroy a great deal of data with each attack. Judging from all the other trends in this area, the destructive power of such attacks is also increasing exponentially.

The same caution is needed with other proposals for a paperless society, for example the suggestion that UK medical records could be held entirely in this format.

I am perhaps being overcritical of cloud storage, as for many companies it has the benefit of being accessible to many employees at different locations. Nevertheless, I would personally make sure that somewhere I had a complete back-up copy that is secure, accessible to only a few people, and isolated from email and the Internet. For everyone else, the moral seems to be that you should use the cloud only for information whose loss you could survive.

