Where Have All Our Records Gone?

Summer 1985 | Volume 1 |  Issue 1

THE DISAPPEARING RECORD: A few years after the last American troops left Vietnam in 1973, the Pentagon turned over a big batch of microfilm, more than one hundred rolls, to the National Archives. The film carried every enemy document captured by U.S. forces during the war—a spectacular trove of information for some future historian, and most of it not existing in any other form. But no historian now would dare approach that file, because the automated, coded index to the three million random images cannot be read by any instrument known to still exist today.

Strangely, the machine that could have done the job was widely available a mere twenty years ago. But following a classic pattern, it enjoyed a brief popularity and then was suddenly obsolete. The Army’s equipment was apparently left in Saigon and probably destroyed. So government archivists are now trying to coax another machine (related to the original) into translating the index code.

Franklin D. Roosevelt once praised the keeping of records by “modern processes like the microfilm,” which he called the “only form of insurance that will stand the test of time.” It wasn’t until after Vietnam, the first war to be fought in the age of both the copying machine and the computer, that a difficult truth began to emerge: The technologies that generate the rich record of the past can also obscure it.

A two-hundred-page report issued last year by the Committee on the Records of Government makes this point resoundingly clear; the committee found that “the United States is in danger of losing its memory.” Technological obsolescence is largely to blame, along with the sheer uncontrolled volume of paper records, which has made some of them impossible to catalog or ever retrieve.

After the typewriter became truly practical in the 1870s, the accumulation of paper records grew each year, and when photocopying and xerography took hold, the stash swelled to huge proportions. In 1968 alone, U.S. government records increased by 4,500,000 cubic feet. Then computer tapes began to hold a fair portion of the total government record. By 1983 the combined agencies had about twelve million tapes, and three-quarters of all federal information was being electronically stored.

Some of these records —if they haven’t simply deteriorated beyond repair—may require equipment long gone. The dangers inherent in electronic record-keeping are superbly illustrated by the saga of the U.S. Census Bureau, a pioneer in the use of electronic tabulating devices. The bureau used the very first one for the 1890 census, a year after the invention was patented by an American, Herman Hollerith. Census information was recorded on cards as a punched code, which was then “read” by Hollerith’s machine as it kept a tally of individual responses.

The success of the method carried the punch-card system, with improvements, well into this century. The cards themselves, though, tended to pile up. So in 1936 the National Archives dumped its collection of eight million punch cards from the 1930 census. And after the next two censuses, cards were again routinely destroyed.

It was simply that no one had thought of a coded card as an actual record. It meant little to the naked eye, and the most important statistical information was securely down on paper. But the only way to make new tabulations is to run the punch cards through again—and now that’s no longer possible.

In a poignant reversal the Census Bureau eventually found itself with plenty of compact records but no way of getting at them. For the 1960 census, punch cards were abandoned in favor of magnetic tape—five thousand reels of it were processed on the agency’s nine-year-old UNIVAC 1 computer. But by 1975 not one bit of information could be retrieved from any of the tapes.

Of two UNIVACs left in the world that could read the data, one was in Japan, and the other—the one used for the census—had already been dismantled at the Smithsonian Institution. The tape had been wound on metal reels that no later UNIVAC computers would accept. And even if you could get the reels into a more current machine, it wouldn’t be able to read the antiquated code that the census data had been entered in.

Though the original schedules were on microfilm, the idea of recoding some 180 million individual responses was unthinkable. Instead the National Archives took a device that was part of the original coding process and, around 1970, whacked it into shape as a reading instrument. From 642 tapes that were deemed to be of permanent value, 99 percent of the data was recovered and transferred to standard magnetic tape in an updated code.

But it was a Herculean task; the jury-rigged mechanism couldn’t get through a single reel without breaking down. And the cost was tremendous, but funds were provided because the census was so important.

In the meantime, more chaos is generated every day by the U.S. government, whose agencies have at least nineteen thousand large computers and more than two hundred thousand personal computers all over the world. It’s anybody’s guess as to how many of these use incompatible or outdated equipment and codes, but there are sure to be problems when people start trying to look back at the record.

The picture isn’t entirely grim though. Charles Dollar, the former head of the machine-readable records division at the Library of Congress, says that a standardized coding system is just around the corner, and once it takes effect in federal offices, it will spread elsewhere. Meanwhile all we can do is keep transferring the records to newer media when possible.

At this point the problems start to come less from technology and more from its users. David Allison, a historian for the Department of Energy who used to work at the Navy Research Lab in Bethesda, remembers that the crucial index to a decade of electronically recorded Navy correspondence was lost when the file was moved from a personal computer to a database system—no one bothered to make a hard copy and code it into the new system.

That sort of thing happens all the time, says Allison, because “technical people who work with automated equipment have zero interest in the past. None of them say: Where are the records going to be five years from now? The tough issues are not technical. In fact, if the technical problems were harder, there would probably be more interest in them.”

As standards for electronic records take effect, something resembling serenity may return. True, there will be some gaps in the historical record, but Thomas Brown, an archivist at the National Archives, remains sanguine. “The United States isn’t really losing its memory,” he says. “It’s only suffering from temporary amnesia.”

COPYRIGHT MADNESS: John Ott’s sneeze achieved fame as the subject of the very first motion picture protected by copyright. After recording his employee’s performance in the spring of 1893, Thomas Edison claimed his legal right to the immortal image by making a paper positive print of each individual exposure. He had to, because until 1912, when the law finally recognized a motion picture as more than a collection of stills, the new technology had to be shunted in under legislation for the old.

If it took a while for us to see where the boundaries of motion-picture art lie, it is taking even longer for us to deal with such definitions in the electronics age. Thus a recent report from the Office of Technology Assessment reveals unrest in the land of copyright litigation, and it’s all because the law “continues to use concepts fashioned over the previous 200 years.”

Vaguely described by the OTA as a “bundle of rights attached to the intangible form of an intellectual, scientific, or artistic creation,” copyright law has absorbed a great deal of technological change since 1790, when Congress passed the first legislation.

Photographs and negatives were added to the code in 1865, motion pictures in 1912, and sound recordings in 1972. Finally, in 1976, the subject matter of copyright became “any tangible medium of expression, now known or later developed.” But even that wasn’t enough. Each amendment had embodied the same old doctrine: that the form given to the expression of an idea can be copyrighted, but the idea expressed cannot. (An idea can only be protected by a patent, if at all.) The trouble is, with electronic information, that distinction is blurred.

Take a computer program, for example. As a text, it’s a form of expression and therefore should be copyrightable. But a program also controls an entire process, and the only protection for processes and systems is a patent. And there’s the rub: If the program expresses some algorithm, mathematical truth, or natural law—as do most programs for personal computers—it’s against the law to patent it.

To surmount this and other dilemmas, the OTA offers a series of possible solutions, ranging from making occasional amendments to existing law to actually abandoning the notion of property. But primarily the report calls for serious soul-searching on the part of lawmakers, who are invited to reexamine the conceptual underpinnings of copyright law. In the process they may have to come up with a few new definitions of function and form, and decide—for the moment anyway—exactly where those ineffables reside.

There is at least one sense in which the issue of the author’s rights may become of lesser, rather than greater, concern. “When the element of human labor involved in the processing of information is replaced by automation,” remarks the OTA report, “the incentive of copyright protection may become entirely disconnected from the authorship it seeks to inspire. Information … generated by a computer is ‘authored,’ if at all, by a program that is indifferent to legal incentives.”

Please support America's only magazine of the history of engineering and innovation, and the volunteers that sustain it with a donation to Invention & Technology.
