It isn’t just accessible physical copies of books that we lost during that awful period; we also lost content. “A major concern about filming7 is that many filmed titles have missing pages, even though the film was inspected,” Gay Walker wrote—the Ace comb effect applies to books, too. (That is, when libraries replace several differently damaged copies of a book with microfilm of the same copy of a book, and the microfilm turns out to lack something, we’re less well off, informationally as well as artifactually, than we were before the program began.) In the late eighties, the University of California at Berkeley sent test shipments of thirty to fifty books each to five top-notch microfilm labs, telling them that they wanted “the highest quality film8 of the books sent to them, totally reproducing the text of the volumes.” Even in the test batch, one of the five filmers was discovered to have missed pages. Other problems continue to crop up: a 1993 audit of microfilm from Ohio State, Yale, and Harvard found that one third of the film collections “did not resolve to9 the established ANSI [American National Standards Institute] resolution standards”; the auditors hypothesized that “some camera or processing settings were incorrect.”
Helmut Bansa, editor of Restaurator, told me that he has heard that some U.S. libraries “want to have the originals back.” He wouldn’t give details, though. “I can only report that some American libraries that have done this microfilming and throwing away now regret to have done that.” Which ones in particular? “Even if I would know I wouldn’t tell you,” Bansa said.
CHAPTER 34
* * *
Turn the Pages Once
“This is all about Pat Battin’s vision of a digital library,” Randy Silverman, Utah’s conservator, said to me: “One digital library fits all.” Microfilm is really only a passing spasm, a “cost-effective buffer technology”1 (as one of the newsletters of the Commission on Preservation and Access has it) that will carry us closer to the far digital shore. Silverman sums up the Commission’s thinking as follows: “How to fund technology in a time of barely increasing acquisition funds? And even if all the scholars have monitors, it’s like having a color TV in 1964. What are you going to watch? Somehow we’ve got to get some goods online. Preservation was a natural cause to help justify the conversion to an international electronic library. Battin played it for all it was worth.” If you unwrapped three million word-mummies—if you mined them from the stacks, shredded them, and cooked their brittle bookstock with the help of steady disaster-relief money—you could pump the borderless bitstream full of rich new content.
In 1994, shortly before Battin retired from the Commission, her newsletter published a special “Working Paper on the Future.” Its authorship is credited to the Commission’s board and staff, but it reads like one of her own heartfelt manifestos. “The next step for our nation’s libraries and archives is an affordable and orderly transition into the digital library of the future,” the working paper contends. All “information repositories,” large and small, private and public, “must” make this transition. (Why must they all?) We’re going to need continuing propaganda for it to be successful, too: “Changing a well-entrenched paradigm requires frequent and public articulation of the new mind set required in many arenas.” And, yes, there will be “high initial costs” involved in making the transition; these may threaten to “paralyze initiative,” unless we make a “thoughtful” comparison to what is described as “the rapidly escalating costs of traditional library storage and services.”
Doesn’t that sound a lot like Michael Lesk—once of Bellcore, now at the National Science Foundation handing out bags of seed money for digital library projects, who believes that libraries would save money if they got rid of the vast majority of their nineteenth-century duplicates in favor of middling-resolution networked facsimiles, and who says he routinely tells libraries that they might not want to repair their buildings, since they could digidump most of what their stacks held instead?
One reason Battin sounds like Lesk is that she worked closely with him for several years; she invited him to serve, along with a number of other resolute anti-artifactualists, on what has proved to be her most consequential committee: the Technology Assessment Advisory Committee.2 Lesk wrote the TAAC committee’s first report, “Image Formats for Preservation and Access”: “Because microfilm to digital image conversion is going to be relatively straightforward,” Lesk mispredicted, “and the primary cost of either microfilming or digital scanning is in selecting the book, handling it, and turning the pages, librarians should use either method as they can manage, expecting to convert to digital form over the next decade.” He and another non-librarian, Stuart Lynn—at the time Cornell’s vice president for information technologies (who retired in 1999 from the chief information officership of the University of California, where he kept an eye on digital-library projects partially funded by Michael Lesk’s National Science Foundation)—took the position in the advisory committee’s discussions that the costs of digital conversion and storage had dropped to the point (Verner Clapp’s long-dreamed-of point) that it was almost as cheap to scan-and-discard as to build.
Michael Lesk is uncharismatic and plodding; Stuart Lynn, however, is an Oxford-educated mathematician with a measure of brusque charm. He and his onetime colleague Anne Kenney (currently Cornell’s assistant director of preservation) became, with the financial support of Battin’s Commission, the Mellon Foundation (always ready to help), and Xerox Corporation, the progenitors of some of the most successful digital-library projects of the nineties. Stuart Lynn believed in an economic model in which digital preservation would be, as he told me, “self-funding.” If you were able to “funge immortality dollars into operating dollars”—that is, if you assumed a certain (fairly high) per-item cost for physical book storage, and if you ejected the original books once you digitized them and relied on virtual storage, and if you sold facsimiles-on-demand produced on a Xerox DocuTech high-speed printer (Lynn was serving on an advisory panel at Xerox at the time), you would, so his hopeful model suggested, come out more or less even—and you’d have all the emoluments of networked access.
But digital storage, with its eternally morphing and data-orphaning formats, was not then and is not now an accepted archival-storage medium. A true archive must be able to tolerate years of relative inattention; scanned copies of little-used books, however, demand constant refreshment, software-revision-upgrading, and new machinery, the long-term costs of which are unknowable but high. The relatively simple substitution3 of electronic databases for paper card catalogs, and the yearly maintenance of these databases, has very nearly blown the head gaskets of many libraries. They have smiled bravely through their pain, while hewing madly away at staffing and book-buying budgets behind the scenes; and there is still greater pain to come. Since an average book, whose description in an online catalog takes up less than a page’s worth of text, is about two hundred pages long, a fully digitized library collection requires a live data-swamp roughly two hundred times the size of its online catalog. And that’s just for an old-fashioned full-text ASCII digital library—not one that captures the appearance of the original typeset pages. If you want to see those old pages as scanned images, the storage and transmission requirements are going to be, say, twenty-five times higher than those of plain ASCII text—Lesk says it’s a hundred times higher, but let’s assume advances in compression and the economies of shared effort—which means that the overhead cost of a digital library that delivers the look (if not the feel) of former pages at medium resolution is going to run about five thousand times the overhead of the digital catalog. If your library spends three hundred thousand dollars per year to maintain its online catalog, it will have to come up with $1.5 billion a year to maintain copies of those books on its servers in the form of remotely accessible scanned files. If you want color scans, as people increasingly do, because they feel more attuned to the surrogate when they can see the particular creamy hue of the paper or the brown tint of the ink, it’ll cost you a few billion more than that. These figures are very loose and undoubtedly wrong—but the truth is that nobody has ever overestimated the cost of any computer project, and the costs will be yodelingly high in any case. “Our biggest misjudgment was4 underestimating the cost of automation,” William Welsh told an interviewer in 1984. “Way back when a consultant predicted the cost of an automated systems approach, we thought it was beyond our means. Later, we went ahead, not realizing that even the first cost predictions were greatly underestimated. The costs of software and maintenance just explode the totals.”
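For readers who want to check the chain of multiplications above, here is a minimal back-of-envelope sketch in Python. The two-hundred-page average, the twenty-five-fold image factor, and the three-hundred-thousand-dollar catalog budget are the figures quoted in the passage; everything else is arithmetic.

```python
# Back-of-envelope check of the storage-and-cost chain described above.
# The three inputs are the figures quoted in the text; the rest is multiplication.

pages_per_book = 200                 # average book length used in the passage
image_vs_ascii_factor = 25           # assumed ratio of page-image size to plain ASCII
catalog_budget_per_year = 300_000    # hypothetical annual online-catalog budget, in dollars

# A full-text ASCII library is roughly (pages per book) times the size of the catalog,
# since each book's catalog record occupies about one page's worth of text.
ascii_multiple = pages_per_book                          # ~200x the catalog

# Serving page images multiplies that again by the image-to-text factor.
image_multiple = ascii_multiple * image_vs_ascii_factor  # ~5,000x the catalog

# Scale the catalog budget by the same factor to get the implied annual overhead.
implied_annual_cost = catalog_budget_per_year * image_multiple

print(f"ASCII full-text library: ~{ascii_multiple}x the catalog")
print(f"Page-image library: ~{image_multiple:,}x the catalog")
print(f"Implied annual cost: ${implied_annual_cost / 1e9:.1f} billion")
# Prints ~200x, ~5,000x, and $1.5 billion, matching the figures in the passage.
```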
Things that cost a lot, year after year, are subject, during lean decades, to deferred maintenance or outright abandonment. If you put some books and papers in a locked storage closet and come back fifteen years later, the documents will be readable without the typesetting systems and printing presses and binding machines that produced them; if you lock up computer media for the same interval (some once-standard eight-inch floppy disks from the mid-eighties, say), the documents they hold will be extremely difficult to reconstitute. We will certainly get more adept at long-term data storage, but even so, a collection of live book-facsimiles on a computer network is like a family of elephants at a zoo: if the zoo runs out of money for hay and bananas, for vets and dung-trucks, the elephants will sicken and die.
This is an alternative route to a point that Walt Crawford and Michael Gorman make very well in their snappy 1995 book Future Libraries: Dreams, Madness, and Reality. It would take, Crawford and Gorman estimate, about 168 gigabytes of memory, after compression, to store one year’s worth of page-images of The New Yorker, scanned at moderate resolution, in color; thus, if you wanted to make two decades of old New Yorkers accessible in an electronic archive, you would consume more memory than OCLC uses to hold its entire ASCII bibliographic database. “No amount of handwaving, mumbo-jumbo, or blithe assumptions that the future will answer all problems can disguise the plain fact that society cannot afford anything even approaching universal conversion,” Crawford and Gorman write. “We have not the money or time to do the conversion and cannot provide the storage.”
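Crawford and Gorman’s 168-gigabyte figure is their own; the sketch below merely shows how a number of that order can arise, using scanning parameters (about 5,400 pages a year, 600 dpi, 24-bit color, roughly 3:1 compression) that are illustrative assumptions rather than anything stated in their book or in this text.

```python
# Illustrative reconstruction of a figure on the order of 168 GB per year.
# Every parameter below is an assumption for the sketch, not Crawford and Gorman's.

issues_per_year = 47
pages_per_issue = 115
page_width_in, page_height_in = 8.0, 10.75   # approximate magazine trim size, inches
dpi = 600                                    # assumed scanning resolution
bytes_per_pixel = 3                          # 24-bit color
compression_ratio = 3                        # assumed mild, near-lossless compression

pixels_per_page = (page_width_in * dpi) * (page_height_in * dpi)
stored_bytes_per_page = pixels_per_page * bytes_per_pixel / compression_ratio
pages_per_year = issues_per_year * pages_per_issue

gb_per_year = stored_bytes_per_page * pages_per_year / 1e9
print(f"~{gb_per_year:.0f} GB per year; ~{gb_per_year * 20 / 1000:.1f} TB for two decades")
# Under these assumptions, roughly 167 GB a year, in the neighborhood of the cited 168 GB.
```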
E-futurists of a certain sort—those who talk dismissively of books as tree-corpses—sometimes respond to observations about digital expense and impermanency by shrugging and saying that if people want to keep reading some electronic copy whose paper source was trashed, they’ll find the money to keep it alive on whatever software and hardware wins out in the market. This is the use-it-or-lose-it argument, and it is a deadly way to run a culture. Over a few centuries, library books (and newspapers and journals) that were ignored can become suddenly interesting, and heavily read books, newspapers, and journals can drop way down in the charts; one of the important functions, and pleasures, of writing history is that of cultural tillage, or soil renewal: you trowel around in unfashionable holding places for things that have lain untouched for decades to see what particularities they may yield to a new eye. We mustn’t model the digital library on the day-to-day operation of a single human brain, which quite properly uses-or-loses, keeps uppermost in mind what it needs most often, and does not refresh, and eventually forgets, what it very infrequently considers—after all, the principal reason groups of rememberers invented writing and printing was to record accurately what they sensed was otherwise likely to be forgotten.
Mindful of the unprovenness of long-term digital storage, yet eager to spend large amounts of money right away, Lesk, Lynn, Battin, and the Technology Assessment Advisory Committee adopted Warren Haas’s position: microfilm strenuously in the short term, digitize from the microfilm (rather than from originals) in the fullness of time. Turn the pages once was the TAAC’s motto. Microfilm has, Stuart Lynn noted in 1992, higher resolution and superior archival quality, and we can convert later to digital images at “only a small increment5 of the original cost” of the microfilming. He sums up: “The key point is, either way, we can have our cake and eat it, too.”
As ill luck would have it, the cake went stale quickly: people just don’t want to scan from microfilm if they can avoid it. It isn’t cheap, for one thing: Stuart Lynn’s “small incremental cost” is somewhere around $40 per roll—that is, to digitize one white box of preexisting microfilm, without any secondary OCR processing, you are going to spend half as much again to convert from the film to the digital file as it cost you to produce the film in the first place. If you must manually adjust for variations in the contrast of the microfilm or in the size of the images, the cost climbs dramatically from there. And resolution is, as always, an obstacle: if you want to convert a newspaper page that was shrunk on film to a sixteenth of its original size, your scanner, lasering gamely away on each film-frame, is going to have to resolve to 9,600 dots per inch in order to achieve an “output resolution” of six hundred dots per inch. This is at or beyond the outer limits of microfilm scanners now.
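The 9,600-dots-per-inch requirement follows directly from the reduction ratio. A minimal sketch makes the relationship explicit; the 16x reduction and the 600-dpi target come from the passage, while the function name and the 10x comparison are merely illustrative.

```python
# The resolution a microfilm scanner must achieve is just the desired output
# resolution multiplied by the reduction ratio at which the page was filmed.

def required_film_dpi(output_dpi: int, reduction_ratio: int) -> int:
    """Dots per inch the scanner must resolve on the film itself (illustrative helper)."""
    return output_dpi * reduction_ratio

# A newspaper page filmed at 16x reduction, targeted at 600 dpi output:
print(required_film_dpi(600, 16))   # 9600, the figure given in the passage

# A book filmed at a gentler, assumed 10x reduction asks far less of the scanner:
print(required_film_dpi(600, 10))   # 6000
```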
And six hundred dots per inch doesn’t do justice to the tiny printing used on the editorial pages of nineteenth-century newspapers anyway. In an experiment called Project Open Book, Paul Conway demonstrated that it was possible to scan and reanimate digitally two thousand shrunken microfilm copies of monographs from Yale’s diminished history collection (1,000 volumes of Civil War history, 200 volumes of Native American history, 400 volumes on the history of Spain before the Civil War, and 400 volumes having to do with the histories of communism, socialism, and fascism)—but Conway was working from post-1983, preservation-quality microfilm made at the relatively low reduction-ratios employed for books. “We’ve pretty much figured out how to do books and serials and things up to about the size of, oh, eleven by seventeen, in various formats, whether it’s microfilm or paper,” Conway says. “We’ve kind of got that one nailed down, and the affordable technology is there to support digitization from either the original document or from its microfilm copy. But once you get larger than that, the technology isn’t there yet, [and] the testing of the existing technology to find out where it falls off is not there.” Conway hasn’t been able to put these scanned-from-microfilm books on the Web yet. “The files are not available now,” he wrote me,
because we chose (unwisely it now turns out) to build the system around a set of proprietary software and hardware products marketed by the Xerox Corporation. Our relationship with Xerox soured when the corporation would not give us the tools we needed to export the image and index data out of the Xerox system into an open, non-proprietary system. About two years ago, we decided not to upgrade the image management system that Xerox built for us. Almost immediately we started having a series of system troubles that resulted in us abandoning (temporarily) our goal of getting the books online. . . . In the meantime, the images are safe on a quite stable medium (for now anyway).
The medium is magneto-optical disk; the project was paid for in part by the National Endowment for the Humanities.
Newspapers have pages that are about twenty-three by seventeen inches—twice as big as the upper limits Conway gives. The combination of severe reduction ratios, small type, dreadful photography, and image fading in the microfilmed inventory makes scanning from much of it next to impossible; one of the great sorrows of newspaper history is that the most important U.S. papers (the New York Herald Tribune, the New York World, the Chicago Tribune, etc.) were microfilmed earliest and least well, because they would sell best to other libraries. We may in time be able to apply Hubble-telescopic software corrections to mitigate some of microfilm’s focal foibles, but a state-of-the-art full-color multimegabyte digital copy of a big-city daily derived, not from the original but from black-and-white Recordak microfilm, is obviously never going to be a thing of beauty. And no image-enhancement software can know what lies behind a pox of redox, or what was on the page that a harried technician missed.
In the late eighties, the Commission on Preservation and Access wanted an all-in-one machine that would reformat in every direction. It commissioned Xerox to develop specifications for “a special composing reducing camera capable of digitizing 35mm film, producing film in different reductions (roll and fiche), paper, and creating CD-ROM products.” As with Verner Clapp’s early hardware-development projects at the Council on Library Resources, this one didn’t get very far. The master digitizers—Stuart Lynn, Anne Kenney, and others at Cornell, and the Mellon Foundation’s JSTOR team, for example—realized almost immediately that they shouldn’t waste time with microfilm if they didn’t have to. “The closer you are to the original, the better the quality,” Anne Kenney told me. “So all things being equal, if you have microfilm and the original, you scan from the original.” JSTOR came to the same conclusion:
One interesting discovery that we made in the process of obtaining bids is that working from paper copies of back issues of journals, rather than from microfilm, produces higher quality results and is—to our surprise—considerably cheaper. This conclusion has important implications beyond JSTOR.
It sure does have important implications: it means that most of the things that libraries chopped and chucked in the cause of filming died for nothing, since the new generation of facsimilians may, unless we can make them see reason, demand to do it all over again.
CHAPTER 35
* * *
Subterranean Convulsion
The second major wave of book wastage and mutilation, comparable to the microfilm wave but potentially much more extensive, is just beginning. At the upper echelons of the University of California’s library system, a certain “Task Force on Collection Management Strategies in the Digital Environment” met early in 1999 to begin thinking about scanning and discarding components of its multi-library collections. Two of the librarians “anticipated resistance1 to the loss of printed resources, especially by faculty in the Humanities, but agreed that the conversation had to begin.” Others prudently pointed out that the “dollar and space savings would likely be minimal for the foreseeable future and should not be used to justify budget reductions or delays in needed building improvements.” Still others wanted to be sure that the organizers of the program arranged things so that the “campuses which discarded their copies would not be disadvantaged.”