But people were willing to give Egyptian linen a try—mummies were incontestably imported and unwrapped, their coverings tossed into papermaking macerators and, possibly, inked into issues of the New York Tribune, and then volumes of the New York Tribune were microfilmed by Kodak’s Recordak Corporation in the thirties, and all over the country old volumes of the Tribune and the Sun and the Times were thrown away or auctioned off, and now much of the transfigured shroudage is reburied once again, though in less pharaonic company, in the company of a million old, undecaying phone books. (The Library of Congress is now microfilming and tossing its early phone-book collection, too, by the way.)
“Duplication in libraries is a real problem,” Brother Lemberg explained to me when I called him. “We all have the same thing! What if just one library owned a paper copy, and we had an electronic library for the rest of them? They could all get rid of theirs. It’s an exciting kind of idea for retrospective work.” Of the books that would be purged in accordance with Lemberg’s cost analysis, about thirty million of them, according to one of his statistical tables, would date from 1896 or before.
I learned about Lemberg’s dissertation from a recent textbook by Michael Lesk,3 Practical Digital Libraries: Books, Bytes and Bucks. Lesk works at the National Science Foundation, where, as division director of Information and Intelligent Systems, he helps administer the government’s omni-tentacled Digital Library Initiative, which has dispensed many millions of federal dollars for library projects. The National Science Foundation began its sponsorship of the digital initiative in 1994, with the help of NASA and DARPA, the Defense Advanced Research Projects Agency; now the Library of Congress and the National Endowment for the Humanities are also participating. In an early round of funding, six universities—including Carnegie Mellon, Berkeley, Stanford, and the University of Michigan—received about four million dollars apiece to help them find their way to the all-electric word kitchen. Lesk agrees with Brother Lemberg: in order for libraries to provide the best service to the most people for the least money, they ought to begin large-scale scanning projects right now, and simultaneously divest themselves of their originals.
When I interviewed Lesk (in December 1998), he offered me an example of what a library might do to help finance a digitization project. Stanford was spending, he said, about fifty-five million dollars to repair its on-campus library building, damaged in the 1989 earthquake. “You could scan the entire contents of that building for less. Now the problem is that letting that building fall down is not acceptable to the Stanford community.” Later he said: “I routinely suggest to libraries, ‘You know, gee, maybe you should think about repairing your building. Maybe you don’t want to do it, because maybe you want to do something else.’ ” Lesk would not get rid of all scanned duplicate books, however. “There are various caveats,” he said.
You might feel that you wanted more than one copy, just in case there was a fire in Washington. Suppose the book had been bound by a famous bookbinder, you might not want to tear that copy apart. But basically, yes, for the vast majority of nineteenth-century material, where you don’t particularly care about the binding or the paper, the book is falling apart, there are no illustrations, I would say, yes, you would be better off with one digital copy and one carefully watched paper copy, than you are with relying on eight different or ten different decaying paper copies and no digital access.
Why not both? Why can’t we have the benefits of the new and extravagantly expensive digital copy and keep the convenience and beauty and historical testimony of the original books resting on the shelves, where they’ve always been, thanks to the sweat equity of our prescient predecessors? We can’t have both, in Michael Lesk’s view, because the destruction of the old library will help pay for the creation of the new library. The fewer books that remain on the shelves, the lower the storage cost—that’s the first-order “benefit” to Lesk’s government-financed plan. And the fewer physical books that are on the shelves—the more they must fly by wire—the more the public will be obliged to consent to the spending of ongoingly immense sums necessary for global conversion, storage, networked delivery, and “platform migration.” Lesk used to work at Bellcore, the research group owned until recently by NYNEX and other Baby Bell phone companies; it isn’t entirely surprising that millions of dollars of the National Science Foundation’s grant money are going to telecommunications networks, notably to the creation of a joint MCI WorldCom/NSF “very high performance Backbone Network Service,”4 or vBNS, and hookups thereto, that will connect phone companies, university science labs, and university libraries, for the benefit of all concerned.
None of this would disturb me—who can quarrel with high-performance backbones?—if an attack on low-tech book spines weren’t also part of the plan. The attack is part of the plan, though, just as it was when the Library of Congress began destroying the newspapers. The single “carefully watched paper copy” that Lesk thinks we should keep will generally not be the one that is actually scanned, because the scanned book is thrown away afterward: big projects like JSTOR (Journal STORage, the Andrew W. Mellon Foundation’s digitally copied database of scholarly periodicals, many going back to the nineteenth century) and the Making of America (Cornell’s and the University of Michigan’s growing collection of digital books) routinely prepare for digitization5 by cutting up the book or journal volume they have in hand, so that the pages can lie flat on the scanner’s glass. Michigan’s librarians choose digital conversion, according to Carla Montori, the head of preservation, “knowing that the original will be disbound,6 and that there will be little chance it can be rebound.” The disbound Making of America7 books are, some of them, uncommon mid-nineteenth-century titles; e.g., Henry Cheever’s The Island World of the Pacific (1856), The American Mission in the Sandwich Islands (1866), Josiah Parsons Cooke’s Religion and Chemistry; or, Proofs of God’s Plan in the Atmosphere and its Elements (1865), John C. Duval’s The Adventures of Big-Foot Wallace, the Texas Ranger and Hunter (1870), and Mary Grey Lundie Duncan’s America As I Found It (1852). Michigan’s preservation department maintains that the thousands of books they have scanned were all terminally brittle—but the term “brittle” has shown itself to be remarkably pliant in recent decades, and nobody now can evaluate Michigan’s diagnoses, since most of the scanned remnants went in the trash.
“At the moment it looks as if [disbinding] is the cheapest way to do things,” Lesk told me. He is even bolder in a paper entitled “Substituting Images for Books: The Economics for Libraries,” where he argues for the outright hashing of a better copy of a book over one that is worn out or very brittle, simply because it’s less expensive to destroy the book in better condition. “It is substantially cheaper8 to scan a book if the paper is strong and can be fed through a stack feeder, rather than requiring manual handling of each page,” he writes; thus “it may turn out that a small library located in a rural and cold mountain location with few readers and clean air has a copy in much better shape, and one that can be scanned more economically.”
Of course, that small mountain library, having done such a fine job of safekeeping all those years, may have “less motivation to scan a book which is not yet deteriorating”—hence the need, in Lesk’s central-plannerly view, for a nationwide cooperative authority that will order that library to guillotine its copy and feed it to the scanner for the greater good.
Lesk’s candor is impressive: he acknowledges that the resolution of today’s scanned offerings may be crude by tomorrow’s standards, or even by comparison with today’s microfilm. “I would like to see, as soon as possible, a lot of scanning, so that momentum builds for doing this job,” he told me. “It is likely that to build support for a conversion of this sort, what matters much more is that a lot of stuff gets done, than that the stuff that gets done is of the ultimate highest quality.” Better to have to scan some things twice, in Lesk’s view, than not to scan at all—assuming, of course, that there is still a physical copy left to destroy when it comes time for the retake.
Lesk also recognizes that in a cooperative project involving millions of volumes, there will be errors and omissions. “The odds are that there will be things lost,” he said. Some projects, such as JSTOR, have the money to do a careful preliminary check to be sure that no pages or issues are missing, but most places, he said, “won’t be able to afford the JSTOR quality standards.”
I was interested to hear Lesk offer JSTOR as a paragon of quality. JSTOR is the most successful of the large-scale digitization projects; it has big money and big names behind it (including lifelong library automator Richard De Gennaro, former chief librarian at Harvard and, before that, at the New York Public Library); it can be marvelously helpful in finding things that you didn’t know existed, or that you do know exist but don’t have handy. Its intent, however, is not supplemental but substitutional: back issues of scholarly journals are, in the words of its creator, William G. Bowen, ex-president of the Andrew W. Mellon Foundation and of Princeton, “avaricious in [their] consumption9 of stack space”; JSTOR will allow libraries “to save valuable shelf space on the campus by moving the back issues off campus or, in some instances, by discarding the paper issues altogether.” Taking this cue, Barbara Sagraves, head of preservation at the Dartmouth library, wrote in an online discussion group in 1997 that questions about weeding the collection had “bubbled up” at her library. “The development of JSTOR and the promise of electronic archiving creates the possibility of withdrawing paper copies and relying solely on the electronic version,” she wrote. Although she wanted to make clear that Dartmouth was “in no way considering that option,” she said that construction planning had made librarians there “step back and question retention decisions in light of new means of information delivery.” In a survey conducted by JSTOR10 in 1999, thirteen percent of the respondents had already “discarded outright” bound volumes of which electronic copies exist on JSTOR, and another twenty-five percent had plans to do so; twenty-four percent had stopped binding incoming issues.
Lesk likes JSTOR for that very reason. He wants to divert capital funds from book-stack square footage into database maintenance, to create a habit of dependence on the electronic copy over the paper original, to increase the market share of digital archives. And he is right that JSTOR’s staff takes pains in the preparation of what they reproduce: they make sure that a given run of back issues is as complete as possible before they scan and dump it.
What about quality, though? The printable, black-and-white page-pictures that JSTOR stores are good—their resolution is six hundred dots per inch, about the same as what you would get using a photocopier. (What you see on-screen is less good than that, because the images are compressed for faster loading, and the computer screen imposes its own limitations.) But the searchable text that JSTOR derives from these page-pictures is, by normal nineteenth- and twentieth-century publishing standards, intolerably corrupt. OCR (optical character recognition) software, which has the job of transmuting a digital picture of a page into a searchable series of letters, has made astonishing improvements, but it can’t yet equal even a middling typesetter, especially on old fonts. Thus JSTOR’s OCR accuracy rate is held (with editorial intervention) to 99.95 percent. This may sound exacting, but the percentage measures errors per hundred letters, not per hundred words or pages. A full-text electronic version of a typical JSTOR article will introduce into the clickstream a newly minted typo every two thousand characters—that is, one every page or two. For instance, I searched JSTOR for “modem life”11 and got hits going back to the April 1895 issue of Mind: the character-recognition software has difficulty distinguishing between “rn” and “m” and hasn’t yet been told that there were no modems in 1895.
It’s easy to fix individual flukes like this, once they are pointed out, but the unpredictable OCR misreads of characters in proper names, in dates, in page numbers, in statistics, and in foreign quotations are much costlier to control. That’s why JSTOR allows you to see only the image of the page, and prevents you from scrolling through its searchable text: if scholars were free to read the naked OCR output, they might, after a few days, be disturbed by the frequency and strangeness of its mistakes, especially in the smaller type of footnotes and bibliographies, and they might no longer be willing to put their trust in the scholarly integrity of the database.
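To see what that accuracy figure means in practice, here is a minimal sketch (my own illustration, not anything from JSTOR’s actual pipeline) of the per-page arithmetic, together with the “rn”-to-“m” confusion that produces hits like “modem life”; the figure of two thousand characters per page is an assumption:

```python
# A minimal sketch of what a 99.95 percent per-character OCR accuracy
# rate implies; the characters-per-page figure is a rough assumption.

accuracy = 0.9995                  # JSTOR's stated per-character accuracy
error_rate = 1 - accuracy          # 0.05 percent: five errors per 10,000 characters
chars_per_page = 2000              # assumed density of a printed journal page

errors_per_page = error_rate * chars_per_page
print(f"expected OCR errors per page: {errors_per_page:.1f}")  # -> 1.0

# One classic OCR confusion: the letter pair "rn" misread as "m",
# which is how "modern life" in an 1895 journal becomes "modem life".
print("modern life".replace("rn", "m"))                        # -> modem life
```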
Half joking, I pointed out to Michael Lesk that if a great many libraries follow his advice by scanning everything in sight and clearing their shelves once they do, the used-book market will collapse. Lesk replied evenly, “If you’ve ever tried taking a pile of used books to a local bookseller, you know that for practical purposes, most used books are already worthless. Certainly old scientific journals are worse than worthless. You will have to pay somebody to cart them away, in general.” (Online used-book sites, such as abebooks.com, Bibliofind, and Alibris, where millions of dollars’ worth of ex-library books and journals change hands, might contest that statement.) I asked Lesk whether he owned many books. He said he had several thousand of them—most of them printed on “crummy paper.”
CHAPTER 8
* * *
A Chance to Begin Again
Before Michael Lesk, though, came the grand old men of microfilm—people like M. Llewellyn Raney (director of libraries at the University of Chicago, who in 1936 wrote that the “application of the camera1 to the production of literature ranks next to that of the printing press”); and Fremont Rider, the slightly askew head librarian at Wesleyan; and Rider’s authoritative follower, Verner Clapp. We must learn more about these men.
In an article in a 1940 issue of the Journal of Documentary Reproduction, Llewellyn Raney provided an early hint of developments to come: he coyly described a dinner at the Cosmos Club in Washington, where “a couple of curious librarians2 and a Foundation scout” discussed with some microphotography experts the economics of book storage versus “miniature reproduction.” The question was whether “discarding might introduce a new economy”:
If the volumes in question could be abandoned afterward, then the bindings might be removed and the books reduced to loose sheets in case anything were gained by this course. Gain there would be, because sheets could be fed down the chute to a rotary camera glimpsing both sides at once far more rapidly than the open volume on a cradle by successive turning of the leaves.
Not only would the microfilm’s images look better—the activity would be cheaper. “So ended an intriguing night out,” wrote Raney. “The participants are of a mind to repeat it—often.”
A few years later, Fremont Rider had a revelation. He conceived of a kind of bibliographical perpetual-motion machine: a book-conversion plan that would operate at a profit. In 1953, he described it as follows:
Every research library would3 actually save money if it absolutely threw away almost all of the volumes now lying on its shelves—volumes which it has already bought, bound and cataloged, and would save money even if it had to pay out cold cash to acquire microtextual copies of them to replace them! This is the startling fact which most librarians are not yet really aware of.
Assume, Rider goes on to say, that each discarded volume would have a salvage value of two dollars. Out of that income, the library would pay for the book’s microtext replacement, house the microtext in perpetuity, and derive, besides, “an actual cash profit on the substitution.” He writes: “If there was ever a case in library technology of having one’s cake and eating it too this substitution of microtext books for salvageable bookform books would seem to be it!”
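Rider supplies no itemized budget for the claim, but the shape of his arithmetic is easy to sketch. In the toy figures below, only the two-dollar salvage value is his; the other numbers are invented purely for illustration:

```python
# A hypothetical reconstruction of Rider's 1953 having-your-cake arithmetic.
# Only the $2.00 salvage value comes from Rider; the other figures are
# invented here purely to show how the claimed profit would arise.

salvage_per_volume = 2.00      # Rider's assumed resale value of each discard
microtext_copy = 1.25          # hypothetical cost of the replacement microtext
perpetual_housing = 0.25       # hypothetical cost of storing that microtext forever

profit = salvage_per_volume - (microtext_copy + perpetual_housing)
print(f"cash profit per volume discarded: ${profit:.2f}")  # -> $0.50
```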
A cash profit—sounds mighty good. Miles O. Price, then the director of Columbia’s library, said in the discussion that followed this presentation that he had “long been a microtext enthusiast.” But (and this is where I got my comment to Michael Lesk about the collapse of the used-book market) he quibbled with the cost analysis: “Discarded material will have low salvage value because of the number of libraries which will be discarding.” James T. Babb4 of Yale “felt the need for the physical book to exist somewhere in the Northeast,” but he thought that the need would decrease. According to the synopsis of the discussion, only one library manager that day reacted with anything like revulsion or outrage at Rider’s plan. Charles David, head of the University of Pennsylvania’s library, found the economic analysis “exasperating” and questioned its soundness. He said that it was an “invitation to librarians to destroy books by the millions.”
And that is what it was. Fremont Rider was a giant of twentieth-century librarianship; his erratic career repays study. He had a persuasive and colorful prose style, and his poems (a number of which he published in his autobiography) have a certain sorrowful throb:
Roses, jasmine,5
Frankincense, myrrh—
Grey death dust
In the soul’s sepulchre.
(Read it slowly, Rider recommends.) At Syracuse University, he edited the Onondagan—this was back in 1905, when they still had their run of the Syracuse Daily Standard—then he went to library school in Albany, where Melvil Dewey (whose biography Rider later wrote) hired him as a secretary. But Dewey’s adjustments to the decimal system couldn’t hold Rider’s attention, and by 1907 he was in New York turning out pulp mystery stories and, very briefly, headlines for Hearst’s yellow-pennanted flagship, the New York American. For The Delineator (a magazine edited by Theodore Dreiser) he produced a series of pieces on spirit rappings, levitation, astral bodies, multiple personalities, and other phenomena that have “converted to psychism6 the greatest scientists of Europe, and are now creating widespread comment in every intelligent center of the globe.” These were collected in his first book, Are the Dead Alive? It isn’t an entirely dispassionate work: “the fact that tables and other articles of furniture do under certain conditions move, apparently of their own accord, must be admitted as established.” (Rider was a fervent italicizer.)