Internet Archive Wants to Store Everything, Including Books

What does a library look like anymore?

When Egyptian King Ptolemy I built the Library of Alexandria nearly 2,300 years ago, the great library became the intellectual center of the ancient world. Ptolemy hoped to gather as much human knowledge as possible. Even ships anchored in the port were impounded until all the manuscripts they contained could be copied. World leaders lent their scrolls for duplication, and library officials traveled far and wide to purchase entire collections. Meanwhile, dutiful scribes hand-copied the library’s awesome collection, which eventually grew to as many as 700,000 scrolls.

NOTE: Books with bindings and covers had not yet been invented. 2,300 years ago, “books” were available only as long scrolls of parchment.

Brewster Kahle is a modern-day Ptolemy: he wants to ensure universal access to all human knowledge. And now he thinks that goal is within our grasp. In fact, his web site, called The Internet Archive, has already stored 380 billion web pages. Yes, that’s BILLIONS of web pages. However, this online archive has a lot more than just web pages. It serves as an online library, the largest such library in the world. It also has 20 million books and texts, 4.5 million audio recordings (including 180,000 live concerts), 4 million videos (including 1.6 million television news programs), 3 million images, and 200,000 software programs, all available at no charge to you. As of the day I wrote this article, the Internet Archive had 7,295,193 users. In fact, this online library gets more visitors in a year than most other libraries do in a lifetime.

Kahle is no stranger to the Internet. He earned a Bachelor of Science degree from the Massachusetts Institute of Technology in 1982. He studied artificial intelligence with Marvin Minsky and W. Daniel Hillis. In 1983, he helped start Thinking Machines, serving six years as a lead engineer for the parallel supercomputer maker. In the late 1980s, he pioneered the Internet’s first publishing system, known as WAIS (Wide Area Information Server), which was sold to AOL in 1995. He then co-founded Alexa Internet, which was sold to Amazon.com in 1999.

The Internet Archive is Kahle’s most ambitious project. He founded it in 1996 as a non-profit organization based in San Francisco, California. It started as a few servers running in Kahle’s attic. In late 1999, the organization started to grow to include more well-rounded collections. Today the Internet Archive includes texts (including complete books), audio, moving images, and software as well as archived web pages in its collections. It also provides specialized services for adaptive reading and information access for the blind and other persons with disabilities.

The Internet Archive now includes several divisions: The Wayback Machine, Open Library, Audio Archive, and more. The web site proudly proclaims, “Our mission is to provide Universal Access to All Knowledge.” Web pages are normally found at http://www.archive.org while books and many other materials are found at http://www.OpenLibrary.org. Both of those addresses link to different parts of the Internet Archive.

Brewster Kahle’s latest organization is working on digitizing and storing the entire World Wide Web and making what has been digitized so far freely accessible at http://www.archive.org. If a bit of genealogy information was published on the web in the past but has since disappeared, there is an excellent chance that you can find an old copy of the information on Archive.org. Six hundred thousand people use the Internet Archive every day, conducting two thousand searches a second.

The Internet Archive is physically located at 300 Funston Avenue in San Francisco. It looks like a Greek Revival temple. There is a good reason for the similarity: it was built in 1923 by the Fourth Church of Christ, Scientist, and remained a church until Brewster Kahle bought the building. He wanted to move the Internet Archive out of his attic and into a much larger facility that could hold rows and rows of servers and disk arrays containing petabytes of data.

300 Funston Avenue in San Francisco

Brewster Kahle is also working on making all the stored material available in many different places. The information is available on desktop computers, laptops, tablets, eBook readers, cell phones, and almost anywhere else there is demand. Many libraries around the world also have “print on demand” printers that will download a book from The Internet Archive/Open Library, print it, bind it, and make it available to a patron whenever requested. These books are actual digital images of the original books.

Kahle’s associates also built an “Internet Bookmobile,” a van that drives around the country downloading public-domain books from the archive via a satellite network link and making them available as printed books to anyone who wishes to obtain one.

Kahle describes his “Internet Bookmobile” this way:

Why a Bookmobile? Just like the bookmobiles of the past brought wonderful books to people in towns across America, this century’s bookmobile will bring an entire digital library to their grandchildren. The Internet Archive’s mission is to provide universal access to human knowledge, and given the advancement of digital storage and communications, this goal is now achievable. Part of accomplishing that goal is to make sure that public domain books are available digitally. Another part is making sure people across the country have access to those works whether by reading on a screen, or more likely, to be printed back out again as a book.

So what is the Bookmobile? It is a mobile digital library capable of downloading public domain books from the Internet via satellite and printing them anytime, anywhere, for anyone. It will be traveling across the country from San Francisco to Washington D.C., stopping at schools, libraries and retirement homes; places where people understand the value of a book. After the bookmobile leaves, each library will understand what it would take to make, print, and bind public domain books for their patrons.

Interestingly, Brewster Kahle reports that it is cheaper to print a new book than to pay for the labor to reclaim a loaned book, check it in, and reshelve it. The reprinted books are given away free of charge. Of course, donations are always gladly accepted.

I visited the “Internet Bookmobile” a few years ago when it was parked at Walden Pond in Massachusetts. The van being used was the smallest “bookmobile” I had ever seen, much smaller than the usual buses used for bookmobiles. I assume it was cheaper also. However, the number of available books was much greater than that of any traditional bookmobile. When a patron asked for any of the millions of available books, it was downloaded and delivered as a printed book within 5 minutes or so.

Archive.org bookmobiles have also visited other places that really need them (Uganda, Egypt, and India), printing out books for children at a cost to Archive.org of about $1 apiece. However, the books normally are given away at no charge. Then there are the archive’s newer offerings: music concerts and feature films.

A book scanner at the Internet Archive

As a benchmark for his goals, Brewster Kahle points out that the Library of Congress houses about 28 million books. Kahle estimates that his organization can scan and digitize each book for $10 apiece. Scanning all of them would cost about $280 million, or the equivalent of only half the Library of Congress’ annual budget.
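For readers who like to check the math, the estimate above can be reproduced in a few lines of Python. This is only a sketch: the two figures come straight from the paragraph above, and the function name `scanning_cost` is made up for illustration.

```python
# Figures quoted in the article; both are rough estimates.
LIBRARY_OF_CONGRESS_BOOKS = 28_000_000  # about 28 million books
COST_PER_BOOK_USD = 10                  # Kahle's estimated cost to scan one book


def scanning_cost(books: int, cost_per_book: int) -> int:
    """Return the total cost, in dollars, of scanning `books` volumes."""
    return books * cost_per_book


total = scanning_cost(LIBRARY_OF_CONGRESS_BOOKS, COST_PER_BOOK_USD)
print(f"Estimated scanning cost: ${total:,}")  # Estimated scanning cost: $280,000,000
```

Changing either constant shows how sensitive the total is to the per-book price: at $5 a book, the same collection would cost half as much.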

To be sure, legally obtaining copyrighted material has its challenges, especially music and videos. But Kahle is chipping away where he can. In 2003, the Archive encountered possible issues involving the Digital Millennium Copyright Act. This act could make it impossible to legally archive early computer software and games. The Internet Archive worked with the U.S. Copyright Office to obtain an exemption for many copyright-protected works. Details are available at http://www.archive.org/about/dmca.php.

In many other cases, authors and/or publishers actually give their books or magazines to the Open Library/Internet Archive, along with a signed release allowing the non-profit to give away the books or magazines without restrictions.

When asked about intellectual property issues, Brewster Kahle responded:

I see what we’re doing as being very much in the tradition of Ben Franklin’s and Carnegie’s vision of the library system and sort of the Thomas Jefferson ideal of making an educated populace.

Then there is the question of “can we?” Within technological audiences, this is often the issue.

The “may we?” question is legal and societal.

The Internet Archive attempts to scan and digitize all books, not just the ones that are out of copyright. However, because of copyright laws, not everything the Internet Archive has digitized is available on the World Wide Web. In the archive.org headquarters building at 300 Funston Avenue, there is a scanning station and a listening room with armchairs, coffee tables, bookshelves, and headphones. Visitors can access everything, whether under copyright or not, from that room in the same manner as visiting any “bricks and mortar” library. Just as a traditional library legally allows in-person visitors to access materials that are still under copyright, the Internet Archive/Open Library does the same for its in-person visitors.

Brewster Kahle chuckles at the cornerstone of the headquarters building that commemorates the date it was laid: 1923. Since books published prior to 1923 are free of copyright and made available online, the date seems especially significant. The building also closely resembles the logo of the Internet Archive, a logo that was created some years before the Fourth Church of Christ, Scientist, building became available for sale.

How big is the Internet Archive, including web pages, books, videos, programs, and more? A little math can give us a fairly reasonable estimate. A typical book contains about a megabyte of information. A megabyte is a million bytes. A gigabyte is a billion bytes. A terabyte is a million million bytes. A petabyte is a million gigabytes. In the lobby of the Internet Archive, you can get a free bumper sticker that says “10,000,000,000,000,000 Bytes Archived.” That’s ten petabytes. It’s also obsolete. That figure is from 2012. Since then, it’s more than doubled.
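The unit conversions above are easy to verify with a short Python sketch. The constant names are my own, and the one-megabyte book is only the rule of thumb mentioned above. The Archive holds far more than books, so the "book equivalents" figure is merely a way to picture the scale.

```python
# Decimal storage units, as defined in the paragraph above.
MEGABYTE = 10**6    # a million bytes
GIGABYTE = 10**9    # a billion bytes
TERABYTE = 10**12   # a million million bytes
PETABYTE = 10**15   # a million gigabytes

BUMPER_STICKER_BYTES = 10_000_000_000_000_000  # the 2012 bumper sticker figure
TYPICAL_BOOK_BYTES = MEGABYTE                  # rule of thumb: one book is about 1 MB

petabytes = BUMPER_STICKER_BYTES // PETABYTE
book_equivalents = BUMPER_STICKER_BYTES // TYPICAL_BOOK_BYTES

print(f"{petabytes} petabytes")                    # 10 petabytes
print(f"{book_equivalents:,} book-sized chunks")   # 10,000,000,000 book-sized chunks
```

In other words, the 2012 figure alone is roughly the size of ten billion typical books.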

Archive.org already has a huge collection of books, old Web sites, music, videos, and more. I used it recently to look at www.FamilySearch.org’s Web pages in 1999 and at www.Ancestry.com from October, 1996. My, how those pages have changed! So have this newsletter’s web pages at http://www.eogn.com.

If you are looking for an old, out-of-print genealogy book, you probably should start first at https://archive.org/details/genealogy.

I also downloaded the Grateful Dead concert of September 22nd, 1987, at The Spectrum in Philadelphia. The entire show was a 69.6-megabyte compressed file and required about five minutes to download. Not bad for a one-hour recording! I then decompressed the file and listened to the entire hour-plus show. I also was able to copy it to my iPod MP3 player so that I can listen at my leisure in the automobile, on airplanes, and elsewhere. To be sure, this was not a professionally recorded show. (The Dead always allowed amateur recordings of their shows.) However, it will appeal to Deadhead fans, and it records a moment in rock history.

Music concerts are not the only audio recordings. The Presidential Recordings Collection is made up of two distinct sub-collections: public speeches made by U.S. Presidents and secret recordings made in the White House between 1940 and 1973. Yes, Richard M. Nixon’s famous (and infamous) White House tapes, which reveal for the first time the President’s uncensored words, completely unfiltered and spoken by the President himself, are available online. I suspect Richard Nixon never expected that to happen when he uttered those words between 1971 and 1973.

The UK Central Government Web Archive is a selective collection of UK Government websites, archived from August 2003, which The Internet Archive has collected on behalf of the National Archives of the United Kingdom. You can read more about the UK Central Government Web Archive at http://www.nationalarchives.gov.uk/preservation/webarchive/

There are many, many more items available on The Internet Archive. It has become a major resource for Web users and especially for historians and genealogists. Even images of the original U.S. Census records from 1790 through 1940 are available free of charge at The Internet Archive. Those images have not been indexed by the Internet Archive’s organization, however. If you want to view indexed census entries, you will still need to visit one of the commercial sites that offer such indexes.

If your personal search for a Web page yields a “404 — Page Not Found” error, you probably can still find an earlier version of the page on The Internet Archive. You can access the Archive now at http://www.archive.org.
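For the programmers in the audience, the same lookup can be scripted. The sketch below assumes the Wayback Machine's public "availability" service at https://archive.org/wayback/available and the general shape of its JSON reply as I understand them. Check the Archive's own documentation before relying on either. The function names are hypothetical, chosen only for this example.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; `timestamp` is an optional YYYYMMDD-style prefix."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the archived copy's address out of a decoded reply, if there is one."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Ask the service for the snapshot closest to `timestamp` (needs network access)."""
    with urlopen(build_query(page_url, timestamp), timeout=10) as reply:
        return closest_snapshot(json.load(reply))
```

Calling `find_archived_copy("www.eogn.com", "1999")` should return the address of the snapshot nearest that year, or `None` if the page was never captured. Nothing in the sketch contacts the network until that function is called.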

The Internet Archive also maintains blogs that you can read in any RSS newsreader. Point your newsreader to http://www.archive.org/services/collection-rss.php and to http://www.archive.org/iathreads/posts-display-new.php?forum=web&mode=rss.

Where does all this information come from? The Archive has many partners who supply the free information. In addition, YOU can add even more information. Anyone with a free Archive.org account can upload media to the Internet Archive.

Thirteen years ago, I had an opportunity to interview Brewster Kahle and ask him a number of questions about the Internet Archive, Open Library, and the Wayback Machine. The interview was recorded, and you can watch a video of the conversation. It is available on the Internet Archive (naturally!) at https://archive.org/details/AnInterviewWithBrewsterKahle.

While the video is now 13 years old, almost everything that Brewster talked about is still accurate today.

Are you looking for information of some sort? You might start at https://archive.org.
