Larimer County Genealogical Society

We’re Losing Our Digital History. Can the Internet Archive Save It?

Research shows 25% of web pages posted between 2013 and 2023 have vanished. A few organisations are racing to save the echoes of the web, but new risks threaten their very existence.

It’s possible, thanks to surviving fragments of papyrus, mosaics and wax tablets, to learn what Pompeians ate for breakfast 2,000 years ago. Understand enough Medieval Latin, and you can learn how many livestock were reared at farms in Northumberland in 11th Century England – thanks to the Domesday Book, the oldest document held in the UK National Archives. Through letters and novels, the social lives of the Victorians – and whom they loved and hated – come into view.

But historians of the future may struggle to understand fully how we lived our lives in the early 21st Century. That’s because of a potentially history-deleting combination of how we live our lives digitally – and a paucity of official efforts to archive the world’s information as it’s produced these days.

However, an informal group of organisations is pushing back against the forces of digital entropy, many of them operated by volunteers with little institutional support. None is more synonymous with the fight to save the web than the Internet Archive, an American non-profit based in San Francisco, started in 1996 as a passion project by internet pioneer Brewster Kahle. The organisation has embarked on what may be the most ambitious digital archiving project of all time, gathering 866 billion web pages, 44 million books, 10.6 million videos of films and television programmes and more. Housed in a handful of data centres scattered across the world, the collections of the Internet Archive and a few similar groups are the only things standing in the way of digital oblivion.

“The risks are manifold. Not just that technology may fail, but that certainly happens. But more important, that institutions fail, or companies go out of business. News organisations are gobbled up by other news organisations, or more and more frequently, they’re shut down,” says Mark Graham, director of the Internet Archive’s Wayback Machine, a tool that collects and stores snapshots of websites for posterity. There are numerous incentives to put content online, he says, but there’s little pushing companies to maintain it over the long term.

Despite the Internet Archive’s achievements thus far, the organisation and others like it face financial threats, technical challenges, cyberattacks and legal battles from businesses that dislike the idea of freely available copies of their intellectual property. And as recent court losses show, the project of saving the internet could be just as fleeting as the content it’s trying to protect.

“More and more of our intellectual endeavours, more of our entertainment, more of our news, and more of our conversations exist only in a digital environment,” Graham says. “That environment is inherently fragile.”

Saving our history

A quarter of all web pages that existed at some point between 2013 and 2023 now… don’t. That’s according to a recent study by Pew Research Center, a think tank based in Washington, DC, which raised the alarm about our disappearing digital history. Researchers found the problem is more acute the older a web page is: 38% of the web pages from 2013 that Pew tried to access no longer function. But it’s also an issue for more recent publications. Some 8% of web pages published at some point in 2023 were gone by October that same year.

This isn’t just a concern for history buffs and internet obsessives. According to the study, one in five government websites contains at least one broken link. Pew found more than half of Wikipedia articles have a broken link in their references section, meaning the evidence backing up the online encyclopaedia’s information is slowly disintegrating.

You can read more in an article by Chris Stokel-Walker published on the BBC website at: https://www.bbc.co.uk/future/article/20240912-the-archivists-battling-to-save-the-internet.