
The Wayback Machine and Cloudflare Want to Backstop the Web

The web is decentralized and fluid by design, but all that chaos and ephemerality can make it difficult to keep a site online without interruption. That same ephemerality is what has made the Internet Archive's Wayback Machine so invaluable over the years, preserving a history of long-forgotten pages. Now, through a partnership with the internet infrastructure company Cloudflare, its deep memory will help keep the sites you visit available even when their own servers go down.

Since 2010, Cloudflare has offered a feature called Always On, which caches static versions of sites and serves them to visitors during downtime. Always On was one of Cloudflare's original offerings, and John Graham-Cumming, the company's chief technology officer, says the infrastructure powering it was due to be rearchitected. While thinking about how to modernize it, the team had an idea: Why not use the Wayback Machine, an existing crawling and caching juggernaut, to power Always On? The Internet Archive already offered an application programming interface (API) that would make it easy for Cloudflare to pull what it needed.

“We worked with them to make sure they were OK with us using it in this way,” Graham-Cumming says. “It’s one of those things where it’s like, yeah, this works for everybody, so let’s do it. If you come to a website that uses Cloudflare and it’s offline, we will show the latest version that’s in the Wayback Machine archive.”

The Internet Archive says it welcomed the opportunity to collaborate with Cloudflare on Always On. The organization has also recently expanded its focus on website reliability and technical integrity across the web. In February, it announced a project with the Brave browser to offer a recently archived copy of a page when users run into a 404 error. Some browser extensions have provided this functionality over the years, but the Internet Archive says that building it directly into a browser, and offering it through Always On, are positive steps.

The partnership with Cloudflare will also help the Wayback Machine find even more websites to crawl, a boon to the Internet Archive. For more than two decades, the Wayback Machine has archived as much of the public web as it can, adding more than a billion URLs a day to the corpus. In all, the archive contains more than 468 billion web pages and more than 45 petabytes of data. But even with all the signals, lists, and sources it uses to crawl far and wide, the Internet Archive is always looking for sites it has missed. Always On offers a new source of them, thanks to Cloudflare's broad, far-flung customer base.

Cloudflare serves more than 25 million sites, though domain operators will need to opt in to use Always On with the Wayback Machine. The service has always been free to Cloudflare users and will remain so. Despite that potential scale, Internet Archive founder Brewster Kahle and Wayback Machine director Mark Graham say their infrastructure will be able to handle the additional queries and data pulls from Always On.

“We’d just like to make the web more reliable,” Kahle says. “We want a robust infrastructure out there and we can be part of it, but we’re not all of it. We want multiple participants to be working together in all different ways. We would not be a very good content distribution network and maybe Cloudflare wouldn’t necessarily be the best archive of the web.”

Kahle says the partnership with Cloudflare has been very constructive in early testing, and he’d like to see more collaborations that cross what he calls “the .com, .org boundary.”

The Wayback Machine’s Graham emphasizes, though, that ultimately any collaboration or project must serve the Internet Archive’s core mission. “We’re always on the hunt for more ways we can do a better job of archiving more of the public web,” he says. “This is another source of web resources for us to preserve and make available—hopefully forever, certainly for our lifetimes. As long as we’re around we’re going to keep this thing up.”

That's probably the kind of rare dedication you want in an insurance policy for your website.
