The Pew Research Center in Washington has published an analysis showing that a vast amount of news and other online content is slowly disappearing from the web, a phenomenon known as “digital decay”.
Part of the study examined a representative sample of web pages that existed over the past decade to see how many are still accessible today. For this analysis, the independent US research center collected a sample of pages from the Common Crawl web repository for every year from 2013 to 2023. Common Crawl is a non-profit organization that crawls the web and makes its archives and datasets freely available to the public. The researchers then tried to access each of those pages to see how many still existed.

The Internet is an unimaginably vast archive of modern life, with hundreds of billions of web pages indexed. However, that content does not always last. A quarter of all web pages that existed at some point between 2013 and 2023 were no longer accessible as of October 2023. In most cases, this was because an individual page had been deleted or removed from an otherwise functioning website. The trend is even stronger for older content. Approximately 38% of web pages that existed in 2013 are no longer available, compared with only 8% of pages that existed in 2023. Part of this gap simply reflects the fact that newer pages have had less time to disappear, although it may also point to changes in how web content is preserved.
This “digital decay” occurs across different online environments. The study examined links that appeared on government and news websites, as well as in the “References” section of Wikipedia pages, as of spring 2023. It found that 23% of news web pages contain at least one broken link, as do 21% of web pages from government sites. In addition, 54% of Wikipedia pages contain at least one link in their “References” section that points to a page that no longer exists.
Digital decay also shows up on social media. The researchers collected a sample of tweets in real time during spring 2023 on the social media platform X (formerly known as Twitter) and monitored them for the following three months. Almost one in five of these tweets was no longer publicly visible on the site a few months after being published. In 60% of these cases, the account that originally posted the tweet had been made private, suspended, or deleted altogether. In the remaining 40%, the account owner had deleted the individual tweet, even though the account itself still existed.
Some types of tweets disappear more often than others. Over 40% of tweets written in Turkish or Arabic were no longer visible on the site within three months of posting. Tweets from accounts with default profile settings were also more likely to disappear from public view.
According to the Pew Research Center, these findings highlight the importance of developing more robust strategies for preserving digital content. This matters all the more given that content's role as a repository of modern history and as an educational and informational resource. Digital decay poses significant challenges for digital archiving and for collective memory online, raising questions about the long-term durability and accessibility of digital content.
#Internet #Vanishing #Study #Reveals #Phenomenon #Digital #Decay