• 0 Posts
  • 9 Comments
Joined 2 months ago
Cake day: November 25th, 2024

  • As I understand it, their data does in fact enter the Wayback Machine. It is just also available as direct WARC archive files (which IMO sounds beneficial to the idea of exporting in bulk to another backup host). At least that’s how their FAQ reads.

    And given that they focus on web crawling, and not other arbitrary data formats that IA accepts, 2.8% of over 100 petabytes is still a respectable amount of data.

    That said, help is help. If another archival project team wants me to run a worker node so they can distribute load and dodge crawler blocks, let me know, I’ve got space.




  • There are alternative archival sites, some of which operate beyond the reach of US tampering, but IA is certainly the primary.

    Unfortunately, the IA is absolutely massive. Anyone backing up anything is just grabbing what is personal to them, hopefully in a way that the pieces can be authenticated and re-assembled, but unlike Wikipedia we aren’t talking about copies of the whole thing, not even close. I think they are near, or have recently passed, 100 petabytes? Much will be lost if/when the IA is eventually targeted and disabled for whatever reason they come up with.

    If the IA were to be backed up at any meaningful scale, I would think to ask the British to encourage their Museum to embrace the stereotype that they readily take everything, and apply it to the internet. America can no longer be trusted to house any accurate history of anything.