
yeah I’m not the one who generated the URL list but I’ve also been getting a lot of URLs without a downloadable document. I’m going to start on one of the URL lists posted here soon

alrighty, I’m currently in the middle of the archive.org upload but I can transfer the chunks I already have over to a different machine and do it there with a new IP

age gate > page not found

I messaged you on the other site; I’m currently getting a "Could not determine Content-Length (got None)" error

this method is not working for me anymore

I’m waiting for /u/Kindly_District9380's version but I’ve been slowly working backwards on this in the meantime: https://archive.org/details/dataset9_url_list

I’m using a partial download I already had and not the 48 GB version but I will be gathering as many chunks as I can as well. Thanks for making this

I’ll get the first set (42k files in 31G) uploading as soon as I get it zipped up. it’s the one least likely to have any new files in it since I started at the beginning like others but it’s worth a shot

maybe archive.org? that way they can be torrented if others want to attempt their own merging techniques? either way it will be a long upload, my speed is not especially good. I’m still churning through one set of urls that is 1.2M lines, most are failing but I have 65k from that batch so far.

looking forward to your torrent, will seed.
I have several incomplete sets of files from dataset 9 that I downloaded with a scraped set of urls - should I try to get them to you to compare as well?
fantastic work btw
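
if it helps with the comparing, here's a rough sketch of one way it could be done: hash every file in each partial set and list whatever only shows up on one side, so renamed copies still count as the same file. The function names and the choice of SHA-256 are just my assumptions for illustration, not anything the torrent or the url lists actually use.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def index_by_hash(root):
    """Map each content hash under root to the first path that has it."""
    index = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            index.setdefault(hash_file(p), p)
    return index


def only_in_first(first_root, second_root):
    """List files whose content exists under first_root but not second_root."""
    first = index_by_hash(first_root)
    second = index_by_hash(second_root)
    return sorted(first[d] for d in first.keys() - second.keys())
```

calling only_in_first("my_partial_set", "your_set") (both folder names made up) would give the files worth sending over, and swapping the arguments gives the other direction.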