Cake day: January 31st, 2026



  • Superb, I have 1-8, 11-12.

    The only one remaining is 10 (not complete yet, downloading from Archive.org now).

    Dataset 9 is the biggest. I ended up writing a parser to go through every page on justice.gov and make an index list.
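
    For anyone wanting to build a similar index, here is a rough sketch of the link-extraction step using only the Python standard library. It is not the parser I actually ran: the `.pdf` suffix filter, the `LinkCollector` and `extract_file_links` names, and the example.com URLs are all placeholders, and the real listing pages may need different handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags that end in a wanted suffix."""

    def __init__(self, base_url, suffixes):
        super().__init__()
        self.base_url = base_url
        # str.endswith accepts a tuple, so normalise the suffixes once.
        self.suffixes = tuple(s.lower() for s in suffixes)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.suffixes):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def extract_file_links(html, base_url, suffixes=(".pdf",)):
    """Return file links found in one listing page (suffix filter is an assumption)."""
    collector = LinkCollector(base_url, suffixes)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    sample = '<a href="/files/a.pdf">A</a> <a href="about.html">About</a>'
    print(extract_file_links(sample, "https://example.com/list?page=1"))
```

    Feeding each listing page through something like this and appending the results to one text file gives the index list.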

    Current estimates for the file list:

    • ~1,022,500 files (50 files/page × 20,450 pages)
    • My scraped index so far: 528,586 files / 634,573 URLs
    • Currently downloading individual files: 24,371 files (29GB)
    • Download rate: ~1 file/sec to avoid getting blocked, so ~12 days of continuous downloading for the full set
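
    The arithmetic behind those numbers, as a quick sanity check. The inputs are the figures from the list above; the function names are just for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimated_total_files(files_per_page, pages):
    """Upper-bound file count, assuming every listing page is full."""
    return files_per_page * pages


def days_to_download(file_count, files_per_second):
    """Days of continuous downloading at a fixed rate."""
    return file_count / files_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    total = estimated_total_files(50, 20_450)
    print(f"estimated files: {total:,}")
    print(f"full set at 1 file/sec: {days_to_download(total, 1.0):.1f} days")
    # Share of the estimate that the scraped index covers so far.
    print(f"index coverage: {528_586 / total:.0%}")
```

    Halving the rate doubles the time, so anything much slower than 1 file/sec pushes the full set past three weeks.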

    Your merged 45GB + 86GB torrents (~500K-700K files) would be a huge help. Happy to cross-reference with my scraped URL list to find any gaps.
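
    A minimal sketch of how that cross-reference could work, assuming files in the torrents keep the same filename as the last path segment of the source URL. That is an assumption I have not verified, and `find_gaps`, `url_filename`, and `local_file_names` are hypothetical helper names.

```python
from pathlib import Path
from posixpath import basename
from urllib.parse import unquote, urlsplit


def url_filename(url):
    """Last path segment of a URL, percent-decoded and lowercased."""
    return unquote(basename(urlsplit(url).path)).lower()


def local_file_names(root):
    """Names of all files under an extracted torrent directory."""
    return [p.name for p in Path(root).rglob("*") if p.is_file()]


def find_gaps(urls, local_names):
    """Return the URLs whose filename does not appear among the local files."""
    have = {name.lower() for name in local_names}
    return [u for u in urls if url_filename(u) not in have]


if __name__ == "__main__":
    # Placeholder data; in practice the URLs come from the scraped index
    # and the names from local_file_names("<extracted torrent dir>").
    urls = [
        "https://example.com/files/a.pdf",
        "https://example.com/files/d.pdf",
    ]
    print(find_gaps(urls, ["a.pdf"]))
```

    Matching on filename alone will miss renamed files and cannot tell apart two different files with the same name, so comparing sizes or hashes would be a sensible second pass.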