A TorrentFreak article got me spooked, so I fired up the ol’ yt-dlp. Got the entire channel, including comments, description metadata, and thumbnail images.

A significant number of videos were actually unavailable because of an odd YouTube bug where 15+ year old videos were listed as “currently being processed”. I may re-run this later (since I ran it in archive file mode) to get the missing videos, as it seems there may be about 300 out of 4911 videos missing.
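
For anyone wanting to check how much a re-run still has to fetch, here is a minimal sketch. It assumes the archive file uses yt-dlp's usual layout of one `extractor id` entry per line. The function name and the sample IDs are made up for illustration; 4911 is the channel total mentioned above.

```python
def count_missing(archive_text: str, expected_total: int) -> int:
    """Return how many videos are not yet recorded in a download archive."""
    # Each non-empty line looks like "youtube <video_id>"; a set ignores duplicates.
    archived = {line.strip() for line in archive_text.splitlines() if line.strip()}
    return max(expected_total - len(archived), 0)


# Hypothetical three-entry archive, just to show the shape of the file.
sample = "youtube aaaaaaaaaaa\nyoutube bbbbbbbbbbb\nyoutube ccccccccccc\n"
print(count_missing(sample, expected_total=4911))
```

In practice the text would come from reading the real archive file, and the total from the channel's video count.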

  • mtcerio@lemmy.world
    1 year ago

    Gives an idea of the amount of data YouTube is storing, if only this one channel is 250GB!

    • empireOfLove@lemmy.oneOP
      1 year ago

      And mind you, they have a high number of videos, but most are short clips and all of them are low res, 360p or 480p max. Any other channel uploading HD or 4K content will be orders of magnitude larger for fewer videos.

    • people_are_cute@lemmy.sdf.org
      1 year ago

      And that 250GB is probably just the downloaded and HEVC-compressed files. YouTube actually promotes uploading in raw formats for best quality; just 3-4 full-length movies would be enough to fill 250GB for them.

  • Sailing7@lemmy.ml
    1 year ago

    Nice!

    Could you tell us what tool you used to also get the description text and the comments? With yt-dlp I only found the option of downloading the video itself.

    • totallynotfbi@lemm.ee
      1 year ago

      yt-dlp does support fetching comments and description text - if you use the --write-info-json and --write-comments options, it will save them as a JSON file alongside other video metadata.
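
A rough sketch of how those options might be combined into one command, built in Python so the pieces are easy to read. The OP's exact command isn't given in this thread, so the extra flags (`--write-description`, `--write-thumbnail`, `--download-archive`) are only a guess at what matches the post, and the channel URL and `archive.txt` name are placeholders.

```python
import shlex


def build_command(channel_url: str, archive_path: str = "archive.txt") -> list[str]:
    """Assemble a yt-dlp command line that also saves metadata."""
    return [
        "yt-dlp",
        "--write-info-json",    # metadata as a .info.json file next to each video
        "--write-comments",     # include comments in that .info.json
        "--write-description",  # description text as a separate .description file
        "--write-thumbnail",    # thumbnail image
        "--download-archive", archive_path,  # record finished IDs so re-runs skip them
        channel_url,
    ]


# Placeholder URL, not the real channel address.
cmd = build_command("https://www.youtube.com/@ExampleChannel")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run` as-is.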

    • empireOfLove@lemmy.oneOP
      1 year ago

      They still happily exist on YouTube, for now. So there’s no point in re-hosting; they’ll get squirreled away into the Giant Hard Drive of Doom.

      If something happens to the actual archive project in the near future, I’ll likely section them up into 20GB pieces and post them out on a torrent someplace.

        • raoulraoul@lemmy.world
          1 year ago

          As if they’re not having enough trouble with hosting “questionable” content! You obviously didn’t read the TorrentFreak article making the rounds.

          Internet Archive != the Pirate Bay.

          For now, DON’T contaminate the IA with the Classic Chicago Television channel.

        • empireOfLove@lemmy.oneOP
          1 year ago

          Nah. IA doesn’t need to deal with this volume of shit and they already have enough of a hard time dealing with copyright trolls.

          If this channel is impacted in the future, I’ll probably put out a few torrents with the videos and post them here.