• asjmcguire@kbin.social
    arrow-up
    12
    ·
    2 years ago

    Reddit has been going for like a billion years, and you only got 80 GB? Even zipped, that can’t be more than a tiny fraction of the data, surely?

    • ddnomad@infosec.pub
      English
      arrow-up
      16
      ·
      edit-2
      2 years ago

      Depends on what kind of data. If it’s mostly internal documents or dumps of whatever communication systems they use, it would not be too large (mostly because of retention policies on that software).

      If it is actually the data straight from Reddit’s production databases, then 80GB does sound questionable. But then what kind of data are we talking about? Is it actually valuable?

      Anyways, this is big (if true).

    • eighty@lemmy.one
      arrow-up
      12
      ·
      2 years ago

      I’d be surprised if the data was just content. Memes and texts aren’t particularly valuable.

      However, data that can be used for tracking or building user profiles, such as what users are subscribed to, how active they are, and how they all link to one another, is especially useful for competitors and marketers. Plus any personal data such as emails and profiles. I wouldn’t be surprised if you managed to get a huge amount of data in under 80 GB if it’s just text (think how big an 80 GB Excel sheet would be).

    • Trebach@kbin.social
      arrow-up
      4
      ·
      2 years ago

      I could get 80 GB of Reddit data in a day. ArchiveTeam has uploaded 2.97 PB (1 PB is 1024 TB, or 1,048,576 GB) so far trying to back up all of Reddit to the Internet Archive, and they’re still not finished!
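
For a sense of scale, the unit conversion in the comment above can be checked with a few lines of Python. This is only a rough sketch: it uses the binary units from the comment (1 PB = 1024 TB, 1 TB = 1024 GB), the names `pb_to_gb`, `archive_gb` and `leak_gb` are made up for illustration, and the comparison assumes the two figures measure comparable data, which they may not (the ArchiveTeam backup presumably includes media, while the 80 GB is said to be something else).

```python
# Binary unit factors, as used in the comment (1 PB = 1024 TB = 1,048,576 GB).
TB_PER_PB = 1024
GB_PER_TB = 1024


def pb_to_gb(petabytes: float) -> float:
    """Convert petabytes to gigabytes using 1024-based units."""
    return petabytes * TB_PER_PB * GB_PER_TB


# Figures quoted in the thread; illustrative only.
archive_gb = pb_to_gb(2.97)  # ArchiveTeam upload so far
leak_gb = 80                 # claimed size of the stolen data

# Fraction of the archived volume that 80 GB would represent.
share = leak_gb / archive_gb

print(f"Archive size: {archive_gb:,.0f} GB")
print(f"80 GB as a share of the archive: {share:.4%}")
print(f"The archive is about {archive_gb / leak_gb:,.0f} times larger")
```

Under those assumptions the archive comes to roughly 3.1 million GB, so 80 GB would be a few thousandths of a percent of it.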