• bjorney@lemmy.ca

    I’m sorry, but this says nothing about how they lied about the training cost - nor does their citation. Their argument boils down to “that number doesn’t include R&D and capital expenditures” - but why would it need to? The $6M figure was based on the hourly rental cost of the hardware, not the cost of building a data center from scratch with the intention of burning it to the ground when you were done training.

    It’s like telling someone they didn’t actually make $200 driving Uber on the side on a Friday night because they spent $20,000 on their car, while ignoring the fact that they had to buy the car either way to get to their six-figure day job.

    • ebu@awful.systems

      i think you’re missing the point that “Deepseek was made for only $6M” has been the trending headline for the past while, with the specific point of comparison being the massive costs of developing ChatGPT, Copilot, Gemini, et al.

      to stretch your metaphor, it’s like someone rolling up with their car, claiming it only costs $20 (unlike all the other cars that cost $20,000), when, come to find out, that number is just how much it costs to fill the gas tank once

      • msage@programming.dev

        No, it’s not. OpenAI doesn’t spend all that money on R&D; they spend the majority of it on the actual training (hardware, electricity).

        And that’s (supposedly) only $6M for Deepseek.

        So where is the lie?

        • froztbyte@awful.systems

          shot:

          majority of it on the actual training (hardware, …)

          chaser:

          And that’s (supposedly) only $6M for Deepseek.

          citation:

          After experimentation with models with clusters of thousands of GPUs, High Flyer made an investment in 10,000 A100 GPUs in 2021 before any export restrictions. That paid off. As High-Flyer improved, they realized that it was time to spin off “DeepSeek” in May 2023 with the goal of pursuing further AI capabilities with more focus.

          So where is the lie?

          your post is asking a lot of questions already answered by your posting

          • msage@programming.dev

            SemiAnalysis is “confident”

            They didn’t answer anything, they only alluded to it.

            Just because they bought GPUs like everyone else doesn’t mean they could not train it cheaper.

            • self@awful.systems

              standard “fuck off programming.dev” ban with a side of who the fuck cares. deepseek isn’t the good guys, you weird fucks don’t have to go to a nitpick war defending them, there’s no good guys in LLMs and generative AI. all these people are grifters, all of them are gaming the benchmarks they designed to be gamed, nobody’s getting good results out of this fucking mediocre technology.

      • bjorney@lemmy.ca

        DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

        Emphasis mine. Deepseek was very upfront that this $6M was for training only. No other company includes R&D and salaries when they report model training costs, because those aren’t training costs.
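
        For what it’s worth, the arithmetic in that passage checks out; a quick sanity check in Python, using nothing beyond the paper’s own two figures:

        gpu_hours = 2_788_000   # "2.788M GPU hours" for the full training run
        rate = 2.0              # the paper's assumed $2 per H800 GPU hour
        print(f"${gpu_hours * rate:,.0f}")  # -> $5,576,000, i.e. the ~$5.576M figure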

        • ebu@awful.systems

          consider this paragraph from the Wall Street Journal:

          DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Dario Amodei, chief executive of the AI developer Anthropic, as the cost of building a model.

          you’re arguing to me that they technically didn’t lie – but it’s pretty clear that some people walked away with a false impression of the cost of their product relative to their competitors’ products, and they financially benefitted from people believing in this false impression.

          • V0ldek@awful.systems

            Okay I mean, I hate to somehow come to the defense of a slop company? But WSJ saying nonsense is really not their fault; even that particular quote clearly says “DeepSeek said training one” of its latest models cost $5.6M. That’s just a true statement. No one in their right mind includes capital expenditure in that figure, the same way that saying “it took us 100h to train a model” doesn’t include building a data center in those 100h.

            Besides the question of whether they actually lied or not, it’s still immensely funny to me that they could’ve just told a blatant lie nobody fact-checked, and it shook the market to the fucking core, wiping off like billions in valuation. Very real market based on very real fundamentals, run by very serious adults.

            • ebu@awful.systems

              i can admit it’s possible i’m being overly cynical here and it is just sloppy journalism on Raffaele Huang’s/his editor’s/the WSJ’s part. but i still think that it’s a little suspect on the grounds that we have no idea how many times they had to restart training due to the model borking, other experiments and hidden costs, even before things like the necessary capex (which goes unmentioned in the original paper – though they note using a 2048-GPU cluster of H800s, which would put them down around $40m). i’m thinking in the mode of “the whitepaper exists to serve the company’s bottom line”
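
              (rough math on that capex figure, with the caveat that the ~$20k unit price for an H800 is my assumption, not something from the paper; only the 2048-GPU cluster size is theirs:)

              gpus = 2048             # cluster size noted in the paper
              unit_price = 20_000     # assumed USD per H800; not from the paper
              print(f"${gpus * unit_price:,}")  # -> $40,960,000, i.e. ~$40m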

              btw announcing my new V7 model that i trained for the $0.26 i found on the street just to watch the stock markets burn

              • V0ldek@awful.systems

                but i still think that it’s a little suspect on the grounds that we have no idea how many times they had to restart training due to the model borking, other experiments and hidden costs

                Oh ye, I totally agree on this one. This entire genAI enterprise insults me on a fundamental level as a CS researcher: there’s zero transparency or reproducibility, no one reviews these claims, and it’s a complete shitshow, from terrible, terrible benchmarks, through shoddy methodology, up to untestable and bonkers claims.

                I have zero good faith for the press, though; they’re experts in painting any and all tech claims in the best light possible, like their lives fucking depend on it. We wouldn’t be where we are right now if anyone at any “reputable” newspaper like the WSJ had asked one (1) hard question of Sam Altman like 3 years ago.