• brucethemoose@lemmy.world
    link
    fedilink
    English
    arrow-up
    2
    ·
    edit-2
    2 days ago

    Running the model can be no more taxing than playing a modern video game, except the load is not constant.

    This is not true, Deepseek R1 is huge. There’s a lot of confusion between the smaller distillations based on Qwen 2.5 (some that can run on consumer GPUs), and the “full” Deepseek R1 based on Deepseekv3

    Your point mostly stands, but the “full” model is hundreds of gigabytes, and the paper mentioned something like a bank of 370 GPUs being optimal for hosting. It’s very efficient because its only like 30B active, which is bonkers, but still.