• brucethemoose@lemmy.world
    28 days ago

    How useful would the training data be?

    Open datasets are getting much better (Tulu, an open instruct dataset and training recipe, is a great example), but it's clear the giants still have a “secret sauce” that gives them at least a small edge over open datasets.

    There actually seems to be some vindication for using massively multilingual datasets as well, as the hybrid Chinese/English models are turning out very well.