I’m @froztbyte more or less everywhere that matters

  • 4 Posts
  • 187 Comments
Joined 3 years ago
Cake day: July 2nd, 2023

  • in today’s news about magical prompts that super totes give you superpowers:

    We introduced SKILLSBENCH, the first benchmark to systematically evaluate Agent Skills as first-class artifacts. Across 84 tasks, 7 agent-model configurations, and 7,308 trajectories under three conditions (no Skills, curated Skills, self-generated Skills), our evaluation yields four key findings: (1) curated Skills provide substantial but variable benefit (+16.2 percentage points average, with high variance across domains and configurations); (2) self-generated Skills provide negligible or negative benefit (–1.3pp average), demonstrating that effective Skills require human-curated domain expertise

    I am jack’s surprised face

    …and given I have other yaks, I shall not step on my “software and tools don’t have to suck” soapbox right now