In the preview image (/ thumbnail?) it looks like the cat is strapped down on the floor like a giant.
Oh, the RECIPES TIPS & TRICKS npc! Gotta ask them if you need a hint.
The path is the destination.
In search of cheese, the spicy cosmocat came to earth.
In Germany at least, if someone else’s bag is heavy, we ask if they’re carrying stones in it.
“Do you have stones in your purse?” “No. It is a stone!”
O2: I am the reason things burn.
+
H2: I make the sun burn.
=
H2O: I destroy houses by flooding, I kill people by drowning them, I am used by the CIA for torture (waterboarding).
Maybe NaCl is just scared by the H2O in the soup and its parents.
An AI agent is just an intelligent agent, see https://en.wikipedia.org/wiki/Intelligent_agent.
Or do you mean that the things they call AI agents aren’t actually AI agents?
The paper didn’t include the exact details of this (which made me mad). But if there’s a person actively making parts of the work, and just using an AI chatbot as help, it’s not an AI agent, right, right? So I assumed it’s autonomous.
Title is misleading. It’s only outperforming some of the other participants. Also note that obviously not everyone is participating full try-hard.
In the first CTF, the top teams finished all 20 challenges in under an hour. Apparently these were simple challenges that could be solved with standard techniques:
We were impressed the humans could match AI speeds, and reached out to the human teams for comments. Participants attributed their ability to solve the challenges quickly to their extensive experience as professional CTF players, noting that they were familiar with the standard techniques commonly used to solve such problems.
They obviously also used tools. And so did the AI teams:
Most prompt tweaks were about:
[…]
• recommending particular tools that were easier for the LLM to use.
In the second CTF (the bigger one with hard challenges), it looks like the AI teams only solved the easier ones.
I haven’t looked at the actual challenges. Would be too much effort. And the paper doesn’t speak about the kind of challenges that were solved.
The 50% completion time looks to me like it’s flawed. If I understand it right, it’s assuming that each team is doing every task in parallel and starts directly, which is not possible if you don’t have enough (equally good) team members.
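To illustrate why that assumption matters, here is a minimal sketch with made-up numbers. It is not the paper's method: the solve times, the team size, and both function names are hypothetical. It just compares when tasks finish if every task starts at t=0 with when they finish if a small team has to work through them a few at a time.

```python
import heapq

def finish_times_parallel(durations):
    # Assumption under critique: every task starts at t=0,
    # so each one finishes after exactly its own duration.
    return sorted(durations)

def finish_times_limited(durations, workers):
    # Greedy schedule: each task (in the given order) goes to
    # whichever team member becomes free first.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    finished = []
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        finished.append(end)
        heapq.heappush(free_at, end)
    return sorted(finished)

durations = [10, 20, 30, 40]  # hypothetical solve times in minutes
print(finish_times_parallel(durations))
print(finish_times_limited(durations, workers=2))
```

With only two people, the later tasks finish well after their raw solve time, so any statistic built on the "everything starts at once" timeline would make a small team look faster than it could actually be.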
Don’t get me wrong, making an AI that is able to solve such challenges autonomously at all is impressive. But I hate over-interpretation of results.
(Why did I waste my time again?)
For the uninformed: https://www.youtube.com/watch?v=aPkwGhwU_FY
The solution is obviously to learn German. Then you can watch with our excellent and easily intelligible dubs.
Give your cat a damn spoon!
Here’s a more lovely version: https://www.youtube.com/watch?v=AwUPTYDY2bM
“Dollar Tree” is an existing company?! At first (I read top to bottom) I thought it was a metaphor for a company that makes money “like a tree”.
Looks more like a fly agaric. (Oh god, difficult English words. In German we just call it a fly mushroom.)