Samples · lm_eval_harness.gsm8k
Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 119/1319 samples shown
· Score 90.4%
AI Evaluation
Generated 2026-05-13 21:38 · claude-sonnet-4-6
Summary
The model Qwen3-Coder-Next achieves a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school math.
Strengths
- Simple and moderately difficult arithmetic problems are solved reliably, with clean working.
- Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
- Zero errors (errors=0): the model never aborts or produces invalid output.
Weaknesses
- Problems with indirect or implicit references are misinterpreted; for example, "running 10% faster" is treated as a time reduction via division by 1.1 rather than as a direct subtraction.
- Off-by-one errors on inclusive time spans (e.g. the Gene quilt-block problem: 12 instead of 11 years).
- Ambiguous problem statements invite over-analysis, which sometimes leads the model to introduce incorrect relations (e.g. Lylah's salary).
- Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
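Each of the misreadings listed above comes down to a small arithmetic choice. The following Python sketch contrasts the two readings in each case. All function names and numbers are hypothetical and chosen for illustration; none are taken from the benchmark items.

```python
# Illustrative only: every value below is hypothetical, not taken from GSM8K.

def time_via_speed(base_time: float, pct: float) -> float:
    """Reading attributed to the model: speed rises by pct, so time is divided by (1 + pct)."""
    return base_time / (1 + pct)


def time_via_subtraction(base_time: float, pct: float) -> float:
    """Reading the report treats as expected: pct of the time is subtracted directly."""
    return base_time * (1 - pct)


def span_exclusive(start: int, end: int) -> int:
    """Elapsed years between two points in time."""
    return end - start


def span_inclusive(start: int, end: int) -> int:
    """Years counted with both endpoints included (one more than the elapsed span)."""
    return end - start + 1


def absolute_difference(p: float, q: float) -> float:
    """Difference between two probabilities in absolute terms."""
    return abs(p - q)


def relative_difference(p: float, q: float) -> float:
    """Difference between two probabilities relative to the second one."""
    return abs(p - q) / q


if __name__ == "__main__":
    # Same inputs, different readings, different answers.
    print(round(time_via_speed(11.0, 0.10), 2), round(time_via_subtraction(11.0, 0.10), 2))
    print(span_exclusive(2010, 2021), span_inclusive(2010, 2021))
    print(absolute_difference(0.5, 0.25), relative_difference(0.5, 0.25))
```

In each pair the inputs are identical, so a wrong final answer can follow from a correct calculation applied to the wrong reading of the question.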
Observations
Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative lines of reasoning and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).
Recommendation
Lower the sampling temperature (e.g. to 0.0 or greedy decoding) to keep the model from pursuing speculative alternative paths on clear-cut numerical problems and to push the pass rate further toward 95%+.
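To illustrate what this recommendation changes, here is a minimal, self-contained sketch of temperature sampling over next-token scores. It is a toy model, not the harness or the model's actual decoder; `sample_token` and the logits are hypothetical.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index; a temperature of 0.0 is treated as greedy decoding (argmax)."""
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale the scores, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.1]  # hypothetical next-token scores
rng = random.Random(0)

# Greedy decoding always returns the highest-scoring token.
greedy_picks = [sample_token(logits, 0.0, rng) for _ in range(5)]

# At temperature 1.0, lower-scoring tokens are also drawn from time to time.
sampled_picks = {sample_token(logits, 1.0, rng) for _ in range(200)}

print(greedy_picks)
print(sorted(sampled_picks))
```

With greedy decoding the output is deterministic for a given prompt, which removes one source of run-to-run variation. Whether it also improves the pass rate would need to be confirmed by a re-run.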
Overview
1319 samples
Distribution
[Score histogram; x-axis from 0.0 to 1.0]
| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| 1200 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James is… | — | — | — | ||
| 1201 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Josh and… | — | — | — | ||
| 1202 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John tak… | — | — | — | ||
| 1203 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Winston … | — | — | — | ||
| 1204 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marissa … | — | — | — | ||
| 1205 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Monika w… | — | — | — | ||
| 1206 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mikaela … | — | — | — | ||
| 1207 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 1 chocol… | — | — | — | ||
| 1208 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Betty ha… | — | — | — | ||
| 1209 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brian ca… | — | — | — | ||
| 1210 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Frank ha… | — | — | — | ||
| 1211 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joyce, M… | — | — | — | ||
| 1212 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pad of… | — | — | — | ||
| 1213 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carson n… | — | — | — | ||
| 1214 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Percy wa… | — | — | — | ||
| 1215 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A window… | — | — | — | ||
| 1216 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A 10 met… | — | — | — | ||
| 1217 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Miggy's … | — | — | — | ||
| 1218 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bicycl… | — | — | — | ||
| 1219 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On the i… | — | — | — | ||
| 1220 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is… | — | — | — | ||
| 1221 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul, a … | — | — | — | ||
| 1222 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two bowl… | — | — | — | ||
| 1223 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kurt's o… | — | — | — | ||
| 1224 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is … | — | — | — | ||
| 1225 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juanita … | — | — | — | ||
| 1226 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gunther … | — | — | — | ||
| 1227 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A questi… | — | — | — | ||
| 1228 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bird i… | — | — | — | ||
| 1229 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sam is s… | — | — | — | ||
| 1230 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marj has… | — | — | — | ||
| 1231 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tina dec… | — | — | — | ||
| 1232 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Justice … | — | — | — | ||
| 1233 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu… | — | — | — | ||
| 1234 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two year… | — | — | — | ||
| 1235 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brandon … | — | — | — | ||
| 1236 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ali and … | — | — | — | ||
| 1237 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dan spen… | — | — | — | ||
| 1238 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Luther m… | — | — | — | ||
| 1239 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Laura lo… | — | — | — | ||
| 1240 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jackson … | — | — | — | ||
| 1241 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Felix is… | — | — | — | ||
| 1242 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Peri… | — | — | — | ||
| 1243 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Theresa … | — | — | — | ||
| 1244 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cassy pa… | — | — | — | ||
| 1245 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John wri… | — | — | — | ||
| 1246 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lorie ha… | — | — | — | ||
| 1247 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Five cow… | — | — | — | ||
| 1248 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ivan has… | — | — | — | ||
| 1249 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jean has… | — | — | — | ||
| 1250 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John man… | — | — | — | ||
| 1251 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill is … | — | — | — | ||
| 1252 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kimberly… | — | — | — | ||
| 1253 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 1254 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dianne r… | — | — | — | ||
| 1255 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A compan… | — | — | — | ||
| 1256 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 1257 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A plane … | — | — | — | ||
| 1258 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hani sai… | — | — | — | ||
| 1259 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ember is… | — | — | — | ||
| 1260 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emmett d… | — | — | — | ||
| 1261 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Viviana … | — | — | — | ||
| 1262 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To earn … | — | — | — | ||
| 1263 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The jour… | — | — | — | ||
| 1264 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John's p… | — | — | — | ||
| 1265 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One batc… | — | — | — | ||
| 1266 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Charlie … | — | — | — | ||
| 1267 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two days… | — | — | — | ||
| 1268 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kim has … | — | — | — | ||
| 1269 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jose bou… | — | — | — | ||
| 1270 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The batt… | — | — | — | ||
| 1271 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ahmed is… | — | — | — | ||
| 1272 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anika ha… | — | — | — | ||
| 1273 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark con… | — | — | — | ||
| 1274 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hattie a… | — | — | — | ||
| 1275 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 1276 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bond p… | — | — | — | ||
| 1277 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 9 years … | — | — | — | ||
| 1278 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A grocer… | — | — | — | ||
| 1279 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kim driv… | — | — | — | ||
| 1280 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joe goes… | — | — | — | ||
| 1281 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Archie h… | — | — | — | ||
| 1282 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A rectan… | — | — | — | ||
| 1283 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bag of… | — | — | — | ||
| 1284 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A buildi… | — | — | — | ||
| 1285 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chris ha… | — | — | — | ||
| 1286 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kendra m… | — | — | — | ||
| 1287 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An apple… | — | — | — | ||
| 1288 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A tank w… | — | — | — | ||
| 1289 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha… | — | — | — | ||
| 1290 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elsa sta… | — | — | — | ||
| 1291 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Flies ar… | — | — | — | ||
| 1292 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Celina e… | — | — | — | ||
| 1293 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah h… | — | — | — | ||
| 1294 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An artis… | — | — | — | ||
| 1295 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jon make… | — | — | — | ||
| 1296 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeannie … | — | — | — | ||
| 1297 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The bake… | — | — | — | ||
| 1298 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The teac… | — | — | — | ||
| 1299 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sally to… | — | — | — | ||
| 1300 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John's s… | — | — | — | ||
| 1301 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maria is… | — | — | — | ||
| 1302 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason is… | — | — | — | ||
| 1303 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Each wee… | — | — | — | ||
| 1304 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Beth is … | — | — | — | ||
| 1305 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The stad… | — | — | — | ||
| 1306 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is … | — | — | — | ||
| 1307 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A portab… | — | — | — | ||
| 1308 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In ten y… | — | — | — | ||
| 1309 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Clove… | — | — | — | ||
| 1310 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pen is… | — | — | — | ||
| 1311 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake buy… | — | — | — | ||
| 1312 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: You draw… | — | — | — | ||
| 1313 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: OpenAI r… | — | — | — | ||
| 1314 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Jason… | — | — | — | ||
| 1315 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jessica'… | — | — | — | ||
| 1316 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A packag… | — | — | — | ||
| 1317 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Magdalen… | — | — | — | ||
| 1318 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A car us… | — | — | — | ||
119 of 1319 samples · Limit 200
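As a quick cross-check, the statuses on this page can be tallied by hand. The sketch below does so in Python; the variable names are illustrative. The result covers only the 119 rows shown here, so it need not match the 92.5% pass rate reported for all 1319 samples.

```python
# Statuses read off the table above: 119 rows, four of them marked "failed".
page_total = 119
failed_ids = [1227, 1306, 1309, 1313]

page_passed = page_total - len(failed_ids)
page_pass_rate = page_passed / page_total

print(f"{page_passed}/{page_total} passed on this page ({page_pass_rate:.1%})")
# prints: 115/119 passed on this page (96.6%)
```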