Samples · lm_eval_harness.gsm8k
Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%
AI Evaluation
Generated 2026-05-13 21:38 · claude-sonnet-4-6
Summary
The model Qwen3-Coder-Next reaches a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school math.
Strengths
- Simple and moderately difficult arithmetic problems are solved reliably, with clean working.
- Unit conversions and linear multi-step problems (groceries, pool filling costs, percentages) are handled consistently.
- Zero errors (errors=0): the model never aborts or produces invalid output.
Weaknesses
- Problems with indirect or implicit references are misinterpreted; e.g. "running 10% faster" is treated as a time reduction by dividing by 1.1 instead of as a direct subtraction.
- Off-by-one errors with inclusive time spans (e.g. the Gene quilt-block problem: 12 instead of 11 years).
- Ambiguous problem statements lead to over-analysis, so the model sometimes introduces incorrect relations (e.g. Lylah's salary).
- Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
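Three of the failure modes listed above come down to small arithmetic distinctions. The sketch below uses hypothetical numbers that are not taken from the benchmark items. It contrasts the two readings of "10% faster", an exclusive versus an inclusive count of years, and an absolute versus a relative difference between two probabilities. All function names are illustrative assumptions.

```python
def time_divided(base_time: float, pct_faster: float) -> float:
    """Reading 1: speed rises by pct_faster percent, so time is divided by (1 + pct/100)."""
    return base_time / (1 + pct_faster / 100)


def time_subtracted(base_time: float, pct_faster: float) -> float:
    """Reading 2: the time itself is cut by pct_faster percent (direct subtraction)."""
    return base_time - base_time * pct_faster / 100


def years_exclusive(first_year: int, last_year: int) -> int:
    """Number of year-to-year steps between the two years."""
    return last_year - first_year


def years_inclusive(first_year: int, last_year: int) -> int:
    """Number of calendar years when both end years are counted."""
    return last_year - first_year + 1


def absolute_difference(p: float, q: float) -> float:
    """Difference in probability points."""
    return abs(p - q)


def relative_difference(p: float, q: float) -> float:
    """Difference expressed as a fraction of q."""
    return abs(p - q) / q


# Hypothetical inputs, chosen only to make the gap between readings visible.
print(time_divided(110, 10), time_subtracted(110, 10))
print(years_exclusive(2010, 2021), years_inclusive(2010, 2021))
print(absolute_difference(0.5, 0.25), relative_difference(0.5, 0.25))
```

In each pair both results are internally consistent; which one counts as correct depends on how the problem statement is read, which is where the failures above originate.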
Notable observations
Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative lines of reasoning and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).
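One way such verbosity can cost a point is through answer extraction. The sketch below assumes a hypothetical scorer that takes the last number in the response as the final answer. Whether this run was scored that way is an assumption, not something the report states. The helper `last_number` and both sample responses are made up for illustration.

```python
import re
from typing import Optional

# Matches integers and decimals, optionally negative, with thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number found in text, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))


# Hypothetical responses to the same question.
concise = "18 - 7 = 11. The answer is 11."
verbose = "The answer is 11. Alternatively, counting both end years gives 12."

print(last_number(concise))
print(last_number(verbose))
```

Under this assumption the verbose response is scored on the trailing alternative rather than on the answer it stated first.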
Recommendation
Lower the sampling temperature (e.g. to 0.0, or use greedy decoding) to keep the model from pursuing speculative alternative paths on clear-cut numerical problems and to push the pass rate further toward 95%+.
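The link between temperature and greedy decoding can be illustrated with a minimal sketch of temperature-scaled softmax. The logits are hypothetical, and the function names are assumptions rather than part of any harness API. As the temperature shrinks, probability mass concentrates on the highest logit, which is the token greedy decoding would pick. A temperature of exactly 0 cannot be divided by, so the sketch treats greedy selection as a separate function.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_choice for 0")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def greedy_choice(logits: List[float]) -> int:
    """Index of the largest logit, i.e. the token greedy decoding selects."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.5, 0.5]
probs_default = softmax_with_temperature(logits, 1.0)
probs_cold = softmax_with_temperature(logits, 0.05)

print(probs_default)
print(probs_cold)
print(greedy_choice(logits))
```

At the lower temperature nearly all probability sits on the first token, so sampling rarely leaves the path that greedy decoding would take.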
Overview
1319 samples
Distribution
Score histogram (axis from 0.0 to 1.0)
| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| 400 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It costs… | — | — | — | ||
| 401 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 402 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Jason… | — | — | — | ||
| 403 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Traci an… | — | — | — | ||
| 404 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nadine w… | — | — | — | ||
| 405 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 406 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A snack … | — | — | — | ||
| 407 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeremy b… | — | — | — | ||
| 408 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emma got… | — | — | — | ||
| 409 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 410 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 411 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yuan is … | — | — | — | ||
| 412 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A sack o… | — | — | — | ||
| 413 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Malcolm … | — | — | — | ||
| 414 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 415 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Melody n… | — | — | — | ||
| 416 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aliens a… | — | — | — | ||
| 417 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ashton h… | — | — | — | ||
| 418 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tina sav… | — | — | — | ||
| 419 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Four yea… | — | — | — | ||
| 420 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark pla… | — | — | — | ||
| 421 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ursula i… | — | — | — | ||
| 422 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To win a… | — | — | — | ||
| 423 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brian li… | — | — | — | ||
| 424 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ten perc… | — | — | — | ||
| 425 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim has … | — | — | — | ||
| 426 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marie, t… | — | — | — | ||
| 427 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tommy or… | — | — | — | ||
| 428 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Verna we… | — | — | — | ||
| 429 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Every da… | — | — | — | ||
| 430 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Each yea… | — | — | — | ||
| 431 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jenine c… | — | — | — | ||
| 432 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kim drin… | — | — | — | ||
| 433 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The cost… | — | — | — | ||
| 434 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack is … | — | — | — | ||
| 435 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: While bi… | — | — | — | ||
| 436 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane bri… | — | — | — | ||
| 437 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Class A … | — | — | — | ||
| 438 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A book c… | — | — | — | ||
| 439 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bob has … | — | — | — | ||
| 440 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darius, … | — | — | — | ||
| 441 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three bl… | — | — | — | ||
| 442 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha… | — | — | — | ||
| 443 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At Mario… | — | — | — | ||
| 444 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Toby hel… | — | — | — | ||
| 445 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christin… | — | — | — | ||
| 446 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sara and… | — | — | — | ||
| 447 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brenda w… | — | — | — | ||
| 448 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: With her… | — | — | — | ||
| 449 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jim\u201… | — | — | — | ||
| 450 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nick hid… | — | — | — | ||
| 451 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy wa… | — | — | — | ||
| 452 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At the f… | — | — | — | ||
| 453 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dawn ear… | — | — | — | ||
| 454 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nicky we… | — | — | — | ||
| 455 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sur… | — | — | — | ||
| 456 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tommy ha… | — | — | — | ||
| 457 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One dand… | — | — | — | ||
| 458 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea… | — | — | — | ||
| 459 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 460 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 461 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Toby is … | — | — | — | ||
| 462 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James de… | — | — | — | ||
| 463 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Greg is … | — | — | — | ||
| 464 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jaron wa… | — | — | — | ||
| 465 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sara got… | — | — | — | ||
| 466 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jennifer… | — | — | — | ||
| 467 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The mayo… | — | — | — | ||
| 468 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dolly wa… | — | — | — | ||
| 469 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A jar on… | — | — | — | ||
| 470 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pen co… | — | — | — | ||
| 471 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A jet tr… | — | — | — | ||
| 472 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sin… | — | — | — | ||
| 473 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Leo and … | — | — | — | ||
| 474 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew i… | — | — | — | ||
| 475 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Robi and… | — | — | — | ||
| 476 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aunt Ang… | — | — | — | ||
| 477 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Susan is… | — | — | — | ||
| 478 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Buffy ha… | — | — | — | ||
| 479 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Louis is… | — | — | — | ||
| 480 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chang's … | — | — | — | ||
| 481 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Thomas b… | — | — | — | ||
| 482 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A case o… | — | — | — | ||
| 483 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Miss Ais… | — | — | — | ||
| 484 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two trai… | — | — | — | ||
| 485 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabella… | — | — | — | ||
| 486 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ivy drin… | — | — | — | ||
| 487 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A plumbe… | — | — | — | ||
| 488 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A new ed… | — | — | — | ||
| 489 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul is … | — | — | — | ||
| 490 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Uncle Br… | — | — | — | ||
| 491 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Karen ba… | — | — | — | ||
| 492 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sally an… | — | — | — | ||
| 493 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Trevor a… | — | — | — | ||
| 494 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom buys… | — | — | — | ||
| 495 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Toby is … | — | — | — | ||
| 496 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Beth … | — | — | — | ||
| 497 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Piazzano… | — | — | — | ||
| 498 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 499 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ursula w… | — | — | — | ||
| 500 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry pa… | — | — | — | ||
| 501 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Fabian i… | — | — | — | ||
| 502 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An alien… | — | — | — | ||
| 503 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last wee… | — | — | — | ||
| 504 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sienna g… | — | — | — | ||
| 505 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Reynald … | — | — | — | ||
| 506 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Austin h… | — | — | — | ||
| 507 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tatuya, … | — | — | — | ||
| 508 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A basket… | — | — | — | ||
| 509 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a bir… | — | — | — | ||
| 510 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: George w… | — | — | — | ||
| 511 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: I caught… | — | — | — | ||
| 512 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carson i… | — | — | — | ||
| 513 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy ma… | — | — | — | ||
| 514 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 515 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school… | — | — | — | ||
| 516 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If there… | — | — | — | ||
| 517 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Patricia… | — | — | — | ||
| 518 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chelsea … | — | — | — | ||
| 519 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rob has … | — | — | — | ||
| 520 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Allie pi… | — | — | — | ||
| 521 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John mak… | — | — | — | ||
| 522 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At Mario… | — | — | — | ||
| 523 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Adam bou… | — | — | — | ||
| 524 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Winston … | — | — | — | ||
| 525 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sophie d… | — | — | — | ||
| 526 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bird w… | — | — | — | ||
| 527 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A TV sho… | — | — | — | ||
| 528 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pat\u201… | — | — | — | ||
| 529 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Henry ha… | — | — | — | ||
| 530 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Weng ear… | — | — | — | ||
| 531 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An unusu… | — | — | — | ||
| 532 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ginger l… | — | — | — | ||
| 533 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Princess… | — | — | — | ||
| 534 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Meadow h… | — | — | — | ||
| 535 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John use… | — | — | — | ||
| 536 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Julie st… | — | — | — | ||
| 537 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ginger o… | — | — | — | ||
| 538 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 68% of a… | — | — | — | ||
| 539 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Rock… | — | — | — | ||
| 540 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rhea buy… | — | — | — | ||
| 541 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Five coa… | — | — | — | ||
| 542 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daniel h… | — | — | — | ||
| 543 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ellie ha… | — | — | — | ||
| 544 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A charit… | — | — | — | ||
| 545 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Smit… | — | — | — | ||
| 546 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet go… | — | — | — | ||
| 547 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary bou… | — | — | — | ||
| 548 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabella… | — | — | — | ||
| 549 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The cost… | — | — | — | ||
| 550 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The rate… | — | — | — | ||
| 551 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Susie ha… | — | — | — | ||
| 552 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a Mat… | — | — | — | ||
| 553 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack ord… | — | — | — | ||
| 554 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Melony m… | — | — | — | ||
| 555 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Barry ha… | — | — | — | ||
| 556 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yanna bo… | — | — | — | ||
| 557 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A Printi… | — | — | — | ||
| 558 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: I caught… | — | — | — | ||
| 559 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Penelope… | — | — | — | ||
| 560 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Calvin i… | — | — | — | ||
| 561 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Annie wa… | — | — | — | ||
| 562 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. and … | — | — | — | ||
| 563 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sam is t… | — | — | — | ||
| 564 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hallie i… | — | — | — | ||
| 565 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A young … | — | — | — | ||
| 566 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ella spe… | — | — | — | ||
| 567 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ma… | — | — | — | ||
| 568 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry ca… | — | — | — | ||
| 569 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Malcolm … | — | — | — | ||
| 570 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carrie i… | — | — | — | ||
| 571 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a hou… | — | — | — | ||
| 572 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 573 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A monkey… | — | — | — | ||
| 574 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kyle bik… | — | — | — | ||
| 575 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Finn… | — | — | — | ||
| 576 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A wildli… | — | — | — | ||
| 577 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom goes… | — | — | — | ||
| 578 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Braden h… | — | — | — | ||
| 579 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea… | — | — | — | ||
| 580 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dale and… | — | — | — | ||
| 581 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeff com… | — | — | — | ||
| 582 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A casino… | — | — | — | ||
| 583 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gretchen… | — | — | — | ||
| 584 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jordan r… | — | — | — | ||
| 585 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The peri… | — | — | — | ||
| 586 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lana and… | — | — | — | ||
| 587 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When the… | — | — | — | ||
| 588 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To get t… | — | — | — | ||
| 589 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alex is … | — | — | — | ||
| 590 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mandy is… | — | — | — | ||
| 591 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane bou… | — | — | — | ||
| 592 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The rati… | — | — | — | ||
| 593 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Grandma … | — | — | — | ||
| 594 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter kn… | — | — | — | ||
| 595 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Luigi bo… | — | — | — | ||
| 596 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nick is … | — | — | — | ||
| 597 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The area… | — | — | — | ||
| 598 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The news… | — | — | — | ||
| 599 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 20% of t… | — | — | — | ||