Samples · lm_eval_harness.gsm8k
Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%
AI evaluation
Generated 2026-05-13 21:38 · claude-sonnet-4-6
Summary
The model Qwen3-Coder-Next reaches a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school math.
Strengths
- Simple and moderately difficult arithmetic problems are solved reliably, with a clean worked solution.
- Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
- Zero errors (errors=0): the model never aborts or returns invalid output.
Weaknesses
- Problems with indirect or implicit references are misinterpreted; for example, "running 10% faster" is treated as a time reduction via a divisor of 1.1 rather than as a direct subtraction.
- Off-by-one errors on inclusive time spans (e.g. the Gene quilt-block problem: 12 instead of 11 years).
- Ambiguous problem statements invite over-analysis, which in some cases leads the model to introduce incorrect relations (e.g. Lylah's salary).
- Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
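The three interpretation errors listed above each come down to two readings of the same wording that yield different numbers. The sketch below makes that concrete. All input values and function names are hypothetical and chosen only for illustration; none are taken from the benchmark items.

```python
# Illustrative only: the inputs below are made up, not taken from GSM8K items.

def time_if_speed_up(base_time: float, pct: float) -> float:
    """'pct % faster' read as a speed increase: time is divided by (1 + pct/100)."""
    return base_time / (1 + pct / 100)

def time_if_subtract(base_time: float, pct: float) -> float:
    """'pct % faster' read as a direct reduction: pct % of the time is subtracted."""
    return base_time * (1 - pct / 100)

def span_in_years(first: int, last: int, inclusive: bool) -> int:
    """Years between two year markers, optionally counting both endpoints."""
    return last - first + (1 if inclusive else 0)

def absolute_diff(p: float, q: float) -> float:
    """Difference between two probabilities in absolute terms."""
    return p - q

def relative_diff(p: float, q: float) -> float:
    """Difference between two probabilities as a fraction of q."""
    return (p - q) / q

if __name__ == "__main__":
    # Divisor reading vs. subtraction reading of "10% faster" for a 110-unit time.
    print(time_if_speed_up(110.0, 10), time_if_subtract(110.0, 10))
    # Inclusive vs. exclusive count over the same pair of years.
    print(span_in_years(2010, 2021, inclusive=True),
          span_in_years(2010, 2021, inclusive=False))
    # Absolute vs. relative difference between probabilities 0.5 and 0.25.
    print(absolute_diff(0.5, 0.25), relative_diff(0.5, 0.25))
```

In each pair both computations are arithmetically sound, so the answer is decided by which reading the reference solution expects.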
Observations
Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative lines of reasoning and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).
Recommendation
Lower the sampling temperature (e.g. to 0.0, or use greedy decoding) to keep the model from pursuing speculative alternative paths on clear-cut numerical problems and to push the pass rate further toward 95%+.
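As a rough illustration of why a lower temperature narrows the model's choices, the following sketch applies a temperature-scaled softmax to a hypothetical set of three next-token logits. The logits and function names are assumptions made for this example; the report does not state which decoding settings the run actually used.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_choice for T = 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_choice(logits):
    """Greedy decoding: always pick the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

if __name__ == "__main__":
    logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
    for t in (1.0, 0.5, 0.1):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
    print("greedy index:", greedy_choice(logits))
```

As the temperature approaches zero, almost all probability mass lands on the top-scoring token, which is the choice greedy decoding makes directly.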
Overview
1319 samples
Distribution
Score histogram (x-axis: score, 0.0 to 1.0)
| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| 600 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amanda\u… | — | — | — | ||
| 601 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Danny ma… | — | — | — | ||
| 602 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A single… | — | — | — | ||
| 603 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cowboy M… | — | — | — | ||
| 604 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One-thir… | — | — | — | ||
| 605 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 606 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bryan's … | — | — | — | ||
| 607 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Earl sta… | — | — | — | ||
| 608 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A cooler… | — | — | — | ||
| 609 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry ch… | — | — | — | ||
| 610 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For the … | — | — | — | ||
| 611 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Merry is… | — | — | — | ||
| 612 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A family… | — | — | — | ||
| 613 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marly ha… | — | — | — | ||
| 614 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cheryl a… | — | — | — | ||
| 615 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carter c… | — | — | — | ||
| 616 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A giant … | — | — | — | ||
| 617 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jacob ta… | — | — | — | ||
| 618 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark is … | — | — | — | ||
| 619 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At the s… | — | — | — | ||
| 620 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Every tr… | — | — | — | ||
| 621 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If it ta… | — | — | — | ||
| 622 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aaron is… | — | — | — | ||
| 623 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 624 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A garden… | — | — | — | ||
| 625 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For brea… | — | — | — | ||
| 626 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ca… | — | — | — | ||
| 627 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lagoon i… | — | — | — | ||
| 628 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darcie i… | — | — | — | ||
| 629 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Liam is … | — | — | — | ||
| 630 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Monda… | — | — | — | ||
| 631 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul's g… | — | — | — | ||
| 632 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: My dog w… | — | — | — | ||
| 633 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yesterda… | — | — | — | ||
| 634 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea… | — | — | — | ||
| 635 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A group … | — | — | — | ||
| 636 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marcus h… | — | — | — | ||
| 637 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James de… | — | — | — | ||
| 638 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dan owns… | — | — | — | ||
| 639 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah b… | — | — | — | ||
| 640 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gervais … | — | — | — | ||
| 641 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Every Ha… | — | — | — | ||
| 642 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pelica… | — | — | — | ||
| 643 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aleesia … | — | — | — | ||
| 644 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a fie… | — | — | — | ||
| 645 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lenny bo… | — | — | — | ||
| 646 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha… | — | — | — | ||
| 647 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tracy fe… | — | — | — | ||
| 648 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steven h… | — | — | — | ||
| 649 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two frie… | — | — | — | ||
| 650 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate bou… | — | — | — | ||
| 651 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Suzanna'… | — | — | — | ||
| 652 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Satur… | — | — | — | ||
| 653 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gertrude… | — | — | — | ||
| 654 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dale and… | — | — | — | ||
| 655 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim is s… | — | — | — | ||
| 656 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A chocol… | — | — | — | ||
| 657 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Archie i… | — | — | — | ||
| 658 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ishmael … | — | — | — | ||
| 659 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 660 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The chic… | — | — | — | ||
| 661 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James wa… | — | — | — | ||
| 662 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Roberta … | — | — | — | ||
| 663 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After fi… | — | — | — | ||
| 664 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A park h… | — | — | — | ||
| 665 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emma is … | — | — | — | ||
| 666 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary has… | — | — | — | ||
| 667 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rubert h… | — | — | — | ||
| 668 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Victoria… | — | — | — | ||
| 669 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: During A… | — | — | — | ||
| 670 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A trader… | — | — | — | ||
| 671 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a Mat… | — | — | — | ||
| 672 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James dr… | — | — | — | ||
| 673 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The P.T.… | — | — | — | ||
| 674 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary bou… | — | — | — | ||
| 675 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 676 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A plumbe… | — | — | — | ||
| 677 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Caleb ha… | — | — | — | ||
| 678 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Greta wo… | — | — | — | ||
| 679 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tony dec… | — | — | — | ||
| 680 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wendy ha… | — | — | — | ||
| 681 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane sew… | — | — | — | ||
| 682 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane sew… | — | — | — | ||
| 683 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The numb… | — | — | — | ||
| 684 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is … | — | — | — | ||
| 685 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dane\u20… | — | — | — | ||
| 686 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nathan i… | — | — | — | ||
| 687 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Ama… | — | — | — | ||
| 688 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: You can … | — | — | — | ||
| 689 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James is… | — | — | — | ||
| 690 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sunnyval… | — | — | — | ||
| 691 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill is … | — | — | — | ||
| 692 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andre ca… | — | — | — | ||
| 693 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kaiden a… | — | — | — | ||
| 694 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a bir… | — | — | — | ||
| 695 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet is… | — | — | — | ||
| 696 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lemuel i… | — | — | — | ||
| 697 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nellie c… | — | — | — | ||
| 698 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sofia an… | — | — | — | ||
| 699 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The mayo… | — | — | — | ||
| 700 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Parker a… | — | — | — | ||
| 701 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jill is … | — | — | — | ||
| 702 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark con… | — | — | — | ||
| 703 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dec… | — | — | — | ||
| 704 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jacob ca… | — | — | — | ||
| 705 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Will… | — | — | — | ||
| 706 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike wat… | — | — | — | ||
| 707 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Angus, P… | — | — | — | ||
| 708 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim watc… | — | — | — | ||
| 709 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A boat c… | — | — | — | ||
| 710 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Among th… | — | — | — | ||
| 711 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a ric… | — | — | — | ||
| 712 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wade is … | — | — | — | ||
| 713 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is … | — | — | — | ||
| 714 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy ca… | — | — | — | ||
| 715 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lena pla… | — | — | — | ||
| 716 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Matt can… | — | — | — | ||
| 717 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Austin a… | — | — | — | ||
| 718 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dec… | — | — | — | ||
| 719 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The tall… | — | — | — | ||
| 720 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A soccer… | — | — | — | ||
| 721 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Molly go… | — | — | — | ||
| 722 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emma has… | — | — | — | ||
| 723 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bryan bo… | — | — | — | ||
| 724 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 5 years … | — | — | — | ||
| 725 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One of t… | — | — | — | ||
| 726 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A vendor… | — | — | — | ||
| 727 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bob grew… | — | — | — | ||
| 728 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy har… | — | — | — | ||
| 729 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lennon i… | — | — | — | ||
| 730 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Four run… | — | — | — | ||
| 731 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kayla an… | — | — | — | ||
| 732 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 733 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three to… | — | — | — | ||
| 734 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Luther m… | — | — | — | ||
| 735 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The bask… | — | — | — | ||
| 736 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When Nat… | — | — | — | ||
| 737 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pie sh… | — | — | — | ||
| 738 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mo is bu… | — | — | — | ||
| 739 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The scho… | — | — | — | ||
| 740 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darrel h… | — | — | — | ||
| 741 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 742 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A party … | — | — | — | ||
| 743 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A shopke… | — | — | — | ||
| 744 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An avant… | — | — | — | ||
| 745 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ian had … | — | — | — | ||
| 746 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom need… | — | — | — | ||
| 747 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jean the… | — | — | — | ||
| 748 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christy … | — | — | — | ||
| 749 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a bir… | — | — | — | ||
| 750 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A vampir… | — | — | — | ||
| 751 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It's Yve… | — | — | — | ||
| 752 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Olaf is … | — | — | — | ||
| 753 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim has … | — | — | — | ||
| 754 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Silvio w… | — | — | — | ||
| 755 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Laura wa… | — | — | — | ||
| 756 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gina has… | — | — | — | ||
| 757 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 1,800 fi… | — | — | — | ||
| 758 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amy bake… | — | — | — | ||
| 759 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A spiral… | — | — | — | ||
| 760 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martin r… | — | — | — | ||
| 761 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Frida… | — | — | — | ||
| 762 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a fac… | — | — | — | ||
| 763 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Colin ra… | — | — | — | ||
| 764 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Gar… | — | — | — | ||
| 765 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Vincent'… | — | — | — | ||
| 766 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Droid ow… | — | — | — | ||
| 767 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 768 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark is … | — | — | — | ||
| 769 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kelly, B… | — | — | — | ||
| 770 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jangshe … | — | — | — | ||
| 771 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jose thr… | — | — | — | ||
| 772 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wendy ju… | — | — | — | ||
| 773 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A banana… | — | — | — | ||
| 774 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Preston … | — | — | — | ||
| 775 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jenna se… | — | — | — | ||
| 776 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marta is… | — | — | — | ||
| 777 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 778 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tate fin… | — | — | — | ||
| 779 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nancy ha… | — | — | — | ||
| 780 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wyatt's … | — | — | — | ||
| 781 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark had… | — | — | — | ||
| 782 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim host… | — | — | — | ||
| 783 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A swarm … | — | — | — | ||
| 784 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jasper w… | — | — | — | ||
| 785 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An arche… | — | — | — | ||
| 786 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Four cat… | — | — | — | ||
| 787 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The coun… | — | — | — | ||
| 788 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bryan wo… | — | — | — | ||
| 789 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A mounta… | — | — | — | ||
| 790 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The gaug… | — | — | — | ||
| 791 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The gree… | — | — | — | ||
| 792 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Four day… | — | — | — | ||
| 793 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — | ||
| 794 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Black… | — | — | — | ||
| 795 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Cro… | — | — | — | ||
| 796 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Freddie … | — | — | — | ||
| 797 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Quinn ca… | — | — | — | ||
| 798 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 799 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew b… | — | — | — | ||