Samples · lm_eval_harness.gsm8k
AI evaluation
Generated 2026-05-12 19:41 · claude-sonnet-4-6
Summary
The model `mlx-community/Qwen3-Coder-Next` achieves a pass rate of 87.6% on GSM8K across 2638 scored tasks, showing solid but not flawless performance on multi-step grade-school math problems.
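How a pass rate of this kind follows from per-sample status rows can be sketched as below. This is a minimal illustration and not the dashboard's own code: the names `tally_status` and `pass_rate` and the shortened sample rows are assumptions made for the example.

```python
import re
from collections import Counter

# Matches the start of a result row such as "| 12 | failed | ...".
ROW = re.compile(r"^\|\s*(\d+)\s*\|\s*(passed|failed)\s*\|")


def tally_status(lines):
    """Count passed/failed rows, ignoring anything that is not a result row."""
    counts = Counter()
    for line in lines:
        match = ROW.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def pass_rate(counts):
    """Share of passed rows among all counted rows (0.0 if there are none)."""
    total = counts["passed"] + counts["failed"]
    return counts["passed"] / total if total else 0.0


# Hypothetical, shortened rows for illustration only.
rows = [
    "| 0 | passed | ... |",
    "| 1 | passed | ... |",
    "| 2 | failed | ... |",
    "| 3 | passed | ... |",
]
counts = tally_status(rows)
print(counts["passed"], counts["failed"], f"{pass_rate(counts):.1%}")
```

Lines that do not start with a numeric ID and a status are skipped, so widget residue between rows does not distort the count.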
Strengths
- Correct solutions on simple to moderately complex multi-step problems (percentages, unit conversion, linear equations)
- Structured step-by-step presentation with clear intermediate results
- No runtime errors (0 errors); the model always returns an answer
Weaknesses
- Errors on problems that require backward reasoning (reverse-engineering starting values, e.g. the vacuum cleaner problem)
- Errors on ambiguous percentage statements: the model interprets "time improvement from 10% more speed" as an inverse relationship rather than a direct reduction
- Off-by-one errors in year calculations (Gene quilt problem: 12 instead of 11 years)
- Occasionally truncated answers (the response ends mid-sentence or mid-calculation), which may point to token-limit problems
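Two of the ambiguity patterns in the list above can be made concrete with a short sketch. All numbers are hypothetical and are not taken from the benchmark items; the function names are illustrative only.

```python
def time_inverse(base_time: float, speedup_pct: float) -> float:
    """Reading 1: time scales inversely with speed, t' = t / (1 + p)."""
    return base_time / (1 + speedup_pct / 100)


def time_direct(base_time: float, speedup_pct: float) -> float:
    """Reading 2: time is cut by the stated percentage, t' = t * (1 - p)."""
    return base_time * (1 - speedup_pct / 100)


def years_exclusive(start: int, end: int) -> int:
    """Full years elapsed between start and end."""
    return end - start


def years_inclusive(start: int, end: int) -> int:
    """Calendar years touched, counting both the first and the last one."""
    return end - start + 1


# Hypothetical values: a 100-second task and a 10% speed gain.
print(round(time_inverse(100, 10), 2), round(time_direct(100, 10), 2))

# Hypothetical span from 2000 to 2011: the two counts differ by one.
print(years_exclusive(2000, 2011), years_inclusive(2000, 2011))
```

Both readings are arithmetically valid. A reference answer fixes one of them, so a model that picks the other is scored as failed even when every calculation step is correct.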
Notable observations
Recurring pattern: the model computes correctly but draws the wrong conclusion at the end, particularly on problems that presuppose implicit conventions (e.g. "value rises by X%" relative to the purchase price vs. the repair costs). In addition, some responses are cut off, which suggests that the `max_tokens` limit is set too low.
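The effect of choosing a different base for a percentage increase can be illustrated as follows. The figures are hypothetical and are not the values from the benchmark item; `value_after_increase` is a name introduced only for this sketch.

```python
def value_after_increase(purchase_price: float, base: float, pct: float) -> float:
    """New value when the increase is pct percent of the chosen base amount."""
    return purchase_price + base * pct / 100


# Hypothetical figures: purchase price 100,000, repairs 20,000, increase 50%.
purchase, repairs, pct = 100_000, 20_000, 50

relative_to_purchase = value_after_increase(purchase, purchase, pct)
relative_to_repairs = value_after_increase(purchase, repairs, pct)

print(relative_to_purchase, relative_to_repairs)
```

The arithmetic is identical in both calls; only the base differs, which is why a correct calculation can still end in a conclusion the reference answer rejects.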
Recommendation
Increase `max_new_tokens` to avoid truncated answers, then re-evaluate the subset of problems involving percentages and backward reasoning to check whether the error rate there is disproportionately high.
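A rough way to follow up on this recommendation is sketched below: one heuristic flags responses that look truncated, and one helper computes the failure rate per keyword bucket. The record fields (`question`, `status`), the final-answer markers, and the keyword patterns are all assumptions for illustration and need to be adapted to the actual sample format.

```python
import re
from collections import defaultdict

# Assumed final-answer markers; adjust to the prompt format actually used.
FINAL_ANSWER = re.compile(r"(####|answer is)\s*\$?-?\d", re.IGNORECASE)
TERMINAL = (".", "!", "?", ")", "]", "}", '"', "'")


def looks_truncated(response: str) -> bool:
    """Heuristic: no final-answer marker and no closing punctuation at the end."""
    text = response.rstrip()
    if not text:
        return True
    if FINAL_ANSWER.search(text):
        return False
    return not text.endswith(TERMINAL)


# Hypothetical keyword buckets for the two problem types named above.
BUCKETS = {
    "percent": re.compile(r"%|percent", re.IGNORECASE),
    "reverse": re.compile(r"\b(originally|at first|to begin with)\b", re.IGNORECASE),
}


def failure_rate_by_bucket(samples):
    """Return {bucket: failed / total} for every bucket with at least one hit."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [failed, total]
    for sample in samples:
        for name, pattern in BUCKETS.items():
            if pattern.search(sample["question"]):
                counts[name][1] += 1
                if sample["status"] == "failed":
                    counts[name][0] += 1
    return {name: failed / total for name, (failed, total) in counts.items()}
```

Comparing each bucket's rate with the overall failure rate shows whether a problem type is over-represented among the failures. Keyword matching is coarse, so the buckets should be spot-checked by hand before drawing conclusions.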
Overview
1319 samples

| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT |
|---|---|---|---|---|---|---|
| 0 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jen and … | — | — | — |
| 1 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve fi… | — | — | — |
| 2 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom can … | — | — | — |
| 3 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Half of … | — | — | — |
| 4 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Surfers … | — | — | — |
| 5 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ivan had… | — | — | — |
| 6 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stella w… | — | — | — |
| 7 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ravi can… | — | — | — |
| 8 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Goldie m… | — | — | — |
| 9 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James li… | — | — | — |
| 10 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yves and… | — | — | — |
| 11 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bi… | — | — | — |
| 12 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It will … | — | — | — |
| 13 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerome i… | — | — | — |
| 14 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James re… | — | — | — |
| 15 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tamara i… | — | — | — |
| 16 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 17 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bekah ha… | — | — | — |
| 18 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A beadsh… | — | — | — |
| 19 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter ne… | — | — | — |
| 20 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abel lea… | — | — | — |
| 21 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aida has… | — | — | — |
| 22 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Samantha… | — | — | — |
| 23 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake agr… | — | — | — |
| 24 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ronald c… | — | — | — |
| 25 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Erik's d… | — | — | — |
| 26 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Since 19… | — | — | — |
| 27 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ines had… | — | — | — |
| 28 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eliza ca… | — | — | — |
| 29 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Francis … | — | — | — |
| 30 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A labora… | — | — | — |
| 31 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rob plan… | — | — | — |
| 32 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pete wal… | — | — | — |
| 33 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A ship l… | — | — | — |
| 34 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A fruit … | — | — | — |
| 35 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Luther m… | — | — | — |
| 36 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy bak… | — | — | — |
| 37 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake and… | — | — | — |
| 38 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Greg's P… | — | — | — |
| 39 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Federal … | — | — | — |
| 40 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pound … | — | — | — |
| 41 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lindsey … | — | — | — |
| 42 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice g… | — | — | — |
| 43 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alex is … | — | — | — |
| 44 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Karen pa… | — | — | — |
| 45 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Danai is… | — | — | — |
| 46 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alexande… | — | — | — |
| 47 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet ma… | — | — | — |
| 48 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anthony … | — | — | — |
| 49 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla is… | — | — | — |
| 50 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One batc… | — | — | — |
| 51 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Belle ea… | — | — | — |
| 52 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Sally… | — | — | — |
| 53 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Manny is… | — | — | — |
| 54 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pete's m… | — | — | — |
| 55 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Fabian i… | — | — | — |
| 56 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At the z… | — | — | — |
| 57 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Maxi… | — | — | — |
| 58 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a 50-… | — | — | — |
| 59 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy get… | — | — | — |
| 60 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school… | — | — | — |
| 61 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark hir… | — | — | — |
| 62 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry ha… | — | — | — |
| 63 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chuck ca… | — | — | — |
| 64 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jordan i… | — | — | — |
| 65 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate has… | — | — | — |
| 66 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kingsley… | — | — | — |
| 67 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rebecca … | — | — | — |
| 68 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carly ha… | — | — | — |
| 69 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A car us… | — | — | — |
| 70 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a cit… | — | — | — |
| 71 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 72 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bob and … | — | — | — |
| 73 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Camille … | — | — | — |
| 74 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three bl… | — | — | — |
| 75 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jill bou… | — | — | — |
| 76 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marj has… | — | — | — |
| 77 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bernie l… | — | — | — |
| 78 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At footb… | — | — | — |
| 79 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim used… | — | — | — |
| 80 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It raine… | — | — | — |
| 81 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dec… | — | — | — |
| 82 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John goe… | — | — | — |
| 83 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A superm… | — | — | — |
| 84 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lassie e… | — | — | — |
| 85 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Katie ha… | — | — | — |
| 86 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nancy is… | — | — | — |
| 87 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane's g… | — | — | — |
| 88 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane's m… | — | — | — |
| 89 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack is … | — | — | — |
| 90 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Keenan n… | — | — | — |
| 91 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The rate… | — | — | — |
| 92 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daisy is… | — | — | — |
| 93 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lena is … | — | — | — |
| 94 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martha n… | — | — | — |
| 95 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Farrah o… | — | — | — |
| 96 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Free Chr… | — | — | — |
| 97 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is… | — | — | — |
| 98 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew h… | — | — | — |
| 99 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In the s… | — | — | — |
| 100 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah b… | — | — | — |
| 101 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Angela's… | — | — | — |
| 102 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom can … | — | — | — |
| 103 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martin r… | — | — | — |
| 104 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lard dec… | — | — | — |
| 105 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kimberly… | — | — | — |
| 106 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Billy… | — | — | — |
| 107 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary wen… | — | — | — |
| 108 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice n… | — | — | — |
| 109 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For the … | — | — | — |
| 110 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John is … | — | — | — |
| 111 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Jason… | — | — | — |
| 112 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice n… | — | — | — |
| 113 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A show d… | — | — | — |
| 114 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mitzi br… | — | — | — |
| 115 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sho… | — | — | — |
| 116 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve fi… | — | — | — |
| 117 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daria is… | — | — | — |
| 118 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Frank an… | — | — | — |
| 119 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — |
| 120 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joseph g… | — | — | — |
| 121 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dawn ear… | — | — | — |
| 122 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla's … | — | — | — |
| 123 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anthony … | — | — | — |
| 124 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joan is … | — | — | — |
| 125 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 126 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Chry… | — | — | — |
| 127 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joshua, … | — | — | — |
| 128 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Al is 25… | — | — | — |
| 129 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 130 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Leticia,… | — | — | — |
| 131 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark bui… | — | — | — |
| 132 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The thre… | — | — | — |
| 133 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Team Soc… | — | — | — |
| 134 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gina can… | — | — | — |
| 135 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate bou… | — | — | — |
| 136 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maddie w… | — | — | — |
| 137 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The bask… | — | — | — |
| 138 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A newspa… | — | — | — |
| 139 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alicia h… | — | — | — |
| 140 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Clayton … | — | — | — |
| 141 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lena pla… | — | — | — |
| 142 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — |
| 143 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In 2004,… | — | — | — |
| 144 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ali has … | — | — | — |
| 145 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sarah wa… | — | — | — |
| 146 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Xena is … | — | — | — |
| 147 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Smit… | — | — | — |
| 148 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — |
| 149 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lucas' f… | — | — | — |
| 150 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In Dana'… | — | — | — |
| 151 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim has … | — | — | — |
| 152 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Max was … | — | — | — |
| 153 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack ord… | — | — | — |
| 154 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Lop… | — | — | — |
| 155 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The groc… | — | — | — |
| 156 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Tony … | — | — | — |
| 157 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — |
| 158 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike and… | — | — | — |
| 159 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ben has … | — | — | — |
| 160 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tino has… | — | — | — |
| 161 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rick too… | — | — | — |
| 162 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 163 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — |
| 164 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Micah ca… | — | — | — |
| 165 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark is … | — | — | — |
| 166 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A third … | — | — | — |
| 167 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juan bou… | — | — | — |
| 168 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ellie we… | — | — | — |
| 169 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — |
| 170 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — |
| 171 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy an… | — | — | — |
| 172 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kennedy'… | — | — | — |
| 173 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sch… | — | — | — |
| 174 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The gove… | — | — | — |
| 175 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kekai's … | — | — | — |
| 176 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brayden … | — | — | — |
| 177 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maggie h… | — | — | — |
| 178 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice's… | — | — | — |
| 179 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A carpen… | — | — | — |
| 180 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aaron pa… | — | — | — |
| 181 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The area… | — | — | — |
| 182 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elvis an… | — | — | — |
| 183 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A vegan … | — | — | — |
| 184 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gilbert … | — | — | — |
| 185 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — |
| 186 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — |
| 187 | failed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — |
| 188 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nurse Mi… | — | — | — |
| 189 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ibrahim … | — | — | — |
| 190 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill dec… | — | — | — |
| 191 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After Be… | — | — | — |
| 192 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A typica… | — | — | — |
| 193 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James du… | — | — | — |
| 194 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christi … | — | — | — |
| 195 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Monica i… | — | — | — |
| 196 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johns go… | — | — | — |
| 197 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — |
| 198 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The nove… | — | — | — |
| 199 | passed | {…} | {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark bou… | — | — | — |