Samples · lm_eval_harness.gsm8k
Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%
AI evaluation
Generated 2026-05-13 21:38 · claude-sonnet-4-6
Summary
The Qwen3-Coder-Next model reaches a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school math.
Strengths
- Easy and moderately difficult arithmetic problems are solved reliably, with clean working.
- Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
- Zero errors (errors=0): the model never aborts or produces invalid output.
Weaknesses
- Problems with indirect or implicit references are misinterpreted: for example, "running 10% faster" is treated as a time reduction via division by 1.1 rather than as a direct subtraction.
- Off-by-one errors on inclusive time spans (e.g. the Gene quilt-block problem: 12 instead of 11 years).
- Ambiguous problem statements invite over-analysis, which in some cases leads the model to introduce incorrect relations (e.g. Lylah's salary).
- Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
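The interpretation slips listed above can be made concrete with a small sketch. All function names and numbers here are hypothetical and chosen only for illustration; none of them are taken from the benchmark items themselves.

```python
def time_if_faster_by_speed(base_time: float, pct: float) -> float:
    """Read 'pct faster' as a higher speed: time is divided by (1 + pct)."""
    return base_time / (1 + pct)


def time_if_faster_by_subtraction(base_time: float, pct: float) -> float:
    """Read 'pct faster' as less time: subtract pct of the base time."""
    return base_time * (1 - pct)


def years_exclusive(start: int, end: int) -> int:
    """Count the gap between two years (end year not counted as an extra item)."""
    return end - start


def years_inclusive(start: int, end: int) -> int:
    """Count every year from start through end, both included."""
    return end - start + 1


def absolute_difference(p_a: float, p_b: float) -> float:
    """Difference between two probabilities, in probability units."""
    return p_a - p_b


def relative_difference(p_a: float, p_b: float) -> float:
    """Difference between two probabilities, expressed relative to p_b."""
    return (p_a - p_b) / p_b


# Hypothetical inputs: a 110-second run, a 2010..2021 span, 75% vs. 25%.
print(f"{time_if_faster_by_speed(110.0, 0.10):.2f}")
print(f"{time_if_faster_by_subtraction(110.0, 0.10):.2f}")
print(years_exclusive(2010, 2021), years_inclusive(2010, 2021))
print(absolute_difference(0.75, 0.25), relative_difference(0.75, 0.25))
```

In each pair both readings are arithmetically valid; the failure is in picking the reading that the reference answer does not use.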
Observations
Recurring pattern: on problems that call for a single short answer, the model produces lengthy alternative lines of reasoning and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).
Recommendation
Lower the sampling temperature (e.g. to 0.0, which amounts to greedy decoding) to keep the model from exploring speculative alternative paths on clear-cut numeric problems and to push the pass rate further toward 95%+.
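As a rough illustration of why a lower temperature narrows the model's choices, the sketch below applies temperature scaling to a softmax over three hypothetical logits. The logit values and function names are assumptions for demonstration only and do not reflect the settings used in this run. Many decoding implementations treat a temperature of 0 as plain argmax (greedy) selection, which is what `greedy_choice` mimics.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_choice for T = 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_choice(logits):
    """Index of the highest logit, i.e. the limit of sampling as T approaches 0."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.5, 0.5]  # hypothetical next-token scores
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
print("greedy index:", greedy_choice(logits))
```

As the temperature drops, probability mass concentrates on the top-scoring token, so the lower-ranked continuations are sampled less and less often.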
Overview
1319 samples
Distribution
[Score histogram; axis from 0.0 to 1.0]
| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| 0 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jen and … | — | — | — | ||
| 1 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve fi… | — | — | — | ||
| 2 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom can … | — | — | — | ||
| 3 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Half of … | — | — | — | ||
| 4 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Surfers … | — | — | — | ||
| 5 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ivan had… | — | — | — | ||
| 6 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stella w… | — | — | — | ||
| 7 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ravi can… | — | — | — | ||
| 8 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Goldie m… | — | — | — | ||
| 9 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James li… | — | — | — | ||
| 10 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yves and… | — | — | — | ||
| 11 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bi… | — | — | — | ||
| 12 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It will … | — | — | — | ||
| 13 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerome i… | — | — | — | ||
| 14 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James re… | — | — | — | ||
| 15 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tamara i… | — | — | — | ||
| 16 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 17 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bekah ha… | — | — | — | ||
| 18 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A beadsh… | — | — | — | ||
| 19 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter ne… | — | — | — | ||
| 20 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abel lea… | — | — | — | ||
| 21 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aida has… | — | — | — | ||
| 22 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Samantha… | — | — | — | ||
| 23 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake agr… | — | — | — | ||
| 24 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ronald c… | — | — | — | ||
| 25 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Erik's d… | — | — | — | ||
| 26 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Since 19… | — | — | — | ||
| 27 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ines had… | — | — | — | ||
| 28 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eliza ca… | — | — | — | ||
| 29 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Francis … | — | — | — | ||
| 30 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A labora… | — | — | — | ||
| 31 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rob plan… | — | — | — | ||
| 32 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pete wal… | — | — | — | ||
| 33 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A ship l… | — | — | — | ||
| 34 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A fruit … | — | — | — | ||
| 35 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Luther m… | — | — | — | ||
| 36 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy bak… | — | — | — | ||
| 37 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake and… | — | — | — | ||
| 38 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Greg's P… | — | — | — | ||
| 39 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Federal … | — | — | — | ||
| 40 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pound … | — | — | — | ||
| 41 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lindsey … | — | — | — | ||
| 42 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice g… | — | — | — | ||
| 43 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alex is … | — | — | — | ||
| 44 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Karen pa… | — | — | — | ||
| 45 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Danai is… | — | — | — | ||
| 46 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alexande… | — | — | — | ||
| 47 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet ma… | — | — | — | ||
| 48 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anthony … | — | — | — | ||
| 49 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla is… | — | — | — | ||
| 50 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One batc… | — | — | — | ||
| 51 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Belle ea… | — | — | — | ||
| 52 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Sally… | — | — | — | ||
| 53 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Manny is… | — | — | — | ||
| 54 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pete's m… | — | — | — | ||
| 55 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Fabian i… | — | — | — | ||
| 56 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At the z… | — | — | — | ||
| 57 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Maxi… | — | — | — | ||
| 58 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a 50-… | — | — | — | ||
| 59 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy get… | — | — | — | ||
| 60 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school… | — | — | — | ||
| 61 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark hir… | — | — | — | ||
| 62 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry ha… | — | — | — | ||
| 63 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chuck ca… | — | — | — | ||
| 64 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jordan i… | — | — | — | ||
| 65 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate has… | — | — | — | ||
| 66 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kingsley… | — | — | — | ||
| 67 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rebecca … | — | — | — | ||
| 68 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carly ha… | — | — | — | ||
| 69 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A car us… | — | — | — | ||
| 70 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a cit… | — | — | — | ||
| 71 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 72 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bob and … | — | — | — | ||
| 73 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Camille … | — | — | — | ||
| 74 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three bl… | — | — | — | ||
| 75 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jill bou… | — | — | — | ||
| 76 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marj has… | — | — | — | ||
| 77 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bernie l… | — | — | — | ||
| 78 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At footb… | — | — | — | ||
| 79 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim used… | — | — | — | ||
| 80 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It raine… | — | — | — | ||
| 81 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dec… | — | — | — | ||
| 82 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John goe… | — | — | — | ||
| 83 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A superm… | — | — | — | ||
| 84 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lassie e… | — | — | — | ||
| 85 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Katie ha… | — | — | — | ||
| 86 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nancy is… | — | — | — | ||
| 87 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane's g… | — | — | — | ||
| 88 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jane's m… | — | — | — | ||
| 89 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack is … | — | — | — | ||
| 90 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Keenan n… | — | — | — | ||
| 91 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The rate… | — | — | — | ||
| 92 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daisy is… | — | — | — | ||
| 93 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lena is … | — | — | — | ||
| 94 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martha n… | — | — | — | ||
| 95 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Farrah o… | — | — | — | ||
| 96 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Free Chr… | — | — | — | ||
| 97 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is… | — | — | — | ||
| 98 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew h… | — | — | — | ||
| 99 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In the s… | — | — | — | ||
| 100 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah b… | — | — | — | ||
| 101 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Angela's… | — | — | — | ||
| 102 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom can … | — | — | — | ||
| 103 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martin r… | — | — | — | ||
| 104 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lard dec… | — | — | — | ||
| 105 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kimberly… | — | — | — | ||
| 106 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Billy… | — | — | — | ||
| 107 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary wen… | — | — | — | ||
| 108 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice n… | — | — | — | ||
| 109 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For the … | — | — | — | ||
| 110 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John is … | — | — | — | ||
| 111 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Jason… | — | — | — | ||
| 112 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice n… | — | — | — | ||
| 113 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A show d… | — | — | — | ||
| 114 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mitzi br… | — | — | — | ||
| 115 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sho… | — | — | — | ||
| 116 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve fi… | — | — | — | ||
| 117 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daria is… | — | — | — | ||
| 118 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Frank an… | — | — | — | ||
| 119 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — | ||
| 120 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joseph g… | — | — | — | ||
| 121 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dawn ear… | — | — | — | ||
| 122 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla's … | — | — | — | ||
| 123 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anthony … | — | — | — | ||
| 124 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joan is … | — | — | — | ||
| 125 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 126 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Chry… | — | — | — | ||
| 127 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joshua, … | — | — | — | ||
| 128 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Al is 25… | — | — | — | ||
| 129 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 130 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Leticia,… | — | — | — | ||
| 131 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark bui… | — | — | — | ||
| 132 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The thre… | — | — | — | ||
| 133 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Team Soc… | — | — | — | ||
| 134 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gina can… | — | — | — | ||
| 135 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate bou… | — | — | — | ||
| 136 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maddie w… | — | — | — | ||
| 137 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The bask… | — | — | — | ||
| 138 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A newspa… | — | — | — | ||
| 139 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alicia h… | — | — | — | ||
| 140 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Clayton … | — | — | — | ||
| 141 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lena pla… | — | — | — | ||
| 142 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 143 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In 2004,… | — | — | — | ||
| 144 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ali has … | — | — | — | ||
| 145 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sarah wa… | — | — | — | ||
| 146 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Xena is … | — | — | — | ||
| 147 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Smit… | — | — | — | ||
| 148 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 149 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lucas' f… | — | — | — | ||
| 150 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In Dana'… | — | — | — | ||
| 151 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim has … | — | — | — | ||
| 152 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Max was … | — | — | — | ||
| 153 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack ord… | — | — | — | ||
| 154 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Lop… | — | — | — | ||
| 155 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The groc… | — | — | — | ||
| 156 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Tony … | — | — | — | ||
| 157 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — | ||
| 158 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike and… | — | — | — | ||
| 159 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ben has … | — | — | — | ||
| 160 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tino has… | — | — | — | ||
| 161 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rick too… | — | — | — | ||
| 162 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 163 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 164 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Micah ca… | — | — | — | ||
| 165 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark is … | — | — | — | ||
| 166 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A third … | — | — | — | ||
| 167 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juan bou… | — | — | — | ||
| 168 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ellie we… | — | — | — | ||
| 169 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 170 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 171 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy an… | — | — | — | ||
| 172 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kennedy'… | — | — | — | ||
| 173 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a sch… | — | — | — | ||
| 174 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The gove… | — | — | — | ||
| 175 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kekai's … | — | — | — | ||
| 176 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brayden … | — | — | — | ||
| 177 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maggie h… | — | — | — | ||
| 178 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice's… | — | — | — | ||
| 179 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A carpen… | — | — | — | ||
| 180 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aaron pa… | — | — | — | ||
| 181 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The area… | — | — | — | ||
| 182 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elvis an… | — | — | — | ||
| 183 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A vegan … | — | — | — | ||
| 184 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gilbert … | — | — | — | ||
| 185 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 186 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — | ||
| 187 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 188 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nurse Mi… | — | — | — | ||
| 189 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ibrahim … | — | — | — | ||
| 190 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill dec… | — | — | — | ||
| 191 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After Be… | — | — | — | ||
| 192 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A typica… | — | — | — | ||
| 193 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James du… | — | — | — | ||
| 194 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christi … | — | — | — | ||
| 195 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Monica i… | — | — | — | ||
| 196 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johns go… | — | — | — | ||
| 197 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy… | — | — | — | ||
| 198 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The nove… | — | — | — | ||
| 199 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark bou… | — | — | — | ||
200 of 1319 samples · limit 200
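For orientation, the pass rate of the rows shown on this page can be tallied as in the sketch below. The failed IDs are read from the Status column of the table above; the variable and function names are hypothetical. Because the page covers only the first 200 of 1319 samples, the result describes this page only and is not directly comparable to the run-wide figures in the summary.

```python
# Question IDs marked "failed" in the table on this page.
FAILED_IDS = {58, 85, 93, 119, 137, 147, 159, 184, 187}
SHOWN_SAMPLES = 200   # rows listed on this page
TOTAL_SAMPLES = 1319  # samples in the full run


def pass_rate(n_shown: int, n_failed: int) -> float:
    """Fraction of the shown samples whose status is 'passed'."""
    return (n_shown - n_failed) / n_shown


page_rate = pass_rate(SHOWN_SAMPLES, len(FAILED_IDS))
coverage = SHOWN_SAMPLES / TOTAL_SAMPLES

print(f"passed on this page: {SHOWN_SAMPLES - len(FAILED_IDS)}/{SHOWN_SAMPLES} ({page_rate:.1%})")
print(f"share of the run covered by this page: {coverage:.1%}")
```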