Samples · lm_eval_harness.gsm8k
Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%
AI evaluation
Generated 2026-05-13 21:38 · claude-sonnet-4-6
Summary
The Qwen3-Coder-Next model reaches a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school mathematics.
Strengths
- Simple and moderately difficult arithmetic problems are solved reliably, with clean working.
- Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
- Zero errors (errors=0): the model never aborts or produces invalid output.
Weaknesses
- Problems with indirect or implicit references are misinterpreted; for example, "running 10% faster" is treated as a time reduction by a divisor of 1.1 rather than as a direct subtraction.
- Off-by-one errors with inclusive time spans (e.g., the Gene quilt-block problem: 12 instead of 11 years).
- Ambiguous problem statements invite over-analysis, which sometimes leads the model to introduce incorrect relations (e.g., Lylah's salary).
- Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
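The numeric pitfalls listed above can be made concrete with a short sketch. All values used here (a 100-second baseline, the years 2000 and 2011, probabilities of 0.50 and 0.25) are hypothetical and are not taken from the benchmark items; the function names are likewise illustrative. The sketch only shows how far the two readings of each question can diverge.

```python
# Illustrative only: every number below is hypothetical, not a GSM8K value.

def time_if_speed_up(base_time: float, pct: float) -> float:
    """Reading A: speed rises by pct, so the time is divided by (1 + pct)."""
    return base_time / (1 + pct)


def time_if_subtracted(base_time: float, pct: float) -> float:
    """Reading B: pct of the original time is subtracted directly."""
    return base_time * (1 - pct)


def span_exclusive(start: int, end: int) -> int:
    """Number of steps from start to end (end - start)."""
    return end - start


def span_inclusive(start: int, end: int) -> int:
    """Number of items when both endpoints are counted."""
    return end - start + 1


def absolute_difference(p: float, q: float) -> float:
    """Difference in probability points."""
    return p - q


def relative_difference(p: float, q: float) -> float:
    """Difference expressed as a fraction of q."""
    return (p - q) / q


if __name__ == "__main__":
    print(round(time_if_speed_up(100.0, 0.10), 2))    # divisor reading
    print(round(time_if_subtracted(100.0, 0.10), 2))  # subtraction reading
    print(span_exclusive(2000, 2011), span_inclusive(2000, 2011))
    print(absolute_difference(0.50, 0.25), relative_difference(0.50, 0.25))
```

In each pair the arithmetic is trivial; the answer depends on which reading the question intends, which fits the observation that the model computes correctly but answers a different question.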
Notable observations
Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative lines of reasoning and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).
Recommendation
Lower the sampling temperature (e.g., to 0.0, or use greedy decoding) to keep the model from pursuing speculative alternative paths on clear-cut numerical problems and to push the pass rate further toward 95%+.
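To illustrate why a lower temperature reduces speculative alternatives, here is a minimal toy sketch of temperature scaling versus greedy decoding. It does not use the evaluation harness or any model API; the logits and function names are hypothetical, and a real decoder works over a full vocabulary at every step.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the result."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy_pick for the 0.0 case")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always choose the highest-scoring candidate."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Draw one candidate index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
    rng = random.Random(0)
    for t in (1.0, 0.1):
        probs = softmax_with_temperature(toy_logits, t)
        print(f"T={t}: {[round(p, 3) for p in probs]}")
    print("sampled at T=1.0:", sample_pick(toy_logits, 1.0, rng))
    print("greedy:", greedy_pick(toy_logits))
```

At a temperature of 1.0 the second-best candidate keeps a substantial share of the probability mass, whereas at 0.1 nearly all of it sits on the top candidate. Greedy decoding is the limiting case in which the top candidate is always chosen.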
Overview
1319 samples
Distribution
Score histogram (axis from 0.0 to 1.0)
| Question ID | Status | Score | Prompt | Latency | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| 200 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Matias i… | — | — | — | ||
| 201 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kylie ha… | — | — | — | ||
| 202 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Denny is… | — | — | — | ||
| 203 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juniper,… | — | — | — | ||
| 204 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alex was… | — | — | — | ||
| 205 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Libby ha… | — | — | — | ||
| 206 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu… | — | — | — | ||
| 207 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John com… | — | — | — | ||
| 208 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabel w… | — | — | — | ||
| 209 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill has… | — | — | — | ||
| 210 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A town i… | — | — | — | ||
| 211 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ariana i… | — | — | — | ||
| 212 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bert fil… | — | — | — | ||
| 213 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eight pe… | — | — | — | ||
| 214 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sammy ha… | — | — | — | ||
| 215 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The sum … | — | — | — | ||
| 216 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The mini… | — | — | — | ||
| 217 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael … | — | — | — | ||
| 218 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The popu… | — | — | — | ||
| 219 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Albert h… | — | — | — | ||
| 220 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A teache… | — | — | — | ||
| 221 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A window… | — | — | — | ||
| 222 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A footba… | — | — | — | ||
| 223 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amy, Jer… | — | — | — | ||
| 224 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake spl… | — | — | — | ||
| 225 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Zoe want… | — | — | — | ||
| 226 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu… | — | — | — | ||
| 227 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Edward h… | — | — | — | ||
| 228 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla's … | — | — | — | ||
| 229 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John is … | — | — | — | ||
| 230 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When you… | — | — | — | ||
| 231 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A class … | — | — | — | ||
| 232 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Josie's … | — | — | — | ||
| 233 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet ma… | — | — | — | ||
| 234 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Big Joe … | — | — | — | ||
| 235 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Melody n… | — | — | — | ||
| 236 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 14 less … | — | — | — | ||
| 237 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Claire w… | — | — | — | ||
| 238 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An aqua … | — | — | — | ||
| 239 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Robby, J… | — | — | — | ||
| 240 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bear i… | — | — | — | ||
| 241 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a sch… | — | — | — | ||
| 242 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cristine… | — | — | — | ||
| 243 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For brea… | — | — | — | ||
| 244 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dri… | — | — | — | ||
| 245 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes… | — | — | — | ||
| 246 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dante ne… | — | — | — | ||
| 247 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marcel g… | — | — | — | ||
| 248 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Baking i… | — | — | — | ||
| 249 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack lea… | — | — | — | ||
| 250 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jessie w… | — | — | — | ||
| 251 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cindy wa… | — | — | — | ||
| 252 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Miley bo… | — | — | — | ||
| 253 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Barbara … | — | — | — | ||
| 254 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A desert… | — | — | — | ||
| 255 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A gift s… | — | — | — | ||
| 256 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The elec… | — | — | — | ||
| 257 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 258 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Alon… | — | — | — | ||
| 259 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A movie … | — | — | — | ||
| 260 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John can… | — | — | — | ||
| 261 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Blake go… | — | — | — | ||
| 262 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jenny is… | — | — | — | ||
| 263 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aria has… | — | — | — | ||
| 264 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Haley ha… | — | — | — | ||
| 265 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea… | — | — | — | ||
| 266 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Karen is… | — | — | — | ||
| 267 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bird i… | — | — | — | ||
| 268 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lilibeth… | — | — | — | ||
| 269 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lucia is… | — | — | — | ||
| 270 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Summer a… | — | — | — | ||
| 271 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A tank c… | — | — | — | ||
| 272 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 273 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John wri… | — | — | — | ||
| 274 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sab and … | — | — | — | ||
| 275 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 276 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Shem mak… | — | — | — | ||
| 277 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A farmer… | — | — | — | ||
| 278 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jesse re… | — | — | — | ||
| 279 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sally se… | — | — | — | ||
| 280 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On his c… | — | — | — | ||
| 281 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Enrique … | — | — | — | ||
| 282 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Simon ha… | — | — | — | ||
| 283 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Berry is… | — | — | — | ||
| 284 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joanna a… | — | — | — | ||
| 285 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jay & Gl… | — | — | — | ||
| 286 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla, K… | — | — | — | ||
| 287 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rajesh w… | — | — | — | ||
| 288 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
| 289 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake is … | — | — | — | ||
| 290 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jonas is… | — | — | — | ||
| 291 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is… | — | — | — | ||
| 292 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John and… | — | — | — | ||
| 293 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eric dec… | — | — | — | ||
| 294 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maria bo… | — | — | — | ||
| 295 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kris has… | — | — | — | ||
| 296 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The tota… | — | — | — | ||
| 297 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kwame st… | — | — | — | ||
| 298 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three pl… | — | — | — | ||
| 299 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amanda's… | — | — | — | ||
| 300 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve's … | — | — | — | ||
| 301 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Grandma … | — | — | — | ||
| 302 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stephani… | — | — | — | ||
| 303 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Parker i… | — | — | — | ||
| 304 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When thr… | — | — | — | ||
| 305 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Milly is… | — | — | — | ||
| 306 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pet sh… | — | — | — | ||
| 307 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom uses… | — | — | — | ||
| 308 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In the p… | — | — | — | ||
| 309 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Reggie a… | — | — | — | ||
| 310 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darla ha… | — | — | — | ||
| 311 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tommy is… | — | — | — | ||
| 312 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: During o… | — | — | — | ||
| 313 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul mad… | — | — | — | ||
| 314 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peggy is… | — | — | — | ||
| 315 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If it ta… | — | — | — | ||
| 316 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Haley is… | — | — | — | ||
| 317 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason is… | — | — | — | ||
| 318 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A couple… | — | — | — | ||
| 319 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johnny w… | — | — | — | ||
| 320 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Southton… | — | — | — | ||
| 321 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The popu… | — | — | — | ||
| 322 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Lang… | — | — | — | ||
| 323 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeremy b… | — | — | — | ||
| 324 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Shelly a… | — | — | — | ||
| 325 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 326 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On a con… | — | — | — | ||
| 327 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alyssa a… | — | — | — | ||
| 328 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Teairra … | — | — | — | ||
| 329 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dikembe … | — | — | — | ||
| 330 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter an… | — | — | — | ||
| 331 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alfonso … | — | — | — | ||
| 332 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter is… | — | — | — | ||
| 333 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A small … | — | — | — | ||
| 334 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A spider… | — | — | — | ||
| 335 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John wri… | — | — | — | ||
| 336 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pole i… | — | — | — | ||
| 337 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school… | — | — | — | ||
| 338 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ishmael … | — | — | — | ||
| 339 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: George's… | — | — | — | ||
| 340 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three lo… | — | — | — | ||
| 341 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jesse is… | — | — | — | ||
| 342 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark col… | — | — | — | ||
| 343 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lorin ha… | — | — | — | ||
| 344 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sophie's… | — | — | — | ||
| 345 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The pet … | — | — | — | ||
| 346 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pauline … | — | — | — | ||
| 347 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Theo can… | — | — | — | ||
| 348 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 349 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark has… | — | — | — | ||
| 350 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike can… | — | — | — | ||
| 351 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Layla is… | — | — | — | ||
| 352 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha… | — | — | — | ||
| 353 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carol fi… | — | — | — | ||
| 354 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To make … | — | — | — | ||
| 355 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah b… | — | — | — | ||
| 356 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Baking i… | — | — | — | ||
| 357 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carson c… | — | — | — | ||
| 358 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bert mad… | — | — | — | ||
| 359 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tracy se… | — | — | — | ||
| 360 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dana nor… | — | — | — | ||
| 361 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abigail … | — | — | — | ||
| 362 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John has… | — | — | — | ||
| 363 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Summer p… | — | — | — | ||
| 364 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tonya ha… | — | — | — | ||
| 365 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Zarg… | — | — | — | ||
| 366 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew b… | — | — | — | ||
| 367 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One of t… | — | — | — | ||
| 368 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 369 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maya's o… | — | — | — | ||
| 370 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Archie r… | — | — | — | ||
| 371 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abe find… | — | — | — | ||
| 372 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On our l… | — | — | — | ||
| 373 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A train … | — | — | — | ||
| 374 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A printi… | — | — | — | ||
| 375 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Calum ru… | — | — | — | ||
| 376 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anna has… | — | — | — | ||
| 377 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Salaria … | — | — | — | ||
| 378 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Based on… | — | — | — | ||
| 379 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nedy can… | — | — | — | ||
| 380 | failed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Monda… | — | — | — | ||
| 381 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is … | — | — | — | ||
| 382 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom uses… | — | — | — | ||
| 383 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A hand-c… | — | — | — | ||
| 384 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The dist… | — | — | — | ||
| 385 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hartley … | — | — | — | ||
| 386 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeff had… | — | — | — | ||
| 387 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A single… | — | — | — | ||
| 388 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gretchen… | — | — | — | ||
| 389 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar… | — | — | — | ||
| 390 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: TJ ran a… | — | — | — | ||
| 391 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One hund… | — | — | — | ||
| 392 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Seven pa… | — | — | — | ||
| 393 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johnny i… | — | — | — | ||
| 394 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim buys… | — | — | — | ||
| 395 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rich ran… | — | — | — | ||
| 396 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Celina e… | — | — | — | ||
| 397 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Each ban… | — | — | — | ||
| 398 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The numb… | — | — | — | ||
| 399 | passed | {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we… | — | — | — | ||
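A minimal sketch for tallying the Status column of a table like the one above and deriving a pass rate from it. The function names are hypothetical, and the rows in the usage part are made-up placeholders that only mimic the row shape; the sketch assumes the ID is the first cell and the status the second.

```python
from collections import Counter


def tally_statuses(table_lines):
    """Count the second cell (status) of every row whose first cell is a numeric ID."""
    counts = Counter()
    for line in table_lines:
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[0].isdigit():
            counts[cells[1]] += 1
    return counts


def pass_rate(counts):
    """Share of rows marked 'passed'; 0.0 when no rows were counted."""
    total = sum(counts.values())
    return counts["passed"] / total if total else 0.0


if __name__ == "__main__":
    # Placeholder rows for illustration, not real samples from the run.
    demo = [
        "| ID | Status | Prompt |",
        "|---|---|---|",
        "| 1 | passed | example prompt A |",
        "| 2 | failed | example prompt B |",
        "| 3 | passed | example prompt C |",
    ]
    counts = tally_statuses(demo)
    print(dict(counts), round(pass_rate(counts), 3))
```

Header, separator, and non-row lines are skipped because their first cell is not a number, so the function tolerates stray lines between rows.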