Samples · lm_eval_harness.gsm8k

Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%

AI Evaluation

Generated 2026-05-13 21:38 · claude-sonnet-4-6

Summary

The model Qwen3-Coder-Next achieves a pass rate of 92.5% on GSM8K (score 90.4%), a solid but not outstanding result for multi-step grade-school math.

Strengths

  • Simple and moderately difficult arithmetic problems are solved reliably, with clean step-by-step working.
  • Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
  • Zero errors (errors=0): the model never aborts or produces invalid output.

Weaknesses

  • Problems with indirect or implicit references are misinterpreted; for example, "running 10% faster" is treated as a time reduction by dividing by 1.1 rather than as a direct subtraction.
  • Off-by-one errors with inclusive time spans (e.g., the Gene quilt-block problem: 12 instead of 11 years).
  • Ambiguous problem statements invite over-analysis, leading the model in some cases to introduce incorrect relations (e.g., Lylah's salary).
  • Probability problems: the model computes correctly but misreads the question (relative instead of absolute difference).
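
The interpretation errors listed above each come down to a small arithmetic difference between two readings. The sketch below contrasts those readings side by side. All inputs (the baseline time, the years, the probabilities) are hypothetical values chosen for illustration and are not taken from the GSM8K items themselves.

```python
# Hypothetical inputs; none of these values come from the GSM8K items.

def time_speed_scaled(base_time, pct):
    """Read "pct % faster" as a speed increase: divide the time by (1 + pct/100)."""
    return base_time / (1 + pct / 100)

def time_subtracted(base_time, pct):
    """Read "pct % faster" as a direct reduction: subtract pct % of the time."""
    return base_time * (1 - pct / 100)

def span_exclusive(start_year, end_year):
    """Years elapsed between the two years."""
    return end_year - start_year

def span_inclusive(start_year, end_year):
    """Years counted when both the first and the last year are included."""
    return end_year - start_year + 1

def absolute_difference(p_a, p_b):
    """Difference between two probabilities, in percentage points."""
    return (p_a - p_b) * 100

def relative_difference(p_a, p_b):
    """Difference between two probabilities relative to p_b, in percent."""
    return (p_a - p_b) / p_b * 100

print(time_speed_scaled(11.0, 10), time_subtracted(11.0, 10))
print(span_exclusive(2012, 2023), span_inclusive(2012, 2023))
print(absolute_difference(0.75, 0.5), relative_difference(0.75, 0.5))
```

In each pair the two readings give different final numbers, so a single misreading is enough to fail an exact-match check even when every calculation step is carried out correctly.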

Notable Observations

Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative considerations and misses the simple result being asked for. This points to a tendency to over-answer (verbosity bias).

Recommendation

Lower the sampling temperature (e.g., to 0.0 or greedy decoding) to keep the model from pursuing speculative alternative paths on clear-cut numeric problems and to push the pass rate further toward 95%+.
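
A minimal sketch of why lowering the temperature approaches greedy decoding, assuming standard softmax sampling over next-token logits. The logits and function names are hypothetical and are not taken from this run's configuration or from any specific inference library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for the logits at a temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_choice(logits):
    """Index of the highest logit, i.e. what greedy decoding would pick."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.5, 0.5]  # hypothetical next-token logits

# As the temperature shrinks, probability mass concentrates on the top token.
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print("greedy index:", greedy_choice(logits))
```

A temperature of exactly 0 would divide by zero in this formulation, which is why implementations commonly treat it as a special case that selects the highest-scoring token directly.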

Overview

1319 samples
Distribution: 1246 passed, 73 failed
Score histogram
0 – 0.1: 73
0.1 – 0.2: 0
0.2 – 0.3: 0
0.3 – 0.4: 0
0.4 – 0.5: 0
0.5 – 0.6: 0
0.6 – 0.7: 0
0.7 – 0.8: 0
0.8 – 0.9: 0
0.9 – 1: 1246
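
The distribution counts above can be turned into an overall pass rate with quick arithmetic. The sketch below assumes that the 1246 samples in the top score bin are the passed ones and the 73 in the bottom bin are the failed ones. The result need not match the percentages quoted in the summary, which may have been computed over a different subset or metric.

```python
# Counts taken from the score histogram; the passed/failed mapping is an assumption.
passed = 1246  # samples in the 0.9 - 1 bin
failed = 73    # samples in the 0 - 0.1 bin

total = passed + failed
pass_rate = passed / total
fail_rate = failed / total

print(f"total samples: {total}")
print(f"pass rate: {pass_rate:.1%}")
print(f"fail rate: {fail_rate:.1%}")
```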
Question ID Status Score Prompt Latency Tokens/s TTFT
1000 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rose bou…
1001 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Thomas h…
1002 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nala fou…
1003 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John's g…
1004 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ray buys…
1005 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Calvin a…
1006 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jacob ca…
1007 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lidia bo…
1008 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The aver…
1009 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A man de…
1010 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason is…
1011 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rebecca …
1012 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Peri…
1013 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Quinn ca…
1014 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A footba…
1015 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nancy bu…
1016 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ian is l…
1017 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carrie i…
1018 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla is…
1019 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A farmer…
1020 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emma buy…
1021 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom orig…
1022 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ken crea…
1023 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pie sh…
1024 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A car us…
1025 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It\u2019…
1026 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Milly is…
1027 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Barney c…
1028 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Bodh…
1029 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daragh h…
1030 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christia…
1031 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Josie's …
1032 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On a far…
1033 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A baker …
1034 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The age …
1035 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kobe and…
1036 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
1037 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ca…
1038 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bag of…
1039 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea…
1040 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The tota…
1041 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes…
1042 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kobe and…
1043 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lilith i…
1044 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nine hun…
1045 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daria is…
1046 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Megan pa…
1047 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Every ye…
1048 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John cli…
1049 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michelle…
1050 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joseph a…
1051 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After a …
1052 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: LaKeisha…
1053 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One of t…
1054 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brenda v…
1055 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Chantal …
1056 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Celina e…
1057 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Mitc…
1058 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In 5 yea…
1059 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Smaug th…
1060 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martha h…
1061 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If Sally…
1062 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a gym…
1063 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Matt is …
1064 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two-thir…
1065 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice h…
1066 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Phoebe e…
1067 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes…
1068 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John sco…
1069 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After sh…
1070 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A crayon…
1071 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a CD …
1072 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy…
1073 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Christin…
1074 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In five …
1075 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Uncle Ju…
1076 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marianne…
1077 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Helen cu…
1078 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jackie w…
1079 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hattie a…
1080 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pima inv…
1081 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Beatrice…
1082 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Samara a…
1083 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A train …
1084 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
1085 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On a roa…
1086 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Austin h…
1087 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
1088 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark buy…
1089 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Archie i…
1090 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy…
1091 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wilson g…
1092 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Thomas h…
1093 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
1094 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A craft …
1095 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabella…
1096 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At the B…
1097 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 20 birds…
1098 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James sp…
1099 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Neeley b…
1100 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jill spe…
1101 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Next yea…
1102 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason's …
1103 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
1104 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amber is…
1105 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The cost…
1106 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dec…
1107 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kelsey h…
1108 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom rent…
1109 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wanda we…
1110 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A clinic…
1111 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bobby ne…
1112 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school…
1113 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul is …
1114 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ian used…
1115 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John goe…
1116 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Grace ju…
1117 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Victoria…
1118 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill's r…
1119 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Each mem…
1120 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bob orde…
1121 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stormi i…
1122 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason go…
1123 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dino doe…
1124 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: After sh…
1125 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carl has…
1126 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A hotel …
1127 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hansel h…
1128 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Megan is…
1129 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Caid…
1130 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ittymang…
1131 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Building…
1132 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark has…
1133 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The bala…
1134 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A city h…
1135 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Veronica…
1136 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darcy wa…
1137 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An autho…
1138 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kate sav…
1139 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve ow…
1140 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nurse Mi…
1141 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John has…
1142 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James do…
1143 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The numb…
1144 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom read…
1145 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Winston …
1146 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James pa…
1147 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Borgnine…
1148 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Milena i…
1149 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael\…
1150 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juan bou…
1151 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a com…
1152 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When Mic…
1153 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Myrtle\u…
1154 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake has…
1155 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet ha…
1156 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marcus c…
1157 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school…
1158 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John eat…
1159 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: During t…
1160 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Evan\u20…
1161 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike beg…
1162 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The ski …
1163 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jimmy is…
1164 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Louise i…
1165 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To make …
1166 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A class …
1167 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we…
1168 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bag wa…
1169 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Vanessa …
1170 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ginger l…
1171 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason is…
1172 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry wa…
1173 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Clementi…
1174 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul nee…
1175 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marnie o…
1176 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu…
1177 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The outd…
1178 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A class …
1179 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary Ann…
1180 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Willie c…
1181 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last nig…
1182 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Care and…
1183 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: While ch…
1184 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cora sta…
1185 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stuart i…
1186 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John ass…
1187 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Frank ha…
1188 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ellen is…
1189 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hans boo…
1190 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An apart…
1191 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Josh has…
1192 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Patricia…
1193 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hershel …
1194 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James wa…
1195 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Sanc…
1196 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janice a…
1197 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Yulia wa…
1198 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To make …
1199 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Quentavi…
200 of 1319 samples · Limit 200