Samples · lm_eval_harness.gsm8k

Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 samples shown · Score 90.4%

AI evaluation

Generated 2026-05-13 21:38 · claude-sonnet-4-6

Summary

The model Qwen3-Coder-Next reaches a pass rate of 92.5% on GSM8K (score 90.4%). That is a solid but not outstanding result for multi-step grade-school math.

Strengths

  • Simple and moderately difficult arithmetic problems are solved reliably, with clean worked solutions.
  • Unit conversions and linear multi-step problems (groceries, pool-filling costs, percentages) are handled consistently.
  • Zero errors (errors=0): the model never aborts and never returns invalid output.

Weaknesses

  • Problems with indirect or implicit relationships are misinterpreted. For example, "running 10% faster" is treated as a time reduction (division by 1.1) instead of a direct 10% subtraction.
  • Off-by-one errors occur with inclusive time spans (e.g., the Gene quilt-block problem: 12 instead of 11 years).
  • Ambiguous problem statements invite over-analysis, and the model sometimes introduces incorrect relationships as a result (e.g., Lylah's salary).
  • Probability problems: the model computes correctly but misreads the question, giving a relative difference where an absolute one is asked for.
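The sketch below makes the three misreadings above concrete. All numbers are hypothetical because the actual task values are not shown on this page. The helper names are illustrative only. Which reading the reference answer expects comes from the summary above and is not verified here.

```python
# Hypothetical helpers that illustrate the three misreadings; all values are made up.

def time_if_faster(base_time, pct_faster):
    """Read "pct% faster" as a speed increase: time shrinks by a factor of 1 + pct/100."""
    return base_time / (1 + pct_faster / 100)


def time_if_less(base_time, pct_less):
    """Read "pct% faster" as "pct% less time": subtract pct% of the base time directly."""
    return base_time * (1 - pct_less / 100)


def years_elapsed(start_year, end_year):
    """Years elapsed between two points in time (exclusive count)."""
    return end_year - start_year


def years_inclusive(start_year, end_year):
    """Inclusive count: both the start and the end year are counted, one more than elapsed."""
    return end_year - start_year + 1


def abs_diff(p, q):
    """Absolute difference between two probabilities (in probability units)."""
    return abs(p - q)


def rel_diff(p, q):
    """Relative difference of p with respect to the baseline q."""
    return abs(p - q) / q


print(round(time_if_faster(60, 10), 2))  # 54.55 (minutes)
print(time_if_less(60, 10))              # 54.0 (minutes)
print(years_elapsed(2010, 2021), years_inclusive(2010, 2021))  # 11 12
print(abs_diff(0.5, 0.25), rel_diff(0.5, 0.25))                # 0.25 1.0
```

In each pair, the arithmetic is correct under its own reading. Only the interpretation of the wording differs, so a sample can still be scored as wrong.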

Notable patterns

Recurring pattern: on problems that call for a single, short answer, the model produces lengthy alternative lines of reasoning and misses the simple result that was asked for. This points to a tendency to over-answer (verbosity bias).
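Here is one plausible way this pattern could cost points. Assume the scorer takes the last number in a response as the answer; this is a simplified rule chosen for illustration, not necessarily the exact filter this run used. Under that rule, alternative computations appended after the correct result can displace it. The regex, function name, and responses below are all hypothetical.

```python
import re

# Assumed, simplified extraction rule: the last number in the response is taken as the answer.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(response):
    """Return the last number in the response with commas removed, or None if there is none."""
    matches = NUMBER.findall(response)
    if not matches:
        return None
    return matches[-1].replace(",", "")


# Hypothetical model responses to the same question.
concise = "She buys 3 packs at 4 dollars each: 3 * 4 = 12. The answer is 12."
verbose = concise + " Alternatively, if a 2 dollar fee applied, the total would be 14."

print(last_number(concise))  # 12
print(last_number(verbose))  # 14
```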

Recommendation

Lower the sampling temperature (e.g., to 0.0, which amounts to greedy decoding) to keep the model from following speculative alternative paths on clear-cut numeric problems. This may push the pass rate further toward 95%+.
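Temperature 0.0 and greedy decoding coincide for the following reason. Sampling divides the logits by the temperature T before the softmax, so the distribution sharpens as T shrinks. In the limit, all probability mass lands on the top token. The minimal sketch below uses hypothetical logits. Inference servers implement this internally, and the run's actual generation settings are not shown on this page.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to sampling probabilities at the given temperature.

    temperature == 0 is treated as greedy decoding: all mass goes to the argmax token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate next tokens
for t in (1.0, 0.5, 0.1, 0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As T shrinks, the top token's probability approaches 1. At T = 0 the sampler always picks that token, so repeated runs produce identical output.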

Overview

1319 samples

Distribution: 1246 passed · 73 failed

Score histogram (10 bins of width 0.1):
  • 0–0.1: 73
  • 0.1–0.9: 0 (all eight bins empty)
  • 0.9–1.0: 1246
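The overview figures can be recomputed from per-sample scores. The minimal sketch below makes two assumptions. First, every sample is scored either 0.0 or 1.0, as the empty middle bins of the histogram suggest. Second, a sample passes when its score is at least 0.5. The threshold and function names are assumptions, not this dashboard's implementation.

```python
def histogram(scores, n_bins=10):
    """Count scores in equal-width bins over [0, 1]; a score of exactly 1.0 lands in the last bin."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return counts


def pass_rate(scores, threshold=0.5):
    """Fraction of samples whose score reaches the (assumed) pass threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Per-sample scores reconstructed from the histogram counts.
scores = [0.0] * 73 + [1.0] * 1246

print(len(scores))                  # 1319
print(histogram(scores))            # [73, 0, 0, 0, 0, 0, 0, 0, 0, 1246]
print(f"{pass_rate(scores):.1%}")   # 94.5%
```

With these counts, the share of passed samples over all 1319 comes out at about 94.5%.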
Question ID Status Score Prompt Latency Tokens/s TTFT
200 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Matias i…
201 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kylie ha…
202 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Denny is…
203 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Juniper,…
204 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alex was…
205 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Libby ha…
206 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu…
207 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John com…
208 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabel w…
209 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill has…
210 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A town i…
211 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ariana i…
212 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bert fil…
213 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eight pe…
214 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sammy ha…
215 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The sum …
216 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The mini…
217 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael …
218 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The popu…
219 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Albert h…
220 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A teache…
221 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A window…
222 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A footba…
223 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amy, Jer…
224 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake spl…
225 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Zoe want…
226 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James bu…
227 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Edward h…
228 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla's …
229 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John is …
230 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When you…
231 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A class …
232 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Josie's …
233 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janet ma…
234 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Big Joe …
235 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Melody n…
236 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 14 less …
237 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Claire w…
238 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An aqua …
239 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Robby, J…
240 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bear i…
241 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a sch…
242 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cristine…
243 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For brea…
244 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John dri…
245 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes…
246 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dante ne…
247 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marcel g…
248 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Baking i…
249 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack lea…
250 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jessie w…
251 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cindy wa…
252 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Miley bo…
253 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Barbara …
254 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A desert…
255 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A gift s…
256 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The elec…
257 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
258 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Alon…
259 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A movie …
260 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John can…
261 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Blake go…
262 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jenny is…
263 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aria has…
264 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Haley ha…
265 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Last yea…
266 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Karen is…
267 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bird i…
268 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lilibeth…
269 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lucia is…
270 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Summer a…
271 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A tank c…
272 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
273 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John wri…
274 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sab and …
275 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
276 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Shem mak…
277 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A farmer…
278 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jesse re…
279 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sally se…
280 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On his c…
281 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Enrique …
282 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Simon ha…
283 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Berry is…
284 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joanna a…
285 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jay & Gl…
286 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla, K…
287 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rajesh w…
288 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we…
289 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jake is …
290 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jonas is…
291 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is…
292 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John and…
293 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Eric dec…
294 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maria bo…
295 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kris has…
296 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The tota…
297 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kwame st…
298 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three pl…
299 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Amanda's…
300 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Steve's …
301 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Grandma …
302 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stephani…
303 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Parker i…
304 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When thr…
305 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Milly is…
306 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pet sh…
307 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom uses…
308 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In the p…
309 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Reggie a…
310 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Darla ha…
311 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tommy is…
312 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: During o…
313 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paul mad…
314 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peggy is…
315 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If it ta…
316 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Haley is…
317 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason is…
318 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A couple…
319 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johnny w…
320 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Southton…
321 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The popu…
322 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Lang…
323 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeremy b…
324 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Shelly a…
325 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
326 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On a con…
327 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alyssa a…
328 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Teairra …
329 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dikembe …
330 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter an…
331 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alfonso …
332 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter is…
333 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A small …
334 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A spider…
335 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John wri…
336 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A pole i…
337 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A school…
338 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ishmael …
339 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: George's…
340 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three lo…
341 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jesse is…
342 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark col…
343 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lorin ha…
344 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sophie's…
345 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The pet …
346 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Pauline …
347 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Theo can…
348 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
349 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark has…
350 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mike can…
351 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Layla is…
352 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha…
353 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carol fi…
354 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: To make …
355 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hannah b…
356 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Baking i…
357 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carson c…
358 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bert mad…
359 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tracy se…
360 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Dana nor…
361 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abigail …
362 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John has…
363 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Summer p…
364 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tonya ha…
365 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Zarg…
366 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andrew b…
367 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One of t…
368 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
369 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Maya's o…
370 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Archie r…
371 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Abe find…
372 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On our l…
373 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A train …
374 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A printi…
375 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Calum ru…
376 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anna has…
377 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Salaria …
378 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Based on…
379 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nedy can…
380 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On Monda…
381 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mary is …
382 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom uses…
383 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A hand-c…
384 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The dist…
385 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hartley …
386 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jeff had…
387 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A single…
388 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gretchen…
389 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
390 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: TJ ran a…
391 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One hund…
392 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Seven pa…
393 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johnny i…
394 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tim buys…
395 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Rich ran…
396 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Celina e…
397 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Each ban…
398 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The numb…
399 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There we…