Samples · lm_eval_harness.gsm8k

Run #75 · Adapter v1.0.0+humaneval-removed+gen-kwargs-pairing · 200/1319 Samples angezeigt · Score 90.4%
‹ Zurück zum Run-Detail

KI-Auswertung

Generiert 2026-05-13 21:38 · claude-sonnet-4-6

Zusammenfassung

Das Modell Qwen3-Coder-Next erreicht auf GSM8K eine Pass-Rate von 92,5 % (Score 90,4 %), was ein solides, aber nicht herausragendes Ergebnis für mehrstufige Grundschulmathematik darstellt.

Stärken

  • Einfache und mittelschwere Rechenaufgaben werden zuverlässig und mit sauberem Rechenweg gelöst.
  • Umrechnung von Einheiten sowie lineare Mehrstufenprobleme (Groceries, Pool-Füllungskosten, Prozentsätze) gelingen konsistent.
  • Null Fehler (errors=0), das Modell bricht nie ab oder liefert ungültige Ausgaben.

Schwächen

  • Aufgaben mit indirekten oder impliziten Bezügen werden falsch interpretiert, z. B. „10 % schneller laufen" wird als Zeitreduktion durch Divisor 1,1 statt als direkte Subtraktion behandelt.
  • Off-by-one-Fehler bei inklusiven Zeiträumen (z. B. Gene-Quiltblock-Aufgabe: 12 statt 11 Jahre).
  • Mehrdeutige Problemformulierungen verleiten zu Überanalyse, wodurch das Modell teils falsche Relationen (z. B. Lylah's Gehalt) einführt.
  • Wahrscheinlichkeitsaufgaben: Das Modell berechnet korrekt, interpretiert die Frage jedoch falsch (relative statt absolute Differenz).

Auffälligkeiten

Wiederkehrendes Muster: Bei Aufgaben, die eine eindeutige, kurze Antwort erfordern, produziert das Modell ausführliche Alternativüberlegungen und verfehlt dabei das gesuchte einfache Ergebnis. Dies deutet auf eine Tendenz zur Überantwortung (verbosity bias) hin.

Empfehlung

Sampling-Temperatur senken (z. B. auf 0.0 oder greedy decoding), um das Modell bei klaren Zahlenaufgaben von spekulativen Alternativpfaden abzuhalten und die Pass-Rate weiter in Richtung 95 %+ zu treiben.

Übersicht

1319 Samples
Verteilung
1246
73
Score-Histogramm
0 – 0.1: 73 0.1 – 0.2: 0 0.2 – 0.3: 0 0.3 – 0.4: 0 0.4 – 0.5: 0 0.5 – 0.6: 0 0.6 – 0.7: 0 0.7 – 0.8: 0 0.8 – 0.9: 0 0.9 – 1: 1246
0.0 ────── 1.0
Status Score-Schwelle Score < 0.5
Frage-ID Status Score Prompt Latenz Tokens/s TTFT
800 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bill get…
Lade Detail …
801 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Danny ma…
Lade Detail …
802 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Katie ha…
Lade Detail …
803 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joan is …
Lade Detail …
804 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alexande…
Lade Detail …
805 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Troy mak…
Lade Detail …
806 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a hou…
Lade Detail …
807 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Randy ha…
Lade Detail …
808 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A 1000 c…
Lade Detail …
809 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brittany…
Lade Detail …
810 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Aaron is…
Lade Detail …
811 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marta is…
Lade Detail …
812 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tall Tun…
Lade Detail …
813 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On the i…
Lade Detail …
814 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paddy's …
Lade Detail …
815 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A single…
Lade Detail …
816 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lance ha…
Lade Detail …
817 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A soccer…
Lade Detail …
818 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marcel g…
Lade Detail …
819 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Laura to…
Lade Detail …
820 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry ha…
Lade Detail …
821 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Johnson …
Lade Detail …
822 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A charit…
Lade Detail …
823 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In four …
Lade Detail …
824 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Andy had…
Lade Detail …
825 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Penn ope…
Lade Detail …
826 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Porter e…
Lade Detail …
827 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A flower…
Lade Detail …
828 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three ti…
Lade Detail …
829 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A blind …
Lade Detail …
830 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An emplo…
Lade Detail …
831 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The scho…
Lade Detail …
832 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elsa sta…
Lade Detail …
833 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hans boo…
Lade Detail …
834 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alyana h…
Lade Detail …
835 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Matthias…
Lade Detail …
836 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A compan…
Lade Detail …
837 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The area…
Lade Detail …
838 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Annie ha…
Lade Detail …
839 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The pric…
Lade Detail …
840 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: George w…
Lade Detail …
841 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Joe like…
Lade Detail …
842 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Two whit…
Lade Detail …
843 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: 40% of t…
Lade Detail …
844 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Alan bou…
Lade Detail …
845 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Madeline…
Lade Detail …
846 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Vincent'…
Lade Detail …
847 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The numb…
Lade Detail …
848 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Paula wa…
Lade Detail …
849 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Julia ha…
Lade Detail …
850 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Charles …
Lade Detail …
851 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sandra h…
Lade Detail …
852 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Cristine…
Lade Detail …
853 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John pla…
Lade Detail …
854 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ben want…
Lade Detail …
855 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Bailey b…
Lade Detail …
856 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marites …
Lade Detail …
857 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The clas…
Lade Detail …
858 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a cer…
Lade Detail …
859 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John inj…
Lade Detail …
860 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James cr…
Lade Detail …
861 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John bui…
Lade Detail …
862 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: An apple…
Lade Detail …
863 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
864 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: One hund…
Lade Detail …
865 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Martin i…
Lade Detail …
866 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Trent cr…
Lade Detail …
867 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom deci…
Lade Detail …
868 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ge…
Lade Detail …
869 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jessica …
Lade Detail …
870 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Susan ha…
Lade Detail …
871 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Avery op…
Lade Detail …
872 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jerry is…
Lade Detail …
873 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tamara i…
Lade Detail …
874 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a bus…
Lade Detail …
875 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Candy th…
Lade Detail …
876 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: It takes…
Lade Detail …
877 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Duke was…
Lade Detail …
878 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Michael …
Lade Detail …
879 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
880 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jean nee…
Lade Detail …
881 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tom and …
Lade Detail …
882 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emmanuel…
Lade Detail …
883 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: While bi…
Lade Detail …
884 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bag of…
Lade Detail …
885 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Malcolm …
Lade Detail …
886 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a cer…
Lade Detail …
887 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A mad sc…
Lade Detail …
888 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Hash has…
Lade Detail …
889 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
890 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A buildi…
Lade Detail …
891 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Noah is …
Lade Detail …
892 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Grant sp…
Lade Detail …
893 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Emma tra…
Lade Detail …
894 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A clerk …
Lade Detail …
895 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elizabet…
Lade Detail …
896 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ali had …
Lade Detail …
897 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Marla ha…
Lade Detail …
898 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: You have…
Lade Detail …
899 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tony has…
Lade Detail …
900 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jason wo…
Lade Detail …
901 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Davante …
Lade Detail …
902 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elsie ha…
Lade Detail …
903 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A baker …
Lade Detail …
904 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In a gro…
Lade Detail …
905 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nancy bu…
Lade Detail …
906 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carla wo…
Lade Detail …
907 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
908 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John sta…
Lade Detail …
909 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Carrie h…
Lade Detail …
910 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. John…
Lade Detail …
911 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John pla…
Lade Detail …
912 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Laran ha…
Lade Detail …
913 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mason, N…
Lade Detail …
914 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Caid…
Lade Detail …
915 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Evergree…
Lade Detail …
916 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A blind …
Lade Detail …
917 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Evie is …
Lade Detail …
918 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Adam had…
Lade Detail …
919 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
920 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: There ar…
Lade Detail …
921 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gavin ha…
Lade Detail …
922 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Elroy de…
Lade Detail …
923 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At Penny…
Lade Detail …
924 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Before M…
Lade Detail …
925 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Billy is…
Lade Detail …
926 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anya was…
Lade Detail …
927 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mrs. Ama…
Lade Detail …
928 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Ali, Nad…
Lade Detail …
929 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lily goe…
Lade Detail …
930 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Thor is …
Lade Detail …
931 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Sherman …
Lade Detail …
932 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John pla…
Lade Detail …
933 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Leila ea…
Lade Detail …
934 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Janina s…
Lade Detail …
935 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Erwin ea…
Lade Detail …
936 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John pai…
Lade Detail …
937 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anna's m…
Lade Detail …
938 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Phil has…
Lade Detail …
939 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Samantha…
Lade Detail …
940 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: When the…
Lade Detail …
941 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Clyde's …
Lade Detail …
942 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A flower…
Lade Detail …
943 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jorge bo…
Lade Detail …
944 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy…
Lade Detail …
945 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Lyanna s…
Lade Detail …
946 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Isabella…
Lade Detail …
947 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: While pr…
Lade Detail …
948 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Nathan b…
Lade Detail …
949 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Three ti…
Lade Detail …
950 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: For an o…
Lade Detail …
951 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mandy is…
Lade Detail …
952 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Stephani…
Lade Detail …
953 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John get…
Lade Detail …
954 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Every da…
Lade Detail …
955 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Tony has…
Lade Detail …
956 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Unique i…
Lade Detail …
957 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jack and…
Lade Detail …
958 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Fred has…
Lade Detail …
959 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mark lov…
Lade Detail …
960 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John buy…
Lade Detail …
961 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: The Lady…
Lade Detail …
962 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A family…
Lade Detail …
963 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: If I rea…
Lade Detail …
964 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Wyatt's …
Lade Detail …
965 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Merry ha…
Lade Detail …
966 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Peter is…
Lade Detail …
967 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anna's m…
Lade Detail …
968 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Jamie is…
Lade Detail …
969 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anne is …
Lade Detail …
970 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A park h…
Lade Detail …
971 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: In the f…
Lade Detail …
972 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: At a peo…
Lade Detail …
973 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A cafe h…
Lade Detail …
974 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kyle has…
Lade Detail …
975 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Annie sp…
Lade Detail …
976 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A dietit…
Lade Detail …
977 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Willow\u…
Lade Detail …
978 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kamil wa…
Lade Detail …
979 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A conven…
Lade Detail …
980 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Monica i…
Lade Detail …
981 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: A bowl o…
Lade Detail …
982 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Madeline…
Lade Detail …
983 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Gary bou…
Lade Detail …
984 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John car…
Lade Detail …
985 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: John is …
Lade Detail …
986 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Louie se…
Lade Detail …
987 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Maxi…
Lade Detail …
988 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mr. Caid…
Lade Detail …
989 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ha…
Lade Detail …
990 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Allen or…
Lade Detail …
991 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: On a par…
Lade Detail …
992 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Daisy's …
Lade Detail …
993 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Brenda p…
Lade Detail …
994 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Mitch ha…
Lade Detail …
995 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Roman th…
Lade Detail …
996 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Kyle mak…
Lade Detail …
997 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: James ca…
Lade Detail …
998 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Anya has…
Lade Detail …
999 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Question: Unique i…
Lade Detail …
200 von 1319 Samples · Limit 200 ‹ Vorherige Nächste ›