Samples · lm_eval_harness.ifeval

Run #70 · Adapter v1.0.0+humaneval-unsafe-flag · 200/541 samples shown · Score 80.2%

AI Evaluation

Generated 2026-05-13 04:00 · claude-sonnet-4-6

Summary

The model achieves a pass rate of 80.2% on the IFEval benchmark, which indicates solid but not flawless adherence to strict formatting instructions. On roughly one fifth of the tasks it fails to meet precise format or content requirements.

Strengths

  • Complex multi-part instructions (e.g. section titles with `SECTION X`, double square brackets, repeating the prompt) are carried out reliably.
  • Linguistic constraints such as a comma ban or all-lowercase output are met correctly in many cases.
  • No technical errors (0 of 541 requests).

Weaknesses

  • Exact count requirements are not met: in bullet-point tasks the model returns 6 points instead of 3; mandatory repetitions contain disallowed extra characters.
  • Strict character exclusions (e.g. no "t" anywhere in the text, no "c") are consistently violated; the model does not hold such low-level constraints throughout a response.
  • Format requirements such as "exactly two responses, separated by `**`" are ignored (only one response, with no separator).
  • Length requirements (at least 800 words, wrapped in double quotation marks) are sometimes met only partially, or the output is cut off.
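
The four failure types listed above are all mechanically checkable. The following is a minimal sketch of such checks, not the checkers that IFEval or lm_eval_harness actually use: the function names, the bullet regex, and the word-counting rule are assumptions made for illustration.

```python
import re


def count_bullets(text: str) -> int:
    """Count lines that start with a '*' or '-' bullet marker (assumed bullet style)."""
    return len(re.findall(r"^[ \t]*[*-][ \t]+\S", text, flags=re.MULTILINE))


def avoids_letter(text: str, letter: str) -> bool:
    """True if `letter` never occurs in the text, ignoring case."""
    return letter.lower() not in text.lower()


def has_exact_parts(text: str, separator: str, expected: int = 2) -> bool:
    """True if splitting on `separator` yields exactly `expected` non-empty parts."""
    parts = [part for part in text.split(separator) if part.strip()]
    return len(parts) == expected


def is_long_quoted(text: str, min_words: int) -> bool:
    """True if the text is wrapped in double quotes and has at least `min_words` words."""
    stripped = text.strip()
    wrapped = len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"'
    # Words are counted by whitespace splitting, which is an assumption.
    return wrapped and len(stripped.split()) >= min_words
```

For example, `count_bullets` applied to a six-bullet answer returns 6, so comparing it against the required 3 flags the first failure type; `has_exact_parts(answer, "**")` is false for a single answer without a separator.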

Observations

The failures cluster around two pattern types: (1) character-level constraints (forbidden letters, exact repetition of special symbols) and (2) exact quantity requirements (number of bullets, number of responses). More complex semantic instructions succeed more often than low-level, mechanical formatting rules.

Recommendation

Apply targeted fine-tuning or chain-of-thought prompting specifically for counting and character-level constraints; alternatively, add a systematic constraint verifier as a post-processing layer and evaluate the IFEval subset of character-exclusion tasks separately.
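
One way the recommended post-processing verifier could look is a verify-and-retry loop. This is a sketch under stated assumptions: `generate` stands in for whatever model call the adapter makes, and the names `failed_checks` and `generate_verified` as well as the retry wording are hypothetical.

```python
from typing import Callable, List, Tuple

# A check is a (label, predicate) pair; the predicate receives the response text.
Check = Tuple[str, Callable[[str], bool]]


def failed_checks(response: str, checks: List[Check]) -> List[str]:
    """Return the labels of all checks that the response violates."""
    return [label for label, ok in checks if not ok(response)]


def generate_verified(
    generate: Callable[[str], str],
    prompt: str,
    checks: List[Check],
    max_attempts: int = 3,
) -> Tuple[str, List[str]]:
    """Call `generate` until every check passes or the attempts run out.

    Returns the last response and the labels it still violates (empty on success).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current_prompt = prompt
    response, failures = "", [label for label, _ in checks]
    for _ in range(max_attempts):
        response = generate(current_prompt)
        failures = failed_checks(response, checks)
        if not failures:
            break
        # Feed the violated constraints back to the model (hypothetical wording).
        current_prompt = (
            f"{prompt}\n\nYour previous answer violated: "
            f"{', '.join(failures)}. Try again."
        )
    return response, failures
```

A check for a character-exclusion task could be passed as `("no letter c", lambda s: "c" not in s.lower())`. Scores obtained with such a retry layer measure the model plus verifier, so they would need to be reported separately from the plain run.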

Overview

541 samples
Distribution: 434 passed · 107 failed
Score histogram
0 – 0.1: 107
0.1 – 0.2: 0
0.2 – 0.3: 0
0.3 – 0.4: 0
0.4 – 0.5: 0
0.5 – 0.6: 0
0.6 – 0.7: 0
0.7 – 0.8: 0
0.8 – 0.9: 0
0.9 – 1: 434
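
As a quick consistency check, the reported score can be recomputed from the two non-empty histogram bins. This sketch assumes that the 0.9 – 1 bin holds the passed samples and the 0 – 0.1 bin the failed ones, which matches the binary per-sample scores in the table.

```python
# Counts taken from the distribution and histogram above.
passed, failed = 434, 107

total = passed + failed
pass_rate = passed / total

print(f"{total} samples, pass rate {pass_rate:.1%}")  # prints "541 samples, pass rate 80.2%"
```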
Question ID Status Score Prompt Latency Tokens/s TTFT
0 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a 300+ word …
1 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I am planning a tr…
2 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a resume for…
3 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an email to …
4 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Given the sentence…
5 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a dialogue b…
6 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a 2 paragrap…
7 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write me a resume …
8 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a letter to …
9 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a long email…
10 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
11 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you help me ma…
12 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a story of e…
13 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a detailed r…
14 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a short blog…
15 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Please provide the…
16 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What is a name tha…
17 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write two jokes ab…
18 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Are hamburgers san…
19 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"make a tweet for p…
20 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a poem about…
21 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Given the sentence…
22 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a short star…
23 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a logic quiz…
24 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a 4 section …
25 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write the lyrics t…
26 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Explain in French …
27 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a funny haik…
28 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Rewrite the follow…
29 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What are the advan…
30 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a social med…
31 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a rubric for…
32 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a riddle for…
33 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a template f…
34 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Explain why people…
35 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you give me tw…
36 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What happened when…
37 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What sentiments ex…
38 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an ad copy f…
39 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Which one is a bet…
40 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
41 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a poem about…
42 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"How many feet off …
43 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an advertise…
44 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a funny post…
45 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you give me a …
46 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write me a templat…
47 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"\\\"The man was ar…
48 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a poem that'…
49 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I have a dime. Wha…
50 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a haiku abou…
51 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you elaborate …
52 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an essay abo…
53 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you give me an…
54 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Take the text belo…
55 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a limerick a…
56 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What is the differ…
57 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"How did a man name…
58 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What is the histor…
59 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a short prop…
60 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"List exactly 10 po…
61 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an outline f…
62 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Which of the follo…
63 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What are the pros …
64 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a funny arti…
65 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create a table wit…
66 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I would like to st…
67 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a brief biog…
68 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"A nucleus is a clu…
69 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Compose song lyric…
70 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an extravaga…
71 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Draft a blog post …
72 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Improper use of th…
73 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a planning d…
74 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a limerick a…
75 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write me a funny s…
76 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a letter to …
77 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
78 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"List the pros and …
79 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I'm interested in …
80 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a 30-line po…
81 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a joke about…
82 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"A colt is 5 feet t…
83 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a story abou…
84 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an article a…
85 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song that …
86 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you rewrite \\…
87 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a weird poem…
88 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
89 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a casual, in…
90 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song about…
91 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a haiku abou…
92 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Generate a forum t…
93 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a five line …
94 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I am a software en…
95 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an essay of …
96 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Come up with a pro…
97 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a cover lett…
98 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Could you tell me …
99 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"In this task, repe…
100 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a file for a…
101 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I want to apply fo…
102 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Name a new fashion…
103 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song about…
104 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Compose a poem tha…
105 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a story for …
106 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Criticize this sen…
107 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write two limerick…
108 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Summarize the foll…
109 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Compose a poem all…
110 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"First repeat the r…
111 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an extremely…
112 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What's a good way …
113 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a rap about …
114 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I really love the …
115 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a plot for a…
116 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Make the sentence …
117 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create a 5 day iti…
118 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Make a rubric for …
119 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a short essa…
120 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"\\\"Coincidence is…
121 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a cover lett…
122 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create a riddle ab…
123 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a casual blo…
124 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I work in the mark…
125 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Generate two alter…
126 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create a resume fo…
127 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I'm a 12th grader …
128 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you write me a…
129 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Generate a busines…
130 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a rubric for…
131 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you provide a …
132 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a funny and …
133 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"For the following …
134 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Request:\\n 1. Wh…
135 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a profession…
136 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an interesti…
137 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
138 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a fairy tale…
139 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What is the next n…
140 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Rewrite the follow…
141 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an essay abo…
142 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"How to write a goo…
143 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song about…
144 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"The Jimenez family…
145 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an itinerary…
146 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Hallucinate a resu…
147 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"A filmmaker is try…
148 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Who won the defama…
149 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Explain to a group…
150 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Gandalf was a wiza…
151 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an obviously…
152 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I want to write a …
153 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you write a po…
154 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Explain Generative…
155 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I asked a friend a…
156 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song about…
157 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create an English …
158 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I was hoping you c…
159 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I work for a softw…
160 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a very angry…
161 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Can you please con…
162 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a short arti…
163 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a song about…
164 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a riddle for…
165 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a strange ra…
166 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Expand the followi…
167 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Create a blog post…
168 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"How can I learn to…
169 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write an angry twe…
170 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What do you think …
171 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What has a dome bu…
172 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Rewrite the follow…
173 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Elaborate on the f…
174 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Why star wars is s…
175 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a rubric, in…
176 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"I'm a new puppy ow…
177 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"My brother is tryi…
178 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Before you answer …
179 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What are the uses …
180 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a 100-word a…
181 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What are some star…
182 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a blog post …
183 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a story abou…
184 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Generate a summary…
185 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a story from…
186 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a review of …
187 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Invent a funny tag…
188 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"What is another wo…
189 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a funny song…
190 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Are the weather co…
191 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a summary of…
192 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a joke with …
193 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a very long …
194 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Titan makes clothi…
195 failed 0% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"A psychologist is …
196 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Could you give me …
197 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Could you give me …
198 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"Write a riddle abo…
199 passed 100% {…} {"gen_args_0":{"arg_0":["[{\"role\": \"user\", \"content\": \"If you gulped down…
200 of 541 samples · Limit 200