Samples · bfcl.single_turn
KI-Auswertung
Generiert 2026-05-13 21:38 · claude-sonnet-4-6Zusammenfassung
Das Modell erreicht eine Pass-Rate von ~49 %, was einer knappen Mehrheit an Fehlern entspricht. Die Performance liegt damit deutlich unter einem praxistauglichen Schwellenwert für Single-Turn Function Calling.
Stärken
- Keine Errors (0 von 1390 Aufrufen), das Modell produziert stets strukturierte Ausgaben.
- Argumente und Werte werden inhaltlich meist korrekt übergeben (richtige Zahlen, richtige Parameter-Namen).
Schwächen
- Dominantes Muster: Das Modell ersetzt Punkte in Funktionsnamen durch Unterstriche (`math.factorial` → `math_factorial`), was systematisch zu `wrong_func_name`-Fehlern führt.
- Vereinzelt treten Typ-Fehler auf (z. B. `interval` als flaches Array statt Array-of-Arrays), was auf mangelndes Schema-Verständnis bei verschachtelten Typen hindeutet.
- Optionale Parameter werden häufig weggelassen, obwohl sie im Expected-Output als leerer String akzeptiert würden — dies könnte weitere stille Fehler verursachen.
Auffälligkeiten
Fast alle sichtbaren Failures basieren auf einem einzigen, konsistenten Fehler: Punkt-Notation in Funktionsnamen wird nicht verwendet. Dies deutet darauf hin, dass das Modell entweder im Training kaum Beispiele mit Namespace-Trennzeichen (`.`) gesehen hat oder der System-Prompt / Tool-Schema diese Konvention nicht ausreichend vorgibt. Der Typ-Fehler bei `interval` ist ein separates, aber ebenfalls reproduzierbares Problem bei Array-of-Floats-Schemas.
Empfehlung
Den Tool-Schema-Prompt explizit um den Hinweis ergänzen, dass Funktionsnamen exakt — inklusive Punkt-Notation — aus dem Schema übernommen werden müssen; alternativ Few-Shot-Beispiele mit Dot-Namespaces in den System-Prompt integrieren und anschließend den Benchmark erneut auswerten.
Übersicht
707 Samples| Frage-ID | Status | Score | Prompt | Latenz | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| simple_python_1 | failed | {…} [[{"role":"user","content":"Calculate the factorial of 5 using math functions."}… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_2 | failed | {…} [[{"role":"user","content":"Calculate the hypotenuse of a right triangle given t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_3 | failed | {…} [[{"role":"user","content":"Find the roots of a quadratic equation with coeffici… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_8 | failed | {…} [[{"role":"user","content":"What's the area of a circle with a radius of 10?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_9 | failed | {…} [[{"role":"user","content":"Calculate the area of a circle with a radius of 5 un… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_12 | failed | {…} [[{"role":"user","content":"Calculate the circumference of a circle with radius … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_13 | failed | {…} [[{"role":"user","content":"Calculate the area under the curve y=x^2 from x=1 to… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_16 | failed | {…} [[{"role":"user","content":"Calculate the derivative of the function 2x^2 at x =… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_18 | failed | {…} [[{"role":"user","content":"Find the prime factors of the number 123456."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_19 | failed | {…} [[{"role":"user","content":"Calculate the greatest common divisor of two numbers… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_20 | failed | {…} [[{"role":"user","content":"Find the highest common factor of 36 and 24."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_21 | failed | {…} [[{"role":"user","content":"Find the Greatest Common Divisor (GCD) of two number… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_22 | failed | {…} [[{"role":"user","content":"Calculate the greatest common divisor of two given n… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_24 | failed | {…} [[{"role":"user","content":"Find the greatest common divisor (GCD) of 12 and 18"… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_30 | failed | {…} [[{"role":"user","content":"What is the final velocity of a vehicle that started… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_35 | failed | {…} [[{"role":"user","content":"Find an all vegan restaurant in New York that opens … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_37 | failed | {…} [[{"role":"user","content":"Find the estimated travel time by car from San Franc… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_45 | failed | {…} [[{"role":"user","content":"Calculate the energy (in Joules) absorbed or release… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_50 | failed | {…} [[{"role":"user","content":"What is the change in entropy in Joules per Kelvin o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_51 | failed | {…} [[{"role":"user","content":"Calculate the entropy change for a certain process g… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_55 | failed | {…} [[{"role":"user","content":"Find me detailed information about the structure of … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_56 | failed | {…} [[{"role":"user","content":"What are the names of proteins found in the plasma m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_58 | failed | {…} [[{"role":"user","content":"What is the function of ATP synthase in mitochondria… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_60 | failed | {…} [[{"role":"user","content":"Find the type of gene mutation based on SNP (Single … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_63 | failed | {…} [[{"role":"user","content":"Find out how genetically similar a human and a chimp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_66 | failed | {…} [[{"role":"user","content":"Get me data on average precipitation in the Amazon r… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_69 | failed | {…} [[{"role":"user","content":"Find out the population and species of turtles in Mi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_76 | failed | {…} [[{"role":"user","content":"Get me the predictions of the evolutionary rate for … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_77 | failed | {…} [[{"role":"user","content":"Find a nearby restaurant that serves vegan food in L… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_81 | failed | {…} [[{"role":"user","content":"Find the fastest route from San Francisco to Los Ang… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_82 | failed | {…} [[{"role":"user","content":"Calculate the average of list of integers [12, 15, 1… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_85 | failed | {…} [[{"role":"user","content":"What's the approximate distance between Boston, MA, … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_86 | failed | {…} [[{"role":"user","content":"Find the shortest distance between two cities, New Y… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_87 | failed | {…} [[{"role":"user","content":"Sort the list [5, 3, 4, 1, 2] in ascending order."}]… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_90 | failed | {…} [[{"role":"user","content":"Retrieve Personal Info and Job History data of a spe… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_92 | failed | {…} [[{"role":"user","content":"Find all movies starring Leonardo DiCaprio in the ye… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_96 | failed | {…} [[{"role":"user","content":"Find records in database in user table where age is … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_97 | failed | {…} [[{"role":"user","content":"Calculate the factorial of the number 5"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_103 | failed | {…} [[{"role":"user","content":"Calculate the area under the curve y=3x^2 + 2x - 4, … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_104 | failed | {…} [[{"role":"user","content":"Calculate the area of a triangle with base 6 and hei… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_105 | failed | {…} [[{"role":"user","content":"Calculate the power of 3 raised to the power 4."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_109 | failed | {…} [[{"role":"user","content":"Generate a random forest model with 100 trees and a … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_111 | failed | {…} [[{"role":"user","content":"Generate a random number from a normal distribution … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_113 | failed | {…} [[{"role":"user","content":"What's the probability of rolling a six on a six-sid… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_114 | failed | {…} [[{"role":"user","content":"Find the probability of getting exactly 5 heads in 1… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_116 | failed | {…} [[{"role":"user","content":"What's the probability of drawing a king from a well… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_118 | failed | {…} [[{"role":"user","content":"Perform a two-sample t-test on my experiment data of… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_119 | failed | {…} [[{"role":"user","content":"Perform a hypothesis test for two independent sample… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_123 | failed | {…} [[{"role":"user","content":"Perform a two-sample t-test to determine if there is… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_126 | failed | {…} [[{"role":"user","content":"What is the coefficient of determination (R-squared)… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_128 | failed | {…} [[{"role":"user","content":"What's the quarterly dividend per share of a company… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_130 | failed | {…} [[{"role":"user","content":"What's the NPV (Net Present Value) of a series of ca… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_133 | failed | {…} [[{"role":"user","content":"Predict the future value of a $5000 investment with … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_134 | failed | {…} [[{"role":"user","content":"Predict the total expected profit of stocks XYZ in 5… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_144 | failed | {…} [[{"role":"user","content":"Find the market performance of the S&P 500 and the D… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_148 | failed | {…} [[{"role":"user","content":"Calculate the future value of an investment with an … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_156 | failed | {…} [[{"role":"user","content":"Look up details of a felony crime record for case nu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_157 | failed | {…} [[{"role":"user","content":"Find out if an individual John Doe with a birthday 0… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_163 | failed | {…} [[{"role":"user","content":"Provide me with the property records of my house loc… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_165 | failed | {…} [[{"role":"user","content":"Retrieve cases from 2020 about theft crimes in Los A… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_166 | failed | {…} [[{"role":"user","content":"Find a lawyer specializing in divorce cases and char… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_167 | failed | {…} [[{"role":"user","content":"Retrieve the details of a Supreme Court case titled … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_169 | failed | {…} [[{"role":"user","content":"Find the details of the court case identified by doc… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_170 | failed | {…} [[{"role":"user","content":"Find a historical law case about fraud from 2010 to … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_172 | failed | {…} [[{"role":"user","content":"How to obtain the detailed case information of the '… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_175 | failed | {…} [[{"role":"user","content":"How many months of experience a Lawyer John Doe has … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_176 | failed | {…} [[{"role":"user","content":"Find details of patent lawsuits involving the compan… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_184 | failed | {…} [[{"role":"user","content":"I need the details of the lawsuit case with case ID … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_188 | failed | {…} [[{"role":"user","content":"What is the humidity level in Miami, Florida in the … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_193 | failed | {…} [[{"role":"user","content":"Find the best local nurseries in Toronto with a good… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_199 | failed | {…} [[{"role":"user","content":"Find air quality index in San Jose for next three da… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_200 | failed | {…} [[{"role":"user","content":"How much CO2 is produced annually by a gas-fueled ca… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_204 | failed | {…} [[{"role":"user","content":"Find restaurants near me within 10 miles that offer … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_206 | failed | {…} [[{"role":"user","content":"Find the nearest park with a tennis court in London.… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_208 | failed | {…} [[{"role":"user","content":"Get me the directions from New York to Los Angeles a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_209 | failed | {…} [[{"role":"user","content":"Locate the nearest public library in Boston, Massach… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_213 | failed | {…} [[{"role":"user","content":"Book a direct flight from San Francisco to London fo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_214 | failed | {…} [[{"role":"user","content":"Search for upcoming month rock concerts in New York.… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_215 | failed | {…} [[{"role":"user","content":"Give me a brief on movie 'Interstellar'"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_217 | failed | {…} [[{"role":"user","content":"Analyze my fMRI data in ~\/data\/myfMRI.nii from a m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_218 | failed | {…} [[{"role":"user","content":"Given patient with id 546382, retrieve their brain M… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_223 | failed | {…} [[{"role":"user","content":"Find social behaviors and patterns in a group size o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_224 | failed | {…} [[{"role":"user","content":"Find the most followed person on twitter who tweets … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_225 | failed | {…} [[{"role":"user","content":"What is the percentage of population preferring digi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_231 | failed | {…} [[{"role":"user","content":"Provide key war events in German history from 1871 t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_232 | failed | {…} [[{"role":"user","content":"What was the full name king of England in 1800?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_233 | failed | {…} [[{"role":"user","content":"When did the Treaty of Tordesillas take place? Put i… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_234 | failed | {…} [[{"role":"user","content":"Find important Wars in European history during the 1… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_236 | failed | {…} [[{"role":"user","content":"Get start date on the American Civil War."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_238 | failed | {…} [[{"role":"user","content":"Who was the president of the United States during th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_239 | failed | {…} [[{"role":"user","content":"Who was the full name of the president of the United… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_240 | failed | {…} [[{"role":"user","content":"Who was the President of the United States in 1940?"… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_244 | failed | {…} [[{"role":"user","content":"What year was the law of universal gravitation publi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_245 | failed | {…} [[{"role":"user","content":"Who discovered radium?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_246 | failed | {…} [[{"role":"user","content":"Who discovered Gravity and what was the method used?… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_247 | failed | {…} [[{"role":"user","content":"What was Albert Einstein's contribution to science o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_248 | failed | {…} [[{"role":"user","content":"Who invented the theory of relativity and in which y… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_249 | failed | {…} [[{"role":"user","content":"Tell me more about Christianity and its history till… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_255 | failed | {…} [[{"role":"user","content":"Get the biography and main contributions of Pope Inn… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_260 | failed | {…} [[{"role":"user","content":"Calculate how many gallons of paint is required to p… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_264 | failed | {…} [[{"role":"user","content":"Find the size of the sculpture with title 'David' by… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_267 | failed | {…} [[{"role":"user","content":"Find the top rated modern sculpture exhibition happe… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_268 | failed | {…} [[{"role":"user","content":"Find me the sculptures of Michelangelo with material… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_270 | failed | {…} [[{"role":"user","content":"Can you give me the height and width of Empire State… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_273 | failed | {…} [[{"role":"user","content":"Find out the open hours for the Louvre Museum in Par… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_275 | failed | {…} [[{"role":"user","content":"Get the list of top 5 popular artworks at the Metrop… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_276 | failed | {…} [[{"role":"user","content":"Get the working hours of Louvre Museum in Paris."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_279 | failed | {…} [[{"role":"user","content":"What's the retail price of a Fender American Profess… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_283 | failed | {…} [[{"role":"user","content":"Find the price of a used Gibson Les Paul guitar in e… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_284 | failed | {…} [[{"role":"user","content":"Get information about the pop concerts in New York f… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_286 | failed | {…} [[{"role":"user","content":"Get concert details for the artist Beyonce performin… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_287 | failed | {…} [[{"role":"user","content":"Find me a classical concert this weekend in Los Ange… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_288 | failed | {…} [[{"role":"user","content":"Get me two tickets for next Eminem concert in New Yo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_289 | failed | {…} [[{"role":"user","content":"Find concerts near me in Seattle that plays jazz mus… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_290 | failed | {…} [[{"role":"user","content":"What's the timing and location for The Weeknd's conc… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_291 | failed | {…} [[{"role":"user","content":"Generate a melody in C major scale, starting with th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_293 | failed | {…} [[{"role":"user","content":"Create a mix track using notes of C major scale and … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_294 | failed | {…} [[{"role":"user","content":"Generate a major chord progression in C key with fou… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_296 | failed | {…} [[{"role":"user","content":"Generate a major C scale progression with tempo 80 B… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_297 | failed | {…} [[{"role":"user","content":"music.theory.chordProgression(progression=['I', 'V',… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_298 | failed | {…} [[{"role":"user","content":"What key signature does C# major have?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_300 | failed | {…} [[{"role":"user","content":"Calculate the duration between two notes of 440Hz an… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_303 | failed | {…} [[{"role":"user","content":"Get the player stats of Cristiano Ronaldo in the 201… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_304 | failed | {…} [[{"role":"user","content":"Get point and rebound stats for player 'LeBron James… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_305 | failed | {…} [[{"role":"user","content":"Calculate the overall goal and assist of soccer play… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_307 | failed | {…} [[{"role":"user","content":"Who won the basketball game between Lakers and Clipp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_308 | failed | {…} [[{"role":"user","content":"What are the next five matches for Manchester United… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_309 | failed | {…} [[{"role":"user","content":"Find me the record of Tom Brady in the 2020 NFL seas… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_311 | failed | {…} [[{"role":"user","content":"Find me the detailed profile of basketball player Le… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_313 | failed | {…} [[{"role":"user","content":"What's the total worth in euro of Messi according to… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_314 | failed | {…} [[{"role":"user","content":"Find all the major achievements of the footballer Li… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_320 | failed | {…} [[{"role":"user","content":"Fetch the basketball league standings, where Golden … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_322 | failed | {…} [[{"role":"user","content":"Get the current ranking for Liverpool Football Club … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_323 | failed | {…} [[{"role":"user","content":"Who is ranked as the top player in woman tennis?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_324 | failed | {…} [[{"role":"user","content":"Find the score of last game for Los Angeles Lakers i… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_325 | failed | {…} [[{"role":"user","content":"Who won the last match between Chicago Bulls and Los… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_327 | failed | {…} [[{"role":"user","content":"Give me the schedule of Manchester United for the ne… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_328 | failed | {…} [[{"role":"user","content":"Find the rating and player count of the board game '… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_331 | failed | {…} [[{"role":"user","content":"Find the top chess players in New York with a rating… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_332 | failed | {…} [[{"role":"user","content":"What's the chess classical rating of Magnus Carlsen?… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_334 | failed | {…} [[{"role":"user","content":"Check who is the winner in a game of blackjack given… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_336 | failed | {…} [[{"role":"user","content":"Shuffle a deck of cards, and draw 3 cards from the t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_338 | failed | {…} [[{"role":"user","content":"What is the probability of drawing a heart card from… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_339 | failed | {…} [[{"role":"user","content":"What is the probability of getting a full house in p… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_340 | failed | {…} [[{"role":"user","content":"Determine the winner in a Poker game with John havin… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_341 | failed | {…} [[{"role":"user","content":"What are the odds of drawing a heart card from a dec… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_342 | failed | {…} [[{"role":"user","content":"Find all multi-player games released in 2019 with an… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_343 | failed | {…} [[{"role":"user","content":"Fetch player statistics of 'Zelda' on Switch for use… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_347 | failed | {…} [[{"role":"user","content":"Get me the details of the last game played by Liverp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_349 | failed | {…} [[{"role":"user","content":"Find the highest score achieved by any player in the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_352 | failed | {…} [[{"role":"user","content":"Get the average user score for the game 'The Legend … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_355 | failed | {…} [[{"role":"user","content":"How many calories in the Beef Lasagna Recipe from Fo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_356 | failed | {…} [[{"role":"user","content":"Find me a recipe that serves 2 people, is vegan, and… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_363 | failed | {…} [[{"role":"user","content":"Find the closest sushi restaurant with a patio in Bo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_365 | failed | {…} [[{"role":"user","content":"How many ounces in 2 pounds of butter?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_366 | failed | {…} [[{"role":"user","content":"How many teaspoons are in 2 tablespoons for measurem… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_367 | failed | {…} [[{"role":"user","content":"Find me a vegan recipe for brownies which prep time … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_369 | failed | {…} [[{"role":"user","content":"Find a grocery store near me with organic fruits and… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_370 | failed | {…} [[{"role":"user","content":"Order three bottles of olive oil and a five pound ba… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_371 | failed | {…} [[{"role":"user","content":"Check the price of tomatoes and lettuce at the Whole… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_372 | failed | {…} [[{"role":"user","content":"Find the top five organic bananas brands on the basi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_373 | failed | {…} [[{"role":"user","content":"I want to buy apples, rice, and 12 pack of bottled w… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_374 | failed | {…} [[{"role":"user","content":"Check the amount of protein, calories and carbs in a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_375 | failed | {…} [[{"role":"user","content":"Check the total price for three pumpkins and two doz… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_378 | failed | {…} [[{"role":"user","content":"Convert time 3pm from New York time zone to London t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_381 | failed | {…} [[{"role":"user","content":"Check if any Hilton Hotel is available for two adult… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_382 | failed | {…} [[{"role":"user","content":"Book a single room for two nights at the Hilton Hote… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_384 | failed | {…} [[{"role":"user","content":"Book a hotel room for two adults and one child in Pa… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_385 | failed | {…} [[{"role":"user","content":"Book a hotel room with king size bed in Los Angeles … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_388 | failed | {…} [[{"role":"user","content":"How many Canadian dollars can I get for 500 US dolla… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_390 | failed | {…} [[{"role":"user","content":"Convert 150 Euros to Canadian dollars."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_394 | failed | {…} [[{"role":"user","content":"Get me the travel distance and duration from the Eif… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_395 | failed | {…} [[{"role":"user","content":"Find the nearest parking lot within 2 miles of Centr… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_396 | failed | {…} [[{"role":"user","content":"Find a hospital within 5 km radius around Denver, Co… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_python_397 | failed | {…} [[{"role":"user","content":"Find the distance between New York and Boston, accou… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_0 | failed | {…} [[{"role":"user","content":"Help me initialize the GIS geometry presentation in … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_1 | failed | {…} [[{"role":"user","content":"Help me generate SQL completion proposals for a tabl… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_2 | failed | {…} [[{"role":"user","content":"Help me generate the full SQL creation script with a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_3 | failed | {…} [[{"role":"user","content":"Help me resolve a tablespace reference named 'USERSP… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_4 | failed | {…} [[{"role":"user","content":"Help me prepare a JDBC statement for a DB2 view name… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_5 | failed | {…} [[{"role":"user","content":"Help me initialize a plain text presentation for a r… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_6 | failed | {…} [[{"role":"user","content":"Help me update the data in a spreadsheet view within… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_7 | failed | {…} [[{"role":"user","content":"Help me copy an NIO resource to a new path '\/backup… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_8 | failed | {…} [[{"role":"user","content":"Help me update the contents of a file in the non-blo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_9 | failed | {…} [[{"role":"user","content":"Help me serialize a `MultiPoint` object with 5 point… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_10 | failed | {…} [[{"role":"user","content":"Help me update the launcher information in the JNI B… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_11 | failed | {…} [[{"role":"user","content":"What is the value of the 'EnableExtensions' property… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_12 | failed | {…} [[{"role":"user","content":"Help me change the current schema to 'AnalyticsDB' i… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_13 | failed | {…} [[{"role":"user","content":"Help me prepare a JDBC statement to retrieve the pri… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_14 | failed | {…} [[{"role":"user","content":"In the SmartRefreshLayout library, help me trigger t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_15 | failed | {…} [[{"role":"user","content":"Help me decode a 9-patch image from an input stream … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_16 | failed | {…} [[{"role":"user","content":"Help me create an `InvokePolymorphicNode` for a give… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_17 | failed | {…} [[{"role":"user","content":"Help me attach generic type information to a constru… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_18 | failed | {…} [[{"role":"user","content":"Help me obtain the third page of role counts with a … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_19 | failed | {…} [[{"role":"user","content":"Help me display the personal information page for a … | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_20 | failed | {…} [[{"role":"user","content":"Help me update the HBase mapping configuration for a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_21 | failed | {…} [[{"role":"user","content":"Help me handle an exception event `ioExceptionEvent`… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_22 | failed | {…} [[{"role":"user","content":"Help me update the new status to 2 for a list of pro… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_23 | failed | {…} [[{"role":"user","content":"Help me obtain a list of new home products that cont… | — | — | — | ||
|
Lade Detail …
|
|||||||
| simple_java_24 | failed | {…} [[{"role":"user","content":"Help me change the visibility of product categories … | — | — | — | ||
|
Lade Detail …
|
|||||||