Samples · bfcl.single_turn
KI-Auswertung
Generiert 2026-05-13 21:38 · claude-sonnet-4-6Zusammenfassung
Das Modell erreicht eine Pass-Rate von ~49 %, was einer knappen Mehrheit an Fehlern entspricht. Die Performance liegt damit deutlich unter einem praxistauglichen Schwellenwert für Single-Turn Function Calling.
Stärken
- Keine Errors (0 von 1390 Aufrufen), das Modell produziert stets strukturierte Ausgaben.
- Argumente und Werte werden inhaltlich meist korrekt übergeben (richtige Zahlen, richtige Parameter-Namen).
Schwächen
- Dominantes Muster: Das Modell ersetzt Punkte in Funktionsnamen durch Unterstriche (`math.factorial` → `math_factorial`), was systematisch zu `wrong_func_name`-Fehlern führt.
- Vereinzelt treten Typ-Fehler auf (z. B. `interval` als flaches Array statt Array-of-Arrays), was auf mangelndes Schema-Verständnis bei verschachtelten Typen hindeutet.
- Optionale Parameter werden häufig weggelassen, obwohl sie im Expected-Output als leerer String akzeptiert würden — dies könnte weitere stille Fehler verursachen.
Auffälligkeiten
Fast alle sichtbaren Failures basieren auf einem einzigen, konsistenten Fehler: Punkt-Notation in Funktionsnamen wird nicht verwendet. Dies deutet darauf hin, dass das Modell entweder im Training kaum Beispiele mit Namespace-Trennzeichen (`.`) gesehen hat oder der System-Prompt / Tool-Schema diese Konvention nicht ausreichend vorgibt. Der Typ-Fehler bei `interval` ist ein separates, aber ebenfalls reproduzierbares Problem bei Array-of-Floats-Schemas.
Empfehlung
Den Tool-Schema-Prompt explizit um den Hinweis ergänzen, dass Funktionsnamen exakt — inklusive Punkt-Notation — aus dem Schema übernommen werden müssen; alternativ Few-Shot-Beispiele mit Dot-Namespaces in den System-Prompt integrieren und anschließend den Benchmark erneut auswerten.
Übersicht
707 Samples| Frage-ID | Status | Score | Prompt | Latenz | Tokens/s | TTFT | |
|---|---|---|---|---|---|---|---|
| multiple_165 | failed | {…} [[{"role":"user","content":"Find the price of a used Gibson Les Paul guitar in e… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_166 | failed | {…} [[{"role":"user","content":"Find me a classical concert this weekend in Los Ange… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_167 | failed | {…} [[{"role":"user","content":"Generate a melody in C major scale, starting with th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_170 | failed | {…} [[{"role":"user","content":"Get the player stats of Cristiano Ronaldo in the 201… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_171 | failed | {…} [[{"role":"user","content":"Who won the basketball game between Lakers and Clipp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_172 | failed | {…} [[{"role":"user","content":"Find me the detailed profile of basketball player Le… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_175 | failed | {…} [[{"role":"user","content":"Who is ranked as the top player in woman tennis?"}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_176 | failed | {…} [[{"role":"user","content":"Give me the schedule of Manchester United for the ne… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_177 | failed | {…} [[{"role":"user","content":"Find the top chess players in New York with a rating… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_179 | failed | {…} [[{"role":"user","content":"What is the probability of getting a full house in p… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_180 | failed | {…} [[{"role":"user","content":"Fetch player statistics of 'Zelda' on Switch for use… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_181 | failed | {…} [[{"role":"user","content":"Get me the details of the last game played by Liverp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_183 | failed | {…} [[{"role":"user","content":"How many calories in the Beef Lasagna Recipe from Fo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_185 | failed | {…} [[{"role":"user","content":"Find the closest sushi restaurant with a patio in Bo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_186 | failed | {…} [[{"role":"user","content":"Find me a vegan recipe for brownies which prep time … | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_187 | failed | {…} [[{"role":"user","content":"Check the price of tomatoes and lettuce at the Whole… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_188 | failed | {…} [[{"role":"user","content":"Find the grocery store closest to Berkeley that has … | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_189 | failed | {…} [[{"role":"user","content":"Convert time 3pm from New York time zone to London t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_190 | failed | {…} [[{"role":"user","content":"Book a single room for two nights at the Hilton Hote… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_192 | failed | {…} [[{"role":"user","content":"Convert 150 Euros to Canadian dollars."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_193 | failed | {…} [[{"role":"user","content":"Get me the travel distance and duration from the Eif… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_196 | failed | {…} [[{"role":"user","content":"What are the names of proteins found in the plasma m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| multiple_197 | failed | {…} [[{"role":"user","content":"Find the type of gene mutation based on SNP (Single … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_0 | failed | {…} [[{"role":"user","content":"Play songs from the artists Taylor Swift and Maroon … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_3 | failed | {…} [[{"role":"user","content":"Get the protein sequence of human HbA1c, normal hemo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_5 | failed | {…} [[{"role":"user","content":"Find the list of TV shows and their ratings on Netfl… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_7 | failed | {…} [[{"role":"user","content":"Find the factorial of 5,10 and 15."}]] | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_8 | failed | {…} [[{"role":"user","content":"Fetch the population of New York City, NY, and Los … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_9 | failed | {…} [[{"role":"user","content":"Find two movie theatres near San Diego with availabi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_10 | failed | {…} [[{"role":"user","content":"Compute the Pythagorean Theorem of two side lengths:… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_11 | failed | {…} [[{"role":"user","content":"Predict house price for a house of size 3000 sq ft. … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_12 | failed | {…} [[{"role":"user","content":"Build a decision tree classifier model with gini cri… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_13 | failed | {…} [[{"role":"user","content":"Can you give me 95% confidence interval for a sample… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_18 | failed | {…} [[{"role":"user","content":"Calculate the Future Value of an investment of $1000… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_21 | failed | {…} [[{"role":"user","content":"Show me all individuals who were convicted for money… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_22 | failed | {…} [[{"role":"user","content":"What is the status and scheduled trial date for case… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_23 | failed | {…} [[{"role":"user","content":"Please calculate the amount of alimony the payor spo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_24 | failed | {…} [[{"role":"user","content":"Can you find me case law details of Case No 28473 an… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_25 | failed | {…} [[{"role":"user","content":"List all cases against a company named 'Dara Inc' fi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_26 | failed | {…} [[{"role":"user","content":"Find details of lawsuits with case numbers '67813', … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_27 | failed | {…} [[{"role":"user","content":"Find a nature reserve around Berkeley within 10 kilo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_29 | failed | {…} [[{"role":"user","content":"Find out how much waste a family of four generates i… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_31 | failed | {…} [[{"role":"user","content":"What was the Treaty of Paris about? Also, what was t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_32 | failed | {…} [[{"role":"user","content":"Provide me the major events during the presidency of… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_34 | failed | {…} [[{"role":"user","content":"I want to know the rise and fall of Christianity in … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_35 | failed | {…} [[{"role":"user","content":"Fetch the details of the ancient empires Persian Emp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_37 | failed | {…} [[{"role":"user","content":"What are the RGB and HEX color values for navy, purp… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_38 | failed | {…} [[{"role":"user","content":"What's the driving distance between New York and Was… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_39 | failed | {…} [[{"role":"user","content":"Find opening hours and ticket prices for adults and … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_40 | failed | {…} [[{"role":"user","content":"Give me the detail of the exhibition named 'Wonder o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_44 | failed | {…} [[{"role":"user","content":"Find me all the classical concerts near Berlin and P… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_45 | failed | {…} [[{"role":"user","content":"What's the current most played Pop song and also fin… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_48 | failed | {…} [[{"role":"user","content":"In a game of Pokemon GO, what moves can a Pikachu le… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_49 | failed | {…} [[{"role":"user","content":"Check if the player with id 3142 in team RocketLeagu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_50 | failed | {…} [[{"role":"user","content":"How to save game progress at stage 7 in easy mode an… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_51 | failed | {…} [[{"role":"user","content":"Search for a Chicken Noodle Soup recipe and a Vegan … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_54 | failed | {…} [[{"role":"user","content":"I want to order a large pepperoni pizza and a chicke… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_55 | failed | {…} [[{"role":"user","content":"Find a supermarket in New York City that opens 24 ho… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_56 | failed | {…} [[{"role":"user","content":"Check the hotel room availability for 'Queens Hotel'… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_57 | failed | {…} [[{"role":"user","content":"Book a room for 2 adults and a child at the Sheraton… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_60 | failed | {…} [[{"role":"user","content":"What is the results of the factorial of 5, the facto… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_61 | failed | {…} [[{"role":"user","content":"\"Can you calculate the Euclidean norm, or the lengt… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_62 | failed | {…} [[{"role":"user","content":"\"Can you help me find the roots of two quadratic eq… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_66 | failed | {…} [[{"role":"user","content":"What is the total area of three circles, where the f… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_67 | failed | {…} [[{"role":"user","content":"\"Can you calculate the area of two circles, one wit… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_70 | failed | {…} [[{"role":"user","content":"What is the combined circumference of four circles, … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_73 | failed | {…} [[{"role":"user","content":"\"Can you compute the derivative of the function 3x^… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_75 | failed | {…} [[{"role":"user","content":"What are the prime factors of the numbers 45, 100, a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_76 | failed | {…} [[{"role":"user","content":"What is the greatest common divisor (GCD) of the two… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_77 | failed | {…} [[{"role":"user","content":"\"Can you calculate the highest common factor of the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_78 | failed | {…} [[{"role":"user","content":"\"Can you help me find the greatest common divisor o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_80 | failed | {…} [[{"role":"user","content":"\"John and Mary are playing a game where they each c… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_86 | failed | {…} [[{"role":"user","content":"\"Imagine you are conducting an experiment with two … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_91 | failed | {…} [[{"role":"user","content":"Can you help me find vegan restaurants in Los Angele… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_93 | failed | {…} [[{"role":"user","content":"What would be the estimated travel time if I start m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_99 | failed | {…} [[{"role":"user","content":"\"Can you help me calculate the energy required for … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_100 | failed | {…} [[{"role":"user","content":"What are the boiling and melting points of water and… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_103 | failed | {…} [[{"role":"user","content":"A chemist is conducting an experiment with a 2 kg sa… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_108 | failed | {…} [[{"role":"user","content":"Can you provide a detailed description of the struct… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_109 | failed | {…} [[{"role":"user","content":"What are the proteins found in the cell compartments… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_110 | failed | {…} [[{"role":"user","content":"\"What is the function of the molecule ATP in the mi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_112 | failed | {…} [[{"role":"user","content":"What is the type of the genetic mutation that has th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_115 | failed | {…} [[{"role":"user","content":"\"Could you calculate the genetic similarity between… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_118 | failed | {…} [[{"role":"user","content":"What are the precipitation statistics for the Amazon… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_119 | failed | {…} [[{"role":"user","content":"\"Can you help me identify the bird species I saw du… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_121 | failed | {…} [[{"role":"user","content":"What is the population of turtles in the Galapagos I… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_124 | failed | {…} [[{"role":"user","content":"What would be the projected population growth of Jap… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_126 | failed | {…} [[{"role":"user","content":"What would be the predicted evolutionary rate for th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_127 | failed | {…} [[{"role":"user","content":"Can you help me find restaurants in New York, NY tha… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_131 | failed | {…} [[{"role":"user","content":"Can you find the fastest route from my home in San F… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_135 | failed | {…} [[{"role":"user","content":"What is the total distance in kilometers if I start … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_136 | failed | {…} [[{"role":"user","content":"What is the shortest distance between New York and L… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_139 | failed | {…} [[{"role":"user","content":"Can you use the function 'employee.fetch_data' to fe… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_140 | failed | {…} [[{"role":"user","content":"Can you find all the Drama and Comedy movies that Le… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_144 | failed | {…} [[{"role":"user","content":"What is the result if you calculate the factorial of… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_150 | failed | {…} [[{"role":"user","content":"\"Can you calculate the area under the curve for the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_151 | failed | {…} [[{"role":"user","content":"\"Can you help me calculate the total area of three … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_152 | failed | {…} [[{"role":"user","content":"\"Can you calculate the result of the following math… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_155 | failed | {…} [[{"role":"user","content":"You are given a dataset with various variables inclu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_156 | failed | {…} [[{"role":"user","content":"You are given a dataset \"data_random_forest\" in th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_158 | failed | {…} [[{"role":"user","content":"You are a data scientist working on a project that r… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_159 | failed | {…} [[{"role":"user","content":"\"In a board game, you have a six-sided die. You are… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_160 | failed | {…} [[{"role":"user","content":"\"In a game of chance, you have a 0.3 probability of… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_164 | failed | {…} [[{"role":"user","content":"You are a data analyst and you have been given two 2… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_167 | failed | {…} [[{"role":"user","content":"What is the coefficient of determination (R squared)… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_168 | failed | {…} [[{"role":"user","content":"\"Can you help me calculate the quarterly dividend p… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_172 | failed | {…} [[{"role":"user","content":"\"Imagine you have two different investment opportun… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_173 | failed | {…} [[{"role":"user","content":"\"John has decided to invest in two different funds.… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_181 | failed | {…} [[{"role":"user","content":"Can you use the get_stock_prices function to retriev… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_182 | failed | {…} [[{"role":"user","content":"\"John is planning to invest in a mutual fund. He is… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_183 | failed | {…} [[{"role":"user","content":"\"Imagine you are a drone operator. You are currentl… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_184 | failed | {…} [[{"role":"user","content":"\"Can you help me find the roots of two different qu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_185 | failed | {…} [[{"role":"user","content":"Can you estimate the population of Bengal Tigers in … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_189 | failed | {…} [[{"role":"user","content":"Can you find me parks in New York, USA that have a T… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_191 | failed | {…} [[{"role":"user","content":"Can you help me find public libraries in New York, N… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_194 | failed | {…} [[{"role":"user","content":"Can you find me upcoming jazz events in Los Angeles,… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_195 | failed | {…} [[{"role":"user","content":"Can you retrieve additional information like Directo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_0 | failed | {…} [[{"role":"user","content":"Find the sum of all the multiples of 3 and 5 between… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_1 | failed | {…} [[{"role":"user","content":"Find the area of a rectangle with length 7 and bread… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_2 | failed | {…} [[{"role":"user","content":"Find the area and perimeter of a circle with a radiu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_7 | failed | {…} [[{"role":"user","content":"Calculate the time required for a car moving at 50 m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_8 | failed | {…} [[{"role":"user","content":"Calculate the final velocity of a moving object give… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_10 | failed | {…} [[{"role":"user","content":"Buy me a ticket to the Mamma Mia musical for next Fr… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_11 | failed | {…} [[{"role":"user","content":"What is the Electric field at 3m from a point charge… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_13 | failed | {…} [[{"role":"user","content":"'Calculate the energy required to heat 100 grams of … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_14 | failed | {…} [[{"role":"user","content":"Give me the population size of tigers in Bangladesh … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_15 | failed | {…} [[{"role":"user","content":"Find a Chinese restaurant near me in New York and su… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_19 | failed | {…} [[{"role":"user","content":"Perform Gaussian integral of the function exp(-x^2) … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_20 | failed | {…} [[{"role":"user","content":"Determine the median and variance for the following … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_21 | failed | {…} [[{"role":"user","content":"Use the data from dataset.csv file and fit a linear … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_22 | failed | {…} [[{"role":"user","content":"Find me the sales growth rate for company XYZ for th… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_23 | failed | {…} [[{"role":"user","content":"Calculate the net profit margin of Company XYZ given… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_24 | failed | {…} [[{"role":"user","content":"Invest $2000 in Google and withdraw $1000 from Apple… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_25 | failed | {…} [[{"role":"user","content":"How much would it cost me to invest in 50 shares of … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_26 | failed | {…} [[{"role":"user","content":"Get me the transaction history for my account '00125… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_27 | failed | {…} [[{"role":"user","content":"Transfer $5000 from my checking to saving account. A… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_28 | failed | {…} [[{"role":"user","content":"Find the conviction status of a criminal with name J… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_29 | failed | {…} [[{"role":"user","content":"Find cases that pertain to 'Theft' from court record… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_30 | failed | {…} [[{"role":"user","content":"Find all law cases where Charles Dickens is a party … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_31 | failed | {…} [[{"role":"user","content":"Find how many cases and the judge handling a specifi… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_33 | failed | {…} [[{"role":"user","content":"Locate all supermarkets in Los Angeles and find the … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_35 | failed | {…} [[{"role":"user","content":"Identify objects in my backyard image my_backyard_im… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_36 | failed | {…} [[{"role":"user","content":"Find overview about the Battle of Waterloo and the s… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_37 | failed | {…} [[{"role":"user","content":"Get me the timeline of World War 2 in Europe and the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_38 | failed | {…} [[{"role":"user","content":"What was the average life expectancy in the USA in t… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_39 | failed | {…} [[{"role":"user","content":"What is the exact birthdate of Nikola Tesla and what… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_40 | failed | {…} [[{"role":"user","content":"What is the weight of Neutron and Proton in atomic m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_41 | failed | {…} [[{"role":"user","content":"Create a square painting with blue background and di… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_42 | failed | {…} [[{"role":"user","content":"Find me a bronze statue in the Modern Arts Museum in… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_44 | failed | {…} [[{"role":"user","content":"Design a house with 3 bedrooms, 2 bathrooms and a ga… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_45 | failed | {…} [[{"role":"user","content":"Calculate the volume of a cuboid with a height of 10… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_46 | failed | {…} [[{"role":"user","content":"Find the operational hours for Louvre Museum and the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_48 | failed | {…} [[{"role":"user","content":"What is the equivalent note of C in Indian musical s… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_50 | failed | {…} [[{"role":"user","content":"Analyze the performance of the L.A Lakers in their l… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_52 | failed | {…} [[{"role":"user","content":"Play the Game of life for 3 rounds starting from an … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_53 | failed | {…} [[{"role":"user","content":"Find a board game with complexity rating under 2.5 a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_54 | failed | {…} [[{"role":"user","content":"In game Battle Reign, change the armor level to 5 an… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_55 | failed | {…} [[{"role":"user","content":"I want a homemade healthy spaghetti recipe that is g… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_56 | failed | {…} [[{"role":"user","content":"What is the current time in Beijing and Tokyo and wh… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_57 | failed | {…} [[{"role":"user","content":"Find hotels in Paris, France and New York, USA with … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_58 | failed | {…} [[{"role":"user","content":"\"Imagine you are a geometry teacher preparing for y… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_59 | failed | {…} [[{"role":"user","content":"\"Imagine you are a math teacher preparing for a geo… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_60 | failed | {…} [[{"role":"user","content":"\"What is the capital city of Australia, what is the… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_61 | failed | {…} [[{"role":"user","content":"\"Could you please help me with a couple of calculat… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_62 | failed | {…} [[{"role":"user","content":"\"A car is traveling on a straight road. At the star… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_63 | failed | {…} [[{"role":"user","content":"\"Can you tell me what the weather was like in New Y… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_64 | failed | {…} [[{"role":"user","content":"\"Can you help me understand the ecological impact o… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_65 | failed | {…} [[{"role":"user","content":"\"Can you help me find a property in San Francisco, … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_67 | failed | {…} [[{"role":"user","content":"\"Can you help me with some math problems? First, I … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_68 | failed | {…} [[{"role":"user","content":"\"Can you help me analyze the financial performance … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_69 | failed | {…} [[{"role":"user","content":"\"Imagine you are a real estate investor. You bought… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_70 | failed | {…} [[{"role":"user","content":"\"Can you help me compare the potential energy outpu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_71 | failed | {…} [[{"role":"user","content":"\"Could you first check the availability of a sculpt… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_73 | failed | {…} [[{"role":"user","content":"\"Could you provide me with the following informatio… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_74 | failed | {…} [[{"role":"user","content":"\"Can you provide me with the current statistics for… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_75 | failed | {…} [[{"role":"user","content":"\"Can you help me plan my day? I want to start from … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_76 | failed | {…} [[{"role":"user","content":"\"Could you please tell me the selling price of the … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_77 | failed | {…} [[{"role":"user","content":"\"Can you help me with some gaming information? Firs… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_78 | failed | {…} [[{"role":"user","content":"\"Can you help me plan a trip? I would like to first… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_79 | failed | {…} [[{"role":"user","content":"\"Imagine you are working on a programming project a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_80 | failed | {…} [[{"role":"user","content":"\"Can you help me with a math problem? I have two fu… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_81 | failed | {…} [[{"role":"user","content":"\"In a park, there is a rectangular playground with … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_82 | failed | {…} [[{"role":"user","content":"\"Imagine you are a sculptor working on a large proj… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_84 | failed | {…} [[{"role":"user","content":"\"Imagine you are a math teacher preparing for a cla… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_86 | failed | {…} [[{"role":"user","content":"\"A car starts from rest and travels a distance of 1… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_87 | failed | {…} [[{"role":"user","content":"\"A car is initially at rest and then starts moving … | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_88 | failed | {…} [[{"role":"user","content":"\"Could you help me find a book in the library? I am… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_89 | failed | {…} [[{"role":"user","content":"\"Could you help me plan my day? I need to go from m… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_91 | failed | {…} [[{"role":"user","content":"\"Imagine you are a scientist studying the energy pr… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_92 | failed | {…} [[{"role":"user","content":"\"Imagine you are planning a trip to Mars. You weigh… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_93 | failed | {…} [[{"role":"user","content":"\"Could you tell me the estimated date of the Jurass… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_94 | failed | {…} [[{"role":"user","content":"\"Given the list of words ['apple', 'banana', 'cherr… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_95 | failed | {…} [[{"role":"user","content":"\"Could you help me with some calculations? First, I… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_96 | failed | {…} [[{"role":"user","content":"\"Can you help me find a pet-friendly library with a… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_97 | failed | {…} [[{"role":"user","content":"\"John has decided to invest his savings. He has $50… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_99 | failed | {…} [[{"role":"user","content":"\"Could you please provide me with the historical di… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_102 | failed | {…} [[{"role":"user","content":"\"John has $5000 that he wants to invest. He is cons… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_103 | failed | {…} [[{"role":"user","content":"\"Can you help me find a divorce lawyer in New York,… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_105 | failed | {…} [[{"role":"user","content":"\"Can you help me plan a trip? I need to know the di… | — | — | — | ||
|
Lade Detail …
|
|||||||
| parallel_multiple_107 | failed | {…} [[{"role":"user","content":"\"Can you help me find a book? I'm not sure of the t… | — | — | — | ||
|
Lade Detail …
|
|||||||