Samples · swe_bench.swe_bench_lite

Run #49 · Adapter v1.0.0+patch-apply-detection · 10/10 samples shown · Score 0%

AI evaluation

No AI evaluation available.

Overview

10 Samples
Distribution (by status)
  • failed: 3
  • error: 7
Score histogram (score range: sample count)
  • 0.0 – 0.1: 10
  • 0.1 – 1.0 (remaining nine bins): 0
Latency (ms)
p50: 9847 p95: 17319 mean: 10852
Tokens/s
p50: 85.7 mean: 93.4
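The latency and throughput summaries can be recomputed from the per-sample values in the sample table. The following is a minimal sketch, not the dashboard's own code: the `percentile` helper and the variable names are hypothetical, and linear interpolation between closest ranks is an assumed percentile method, since the page does not say which one it uses.

```python
from statistics import mean, median

# Per-sample values copied from the sample table on this page.
latencies_ms = [7902, 6488, 13857, 7708, 14360, 10887, 9875, 19740, 9819, 7882]
tokens_per_s = [131.4, 130.2, 86.1, 82.1, 86.9, 85.9, 84.4, 85.5, 84.8, 76.6]


def percentile(values, q):
    """Return the q-th percentile (0..100) using linear interpolation.

    Assumption: the dashboard's percentile method is not documented here;
    interpolating between the two closest ranks is one common choice.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


latency_summary = {
    "p50": round(percentile(latencies_ms, 50)),
    "p95": round(percentile(latencies_ms, 95)),
    "mean": round(mean(latencies_ms)),
}
throughput_summary = {
    "p50": round(median(tokens_per_s), 1),
    "mean": round(mean(tokens_per_s), 1),
}

print(latency_summary)
print(throughput_summary)
```

With these ten samples the sketch yields p50 9847, p95 17319 and mean 10852 for latency, and p50 85.7 and mean 93.4 for tokens/s, which matches the figures reported above.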
Top error patterns
  • patch_apply_failed
Question ID             Status  Score  Latency (ms)  Tokens/s
astropy__astropy-12907  error   0%     7902          131.4
astropy__astropy-14182  failed  0%     6488          130.2
astropy__astropy-14365  error   0%     13857         86.1
astropy__astropy-14995  error   0%     7708          82.1
astropy__astropy-6938   error   0%     14360         86.9
astropy__astropy-7746   error   0%     10887         85.9
django__django-10914    error   0%     9875          84.4
django__django-10924    failed  0%     19740         85.5
django__django-11001    failed  0%     9819          84.8
django__django-11019    error   0%     7882          76.6

The Prompt and TTFT columns contain no values in this view.
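
The 3/7 status split and the score histogram in the overview follow directly from the Status and Score columns of the sample table. A minimal sketch of that tally, assuming ten equal-width score bins over [0, 1] with a score of 1.0 counted in the last bin (the dashboard's exact binning rule is not stated, and all names below are illustrative):

```python
from collections import Counter

# (question ID, status, score) copied from the sample table; 0% is stored as 0.0.
samples = [
    ("astropy__astropy-12907", "error", 0.0),
    ("astropy__astropy-14182", "failed", 0.0),
    ("astropy__astropy-14365", "error", 0.0),
    ("astropy__astropy-14995", "error", 0.0),
    ("astropy__astropy-6938", "error", 0.0),
    ("astropy__astropy-7746", "error", 0.0),
    ("django__django-10914", "error", 0.0),
    ("django__django-10924", "failed", 0.0),
    ("django__django-11001", "failed", 0.0),
    ("django__django-11019", "error", 0.0),
]

# Count how many samples ended in each status.
status_counts = Counter(status for _, status, _ in samples)

# Assumed binning: ten bins of width 0.1; clamp so that 1.0 lands in the last bin.
score_bins = [0] * 10
for _, _, score in samples:
    score_bins[min(int(score * 10), 9)] += 1

print(dict(status_counts))
print(score_bins)
```

Since every sample scored 0%, all ten land in the first bin, and the run-level score of 0% is simply the mean of the per-sample scores.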