Compare commits
179 Commits
b8e8a83e49
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| 2d1b1d3ec6 | |||
| 1d701ae75e | |||
| a32c4787f8 | |||
| 6ccf6fb0e1 | |||
| a3ea0c2fda | |||
| 3b1a02a9af | |||
| 56133b5d19 | |||
| 9670c85750 | |||
| 20a1e5f015 | |||
| a16c33f4fb | |||
| afef340eb8 | |||
| a65a25c56c | |||
| 178bef1277 | |||
| 1649d2e864 | |||
| 3caefa2f6e | |||
| 65e7365e75 | |||
| 9aa4a46768 | |||
| 61966783e3 | |||
| 7bcba3daf8 | |||
| 7d49d62f81 | |||
| 5b8919ef89 | |||
| a4942edb9f | |||
| 8fc31f2a53 | |||
| 01364b7031 | |||
| f3cd1347ab | |||
| 5ea2540588 | |||
| b91253235e | |||
| ac2e3e92fc | |||
| 0975385101 | |||
| bb8be3ffb4 | |||
| 8fbb8eda2d | |||
| 742f331d93 | |||
| 2f602717b8 | |||
| d003f73217 | |||
| 882bcece06 | |||
| 477c21efd0 | |||
| 088bad7b21 | |||
| de3e33d46e | |||
| dcdb360098 | |||
| 0b926c2cad | |||
| a8f731d38e | |||
| 5d0baf3ff1 | |||
| 8e9fbc5422 | |||
| 06089a58b2 | |||
| a25c52cff4 | |||
| 0c3303a640 | |||
| ba48b737f2 | |||
| a3f1ead3e6 | |||
| 7fe72480b1 | |||
| 92964e322f | |||
| e54c1b057c | |||
| 1de7e5c90b | |||
| e360896436 | |||
| 6a40ca5730 | |||
| 2d470ee418 | |||
| 062e6af776 | |||
| 75870c1100 | |||
| 6e83fad31d | |||
| 0f3310996e | |||
| e2a16b8ff6 | |||
| a0d3748faf | |||
| 01b4fb8e22 | |||
| e7b33b7d6f | |||
| 9da5540ca2 | |||
| 838d5fbd73 | |||
| d02f6a51c1 | |||
| 8ba9ef83a3 | |||
| f50dc884a3 | |||
| 7b27800390 | |||
| b93ae2fd1b | |||
| 4c116428c3 | |||
| 542230f091 | |||
| c217271907 | |||
| a08b5f3893 | |||
| 25b9ab0c37 | |||
| 62c9b6e17e | |||
| ad097ca712 | |||
| 868d116961 | |||
| 02e3701d77 | |||
| b3abf4e89f | |||
| 9f2899b83d | |||
| 4a811e4171 | |||
| 8efbf96295 | |||
| 16f40a7536 | |||
| 42ee959781 | |||
| 0850a139f1 | |||
| d6a544909c | |||
| 8f154a578c | |||
| 7221f5e920 | |||
| 34a56e408d | |||
| ecd4bc2ac3 | |||
| 7dc2af59c3 | |||
| 4aa09e1025 | |||
| 20cea8f268 | |||
| 38a18c555b | |||
| 8138e41aa1 | |||
| 6ee5bdf960 | |||
| cf3bf54bf8 | |||
| 56f21a96c9 | |||
| 763b93396c | |||
| e09962940a | |||
| 5e44b63b0c | |||
| 0f3881aa02 | |||
| fa85dcc5b3 | |||
| 58d93613f0 | |||
| 66b4435362 | |||
| 3a00de9b8e | |||
| 670141c8c3 | |||
| 59daebbd38 | |||
| 42b71dbf77 | |||
| b88a741f85 | |||
| 68c7195d54 | |||
| 3d20238eef | |||
| 8b8ba01af3 | |||
| a3b95a56e8 | |||
| 5b20ebe800 | |||
| ffe9bd6902 | |||
| d27068b11a | |||
| 8468724a4c | |||
| 6ef71b7e5c | |||
| b2ee8b9031 | |||
| c1a5f8aff5 | |||
| 8ee997cb56 | |||
| cd67562a67 | |||
| 1f85c03624 | |||
| 74a2045def | |||
| 9b2b7767b5 | |||
| 1718805978 | |||
| 7fcc97f525 | |||
| 7ce990b42a | |||
| dc71829430 | |||
| 5d4a553520 | |||
| 5e82c798b1 | |||
| 5f147b774f | |||
| 4983217ee0 | |||
| 27c33e41c3 | |||
| 2b33980be4 | |||
| 8995bcef30 | |||
| 2f140c8a15 | |||
| 094b183c17 | |||
| a91b9539b3 | |||
| 6e2f85daa8 | |||
| 466e61d730 | |||
| 5f00582053 | |||
| e272b0d124 | |||
| d3affb3a09 | |||
| 1377e72f78 | |||
| 403f35efdc | |||
| ce0ccbddd3 | |||
| 80806498e0 | |||
| 660e80c2bc | |||
| 591cfcb04b | |||
| 3cda57f0bc | |||
| 23e7b92d03 | |||
| 9f58febe21 | |||
| b1de0d37f7 | |||
| 4ff626ab88 | |||
| a45616046d | |||
| ee048b0b68 | |||
| 4e83569194 | |||
| f42b692eeb | |||
| f79bb16f3d | |||
| e81fc33faf | |||
| 433726c553 | |||
| dec2e24e2f | |||
| 9058033669 | |||
| 8bd86e6325 | |||
| c1133bb075 | |||
| 6502d75efc | |||
| 9f8b7fe920 | |||
| 746bc20fcb | |||
| 93f6baa0ea | |||
| cc8e871735 | |||
| e90f3460c3 | |||
| 4d74c38618 | |||
| 8a1b204179 | |||
| b19f5a3518 | |||
| 38dc36e846 | |||
| 4fe6931b5f | |||
9
.gitignore
vendored
@@ -37,3 +37,12 @@ Cargo.lock

# Runtime databases
*.db

# Log files
*.log

# Old version
temp/

# Other
zipit/**
157
TEMPLATING.md
Normal file
@@ -0,0 +1,157 @@
# Templating: building blocks for code generation

## Core principle

The language model decides **what** gets built (entities, fields, types, relationships).
The template functions decide **how** it gets built (imports, engine setup, test configuration).

```
Project description → LLM → JSON spec → Templates → Code → Validation
```

The LLM contributes a single JSON structure. Everything else is deterministic:
the same spec always produces the same code.

## Why this works

The strengths and weaknesses of a small language model (0.5B–7B) are asymmetric:

| Task | LLM ability | Solution |
|------|-------------|----------|
| Identify entities from the description | Good | LLM does it |
| Choose field types | Good | LLM does it |
| Get the imports right | Poor | Template does it |
| SQLite connect_args | Poor | Template does it |
| Test configuration | Poor | Template does it |
| Dockerfile structure | Poor | Template does it |

Let the model do what it is good at. Handle the rest mechanically.

## JSON spec

The language model's only output is a JSON document that describes the project structure:

```json
{
  "project_name": "library-app",
  "entities": [
    {
      "name": "Author",
      "table_name": "authors",
      "fields": [
        {"name": "name", "sa_type": "String(255)", "py_type": "str", "nullable": false, "default": null}
      ]
    },
    {
      "name": "Book",
      "table_name": "books",
      "fields": [
        {"name": "title", "sa_type": "String(255)", "py_type": "str", "nullable": false, "default": null},
        {"name": "author_id", "sa_type": "Integer", "py_type": "int", "nullable": false, "default": null}
      ]
    }
  ],
  "relationships": [
    {"from": "Book", "field": "author_id", "to": "Author", "type": "many-to-one"}
  ],
  "extra_imports": []
}
```

Spec quality decides everything. A good spec yields a good project. A bad spec yields
a project that works technically but has the wrong content.

## Role of the Architect prompt

The Architect agent (the generator of the JSON spec) is the most critical point in the whole pipeline.
It is steered in four ways:

1. **Chain-of-thought**: the model first works out the entities, then the fields,
   then the relationships, and only then writes the JSON
2. **Domain examples**: todo, web shop, blog; the model sees what
   a good spec looks like in different domains
3. **Anti-patterns**: redundant ID fields, Enum types, Finnish-language names
4. **Relationship rules**: every `_id` field needs a relationship entry

A larger model at this one step would improve the quality of every project.
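
The relationship rule from the list above (every `_id` field needs a relationship entry) lends itself to a mechanical check on the spec before any template runs. The following is a minimal Python sketch, assuming the spec has the shape shown in the JSON example; the function name `missing_relationships` is hypothetical and is not taken from the benchmark code.

```python
def missing_relationships(spec: dict) -> list[str]:
    """Return "Entity.field" for every *_id field that has no relationship entry."""
    # Relationship entries are keyed by the entity and field they start from.
    declared = {(rel["from"], rel["field"]) for rel in spec.get("relationships", [])}
    missing = []
    for entity in spec.get("entities", []):
        for field in entity.get("fields", []):
            name = field["name"]
            if name.endswith("_id") and (entity["name"], name) not in declared:
                missing.append(f"{entity['name']}.{name}")
    return missing


# Hypothetical spec in which the Book -> Author relationship was forgotten.
spec = {
    "entities": [
        {"name": "Author", "fields": [{"name": "name"}]},
        {"name": "Book", "fields": [{"name": "title"}, {"name": "author_id"}]},
    ],
    "relationships": [],
}
print(missing_relationships(spec))
```

A non-empty result could be fed back to the model as a correction request instead of letting a broken spec reach the templates.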

## Templates

Each template is a function that takes the spec and returns code:

```
tmplModels(spec)     → models.py      (SQLAlchemy, ForeignKey, relationship)
tmplSchemas(spec)    → schemas.py     (Pydantic Create/Response/Detail)
tmplMain(spec)       → main.py        (FastAPI CRUD + nested endpoints + FK validation)
tmplTests(spec)      → test_main.py   (pytest + TestClient + helper functions)
tmplPyproject(spec)  → pyproject.toml (PEP 621)
tmplDockerfile()     → Dockerfile     (uv + non-root user)
```

The templates automatically generate:
- ForeignKey constraints and relationship() definitions
- Nested endpoints (`GET /authors/{id}/books/`)
- FK validation (404 if the parent entity does not exist)
- Detail schemas (Book with the author data included)
- Test helpers that create the parent entities first
- Bad-FK tests (verify that orphan validation works)
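
To make the "spec in, code out" idea concrete, here is a heavily simplified Python sketch of what a models template could look like. It is an illustration only: the name `render_models` is hypothetical, the real `tmplModels` is not shown in this diff, and the sketch leaves out `relationship()` definitions, defaults and engine setup.

```python
def render_models(spec: dict) -> str:
    """Render models.py source text from a spec (simplified sketch)."""
    # Map each entity name to its table, and each FK field to its target entity.
    tables = {entity["name"]: entity["table_name"] for entity in spec["entities"]}
    fk_targets = {
        (rel["from"], rel["field"]): rel["to"] for rel in spec.get("relationships", [])
    }
    lines = [
        "from sqlalchemy import ForeignKey, Integer, String, Text",
        "from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column",
        "",
        "",
        "class Base(DeclarativeBase):",
        "    pass",
    ]
    for entity in spec["entities"]:
        table = entity["table_name"]
        lines += [
            "",
            "",
            f"class {entity['name']}(Base):",
            f'    __tablename__ = "{table}"',
            "",
            "    id: Mapped[int] = mapped_column(primary_key=True)",
        ]
        for field in entity["fields"]:
            py_type = field["py_type"] + (" | None" if field["nullable"] else "")
            target = fk_targets.get((entity["name"], field["name"]))
            if target is not None:
                # The template, not the LLM, decides how a foreign key is written.
                column = f'mapped_column(ForeignKey("{tables[target]}.id"))'
            else:
                column = f"mapped_column({field['sa_type']})"
            lines.append(f"    {field['name']}: Mapped[{py_type}] = {column}")
    return "\n".join(lines) + "\n"


spec = {
    "entities": [
        {"name": "Author", "table_name": "authors", "fields": [
            {"name": "name", "sa_type": "String(255)", "py_type": "str", "nullable": False},
        ]},
        {"name": "Book", "table_name": "books", "fields": [
            {"name": "title", "sa_type": "String(255)", "py_type": "str", "nullable": False},
            {"name": "author_id", "sa_type": "Integer", "py_type": "int", "nullable": False},
        ]},
    ],
    "relationships": [
        {"from": "Book", "field": "author_id", "to": "Author", "type": "many-to-one"},
    ],
}
print(render_models(spec))
```

Because the function is pure string assembly, calling it twice with the same spec gives byte-identical output, which is the determinism property the document relies on.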

## Validation

Generated code is validated mechanically before use:

- Syntax check (AST parse)
- Project-internal imports (does the imported name exist in the source file)
- SQLite connect_args
- Relative imports (forbidden)
- Test structure (tests must not copy the app)
- pyproject.toml (no Poetry)
- Dockerfile (no Poetry, uv cache permissions)

The Docker test runs the whole project: build → pytest → API smoke test.
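
Two of the checks listed above, the syntax check and the ban on relative imports, can be sketched with Python's standard `ast` module. This is an assumption about how such a check might look, not the validator from `benchmark.mjs`; the function name `validate_source` and the message format are made up for the example.

```python
import ast


def validate_source(filename: str, source: str) -> list[str]:
    """Return a list of problems found in one generated Python file."""
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        # A file that does not parse cannot be checked any further.
        return [f"{filename}:{exc.lineno}: syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        # level > 0 means a relative import such as "from .models import Base".
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            problems.append(f"{filename}:{node.lineno}: relative import is not allowed")
    return problems


print(validate_source("main.py", "from .models import Base\n"))
```

Returning a list of messages rather than raising keeps the check usable as input for a fix prompt, since every finding can be quoted back to the model.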

## Limitations

The templates cover projects whose structure is known in advance:

| Stack | Coverage |
|-------|----------|
| FastAPI + SQLAlchemy CRUD | Works well |
| Streamlit + DuckDB dashboard | Works well |
| Other | No template → does not work |

**Not covered:**
- Custom business logic (algorithms, computation, ML)
- Atypical architectures (WebSocket, graphs, event-driven)
- Frontend applications (React, Vue)
- Anything the template does not know about

Estimate: the templates cover ~20% of all possible projects, but precisely
the 20% that teaching and prototyping environments need most often.

## Extending

Adding a new stack requires:

1. New template functions (handwritten, ~200–400 lines per stack)
2. An extension to the JSON spec (new fields if needed)
3. Validation rules for the new stack
4. A Docker test configuration

Every template is static: it does not learn or adapt. Coverage grows
only by writing more templates.

## Hybrid: the next step

The best result would come from a combination:

```
Spec → Template (skeleton) → LLM (business logic) → Validation
```

The template produces a working CRUD base. The LLM adds domain-specific logic
in small pieces (one function at a time). Mechanical validation
checks every addition.

This brings the LLM's unreliability back into play, but in bounded form:
errors are local (one function) rather than structural (the whole project).
4
kipina-codebench/Dockerfile.cargo-test
Normal file
@@ -0,0 +1,4 @@
FROM rust:latest
RUN apt-get update && apt-get install -y pkg-config libssl-dev cmake && rm -rf /var/lib/apt/lists/*
WORKDIR /work
ENTRYPOINT ["sh", "-c", "cp -r /src/* . && cargo test 2>&1"]
4
kipina-codebench/Dockerfile.go-test
Normal file
@@ -0,0 +1,4 @@
FROM golang:1.23-alpine
RUN apk add --no-cache gcc musl-dev
WORKDIR /work
ENTRYPOINT ["sh", "-c", "cp -r /src/* . && go mod tidy 2>&1 && go test -v -count=1 ./... 2>&1"]
5
kipina-codebench/Dockerfile.pytest
Normal file
@@ -0,0 +1,5 @@
FROM python:3.14-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /work
ENV PYTHONPATH=/work
ENTRYPOINT ["sh", "-c", "uv init --no-readme --python '>=3.14' 2>/dev/null && rm -f hello.py main.py && uv add fastapi 'uvicorn[standard]' sqlalchemy pytest httpx 2>/dev/null && cp /src/*.py . && rm -f app.db test.db && uv run pytest test_main.py -v --tb=short 2>&1"]
95
kipina-codebench/README.md
Normal file
@@ -0,0 +1,95 @@
# Kipinä CodeBench

An LLM code generation benchmark. It tests how well Ollama models generate working FastAPI + SQLAlchemy projects and runs the tests in a Docker container.

## Quick start

```bash
# 1. Build the Docker test container
docker build -t kipina-pytest -f Dockerfile.pytest .

# 2. Run the benchmark
node benchmark.mjs --ollama http://localhost:11434 --scenarios all

# 3. Open the report
open /tmp/kipina-benchmark/report.html
```

## Pipeline

```
1. LLM → requirements specification (prompts/client.md)
2. LLM → JSON spec (prompts/spec.md)
3. LLM → 4 Python files (prompts/code.md + golden-examples/)
4. Static validation + LLM fix (prompts/fix.md)
5. Docker: uv init + uv add + pytest
```

## CLI arguments

| Argument | Default | Description |
|----------|---------|-------------|
| `--ollama` | `http://localhost:11434` | Ollama server URL |
| `--hub` | - | Hub route (alternative to Ollama) |
| `--models` | all | Comma-separated list of models |
| `--scenarios` | `default` (todo) | `all` = todo, users, blog |
| `--output` | `/tmp/kipina-benchmark` | Output directory |

## Directory structure

```
kipina-codebench/
├── benchmark.mjs              ← runner
├── Dockerfile.pytest          ← Python 3.14 + uv test container
├── report-template.html       ← HTML report template
├── package.json
├── prompts/                   ← editable prompts
│   ├── client.md              ← requirements specification
│   ├── spec.md                ← JSON spec
│   ├── code.md                ← code generation
│   └── fix.md                 ← fixes
├── golden-examples/           ← reference implementations
│   ├── todo/                  ← level 1: basic CRUD (6 tests)
│   ├── blog/                  ← level 2: relations (13 tests)
│   └── DOCUMENTATION.md       ← zensical documentation guidelines
└── results/                   ← saved results
```

## Editing the prompts

The prompts are Markdown files in the `prompts/` directory. Edit them directly; the benchmark loads them at startup.

Example: add a rule to `prompts/code.md`:
```
- Tests: PUT/update test data MUST include ALL required fields
```

## Golden examples

`golden-examples/todo/` is fed to the LLM as a reference. The model sees exactly what kind of code is expected:
- SQLAlchemy 2.0 (DeclarativeBase, Mapped, mapped_column)
- Pydantic v2 (ConfigDict)
- Python 3.14 syntax (`str | None`)
- Unique test data per test

Add new examples by creating a directory (e.g. `golden-examples/shop/`).

## Scoring

| Component | Points | Basis |
|---|---|---|
| Spec OK | 10p | The JSON spec was produced successfully |
| Code generated | 10p | All 4 files were created |
| Tests | 0–60p | passed/total × 60 |
| Fixes | 0–20p | 0 rounds = 20p, 1 = 10p, 2+ = 0p |

Stars: ★★★★★ (90+), ★★★★☆ (70+), ★★★☆☆ (50+), ★★☆☆☆ (25+), ★☆☆☆☆ (1+)
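
The scoring table and star thresholds above translate into a few lines of arithmetic. The sketch below is written in Python for illustration and is not the code from `benchmark.mjs`. Several details are assumptions: the test component is rounded with Python's built-in `round`, the fix bonus is awarded independently of the other components, and a score of 0 is shown as five empty stars.

```python
def score(spec_ok: bool, files_generated: int, passed: int, total: int, fix_rounds: int) -> int:
    """Combine the four scoring components into a 0-100 point total."""
    points = 0
    if spec_ok:
        points += 10                              # Spec OK
    if files_generated == 4:
        points += 10                              # Code generated: all 4 files exist
    if total > 0:
        points += round(60 * passed / total)      # Tests: passed/total x 60
    points += {0: 20, 1: 10}.get(fix_rounds, 0)   # Fixes: 0 rounds = 20, 1 = 10, 2+ = 0
    return points


def stars(points: int) -> str:
    """Map a point total to the five-star rating string."""
    for threshold, count in ((90, 5), (70, 4), (50, 3), (25, 2), (1, 1)):
        if points >= threshold:
            return "★" * count + "☆" * (5 - count)
    return "☆" * 5


total_points = score(spec_ok=True, files_generated=4, passed=5, total=6, fix_rounds=1)
print(total_points, stars(total_points))
```

In this example run, 5 of 6 passing tests contribute 50 points and one fix round contributes 10, so the total is 80 points, which falls in the 70+ band.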

## Use as a git submodule

```bash
git submodule add <repo-url> tools/codebench
cd tools/codebench
docker build -t kipina-pytest -f Dockerfile.pytest .
node benchmark.mjs --ollama http://localhost:11434 --scenarios all
```
1028
kipina-codebench/benchmark.mjs
Normal file
File diff suppressed because it is too large
84
kipina-codebench/golden-examples/DOCUMENTATION.md
Normal file
@@ -0,0 +1,84 @@
# Documentation guidelines: Zensical

Good documentation says **what a thing IS**, not what it does. It is like a Zen koan: short, precise, sufficient.

## Principles

1. **One line is enough.** If you need a paragraph, the code is too complicated.
2. **Say what, not how.** `"""Database models: SQLAlchemy 2.0, SQLite."""` not `"""This module creates database models using SQLAlchemy..."""`
3. **Do not repeat the code.** If the function is `create_todo`, the docstring is not "Creates a todo".
4. **Finnish or English, not both.** Pick one language per project.
5. **No filler words.** "This module provides functionality for" → delete.

## What to document

| Target | Documentation | Example |
|--------|---------------|---------|
| **Module** (.py) | Always. One line: what the file contains. | `"""Pydantic v2 schemas: Create and Response."""` |
| **Class** | Always. What the entity represents. | `"""Task: title, deadline, priority."""` |
| **Function** | Only if the name does not say everything. | `get_db` → `"""Database session per request."""` |
| **CRUD endpoint** | No. Name + HTTP method is enough. | `create_todo`, `list_todos`: self-documenting |
| **Test** | No. The test name is the documentation. | `test_get_todo_not_found`: clear |
| **Configuration** | A comment only if the value is surprising. | `check_same_thread: False  # SQLite + FastAPI` |

## What NOT to document

- Imports
- Obvious parameters (`item_id: int`)
- Type hints that say the same thing
- Generic boilerplate docstrings

## Examples

### Good (zensical)

```python
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""

class Todo(Base):
    """Task: title, description, deadline, priority and status."""
    ...

def get_db():
    """Database session per request."""
    ...
```

### Bad (verbose)

```python
"""
This module defines the database models for the Todo application.
It uses SQLAlchemy ORM to create the database tables and provides
the session factory for database connections.
"""

class Todo(Base):
    """
    Represents a todo item in the database.

    Attributes:
        id: The unique identifier for the todo item.
        title: The title of the todo item.
        ...
    """
    ...
```

### Bad (empty)

```python
# No docstrings at all: the reader cannot tell what role the file plays
class Todo(Base):
    __tablename__ = "todos"
    ...
```

## Checklist

Generated code is well documented when:
- [ ] Every .py file starts with a one-line docstring
- [ ] Every class says what the entity represents
- [ ] Docstrings are in the same language as the rest of the code
- [ ] CRUD endpoints have no unnecessary docstrings
- [ ] Comments appear only where the code is surprising
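
The first two checklist items can be checked automatically. The sketch below uses Python's standard `ast` module; it is an illustration, not part of the benchmark. The function name `docstring_problems` is hypothetical, and skipping classes whose body is a lone `pass` (such as a bare `Base`) is an assumption made so that the golden examples would not be flagged.

```python
import ast


def docstring_problems(filename: str, source: str) -> list[str]:
    """Report missing or multi-line module docstrings and undocumented classes."""
    tree = ast.parse(source, filename=filename)
    problems = []
    module_doc = ast.get_docstring(tree)
    if module_doc is None:
        problems.append(f"{filename}: missing module docstring")
    elif len(module_doc.splitlines()) > 1:
        problems.append(f"{filename}: module docstring is longer than one line")
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # Assumption: a class that only contains "pass" is a placeholder, not an entity.
        is_placeholder = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
        if not is_placeholder and ast.get_docstring(node) is None:
            problems.append(f"{filename}: class {node.name} has no docstring")
    return problems


undocumented = 'class Todo:\n    __tablename__ = "todos"\n'
print(docstring_problems("models.py", undocumented))
```

The remaining checklist items (language consistency, unnecessary docstrings, surprising code) call for judgment and are harder to reduce to a syntax-tree walk.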
123
kipina-codebench/golden-examples/README.md
Normal file
@@ -0,0 +1,123 @@
# Golden Examples: reference implementations

Golden examples are **complete, tested** FastAPI projects that the LLM uses as a model during code generation. The model sees the example and produces the same structure for a new project.

## Creating a new example

### 1. Create a directory

```bash
mkdir golden-examples/shop
```

Name the directory after the scenario (todo, blog, shop, booking...).

### 2. Create 4 files

| File | Contents |
|------|----------|
| `models.py` | SQLAlchemy 2.0 models (DeclarativeBase, Mapped, mapped_column) |
| `schemas.py` | Pydantic v2 schemas (ConfigDict, `str \| None` syntax) |
| `main.py` | FastAPI CRUD endpoints (POST 201, GET, GET/:id 404, PUT, DELETE 204) |
| `test_main.py` | Pytest + TestClient, separate test.db, unique data per test |

### 3. Follow the conventions

**Python version:** >=3.14

**SQLAlchemy 2.0** (not legacy):
```python
# Right
class Base(DeclarativeBase):
    pass

class Todo(Base):
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    status: Mapped[str] = mapped_column(String(20), default="pending")

# Wrong
Base = declarative_base()
id = Column(Integer, primary_key=True)
```

**Pydantic v2** (not v1):
```python
# Right
class TodoResponse(TodoCreate):
    id: int
    model_config = ConfigDict(from_attributes=True)

# Wrong
class Config:
    orm_mode = True
```

**Typing:**
```python
# Right
description: Mapped[str | None] = mapped_column(Text, default=None)

# Wrong
description: Mapped[Optional[str]]
```

**Documentation (zensical):**
```python
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""

class Todo(Base):
    """Task: title, description, deadline, priority and status."""
```

One line is enough. Say what the thing IS, not what it does. See [DOCUMENTATION.md](DOCUMENTATION.md).

**Test data: unique and descriptive:**
```python
# Right
def test_create_todo():
    response = client.post("/todos/", json={"title": "Buy milk", "priority": 2})

def test_update_todo():
    created = client.post("/todos/", json={"title": "Old title"}).json()

# Wrong: generic data
def test_create_todo():
    response = client.post("/todos/", json={"title": "test", "priority": 1})
```

### 4. Test in the Docker container

```bash
rm -rf /tmp/golden-test && mkdir /tmp/golden-test
cp golden-examples/shop/*.py /tmp/golden-test/
docker run --rm -v /tmp/golden-test:/src:ro kipina-pytest
```

**All tests must pass.** No warnings, no deprecation messages.

### 5. Difficulty levels

| Level | Examples | Challenge |
|-------|----------|-----------|
| 1: Basic CRUD | `todo/`, `users/`, `notes/` | One entity |
| 2: Relations | `blog/`, `library/`, `school/` | Foreign key, 2–3 entities |
| 3: Business logic | `shop/`, `booking/` | Custom endpoints, validation |

Start at level 1 and work upward. Level 1 examples must stay simple, because they teach the model the basic structure.

## How the examples take effect

The benchmark loads the `todo/` example and feeds it to the LLM as part of the code generation prompt:

```
REFERENCE IMPLEMENTATION (todo project — follow this exact structure):

=== models.py ===
<contents of todo/models.py>

=== schemas.py ===
...
```

The model sees an exact example and produces the same structure for a new project. The better the example, the better the result.
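
Assembling that reference block is plain file concatenation. The following Python sketch shows one way to do it; it is not the loader from `benchmark.mjs` (which is not shown in this diff). The function name `build_reference_block`, the fixed file order and the exact header wording are assumptions made for the example.

```python
from pathlib import Path

# The four files every golden example is expected to contain.
FILES = ("models.py", "schemas.py", "main.py", "test_main.py")


def build_reference_block(example_dir: Path) -> str:
    """Concatenate the files of one golden example into a prompt section."""
    header = f"REFERENCE IMPLEMENTATION ({example_dir.name} project, follow this exact structure):"
    parts = [header]
    for name in FILES:
        source = (example_dir / name).read_text(encoding="utf-8")
        parts.append(f"=== {name} ===\n{source.rstrip()}")
    return "\n\n".join(parts) + "\n"
```

Calling `build_reference_block(Path("golden-examples/todo"))` would return the text to embed in the code generation prompt, and a missing file raises `FileNotFoundError` rather than silently producing an incomplete reference.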
110
kipina-codebench/golden-examples/blog/main.py
Normal file
@@ -0,0 +1,110 @@
"""FastAPI CRUD: two endpoint sets, Author and Post."""

from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session

from models import SessionLocal, Author, Post
from schemas import AuthorCreate, AuthorResponse, PostCreate, PostResponse

app = FastAPI()


def get_db():
    """Database session per request."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


# --- Author ---


@app.post("/authors/", response_model=AuthorResponse, status_code=201)
def create_author(item: AuthorCreate, db: Session = Depends(get_db)):
    db_item = Author(**item.model_dump())
    db.add(db_item)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.get("/authors/", response_model=list[AuthorResponse])
def list_authors(db: Session = Depends(get_db)):
    return db.query(Author).all()


@app.get("/authors/{item_id}", response_model=AuthorResponse)
def get_author(item_id: int, db: Session = Depends(get_db)):
    item = db.query(Author).filter(Author.id == item_id).first()
    if not item:
        raise HTTPException(status_code=404, detail="Author not found")
    return item


@app.put("/authors/{item_id}", response_model=AuthorResponse)
def update_author(item_id: int, item: AuthorCreate, db: Session = Depends(get_db)):
    db_item = db.query(Author).filter(Author.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Author not found")
    for key, value in item.model_dump().items():
        setattr(db_item, key, value)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.delete("/authors/{item_id}", status_code=204)
def delete_author(item_id: int, db: Session = Depends(get_db)):
    db_item = db.query(Author).filter(Author.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Author not found")
    db.delete(db_item)
    db.commit()


# --- Post ---


@app.post("/posts/", response_model=PostResponse, status_code=201)
def create_post(item: PostCreate, db: Session = Depends(get_db)):
    db_item = Post(**item.model_dump())
    db.add(db_item)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.get("/posts/", response_model=list[PostResponse])
def list_posts(db: Session = Depends(get_db)):
    return db.query(Post).all()


@app.get("/posts/{item_id}", response_model=PostResponse)
def get_post(item_id: int, db: Session = Depends(get_db)):
    item = db.query(Post).filter(Post.id == item_id).first()
    if not item:
        raise HTTPException(status_code=404, detail="Post not found")
    return item


@app.put("/posts/{item_id}", response_model=PostResponse)
def update_post(item_id: int, item: PostCreate, db: Session = Depends(get_db)):
    db_item = db.query(Post).filter(Post.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Post not found")
    for key, value in item.model_dump().items():
        setattr(db_item, key, value)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.delete("/posts/{item_id}", status_code=204)
def delete_post(item_id: int, db: Session = Depends(get_db)):
    db_item = db.query(Post).filter(Post.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Post not found")
    db.delete(db_item)
    db.commit()
45
kipina-codebench/golden-examples/blog/models.py
Normal file
@@ -0,0 +1,45 @@
"""Database models: SQLAlchemy 2.0, Mapped typing, ForeignKey relations."""

from datetime import datetime

from sqlalchemy import String, Text, DateTime, ForeignKey, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, sessionmaker

DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


class Base(DeclarativeBase):
    pass


class Author(Base):
    """Author: name, email and bio."""

    __tablename__ = "authors"

    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    name: Mapped[str] = mapped_column(String(255))
    email: Mapped[str] = mapped_column(String(255), unique=True)
    bio: Mapped[str | None] = mapped_column(Text, default=None)

    posts: Mapped[list["Post"]] = relationship(back_populates="author")


class Post(Base):
    """Blog post: title, content, author, publication time and status."""

    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    content: Mapped[str] = mapped_column(Text)
    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
    published_at: Mapped[datetime | None] = mapped_column(DateTime, default=None)
    status: Mapped[str] = mapped_column(String(20), default="draft")

    author: Mapped["Author"] = relationship(back_populates="posts")


Base.metadata.create_all(bind=engine)
37
kipina-codebench/golden-examples/blog/schemas.py
Normal file
@@ -0,0 +1,37 @@
"""Pydantic v2 schemas: Create for input, Response for output."""

from datetime import datetime

from pydantic import BaseModel, ConfigDict


class AuthorCreate(BaseModel):
    """Creation of a new author. Required: name, email."""

    name: str
    email: str
    bio: str | None = None


class AuthorResponse(AuthorCreate):
    """Returned author: includes the id."""

    id: int
    model_config = ConfigDict(from_attributes=True)


class PostCreate(BaseModel):
    """Creation of a new post. Required: title, content, author_id."""

    title: str
    content: str
    author_id: int
    published_at: datetime | None = None
    status: str = "draft"


class PostResponse(PostCreate):
    """Returned post: includes the id."""

    id: int
    model_config = ConfigDict(from_attributes=True)
164
kipina-codebench/golden-examples/blog/test_main.py
Normal file
@@ -0,0 +1,164 @@
"""Pytest: TestClient, separate test.db, unique data per test."""

from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from main import app, get_db
from models import Base

test_engine = create_engine(
    "sqlite:///./test.db", connect_args={"check_same_thread": False}
)
TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
Base.metadata.create_all(bind=test_engine)


def override_get_db():
    db = TestSession()
    try:
        yield db
    finally:
        db.close()


app.dependency_overrides[get_db] = override_get_db
client = TestClient(app)


def _create_author(name="Eino Leino", email=None):
    """Helper for creating an author in tests."""
    if email is None:
        email = f"{name.lower().replace(' ', '.')}@example.com"
    return client.post(
        "/authors/", json={"name": name, "email": email}
    ).json()


# --- Author tests ---


def test_create_author():
    response = client.post(
        "/authors/",
        json={"name": "Aleksis Kivi", "email": "aleksis@example.com", "bio": "Suomen kansalliskirjailija"},
    )
    assert response.status_code == 201
    assert response.json()["name"] == "Aleksis Kivi"
    assert response.json()["bio"] == "Suomen kansalliskirjailija"
    assert "id" in response.json()


def test_list_authors():
    _create_author("Minna Canth", "minna.canth@example.com")
    response = client.get("/authors/")
    assert response.status_code == 200
    assert len(response.json()) >= 1


def test_get_author_by_id():
    created = _create_author("Väinö Linna", "vaino.linna@example.com")
    response = client.get(f"/authors/{created['id']}")
    assert response.status_code == 200
    assert response.json()["id"] == created["id"]


def test_get_author_not_found():
    response = client.get("/authors/99999")
    assert response.status_code == 404


def test_update_author():
    created = _create_author("Vanha Nimi", "vanha.nimi@example.com")
    response = client.put(
        f"/authors/{created['id']}",
        json={"name": "Uusi Nimi", "email": "uusi.nimi@example.com"},
    )
    assert response.status_code == 200
    assert response.json()["name"] == "Uusi Nimi"


def test_delete_author():
    created = _create_author("Poistettava Kirjailija", "poistettava@example.com")
    response = client.delete(f"/authors/{created['id']}")
    assert response.status_code == 204
    response = client.get(f"/authors/{created['id']}")
    assert response.status_code == 404


# --- Post tests ---


def test_create_post():
    author = _create_author("Tove Jansson", "tove.jansson@example.com")
    response = client.post(
        "/posts/",
        json={"title": "Muumipeikko ja pyrstötähti", "content": "Eräänä aamuna...", "author_id": author["id"]},
    )
    assert response.status_code == 201
    assert response.json()["title"] == "Muumipeikko ja pyrstötähti"
    assert response.json()["author_id"] == author["id"]
    assert response.json()["status"] == "draft"


def test_list_posts():
    author = _create_author("Juhani Aho", "juhani.aho@example.com")
    client.post(
        "/posts/",
        json={"title": "Rautatie", "content": "Junasta kertova novelli.", "author_id": author["id"]},
    )
    response = client.get("/posts/")
    assert response.status_code == 200
    assert len(response.json()) >= 1


def test_get_post_by_id():
    author = _create_author("Elias Lönnrot", "elias.lonnrot@example.com")
    created = client.post(
        "/posts/",
        json={"title": "Kalevala", "content": "Vaka vanha Väinämöinen.", "author_id": author["id"]},
    ).json()
|
||||
response = client.get(f"/posts/{created['id']}")
|
||||
assert response.status_code == 200
|
||||
assert response.json()["id"] == created["id"]
|
||||
|
||||
|
||||
def test_get_post_not_found():
|
||||
response = client.get("/posts/99999")
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_update_post():
|
||||
author = _create_author("Joel Lehtonen", "joel.lehtonen@example.com")
|
||||
created = client.post(
|
||||
"/posts/",
|
||||
json={"title": "Vanha otsikko", "content": "Alkuperäinen teksti.", "author_id": author["id"]},
|
||||
).json()
|
||||
response = client.put(
|
||||
f"/posts/{created['id']}",
|
||||
json={"title": "Päivitetty otsikko", "content": "Muokattu teksti.", "author_id": author["id"], "status": "published"},
|
||||
)
|
||||
assert response.status_code == 200
|
||||
assert response.json()["title"] == "Päivitetty otsikko"
|
||||
assert response.json()["status"] == "published"
|
||||
|
||||
|
||||
def test_delete_post():
|
||||
author = _create_author("Aino Kallas", "aino.kallas@example.com")
|
||||
created = client.post(
|
||||
"/posts/",
|
||||
json={"title": "Poistettava postaus", "content": "Tämä poistetaan.", "author_id": author["id"]},
|
||||
).json()
|
||||
response = client.delete(f"/posts/{created['id']}")
|
||||
assert response.status_code == 204
|
||||
response = client.get(f"/posts/{created['id']}")
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_post_belongs_to_author():
|
||||
author = _create_author("Sofi Oksanen", "sofi.oksanen@example.com")
|
||||
post = client.post(
|
||||
"/posts/",
|
||||
json={"title": "Puhdistus", "content": "Romaani Virosta.", "author_id": author["id"]},
|
||||
).json()
|
||||
assert post["author_id"] == author["id"]
|
||||
204
kipina-codebench/golden-examples/combined-readme.md
Normal file
@@ -0,0 +1,204 @@
# Example 1: Todo App (single entity)

## models.py

```python
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
from datetime import date
from sqlalchemy import String, Text, Date, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker

DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

class Base(DeclarativeBase):
    pass

class Todo(Base):
    __tablename__ = "todos"
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    description: Mapped[str | None] = mapped_column(Text, default=None)
    due_date: Mapped[date | None] = mapped_column(Date, default=None)
    priority: Mapped[int] = mapped_column(default=1)
    status: Mapped[str] = mapped_column(String(20), default="pending")

Base.metadata.create_all(bind=engine)
```

## schemas.py

```python
from datetime import date
from pydantic import BaseModel, ConfigDict

class TodoCreate(BaseModel):
    title: str
    description: str | None = None
    due_date: date | None = None
    priority: int = 1
    status: str = "pending"

class TodoResponse(TodoCreate):
    id: int
    model_config = ConfigDict(from_attributes=True)
```

## test_main.py: exactly 6 tests per entity

```python
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from main import app, get_db
from models import Base

test_engine = create_engine("sqlite:///./test.db", connect_args={"check_same_thread": False})
TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
Base.metadata.create_all(bind=test_engine)

def override_get_db():
    db = TestSession()
    try: yield db
    finally: db.close()

app.dependency_overrides[get_db] = override_get_db
client = TestClient(app)

def test_create_todo():
    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
    assert response.status_code == 201
    assert "id" in response.json()

def test_list_todos():
    client.post("/todos/", json={"title": "Listattava"})
    response = client.get("/todos/")
    assert response.status_code == 200
    assert len(response.json()) >= 1

def test_get_todo_by_id():
    created = client.post("/todos/", json={"title": "Haettava"}).json()
    response = client.get(f"/todos/{created['id']}")
    assert response.status_code == 200

def test_get_todo_not_found():
    response = client.get("/todos/99999")
    assert response.status_code == 404

def test_update_todo():
    created = client.post("/todos/", json={"title": "Vanha"}).json()
    response = client.put(f"/todos/{created['id']}", json={"title": "Uusi"})
    assert response.status_code == 200

def test_delete_todo():
    created = client.post("/todos/", json={"title": "Poistettava"}).json()
    response = client.delete(f"/todos/{created['id']}")
    assert response.status_code == 204
```

# Example 2: Blog (two entities with ForeignKey)

NOTE: ForeignKey is imported from sqlalchemy, NOT from sqlalchemy.orm!

## models.py

```python
from datetime import datetime
from sqlalchemy import String, Text, DateTime, ForeignKey, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, sessionmaker

DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

class Base(DeclarativeBase):
    pass

class Author(Base):
    __tablename__ = "authors"
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    name: Mapped[str] = mapped_column(String(255))
    email: Mapped[str] = mapped_column(String(255), unique=True)
    bio: Mapped[str | None] = mapped_column(Text, default=None)
    posts: Mapped[list["Post"]] = relationship(back_populates="author")

class Post(Base):
    __tablename__ = "posts"
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    content: Mapped[str] = mapped_column(Text)
    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
    published_at: Mapped[datetime | None] = mapped_column(DateTime, default=None)
    status: Mapped[str] = mapped_column(String(20), default="draft")
    author: Mapped["Author"] = relationship(back_populates="posts")

Base.metadata.create_all(bind=engine)
```

## schemas.py

```python
from datetime import datetime
from pydantic import BaseModel, ConfigDict

class AuthorCreate(BaseModel):
    name: str
    email: str
    bio: str | None = None

class AuthorResponse(AuthorCreate):
    id: int
    model_config = ConfigDict(from_attributes=True)

class PostCreate(BaseModel):
    title: str
    content: str
    author_id: int
    published_at: datetime | None = None
    status: str = "draft"

class PostResponse(PostCreate):
    id: int
    model_config = ConfigDict(from_attributes=True)
```

## test_main.py: 6 tests per entity, create parent FIRST for child tests

```python
client = TestClient(app)  # same setup as above

def _create_author(name="Kirjailija", email=None):
    if email is None:
        email = f"{name.lower().replace(' ', '.')}@example.com"
    return client.post("/authors/", json={"name": name, "email": email}).json()

def test_create_author():
    response = client.post("/authors/", json={"name": "Aleksis Kivi", "email": "aleksis@example.com"})
    assert response.status_code == 201

def test_list_authors():
    _create_author("Minna Canth", "minna@example.com")
    response = client.get("/authors/")
    assert response.status_code == 200
    assert len(response.json()) >= 1

# ... (same pattern: get_by_id, not_found, update, delete)

def test_create_post():
    author = _create_author("Tove Jansson", "tove@example.com")
    response = client.post("/posts/", json={"title": "Artikkeli", "content": "Sisältö", "author_id": author["id"]})
    assert response.status_code == 201

def test_update_post():
    author = _create_author("Joel Lehtonen", "joel@example.com")
    created = client.post("/posts/", json={"title": "Vanha", "content": "Teksti", "author_id": author["id"]}).json()
    response = client.put(f"/posts/{created['id']}", json={"title": "Uusi", "content": "Muokattu", "author_id": author["id"]})
    assert response.status_code == 200

def test_delete_post():
    author = _create_author("Aino Kallas", "aino@example.com")
    created = client.post("/posts/", json={"title": "Poistettava", "content": "Poistetaan", "author_id": author["id"]}).json()
    response = client.delete(f"/posts/{created['id']}")
    assert response.status_code == 204
```
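The child tests create an author before posting. SQLite itself enforces `FOREIGN KEY` constraints only on connections where `PRAGMA foreign_keys = ON` has been issued, and the pragma is off by default in standard builds. The standard-library sketch below illustrates the difference. The schema is a hand-written approximation of the two tables and `insert_orphan_post` is a hypothetical helper. Whether the example app enables the pragma or validates `author_id` in its handlers is not shown in these examples.

```python
import sqlite3

# Hand-written approximation of the authors/posts tables (an assumption,
# not the DDL that SQLAlchemy would emit for the models above).
SCHEMA = """
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
"""


def insert_orphan_post(enforce_foreign_keys: bool) -> bool:
    """Insert a post whose author_id matches no author.

    Returns True if SQLite accepted the row, False if it raised IntegrityError.
    """
    conn = sqlite3.connect(":memory:")
    try:
        if enforce_foreign_keys:
            # Must run outside a transaction; it is a per-connection setting.
            conn.execute("PRAGMA foreign_keys = ON")
        conn.executescript(SCHEMA)
        try:
            conn.execute(
                "INSERT INTO posts (title, content, author_id) VALUES (?, ?, ?)",
                ("Orphan", "No parent row", 99999),
            )
            conn.commit()
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


if __name__ == "__main__":
    print("accepted without pragma:", insert_orphan_post(False))
    print("accepted with pragma:", insert_orphan_post(True))
```

Creating the parent row first keeps the child tests valid under either setting.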
325
kipina-codebench/golden-examples/todo-go.md
Normal file
@@ -0,0 +1,325 @@
# Todo: reference implementation (Go + Chi + SQLite)

This is a complete example. Generate equivalent structure for the given project.
Use ONLY the fields from the JSON spec; do not add extras.

## go.mod

Chi v5 router, modernc.org/sqlite (pure Go, no CGO).

```
module todo-go

go 1.23.0

toolchain go1.23.12

require (
	github.com/go-chi/chi/v5 v5.2.1
	modernc.org/sqlite v1.37.1
)

require (
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/google/uuid v1.6.0 // indirect
	github.com/mattn/go-isatty v0.0.20 // indirect
	github.com/ncruces/go-strftime v0.1.9 // indirect
	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
	golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
	golang.org/x/sys v0.33.0 // indirect
	modernc.org/libc v1.65.7 // indirect
	modernc.org/mathutil v1.7.1 // indirect
	modernc.org/memory v1.11.0 // indirect
)
```

## models.go

Data structs: Todo (full row), CreateTodo (POST), UpdateTodo (PUT, all fields optional pointers).

```go
package main

// Todo represents a task with priority and status tracking.
type Todo struct {
	ID          int64   `json:"id"`
	Title       string  `json:"title"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    int64   `json:"priority"`
	Status      string  `json:"status"`
}

// CreateTodo is the request body for creating a new todo.
type CreateTodo struct {
	Title       string  `json:"title"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    *int64  `json:"priority,omitempty"`
	Status      *string `json:"status,omitempty"`
}

// UpdateTodo is the request body for updating an existing todo.
type UpdateTodo struct {
	Title       *string `json:"title,omitempty"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    *int64  `json:"priority,omitempty"`
	Status      *string `json:"status,omitempty"`
}
```

## handlers.go

CRUD handlers as closures taking *sql.DB. Key patterns: INSERT RETURNING, sql.ErrNoRows for 404, RowsAffected for delete.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"net/http"
	"strconv"
	"github.com/go-chi/chi/v5"
)

// POST: decode JSON, defaults with nil-check, INSERT RETURNING, StatusCreated.
func createTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var input CreateTodo
		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest); return
		}
		priority := int64(1)
		if input.Priority != nil { priority = *input.Priority }
		status := "pending"
		if input.Status != nil { status = *input.Status }
		var todo Todo
		err := db.QueryRow(
			`INSERT INTO todos (title, description, due_date, priority, status)
			 VALUES (?, ?, ?, ?, ?) RETURNING id, title, description, due_date, priority, status`,
			input.Title, input.Description, input.DueDate, priority, status,
		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(todo)
	}
}

// GET list: db.Query + rows.Scan loop, empty slice not nil.
func listTodos(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		rows, err := db.Query("SELECT id, title, description, due_date, priority, status FROM todos")
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		defer rows.Close()
		todos := []Todo{}
		for rows.Next() {
			var t Todo
			rows.Scan(&t.ID, &t.Title, &t.Description, &t.DueDate, &t.Priority, &t.Status)
			todos = append(todos, t)
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todos)
	}
}

// GET by id: QueryRow + sql.ErrNoRows → 404.
func getTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
		var todo Todo
		err := db.QueryRow(
			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
		if err == sql.ErrNoRows { http.Error(w, "not found", http.StatusNotFound); return }
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todo)
	}
}

// PUT: fetch existing, merge with input nil-checks, UPDATE RETURNING.
func updateTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
		var existing Todo
		err := db.QueryRow("SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
		).Scan(&existing.ID, &existing.Title, &existing.Description, &existing.DueDate, &existing.Priority, &existing.Status)
		if err == sql.ErrNoRows { http.Error(w, "not found", http.StatusNotFound); return }
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		var input UpdateTodo
		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest); return
		}
		if input.Title != nil { existing.Title = *input.Title }
		if input.Description != nil { existing.Description = input.Description }
		if input.DueDate != nil { existing.DueDate = input.DueDate }
		if input.Priority != nil { existing.Priority = *input.Priority }
		if input.Status != nil { existing.Status = *input.Status }
		var updated Todo
		err = db.QueryRow(
			`UPDATE todos SET title=?, description=?, due_date=?, priority=?, status=? WHERE id=?
			 RETURNING id, title, description, due_date, priority, status`,
			existing.Title, existing.Description, existing.DueDate, existing.Priority, existing.Status, id,
		).Scan(&updated.ID, &updated.Title, &updated.Description, &updated.DueDate, &updated.Priority, &updated.Status)
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(updated)
	}
}

// DELETE: Exec + RowsAffected == 0 → 404, else 204.
func deleteTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
		result, err := db.Exec("DELETE FROM todos WHERE id = ?", id)
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		rows, _ := result.RowsAffected()
		if rows == 0 { http.Error(w, "not found", http.StatusNotFound); return }
		w.WriteHeader(http.StatusNoContent)
	}
}
```

## main.go

Entry point: SQLite connection, table init, Chi router on port 3000.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"

	"github.com/go-chi/chi/v5"
	_ "modernc.org/sqlite"
)

// InitDB creates tables if they don't exist.
func InitDB(db *sql.DB) {
	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS todos (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		title TEXT NOT NULL,
		description TEXT,
		due_date TEXT,
		priority INTEGER NOT NULL DEFAULT 1,
		status TEXT NOT NULL DEFAULT 'pending'
	)`)
	if err != nil {
		log.Fatal(err)
	}
}

// NewRouter creates a chi router with all routes.
func NewRouter(db *sql.DB) http.Handler {
	r := chi.NewRouter()
	r.Post("/todos", createTodo(db))
	r.Get("/todos", listTodos(db))
	r.Get("/todos/{id}", getTodo(db))
	r.Put("/todos/{id}", updateTodo(db))
	r.Delete("/todos/{id}", deleteTodo(db))
	return r
}

func main() {
	db, err := sql.Open("sqlite", "file:app.db?mode=rwc")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	InitDB(db)

	fmt.Println("Server running: http://127.0.0.1:3000")
	log.Fatal(http.ListenAndServe("127.0.0.1:3000", NewRouter(db)))
}
```

## handlers_test.go

Integration tests: setupTestServer with httptest.NewServer + :memory: SQLite, unique data per test.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
	_ "modernc.org/sqlite"
)

func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
	t.Helper()
	db, err := sql.Open("sqlite", ":memory:")
	if err != nil { t.Fatal(err) }
	InitDB(db)
	return httptest.NewServer(NewRouter(db)), db
}

func TestCreateTodo(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()
	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Buy groceries","priority":2}`))
	if err != nil { t.Fatal(err) }
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated { t.Fatalf("expected 201, got %d", resp.StatusCode) }
	var body map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&body)
	if body["title"] != "Buy groceries" { t.Fatalf("expected 'Buy groceries', got %v", body["title"]) }
	if body["id"] == nil { t.Fatal("expected id") }
}

func TestGetTodoByID(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()
	resp, _ := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Fetchable task"}`))
	var created map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&created)
	resp.Body.Close()
	id := created["id"].(float64)
	resp, _ = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK { t.Fatalf("expected 200, got %d", resp.StatusCode) }
}

func TestGetTodoNotFound(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()
	resp, _ := http.Get(ts.URL + "/todos/99999")
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNotFound { t.Fatalf("expected 404, got %d", resp.StatusCode) }
}

func TestDeleteTodo(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()
	resp, _ := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Deletable task"}`))
	var created map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&created)
	resp.Body.Close()
	id := created["id"].(float64)
	req, _ := http.NewRequest(http.MethodDelete, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id), nil)
	resp, _ = http.DefaultClient.Do(req)
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent { t.Fatalf("expected 204, got %d", resp.StatusCode) }
	resp, _ = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNotFound { t.Fatalf("expected 404 after delete, got %d", resp.StatusCode) }
}
```
23
kipina-codebench/golden-examples/todo-go/go.mod
Normal file
@@ -0,0 +1,23 @@
module todo-go

go 1.23.0

toolchain go1.23.12

require (
	github.com/go-chi/chi/v5 v5.2.1
	modernc.org/sqlite v1.37.1
)

require (
	github.com/dustin/go-humanize v1.0.1 // indirect
	github.com/google/uuid v1.6.0 // indirect
	github.com/mattn/go-isatty v0.0.20 // indirect
	github.com/ncruces/go-strftime v0.1.9 // indirect
	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
	golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
	golang.org/x/sys v0.33.0 // indirect
	modernc.org/libc v1.65.7 // indirect
	modernc.org/mathutil v1.7.1 // indirect
	modernc.org/memory v1.11.0 // indirect
)
49
kipina-codebench/golden-examples/todo-go/go.sum
Normal file
@@ -0,0 +1,49 @@
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/go-chi/chi/v5 v5.2.1 h1:KOIHODQj58PmL80G2Eak4WdvUzjSJSm0vG72crDCqb8=
github.com/go-chi/chi/v5 v5.2.1/go.mod h1:L2yAIGWB3H+phAw1NxKwWM+7eUH/lU8pOMm5hHcoops=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 h1:R84qjqJb5nVJMxqWYb3np9L5ZsaDtB+a39EqjV0JSUM=
golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0/go.mod h1:S9Xr4PYopiDyqSyp5NjCrhFrqg6A5zA2E/iPHPhqnS8=
golang.org/x/mod v0.24.0 h1:ZfthKaKaT4NrhGVZHO1/WDTwGES4De8KtWO0SIbNJMU=
golang.org/x/mod v0.24.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
golang.org/x/sync v0.14.0 h1:woo0S4Yywslg6hp4eUFjTVOyKt0RookbpAHG4c1HmhQ=
golang.org/x/sync v0.14.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/tools v0.33.0 h1:4qz2S3zmRxbGIhDIAgjxvFutSvH5EfnsYrRBj0UI0bc=
golang.org/x/tools v0.33.0/go.mod h1:CIJMaWEY88juyUfo7UbgPqbC8rU2OqfAV1h2Qp0oMYI=
modernc.org/cc/v4 v4.26.1 h1:+X5NtzVBn0KgsBCBe+xkDC7twLb/jNVj9FPgiwSQO3s=
modernc.org/cc/v4 v4.26.1/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
modernc.org/ccgo/v4 v4.28.0 h1:rjznn6WWehKq7dG4JtLRKxb52Ecv8OUGah8+Z/SfpNU=
modernc.org/ccgo/v4 v4.28.0/go.mod h1:JygV3+9AV6SmPhDasu4JgquwU81XAKLd3OKTUDNOiKE=
modernc.org/fileutil v1.3.1 h1:8vq5fe7jdtEvoCf3Zf9Nm0Q05sH6kGx0Op2CPx1wTC8=
modernc.org/fileutil v1.3.1/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
modernc.org/libc v1.65.7 h1:Ia9Z4yzZtWNtUIuiPuQ7Qf7kxYrxP1/jeHZzG8bFu00=
modernc.org/libc v1.65.7/go.mod h1:011EQibzzio/VX3ygj1qGFt5kMjP0lHb0qCW5/D/pQU=
modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
modernc.org/sqlite v1.37.1 h1:EgHJK/FPoqC+q2YBXg7fUmES37pCHFc97sI7zSayBEs=
modernc.org/sqlite v1.37.1/go.mod h1:XwdRtsE1MpiBcL54+MbKcaDvcuej+IYSMfLN6gSKV8g=
modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=
modernc.org/token v1.1.0/go.mod h1:UGzOrNV1mAFSEB63lOFHIpNRUVMvYTc6yu1SMY/XTDM=
155
kipina-codebench/golden-examples/todo-go/handlers.go
Normal file
@@ -0,0 +1,155 @@
package main

import (
	"database/sql"
	"encoding/json"
	"net/http"
	"strconv"

	"github.com/go-chi/chi/v5"
)

func createTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var input CreateTodo
		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		priority := int64(1)
		if input.Priority != nil {
			priority = *input.Priority
		}
		status := "pending"
		if input.Status != nil {
			status = *input.Status
		}
		var todo Todo
		err := db.QueryRow(
			`INSERT INTO todos (title, description, due_date, priority, status)
			 VALUES (?, ?, ?, ?, ?)
			 RETURNING id, title, description, due_date, priority, status`,
			input.Title, input.Description, input.DueDate, priority, status,
		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		json.NewEncoder(w).Encode(todo)
	}
}

func listTodos(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		rows, err := db.Query("SELECT id, title, description, due_date, priority, status FROM todos")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer rows.Close()
		var todos []Todo
		for rows.Next() {
			var t Todo
			if err := rows.Scan(&t.ID, &t.Title, &t.Description, &t.DueDate, &t.Priority, &t.Status); err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			todos = append(todos, t)
		}
		if todos == nil {
			todos = []Todo{}
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todos)
	}
}

func getTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
		var todo Todo
		err := db.QueryRow(
			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
		if err == sql.ErrNoRows {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todo)
	}
}

func updateTodo(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
		var existing Todo
		err := db.QueryRow(
			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
		).Scan(&existing.ID, &existing.Title, &existing.Description, &existing.DueDate, &existing.Priority, &existing.Status)
		if err == sql.ErrNoRows {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		var input UpdateTodo
		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if input.Title != nil {
			existing.Title = *input.Title
|
||||
}
|
||||
if input.Description != nil {
|
||||
existing.Description = input.Description
|
||||
}
|
||||
if input.DueDate != nil {
|
||||
existing.DueDate = input.DueDate
|
||||
}
|
||||
if input.Priority != nil {
|
||||
existing.Priority = *input.Priority
|
||||
}
|
||||
if input.Status != nil {
|
||||
existing.Status = *input.Status
|
||||
}
|
||||
var updated Todo
|
||||
err = db.QueryRow(
|
||||
`UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
|
||||
WHERE id = ?
|
||||
RETURNING id, title, description, due_date, priority, status`,
|
||||
existing.Title, existing.Description, existing.DueDate, existing.Priority, existing.Status, id,
|
||||
).Scan(&updated.ID, &updated.Title, &updated.Description, &updated.DueDate, &updated.Priority, &updated.Status)
|
||||
if err != nil {
|
||||
http.Error(w, err.Error(), http.StatusInternalServerError)
|
||||
return
|
||||
}
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
json.NewEncoder(w).Encode(updated)
|
||||
}
|
||||
}
|
||||
|
||||
func deleteTodo(db *sql.DB) http.HandlerFunc {
|
||||
return func(w http.ResponseWriter, r *http.Request) {
|
||||
id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
|
||||
result, err := db.Exec("DELETE FROM todos WHERE id = ?", id)
|
||||
if err != nil {
|
||||
http.Error(w, err.Error(), http.StatusInternalServerError)
|
||||
return
|
||||
}
|
||||
rows, _ := result.RowsAffected()
|
||||
if rows == 0 {
|
||||
http.Error(w, "not found", http.StatusNotFound)
|
||||
return
|
||||
}
|
||||
w.WriteHeader(http.StatusNoContent)
|
||||
}
|
||||
}
|
||||
171
kipina-codebench/golden-examples/todo-go/handlers_test.go
Normal file
@@ -0,0 +1,171 @@
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"

	_ "modernc.org/sqlite"
)

// setupTestServer starts an httptest server backed by an in-memory SQLite database.
func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
	t.Helper()
	db, err := sql.Open("sqlite", ":memory:")
	if err != nil {
		t.Fatal(err)
	}
	// Each new connection to ":memory:" opens its own empty database, so keep
	// the pool at one connection to make every query see the same tables.
	db.SetMaxOpenConns(1)
	InitDB(db)
	return httptest.NewServer(NewRouter(db)), db
}

func TestCreateTodo(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Buy groceries","priority":2}`))
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		t.Fatalf("expected 201, got %d", resp.StatusCode)
	}
	var body map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatal(err)
	}
	if body["title"] != "Buy groceries" {
		t.Fatalf("expected title 'Buy groceries', got %v", body["title"])
	}
	if body["id"] == nil {
		t.Fatal("expected id to be present")
	}
}

func TestListTodos(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Listable task"}`))
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()

	resp, err = http.Get(ts.URL + "/todos")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}
	var body []map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatal(err)
	}
	if len(body) < 1 {
		t.Fatal("expected at least 1 todo")
	}
}

func TestGetTodoByID(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Fetchable task"}`))
	if err != nil {
		t.Fatal(err)
	}
	var created map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()

	id := created["id"].(float64)
	resp, err = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}
	var body map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatal(err)
	}
	if body["id"] != id {
		t.Fatalf("expected id %.0f, got %v", id, body["id"])
	}
}

func TestGetTodoNotFound(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Get(ts.URL + "/todos/99999")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNotFound {
		t.Fatalf("expected 404, got %d", resp.StatusCode)
	}
}

func TestUpdateTodo(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Old title"}`))
	if err != nil {
		t.Fatal(err)
	}
	var created map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()

	id := created["id"].(float64)
	req, err := http.NewRequest(http.MethodPut, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id),
		strings.NewReader(`{"title":"New title"}`))
	if err != nil {
		t.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200, got %d", resp.StatusCode)
	}
	var body map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		t.Fatal(err)
	}
	if body["title"] != "New title" {
		t.Fatalf("expected 'New title', got %v", body["title"])
	}
}

func TestDeleteTodo(t *testing.T) {
	ts, db := setupTestServer(t)
	defer ts.Close()
	defer db.Close()

	resp, err := http.Post(ts.URL+"/todos", "application/json",
		strings.NewReader(`{"title":"Deletable task"}`))
	if err != nil {
		t.Fatal(err)
	}
	var created map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()

	id := created["id"].(float64)
	req, err := http.NewRequest(http.MethodDelete, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id), nil)
	if err != nil {
		t.Fatal(err)
	}
	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		t.Fatal(err)
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		t.Fatalf("expected 204, got %d", resp.StatusCode)
	}

	// The deleted todo must no longer be retrievable.
	resp, err = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNotFound {
		t.Fatalf("expected 404 after delete, got %d", resp.StatusCode)
	}
}
49
kipina-codebench/golden-examples/todo-go/main.go
Normal file
@@ -0,0 +1,49 @@
package main

import (
	"database/sql"
	"fmt"
	"log"
	"net/http"

	"github.com/go-chi/chi/v5"
	_ "modernc.org/sqlite"
)

// InitDB creates tables if they don't exist.
func InitDB(db *sql.DB) {
	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS todos (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		title TEXT NOT NULL,
		description TEXT,
		due_date TEXT,
		priority INTEGER NOT NULL DEFAULT 1,
		status TEXT NOT NULL DEFAULT 'pending'
	)`)
	if err != nil {
		log.Fatal(err)
	}
}

// NewRouter creates a chi router with all routes.
func NewRouter(db *sql.DB) http.Handler {
	r := chi.NewRouter()
	r.Post("/todos", createTodo(db))
	r.Get("/todos", listTodos(db))
	r.Get("/todos/{id}", getTodo(db))
	r.Put("/todos/{id}", updateTodo(db))
	r.Delete("/todos/{id}", deleteTodo(db))
	return r
}

func main() {
	db, err := sql.Open("sqlite", "file:app.db?mode=rwc")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	InitDB(db)

	fmt.Println("Server running: http://127.0.0.1:3000")
	log.Fatal(http.ListenAndServe("127.0.0.1:3000", NewRouter(db)))
}
29
kipina-codebench/golden-examples/todo-go/models.go
Normal file
@@ -0,0 +1,29 @@
package main

// Todo represents a task with priority and status tracking.
type Todo struct {
	ID          int64   `json:"id"`
	Title       string  `json:"title"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    int64   `json:"priority"`
	Status      string  `json:"status"`
}

// CreateTodo is the request body for creating a new todo.
type CreateTodo struct {
	Title       string  `json:"title"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    *int64  `json:"priority,omitempty"`
	Status      *string `json:"status,omitempty"`
}

// UpdateTodo is the request body for updating an existing todo.
type UpdateTodo struct {
	Title       *string `json:"title,omitempty"`
	Description *string `json:"description,omitempty"`
	DueDate     *string `json:"due_date,omitempty"`
	Priority    *int64  `json:"priority,omitempty"`
	Status      *string `json:"status,omitempty"`
}
217
kipina-codebench/golden-examples/todo-readme.md
Normal file
@@ -0,0 +1,217 @@
# Todo App: FastAPI + SQLAlchemy + SQLite

A simple todo CRUD API. It uses only the fields defined in the spec, with no extra fields.

## Project Structure

```
models.py       # SQLAlchemy 2.0 models
schemas.py      # Pydantic v2 schemas
main.py         # FastAPI CRUD endpoints
test_main.py    # Pytest with TestClient
```

## models.py

```python
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""

from datetime import date

from sqlalchemy import String, Text, Date, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker

DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


class Base(DeclarativeBase):
    pass


class Todo(Base):
    """Task: title, description, deadline, priority, and status."""

    __tablename__ = "todos"

    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    description: Mapped[str | None] = mapped_column(Text, default=None)
    due_date: Mapped[date | None] = mapped_column(Date, default=None)
    priority: Mapped[int] = mapped_column(default=1)
    status: Mapped[str] = mapped_column(String(20), default="pending")


Base.metadata.create_all(bind=engine)
```

## schemas.py

```python
"""Pydantic v2 schemas: Create for input, Response for output."""

from datetime import date

from pydantic import BaseModel, ConfigDict


class TodoCreate(BaseModel):
    """Create a new task. Required: title."""

    title: str
    description: str | None = None
    due_date: date | None = None
    priority: int = 1
    status: str = "pending"


class TodoResponse(TodoCreate):
    """Returned task; includes the id."""

    id: int
    model_config = ConfigDict(from_attributes=True)
```

## main.py

```python
"""FastAPI CRUD: one endpoint set per entity."""

from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session

from models import SessionLocal, Todo
from schemas import TodoCreate, TodoResponse

app = FastAPI()


def get_db():
    """One database session per request."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


@app.post("/todos/", response_model=TodoResponse, status_code=201)
def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
    db_item = Todo(**item.model_dump())
    db.add(db_item)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.get("/todos/", response_model=list[TodoResponse])
def list_todos(db: Session = Depends(get_db)):
    return db.query(Todo).all()


@app.get("/todos/{item_id}", response_model=TodoResponse)
def get_todo(item_id: int, db: Session = Depends(get_db)):
    item = db.query(Todo).filter(Todo.id == item_id).first()
    if not item:
        raise HTTPException(status_code=404, detail="Todo not found")
    return item


@app.put("/todos/{item_id}", response_model=TodoResponse)
def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
    db_item = db.query(Todo).filter(Todo.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Todo not found")
    for key, value in item.model_dump().items():
        setattr(db_item, key, value)
    db.commit()
    db.refresh(db_item)
    return db_item


@app.delete("/todos/{item_id}", status_code=204)
def delete_todo(item_id: int, db: Session = Depends(get_db)):
    db_item = db.query(Todo).filter(Todo.id == item_id).first()
    if not db_item:
        raise HTTPException(status_code=404, detail="Todo not found")
    db.delete(db_item)
    db.commit()
```
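
Because `update_todo` reuses `TodoCreate` and writes every key from `model_dump()`, a PUT that sends only `title` also resets `priority` and `status` to their defaults and clears `description` and `due_date`. The following is a minimal standard-library sketch of that behaviour, with plain dicts standing in for the Pydantic model and the ORM row. The `DEFAULTS` dict and the `full_replace` helper are illustrative names and not part of this README.

```python
# Illustrative only: plain dicts stand in for TodoCreate and the Todo row.
DEFAULTS = {"description": None, "due_date": None, "priority": 1, "status": "pending"}


def full_replace(row: dict, payload: dict) -> dict:
    """Mimic update_todo: every schema field is written, unset ones use defaults."""
    dumped = {**DEFAULTS, **payload}  # roughly what model_dump() would contain
    updated = dict(row)
    for key, value in dumped.items():
        updated[key] = value
    return updated


row = {
    "id": 1,
    "title": "Old title",
    "description": "notes",
    "due_date": None,
    "priority": 3,
    "status": "done",
}
result = full_replace(row, {"title": "New title"})
print(result)
```

If partial updates were wanted instead, Pydantic v2's `model_dump(exclude_unset=True)` returns only the fields the client actually sent; that would need a separate schema with an optional `title`, which this README does not define.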

## test_main.py

Exactly 6 tests per entity. The database is shared across tests, so assert `>= 1` rather than `== 1` in list tests.
For child entities with foreign keys: create the parent FIRST, then the child with the parent's id.

```python
"""Pytest: TestClient, separate test.db, unique data per test."""

from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from main import app, get_db
from models import Base

test_engine = create_engine(
    "sqlite:///./test.db", connect_args={"check_same_thread": False}
)
TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
Base.metadata.create_all(bind=test_engine)


def override_get_db():
    db = TestSession()
    try:
        yield db
    finally:
        db.close()


app.dependency_overrides[get_db] = override_get_db
client = TestClient(app)


def test_create_todo():
    response = client.post("/todos/", json={"title": "Buy milk", "priority": 2})
    assert response.status_code == 201
    assert response.json()["title"] == "Buy milk"
    assert "id" in response.json()


def test_list_todos():
    client.post("/todos/", json={"title": "Listable task"})
    response = client.get("/todos/")
    assert response.status_code == 200
    assert len(response.json()) >= 1


def test_get_todo_by_id():
    created = client.post("/todos/", json={"title": "Fetchable task"}).json()
    response = client.get(f"/todos/{created['id']}")
    assert response.status_code == 200
    assert response.json()["id"] == created["id"]


def test_get_todo_not_found():
    response = client.get("/todos/99999")
    assert response.status_code == 404


def test_update_todo():
    created = client.post("/todos/", json={"title": "Old title"}).json()
    response = client.put(
        f"/todos/{created['id']}", json={"title": "New title"}
    )
    assert response.status_code == 200
    assert response.json()["title"] == "New title"


def test_delete_todo():
    created = client.post("/todos/", json={"title": "Deletable task"}).json()
    response = client.delete(f"/todos/{created['id']}")
    assert response.status_code == 204
    response = client.get(f"/todos/{created['id']}")
    assert response.status_code == 404
```
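
The parent-first rule for child entities in the test notes can be illustrated without FastAPI. This is a minimal sketch using the standard-library `sqlite3` module. The `projects` and `tasks` tables are hypothetical, since this README defines only `todos`, and the sketch assumes the database enforces foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE tasks ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " project_id INTEGER NOT NULL REFERENCES projects(id))"
)

# Child first: rejected, because no parent row with id 1 exists yet.
try:
    conn.execute("INSERT INTO tasks (title, project_id) VALUES (?, ?)", ("orphan", 1))
    child_first_failed = False
except sqlite3.IntegrityError:
    child_first_failed = True

# Parent first, then the child with the parent's id.
parent_id = conn.execute("INSERT INTO projects (name) VALUES (?)", ("Home",)).lastrowid
conn.execute("INSERT INTO tasks (title, project_id) VALUES (?, ?)", ("Buy milk", parent_id))
task_count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```

In an API test the same ordering applies: POST the parent, read its `id` from the response, and pass that id in the child's request body.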
331
kipina-codebench/golden-examples/todo-rs.md
Normal file
@@ -0,0 +1,331 @@
# Todo: reference implementation (Axum 0.8 + SQLx + SQLite)

This is a complete example. Generate an equivalent structure for the given project.
Use ONLY the fields from the JSON spec; do not add extra ones.

## Cargo.toml

Axum 0.8, SQLx with the SQLite feature, serde for JSON serialization, tower-http for CORS support.

```toml
[package]
name = "todo-rs"
version = "0.1.0"
edition = "2024"

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
tower-http = { version = "0.6", features = ["cors"] }

[dev-dependencies]
reqwest = { version = "0.13", default-features = false, features = ["json", "rustls"] }
tokio = { version = "1", features = ["full", "test-util"] }
```

## src/models.rs

Serde structs: `Todo` (FromRow), `CreateTodo` (POST), `UpdateTodo` (PUT, all fields optional).

```rust
//! Data models: Todo, CreateTodo, UpdateTodo as serde structs.

use serde::{Deserialize, Serialize};

/// Task: title, description, deadline, priority, and status.
#[derive(Debug, Serialize, Deserialize, sqlx::FromRow)]
pub struct Todo {
    pub id: i64,
    pub title: String,
    pub description: Option<String>,
    pub due_date: Option<String>,
    pub priority: i64,
    pub status: String,
}

/// Create a new task. Required: title.
#[derive(Debug, Deserialize)]
pub struct CreateTodo {
    pub title: String,
    pub description: Option<String>,
    pub due_date: Option<String>,
    pub priority: Option<i64>,
    pub status: Option<String>,
}

/// Update a task; all fields are optional.
#[derive(Debug, Deserialize)]
pub struct UpdateTodo {
    pub title: Option<String>,
    pub description: Option<String>,
    pub due_date: Option<String>,
    pub priority: Option<i64>,
    pub status: Option<String>,
}
```

## src/handlers.rs

CRUD handlers. Key patterns: INSERT RETURNING, fetch_optional + 404, rows_affected + 204.

```rust
//! Handlers: CRUD operations for the todo entity.

use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::Json;
use sqlx::SqlitePool;

use crate::models::{CreateTodo, Todo, UpdateTodo};

/// POST: INSERT RETURNING, defaults via unwrap_or.
pub async fn create_todo(
    State(pool): State<SqlitePool>,
    Json(input): Json<CreateTodo>,
) -> Result<(StatusCode, Json<Todo>), StatusCode> {
    let priority = input.priority.unwrap_or(1);
    let status = input.status.unwrap_or_else(|| "pending".to_string());

    let result = sqlx::query_as::<_, Todo>(
        "INSERT INTO todos (title, description, due_date, priority, status)
         VALUES (?, ?, ?, ?, ?)
         RETURNING id, title, description, due_date, priority, status",
    )
    .bind(&input.title)
    .bind(&input.description)
    .bind(&input.due_date)
    .bind(priority)
    .bind(&status)
    .fetch_one(&pool)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    Ok((StatusCode::CREATED, Json(result)))
}

/// GET list: fetch_all.
pub async fn list_todos(
    State(pool): State<SqlitePool>,
) -> Result<Json<Vec<Todo>>, StatusCode> {
    let todos = sqlx::query_as::<_, Todo>("SELECT id, title, description, due_date, priority, status FROM todos")
        .fetch_all(&pool)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    Ok(Json(todos))
}

/// GET by id: fetch_optional, None → 404.
pub async fn get_todo(
    State(pool): State<SqlitePool>,
    Path(id): Path<i64>,
) -> Result<Json<Todo>, StatusCode> {
    let todo = sqlx::query_as::<_, Todo>(
        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
    )
    .bind(id)
    .fetch_optional(&pool)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    match todo {
        Some(t) => Ok(Json(t)),
        None => Err(StatusCode::NOT_FOUND),
    }
}

/// PUT: fetch the existing row, merge fields, UPDATE RETURNING.
pub async fn update_todo(
    State(pool): State<SqlitePool>,
    Path(id): Path<i64>,
    Json(input): Json<UpdateTodo>,
) -> Result<Json<Todo>, StatusCode> {
    let existing = sqlx::query_as::<_, Todo>(
        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
    )
    .bind(id)
    .fetch_optional(&pool)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    let existing = existing.ok_or(StatusCode::NOT_FOUND)?;

    let updated = sqlx::query_as::<_, Todo>(
        "UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
         WHERE id = ? RETURNING id, title, description, due_date, priority, status",
    )
    .bind(input.title.unwrap_or(existing.title))
    .bind(input.description.or(existing.description))
    .bind(input.due_date.or(existing.due_date))
    .bind(input.priority.unwrap_or(existing.priority))
    .bind(input.status.unwrap_or(existing.status))
    .bind(id)
    .fetch_one(&pool)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    Ok(Json(updated))
}

/// DELETE: rows_affected == 0 → 404, otherwise 204.
pub async fn delete_todo(
    State(pool): State<SqlitePool>,
    Path(id): Path<i64>,
) -> Result<StatusCode, StatusCode> {
    let result = sqlx::query("DELETE FROM todos WHERE id = ?")
        .bind(id)
        .execute(&pool)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    if result.rows_affected() == 0 {
        return Err(StatusCode::NOT_FOUND);
    }
    Ok(StatusCode::NO_CONTENT)
}
```
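
The lookup and delete patterns named above (a missing row maps to 404, zero affected rows maps to 404, otherwise 204) are independent of Axum and SQLx. The following is a language-neutral sketch in Python with the standard-library `sqlite3` module. The helpers `get_status` and `delete_status` are hypothetical names that return bare status codes, and the table is reduced to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL)")
conn.execute("INSERT INTO todos (title) VALUES (?)", ("Buy milk",))


def get_status(todo_id: int) -> int:
    """Lookup pattern: a missing row (None) maps to 404, a found row to 200."""
    row = conn.execute("SELECT id, title FROM todos WHERE id = ?", (todo_id,)).fetchone()
    return 200 if row is not None else 404


def delete_status(todo_id: int) -> int:
    """Delete pattern: zero affected rows maps to 404, otherwise 204."""
    cursor = conn.execute("DELETE FROM todos WHERE id = ?", (todo_id,))
    return 204 if cursor.rowcount > 0 else 404


# Fetch, delete, then repeat both against the now-missing row.
statuses = [get_status(1), delete_status(1), get_status(1), delete_status(1)]
print(statuses)
```

Checking the affected-row count avoids a separate existence query before the DELETE.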

## src/lib.rs

Library module: the router `app()` and table initialization `init_db()`, the public API for integration tests.

```rust
//! Library module: public API for integration tests.

pub mod handlers;
pub mod models;

use axum::routing::{delete, get, post, put};
use axum::Router;
use sqlx::SqlitePool;
use tower_http::cors::CorsLayer;

/// Create the router with the given database pool.
pub fn app(pool: SqlitePool) -> Router {
    Router::new()
        .route("/todos", post(handlers::create_todo))
        .route("/todos", get(handlers::list_todos))
        .route("/todos/{id}", get(handlers::get_todo))
        .route("/todos/{id}", put(handlers::update_todo))
        .route("/todos/{id}", delete(handlers::delete_todo))
        .layer(CorsLayer::permissive())
        .with_state(pool)
}

/// Initialize the database table.
pub async fn init_db(pool: &SqlitePool) {
    sqlx::query(
        "CREATE TABLE IF NOT EXISTS todos (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL,
            description TEXT,
            due_date TEXT,
            priority INTEGER NOT NULL DEFAULT 1,
            status TEXT NOT NULL DEFAULT 'pending'
        )",
    )
    .execute(pool)
    .await
    .expect("Failed to create table");
}
```

## src/main.rs

Entry point: SQLite pool, table initialization, Axum server on port 3000.

```rust
//! Axum CRUD: one endpoint set per entity, SQLite database.

use sqlx::sqlite::SqlitePoolOptions;
use todo_rs::{app, init_db};

#[tokio::main]
async fn main() {
    let pool = SqlitePoolOptions::new()
        .max_connections(5)
        .connect("sqlite:./app.db?mode=rwc")
        .await
        .expect("Failed to connect to database");

    init_db(&pool).await;

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
        .await
        .expect("Failed to bind port");

    println!("Server running: http://127.0.0.1:3000");
    axum::serve(listener, app(pool)).await.unwrap();
}
```

## tests/api_test.rs

Integration tests: spawn_server (in-memory SQLite, random port), CRUD tests with unique data.

```rust
//! Integration tests: in-memory SQLite, unique data per test.

use axum::http::StatusCode;
use reqwest::Client;
use sqlx::sqlite::SqlitePoolOptions;
use todo_rs::{app, init_db};

/// Start a test server on a random port.
async fn spawn_server() -> (Client, String) {
    let pool = SqlitePoolOptions::new()
        .max_connections(1)
        .connect("sqlite::memory:")
        .await
        .expect("Failed to open test database");
    init_db(&pool).await;
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0")
        .await
        .expect("Failed to bind test port");
    let base_url = format!("http://{}", listener.local_addr().unwrap());
    let router = app(pool);
    tokio::spawn(async move { axum::serve(listener, router).await.unwrap() });
    (Client::new(), base_url)
}

#[tokio::test]
async fn test_create_todo() {
    let (client, url) = spawn_server().await;
    let res = client.post(format!("{url}/todos"))
        .json(&serde_json::json!({"title": "Buy milk", "priority": 2}))
        .send().await.unwrap();
    assert_eq!(res.status(), StatusCode::CREATED);
    let body: serde_json::Value = res.json().await.unwrap();
    assert_eq!(body["title"], "Buy milk");
    assert!(body["id"].is_number());
}

#[tokio::test]
async fn test_get_todo_by_id() {
    let (client, url) = spawn_server().await;
    let created: serde_json::Value = client.post(format!("{url}/todos"))
        .json(&serde_json::json!({"title": "Fetchable task"}))
        .send().await.unwrap().json().await.unwrap();
    let id = created["id"].as_i64().unwrap();
    let res = client.get(format!("{url}/todos/{id}")).send().await.unwrap();
    assert_eq!(res.status(), StatusCode::OK);
    let body: serde_json::Value = res.json().await.unwrap();
    assert_eq!(body["id"], id);
}

#[tokio::test]
async fn test_get_todo_not_found() {
    let (client, url) = spawn_server().await;
    let res = client.get(format!("{url}/todos/99999")).send().await.unwrap();
    assert_eq!(res.status(), StatusCode::NOT_FOUND);
}

#[tokio::test]
async fn test_delete_todo() {
    let (client, url) = spawn_server().await;
    let created: serde_json::Value = client.post(format!("{url}/todos"))
        .json(&serde_json::json!({"title": "Deletable task"}))
        .send().await.unwrap().json().await.unwrap();
    let id = created["id"].as_i64().unwrap();
    let res = client.delete(format!("{url}/todos/{id}")).send().await.unwrap();
    assert_eq!(res.status(), StatusCode::NO_CONTENT);
    let res = client.get(format!("{url}/todos/{id}")).send().await.unwrap();
    assert_eq!(res.status(), StatusCode::NOT_FOUND);
}
```
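
`spawn_server` limits the test pool to one connection. A plausible reason (an assumption, since the source does not state it) is that every plain `:memory:` connection in SQLite opens its own private database, so a second pooled connection would not see the `todos` table created by `init_db`. The sketch below shows that isolation in Python with the standard-library `sqlite3` module; the variable names are illustrative.

```python
import sqlite3

# Two independent connections to ":memory:" get two separate databases.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

first.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
first.execute("INSERT INTO todos (title) VALUES ('Buy milk')")
rows_in_first = first.execute("SELECT COUNT(*) FROM todos").fetchone()[0]

# The second connection has its own empty database, so the table is missing there.
try:
    second.execute("SELECT COUNT(*) FROM todos")
    second_sees_table = True
except sqlite3.OperationalError:
    second_sees_table = False
```

With a single pooled connection, every request in a test reaches the same in-memory database, and each `spawn_server` call still starts from an empty one.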
1
kipina-codebench/golden-examples/todo-rs/.gitignore
vendored
Normal file
@@ -0,0 +1 @@
target/
16
kipina-codebench/golden-examples/todo-rs/Cargo.toml
Normal file
@@ -0,0 +1,16 @@
[package]
name = "todo-rs"
version = "0.1.0"
edition = "2024"

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
tower-http = { version = "0.6", features = ["cors"] }

[dev-dependencies]
reqwest = { version = "0.13", default-features = false, features = ["json", "rustls"] }
tokio = { version = "1", features = ["full", "test-util"] }
122
kipina-codebench/golden-examples/todo-rs/src/handlers.rs
Normal file
@@ -0,0 +1,122 @@
|
||||
//! Handlers: CRUD operations for the todo entity.
|
||||
|
||||
use axum::extract::{Path, State};
|
||||
use axum::http::StatusCode;
|
||||
use axum::Json;
|
||||
use sqlx::SqlitePool;
|
||||
|
||||
use crate::models::{CreateTodo, Todo, UpdateTodo};
|
||||
|
||||
/// Create a new todo.
|
||||
pub async fn create_todo(
|
||||
State(pool): State<SqlitePool>,
|
||||
Json(input): Json<CreateTodo>,
|
||||
) -> Result<(StatusCode, Json<Todo>), StatusCode> {
|
||||
let priority = input.priority.unwrap_or(1);
|
||||
let status = input.status.unwrap_or_else(|| "pending".to_string());
|
||||
|
||||
let result = sqlx::query_as::<_, Todo>(
|
||||
"INSERT INTO todos (title, description, due_date, priority, status)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
RETURNING id, title, description, due_date, priority, status",
|
||||
)
|
||||
.bind(&input.title)
|
||||
.bind(&input.description)
|
||||
.bind(&input.due_date)
|
||||
.bind(priority)
|
||||
.bind(&status)
|
||||
.fetch_one(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
Ok((StatusCode::CREATED, Json(result)))
|
||||
}
|
||||
|
||||
/// List all todos.
|
||||
pub async fn list_todos(
|
||||
State(pool): State<SqlitePool>,
|
||||
) -> Result<Json<Vec<Todo>>, StatusCode> {
|
||||
let todos = sqlx::query_as::<_, Todo>("SELECT id, title, description, due_date, priority, status FROM todos")
|
||||
.fetch_all(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
Ok(Json(todos))
|
||||
}
|
||||
|
||||
/// Fetch a todo by id.
|
||||
pub async fn get_todo(
|
||||
State(pool): State<SqlitePool>,
|
||||
Path(id): Path<i64>,
|
||||
) -> Result<Json<Todo>, StatusCode> {
|
||||
let todo = sqlx::query_as::<_, Todo>(
|
||||
"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
|
||||
)
|
||||
.bind(id)
|
||||
.fetch_optional(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
match todo {
|
||||
Some(t) => Ok(Json(t)),
|
||||
None => Err(StatusCode::NOT_FOUND),
|
||||
}
|
||||
}
|
||||
|
||||
/// Update a todo by id. Fields omitted from the request body keep their stored values.
|
||||
pub async fn update_todo(
|
||||
State(pool): State<SqlitePool>,
|
||||
Path(id): Path<i64>,
|
||||
Json(input): Json<UpdateTodo>,
|
||||
) -> Result<Json<Todo>, StatusCode> {
|
||||
let existing = sqlx::query_as::<_, Todo>(
|
||||
"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
|
||||
)
|
||||
.bind(id)
|
||||
.fetch_optional(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
let existing = existing.ok_or(StatusCode::NOT_FOUND)?;
|
||||
|
||||
let title = input.title.unwrap_or(existing.title);
|
||||
let description = input.description.or(existing.description);
|
||||
let due_date = input.due_date.or(existing.due_date);
|
||||
let priority = input.priority.unwrap_or(existing.priority);
|
||||
let status = input.status.unwrap_or(existing.status);
|
||||
|
||||
let updated = sqlx::query_as::<_, Todo>(
|
||||
"UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
|
||||
WHERE id = ?
|
||||
RETURNING id, title, description, due_date, priority, status",
|
||||
)
|
||||
.bind(&title)
|
||||
.bind(&description)
|
||||
.bind(&due_date)
|
||||
.bind(priority)
|
||||
.bind(&status)
|
||||
.bind(id)
|
||||
.fetch_one(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
Ok(Json(updated))
|
||||
}
|
||||
|
||||
/// Delete a todo by id.
|
||||
pub async fn delete_todo(
|
||||
State(pool): State<SqlitePool>,
|
||||
Path(id): Path<i64>,
|
||||
) -> Result<StatusCode, StatusCode> {
|
||||
let result = sqlx::query("DELETE FROM todos WHERE id = ?")
|
||||
.bind(id)
|
||||
.execute(&pool)
|
||||
.await
|
||||
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
|
||||
|
||||
if result.rows_affected() == 0 {
|
||||
return Err(StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
Ok(StatusCode::NO_CONTENT)
|
||||
}
|
||||
38
kipina-codebench/golden-examples/todo-rs/src/lib.rs
Normal file
@@ -0,0 +1,38 @@
|
||||
//! Library module: public API for the integration tests.
|
||||
|
||||
pub mod handlers;
|
||||
pub mod models;
|
||||
|
||||
use axum::routing::{delete, get, post, put};
|
||||
use axum::Router;
|
||||
use sqlx::SqlitePool;
|
||||
use tower_http::cors::CorsLayer;
|
||||
|
||||
/// Build the router with the given database pool.
|
||||
pub fn app(pool: SqlitePool) -> Router {
|
||||
Router::new()
|
||||
.route("/todos", post(handlers::create_todo))
|
||||
.route("/todos", get(handlers::list_todos))
|
||||
.route("/todos/{id}", get(handlers::get_todo))
|
||||
.route("/todos/{id}", put(handlers::update_todo))
|
||||
.route("/todos/{id}", delete(handlers::delete_todo))
|
||||
.layer(CorsLayer::permissive())
|
||||
.with_state(pool)
|
||||
}
|
||||
|
||||
/// Initialize the database table.
|
||||
pub async fn init_db(pool: &SqlitePool) {
|
||||
sqlx::query(
|
||||
"CREATE TABLE IF NOT EXISTS todos (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
title TEXT NOT NULL,
|
||||
description TEXT,
|
||||
due_date TEXT,
|
||||
priority INTEGER NOT NULL DEFAULT 1,
|
||||
status TEXT NOT NULL DEFAULT 'pending'
|
||||
)",
|
||||
)
|
||||
.execute(pool)
|
||||
.await
|
||||
.expect("failed to create table");
|
||||
}
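
The `DEFAULT` clauses in the `CREATE TABLE` statement repeat the fallbacks that `create_todo` applies on the Rust side (`priority` 1, `status` "pending"). A minimal sketch of the schema defaults in isolation, using Python's stdlib `sqlite3` rather than sqlx; the helper name `insert_title_only` is illustrative and not part of the project:

```python
import sqlite3

# Same DDL as init_db, run through Python's stdlib sqlite3 instead of sqlx.
SCHEMA = """
CREATE TABLE IF NOT EXISTS todos (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    description TEXT,
    due_date TEXT,
    priority INTEGER NOT NULL DEFAULT 1,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""


def insert_title_only(title: str) -> dict:
    """Insert a row that supplies only the title and return the stored row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        conn.execute(SCHEMA)
        cur = conn.execute("INSERT INTO todos (title) VALUES (?)", (title,))
        row = conn.execute(
            "SELECT id, title, description, due_date, priority, status "
            "FROM todos WHERE id = ?",
            (cur.lastrowid,),
        ).fetchone()
        return dict(row)
    finally:
        conn.close()


row = insert_title_only("Osta maitoa")
```

Because the table supplies the same defaults as the handler, a row inserted outside the API still ends up with a usable `priority` and `status`.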
|
||||
22
kipina-codebench/golden-examples/todo-rs/src/main.rs
Normal file
@@ -0,0 +1,22 @@
|
||||
//! Axum CRUD: one endpoint set per entity, SQLite database.
|
||||
|
||||
use sqlx::sqlite::SqlitePoolOptions;
|
||||
use todo_rs::{app, init_db};
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
let pool = SqlitePoolOptions::new()
|
||||
.max_connections(5)
|
||||
.connect("sqlite:./app.db?mode=rwc")
|
||||
.await
|
||||
.expect("database connection failed");
|
||||
|
||||
init_db(&pool).await;
|
||||
|
||||
let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
|
||||
.await
|
||||
.expect("failed to bind port");
|
||||
|
||||
println!("Server running at http://127.0.0.1:3000");
|
||||
axum::serve(listener, app(pool)).await.unwrap();
|
||||
}
|
||||
34
kipina-codebench/golden-examples/todo-rs/src/models.rs
Normal file
@@ -0,0 +1,34 @@
|
||||
//! Data models: Todo, CreateTodo, and UpdateTodo as serde structs.
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
|
||||
/// A todo: title, description, due date, priority, and status.
|
||||
#[derive(Debug, Serialize, Deserialize, sqlx::FromRow)]
|
||||
pub struct Todo {
|
||||
pub id: i64,
|
||||
pub title: String,
|
||||
pub description: Option<String>,
|
||||
pub due_date: Option<String>,
|
||||
pub priority: i64,
|
||||
pub status: String,
|
||||
}
|
||||
|
||||
/// Payload for creating a todo. Required: title.
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct CreateTodo {
|
||||
pub title: String,
|
||||
pub description: Option<String>,
|
||||
pub due_date: Option<String>,
|
||||
pub priority: Option<i64>,
|
||||
pub status: Option<String>,
|
||||
}
|
||||
|
||||
/// Payload for updating a todo; all fields are optional.
|
||||
#[derive(Debug, Deserialize)]
|
||||
pub struct UpdateTodo {
|
||||
pub title: Option<String>,
|
||||
pub description: Option<String>,
|
||||
pub due_date: Option<String>,
|
||||
pub priority: Option<i64>,
|
||||
pub status: Option<String>,
|
||||
}
|
||||
262
kipina-codebench/golden-examples/todo-rs/tests/api_test.rs
Normal file
@@ -0,0 +1,262 @@
|
||||
//! Integration tests: in-memory SQLite, unique data per test.
|
||||
|
||||
use axum::http::StatusCode;
|
||||
use reqwest::Client;
|
||||
use sqlx::sqlite::SqlitePoolOptions;
|
||||
use todo_rs::{app, init_db};
|
||||
|
||||
/// Start a test server on a random port.
|
||||
async fn spawn_server() -> (Client, String) {
|
||||
let pool = SqlitePoolOptions::new()
|
||||
.max_connections(1)
|
||||
.connect("sqlite::memory:")
|
||||
.await
|
||||
.expect("failed to open test database");
|
||||
|
||||
init_db(&pool).await;
|
||||
|
||||
let listener = tokio::net::TcpListener::bind("127.0.0.1:0")
|
||||
.await
|
||||
.expect("failed to bind test port");
|
||||
let addr = listener.local_addr().unwrap();
|
||||
let base_url = format!("http://{addr}");
|
||||
|
||||
let router = app(pool);
|
||||
tokio::spawn(async move {
|
||||
axum::serve(listener, router).await.unwrap();
|
||||
});
|
||||
|
||||
(Client::new(), base_url)
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_create_todo() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let res = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Osta maitoa", "priority": 2}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::CREATED);
|
||||
let body: serde_json::Value = res.json().await.unwrap();
|
||||
assert_eq!(body["title"], "Osta maitoa");
|
||||
assert_eq!(body["priority"], 2);
|
||||
assert!(body["id"].is_number());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_create_todo_defaults() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let res = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Oletusarvotesti"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::CREATED);
|
||||
let body: serde_json::Value = res.json().await.unwrap();
|
||||
assert_eq!(body["priority"], 1);
|
||||
assert_eq!(body["status"], "pending");
|
||||
assert!(body["description"].is_null());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_list_todos() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Listattava tehtävä"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let res = client.get(format!("{url}/todos")).send().await.unwrap();
|
||||
assert_eq!(res.status(), StatusCode::OK);
|
||||
|
||||
let body: Vec<serde_json::Value> = res.json().await.unwrap();
|
||||
assert!(!body.is_empty());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_get_todo_by_id() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let created: serde_json::Value = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Haettava tehtävä"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap()
|
||||
.json()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let id = created["id"].as_i64().unwrap();
|
||||
let res = client
|
||||
.get(format!("{url}/todos/{id}"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::OK);
|
||||
let body: serde_json::Value = res.json().await.unwrap();
|
||||
assert_eq!(body["id"], id);
|
||||
assert_eq!(body["title"], "Haettava tehtävä");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_get_todo_not_found() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let res = client
|
||||
.get(format!("{url}/todos/99999"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_update_todo() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let created: serde_json::Value = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Vanha otsikko"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap()
|
||||
.json()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let id = created["id"].as_i64().unwrap();
|
||||
let res = client
|
||||
.put(format!("{url}/todos/{id}"))
|
||||
.json(&serde_json::json!({"title": "Uusi otsikko"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::OK);
|
||||
let body: serde_json::Value = res.json().await.unwrap();
|
||||
assert_eq!(body["title"], "Uusi otsikko");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_update_todo_not_found() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let res = client
|
||||
.put(format!("{url}/todos/99999"))
|
||||
.json(&serde_json::json!({"title": "Ei löydy"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_delete_todo() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let created: serde_json::Value = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({"title": "Poistettava"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap()
|
||||
.json()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let id = created["id"].as_i64().unwrap();
|
||||
let res = client
|
||||
.delete(format!("{url}/todos/{id}"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NO_CONTENT);
|
||||
|
||||
let res = client
|
||||
.get(format!("{url}/todos/{id}"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_delete_todo_not_found() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
let res = client
|
||||
.delete(format!("{url}/todos/99999"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NOT_FOUND);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_full_lifecycle() {
|
||||
let (client, url) = spawn_server().await;
|
||||
|
||||
// Create
|
||||
let created: serde_json::Value = client
|
||||
.post(format!("{url}/todos"))
|
||||
.json(&serde_json::json!({
|
||||
"title": "Elinkaaritesti",
|
||||
"description": "Testataan koko CRUD-kierto",
|
||||
"due_date": "2026-12-31",
|
||||
"priority": 3,
|
||||
"status": "in_progress"
|
||||
}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap()
|
||||
.json()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let id = created["id"].as_i64().unwrap();
|
||||
assert_eq!(created["title"], "Elinkaaritesti");
|
||||
assert_eq!(created["description"], "Testataan koko CRUD-kierto");
|
||||
assert_eq!(created["due_date"], "2026-12-31");
|
||||
assert_eq!(created["priority"], 3);
|
||||
assert_eq!(created["status"], "in_progress");
|
||||
|
||||
// Update
|
||||
let updated: serde_json::Value = client
|
||||
.put(format!("{url}/todos/{id}"))
|
||||
.json(&serde_json::json!({"status": "done"}))
|
||||
.send()
|
||||
.await
|
||||
.unwrap()
|
||||
.json()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(updated["status"], "done");
|
||||
assert_eq!(updated["title"], "Elinkaaritesti");
|
||||
|
||||
// Delete
|
||||
let res = client
|
||||
.delete(format!("{url}/todos/{id}"))
|
||||
.send()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(res.status(), StatusCode::NO_CONTENT);
|
||||
}
|
||||
230
kipina-codebench/golden-examples/todo.md
Normal file
@@ -0,0 +1,230 @@
|
||||
# Todo: reference implementation (FastAPI + SQLAlchemy 2.0 + SQLite)
|
||||
|
||||
This is a complete example. Generate an equivalent structure for the given project.
|
||||
Use ONLY the fields in the JSON spec; do not add any extras.
|
||||
|
||||
## models.py
|
||||
|
||||
SQLAlchemy 2.0: `DeclarativeBase` + `Mapped` + `mapped_column`. NOT `Column()`, NOT `declarative_base()`.
|
||||
|
||||
```python
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
|
||||
|
||||
from datetime import date
|
||||
|
||||
from sqlalchemy import String, Text, Date, create_engine
|
||||
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
|
||||
|
||||
DATABASE_URL = "sqlite:///./app.db"
|
||||
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
|
||||
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
|
||||
|
||||
|
||||
class Base(DeclarativeBase):
|
||||
pass
|
||||
|
||||
|
||||
class Todo(Base):
"""A todo: title, description, due date, priority, and status."""
|
||||
|
||||
__tablename__ = "todos"
|
||||
|
||||
id: Mapped[int] = mapped_column(primary_key=True, index=True)
|
||||
title: Mapped[str] = mapped_column(String(255))
|
||||
description: Mapped[str | None] = mapped_column(Text, default=None)
|
||||
due_date: Mapped[date | None] = mapped_column(Date, default=None)
|
||||
priority: Mapped[int] = mapped_column(default=1)
|
||||
status: Mapped[str] = mapped_column(String(20), default="pending")
|
||||
|
||||
|
||||
Base.metadata.create_all(bind=engine)
|
||||
```
|
||||
|
||||
Note:
- `str | None` (not `Optional[str]`)
- `String(20)` for the status field (not an Enum)
- Only the fields in the spec; no `created_at` or other extras
|
||||
|
||||
## schemas.py
|
||||
|
||||
Pydantic v2: `ConfigDict(from_attributes=True)`. NOT `class Config: orm_mode = True`.
|
||||
|
||||
```python
"""Pydantic v2 schemas: Create for input, Response for output."""
|
||||
|
||||
from datetime import date
|
||||
|
||||
from pydantic import BaseModel, ConfigDict
|
||||
|
||||
|
||||
class TodoCreate(BaseModel):
"""Payload for creating a todo. Required: title."""
|
||||
|
||||
title: str
|
||||
description: str | None = None
|
||||
due_date: date | None = None
|
||||
priority: int = 1
|
||||
status: str = "pending"
|
||||
|
||||
|
||||
class TodoResponse(TodoCreate):
"""Todo as returned by the API; includes the id."""
|
||||
|
||||
id: int
|
||||
model_config = ConfigDict(from_attributes=True)
|
||||
```
|
||||
|
||||
## main.py
|
||||
|
||||
FastAPI CRUD: POST returns 201, GET lists, GET by id returns 404 when missing, PUT, DELETE returns 204. Use `model_dump()` (not `.dict()`).
|
||||
|
||||
```python
"""FastAPI CRUD: one endpoint set per entity."""
|
||||
|
||||
from fastapi import FastAPI, Depends, HTTPException
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from models import SessionLocal, Todo
|
||||
from schemas import TodoCreate, TodoResponse
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
|
||||
def get_db():
"""One database session per request."""
|
||||
db = SessionLocal()
|
||||
try:
|
||||
yield db
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
|
||||
@app.post("/todos/", response_model=TodoResponse, status_code=201)
|
||||
def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
|
||||
db_item = Todo(**item.model_dump())
|
||||
db.add(db_item)
|
||||
db.commit()
|
||||
db.refresh(db_item)
|
||||
return db_item
|
||||
|
||||
|
||||
@app.get("/todos/", response_model=list[TodoResponse])
|
||||
def list_todos(db: Session = Depends(get_db)):
|
||||
return db.query(Todo).all()
|
||||
|
||||
|
||||
@app.get("/todos/{item_id}", response_model=TodoResponse)
|
||||
def get_todo(item_id: int, db: Session = Depends(get_db)):
|
||||
item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
return item
|
||||
|
||||
|
||||
@app.put("/todos/{item_id}", response_model=TodoResponse)
|
||||
def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
|
||||
db_item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not db_item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
for key, value in item.model_dump().items():
|
||||
setattr(db_item, key, value)
|
||||
db.commit()
|
||||
db.refresh(db_item)
|
||||
return db_item
|
||||
|
||||
|
||||
@app.delete("/todos/{item_id}", status_code=204)
|
||||
def delete_todo(item_id: int, db: Session = Depends(get_db)):
|
||||
db_item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not db_item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
db.delete(db_item)
|
||||
db.commit()
|
||||
```
|
||||
|
||||
## test_main.py
|
||||
|
||||
Tests: a separate test.db, `override_get_db`, `TestClient`. Unique Finnish-language data per test.
The PUT test sends ALL required fields.

Generate EXACTLY these 6 tests per entity, no more, no fewer:
1. `test_create_{entity}`: POST, assert 201 + id
2. `test_list_{entities}`: POST first, GET the list, assert len >= 1
3. `test_get_{entity}_by_id`: POST, GET by id, assert the id matches
4. `test_get_{entity}_not_found`: GET /99999, assert 404
5. `test_update_{entity}`: POST, PUT with all required fields, assert 200
6. `test_delete_{entity}`: POST, DELETE assert 204, GET again assert 404

No search, filter, or other extra tests.
|
||||
|
||||
```python
"""Pytest: TestClient, separate test.db, unique data per test."""
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
from sqlalchemy import create_engine
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
|
||||
from main import app, get_db
|
||||
from models import Base
|
||||
|
||||
test_engine = create_engine(
|
||||
"sqlite:///./test.db", connect_args={"check_same_thread": False}
|
||||
)
|
||||
TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
|
||||
Base.metadata.create_all(bind=test_engine)
|
||||
|
||||
|
||||
def override_get_db():
|
||||
db = TestSession()
|
||||
try:
|
||||
yield db
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
|
||||
app.dependency_overrides[get_db] = override_get_db
|
||||
client = TestClient(app)
|
||||
|
||||
|
||||
def test_create_todo():
|
||||
response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
|
||||
assert response.status_code == 201
|
||||
assert response.json()["title"] == "Osta maitoa"
|
||||
assert "id" in response.json()
|
||||
|
||||
|
||||
def test_list_todos():
|
||||
client.post("/todos/", json={"title": "Listattava tehtävä"})
|
||||
response = client.get("/todos/")
|
||||
assert response.status_code == 200
|
||||
assert len(response.json()) >= 1
|
||||
|
||||
|
||||
def test_get_todo_by_id():
|
||||
created = client.post("/todos/", json={"title": "Haettava tehtävä"}).json()
|
||||
response = client.get(f"/todos/{created['id']}")
|
||||
assert response.status_code == 200
|
||||
assert response.json()["id"] == created["id"]
|
||||
|
||||
|
||||
def test_get_todo_not_found():
|
||||
response = client.get("/todos/99999")
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_update_todo():
|
||||
created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
|
||||
response = client.put(
|
||||
f"/todos/{created['id']}", json={"title": "Uusi otsikko"}
|
||||
)
|
||||
assert response.status_code == 200
|
||||
assert response.json()["title"] == "Uusi otsikko"
|
||||
|
||||
|
||||
def test_delete_todo():
|
||||
created = client.post("/todos/", json={"title": "Poistettava"}).json()
|
||||
response = client.delete(f"/todos/{created['id']}")
|
||||
assert response.status_code == 204
|
||||
response = client.get(f"/todos/{created['id']}")
|
||||
assert response.status_code == 404
|
||||
```
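
The six-test rule for this file can be checked mechanically against generated output. A minimal sketch, assuming the test names have already been extracted from the generated file; `expected_test_names` and `check_test_names` are hypothetical helpers, not part of the benchmark:

```python
# The six name patterns listed in the rule, in order.
TEST_NAME_PATTERNS = [
    "test_create_{entity}",
    "test_list_{entities}",
    "test_get_{entity}_by_id",
    "test_get_{entity}_not_found",
    "test_update_{entity}",
    "test_delete_{entity}",
]


def expected_test_names(entity: str, entities: str) -> list[str]:
    """Fill the patterns with the singular and plural entity names."""
    return [p.format(entity=entity, entities=entities) for p in TEST_NAME_PATTERNS]


def check_test_names(
    found: list[str], entity: str, entities: str
) -> tuple[list[str], list[str]]:
    """Return (missing, extra) relative to the six expected names."""
    expected = expected_test_names(entity, entities)
    missing = [name for name in expected if name not in found]
    extra = [name for name in found if name not in expected]
    return missing, extra


names = expected_test_names("todo", "todos")
missing, extra = check_test_names(
    ["test_create_todo", "test_list_todos", "test_search_todos"], "todo", "todos"
)
```

An empty `missing` and an empty `extra` together mean the file contains exactly the six required tests.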
|
||||
61
kipina-codebench/golden-examples/todo/main.py
Normal file
@@ -0,0 +1,61 @@
"""FastAPI CRUD: one endpoint set per entity."""
|
||||
|
||||
from fastapi import FastAPI, Depends, HTTPException
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from models import SessionLocal, Todo
|
||||
from schemas import TodoCreate, TodoResponse
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
|
||||
def get_db():
"""One database session per request."""
|
||||
db = SessionLocal()
|
||||
try:
|
||||
yield db
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
|
||||
@app.post("/todos/", response_model=TodoResponse, status_code=201)
|
||||
def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
|
||||
db_item = Todo(**item.model_dump())
|
||||
db.add(db_item)
|
||||
db.commit()
|
||||
db.refresh(db_item)
|
||||
return db_item
|
||||
|
||||
|
||||
@app.get("/todos/", response_model=list[TodoResponse])
|
||||
def list_todos(db: Session = Depends(get_db)):
|
||||
return db.query(Todo).all()
|
||||
|
||||
|
||||
@app.get("/todos/{item_id}", response_model=TodoResponse)
|
||||
def get_todo(item_id: int, db: Session = Depends(get_db)):
|
||||
item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
return item
|
||||
|
||||
|
||||
@app.put("/todos/{item_id}", response_model=TodoResponse)
|
||||
def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
|
||||
db_item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not db_item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
for key, value in item.model_dump().items():
|
||||
setattr(db_item, key, value)
|
||||
db.commit()
|
||||
db.refresh(db_item)
|
||||
return db_item
|
||||
|
||||
|
||||
@app.delete("/todos/{item_id}", status_code=204)
|
||||
def delete_todo(item_id: int, db: Session = Depends(get_db)):
|
||||
db_item = db.query(Todo).filter(Todo.id == item_id).first()
|
||||
if not db_item:
|
||||
raise HTTPException(status_code=404, detail="Todo not found")
|
||||
db.delete(db_item)
|
||||
db.commit()
|
||||
30
kipina-codebench/golden-examples/todo/models.py
Normal file
@@ -0,0 +1,30 @@
"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
|
||||
|
||||
from datetime import date
|
||||
|
||||
from sqlalchemy import String, Text, Date, create_engine
|
||||
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
|
||||
|
||||
DATABASE_URL = "sqlite:///./app.db"
|
||||
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
|
||||
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
|
||||
|
||||
|
||||
class Base(DeclarativeBase):
|
||||
pass
|
||||
|
||||
|
||||
class Todo(Base):
"""A todo: title, description, due date, priority, and status."""
|
||||
|
||||
__tablename__ = "todos"
|
||||
|
||||
id: Mapped[int] = mapped_column(primary_key=True, index=True)
|
||||
title: Mapped[str] = mapped_column(String(255))
|
||||
description: Mapped[str | None] = mapped_column(Text, default=None)
|
||||
due_date: Mapped[date | None] = mapped_column(Date, default=None)
|
||||
priority: Mapped[int] = mapped_column(default=1)
|
||||
status: Mapped[str] = mapped_column(String(20), default="pending")
|
||||
|
||||
|
||||
Base.metadata.create_all(bind=engine)
|
||||
11
kipina-codebench/golden-examples/todo/pyproject.toml
Normal file
@@ -0,0 +1,11 @@
|
||||
[project]
|
||||
name = "todo-app"
|
||||
version = "0.1.0"
|
||||
requires-python = ">=3.14"
|
||||
dependencies = [
|
||||
"fastapi",
|
||||
"uvicorn[standard]",
|
||||
"sqlalchemy",
|
||||
"pytest",
|
||||
"httpx",
|
||||
]
|
||||
22
kipina-codebench/golden-examples/todo/schemas.py
Normal file
@@ -0,0 +1,22 @@
"""Pydantic v2 schemas: Create for input, Response for output."""
|
||||
|
||||
from datetime import date
|
||||
|
||||
from pydantic import BaseModel, ConfigDict
|
||||
|
||||
|
||||
class TodoCreate(BaseModel):
"""Payload for creating a todo. Required: title."""
|
||||
|
||||
title: str
|
||||
description: str | None = None
|
||||
due_date: date | None = None
|
||||
priority: int = 1
|
||||
status: str = "pending"
|
||||
|
||||
|
||||
class TodoResponse(TodoCreate):
"""Todo as returned by the API; includes the id."""
|
||||
|
||||
id: int
|
||||
model_config = ConfigDict(from_attributes=True)
|
||||
69
kipina-codebench/golden-examples/todo/test_main.py
Normal file
@@ -0,0 +1,69 @@
"""Pytest: TestClient, separate test.db, unique data per test."""
|
||||
|
||||
from fastapi.testclient import TestClient
|
||||
from sqlalchemy import create_engine
|
||||
from sqlalchemy.orm import sessionmaker
|
||||
|
||||
from main import app, get_db
|
||||
from models import Base
|
||||
|
||||
test_engine = create_engine(
|
||||
"sqlite:///./test.db", connect_args={"check_same_thread": False}
|
||||
)
|
||||
TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
|
||||
Base.metadata.create_all(bind=test_engine)
|
||||
|
||||
|
||||
def override_get_db():
|
||||
db = TestSession()
|
||||
try:
|
||||
yield db
|
||||
finally:
|
||||
db.close()
|
||||
|
||||
|
||||
app.dependency_overrides[get_db] = override_get_db
|
||||
client = TestClient(app)
|
||||
|
||||
|
||||
def test_create_todo():
|
||||
response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
|
||||
assert response.status_code == 201
|
||||
assert response.json()["title"] == "Osta maitoa"
|
||||
assert "id" in response.json()
|
||||
|
||||
|
||||
def test_list_todos():
|
||||
client.post("/todos/", json={"title": "Listattava tehtävä"})
|
||||
response = client.get("/todos/")
|
||||
assert response.status_code == 200
|
||||
assert len(response.json()) >= 1
|
||||
|
||||
|
||||
def test_get_todo_by_id():
|
||||
created = client.post("/todos/", json={"title": "Haettava tehtävä"}).json()
|
||||
response = client.get(f"/todos/{created['id']}")
|
||||
assert response.status_code == 200
|
||||
assert response.json()["id"] == created["id"]
|
||||
|
||||
|
||||
def test_get_todo_not_found():
|
||||
response = client.get("/todos/99999")
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_update_todo():
|
||||
created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
|
||||
response = client.put(
|
||||
f"/todos/{created['id']}", json={"title": "Uusi otsikko"}
|
||||
)
|
||||
assert response.status_code == 200
|
||||
assert response.json()["title"] == "Uusi otsikko"
|
||||
|
||||
|
||||
def test_delete_todo():
|
||||
created = client.post("/todos/", json={"title": "Poistettava"}).json()
|
||||
response = client.delete(f"/todos/{created['id']}")
|
||||
assert response.status_code == 204
|
||||
response = client.get(f"/todos/{created['id']}")
|
||||
assert response.status_code == 404
|
||||
13
kipina-codebench/package.json
Normal file
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"name": "kipina-codebench",
|
||||
"version": "0.1.0",
"description": "LLM code-generation benchmark: tests how well Ollama models generate working FastAPI projects",
|
||||
"type": "module",
|
||||
"bin": {
|
||||
"codebench": "./benchmark.mjs"
|
||||
},
|
||||
"scripts": {
|
||||
"bench": "node benchmark.mjs --scenarios all",
|
||||
"docker:build": "docker build -t kipina-pytest -f Dockerfile.pytest ."
|
||||
}
|
||||
}
|
||||
65
kipina-codebench/profiles.json
Normal file
@@ -0,0 +1,65 @@
|
||||
{
|
||||
"models": {
|
||||
"qwen3-coder:30b": {
|
||||
"profile": "large",
|
||||
"role": "primary",
|
||||
"prompt": "code",
|
||||
"golden": "todo.md",
|
||||
"vram": "24GB",
"notes": "Primary coder. 97 pts, 188 tok/s. Follows long rule lists."
|
||||
},
|
||||
"qwen3:8b": {
|
||||
"profile": "small",
|
||||
"role": "primary",
|
||||
"prompt": "code-small",
|
||||
"golden": "todo-readme.md",
|
||||
"vram": "8GB",
"notes": "Lightweight primary coder. Todo/users 100 pts, blog weak. README format for the golden example."
|
||||
},
|
||||
"codestral:22b": {
|
||||
"profile": "large",
|
||||
"role": "backup",
|
||||
"prompt": "code",
|
||||
"golden": "todo.md",
|
||||
"vram": "16GB",
"notes": "Mistral backup model. 88 pts, 44 tok/s."
|
||||
},
|
||||
"qwen3:4b": {
|
||||
"profile": "small",
|
||||
"role": "minimal",
|
||||
"prompt": "code-small",
|
||||
"golden": "todo.md",
|
||||
"vram": "4GB",
|
||||
"notes": "Minimaali. Vain todo toimii."
|
||||
},
|
||||
"qwen2.5-coder:32b": {
|
||||
"profile": "large",
|
||||
"role": "candidate",
|
||||
"prompt": "code",
|
||||
"golden": "todo.md",
|
||||
"vram": "24GB",
|
||||
"notes": "Edellinen sukupolvi. Vahva Rust-osaaminen."
|
||||
},
|
||||
"qwen3:14b": {
|
||||
"profile": "large",
|
||||
"role": "retired",
|
||||
"prompt": "code",
|
||||
"golden": "todo.md",
|
||||
"vram": "16GB",
|
||||
"notes": "Poistettu. Ei lisäarvoa 30b:hen verrattuna, blog epävakaa."
|
||||
}
|
||||
},
|
||||
"profiles": {
|
||||
"large": {
|
||||
"prompt": "code",
|
||||
"golden": "todo.md",
|
||||
"description": "Täysi prompti + säännöt. Malleille >=14B."
|
||||
},
|
||||
"small": {
|
||||
"prompt": "code-small",
|
||||
"golden": "todo.md",
|
||||
"description": "Tiivistetty prompti. Malleille <=8B."
|
||||
}
|
||||
},
|
||||
"default_profile": "large"
|
||||
}
|
||||
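A minimal sketch of how a profile lookup over this file might work, assuming a model's own `prompt`/`golden` values win over its profile's values and that unlisted models fall back to `default_profile`. The `resolve_model` helper and that resolution order are assumptions for illustration; the diff does not show how `benchmark.mjs` actually reads `profiles.json`.

```python
import json

# Trimmed copy of the structure in profiles.json (one model kept for brevity).
PROFILES = json.loads("""
{
  "models": {
    "qwen3:8b": {"profile": "small", "prompt": "code-small", "golden": "todo-readme.md"}
  },
  "profiles": {
    "large": {"prompt": "code", "golden": "todo.md"},
    "small": {"prompt": "code-small", "golden": "todo.md"}
  },
  "default_profile": "large"
}
""")


def resolve_model(config: dict, model: str) -> dict:
    """Return profile, prompt and golden file for a model (hypothetical helper)."""
    entry = config["models"].get(model, {})
    # Unlisted models fall back to the default profile (assumed behaviour).
    profile_name = entry.get("profile", config["default_profile"])
    profile = config["profiles"][profile_name]
    return {
        "profile": profile_name,
        # Model-level values override profile-level values (assumed behaviour).
        "prompt": entry.get("prompt", profile["prompt"]),
        "golden": entry.get("golden", profile["golden"]),
    }


listed = resolve_model(PROFILES, "qwen3:8b")
unlisted = resolve_model(PROFILES, "unlisted:1b")
```

Under these assumptions the per-model `golden` of `qwen3:8b` overrides the `small` profile's `todo.md`, which matches the intent suggested by its notes.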
15
kipina-codebench/prompts/client.md
Normal file
@@ -0,0 +1,15 @@
You are a product owner who turns vague ideas into clear, actionable software requirements.

GIVEN a short project description from the user, produce a structured brief:

1. PROJECT NAME: a short, descriptive name
2. GOAL: one sentence explaining what the software does and who it's for
3. CORE FEATURES: numbered list of 3-8 concrete features (not vague wishes)
4. DATA MODEL: list the main entities and their key fields (include field types)
5. API ENDPOINTS: list the REST endpoints (method + path + purpose)
6. CONSTRAINTS: any technical constraints (e.g. "must use SQLite", "no auth needed")

RULES:
- Be specific: "User can filter todos by status" not "todo management"
- Use plain English, no code
- Maximum 400 words total
69
kipina-codebench/prompts/code-go.md
Normal file
@@ -0,0 +1,69 @@
You are a Go backend developer. Generate a Chi web project with SQLite.

Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these files:

1. go.mod — module declaration, go-chi/chi/v5, modernc.org/sqlite
2. models.go — Structs with json tags
3. handlers.go — Handler closures for each CRUD endpoint
4. main.go — Entry point with InitDB(), NewRouter(), main()
5. handlers_test.go — Integration tests using httptest against in-memory SQLite

Do NOT generate any other files. Do NOT generate go.sum.

OUTPUT FORMAT — use these exact markers to separate files:

=== go.mod ===
<module content>

=== models.go ===
<go code>

=== handlers.go ===
<go code>

=== main.go ===
<go code>

=== handlers_test.go ===
<go code>

DOCUMENTATION — structs get // one-line comments. Keep it brief.

RULES:
- Follow the REFERENCE IMPLEMENTATION patterns exactly
- Chi router with chi.URLParam(r, "param") for path parameters
- database/sql + modernc.org/sqlite (pure Go driver, no CGO required)
- Import the driver as blank import: _ "modernc.org/sqlite"
- Handlers are closures: func handler(db *sql.DB) http.HandlerFunc
- INSERT/UPDATE queries use RETURNING clause to get the row back via QueryRow + Scan
- POST returns 201 (http.StatusCreated), DELETE returns 204 (http.StatusNoContent), GET missing returns 404
- Use sql.ErrNoRows for not-found checks: if err == sql.ErrNoRows { ... }
- No compile-time query macros — use db.QueryRow(), db.Query(), db.Exec() directly
- Empty slice not nil for list endpoints: if items == nil { items = []Item{} }
- Optional fields use pointer types (*string, *int64) with json tag omitempty
- Set Content-Type header: w.Header().Set("Content-Type", "application/json")
- Parse path ID with strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
- InitDB uses log.Fatal on error, NewRouter returns http.Handler
- main() opens "file:app.db?mode=rwc" and listens on 127.0.0.1:3000
- No markdown fences inside file content — just raw code
- You MUST generate ALL 5 files. Do not stop early.

TESTS — follow this exact setupTestServer pattern:

func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
	t.Helper()
	db, err := sql.Open("sqlite", ":memory:")
	if err != nil {
		t.Fatal(err)
	}
	InitDB(db)
	return httptest.NewServer(NewRouter(db)), db
}

- Each test function calls setupTestServer(t) to get (ts, db)
- defer ts.Close() and defer db.Close() in every test
- Use standard library: http.Post, http.Get, http.NewRequest for PUT/DELETE
- Use strings.NewReader for JSON request bodies
- Decode responses with json.NewDecoder(resp.Body).Decode(&body)
- Unique descriptive data, NOT generic "test" strings
- Format IDs with fmt.Sprintf("%.0f", id) when building URLs from float64
73
kipina-codebench/prompts/code-rs.md
Normal file
@@ -0,0 +1,73 @@
You are a Rust backend developer. Generate an Axum web project with SQLx and SQLite.

Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these files:

1. Cargo.toml — axum 0.8, tokio, serde/serde_json, sqlx (sqlite, runtime-tokio), tower-http, reqwest 0.13 with features ["json", "rustls"] (for tests)
2. src/models.rs — Structs with Serialize, Deserialize, FromRow derives
3. src/handlers.rs — Async handler functions for each CRUD endpoint
4. src/lib.rs — Public app(pool) function returning Router, init_db() for table creation
5. src/main.rs — Binary entry point, connect to SQLite, bind to port
6. tests/api_test.rs — Integration tests using reqwest against in-memory SQLite

Do NOT generate any other files.

OUTPUT FORMAT — use these exact markers to separate files:

=== Cargo.toml ===
<toml content>

=== src/models.rs ===
<rust code>

=== src/handlers.rs ===
<rust code>

=== src/lib.rs ===
<rust code>

=== src/main.rs ===
<rust code>

=== tests/api_test.rs ===
<rust code>

DOCUMENTATION — every file starts with //! one-line module doc. Structs get /// one-line doc. Zensical: say what it IS, not what it does.

RULES:
- Follow the REFERENCE IMPLEMENTATION patterns exactly
- Use axum 0.8 API: Router, Json, Path, State, StatusCode
- ROUTING: use {param} NOT :param — e.g. .route("/items/{id}", get(get_item))
- ROUTING: one .route() call per path, chain methods: .route("/items", post(create).get(list))
- State is SqlitePool wrapped in axum::extract::State
- app() takes SqlitePool as argument and calls .with_state(pool) on the Router
- Handlers return Result<(StatusCode, Json<T>), StatusCode> or Result<StatusCode, StatusCode>
- POST returns 201 (StatusCode::CREATED), DELETE returns 204 (StatusCode::NO_CONTENT), GET missing returns 404
- CRITICAL: Use sqlx::query_as::<_, T>("SQL") runtime functions with .bind() — NEVER use sqlx::query_as!() or sqlx::query!() compile-time macros (they require DATABASE_URL at compile time)
- Use sqlx::query("SQL") for writes (DELETE, etc.), sqlx::query_as::<_, T>("SQL") for reads
- Use RETURNING clause in INSERT/UPDATE queries to get the created/updated row back
- DateTime fields: store as TEXT, use String type in Rust structs
- init_db: use .expect("msg") not Result return — keep it simple
- NO markdown fences inside file content — just raw code
- Edition 2024 in Cargo.toml
- You MUST generate ALL 6 files. Do not stop early.

TESTS — follow this exact spawn_server pattern:

async fn spawn_server() -> (reqwest::Client, String) {
    let pool = sqlx::sqlite::SqlitePoolOptions::new()
        .max_connections(1)
        .connect("sqlite::memory:")
        .await
        .expect("DB failed");
    init_db(&pool).await;
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.expect("Bind failed");
    let addr = listener.local_addr().unwrap();
    let base_url = format!("http://{addr}");
    let router = app(pool);
    tokio::spawn(async move { axum::serve(listener, router).await.unwrap() });
    (reqwest::Client::new(), base_url)
}

- Each #[tokio::test] calls spawn_server() to get (client, url)
- Unique descriptive data, NOT generic "test" strings
- Use serde_json::json!() for request bodies
58
kipina-codebench/prompts/code-small.md
Normal file
@@ -0,0 +1,58 @@
Generate a FastAPI project with SQLAlchemy and SQLite. Follow the REFERENCE IMPLEMENTATION exactly.

Generate these 4 files with === markers:

=== models.py ===
=== schemas.py ===
=== main.py ===
=== test_main.py ===

Key patterns (copy from reference):
- class Base(DeclarativeBase): pass
- Mapped[str] = mapped_column(String(255))
- Mapped[str | None] = mapped_column(Text, default=None)
- model_config = ConfigDict(from_attributes=True)
- model_dump() not dict()
- POST 201, GET list, GET by id 404, PUT, DELETE 204

FOREIGN KEYS (when spec has relationships):
- Child entity gets parent_id field: Mapped[int] = mapped_column(ForeignKey("parents.id"))
- Import: from sqlalchemy import ForeignKey (NOT from sqlalchemy.orm!)
- Create schema includes parent_id: int
- Test creates parent FIRST, then child with parent's id

Example FK pattern in models.py:
```
class Author(Base):
    __tablename__ = "authors"
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    name: Mapped[str] = mapped_column(String(255))

class Post(Base):
    __tablename__ = "posts"
    id: Mapped[int] = mapped_column(primary_key=True, index=True)
    title: Mapped[str] = mapped_column(String(255))
    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
```

Example FK test patterns:
```
def test_create_post():
    author = client.post("/authors/", json={"name": "Jane Austen"}).json()
    response = client.post("/posts/", json={"title": "First Post", "author_id": author["id"]})
    assert response.status_code == 201

def test_update_post():
    author = client.post("/authors/", json={"name": "Mark Twain"}).json()
    created = client.post("/posts/", json={"title": "Old Title", "author_id": author["id"]}).json()
    response = client.put(f"/posts/{created['id']}", json={"title": "New Title", "author_id": author["id"]})
    assert response.status_code == 200
```

CRITICAL:
- Use ONLY fields from the JSON spec — no created_at or extra fields
- Generate EXACTLY 6 tests per entity: create, list, get_by_id, not_found, update, delete
- No search, filter, or other extra tests
- test_list: assert len(response.json()) >= 1, NEVER assert == 1 (database is shared between tests)
- test_create for child entities: create parent FIRST, use parent's id
- No markdown fences in output
47
kipina-codebench/prompts/code.md
Normal file
@@ -0,0 +1,47 @@
You are a Python backend developer. Generate a FastAPI project with SQLAlchemy and SQLite.

Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these 4 files:

1. models.py — SQLAlchemy 2.0: DeclarativeBase, Mapped, mapped_column (NOT legacy declarative_base)
2. schemas.py — Pydantic v2: ConfigDict(from_attributes=True) (NOT class Config)
3. main.py — FastAPI CRUD endpoints for each entity
4. test_main.py — Pytest with TestClient, separate test.db, unique test data per test

Do NOT generate pyproject.toml — it is created separately with uv.

OUTPUT FORMAT — use these exact markers to separate files:

=== models.py ===
<python code>

=== schemas.py ===
<python code>

=== main.py ===
<python code>

=== test_main.py ===
<python code>

DOCUMENTATION — every file must have a one-line module docstring. Classes get a one-line docstring. Keep it zensical: say what it IS, not what it does. No filler.

NEVER USE DEPRECATED PATTERNS:
- ✗ declarative_base() → ✓ class Base(DeclarativeBase): pass
- ✗ Column(Type) → ✓ Mapped[type] = mapped_column(Type)
- ✗ class Config: orm_mode = True → ✓ model_config = ConfigDict(from_attributes=True)
- ✗ .dict() → ✓ .model_dump()
- ✗ Optional[str] → ✓ str | None
- ✗ session.query(Model).all() → ✓ session.execute(select(Model)).scalars().all()

RULES:
- Follow the REFERENCE IMPLEMENTATION patterns exactly
- SQLAlchemy 2.0: DeclarativeBase + Mapped + mapped_column (not Column())
- Python type unions: str | None (not Optional[str])
- Tests: unique descriptive data per test, NOT generic "test_title" strings
- Tests: PUT/update test data MUST include ALL required (non-nullable) fields, not just the field being updated
- Do NOT add filter/search endpoints — only standard CRUD (create, list, get, update, delete)
- CRITICAL: Use ONLY the fields listed in the JSON spec. NEVER add created_at, updated_at, or any field not in the spec
- If the spec happens to include timestamp fields: use server_default=func.now() (from sqlalchemy import func) and make them Optional in Create schema
- Absolute imports only (from models import ..., from schemas import ...)
- NO markdown fences inside file content — just raw code
- Only test endpoints that exist in main.py — no extra tests
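The `=== filename ===` output format that this prompt asks for could be split back into files roughly as follows. This is a sketch only: `split_files` and the marker regex are assumptions made for illustration, and the parser used by `benchmark.mjs` is not shown in this diff.

```python
import re

# Matches a marker line such as "=== models.py ===" or "=== src/lib.rs ===".
MARKER = re.compile(r"^=== (?P<name>\S+) ===[ \t]*$", re.MULTILINE)


def split_files(output: str) -> dict[str, str]:
    """Split model output into {filename: content} on '=== name ===' markers."""
    files = {}
    matches = list(MARKER.finditer(output))
    for i, match in enumerate(matches):
        # A file's content runs until the next marker, or to the end of output.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        body = output[match.end():end].strip("\n")
        files[match.group("name")] = body + "\n"
    return files


sample = "=== models.py ===\nclass Base: pass\n\n=== main.py ===\nprint('hi')\n"
files = split_files(sample)
```

Text that appears before the first marker (for example, a model's preamble) is ignored by this sketch, which may or may not be the desired behaviour.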
25
kipina-codebench/prompts/convert-go.md
Normal file
@@ -0,0 +1,25 @@
Convert the following Python FastAPI project to Go using Chi router and modernc.org/sqlite.

OUTPUT: Return ALL files with === markers:
=== go.mod ===
=== models.go ===
=== handlers.go ===
=== main.go ===
=== handlers_test.go ===

CONVERSION RULES:
- package main for all files
- Pydantic models → Go structs with json tags
- SQLAlchemy ORM → database/sql with raw SQL and RETURNING clause
- FastAPI routes → Chi router: r.Post("/path", handler(db))
- Handlers are closures: func handler(db *sql.DB) http.HandlerFunc
- Depends(get_db) → State passed via closure over *sql.DB
- HTTPException(404) → http.Error(w, "not found", http.StatusNotFound)
- POST returns http.StatusCreated (201), DELETE returns http.StatusNoContent (204)
- sql.ErrNoRows for not-found checks
- TestClient → httptest.NewServer + setupTestServer helper
- test.db → sql.Open("sqlite", ":memory:")
- Empty list: return []Entity{} not nil
- import _ "modernc.org/sqlite" (pure Go driver, no CGO)
- import "github.com/go-chi/chi/v5"
- No markdown fences in output — just raw code
31
kipina-codebench/prompts/deprecated-patterns.md
Normal file
@@ -0,0 +1,31 @@
DEPRECATED PATTERNS — do NOT generate these. Use the modern alternative.

SQLAlchemy:
✗ from sqlalchemy.ext.declarative import declarative_base → ✓ from sqlalchemy.orm import DeclarativeBase
✗ Base = declarative_base() → ✓ class Base(DeclarativeBase): pass
✗ Column(Integer, primary_key=True) → ✓ Mapped[int] = mapped_column(primary_key=True)
✗ Column(String(255)) → ✓ Mapped[str] = mapped_column(String(255))
✗ session.query(User).filter_by(name="x").all() → ✓ session.execute(select(User).filter_by(name="x")).scalars().all()
✗ session.query(User).get(5) → ✓ session.get(User, 5)
✗ MetaData(bind=engine) → ✓ metadata.create_all(engine)

Pydantic:
✗ class Config: orm_mode = True → ✓ model_config = ConfigDict(from_attributes=True)
✗ .dict() → ✓ .model_dump()
✗ .json() → ✓ .model_dump_json()
✗ parse_obj() → ✓ model_validate()
✗ @validator → ✓ @field_validator
✗ @root_validator → ✓ @model_validator
✗ Optional[str] (auto-None in v1) → ✓ str | None = None (explicit default in v2)
✗ ConstrainedInt → ✓ Annotated[int, Field(ge=0)]

FastAPI:
✗ status_code=201 → ✓ status_code=status.HTTP_201_CREATED (readable)
✗ Manual exception strings → ✓ HTTPException(status_code=404, detail="Not found")
✗ .dict() in handlers → ✓ .model_dump() (Pydantic v2)

Python:
✗ Optional[str] → ✓ str | None (PEP 604, Python 3.10+)
✗ List[str] → ✓ list[str] (PEP 585, Python 3.9+)
✗ Dict[str, int] → ✓ dict[str, int]
✗ Tuple[int, ...] → ✓ tuple[int, ...]
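A list like this could also be checked mechanically against generated code. The sketch below covers only a handful of the patterns, and the regexes, the `DEPRECATED` table and `find_deprecated` are all hypothetical; nothing in this diff indicates the benchmark performs such a scan.

```python
import re

# A few of the patterns listed above, as (regex, suggested replacement) pairs.
DEPRECATED = [
    (r"\bdeclarative_base\(", "class Base(DeclarativeBase): pass"),
    (r"\borm_mode\s*=", "model_config = ConfigDict(from_attributes=True)"),
    (r"\.dict\(\)", ".model_dump()"),
    (r"\bOptional\[", "X | None"),
    (r"\bsession\.query\(", "session.execute(select(...))"),
]


def find_deprecated(source: str) -> list[tuple[int, str]]:
    """Return (line_number, suggestion) for each deprecated pattern found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in DEPRECATED:
            if re.search(pattern, line):
                hits.append((lineno, suggestion))
    return hits


generated = (
    "Base = declarative_base()\n"
    "name: Optional[str] = None\n"
    "item.model_dump()\n"
)
hits = find_deprecated(generated)
```

A line-based regex scan is crude (it would also flag matches inside comments or strings), so it is better suited to scoring hints than to hard failures.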
1
kipina-codebench/prompts/fix.md
Normal file
@@ -0,0 +1 @@
You are a Python code fixer. Return ONLY the corrected Python file. No markdown fences, no explanations — just valid Python code.
36
kipina-codebench/prompts/golden-compact-py.md
Normal file
@@ -0,0 +1,36 @@
REFERENCE PATTERNS (follow exactly):

STACK: SQLAlchemy 2.0 + Pydantic v2 + FastAPI + SQLite

models.py:
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase): pass
Fields: Mapped[type] = mapped_column(SqlType, default=...)
Nullable: Mapped[str | None] = mapped_column(Text, default=None)
Status: Mapped[str] = mapped_column(String(20), default="pending")
FK: Mapped[int] = mapped_column(ForeignKey("table.id"))
End: Base.metadata.create_all(bind=engine)

schemas.py:
class EntityCreate(BaseModel): fields with defaults
class EntityResponse(EntityCreate):
    id: int
    model_config = ConfigDict(from_attributes=True)

main.py:
def get_db(): yield SessionLocal(); finally close
POST /{table}/ → 201, model_dump()
GET /{table}/ → list
GET /{table}/{id} → 404 if not found
PUT /{table}/{id} → model_dump(), setattr loop
DELETE /{table}/{id} → 204

test_main.py:
test.db + override_get_db + TestClient
Unique descriptive data per test ("Buy milk", "Fetchable task"...)
test_create → 201 + assert "id" in json
test_list → post first, get, assert len >= 1
test_get_by_id → post, get by id, assert id matches
test_not_found → get /99999 → 404
test_update → post, put with ALL required fields, assert 200
test_delete → post, delete 204, get again → 404
43
kipina-codebench/prompts/golden-compact-rs.md
Normal file
@@ -0,0 +1,43 @@
REFERENCE PATTERNS (follow exactly):

STACK: Axum 0.8 + SQLx + SQLite + Tokio + Serde

Cargo.toml:
edition = "2024"
deps: axum 0.8, tokio (full), serde (derive), serde_json, sqlx (sqlite, runtime-tokio), tower-http (cors)
dev: reqwest 0.13 (rustls)

src/models.rs:
#[derive(Debug, Serialize, Deserialize, FromRow)]
struct Entity { id: i64, field: String, optional: Option<String> }
struct CreateEntity { field: String, optional: Option<String> }
Status fields: String with default "pending"

src/handlers.rs:
async fn create(State(pool), Json(input)) -> (StatusCode, Json<Entity>)
POST → StatusCode::CREATED, sqlx::query("INSERT...").execute + query_as last_insert_rowid
GET list → query_as("SELECT * FROM table").fetch_all
GET by id → query_as.fetch_optional, return 404 if None
PUT → query("UPDATE...SET...WHERE id=?"), rows_affected == 0 → 404
DELETE → StatusCode::NO_CONTENT, rows_affected == 0 → 404

src/lib.rs:
pub fn app(pool: SqlitePool) -> Router
pub async fn init_db(pool: &SqlitePool) → CREATE TABLE IF NOT EXISTS
Routes: .route("/{table}", post(create).get(list))
        .route("/{table}/{id}", get(get_one).put(update).delete(delete_one))

src/main.rs:
SqlitePool::connect("sqlite:./app.db"), init_db, bind 0.0.0.0:3000

tests/api_test.rs:
Each test: SqlitePool::connect("sqlite::memory:"), init_db, app(pool)
Spawn on random port: TcpListener::bind("127.0.0.1:0"), axum::serve
reqwest::Client for HTTP calls
Unique descriptive data ("Buy milk", "Fetchable task"...)
test_create → 201 + assert id exists
test_list → post first, get, assert len >= 1
test_get_by_id → post, get, assert id matches
test_not_found → 404
test_update → post, put with ALL fields, assert 200
test_delete → post, delete 204, get → 404
19
kipina-codebench/prompts/spec-plain.md
Normal file
@@ -0,0 +1,19 @@
You design database schemas. Output ONLY the schema in this exact format, nothing else.

FORMAT (one entity per line):
project: project-name
entity EntityName (table_name): field1 type, field2 type, field3 type=default
entity ChildName (table_name): field1 type, parent_id int->ParentName, field2 type

TYPES: string, text, int, float, bool, date, datetime
RULES:
- id is automatic, do NOT include it
- FK fields end with _id and use -> to reference parent
- Parent entities BEFORE children
- Max 7 fields per entity, max 3 entities
- Status fields: string with =default (e.g. status string=draft)

EXAMPLE:
project: blog-api
entity Author (authors): name string, email string, bio text
entity Post (posts): title string, content text, author_id int->Author, published_at datetime, status string=draft
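The plain-text format defined in this prompt is simple enough to parse line by line. The following is a sketch under assumptions: `parse_plain_spec` and the shape of the returned dictionary are invented here for illustration (the dictionary loosely mirrors the JSON schema in `spec-simple.md`), and the real converter is not part of this diff.

```python
import re

# "entity Name (table_name): field list"
ENTITY = re.compile(r"^entity (\w+) \((\w+)\): (.+)$")


def parse_plain_spec(text: str) -> dict:
    """Parse the 'project:' / 'entity ...' format into a dict (hypothetical helper)."""
    spec = {"project_name": None, "entities": [], "relationships": []}
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith("project:"):
            spec["project_name"] = line.split(":", 1)[1].strip()
            continue
        match = ENTITY.match(line)
        if not match:
            continue  # skip anything that is not an entity line
        name, table, fields_part = match.groups()
        fields = []
        for raw in fields_part.split(","):
            field_name, type_part = raw.strip().split(" ", 1)
            type_part, _, default = type_part.partition("=")   # "string=draft"
            type_name, _, parent = type_part.partition("->")   # "int->Author"
            fields.append({"name": field_name, "type": type_name, "default": default or None})
            if parent:
                spec["relationships"].append({"from": name, "field": field_name, "to": parent})
        spec["entities"].append({"name": name, "table_name": table, "fields": fields})
    return spec


example = """\
project: blog-api
entity Author (authors): name string, email string, bio text
entity Post (posts): title string, content text, author_id int->Author, published_at datetime, status string=draft
"""
spec = parse_plain_spec(example)
```

Splitting fields on commas assumes that default values never contain a comma, which holds for the example in the prompt but is not guaranteed by the format itself.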
17
kipina-codebench/prompts/spec-simple.md
Normal file
@@ -0,0 +1,17 @@
You design database schemas. Output ONLY valid JSON, no explanations.

SCHEMA:
{"project_name":"name","entities":[{"name":"Entity","table_name":"entities","fields":[{"name":"field","type":"string","nullable":false,"default":null}]}],"relationships":[{"from":"Child","field":"parent_id","to":"Parent"}]}

FIELD TYPES: string, text, int, float, bool, date, datetime
- Status fields: type "string", default "draft" or "pending"
- id is automatic — do NOT include it
- FK fields: type "int", name ends with _id

RULES:
- Parent entities BEFORE children in array
- Every _id field needs a relationship entry
- Max 7 fields, max 3 entities
- English names only

EXAMPLE: Blog → Author: name(string), email(string) / Post: title(string), content(text), author_id(int)→Author, status(string,default="draft")
31
kipina-codebench/prompts/spec.md
Normal file
@@ -0,0 +1,31 @@
You are a software architect who designs database schemas for Python web applications.

THINK STEP BY STEP before outputting JSON:
1. What are the main ENTITIES (nouns) in this project?
2. What FIELDS does each entity need? (name, type, required?)
3. Which entities REFERENCE each other? (e.g. "a Book belongs to an Author" → Book has author_id)
4. Are there Date/DateTime fields? → add extra_imports

Then output ONLY valid JSON (no explanations before or after).

SCHEMA:
{"project_name":"short-name","description":"One sentence","entities":[{"name":"EntityName","table_name":"entity_names","fields":[{"name":"field_name","sa_type":"String(255)","py_type":"str","nullable":false,"default":null}]}],"relationships":[{"from":"ChildEntity","field":"parent_id","to":"ParentEntity","type":"many-to-one"}],"extra_imports":[]}

FIELD RULES:
- sa_type: String(N), Text, Integer, Date, DateTime, Boolean, Float
- py_type: str, int, float, bool, date, datetime — append " | None" if nullable
- Status fields: use String(20) with default value, NEVER Enum
- Every entity gets "id" automatically — do NOT add id or redundant ID fields
- Use snake_case for field names

RELATIONSHIP RULES:
- If entity A "belongs to" entity B → A has b_id field (Integer, nullable=false) + relationship entry
- EVERY _id field MUST have a matching relationship entry
- Parent entities must appear BEFORE children in the entities array
- If no relationships, set "relationships": []

AVOID: redundant ID fields, generic names, more than 7 fields or 3 entities, non-English entity/field names (ALWAYS English even if description is Finnish)

EXAMPLES (adapt, don't copy):
Todo app → Todo: title(str), description(Text|None), due_date(Date|None), status(String20="pending")
Blog → Author: name,email,bio(Text|None) / Post: title, content(Text), author_id→Author, published_at(DateTime|None), status(String20="draft")
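The field and relationship rules in this prompt lend themselves to a structural check on the returned JSON. The sketch below is an assumption about how such a check might look: `validate_spec` is a hypothetical helper, and whether the benchmark's `specOk` flag is computed this way is not shown in this diff.

```python
def validate_spec(spec: dict) -> list[str]:
    """Return a list of rule violations (an empty list means the spec passes)."""
    errors = []
    names = [entity["name"] for entity in spec["entities"]]
    if len(names) > 3:
        errors.append("more than 3 entities")
    related = {(rel["from"], rel["field"]) for rel in spec["relationships"]}
    for entity in spec["entities"]:
        if len(entity["fields"]) > 7:
            errors.append(f"{entity['name']}: more than 7 fields")
        for field in entity["fields"]:
            if field["name"] == "id":
                errors.append(f"{entity['name']}: id must not be listed")
            elif field["name"].endswith("_id") and (entity["name"], field["name"]) not in related:
                errors.append(f"{entity['name']}.{field['name']}: missing relationship entry")
    for rel in spec["relationships"]:
        # Parents must come before their children in the entities array.
        if rel["from"] in names and rel["to"] in names:
            if names.index(rel["to"]) > names.index(rel["from"]):
                errors.append(f"{rel['to']} must appear before {rel['from']}")
    return errors


good = {
    "entities": [
        {"name": "Author", "fields": [{"name": "name"}]},
        {"name": "Post", "fields": [{"name": "title"}, {"name": "author_id"}]},
    ],
    "relationships": [{"from": "Post", "field": "author_id", "to": "Author"}],
}
missing_rel = {
    "entities": [{"name": "Post", "fields": [{"name": "author_id"}]}],
    "relationships": [],
}
wrong_order = {
    "entities": [good["entities"][1], good["entities"][0]],
    "relationships": good["relationships"],
}
```

Returning a list of messages rather than raising on the first problem makes it possible to feed every violation back to the model in a single fix round.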
183
kipina-codebench/report-template.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
  th:hover { color: var(--text); }
  th.sorted-asc::after { content: ' ▲'; }
  th.sorted-desc::after { content: ' ▼'; }
  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
  tr:hover td { background: #1c2128; }
  .pass { color: var(--green); }
  .partial { color: var(--yellow); }
  .fail { color: var(--red); }
  .stars { letter-spacing: 1px; }
  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
  .bar-bg { background: var(--border); }
  .bar-fill { background: var(--green); }
  .bar-partial { background: var(--yellow); }
  .model-name { font-weight: 600; }
  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = /*DATA_PLACEHOLDER*/[];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if they are missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
|
||||
}).join('');
|
||||
return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
|
||||
}).sort((a,b) => b.avg - a.avg);
|
||||
sumBody.innerHTML = modelRows.map(r => r.html).join('');
|
||||
|
||||
// Results table
|
||||
const resHead = document.querySelector('#results-table thead');
|
||||
const resBody = document.querySelector('#results-table tbody');
|
||||
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
|
||||
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
|
||||
|
||||
let sortCol = 9, sortAsc = false;
|
||||
function renderResults() {
|
||||
const sorted = [...DATA].sort((a,b) => {
|
||||
const vals = [
|
||||
[a.model, b.model],
|
||||
[a.scenario, b.scenario],
|
||||
[a.specEntities, b.specEntities],
|
||||
[a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
|
||||
[a.fixRounds, b.fixRounds],
|
||||
[a.promptTokensEst, b.promptTokensEst],
|
||||
[a.totalTokens, b.totalTokens],
|
||||
[a.totalDurationMs, b.totalDurationMs],
|
||||
[a.avgTokPerSec, b.avgTokPerSec],
|
||||
[a.score, b.score],
|
||||
][sortCol];
|
||||
const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
|
||||
return sortAsc ? cmp : -cmp;
|
||||
});
|
||||
resBody.innerHTML = sorted.map(r => {
|
||||
const c = cls(r);
|
||||
return `<tr>
|
||||
<td class="model-name">${r.model}</td>
|
||||
<td>${r.scenario}</td>
|
||||
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
|
||||
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
|
||||
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
|
||||
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
|
||||
<td>${r.avgTokPerSec.toFixed(0)}</td>
|
||||
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
|
||||
</tr>`;
|
||||
}).join('');
|
||||
document.querySelectorAll('#results-table th').forEach((th,i) => {
|
||||
th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
|
||||
});
|
||||
}
|
||||
document.querySelector('#results-table thead').addEventListener('click', e => {
|
||||
const col = parseInt(e.target.dataset.col);
|
||||
if (isNaN(col)) return;
|
||||
if (sortCol === col) sortAsc = !sortAsc;
|
||||
else { sortCol = col; sortAsc = false; }
|
||||
renderResults();
|
||||
});
|
||||
renderResults();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
183
kipina-codebench/results/2026-04-14T06-49.html
Normal file
File diff suppressed because one or more lines are too long
422
kipina-codebench/results/2026-04-14T06-49.json
Normal file
@@ -0,0 +1,422 @@
[
  {
    "model": "qwen3.5:9b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 3,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 65901,
    "totalTokens": 5056,
    "avgTokPerSec": 82.99139473832963,
    "promptChars": 12334,
    "promptTokensEst": 3084,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3.5:9b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 1,
    "fixRounds": 2,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 74087,
    "totalTokens": 5645,
    "avgTokPerSec": 83.57073831360164,
    "promptChars": 10757,
    "promptTokensEst": 2689,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null
  },
  {
    "model": "qwen3.5:9b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 49830,
    "totalTokens": 3803,
    "avgTokPerSec": 83.26266260763309,
    "promptChars": 10826,
    "promptTokensEst": 2707,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "gemma4:e4b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 57032,
    "totalTokens": 4924,
    "avgTokPerSec": 106.02334905805122,
    "promptChars": 11313,
    "promptTokensEst": 2828,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "gemma4:e4b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 5,
    "testsFailed": 2,
    "totalDurationMs": 54307,
    "totalTokens": 5060,
    "avgTokPerSec": 106.89447491163497,
    "promptChars": 11225,
    "promptTokensEst": 2806,
    "score": 83,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "gemma4:e4b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 2,
    "testsFailed": 9,
    "totalDurationMs": 57080,
    "totalTokens": 5310,
    "avgTokPerSec": 106.64914988130955,
    "promptChars": 11791,
    "promptTokensEst": 2948,
    "score": 51,
    "stars": "★★★☆☆",
    "error": null
  },
  {
    "model": "qwen2.5-coder:3b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 3,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 22377,
    "totalTokens": 3534,
    "avgTokPerSec": 201.24475679283708,
    "promptChars": 11479,
    "promptTokensEst": 2870,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen2.5-coder:3b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 8,
    "fixRounds": 2,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 44520,
    "totalTokens": 7495,
    "avgTokPerSec": 201.87149050701015,
    "promptChars": 11886,
    "promptTokensEst": 2972,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null
  },
  {
    "model": "qwen2.5-coder:3b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 20136,
    "totalTokens": 3338,
    "avgTokPerSec": 200.86152095722105,
    "promptChars": 11228,
    "promptTokensEst": 2807,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen2.5-coder:7b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui"
  },
  {
    "model": "qwen2.5-coder:7b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 20012,
    "totalTokens": 2119,
    "avgTokPerSec": 122.7557304112134,
    "promptChars": 10342,
    "promptTokensEst": 2586,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen2.5-coder:7b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 26133,
    "totalTokens": 2715,
    "avgTokPerSec": 121.94987205993503,
    "promptChars": 11193,
    "promptTokensEst": 2798,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 44757,
    "totalTokens": 2156,
    "avgTokPerSec": 60.77636586631207,
    "promptChars": 9635,
    "promptTokensEst": 2409,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 41166,
    "totalTokens": 2282,
    "avgTokPerSec": 61.14821289733007,
    "promptChars": 9575,
    "promptTokensEst": 2394,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 66478,
    "totalTokens": 3681,
    "avgTokPerSec": 60.493817783668725,
    "promptChars": 10500,
    "promptTokensEst": 2625,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 29801,
    "totalTokens": 2249,
    "avgTokPerSec": 98.5661742189331,
    "promptChars": 9615,
    "promptTokensEst": 2404,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 22974,
    "totalTokens": 2050,
    "avgTokPerSec": 101.2398768597589,
    "promptChars": 9273,
    "promptTokensEst": 2318,
    "score": 85,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 39335,
    "totalTokens": 3537,
    "avgTokPerSec": 100.10984073540648,
    "promptChars": 10525,
    "promptTokensEst": 2631,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:4b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 58668,
    "totalTokens": 7134,
    "avgTokPerSec": 141.76822189196028,
    "promptChars": 15202,
    "promptTokensEst": 3801,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:4b",
    "scenario": "users",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui"
  },
  {
    "model": "qwen3:4b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui"
  }
]
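Reviewer note: the stored `score` fields in the records above appear to follow the `calcScore` rule in the HTML report template elsewhere in this diff (10 points for a valid spec, 10 for a run without a fatal error, up to 60 for the test pass ratio, and 20 minus 10 per fix round). The following is a minimal sketch of that check, not part of the commit. The `sample` and `mismatches` names are hypothetical, and the four records are abbreviated by hand from the JSON above.

```javascript
// Scoring rule as it appears in the report template's calcScore().
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;            // run failed before any test ran
  let s = 0;
  if (r.specOk) s += 10;                                   // spec parsed
  if (!r.error || r.testsTotal > 0) s += 10;               // pipeline reached the tests
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);          // penalty per fix round
  return Math.min(100, s);
}

// Hypothetical sample: four records abbreviated from the JSON file above.
const sample = [
  { model: 'qwen3.5:9b', scenario: 'users', specOk: true,  fixRounds: 2, testsTotal: 1,  testsPassed: 0,  score: 20,  error: null },
  { model: 'gemma4:e4b', scenario: 'users', specOk: true,  fixRounds: 0, testsTotal: 7,  testsPassed: 5,  score: 83,  error: null },
  { model: 'qwen3:14b',  scenario: 'blog',  specOk: true,  fixRounds: 0, testsTotal: 12, testsPassed: 12, score: 100, error: null },
  { model: 'qwen3:4b',   scenario: 'users', specOk: false, fixRounds: 0, testsTotal: 0,  testsPassed: 0,  score: 0,   error: 'JSON-speksi epäonnistui' },
];

// Records whose stored score differs from the recomputed one.
const mismatches = sample.filter(r => calcScore(r) !== r.score);
console.log(mismatches.length === 0 ? 'all stored scores match' : mismatches);
```

Note that a run with one failing test and no fix rounds still scores 40, which is why several models that passed nothing show two stars.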
183
kipina-codebench/results/2026-04-14T07-13.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
  th:hover { color: var(--text); }
  th.sorted-asc::after { content: ' ▲'; }
  th.sorted-desc::after { content: ' ▼'; }
  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
  tr:hover td { background: #1c2128; }
  .pass { color: var(--green); }
  .partial { color: var(--yellow); }
  .fail { color: var(--red); }
  .stars { letter-spacing: 1px; }
  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
  .bar-bg { background: var(--border); }
  .bar-fill { background: var(--green); }
  .bar-partial { background: var(--yellow); }
  .model-name { font-weight: 600; }
  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":186642,"totalTokens":10237,"avgTokPerSec":59.06411550065281,"promptChars":10576,"promptTokensEst":2644,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":121848,"totalTokens":6735,"avgTokPerSec":59.85231850668119,"promptChars":9684,"promptTokensEst":2421,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":11,"testsPassed":9,"testsFailed":2,"totalDurationMs":83491,"totalTokens":4677,"avgTokPerSec":60.222832434869694,"promptChars":10423,"promptTokensEst":2606,"score":89,"stars":"★★★★☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":56288,"totalTokens":5235,"avgTokPerSec":99.60027546406452,"promptChars":9307,"promptTokensEst":2327,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":5,"testsFailed":1,"totalDurationMs":59639,"totalTokens":5526,"avgTokPerSec":99.6742208632186,"promptChars":9158,"promptTokensEst":2290,"score":90,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":11,"testsPassed":10,"testsFailed":1,"totalDurationMs":131793,"totalTokens":11779,"avgTokPerSec":97.17878362853351,"promptChars":10390,"promptTokensEst":2598,"score":95,"stars":"★★★★★","error":null}];

// Map a 0-100 score to a five-star string
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
// Score: 10 for a valid spec, 10 for reaching the tests, up to 60 for the pass ratio, 20 minus 10 per fix round
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute the score if it is missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  // Rough prompt-size estimate: about 4 characters per token
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
// CSS class for a run: every test passed, some passed, or none passed / run failed
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
// Inline progress bar (w pixels wide) followed by "passed/total"
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
// Highest mean score across a model's runs
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
// Highest mean tok/s across a model's runs (failed runs count as 0 tok/s)
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

// Default sort: score column, descending
let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    // One [a, b] value pair per column, in the same order as resCols
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
// Clicking a header sorts by that column; clicking it again flips the direction
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
122
kipina-codebench/results/2026-04-14T07-13.json
Normal file
@@ -0,0 +1,122 @@
[
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 186642,
    "totalTokens": 10237,
    "avgTokPerSec": 59.06411550065281,
    "promptChars": 10576,
    "promptTokensEst": 2644,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 121848,
    "totalTokens": 6735,
    "avgTokPerSec": 59.85231850668119,
    "promptChars": 9684,
    "promptTokensEst": 2421,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 9,
    "testsFailed": 2,
    "totalDurationMs": 83491,
    "totalTokens": 4677,
    "avgTokPerSec": 60.222832434869694,
    "promptChars": 10423,
    "promptTokensEst": 2606,
    "score": 89,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 56288,
    "totalTokens": 5235,
    "avgTokPerSec": 99.60027546406452,
    "promptChars": 9307,
    "promptTokensEst": 2327,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 5,
    "testsFailed": 1,
    "totalDurationMs": 59639,
    "totalTokens": 5526,
    "avgTokPerSec": 99.6742208632186,
    "promptChars": 9158,
    "promptTokensEst": 2290,
    "score": 90,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 10,
    "testsFailed": 1,
    "totalDurationMs": 131793,
    "totalTokens": 11779,
    "avgTokPerSec": 97.17878362853351,
    "promptChars": 10390,
    "promptTokensEst": 2598,
    "score": 95,
    "stars": "★★★★★",
    "error": null
  }
]
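Reviewer note: the per-model summary row in the HTML report is derived from records like the ones above by summing test counts and averaging scores per model. The sketch below shows that aggregation in a single pass, assuming only the `model`, `testsTotal`, `testsPassed` and `score` fields. The `summarize` helper and the `runs` array are hypothetical names, not part of the commit, and the records are abbreviated by hand from the JSON above.

```javascript
// Hypothetical input: the six runs above, reduced to the fields used here.
const runs = [
  { model: 'qwen3:14b', testsTotal: 1,  testsPassed: 0,  score: 40 },
  { model: 'qwen3:14b', testsTotal: 1,  testsPassed: 0,  score: 40 },
  { model: 'qwen3:14b', testsTotal: 11, testsPassed: 9,  score: 89 },
  { model: 'qwen3:8b',  testsTotal: 6,  testsPassed: 6,  score: 100 },
  { model: 'qwen3:8b',  testsTotal: 6,  testsPassed: 5,  score: 90 },
  { model: 'qwen3:8b',  testsTotal: 11, testsPassed: 10, score: 95 },
];

// Group runs by model, then report summed test counts and the rounded mean score,
// sorted by mean score in descending order (same ordering as the summary table).
function summarize(records) {
  const byModel = new Map();
  for (const r of records) {
    const acc = byModel.get(r.model) ?? { model: r.model, runs: 0, passed: 0, total: 0, scoreSum: 0 };
    acc.runs += 1;
    acc.passed += r.testsPassed;
    acc.total += r.testsTotal;
    acc.scoreSum += r.score;
    byModel.set(r.model, acc);
  }
  return [...byModel.values()]
    .map(a => ({ model: a.model, passed: a.passed, total: a.total, avg: Math.round(a.scoreSum / a.runs) }))
    .sort((a, b) => b.avg - a.avg);
}

const summary = summarize(runs);
console.log(summary);
```

Grouping with a `Map` visits each record once, whereas the report template filters the full data set once per model; for result files of this size either approach is fine.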
183
kipina-codebench/results/2026-04-14T07-18.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":66903,"totalTokens":5454,"avgTokPerSec":86.45918994499432,"promptChars":9985,"promptTokensEst":2496,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":87618,"totalTokens":7150,"avgTokPerSec":87.21782190501095,"promptChars":9922,"promptTokensEst":2481,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":9,"testsPassed":5,"testsFailed":4,"totalDurationMs":78398,"totalTokens":6427,"avgTokPerSec":85.52353711143463,"promptChars":10737,"promptTokensEst":2684,"score":73,"stars":"★★★★☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":8,"testsPassed":7,"testsFailed":1,"totalDurationMs":82750,"totalTokens":10054,"avgTokPerSec":139.90690936146032,"promptChars":9360,"promptTokensEst":2340,"score":93,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":32233,"totalTokens":4404,"avgTokPerSec":143.4997404058814,"promptChars":9310,"promptTokensEst":2328,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":88563,"totalTokens":11575,"avgTokPerSec":141.54675017528362,"promptChars":10567,"promptTokensEst":2642,"score":40,"stars":"★★☆☆☆","error":null}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
<div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
<div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
<div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
<div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
<div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
<div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
<td class="model-name">${r.model}</td>
<td>${r.scenario}</td>
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
<td>${r.avgTokPerSec.toFixed(0)}</td>
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
</tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
122
kipina-codebench/results/2026-04-14T07-18.json
Normal file
@@ -0,0 +1,122 @@
[
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 66903,
    "totalTokens": 5454,
    "avgTokPerSec": 86.45918994499432,
    "promptChars": 9985,
    "promptTokensEst": 2496,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 87618,
    "totalTokens": 7150,
    "avgTokPerSec": 87.21782190501095,
    "promptChars": 9922,
    "promptTokensEst": 2481,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 5,
    "testsFailed": 4,
    "totalDurationMs": 78398,
    "totalTokens": 6427,
    "avgTokPerSec": 85.52353711143463,
    "promptChars": 10737,
    "promptTokensEst": 2684,
    "score": 73,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 7,
    "testsFailed": 1,
    "totalDurationMs": 82750,
    "totalTokens": 10054,
    "avgTokPerSec": 139.90690936146032,
    "promptChars": 9360,
    "promptTokensEst": 2340,
    "score": 93,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 32233,
    "totalTokens": 4404,
    "avgTokPerSec": 143.4997404058814,
    "promptChars": 9310,
    "promptTokensEst": 2328,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 88563,
    "totalTokens": 11575,
    "avgTokPerSec": 141.54675017528362,
    "promptChars": 10567,
    "promptTokensEst": 2642,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null
  }
]
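Review note: the stored `score` values in `results/2026-04-14T07-18.json` appear to follow the `calcScore` rule embedded in the matching HTML report (10 points for a valid spec, 10 for a run that produced tests, up to 60 for the pass ratio, up to 20 for needing no fix rounds). The sketch below re-checks that on three rows from the file. `findScoreMismatches` and `sample` are hypothetical names introduced for illustration only and are not part of the repository.

```javascript
// Scoring rule as it appears in the report's embedded script.
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}

// Hypothetical helper: return the rows whose stored score differs from the rule.
function findScoreMismatches(rows) {
  return rows.filter(r => r.score !== calcScore(r));
}

// Three rows from the 07-18 results, trimmed to the fields the rule reads.
const sample = [
  { model: 'qwen3:14b', scenario: 'todo', specOk: true, fixRounds: 0, testsTotal: 1, testsPassed: 0, score: 40, error: null },
  { model: 'qwen3:14b', scenario: 'blog', specOk: true, fixRounds: 0, testsTotal: 9, testsPassed: 5, score: 73, error: null },
  { model: 'qwen3:8b', scenario: 'todo', specOk: true, fixRounds: 0, testsTotal: 8, testsPassed: 7, score: 93, error: null },
];

// Prints the number of rows that disagree with the rule.
console.log(findScoreMismatches(sample).length);
```

Because every run starts with 40 points when the spec is valid and no fix rounds are used, a run with a single failing test still scores 40; this may be worth keeping in mind when comparing models whose runs produced only one test.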
183
kipina-codebench/results/2026-04-14T07-55.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":9,"testsPassed":6,"testsFailed":3,"totalDurationMs":50350,"totalTokens":2797,"avgTokPerSec":60.919860198859574,"promptChars":9858,"promptTokensEst":2465,"score":80,"stars":"★★★★☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":8,"testsPassed":6,"testsFailed":2,"totalDurationMs":46557,"totalTokens":2584,"avgTokPerSec":60.88834523948,"promptChars":9544,"promptTokensEst":2386,"score":85,"stars":"★★★★☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":15,"testsPassed":2,"testsFailed":13,"totalDurationMs":90761,"totalTokens":4979,"avgTokPerSec":60.19247492391319,"promptChars":10521,"promptTokensEst":2630,"score":48,"stars":"★★☆☆☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":27360,"totalTokens":2466,"avgTokPerSec":100.9922018173994,"promptChars":9767,"promptTokensEst":2442,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat"},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":7,"testsPassed":7,"testsFailed":0,"totalDurationMs":20920,"totalTokens":1876,"avgTokPerSec":101.60760023892685,"promptChars":8782,"promptTokensEst":2196,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":10,"testsPassed":9,"testsFailed":1,"totalDurationMs":35766,"totalTokens":3217,"avgTokPerSec":100.40066102398943,"promptChars":10334,"promptTokensEst":2584,"score":94,"stars":"★★★★★","error":null}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
<div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
<div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
<div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
<div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
<div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
<div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
<td class="model-name">${r.model}</td>
<td>${r.scenario}</td>
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
<td>${r.avgTokPerSec.toFixed(0)}</td>
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
</tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
122
kipina-codebench/results/2026-04-14T07-55.json
Normal file
@@ -0,0 +1,122 @@
[
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 6,
    "testsFailed": 3,
    "totalDurationMs": 50350,
    "totalTokens": 2797,
    "avgTokPerSec": 60.919860198859574,
    "promptChars": 9858,
    "promptTokensEst": 2465,
    "score": 80,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 46557,
    "totalTokens": 2584,
    "avgTokPerSec": 60.88834523948,
    "promptChars": 9544,
    "promptTokensEst": 2386,
    "score": 85,
    "stars": "★★★★☆",
    "error": null
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 15,
    "testsPassed": 2,
    "testsFailed": 13,
    "totalDurationMs": 90761,
    "totalTokens": 4979,
    "avgTokPerSec": 60.19247492391319,
    "promptChars": 10521,
    "promptTokensEst": 2630,
    "score": 48,
    "stars": "★★☆☆☆",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 27360,
    "totalTokens": 2466,
    "avgTokPerSec": 100.9922018173994,
    "promptChars": 9767,
    "promptTokensEst": 2442,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat"
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 20920,
    "totalTokens": 1876,
    "avgTokPerSec": 101.60760023892685,
    "promptChars": 8782,
    "promptTokensEst": 2196,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 9,
    "testsFailed": 1,
    "totalDurationMs": 35766,
    "totalTokens": 3217,
    "avgTokPerSec": 100.40066102398943,
    "promptChars": 10334,
    "promptTokensEst": 2584,
    "score": 94,
    "stars": "★★★★★",
    "error": null
  }
]
183
kipina-codebench/results/2026-04-14T08-05.html
Normal file
File diff suppressed because one or more lines are too long
947
kipina-codebench/results/2026-04-14T08-05.json
Normal file
@@ -0,0 +1,947 @@
|
||||
[
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 1,
|
||||
"testsFailed": 5,
|
||||
"totalDurationMs": 30801,
|
||||
"totalTokens": 2333,
|
||||
"avgTokPerSec": 122.77922150989748,
|
||||
"promptChars": 10015,
|
||||
"promptTokensEst": 2504,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 25495,
|
||||
"totalTokens": 2714,
|
||||
"avgTokPerSec": 122.70970007652487,
|
||||
"promptChars": 9891,
|
||||
"promptTokensEst": 2473,
|
||||
"score": 91,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 10,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 37153,
|
||||
"totalTokens": 3979,
|
||||
"avgTokPerSec": 121.9183958236036,
|
||||
"promptChars": 11158,
|
||||
"promptTokensEst": 2790,
|
||||
"score": 95,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 43456,
|
||||
"totalTokens": 2411,
|
||||
"avgTokPerSec": 60.89226084568145,
|
||||
"promptChars": 9831,
|
||||
"promptTokensEst": 2458,
|
||||
"score": 91,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 8,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 40376,
|
||||
"totalTokens": 2237,
|
||||
"avgTokPerSec": 61.028627032662456,
|
||||
"promptChars": 9343,
|
||||
"promptTokensEst": 2336,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 12,
|
||||
"testsPassed": 2,
|
||||
"testsFailed": 10,
|
||||
"totalDurationMs": 68620,
|
||||
"totalTokens": 3796,
|
||||
"avgTokPerSec": 60.47793268944476,
|
||||
"promptChars": 10497,
|
||||
"promptTokensEst": 2624,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 25235,
|
||||
"totalTokens": 2269,
|
||||
"avgTokPerSec": 101.24212769079884,
|
||||
"promptChars": 9294,
|
||||
"promptTokensEst": 2324,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 21720,
|
||||
"totalTokens": 1942,
|
||||
"avgTokPerSec": 101.65074583709965,
|
||||
"promptChars": 9020,
|
||||
"promptTokensEst": 2255,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 10,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 39006,
|
||||
"totalTokens": 3509,
|
||||
"avgTokPerSec": 100.43593706181406,
|
||||
"promptChars": 10372,
|
||||
"promptTokensEst": 2593,
|
||||
"score": 95,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 21989,
|
||||
"totalTokens": 2339,
|
||||
"avgTokPerSec": 122.8454095677367,
|
||||
"promptChars": 10052,
|
||||
"promptTokensEst": 2513,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 23997,
|
||||
"totalTokens": 2551,
|
||||
"avgTokPerSec": 122.23722733560855,
|
||||
"promptChars": 9973,
|
||||
"promptTokensEst": 2493,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 8,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 30169,
|
||||
"totalTokens": 3249,
|
||||
"avgTokPerSec": 123.04696524796096,
|
||||
"promptChars": 11097,
|
||||
"promptTokensEst": 2774,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 3,
|
||||
"totalDurationMs": 47091,
|
||||
"totalTokens": 2602,
|
||||
"avgTokPerSec": 60.962687726457375,
|
||||
"promptChars": 9633,
|
||||
"promptTokensEst": 2408,
|
||||
"score": 80,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 41747,
|
||||
"totalTokens": 2313,
|
||||
"avgTokPerSec": 60.949025583617605,
|
||||
"promptChars": 9373,
|
||||
"promptTokensEst": 2343,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 12,
|
||||
"testsPassed": 2,
|
||||
"testsFailed": 10,
|
||||
"totalDurationMs": 66888,
|
||||
"totalTokens": 3699,
|
||||
"avgTokPerSec": 60.49540514685331,
|
||||
"promptChars": 10323,
|
||||
"promptTokensEst": 2581,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 27036,
|
||||
"totalTokens": 2434,
|
||||
"avgTokPerSec": 101.01399069228444,
|
||||
"promptChars": 9513,
|
||||
"promptTokensEst": 2378,
|
||||
"score": 93,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 20927,
|
||||
"totalTokens": 1872,
|
||||
"avgTokPerSec": 101.45096098956486,
|
||||
"promptChars": 8881,
|
||||
"promptTokensEst": 2220,
|
||||
"score": 91,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": false,
|
||||
"specEntities": 0,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 0,
|
||||
"totalTokens": 0,
|
||||
"avgTokPerSec": 0,
|
||||
"promptChars": 0,
|
||||
"promptTokensEst": 0,
|
||||
"score": 0,
|
||||
"stars": "",
|
||||
"error": "JSON-speksi epäonnistui",
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 26919,
|
||||
"totalTokens": 2889,
|
||||
"avgTokPerSec": 123.63666629145064,
|
||||
"promptChars": 10162,
|
||||
"promptTokensEst": 2541,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 8,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 27592,
|
||||
"totalTokens": 2946,
|
||||
"avgTokPerSec": 122.33273400152825,
|
||||
"promptChars": 9469,
|
||||
"promptTokensEst": 2367,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 11,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 35734,
|
||||
"totalTokens": 3827,
|
||||
"avgTokPerSec": 122.65156559717951,
|
||||
"promptChars": 11086,
|
||||
"promptTokensEst": 2772,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 3,
|
||||
"totalDurationMs": 50372,
|
||||
"totalTokens": 2795,
|
||||
"avgTokPerSec": 60.91611850918806,
|
||||
"promptChars": 9758,
|
||||
"promptTokensEst": 2440,
|
||||
"score": 80,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 1,
|
||||
"testsFailed": 5,
|
||||
"totalDurationMs": 38716,
|
||||
"totalTokens": 2144,
|
||||
"avgTokPerSec": 61.0412890406478,
|
||||
"promptChars": 9415,
|
||||
"promptTokensEst": 2354,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 14,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 7,
|
||||
"totalDurationMs": 74882,
|
||||
"totalTokens": 4130,
|
||||
"avgTokPerSec": 60.32640855026445,
|
||||
"promptChars": 10506,
|
||||
"promptTokensEst": 2627,
|
||||
"score": 70,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 3,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 35913,
|
||||
"totalTokens": 3218,
|
||||
"avgTokPerSec": 100.38516205100154,
|
||||
"promptChars": 11338,
|
||||
"promptTokensEst": 2835,
|
||||
"score": 0,
|
||||
"stars": "☆☆☆☆☆",
|
||||
"error": "Testit kaatuivat",
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 20974,
|
||||
"totalTokens": 1880,
|
||||
"avgTokPerSec": 101.52450928280543,
|
||||
"promptChars": 8803,
|
||||
"promptTokensEst": 2201,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 9,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 36005,
|
||||
"totalTokens": 3243,
|
||||
"avgTokPerSec": 100.44301406462307,
|
||||
"promptChars": 10414,
|
||||
"promptTokensEst": 2604,
|
||||
"score": 89,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 3
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 1,
|
||||
"testsFailed": 6,
|
||||
"totalDurationMs": 23071,
|
||||
"totalTokens": 2469,
|
||||
"avgTokPerSec": 124.09643322620661,
|
||||
"promptChars": 9960,
|
||||
"promptTokensEst": 2490,
|
||||
"score": 49,
|
||||
"stars": "★★☆☆☆",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 2,
|
||||
"testsFailed": 6,
|
||||
"totalDurationMs": 27062,
|
||||
"totalTokens": 2907,
|
||||
"avgTokPerSec": 123.35530975346687,
|
||||
"promptChars": 9558,
|
||||
"promptTokensEst": 2390,
|
||||
"score": 55,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 9,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 29395,
|
||||
"totalTokens": 3156,
|
||||
"avgTokPerSec": 123.22575073561812,
|
||||
"promptChars": 10574,
|
||||
"promptTokensEst": 2644,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 39590,
|
||||
"totalTokens": 2198,
|
||||
"avgTokPerSec": 61.051945510465806,
|
||||
"promptChars": 9664,
|
||||
"promptTokensEst": 2416,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 1,
|
||||
"testsFailed": 5,
|
||||
"totalDurationMs": 36950,
|
||||
"totalTokens": 2042,
|
||||
"avgTokPerSec": 61.01436784429489,
|
||||
"promptChars": 9225,
|
||||
"promptTokensEst": 2306,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 14,
|
||||
"testsPassed": 2,
|
||||
"testsFailed": 12,
|
||||
"totalDurationMs": 80600,
|
||||
"totalTokens": 4437,
|
||||
"avgTokPerSec": 60.29371170543078,
|
||||
"promptChars": 10688,
|
||||
"promptTokensEst": 2672,
|
||||
"score": 49,
|
||||
"stars": "★★☆☆☆",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 29125,
|
||||
"totalTokens": 2619,
|
||||
"avgTokPerSec": 100.90587777586212,
|
||||
"promptChars": 9899,
|
||||
"promptTokensEst": 2475,
|
||||
"score": 0,
|
||||
"stars": "☆☆☆☆☆",
|
||||
"error": "Testit kaatuivat",
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 8,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 21847,
|
||||
"totalTokens": 1957,
|
||||
"avgTokPerSec": 101.44111070734304,
|
||||
"promptChars": 8946,
|
||||
"promptTokensEst": 2237,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": false,
|
||||
"specEntities": 0,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 0,
|
||||
"totalTokens": 0,
|
||||
"avgTokPerSec": 0,
|
||||
"promptChars": 0,
|
||||
"promptTokensEst": 0,
|
||||
"score": 0,
|
||||
"stars": "",
|
||||
"error": "JSON-speksi epäonnistui",
|
||||
"round": 4
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 1,
|
||||
"testsFailed": 5,
|
||||
"totalDurationMs": 21127,
|
||||
"totalTokens": 2245,
|
||||
"avgTokPerSec": 124.22714049663371,
|
||||
"promptChars": 9972,
|
||||
"promptTokensEst": 2493,
|
||||
"score": 50,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 30281,
|
||||
"totalTokens": 3079,
|
||||
"avgTokPerSec": 123.00254714651271,
|
||||
"promptChars": 9562,
|
||||
"promptTokensEst": 2391,
|
||||
"score": 87,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 12,
|
||||
"testsPassed": 12,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 39630,
|
||||
"totalTokens": 4274,
|
||||
"avgTokPerSec": 123.08303937451802,
|
||||
"promptChars": 11119,
|
||||
"promptTokensEst": 2780,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 38032,
|
||||
"totalTokens": 2104,
|
||||
"avgTokPerSec": 61.05445464163662,
|
||||
"promptChars": 9455,
|
||||
"promptTokensEst": 2364,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 39620,
|
||||
"totalTokens": 2193,
|
||||
"avgTokPerSec": 61.04565233675101,
|
||||
"promptChars": 9481,
|
||||
"promptTokensEst": 2370,
|
||||
"score": 0,
|
||||
"stars": "☆☆☆☆☆",
|
||||
"error": "Testit kaatuivat",
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 63579,
|
||||
"totalTokens": 3520,
|
||||
"avgTokPerSec": 60.51513453009977,
|
||||
"promptChars": 10493,
|
||||
"promptTokensEst": 2623,
|
||||
"score": 87,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 3,
|
||||
"totalDurationMs": 30845,
|
||||
"totalTokens": 2777,
|
||||
"avgTokPerSec": 100.79046137130972,
|
||||
"promptChars": 9507,
|
||||
"promptTokensEst": 2377,
|
||||
"score": 80,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 21413,
|
||||
"totalTokens": 1914,
|
||||
"avgTokPerSec": 101.25525436264132,
|
||||
"promptChars": 8804,
|
||||
"promptTokensEst": 2201,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 5
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": false,
|
||||
"specEntities": 0,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 0,
|
||||
"totalTokens": 0,
|
||||
"avgTokPerSec": 0,
|
||||
"promptChars": 0,
|
||||
"promptTokensEst": 0,
|
||||
"score": 0,
|
||||
"stars": "",
|
||||
"error": "JSON-speksi epäonnistui",
|
||||
"round": 5
|
||||
}
|
||||
]
|
||||
183
kipina-codebench/results/2026-04-14T08-18.html
Normal file
File diff suppressed because one or more lines are too long
947
kipina-codebench/results/2026-04-14T08-18.json
Normal file
@@ -0,0 +1,947 @@
|
||||
[
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 3,
|
||||
"totalDurationMs": 33892,
|
||||
"totalTokens": 2675,
|
||||
"avgTokPerSec": 88.07409036121237,
|
||||
"promptChars": 9688,
|
||||
"promptTokensEst": 2422,
|
||||
"score": 80,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 30647,
|
||||
"totalTokens": 2549,
|
||||
"avgTokPerSec": 88.4488185974085,
|
||||
"promptChars": 9594,
|
||||
"promptTokensEst": 2399,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 13,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 7,
|
||||
"totalDurationMs": 44371,
|
||||
"totalTokens": 3678,
|
||||
"avgTokPerSec": 88.172616246191,
|
||||
"promptChars": 10432,
|
||||
"promptTokensEst": 2608,
|
||||
"score": 68,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 18385,
|
||||
"totalTokens": 2375,
|
||||
"avgTokPerSec": 147.62230806597154,
|
||||
"promptChars": 9478,
|
||||
"promptTokensEst": 2370,
|
||||
"score": 91,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 13968,
|
||||
"totalTokens": 1904,
|
||||
"avgTokPerSec": 148.3084817167518,
|
||||
"promptChars": 8837,
|
||||
"promptTokensEst": 2209,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 25642,
|
||||
"totalTokens": 3476,
|
||||
"avgTokPerSec": 146.49556892944076,
|
||||
"promptChars": 10734,
|
||||
"promptTokensEst": 2684,
|
||||
"score": 0,
|
||||
"stars": "☆☆☆☆☆",
|
||||
"error": "Testit kaatuivat",
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 19982,
|
||||
"totalTokens": 2937,
|
||||
"avgTokPerSec": 191.2786317674431,
|
||||
"promptChars": 10281,
|
||||
"promptTokensEst": 2570,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 17114,
|
||||
"totalTokens": 2903,
|
||||
"avgTokPerSec": 190.51221206765385,
|
||||
"promptChars": 9654,
|
||||
"promptTokensEst": 2414,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 11,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 22352,
|
||||
"totalTokens": 3776,
|
||||
"avgTokPerSec": 190.56628728306987,
|
||||
"promptChars": 11134,
|
||||
"promptTokensEst": 2784,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 2,
|
||||
"totalDurationMs": 31217,
|
||||
"totalTokens": 2463,
|
||||
"avgTokPerSec": 88.6684646675098,
|
||||
"promptChars": 9598,
|
||||
"promptTokensEst": 2400,
|
||||
"score": 85,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 27520,
|
||||
"totalTokens": 2288,
|
||||
"avgTokPerSec": 88.64765360012593,
|
||||
"promptChars": 9612,
|
||||
"promptTokensEst": 2403,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 12,
|
||||
"testsPassed": 3,
|
||||
"testsFailed": 9,
|
||||
"totalDurationMs": 41874,
|
||||
"totalTokens": 3474,
|
||||
"avgTokPerSec": 88.22266853318554,
|
||||
"promptChars": 10408,
|
||||
"promptTokensEst": 2602,
|
||||
"score": 55,
|
||||
"stars": "★★★☆☆",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 11,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 24781,
|
||||
"totalTokens": 3240,
|
||||
"avgTokPerSec": 146.89167309934365,
|
||||
"promptChars": 10179,
|
||||
"promptTokensEst": 2545,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 9,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 3,
|
||||
"totalDurationMs": 19148,
|
||||
"totalTokens": 2605,
|
||||
"avgTokPerSec": 147.55250620481297,
|
||||
"promptChars": 9634,
|
||||
"promptTokensEst": 2409,
|
||||
"score": 80,
|
||||
"stars": "★★★★☆",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 11,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 23816,
|
||||
"totalTokens": 3232,
|
||||
"avgTokPerSec": 147.25857324533817,
|
||||
"promptChars": 9226,
|
||||
"promptTokensEst": 2307,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 16639,
    "totalTokens": 2369,
    "avgTokPerSec": 191.61273045157245,
    "promptChars": 10048,
    "promptTokensEst": 2512,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 8,
    "testsFailed": 1,
    "totalDurationMs": 18588,
    "totalTokens": 3163,
    "avgTokPerSec": 190.86975006725547,
    "promptChars": 10048,
    "promptTokensEst": 2512,
    "score": 93,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 10,
    "testsFailed": 0,
    "totalDurationMs": 22677,
    "totalTokens": 3828,
    "avgTokPerSec": 190.15611016906482,
    "promptChars": 11090,
    "promptTokensEst": 2773,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 26449,
    "totalTokens": 2063,
    "avgTokPerSec": 88.77498453063184,
    "promptChars": 9608,
    "promptTokensEst": 2402,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 27510,
    "totalTokens": 2289,
    "avgTokPerSec": 88.74699253414485,
    "promptChars": 9418,
    "promptTokensEst": 2355,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 3,
    "testsFailed": 9,
    "totalDurationMs": 45105,
    "totalTokens": 3738,
    "avgTokPerSec": 88.04788102995212,
    "promptChars": 10564,
    "promptTokensEst": 2641,
    "score": 55,
    "stars": "★★★☆☆",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 7,
    "testsFailed": 1,
    "totalDurationMs": 19204,
    "totalTokens": 2480,
    "avgTokPerSec": 147.91758782382294,
    "promptChars": 9391,
    "promptTokensEst": 2348,
    "score": 93,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 12990,
    "totalTokens": 1769,
    "avgTokPerSec": 148.2616673700717,
    "promptChars": 8898,
    "promptTokensEst": 2225,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 10,
    "testsFailed": 2,
    "totalDurationMs": 25831,
    "totalTokens": 3500,
    "avgTokPerSec": 146.86924785880186,
    "promptChars": 9465,
    "promptTokensEst": 2366,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 19453,
    "totalTokens": 2845,
    "avgTokPerSec": 191.37382231956113,
    "promptChars": 10157,
    "promptTokensEst": 2539,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 9,
    "testsFailed": 0,
    "totalDurationMs": 21570,
    "totalTokens": 3529,
    "avgTokPerSec": 190.65454623497536,
    "promptChars": 9732,
    "promptTokensEst": 2433,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 25537,
    "totalTokens": 4300,
    "avgTokPerSec": 189.94521619124598,
    "promptChars": 11127,
    "promptTokensEst": 2782,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 7,
    "testsFailed": 2,
    "totalDurationMs": 31923,
    "totalTokens": 2522,
    "avgTokPerSec": 88.62182881661799,
    "promptChars": 9700,
    "promptTokensEst": 2425,
    "score": 87,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 26000,
    "totalTokens": 2163,
    "avgTokPerSec": 88.86878707672254,
    "promptChars": 9288,
    "promptTokensEst": 2322,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 10,
    "testsFailed": 0,
    "totalDurationMs": 43275,
    "totalTokens": 3588,
    "avgTokPerSec": 88.24995936347965,
    "promptChars": 10173,
    "promptTokensEst": 2543,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 14,
    "testsPassed": 0,
    "testsFailed": 14,
    "totalDurationMs": 30045,
    "totalTokens": 3913,
    "avgTokPerSec": 146.51683619371713,
    "promptChars": 10334,
    "promptTokensEst": 2584,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 5,
    "testsFailed": 4,
    "totalDurationMs": 17076,
    "totalTokens": 2321,
    "avgTokPerSec": 147.99547121069506,
    "promptChars": 9451,
    "promptTokensEst": 2363,
    "score": 73,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 23890,
    "totalTokens": 3243,
    "avgTokPerSec": 147.20125507974117,
    "promptChars": 9217,
    "promptTokensEst": 2304,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 21812,
    "totalTokens": 3246,
    "avgTokPerSec": 191.07801335688654,
    "promptChars": 10249,
    "promptTokensEst": 2562,
    "score": 85,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 8,
    "testsFailed": 1,
    "totalDurationMs": 20325,
    "totalTokens": 3441,
    "avgTokPerSec": 190.10241840094508,
    "promptChars": 9930,
    "promptTokensEst": 2483,
    "score": 93,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 26087,
    "totalTokens": 4387,
    "avgTokPerSec": 189.8005689388054,
    "promptChars": 11109,
    "promptTokensEst": 2777,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 30287,
    "totalTokens": 2388,
    "avgTokPerSec": 88.72243320918638,
    "promptChars": 9695,
    "promptTokensEst": 2424,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 9,
    "testsPassed": 6,
    "testsFailed": 3,
    "totalDurationMs": 31212,
    "totalTokens": 2601,
    "avgTokPerSec": 88.71289036919063,
    "promptChars": 9619,
    "promptTokensEst": 2405,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 15,
    "testsPassed": 3,
    "testsFailed": 12,
    "totalDurationMs": 50939,
    "totalTokens": 4217,
    "avgTokPerSec": 88.06125722020734,
    "promptChars": 10743,
    "promptTokensEst": 2686,
    "score": 52,
    "stars": "★★★☆☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 6,
    "testsFailed": 1,
    "totalDurationMs": 17913,
    "totalTokens": 2310,
    "avgTokPerSec": 148.0291268001691,
    "promptChars": 9357,
    "promptTokensEst": 2339,
    "score": 91,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 8,
    "testsFailed": 0,
    "totalDurationMs": 13948,
    "totalTokens": 1898,
    "avgTokPerSec": 148.37907379944423,
    "promptChars": 8725,
    "promptTokensEst": 2181,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 5
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 1,
    "testsFailed": 5,
    "totalDurationMs": 15229,
    "totalTokens": 2119,
    "avgTokPerSec": 192.33007410215646,
    "promptChars": 9827,
    "promptTokensEst": 2457,
    "score": 50,
    "stars": "★★★☆☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 18223,
    "totalTokens": 3093,
    "avgTokPerSec": 190.71372054282037,
    "promptChars": 9641,
    "promptTokensEst": 2410,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 1,
    "testsFailed": 9,
    "totalDurationMs": 21215,
    "totalTokens": 3589,
    "avgTokPerSec": 190.49493540345176,
    "promptChars": 11180,
    "promptTokensEst": 2795,
    "score": 46,
    "stars": "★★☆☆☆",
    "error": null,
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T09-43.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3-coder:30b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":21688,"totalTokens":2243,"avgTokPerSec":121.7719614197307,"promptChars":11588,"promptTokensEst":2897,"score":100,"stars":"★★★★★","error":null}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
22
kipina-codebench/results/2026-04-14T09-43.json
Normal file
@@ -0,0 +1,22 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 21688,
    "totalTokens": 2243,
    "avgTokPerSec": 121.7719614197307,
    "promptChars": 11588,
    "promptTokensEst": 2897,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  }
]
183
kipina-codebench/results/2026-04-14T09-44.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
|
||||
const RAW = [{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":23521,"totalTokens":2090,"avgTokPerSec":100.94324085271073,"promptChars":10962,"promptTokensEst":2741,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":1,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":33680,"totalTokens":3003,"avgTokPerSec":100.52754588753601,"promptChars":10171,"promptTokensEst":2543,"score":90,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui"}];
|
||||
|
||||
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
|
||||
function calcScore(r) {
|
||||
if (r.error && r.testsTotal === 0) return 0;
|
||||
let s = 0;
|
||||
if (r.specOk) s += 10;
|
||||
if (!r.error || r.testsTotal > 0) s += 10;
|
||||
if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
|
||||
s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
|
||||
return Math.min(100, s);
|
||||
}
|
||||
// Fill in score, stars, and the prompt-token estimate when they are missing
|
||||
const DATA = RAW.map(r => {
|
||||
if (r.score == null) r.score = calcScore(r);
|
||||
if (!r.stars) r.stars = starsFor(r.score);
|
||||
if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
|
||||
return r;
|
||||
});
|
||||
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
|
||||
const pctBar = (passed, total, w=80) => {
|
||||
if (total === 0) return '-';
|
||||
const pct = passed/total*100;
|
||||
const c = pct === 100 ? 'bar-fill' : 'bar-partial';
|
||||
return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
|
||||
};
|
||||
|
||||
// Meta
|
||||
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
|
||||
|
||||
// Cards
|
||||
const models = [...new Set(DATA.map(r => r.model))];
|
||||
const scenarios = [...new Set(DATA.map(r => r.scenario))];
|
||||
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
|
||||
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
|
||||
const bestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.avg - a.avg)[0];
|
||||
const fastestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.speed - a.speed)[0];
|
||||
|
||||
document.getElementById('cards').innerHTML = `
|
||||
<div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
|
||||
<div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
|
||||
<div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
|
||||
<div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
|
||||
<div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
|
||||
<div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
|
||||
`;
|
||||
|
||||
// Summary table
|
||||
const sumHead = document.querySelector('#summary-table thead');
|
||||
const sumBody = document.querySelector('#summary-table tbody');
|
||||
sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
|
||||
|
||||
const modelRows = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
|
||||
const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
|
||||
const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
|
||||
const scenCols = scenarios.map(s => {
|
||||
const r = mrs.find(r => r.scenario === s);
|
||||
if (!r) return '<td>-</td>';
|
||||
return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
|
||||
}).join('');
|
||||
return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
|
||||
}).sort((a,b) => b.avg - a.avg);
|
||||
sumBody.innerHTML = modelRows.map(r => r.html).join('');
|
||||
|
||||
// Results table
|
||||
const resHead = document.querySelector('#results-table thead');
|
||||
const resBody = document.querySelector('#results-table tbody');
|
||||
const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
|
||||
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
|
||||
|
||||
let sortCol = 9, sortAsc = false;
|
||||
function renderResults() {
|
||||
const sorted = [...DATA].sort((a,b) => {
|
||||
const vals = [
|
||||
[a.model, b.model],
|
||||
[a.scenario, b.scenario],
|
||||
[a.specEntities, b.specEntities],
|
||||
[a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
|
||||
[a.fixRounds, b.fixRounds],
|
||||
[a.promptTokensEst, b.promptTokensEst],
|
||||
[a.totalTokens, b.totalTokens],
|
||||
[a.totalDurationMs, b.totalDurationMs],
|
||||
[a.avgTokPerSec, b.avgTokPerSec],
|
||||
[a.score, b.score],
|
||||
][sortCol];
|
||||
const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
|
||||
return sortAsc ? cmp : -cmp;
|
||||
});
|
||||
resBody.innerHTML = sorted.map(r => {
|
||||
const c = cls(r);
|
||||
return `<tr>
|
||||
<td class="model-name">${r.model}</td>
|
||||
<td>${r.scenario}</td>
|
||||
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
|
||||
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
|
||||
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
|
||||
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
|
||||
<td>${r.avgTokPerSec.toFixed(0)}</td>
|
||||
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
|
||||
</tr>`;
|
||||
}).join('');
|
||||
document.querySelectorAll('#results-table th').forEach((th,i) => {
|
||||
th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
|
||||
});
|
||||
}
|
||||
document.querySelector('#results-table thead').addEventListener('click', e => {
|
||||
const col = parseInt(e.target.dataset.col, 10);
|
||||
if (isNaN(col)) return;
|
||||
if (sortCol === col) sortAsc = !sortAsc;
|
||||
else { sortCol = col; sortAsc = false; }
|
||||
renderResults();
|
||||
});
|
||||
renderResults();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
62
kipina-codebench/results/2026-04-14T09-44.json
Normal file
@@ -0,0 +1,62 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 23521,
    "totalTokens": 2090,
    "avgTokPerSec": 100.94324085271073,
    "promptChars": 10962,
    "promptTokensEst": 2741,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 33680,
    "totalTokens": 3003,
    "avgTokPerSec": 100.52754588753601,
    "promptChars": 10171,
    "promptTokensEst": 2543,
    "score": 90,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui"
  }
]
183
kipina-codebench/results/2026-04-14T09-47.html
Normal file
@@ -0,0 +1,183 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Kipina Model Benchmark</title>
|
||||
<style>
|
||||
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
|
||||
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
|
||||
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
|
||||
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
|
||||
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
|
||||
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
|
||||
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
|
||||
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
|
||||
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
|
||||
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
|
||||
th:hover { color: var(--text); }
|
||||
th.sorted-asc::after { content: ' ▲'; }
|
||||
th.sorted-desc::after { content: ' ▼'; }
|
||||
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
|
||||
tr:hover td { background: #1c2128; }
|
||||
.pass { color: var(--green); }
|
||||
.partial { color: var(--yellow); }
|
||||
.fail { color: var(--red); }
|
||||
.stars { letter-spacing: 1px; }
|
||||
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
|
||||
.bar-bg { background: var(--border); }
|
||||
.bar-fill { background: var(--green); }
|
||||
.bar-partial { background: var(--yellow); }
|
||||
.model-name { font-weight: 600; }
|
||||
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
|
||||
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<h1>Kipina Model Benchmark</h1>
|
||||
<div class="meta" id="meta"></div>
|
||||
|
||||
<div class="cards" id="cards"></div>
|
||||
|
||||
<h2>Per-model summary</h2>
|
||||
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<h2>All results</h2>
|
||||
<table id="results-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<script>
|
||||
const RAW = [{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":8,"testsPassed":6,"testsFailed":2,"totalDurationMs":97470,"totalTokens":8786,"avgTokPerSec":97.96636139685832,"promptChars":11290,"promptTokensEst":2823,"score":65,"stars":"★★★☆☆","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":18951,"totalTokens":1666,"avgTokPerSec":101.807593927545,"promptChars":10293,"promptTokensEst":2573,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":126005,"totalTokens":11056,"avgTokPerSec":96.6373549161171,"promptChars":11878,"promptTokensEst":2970,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe"}];
|
||||
|
||||
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
|
||||
function calcScore(r) {
|
||||
if (r.error && r.testsTotal === 0) return 0;
|
||||
let s = 0;
|
||||
if (r.specOk) s += 10;
|
||||
if (!r.error || r.testsTotal > 0) s += 10;
|
||||
if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
|
||||
s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
|
||||
return Math.min(100, s);
|
||||
}
|
||||
// Fill in score, stars, and the prompt-token estimate when they are missing
|
||||
const DATA = RAW.map(r => {
|
||||
if (r.score == null) r.score = calcScore(r);
|
||||
if (!r.stars) r.stars = starsFor(r.score);
|
||||
if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
|
||||
return r;
|
||||
});
|
||||
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
|
||||
const pctBar = (passed, total, w=80) => {
|
||||
if (total === 0) return '-';
|
||||
const pct = passed/total*100;
|
||||
const c = pct === 100 ? 'bar-fill' : 'bar-partial';
|
||||
return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
|
||||
};
|
||||
|
||||
// Meta
|
||||
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
|
||||
|
||||
// Cards
|
||||
const models = [...new Set(DATA.map(r => r.model))];
|
||||
const scenarios = [...new Set(DATA.map(r => r.scenario))];
|
||||
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
|
||||
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
|
||||
const bestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.avg - a.avg)[0];
|
||||
const fastestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.speed - a.speed)[0];
|
||||
|
||||
document.getElementById('cards').innerHTML = `
|
||||
<div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
|
||||
<div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
|
||||
<div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
|
||||
<div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
|
||||
<div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
|
||||
<div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
|
||||
`;
|
||||
|
||||
// Summary table
|
||||
const sumHead = document.querySelector('#summary-table thead');
|
||||
const sumBody = document.querySelector('#summary-table tbody');
|
||||
sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
|
||||
|
||||
const modelRows = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
|
||||
const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
|
||||
const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
|
||||
const scenCols = scenarios.map(s => {
|
||||
const r = mrs.find(r => r.scenario === s);
|
||||
if (!r) return '<td>-</td>';
|
||||
return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
|
||||
}).join('');
|
||||
return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
|
||||
}).sort((a,b) => b.avg - a.avg);
|
||||
sumBody.innerHTML = modelRows.map(r => r.html).join('');
|
||||
|
||||
// Results table
|
||||
const resHead = document.querySelector('#results-table thead');
|
||||
const resBody = document.querySelector('#results-table tbody');
|
||||
const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
|
||||
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
|
||||
|
||||
let sortCol = 9, sortAsc = false;
|
||||
function renderResults() {
|
||||
const sorted = [...DATA].sort((a,b) => {
|
||||
const vals = [
|
||||
[a.model, b.model],
|
||||
[a.scenario, b.scenario],
|
||||
[a.specEntities, b.specEntities],
|
||||
[a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
|
||||
[a.fixRounds, b.fixRounds],
|
||||
[a.promptTokensEst, b.promptTokensEst],
|
||||
[a.totalTokens, b.totalTokens],
|
||||
[a.totalDurationMs, b.totalDurationMs],
|
||||
[a.avgTokPerSec, b.avgTokPerSec],
|
||||
[a.score, b.score],
|
||||
][sortCol];
|
||||
const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
|
||||
return sortAsc ? cmp : -cmp;
|
||||
});
|
||||
resBody.innerHTML = sorted.map(r => {
|
||||
const c = cls(r);
|
||||
return `<tr>
|
||||
<td class="model-name">${r.model}</td>
|
||||
<td>${r.scenario}</td>
|
||||
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
|
||||
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
|
||||
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
|
||||
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
|
||||
<td>${r.avgTokPerSec.toFixed(0)}</td>
|
||||
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
|
||||
</tr>`;
|
||||
}).join('');
|
||||
document.querySelectorAll('#results-table th').forEach((th,i) => {
|
||||
th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
|
||||
});
|
||||
}
|
||||
document.querySelector('#results-table thead').addEventListener('click', e => {
|
||||
const col = parseInt(e.target.dataset.col, 10);
|
||||
if (isNaN(col)) return;
|
||||
if (sortCol === col) sortAsc = !sortAsc;
|
||||
else { sortCol = col; sortAsc = false; }
|
||||
renderResults();
|
||||
});
|
||||
renderResults();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
62
kipina-codebench/results/2026-04-14T09-47.json
Normal file
@@ -0,0 +1,62 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 97470,
    "totalTokens": 8786,
    "avgTokPerSec": 97.96636139685832,
    "promptChars": 11290,
    "promptTokensEst": 2823,
    "score": 65,
    "stars": "★★★☆☆",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 18951,
    "totalTokens": 1666,
    "avgTokPerSec": 101.807593927545,
    "promptChars": 10293,
    "promptTokensEst": 2573,
    "score": 100,
    "stars": "★★★★★",
    "error": null
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 126005,
    "totalTokens": 11056,
    "avgTokPerSec": 96.6373549161171,
    "promptChars": 11878,
    "promptTokensEst": 2970,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": "Syntaksivirhe"
  }
]
183
kipina-codebench/results/2026-04-14T09-52.html
Normal file
File diff suppressed because one or more lines are too long
947
kipina-codebench/results/2026-04-14T09-52.json
Normal file
@@ -0,0 +1,947 @@
|
||||
[
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 25444,
|
||||
"totalTokens": 2661,
|
||||
"avgTokPerSec": 122.06801173056196,
|
||||
"promptChars": 11849,
|
||||
"promptTokensEst": 2962,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 24447,
|
||||
"totalTokens": 2537,
|
||||
"avgTokPerSec": 121.11837170891442,
|
||||
"promptChars": 11045,
|
||||
"promptTokensEst": 2761,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 11,
|
||||
"testsPassed": 11,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 38071,
|
||||
"totalTokens": 3965,
|
||||
"avgTokPerSec": 120.37309655579647,
|
||||
"promptChars": 12702,
|
||||
"promptTokensEst": 3176,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 38459,
|
||||
"totalTokens": 2106,
|
||||
"avgTokPerSec": 60.889088461567745,
|
||||
"promptChars": 10951,
|
||||
"promptTokensEst": 2738,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 35959,
|
||||
"totalTokens": 1966,
|
||||
"avgTokPerSec": 60.9684885562545,
|
||||
"promptChars": 10698,
|
||||
"promptTokensEst": 2675,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 3,
|
||||
"testsTotal": 13,
|
||||
"testsPassed": 2,
|
||||
"testsFailed": 11,
|
||||
"totalDurationMs": 269370,
|
||||
"totalTokens": 14361,
|
||||
"avgTokPerSec": 57.79069860126629,
|
||||
"promptChars": 11838,
|
||||
"promptTokensEst": 2960,
|
||||
"score": 29,
|
||||
"stars": "★★☆☆☆",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 23199,
|
||||
"totalTokens": 2054,
|
||||
"avgTokPerSec": 101.09280595816365,
|
||||
"promptChars": 10854,
|
||||
"promptTokensEst": 2714,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 3,
|
||||
"testsTotal": 1,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 1,
|
||||
"totalDurationMs": 72665,
|
||||
"totalTokens": 6586,
|
||||
"avgTokPerSec": 99.40636298490288,
|
||||
"promptChars": 10157,
|
||||
"promptTokensEst": 2539,
|
||||
"score": 20,
|
||||
"stars": "★☆☆☆☆",
|
||||
"error": "Syntaksivirhe",
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3:8b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 3,
|
||||
"testsTotal": 0,
|
||||
"testsPassed": 0,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 136309,
|
||||
"totalTokens": 12036,
|
||||
"avgTokPerSec": 97.02525169408467,
|
||||
"promptChars": 10823,
|
||||
"promptTokensEst": 2706,
|
||||
"score": 0,
|
||||
"stars": "☆☆☆☆☆",
|
||||
"error": "Testit kaatuivat",
|
||||
"round": 1
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 6,
|
||||
"testsPassed": 6,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 28177,
|
||||
"totalTokens": 2946,
|
||||
"avgTokPerSec": 121.23541038097,
|
||||
"promptChars": 11836,
|
||||
"promptTokensEst": 2959,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 8,
|
||||
"testsPassed": 8,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 22631,
|
||||
"totalTokens": 2352,
|
||||
"avgTokPerSec": 121.93930190168658,
|
||||
"promptChars": 10440,
|
||||
"promptTokensEst": 2610,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3-coder:30b",
|
||||
"scenario": "blog",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 2,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 12,
|
||||
"testsPassed": 12,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 40394,
|
||||
"totalTokens": 4225,
|
||||
"avgTokPerSec": 120.84107397324551,
|
||||
"promptChars": 12362,
|
||||
"promptTokensEst": 3091,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "todo",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 46081,
|
||||
"totalTokens": 2542,
|
||||
"avgTokPerSec": 60.93046828700026,
|
||||
"promptChars": 11412,
|
||||
"promptTokensEst": 2853,
|
||||
"score": 100,
|
||||
"stars": "★★★★★",
|
||||
"error": null,
|
||||
"round": 2
|
||||
},
|
||||
{
|
||||
"model": "qwen3:14b",
|
||||
"scenario": "users",
|
||||
"reqOk": true,
|
||||
"specOk": true,
|
||||
"specEntities": 1,
|
||||
"validationIssues": 0,
|
||||
"fixRounds": 0,
|
||||
"testsTotal": 7,
|
||||
"testsPassed": 7,
|
||||
"testsFailed": 0,
|
||||
"totalDurationMs": 41323,
|
||||
"totalTokens": 2272,
    "avgTokPerSec": 60.99406174164295,
    "promptChars": 10884,
    "promptTokensEst": 2721,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 14,
    "testsPassed": 2,
    "testsFailed": 12,
    "totalDurationMs": 262591,
    "totalTokens": 14129,
    "avgTokPerSec": 57.91340837830759,
    "promptChars": 12143,
    "promptTokensEst": 3036,
    "score": 29,
    "stars": "★★☆☆☆",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 24007,
    "totalTokens": 2137,
    "avgTokPerSec": 101.05982103292858,
    "promptChars": 10756,
    "promptTokensEst": 2689,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 7,
    "testsPassed": 6,
    "testsFailed": 1,
    "totalDurationMs": 68739,
    "totalTokens": 6199,
    "avgTokPerSec": 98.9825675198183,
    "promptChars": 10313,
    "promptTokensEst": 2578,
    "score": 71,
    "stars": "★★★★☆",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 23472,
    "totalTokens": 2427,
    "avgTokPerSec": 120.85293828875076,
    "promptChars": 11663,
    "promptTokensEst": 2916,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 8,
    "testsPassed": 8,
    "testsFailed": 0,
    "totalDurationMs": 25864,
    "totalTokens": 2671,
    "avgTokPerSec": 120.6883137195962,
    "promptChars": 11148,
    "promptTokensEst": 2787,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 41074,
    "totalTokens": 4275,
    "avgTokPerSec": 120.33351485161673,
    "promptChars": 12664,
    "promptTokensEst": 3166,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 40457,
    "totalTokens": 2229,
    "avgTokPerSec": 61.093615619948345,
    "promptChars": 10905,
    "promptTokensEst": 2726,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 77506,
    "totalTokens": 4268,
    "avgTokPerSec": 60.19655522627278,
    "promptChars": 11135,
    "promptTokensEst": 2784,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 74791,
    "totalTokens": 3590,
    "avgTokPerSec": 60.549298891176214,
    "promptChars": 11653,
    "promptTokensEst": 2913,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 26402,
    "totalTokens": 2358,
    "avgTokPerSec": 100.76936895480246,
    "promptChars": 11243,
    "promptTokensEst": 2811,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 20751,
    "totalTokens": 1837,
    "avgTokPerSec": 101.05480893032836,
    "promptChars": 10553,
    "promptTokensEst": 2638,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 22098,
    "totalTokens": 2283,
    "avgTokPerSec": 121.81254413612446,
    "promptChars": 11503,
    "promptTokensEst": 2876,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 2,
    "testsTotal": 8,
    "testsPassed": 8,
    "testsFailed": 0,
    "totalDurationMs": 65403,
    "totalTokens": 6779,
    "avgTokPerSec": 118.13288294758586,
    "promptChars": 10939,
    "promptTokensEst": 2735,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 10,
    "testsFailed": 0,
    "totalDurationMs": 36044,
    "totalTokens": 3748,
    "avgTokPerSec": 120.14822967005487,
    "promptChars": 12639,
    "promptTokensEst": 3160,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 38501,
    "totalTokens": 2113,
    "avgTokPerSec": 61.01814139430428,
    "promptChars": 10929,
    "promptTokensEst": 2732,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 8,
    "testsPassed": 1,
    "testsFailed": 7,
    "totalDurationMs": 147057,
    "totalTokens": 7799,
    "avgTokPerSec": 56.209406465865904,
    "promptChars": 11207,
    "promptTokensEst": 2802,
    "score": 28,
    "stars": "★★☆☆☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 227508,
    "totalTokens": 12026,
    "avgTokPerSec": 58.52888492610325,
    "promptChars": 11809,
    "promptTokensEst": 2952,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 131964,
    "totalTokens": 11403,
    "avgTokPerSec": 97.10963264920952,
    "promptChars": 11786,
    "promptTokensEst": 2947,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 38820,
    "totalTokens": 1826,
    "avgTokPerSec": 101.07773707712924,
    "promptChars": 10568,
    "promptTokensEst": 2642,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 39797,
    "totalTokens": 3776,
    "avgTokPerSec": 120.91801837211113,
    "promptChars": 11435,
    "promptTokensEst": 2859,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 9,
    "testsPassed": 8,
    "testsFailed": 1,
    "totalDurationMs": 87836,
    "totalTokens": 9343,
    "avgTokPerSec": 119.28783662683314,
    "promptChars": 10718,
    "promptTokensEst": 2680,
    "score": 73,
    "stars": "★★★★☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 10,
    "testsPassed": 10,
    "testsFailed": 0,
    "totalDurationMs": 36644,
    "totalTokens": 3897,
    "avgTokPerSec": 122.28607796191666,
    "promptChars": 12598,
    "promptTokensEst": 3150,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:14b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 127532,
    "totalTokens": 3919,
    "avgTokPerSec": 34.13133325491828,
    "promptChars": 11352,
    "promptTokensEst": 2838,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:14b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 217365,
    "totalTokens": 7764,
    "avgTokPerSec": 38.67613170588518,
    "promptChars": 10834,
    "promptTokensEst": 2709,
    "score": 65,
    "stars": "★★★☆☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:14b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 14,
    "testsPassed": 7,
    "testsFailed": 7,
    "totalDurationMs": 248311,
    "totalTokens": 13443,
    "avgTokPerSec": 58.05680015263308,
    "promptChars": 12219,
    "promptTokensEst": 3055,
    "score": 50,
    "stars": "★★★☆☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 38326,
    "totalTokens": 2079,
    "avgTokPerSec": 100.89778087504016,
    "promptChars": 10908,
    "promptTokensEst": 2727,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 60823,
    "totalTokens": 1772,
    "avgTokPerSec": 96.76383996716295,
    "promptChars": 10378,
    "promptTokensEst": 2595,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 81654,
    "totalTokens": 3458,
    "avgTokPerSec": 95.65675360193613,
    "promptChars": 11914,
    "promptTokensEst": 2979,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T10-03.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
  th:hover { color: var(--text); }
  th.sorted-asc::after { content: ' ▲'; }
  th.sorted-desc::after { content: ' ▼'; }
  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
  tr:hover td { background: #1c2128; }
  .pass { color: var(--green); }
  .partial { color: var(--yellow); }
  .fail { color: var(--red); }
  .stars { letter-spacing: 1px; }
  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
  .bar-bg { background: var(--border); }
  .bar-fill { background: var(--green); }
  .bar-partial { background: var(--yellow); }
  .model-name { font-weight: 600; }
  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
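Note on the scoring rule in the report above: `calcScore` gives 10 points for a valid spec, 10 for a run that reached the tests, up to 60 for the test pass ratio, and up to 20 that shrink by 10 per fix round. The sketch below copies `calcScore` and `starsFor` from the page so they can be run outside the browser. The `samples` array is an assumption for illustration: three records taken from the result files in this diff, trimmed to the fields the formula reads. The stored `score` values appear consistent with this formula for these records, but the benchmark runner itself is not shown here.

```javascript
// Standalone copy of the two scoring helpers from the report page.
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆'
  : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';

function calcScore(r) {
  // A run that errored before any test was collected scores zero.
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;                      // spec was generated and parsed
  if (!r.error || r.testsTotal > 0) s += 10;  // run reached the test stage
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // penalty per fix round
  return Math.min(100, s);
}

// Hypothetical sample: records trimmed to the fields calcScore reads.
const samples = [
  { model: 'qwen3:14b', scenario: 'blog',  specOk: true,  error: null, testsTotal: 14, testsPassed: 2, fixRounds: 3 },
  { model: 'qwen3:14b', scenario: 'users', specOk: true,  error: null, testsTotal: 7,  testsPassed: 7, fixRounds: 1 },
  { model: 'qwen3:8b',  scenario: 'blog',  specOk: false, error: 'JSON-speksi epäonnistui', testsTotal: 0, testsPassed: 0, fixRounds: 0 },
];

for (const r of samples) {
  const score = calcScore(r);
  console.log(`${r.model} ${r.scenario}: ${score} ${starsFor(score)}`);
}
```

Three fix rounds already use up the whole 20-point allowance, so a run that needed three rounds cannot score above 80 even with every test passing.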
1
kipina-codebench/results/2026-04-14T10-03.json
Normal file
@@ -0,0 +1 @@
[]
183
kipina-codebench/results/2026-04-14T10-31.html
Normal file
File diff suppressed because one or more lines are too long
317
kipina-codebench/results/2026-04-14T10-31.json
Normal file
@@ -0,0 +1,317 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 97527,
    "totalTokens": 2228,
    "avgTokPerSec": 100.69171830800946,
    "promptChars": 11566,
    "promptTokensEst": 2892,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 39549,
    "totalTokens": 1960,
    "avgTokPerSec": 100.98265593129491,
    "promptChars": 11073,
    "promptTokensEst": 2768,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 131339,
    "totalTokens": 11518,
    "avgTokPerSec": 96.52358107464266,
    "promptChars": 12388,
    "promptTokensEst": 3097,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 20658,
    "totalTokens": 1808,
    "avgTokPerSec": 101.0081173861862,
    "promptChars": 11057,
    "promptTokensEst": 2764,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 1,
    "fixRounds": 5,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 320031,
    "totalTokens": 11985,
    "avgTokPerSec": 54.915025374575386,
    "promptChars": 12517,
    "promptTokensEst": 3129,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 7,
    "testsPassed": 7,
    "testsFailed": 0,
    "totalDurationMs": 28654,
    "totalTokens": 1877,
    "avgTokPerSec": 100.70920643946336,
    "promptChars": 10747,
    "promptTokensEst": 2687,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 67943,
    "totalTokens": 6002,
    "avgTokPerSec": 98.29436788902672,
    "promptChars": 12389,
    "promptTokensEst": 3097,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 20203,
    "totalTokens": 1774,
    "avgTokPerSec": 100.9066297884274,
    "promptChars": 10905,
    "promptTokensEst": 2726,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 13,
    "testsPassed": 12,
    "testsFailed": 1,
    "totalDurationMs": 148491,
    "totalTokens": 12747,
    "avgTokPerSec": 95.18237885727869,
    "promptChars": 12476,
    "promptTokensEst": 3119,
    "score": 75,
    "stars": "★★★★☆",
    "error": null,
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "todo",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 6,
    "testsPassed": 6,
    "testsFailed": 0,
    "totalDurationMs": 23830,
    "totalTokens": 2102,
    "avgTokPerSec": 100.641489789061,
    "promptChars": 11404,
    "promptTokensEst": 2851,
    "score": 100,
    "stars": "★★★★★",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "users",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 8,
    "testsPassed": 6,
    "testsFailed": 2,
    "totalDurationMs": 122453,
    "totalTokens": 7285,
    "avgTokPerSec": 94.12482830400619,
    "promptChars": 11400,
    "promptTokensEst": 2850,
    "score": 65,
    "stars": "★★★☆☆",
    "error": null,
    "round": 5
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 11,
    "testsPassed": 10,
    "testsFailed": 1,
    "totalDurationMs": 147125,
    "totalTokens": 9893,
    "avgTokPerSec": 97.37021605085566,
    "promptChars": 12455,
    "promptTokensEst": 3114,
    "score": 75,
    "stars": "★★★★☆",
    "error": null,
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T10-59.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":64124,"totalTokens":5689,"avgTokPerSec":98.61378134916481,"promptChars":12098,"promptTokensEst":3025,"score":90,"stars":"★★★★★","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":126014,"totalTokens":11162,"avgTokPerSec":97.09858655726343,"promptChars":12101,"promptTokensEst":3025,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":3}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if they are missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
69
kipina-codebench/results/2026-04-14T10-59.json
Normal file
@@ -0,0 +1,69 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 64124,
    "totalTokens": 5689,
    "avgTokPerSec": 98.61378134916481,
    "promptChars": 12098,
    "promptTokensEst": 3025,
    "score": 90,
    "stars": "★★★★★",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 126014,
    "totalTokens": 11162,
    "avgTokPerSec": 97.09858655726343,
    "promptChars": 12101,
    "promptTokensEst": 3025,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 3
  }
]
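Reviewer note: the `score` values recorded in this results file appear to follow the rubric in the report's inline `calcScore` function (10 points for a valid spec, 10 for a run that produced tests or no error, up to 60 for the test pass ratio, and up to 20 for needing few fix rounds). The sketch below copies that function and replays two of the runs; the `round1` and `round2` object names are illustrative only and carry just the fields the rubric reads.

```javascript
// Scoring rubric, copied from the report's inline script.
function calcScore(r) {
  // A run that errored before any test executed scores zero outright.
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;                              // spec was generated
  if (!r.error || r.testsTotal > 0) s += 10;          // run got as far as tests
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);     // fewer fix rounds, more points
  return Math.min(100, s);
}

// Hypothetical names; field values taken from rounds 1 and 2 of this file.
const round1 = { specOk: true, error: null, testsTotal: 11, testsPassed: 11, fixRounds: 1 };
const round2 = { specOk: true, error: "Testit kaatuivat", testsTotal: 0, testsPassed: 0, fixRounds: 3 };

console.log(calcScore(round1)); // 90 = 10 + 10 + 60 + 10
console.log(calcScore(round2)); // 0, because the tests crashed before any ran
```

Both values agree with the `score` fields stored for those rounds. Round 3 never reaches this function in the report, since its stored `score` of 0 is not null.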
183
kipina-codebench/results/2026-04-14T11-06.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":10,"testsFailed":2,"totalDurationMs":139308,"totalTokens":11782,"avgTokPerSec":96.85039238572556,"promptChars":11148,"promptTokensEst":2787,"score":70,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":132306,"totalTokens":11671,"avgTokPerSec":96.88921767777383,"promptChars":11267,"promptTokensEst":2817,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":126092,"totalTokens":11132,"avgTokPerSec":96.98598556369416,"promptChars":11292,"promptTokensEst":2823,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":3}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if they are missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
71
kipina-codebench/results/2026-04-14T11-06.json
Normal file
@@ -0,0 +1,71 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 10,
    "testsFailed": 2,
    "totalDurationMs": 139308,
    "totalTokens": 11782,
    "avgTokPerSec": 96.85039238572556,
    "promptChars": 11148,
    "promptTokensEst": 2787,
    "score": 70,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 132306,
    "totalTokens": 11671,
    "avgTokPerSec": 96.88921767777383,
    "promptChars": 11267,
    "promptTokensEst": 2817,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": "Syntaksivirhe",
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 11,
    "testsFailed": 1,
    "totalDurationMs": 126092,
    "totalTokens": 11132,
    "avgTokPerSec": 96.98598556369416,
    "promptChars": 11292,
    "promptTokensEst": 2823,
    "score": 75,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 3
  }
]
183
kipina-codebench/results/2026-04-14T11-15.html
Normal file
@@ -0,0 +1,183 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="fi">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Kipina Model Benchmark</title>
|
||||
<style>
|
||||
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":11,"testsPassed":9,"testsFailed":2,"totalDurationMs":75178,"totalTokens":9916,"avgTokPerSec":142.94675043471062,"promptChars":10516,"promptTokensEst":2629,"score":69,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":1,"fixRounds":5,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":98787,"totalTokens":12904,"avgTokPerSec":141.16873850064812,"promptChars":11810,"promptTokensEst":2953,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":81763,"totalTokens":10277,"avgTokPerSec":134.82946940948588,"promptChars":11534,"promptTokensEst":2884,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":88517,"totalTokens":11280,"avgTokPerSec":136.63597159351744,"promptChars":10568,"promptTokensEst":2642,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":9,"testsFailed":3,"totalDurationMs":87817,"totalTokens":11171,"avgTokPerSec":136.1538785139482,"promptChars":11627,"promptTokensEst":2907,"score":65,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if they are missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
117
kipina-codebench/results/2026-04-14T11-15.json
Normal file
@@ -0,0 +1,117 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 11,
    "testsPassed": 9,
    "testsFailed": 2,
    "totalDurationMs": 75178,
    "totalTokens": 9916,
    "avgTokPerSec": 142.94675043471062,
    "promptChars": 10516,
    "promptTokensEst": 2629,
    "score": 69,
    "stars": "★★★☆☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 1,
    "fixRounds": 5,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 98787,
    "totalTokens": 12904,
    "avgTokPerSec": 141.16873850064812,
    "promptChars": 11810,
    "promptTokensEst": 2953,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 81763,
    "totalTokens": 10277,
    "avgTokPerSec": 134.82946940948588,
    "promptChars": 11534,
    "promptTokensEst": 2884,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": "Syntaksivirhe",
    "profile": "small",
    "promptName": "code-small",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 88517,
    "totalTokens": 11280,
    "avgTokPerSec": 136.63597159351744,
    "promptChars": 10568,
    "promptTokensEst": 2642,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": "Syntaksivirhe",
    "profile": "small",
    "promptName": "code-small",
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 9,
    "testsFailed": 3,
    "totalDurationMs": 87817,
    "totalTokens": 11171,
    "avgTokPerSec": 136.1538785139482,
    "promptChars": 11627,
    "promptTokensEst": 2907,
    "score": 65,
    "stars": "★★★☆☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T11-54.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":79193,"totalTokens":10304,"avgTokPerSec":141.2083113764173,"promptChars":12199,"promptTokensEst":3050,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":10,"testsPassed":6,"testsFailed":4,"totalDurationMs":66764,"totalTokens":8896,"avgTokPerSec":142.57944640796882,"promptChars":12391,"promptTokensEst":3098,"score":56,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":76403,"totalTokens":9962,"avgTokPerSec":137.0023398819064,"promptChars":12432,"promptTokensEst":3108,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":13,"testsPassed":7,"testsFailed":6,"totalDurationMs":81345,"totalTokens":10535,"avgTokPerSec":139.42076339875726,"promptChars":11419,"promptTokensEst":2855,"score":52,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":72723,"totalTokens":9567,"avgTokPerSec":141.2709378394512,"promptChars":11416,"promptTokensEst":2854,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if they are missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
117
kipina-codebench/results/2026-04-14T11-54.json
Normal file
@@ -0,0 +1,117 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 79193,
    "totalTokens": 10304,
    "avgTokPerSec": 141.2083113764173,
    "promptChars": 12199,
    "promptTokensEst": 3050,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "small",
    "promptName": "code-small",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 10,
    "testsPassed": 6,
    "testsFailed": 4,
    "totalDurationMs": 66764,
    "totalTokens": 8896,
    "avgTokPerSec": 142.57944640796882,
    "promptChars": 12391,
    "promptTokensEst": 3098,
    "score": 56,
    "stars": "★★★☆☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 76403,
    "totalTokens": 9962,
    "avgTokPerSec": 137.0023398819064,
    "promptChars": 12432,
    "promptTokensEst": 3108,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": "Syntaksivirhe",
    "profile": "small",
    "promptName": "code-small",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 13,
    "testsPassed": 7,
    "testsFailed": 6,
    "totalDurationMs": 81345,
    "totalTokens": 10535,
    "avgTokPerSec": 139.42076339875726,
    "promptChars": 11419,
    "promptTokensEst": 2855,
    "score": 52,
    "stars": "★★★☆☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 11,
    "testsFailed": 1,
    "totalDurationMs": 72723,
    "totalTokens": 9567,
    "avgTokPerSec": 141.2709378394512,
    "promptChars": 11416,
    "promptTokensEst": 2854,
    "score": 75,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T11-55.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":56798,"totalTokens":5105,"avgTokPerSec":99.4097006568848,"promptChars":11326,"promptTokensEst":2832,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":114297,"totalTokens":10163,"avgTokPerSec":97.19131591932717,"promptChars":12182,"promptTokensEst":3046,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":112008,"totalTokens":9892,"avgTokPerSec":97.0586619009377,"promptChars":12406,"promptTokensEst":3102,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];
|
||||
|
||||
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  // A run that errored before any test executed scores 0.
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  // 10 points for a valid spec.
  if (r.specOk) s += 10;
  // 10 points when the run finished without error or at least reached the tests.
  if (!r.error || r.testsTotal > 0) s += 10;
  // Up to 60 points in proportion to the share of passing tests.
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  // Up to 20 points, minus 10 per fix round (nothing left after two rounds).
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
|
||||
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
|
||||
const pctBar = (passed, total, w=80) => {
|
||||
if (total === 0) return '-';
|
||||
const pct = passed/total*100;
|
||||
const c = pct === 100 ? 'bar-fill' : 'bar-partial';
|
||||
return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
|
||||
};
|
||||
|
||||
// Meta
|
||||
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
|
||||
|
||||
// Cards
|
||||
const models = [...new Set(DATA.map(r => r.model))];
|
||||
const scenarios = [...new Set(DATA.map(r => r.scenario))];
|
||||
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
|
||||
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
|
||||
const bestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.avg - a.avg)[0];
|
||||
const fastestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.speed - a.speed)[0];
|
||||
|
||||
document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;
|
||||
|
||||
// Summary table
|
||||
const sumHead = document.querySelector('#summary-table thead');
|
||||
const sumBody = document.querySelector('#summary-table tbody');
|
||||
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
|
||||
|
||||
const modelRows = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
|
||||
const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
|
||||
const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
|
||||
const scenCols = scenarios.map(s => {
|
||||
const r = mrs.find(r => r.scenario === s);
|
||||
if (!r) return '<td>-</td>';
|
||||
return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
|
||||
}).join('');
|
||||
return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
|
||||
}).sort((a,b) => b.avg - a.avg);
|
||||
sumBody.innerHTML = modelRows.map(r => r.html).join('');
|
||||
|
||||
// Results table
|
||||
const resHead = document.querySelector('#results-table thead');
|
||||
const resBody = document.querySelector('#results-table tbody');
|
||||
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
|
||||
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
|
||||
|
||||
let sortCol = 9, sortAsc = false;
|
||||
function renderResults() {
|
||||
const sorted = [...DATA].sort((a,b) => {
|
||||
const vals = [
|
||||
[a.model, b.model],
|
||||
[a.scenario, b.scenario],
|
||||
[a.specEntities, b.specEntities],
|
||||
[a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
|
||||
[a.fixRounds, b.fixRounds],
|
||||
[a.promptTokensEst, b.promptTokensEst],
|
||||
[a.totalTokens, b.totalTokens],
|
||||
[a.totalDurationMs, b.totalDurationMs],
|
||||
[a.avgTokPerSec, b.avgTokPerSec],
|
||||
[a.score, b.score],
|
||||
][sortCol];
|
||||
const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
|
||||
return sortAsc ? cmp : -cmp;
|
||||
});
|
||||
resBody.innerHTML = sorted.map(r => {
|
||||
const c = cls(r);
|
||||
return `<tr>
|
||||
<td class="model-name">${r.model}</td>
|
||||
<td>${r.scenario}</td>
|
||||
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
|
||||
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
|
||||
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
|
||||
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
|
||||
<td>${r.avgTokPerSec.toFixed(0)}</td>
|
||||
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
|
||||
</tr>`;
|
||||
}).join('');
|
||||
document.querySelectorAll('#results-table th').forEach((th,i) => {
|
||||
th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
|
||||
});
|
||||
}
|
||||
document.querySelector('#results-table thead').addEventListener('click', e => {
  // Parse the column index explicitly as base 10.
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
|
||||
renderResults();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
113
kipina-codebench/results/2026-04-14T11-55.json
Normal file
@@ -0,0 +1,113 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 56798,
    "totalTokens": 5105,
    "avgTokPerSec": 99.4097006568848,
    "promptChars": 11326,
    "promptTokensEst": 2832,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 114297,
    "totalTokens": 10163,
    "avgTokPerSec": 97.19131591932717,
    "promptChars": 12182,
    "promptTokensEst": 3046,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "small",
    "promptName": "code-small",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 12,
    "testsPassed": 11,
    "testsFailed": 1,
    "totalDurationMs": 112008,
    "totalTokens": 9892,
    "avgTokPerSec": 97.0586619009377,
    "promptChars": 12406,
    "promptTokensEst": 3102,
    "score": 75,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 5
  }
]
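The `score` values recorded in the JSON above look consistent with the `calcScore` function embedded in the HTML report's inline script. A minimal sketch that reuses that function and applies it to two of the records follows. Only the fields the function reads are copied, the variable names `round2` and `round5` are illustrative, and the comments on what each term rewards are an interpretation rather than something the source states.

```javascript
// calcScore as it appears in the report's inline script.
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;                          // spec was produced
  if (!r.error || r.testsTotal > 0) s += 10;      // no error, or tests were at least reached
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // bonus shrinks by 10 per fix round
  return Math.min(100, s);
}

// Round 2: spec succeeded, but the tests crashed before any ran.
const round2 = { specOk: true, error: 'Testit kaatuivat', testsTotal: 0, testsPassed: 0, fixRounds: 3 };
// Round 5: 11 of 12 tests passed after three fix rounds.
const round5 = { specOk: true, error: null, testsTotal: 12, testsPassed: 11, fixRounds: 3 };

const score2 = calcScore(round2); // early return: error with no tests
const score5 = calcScore(round5); // 10 + 10 + 55 + 0
console.log(score2, score5);
```

With three fix rounds the last term is `Math.max(0, 20 - 30)`, so it contributes nothing, which is why round 5 lands at 75 rather than 95.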
183
kipina-codebench/results/2026-04-14T12-01.html
Normal file
@@ -0,0 +1,183 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="fi">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Kipina Model Benchmark</title>
|
||||
<style>
|
||||
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
|
||||
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
|
||||
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
|
||||
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
|
||||
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
|
||||
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
|
||||
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
|
||||
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
|
||||
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
|
||||
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
|
||||
th:hover { color: var(--text); }
|
||||
th.sorted-asc::after { content: ' ▲'; }
|
||||
th.sorted-desc::after { content: ' ▼'; }
|
||||
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
|
||||
tr:hover td { background: #1c2128; }
|
||||
.pass { color: var(--green); }
|
||||
.partial { color: var(--yellow); }
|
||||
.fail { color: var(--red); }
|
||||
.stars { letter-spacing: 1px; }
|
||||
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
|
||||
.bar-bg { background: var(--border); }
|
||||
.bar-fill { background: var(--green); }
|
||||
.bar-partial { background: var(--yellow); }
|
||||
.model-name { font-weight: 600; }
|
||||
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
|
||||
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<h1>Kipina Model Benchmark</h1>
|
||||
<div class="meta" id="meta"></div>
|
||||
|
||||
<div class="cards" id="cards"></div>
|
||||
|
||||
<h2>Mallikohtainen yhteenveto</h2>
|
||||
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<h2>Kaikki tulokset</h2>
|
||||
<table id="results-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<script>
|
||||
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":143640,"totalTokens":12611,"avgTokPerSec":96.28061629672216,"promptChars":12125,"promptTokensEst":3031,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":12,"testsPassed":12,"testsFailed":0,"totalDurationMs":116061,"totalTokens":10181,"avgTokPerSec":96.63321228455318,"promptChars":12435,"promptTokensEst":3109,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":113792,"totalTokens":10022,"avgTokPerSec":96.96815077469971,"promptChars":12260,"promptTokensEst":3065,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];
|
||||
|
||||
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  // A run that errored before any test executed scores 0.
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  // 10 points for a valid spec.
  if (r.specOk) s += 10;
  // 10 points when the run finished without error or at least reached the tests.
  if (!r.error || r.testsTotal > 0) s += 10;
  // Up to 60 points in proportion to the share of passing tests.
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  // Up to 20 points, minus 10 per fix round (nothing left after two rounds).
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
|
||||
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
|
||||
const pctBar = (passed, total, w=80) => {
|
||||
if (total === 0) return '-';
|
||||
const pct = passed/total*100;
|
||||
const c = pct === 100 ? 'bar-fill' : 'bar-partial';
|
||||
return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
|
||||
};
|
||||
|
||||
// Meta
|
||||
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
|
||||
|
||||
// Cards
|
||||
const models = [...new Set(DATA.map(r => r.model))];
|
||||
const scenarios = [...new Set(DATA.map(r => r.scenario))];
|
||||
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
|
||||
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
|
||||
const bestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.avg - a.avg)[0];
|
||||
const fastestModel = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
|
||||
}).sort((a,b) => b.speed - a.speed)[0];
|
||||
|
||||
document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;
|
||||
|
||||
// Summary table
|
||||
const sumHead = document.querySelector('#summary-table thead');
|
||||
const sumBody = document.querySelector('#summary-table tbody');
|
||||
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
|
||||
|
||||
const modelRows = models.map(m => {
|
||||
const mrs = DATA.filter(r => r.model === m);
|
||||
const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
|
||||
const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
|
||||
const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
|
||||
const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
|
||||
const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
|
||||
const scenCols = scenarios.map(s => {
|
||||
const r = mrs.find(r => r.scenario === s);
|
||||
if (!r) return '<td>-</td>';
|
||||
return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
|
||||
}).join('');
|
||||
return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
|
||||
}).sort((a,b) => b.avg - a.avg);
|
||||
sumBody.innerHTML = modelRows.map(r => r.html).join('');
|
||||
|
||||
// Results table
|
||||
const resHead = document.querySelector('#results-table thead');
|
||||
const resBody = document.querySelector('#results-table tbody');
|
||||
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
|
||||
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
|
||||
|
||||
let sortCol = 9, sortAsc = false;
|
||||
function renderResults() {
|
||||
const sorted = [...DATA].sort((a,b) => {
|
||||
const vals = [
|
||||
[a.model, b.model],
|
||||
[a.scenario, b.scenario],
|
||||
[a.specEntities, b.specEntities],
|
||||
[a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
|
||||
[a.fixRounds, b.fixRounds],
|
||||
[a.promptTokensEst, b.promptTokensEst],
|
||||
[a.totalTokens, b.totalTokens],
|
||||
[a.totalDurationMs, b.totalDurationMs],
|
||||
[a.avgTokPerSec, b.avgTokPerSec],
|
||||
[a.score, b.score],
|
||||
][sortCol];
|
||||
const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
|
||||
return sortAsc ? cmp : -cmp;
|
||||
});
|
||||
resBody.innerHTML = sorted.map(r => {
|
||||
const c = cls(r);
|
||||
return `<tr>
|
||||
<td class="model-name">${r.model}</td>
|
||||
<td>${r.scenario}</td>
|
||||
<td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
|
||||
<td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
|
||||
<td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
|
||||
<td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
|
||||
<td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
|
||||
<td>${r.avgTokPerSec.toFixed(0)}</td>
|
||||
<td><span class="stars">${r.stars}</span> ${r.score}p</td>
|
||||
</tr>`;
|
||||
}).join('');
|
||||
document.querySelectorAll('#results-table th').forEach((th,i) => {
|
||||
th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
|
||||
});
|
||||
}
|
||||
document.querySelector('#results-table thead').addEventListener('click', e => {
  // Parse the column index explicitly as base 10.
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
|
||||
renderResults();
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
113
kipina-codebench/results/2026-04-14T12-01.json
Normal file
@@ -0,0 +1,113 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 143640,
    "totalTokens": 12611,
    "avgTokPerSec": 96.28061629672216,
    "promptChars": 12125,
    "promptTokensEst": 3031,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 2,
    "testsTotal": 12,
    "testsPassed": 12,
    "testsFailed": 0,
    "totalDurationMs": 116061,
    "totalTokens": 10181,
    "avgTokPerSec": 96.63321228455318,
    "promptChars": 12435,
    "promptTokensEst": 3109,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 2,
    "testsTotal": 11,
    "testsPassed": 11,
    "testsFailed": 0,
    "totalDurationMs": 113792,
    "totalTokens": 10022,
    "avgTokPerSec": 96.96815077469971,
    "promptChars": 12260,
    "promptTokensEst": 3065,
    "score": 80,
    "stars": "★★★★☆",
    "error": null,
    "profile": "small",
    "promptName": "code-small",
    "round": 5
  }
]
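The headline cards in the matching HTML report are derived from these five records. The sketch below reproduces the average-score and pass-rate arithmetic from the report's inline script on a trimmed copy of the data. The `runs` array is a hand-copied subset of the fields above, introduced here for illustration only.

```javascript
// Trimmed copy of the five records above: only the fields the aggregates read.
const runs = [
  { score: 0,  testsPassed: 0,  testsTotal: 0 },
  { score: 80, testsPassed: 11, testsTotal: 11 },
  { score: 80, testsPassed: 12, testsTotal: 12 },
  { score: 0,  testsPassed: 0,  testsTotal: 0 },
  { score: 80, testsPassed: 11, testsTotal: 11 },
];

// Same star thresholds as the report script.
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';

// Mean score over all runs, including the ones that failed at the spec stage.
const avgScore = runs.length ? Math.round(runs.reduce((s, r) => s + r.score, 0) / runs.length) : 0;

// Pass rate over executed tests only; failed runs contribute 0 tests.
const totalPassed = runs.reduce((s, r) => s + r.testsPassed, 0);
const totalTests = runs.reduce((s, r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed / totalTests * 100) : 0;

console.log(avgScore, starsFor(avgScore), `${totalPassed}/${totalTests}`, `${passRate}%`);
```

The two metrics diverge on this data: every test that ran passed (34/34), yet the two runs that failed at the spec stage pull the average score down to 48.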
183
kipina-codebench/results/2026-04-14T13-11.html
Normal file
@@ -0,0 +1,183 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="fi">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Kipina Model Benchmark</title>
|
||||
<style>
|
||||
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
|
||||
* { box-sizing: border-box; margin: 0; padding: 0; }
|
||||
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
|
||||
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
|
||||
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
|
||||
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
|
||||
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
|
||||
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
|
||||
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
|
||||
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
|
||||
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
|
||||
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
|
||||
th:hover { color: var(--text); }
|
||||
th.sorted-asc::after { content: ' ▲'; }
|
||||
th.sorted-desc::after { content: ' ▼'; }
|
||||
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
|
||||
tr:hover td { background: #1c2128; }
|
||||
.pass { color: var(--green); }
|
||||
.partial { color: var(--yellow); }
|
||||
.fail { color: var(--red); }
|
||||
.stars { letter-spacing: 1px; }
|
||||
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
|
||||
.bar-bg { background: var(--border); }
|
||||
.bar-fill { background: var(--green); }
|
||||
.bar-partial { background: var(--yellow); }
|
||||
.model-name { font-weight: 600; }
|
||||
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
|
||||
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<h1>Kipina Model Benchmark</h1>
|
||||
<div class="meta" id="meta"></div>
|
||||
|
||||
<div class="cards" id="cards"></div>
|
||||
|
||||
<h2>Mallikohtainen yhteenveto</h2>
|
||||
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<h2>Kaikki tulokset</h2>
|
||||
<table id="results-table"><thead></thead><tbody></tbody></table>
|
||||
|
||||
<script>
const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":10508,"promptTokensEst":2627,"score":0,"stars":"","error":"Puuttuvat: Cargo.toml, src/models.rs, src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
107
kipina-codebench/results/2026-04-14T13-11.json
Normal file
@@ -0,0 +1,107 @@
[
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 1,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 10508,
    "promptTokensEst": 2627,
    "score": 0,
    "stars": "",
    "error": "Puuttuvat: Cargo.toml, src/models.rs, src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs",
    "round": 1
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 2
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 3
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 4
  },
  {
    "model": "qwen3:8b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": false,
    "specEntities": 0,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 0,
    "promptTokensEst": 0,
    "score": 0,
    "stars": "",
    "error": "JSON-speksi epäonnistui",
    "round": 5
  }
]
183
kipina-codebench/results/2026-04-14T13-12.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":3,"testsPassed":0,"testsFailed":3,"totalDurationMs":217110,"totalTokens":21602,"avgTokPerSec":114.70956637458333,"promptChars":12612,"promptTokensEst":3153,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":3,"testsPassed":0,"testsFailed":3,"totalDurationMs":204772,"totalTokens":20717,"avgTokPerSec":114.45999021594592,"promptChars":12743,"promptTokensEst":3186,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":4,"testsPassed":0,"testsFailed":4,"totalDurationMs":180501,"totalTokens":18467,"avgTokPerSec":115.23583963958032,"promptChars":12392,"promptTokensEst":3098,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":25,"testsPassed":0,"testsFailed":25,"totalDurationMs":282681,"totalTokens":27665,"avgTokPerSec":111.29688837623901,"promptChars":12675,"promptTokensEst":3169,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":5,"testsPassed":0,"testsFailed":5,"totalDurationMs":171686,"totalTokens":17525,"avgTokPerSec":114.88288274375243,"promptChars":12618,"promptTokensEst":3155,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Laske pisteet jos puuttuvat
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
117
kipina-codebench/results/2026-04-14T13-12.json
Normal file
@@ -0,0 +1,117 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 3,
    "testsPassed": 0,
    "testsFailed": 3,
    "totalDurationMs": 217110,
    "totalTokens": 21602,
    "avgTokPerSec": 114.70956637458333,
    "promptChars": 12612,
    "promptTokensEst": 3153,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 1
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 3,
    "testsPassed": 0,
    "testsFailed": 3,
    "totalDurationMs": 204772,
    "totalTokens": 20717,
    "avgTokPerSec": 114.45999021594592,
    "promptChars": 12743,
    "promptTokensEst": 3186,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 4,
    "testsPassed": 0,
    "testsFailed": 4,
    "totalDurationMs": 180501,
    "totalTokens": 18467,
    "avgTokPerSec": 115.23583963958032,
    "promptChars": 12392,
    "promptTokensEst": 3098,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 25,
    "testsPassed": 0,
    "testsFailed": 25,
    "totalDurationMs": 282681,
    "totalTokens": 27665,
    "avgTokPerSec": 111.29688837623901,
    "promptChars": 12675,
    "promptTokensEst": 3169,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 5,
    "testsPassed": 0,
    "testsFailed": 5,
    "totalDurationMs": 171686,
    "totalTokens": 17525,
    "avgTokPerSec": 114.88288274375243,
    "promptChars": 12618,
    "promptTokensEst": 3155,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 5
  }
]
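The stored `score` of 20 in each record of this file can be cross-checked against the `calcScore` function that ships in the accompanying HTML report. The following is a minimal sketch, assuming the report's scoring rule is what produced the stored values (the diff does not confirm which tool wrote them); `round1` keeps only the fields the function reads, and `hypothetical` is an invented record used purely for contrast.

```javascript
// Scoring rule as it appears in the generated report script.
function calcScore(r) {
  // A run that errored before any test executed scores zero.
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;                          // spec was produced
  if (!r.error || r.testsTotal > 0) s += 10;      // run reached the test stage
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // bonus shrinks per fix round
  return Math.min(100, s);
}

// Round 1 of this file, reduced to the fields calcScore reads.
const round1 = { specOk: true, error: null, testsTotal: 3, testsPassed: 0, fixRounds: 3 };

// Hypothetical record (not in the data): all tests pass with no fix rounds.
const hypothetical = { ...round1, testsPassed: 3, fixRounds: 0 };

console.log(calcScore(round1));       // 10 + 10 + 0 + 0
console.log(calcScore(hypothetical)); // 10 + 10 + 60 + 20
```

Under this rule a run with three fix rounds and no passing tests can only collect the two 10-point credits, which is consistent with every record above.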
183
kipina-codebench/results/2026-04-14T13-42.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
|
||||
const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":18,"testsPassed":0,"testsFailed":18,"totalDurationMs":208078,"totalTokens":20783,"avgTokPerSec":114.94478559756693,"promptChars":13278,"promptTokensEst":3320,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13362,"promptTokensEst":3341,"score":0,"stars":"","error":"Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":9,"testsPassed":0,"testsFailed":9,"totalDurationMs":221174,"totalTokens":22354,"avgTokPerSec":114.09551344946065,"promptChars":13234,"promptTokensEst":3309,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13317,"promptTokensEst":3329,"score":0,"stars":"","error":"Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":8795,"totalTokens":954,"avgTokPerSec":124.86009274372915,"promptChars":13335,"promptTokensEst":3334,"score":0,"stars":"☆☆☆☆☆","error":"fetch failed","profile":"large","promptName":"code-rs","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
// Score 0-100: 10 for a valid spec, 10 for a run with no error (or one that still ran tests),
// up to 60 for the share of passing tests, and 20 minus 10 per fix round (never below 0).
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute the score if it is missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  // Rough estimate: about 4 characters per token
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
// Row class: 'pass' when every test passed without error, 'partial' when some passed, else 'fail'
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
// Inline progress bar of width w px followed by "passed/total"; '-' when no tests ran
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
113
kipina-codebench/results/2026-04-14T13-42.json
Normal file
@@ -0,0 +1,113 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 18,
    "testsPassed": 0,
    "testsFailed": 18,
    "totalDurationMs": 208078,
    "totalTokens": 20783,
    "avgTokPerSec": 114.94478559756693,
    "promptChars": 13278,
    "promptTokensEst": 3320,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 1
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 13362,
    "promptTokensEst": 3341,
    "score": 0,
    "stars": "",
    "error": "Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 9,
    "testsPassed": 0,
    "testsFailed": 9,
    "totalDurationMs": 221174,
    "totalTokens": 22354,
    "avgTokPerSec": 114.09551344946065,
    "promptChars": 13234,
    "promptTokensEst": 3309,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 13317,
    "promptTokensEst": 3329,
    "score": 0,
    "stars": "",
    "error": "Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 8795,
    "totalTokens": 954,
    "avgTokPerSec": 124.86009274372915,
    "promptChars": 13335,
    "promptTokensEst": 3334,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "fetch failed",
    "profile": "large",
    "promptName": "code-rs",
    "round": 5
  }
]
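The stored `score` values in this results file can be cross-checked against the `calcScore` function embedded in the report pages. The following is a minimal sketch, assuming the scoring rules are exactly those shown in the page script and that only the listed fields influence the score; the `runs` array is a trimmed copy of the five records above, and the `recomputed` and `mismatches` names are introduced here for illustration only.

```javascript
// Scoring rules copied from calcScore() in the report page script.
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;      // error before any test ran
  let s = 0;
  if (r.specOk) s += 10;                            // spec accepted
  if (!r.error || r.testsTotal > 0) s += 10;        // no error, or tests still ran
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);   // 20, 10 or 0 points
  return Math.min(100, s);
}

// Trimmed copies of the records in 2026-04-14T13-42.json
// (assumption: the omitted fields do not affect the score).
const missing = 'Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs';
const runs = [
  { round: 1, specOk: true, fixRounds: 3, testsTotal: 18, testsPassed: 0, error: null, score: 20 },
  { round: 2, specOk: true, fixRounds: 0, testsTotal: 0, testsPassed: 0, error: missing, score: 0 },
  { round: 3, specOk: true, fixRounds: 3, testsTotal: 9, testsPassed: 0, error: null, score: 20 },
  { round: 4, specOk: true, fixRounds: 0, testsTotal: 0, testsPassed: 0, error: missing, score: 0 },
  { round: 5, specOk: true, fixRounds: 0, testsTotal: 0, testsPassed: 0, error: 'fetch failed', score: 0 },
];

// Recompute every score and collect the rounds whose stored value differs.
const recomputed = runs.map(r => calcScore(r));
const mismatches = runs.filter((r, i) => recomputed[i] !== r.score).map(r => r.round);

console.log(recomputed); // [ 20, 0, 20, 0, 0 ]
console.log(mismatches); // []
```

Rounds 1 and 3 earn 10 points for the spec and 10 for completing without an error, but nothing for tests or fix rounds; the three failed rounds short-circuit to 0.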
183
kipina-codebench/results/2026-04-14T14-12.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":133173,"totalTokens":13174,"avgTokPerSec":117.52479437665707,"promptChars":14102,"promptTokensEst":3526,"score":30,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":5,"testsPassed":0,"testsFailed":5,"totalDurationMs":267561,"totalTokens":27021,"avgTokPerSec":113.5812238661422,"promptChars":14052,"promptTokensEst":3513,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13914,"promptTokensEst":3479,"score":0,"stars":"","error":"Puuttuvat: src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":162271,"totalTokens":16343,"avgTokPerSec":115.53039090208604,"promptChars":14062,"promptTokensEst":3516,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":211367,"totalTokens":21183,"avgTokPerSec":113.22772767359652,"promptChars":14038,"promptTokensEst":3510,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
// Score 0-100: 10 for a valid spec, 10 for a run with no error (or one that still ran tests),
// up to 60 for the share of passing tests, and 20 minus 10 per fix round (never below 0).
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute the score if it is missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  // Rough estimate: about 4 characters per token
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
// Row class: 'pass' when every test passed without error, 'partial' when some passed, else 'fail'
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
// Inline progress bar of width w px followed by "passed/total"; '-' when no tests ran
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
115
kipina-codebench/results/2026-04-14T14-12.json
Normal file
@@ -0,0 +1,115 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 133173,
    "totalTokens": 13174,
    "avgTokPerSec": 117.52479437665707,
    "promptChars": 14102,
    "promptTokensEst": 3526,
    "score": 30,
    "stars": "★★☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 1
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 5,
    "testsPassed": 0,
    "testsFailed": 5,
    "totalDurationMs": 267561,
    "totalTokens": 27021,
    "avgTokPerSec": 113.5812238661422,
    "promptChars": 14052,
    "promptTokensEst": 3513,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 0,
    "totalTokens": 0,
    "avgTokPerSec": 0,
    "promptChars": 13914,
    "promptTokensEst": 3479,
    "score": 0,
    "stars": "",
    "error": "Puuttuvat: src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 2,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 162271,
    "totalTokens": 16343,
    "avgTokPerSec": 115.53039090208604,
    "promptChars": 14062,
    "promptTokensEst": 3516,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 211367,
    "totalTokens": 21183,
    "avgTokPerSec": 113.22772767359652,
    "promptChars": 14038,
    "promptTokensEst": 3510,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 5
  }
]
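The headline cards on the report page are plain aggregates over these records. The sketch below reproduces three of them for this file, assuming the same formulas as the page script (mean score, pooled pass rate, mean of per-run `avgTokPerSec`). The `runs` array is a trimmed copy of the records above; the `sum` helper and the `speedTimed` variant are hypothetical additions for comparison and are not part of the report page.

```javascript
// Star thresholds copied from the report page script.
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';

// Trimmed copies of the records in 2026-04-14T14-12.json.
const runs = [
  { round: 1, testsTotal: 0, testsPassed: 0, avgTokPerSec: 117.52479437665707, score: 30 },
  { round: 2, testsTotal: 5, testsPassed: 0, avgTokPerSec: 113.5812238661422, score: 20 },
  { round: 3, testsTotal: 0, testsPassed: 0, avgTokPerSec: 0, score: 0 },
  { round: 4, testsTotal: 0, testsPassed: 0, avgTokPerSec: 115.53039090208604, score: 20 },
  { round: 5, testsTotal: 1, testsPassed: 0, avgTokPerSec: 113.22772767359652, score: 20 },
];

// Hypothetical helper: sum a numeric projection over an array.
const sum = (xs, f) => xs.reduce((s, x) => s + f(x), 0);

// Same formulas as the "Keskiarvo", "Testien läpäisy" and "Nopein" cards.
const avgScore = Math.round(sum(runs, r => r.score) / runs.length);
const totalTests = sum(runs, r => r.testsTotal);
const totalPassed = sum(runs, r => r.testsPassed);
const passRate = totalTests ? Math.round(totalPassed / totalTests * 100) : 0;
const speedAll = Math.round(sum(runs, r => r.avgTokPerSec) / runs.length);

// Hypothetical variant (not in the page): skip runs that produced no tokens.
const timed = runs.filter(r => r.avgTokPerSec > 0);
const speedTimed = timed.length ? Math.round(sum(timed, r => r.avgTokPerSec) / timed.length) : 0;

console.log(avgScore, starsFor(avgScore));   // 18 ★☆☆☆☆
console.log(`${totalPassed}/${totalTests}`, passRate + '%'); // 0/6 0%
console.log(speedAll, speedTimed);           // 92 115
```

The gap between 92 and 115 tok/s comes from round 3, which failed before generating anything and is recorded with a speed of 0; whether such runs should count toward the speed average is a design decision the page currently answers with "yes".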
183
kipina-codebench/results/2026-04-14T14-38.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
:root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
.meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
.cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
.card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
.card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
.card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
.card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
th:hover { color: var(--text); }
th.sorted-asc::after { content: ' ▲'; }
th.sorted-desc::after { content: ' ▼'; }
td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
tr:hover td { background: #1c2128; }
.pass { color: var(--green); }
.partial { color: var(--yellow); }
.fail { color: var(--red); }
.stars { letter-spacing: 1px; }
.bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
.bar-bg { background: var(--border); }
.bar-fill { background: var(--green); }
.bar-partial { background: var(--yellow); }
.model-name { font-weight: 600; }
h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
.summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":38807,"totalTokens":5667,"avgTokPerSec":183.83891911423427,"promptChars":21818,"promptTokensEst":5455,"score":40,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":178290,"totalTokens":26265,"avgTokPerSec":168.77786498646262,"promptChars":21840,"promptTokensEst":5460,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":151603,"totalTokens":22725,"avgTokPerSec":170.74115131582644,"promptChars":21750,"promptTokensEst":5438,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":41059,"totalTokens":6288,"avgTokPerSec":183.76827829344424,"promptChars":21848,"promptTokensEst":5462,"score":40,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":187666,"totalTokens":27278,"avgTokPerSec":166.24197655672018,"promptChars":21694,"promptTokensEst":5424,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
// Score 0-100: 10 for a valid spec, 10 for a run with no error (or one that still ran tests),
// up to 60 for the share of passing tests, and 20 minus 10 per fix round (never below 0).
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute the score if it is missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  // Rough estimate: about 4 characters per token
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
// Row class: 'pass' when every test passed without error, 'partial' when some passed, else 'fail'
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
// Inline progress bar of width w px followed by "passed/total"; '-' when no tests ran
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
|
||||
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
|
||||
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
|
||||

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  // Explicit radix so the column index is always parsed as decimal
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
117
kipina-codebench/results/2026-04-14T14-38.json
Normal file
@@ -0,0 +1,117 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 38807,
    "totalTokens": 5667,
    "avgTokPerSec": 183.83891911423427,
    "promptChars": 21818,
    "promptTokensEst": 5455,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 1
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 178290,
    "totalTokens": 26265,
    "avgTokPerSec": 168.77786498646262,
    "promptChars": 21840,
    "promptTokensEst": 5460,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "large",
    "promptName": "code-rs",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 151603,
    "totalTokens": 22725,
    "avgTokPerSec": 170.74115131582644,
    "promptChars": 21750,
    "promptTokensEst": 5438,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "large",
    "promptName": "code-rs",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 0,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 41059,
    "totalTokens": 6288,
    "avgTokPerSec": 183.76827829344424,
    "promptChars": 21848,
    "promptTokensEst": 5462,
    "score": 40,
    "stars": "★★☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 3,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 187666,
    "totalTokens": 27278,
    "avgTokPerSec": 166.24197655672018,
    "promptChars": 21694,
    "promptTokensEst": 5424,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "large",
    "promptName": "code-rs",
    "round": 5
  }
]
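The headline cards in the report are plain aggregates over these records. As a rough sketch, the snippet below recomputes three of them for the five runs in 2026-04-14T14-38.json, using the same rounding as the report script. The `sum` helper and the trimmed-down `runs` array are illustrative assumptions and are not part of the committed files; only the three fields the aggregates read are copied over.

```javascript
// Star thresholds copied from starsFor in the report script.
const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';

// score, totalDurationMs and avgTokPerSec of the five runs in 2026-04-14T14-38.json.
const runs = [
  { score: 40, totalDurationMs: 38807,  avgTokPerSec: 183.83891911423427 },
  { score: 0,  totalDurationMs: 178290, avgTokPerSec: 168.77786498646262 },
  { score: 0,  totalDurationMs: 151603, avgTokPerSec: 170.74115131582644 },
  { score: 40, totalDurationMs: 41059,  avgTokPerSec: 183.76827829344424 },
  { score: 0,  totalDurationMs: 187666, avgTokPerSec: 166.24197655672018 },
];

// Hypothetical helper: sum a numeric projection over an array.
const sum = (xs, f) => xs.reduce((acc, x) => acc + f(x), 0);

// Same formulas as the report: rounded mean score, rounded mean tok/s,
// and total wall time in minutes with one decimal.
const avgScore = Math.round(sum(runs, r => r.score) / runs.length);
const avgSpeed = Math.round(sum(runs, r => r.avgTokPerSec) / runs.length);
const totalMinutes = (sum(runs, r => r.totalDurationMs) / 1000 / 60).toFixed(1);

console.log(`${starsFor(avgScore)} ${avgScore}p, ${avgSpeed} tok/s, ${totalMinutes} min`);
```

For this file the sketch gives an average of 16 points (one star), about 175 tok/s, and 10.0 minutes in total. The speed figure is an unweighted mean of per-run rates, as in the report, so short and long runs count equally.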
183
kipina-codebench/results/2026-04-14T14-52.html
Normal file
@@ -0,0 +1,183 @@
<!DOCTYPE html>
<html lang="fi">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Kipina Model Benchmark</title>
<style>
  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
  th:hover { color: var(--text); }
  th.sorted-asc::after { content: ' ▲'; }
  th.sorted-desc::after { content: ' ▼'; }
  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
  tr:hover td { background: #1c2128; }
  .pass { color: var(--green); }
  .partial { color: var(--yellow); }
  .fail { color: var(--red); }
  .stars { letter-spacing: 1px; }
  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
  .bar-bg { background: var(--border); }
  .bar-fill { background: var(--green); }
  .bar-partial { background: var(--yellow); }
  .model-name { font-weight: 600; }
  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
</style>
</head>
<body>

<h1>Kipina Model Benchmark</h1>
<div class="meta" id="meta"></div>

<div class="cards" id="cards"></div>

<h2>Mallikohtainen yhteenveto</h2>
<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>

<h2>Kaikki tulokset</h2>
<table id="results-table"><thead></thead><tbody></tbody></table>

<script>
const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":4,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":231122,"totalTokens":22952,"avgTokPerSec":113.75113825466987,"promptChars":17604,"promptTokensEst":4401,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":5,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":260314,"totalTokens":26144,"avgTokPerSec":113.40388181735229,"promptChars":17539,"promptTokensEst":4385,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":4,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":227228,"totalTokens":22381,"avgTokPerSec":113.5362722539456,"promptChars":17630,"promptTokensEst":4408,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":102052,"totalTokens":9984,"avgTokPerSec":117.77973450501808,"promptChars":17571,"promptTokensEst":4393,"score":30,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":146321,"totalTokens":14445,"avgTokPerSec":115.61186488022163,"promptChars":17589,"promptTokensEst":4397,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];

const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}
// Compute scores if missing
const DATA = RAW.map(r => {
  if (r.score == null) r.score = calcScore(r);
  if (!r.stars) r.stars = starsFor(r.score);
  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
  return r;
});
const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
const pctBar = (passed, total, w=80) => {
  if (total === 0) return '-';
  const pct = passed/total*100;
  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
};

// Meta
const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;

// Cards
const models = [...new Set(DATA.map(r => r.model))];
const scenarios = [...new Set(DATA.map(r => r.scenario))];
const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
const bestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
}).sort((a,b) => b.avg - a.avg)[0];
const fastestModel = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
}).sort((a,b) => b.speed - a.speed)[0];

document.getElementById('cards').innerHTML = `
  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistetta</div></div>
  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
`;

// Summary table
const sumHead = document.querySelector('#summary-table thead');
const sumBody = document.querySelector('#summary-table tbody');
sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';

const modelRows = models.map(m => {
  const mrs = DATA.filter(r => r.model === m);
  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
  const scenCols = scenarios.map(s => {
    const r = mrs.find(r => r.scenario === s);
    if (!r) return '<td>-</td>';
    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
  }).join('');
  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
}).sort((a,b) => b.avg - a.avg);
sumBody.innerHTML = modelRows.map(r => r.html).join('');

// Results table
const resHead = document.querySelector('#results-table thead');
const resBody = document.querySelector('#results-table tbody');
const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';

let sortCol = 9, sortAsc = false;
function renderResults() {
  const sorted = [...DATA].sort((a,b) => {
    const vals = [
      [a.model, b.model],
      [a.scenario, b.scenario],
      [a.specEntities, b.specEntities],
      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
      [a.fixRounds, b.fixRounds],
      [a.promptTokensEst, b.promptTokensEst],
      [a.totalTokens, b.totalTokens],
      [a.totalDurationMs, b.totalDurationMs],
      [a.avgTokPerSec, b.avgTokPerSec],
      [a.score, b.score],
    ][sortCol];
    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
    return sortAsc ? cmp : -cmp;
  });
  resBody.innerHTML = sorted.map(r => {
    const c = cls(r);
    return `<tr>
      <td class="model-name">${r.model}</td>
      <td>${r.scenario}</td>
      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
      <td>${r.avgTokPerSec.toFixed(0)}</td>
      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
    </tr>`;
  }).join('');
  document.querySelectorAll('#results-table th').forEach((th,i) => {
    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
  });
}
document.querySelector('#results-table thead').addEventListener('click', e => {
  // Explicit radix so the column index is always parsed as decimal
  const col = parseInt(e.target.dataset.col, 10);
  if (isNaN(col)) return;
  if (sortCol === col) sortAsc = !sortAsc;
  else { sortCol = col; sortAsc = false; }
  renderResults();
});
renderResults();
</script>
</body>
</html>
117
kipina-codebench/results/2026-04-14T14-52.json
Normal file
@@ -0,0 +1,117 @@
[
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 4,
    "testsTotal": 1,
    "testsPassed": 0,
    "testsFailed": 1,
    "totalDurationMs": 231122,
    "totalTokens": 22952,
    "avgTokPerSec": 113.75113825466987,
    "promptChars": 17604,
    "promptTokensEst": 4401,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 1
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 5,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 260314,
    "totalTokens": 26144,
    "avgTokPerSec": 113.40388181735229,
    "promptChars": 17539,
    "promptTokensEst": 4385,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "large",
    "promptName": "code-rs",
    "round": 2
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 4,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 227228,
    "totalTokens": 22381,
    "avgTokPerSec": 113.5362722539456,
    "promptChars": 17630,
    "promptTokensEst": 4408,
    "score": 0,
    "stars": "☆☆☆☆☆",
    "error": "Testit kaatuivat",
    "profile": "large",
    "promptName": "code-rs",
    "round": 3
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 1,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 102052,
    "totalTokens": 9984,
    "avgTokPerSec": 117.77973450501808,
    "promptChars": 17571,
    "promptTokensEst": 4393,
    "score": 30,
    "stars": "★★☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 4
  },
  {
    "model": "qwen3-coder:30b",
    "scenario": "blog",
    "reqOk": true,
    "specOk": true,
    "specEntities": 2,
    "validationIssues": 0,
    "fixRounds": 2,
    "testsTotal": 0,
    "testsPassed": 0,
    "testsFailed": 0,
    "totalDurationMs": 146321,
    "totalTokens": 14445,
    "avgTokPerSec": 115.61186488022163,
    "promptChars": 17589,
    "promptTokensEst": 4397,
    "score": 20,
    "stars": "★☆☆☆☆",
    "error": null,
    "profile": "large",
    "promptName": "code-rs",
    "round": 5
  }
]
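The stored `score` values in this file appear consistent with the `calcScore` function embedded in the accompanying HTML report. As a minimal sketch of that check, the snippet below re-runs the same rule on the five records, keeping only the fields the function reads. The trimmed `runs` array and the `recomputed` and `mismatches` names are illustrative assumptions, not part of the committed files.

```javascript
// Scoring rule copied from calcScore in the report script:
// 10 points for a valid spec, 10 for finishing without a fatal error,
// up to 60 for the test pass ratio, and 20 minus 10 per fix round.
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;
  let s = 0;
  if (r.specOk) s += 10;
  if (!r.error || r.testsTotal > 0) s += 10;
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
  return Math.min(100, s);
}

// Only the fields calcScore reads, plus the stored score, from 2026-04-14T14-52.json.
const runs = [
  { round: 1, specOk: true, error: null,               testsTotal: 1, testsPassed: 0, fixRounds: 4, score: 20 },
  { round: 2, specOk: true, error: "Testit kaatuivat", testsTotal: 0, testsPassed: 0, fixRounds: 5, score: 0 },
  { round: 3, specOk: true, error: "Testit kaatuivat", testsTotal: 0, testsPassed: 0, fixRounds: 4, score: 0 },
  { round: 4, specOk: true, error: null,               testsTotal: 0, testsPassed: 0, fixRounds: 1, score: 30 },
  { round: 5, specOk: true, error: null,               testsTotal: 0, testsPassed: 0, fixRounds: 2, score: 20 },
];

// Recompute each score and collect the rounds where it differs from the stored value.
const recomputed = runs.map(calcScore);
const mismatches = runs.filter((r, i) => recomputed[i] !== r.score).map(r => r.round);

console.log(recomputed, mismatches);
```

One consequence of the rule is visible in rounds 4 and 5: a run that executed no tests still earns 20 to 30 points as long as no error is recorded, because the spec, completion, and fix-round terms do not depend on test results.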
Some files were not shown because too many files have changed in this diff.