Compare commits
7 commits — 20250813-W…main

| SHA1 |
|---|
| cadd23e554 |
| ad6df74ef4 |
| 9327bc48d8 |
| 508fafd0df |
| 1d50e7042e |
| 6a4e97f4e4 |
| 59e7e64af7 |
PMO/WP-17-kickoff.md (new file, 90 lines)
@@ -0,0 +1,90 @@
# WP-17 – Retriever & Composer (core, no LLM)

## Project Context

We are building deterministic plan generation from existing **plan_templates** and **exercises**.
WP-15 delivered the collections, indexes, and CRUD APIs for `plan_templates` and `plans` to production.
WP-02 provides the exercises collection with capabilities and the Qdrant integration.

**Technology stack:** Python 3.12, FastAPI, Qdrant

---

## Goals

Implement a `/plan/generate` endpoint that:

- combines filter and vector search in Qdrant
- scores results by coverage, diversity, and novelty
- generates plans deterministically and without an LLM
- respects time budgets and avoids repetition (novelty penalty)

---

## Deliverables

1. **API**: POST `/plan/generate`
   - Parameters: `discipline`, `age_group`, `target_group`, `goals`, `time_budget_minutes`, `novelty_horizon` (5), `coverage_threshold` (0.8), `strict_mode`
   - Returns: plan JSON with exercise references and metadata

2. **Retriever**
   - Filter layer (payload)
   - Vector layer (ranking)
   - Combined weighting

3. **Composer**
   - Build sections (from the template or a default)
   - Respect the time budget per section and overall
   - Strict mode: only valid `external_id` values

4. **Scoring functions**
   - Coverage (capability coverage)
   - Diversity (variability)
   - Novelty (newness relative to the plan history)

5. **Tests**
   - Unit tests (scoring, filters)
   - E2E: template → retriever → composer → persistence

6. **Documentation**
   - OpenAPI examples, parameterization, configuration options
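The three scoring signals named in deliverable 4 can be sketched as plain functions. This is a hedged illustration — the function names, signatures, and formulas are assumptions for discussion, not the actual WP-17 implementation:

```python
# Illustrative sketches of the three scoring signals (coverage, diversity,
# novelty); formulas are assumptions, not WP-17 code.

def coverage_score(selected_caps: set[str], required_caps: set[str]) -> float:
    """Fraction of the required capabilities covered by the selected exercises."""
    if not required_caps:
        return 1.0
    return len(selected_caps & required_caps) / len(required_caps)

def diversity_score(categories: list[str]) -> float:
    """Share of distinct categories among the selected exercises."""
    if not categories:
        return 0.0
    return len(set(categories)) / len(categories)

def novelty_penalty(external_id: str, history: list[str], horizon: int = 5) -> float:
    """1.0 if the exercise appeared within the last `horizon` plans, else 0.0."""
    return 1.0 if external_id in history[-horizon:] else 0.0
```

The real scorer may weight individual capabilities differently; the point here is only that each signal is a cheap, deterministic function of the candidate set.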
---

## Acceptance Criteria

- Identical inputs → identical plan (determinism)
- No duplicate exercises within a plan
- Budget and coverage targets met in ≥95 % of test cases
- The novelty penalty behaves as configured

---

## Risks

- Conflicts between budget, coverage, and novelty (prioritization required)
- Low exercise variety → limited results
- Performance degradation on large collections

---

## Technical Constraints

**Defaults:**
- `novelty_horizon`: 5
- `coverage_threshold`: 0.8
- Priority on conflict: 1. budget, 2. coverage, 3. novelty

**Required files:**
- `llm-api/plan_router.py` (v0.13.4)
- `llm-api/exercise_router.py` (from WP-02)
- `scripts/bootstrap_qdrant_plans.py` (v1.3.x)
- Schema definitions for `plan_templates` and `plans`
- Sample records (golden cases)
- `.env` (without secrets, with API URLs)
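The defaults and the conflict priority above (budget before coverage before novelty) plus the determinism requirement can be sketched as a greedy fill with a stable tie-break. This is an assumption about how the composer might work, not its actual code; the exercise dicts and field names mirror the deliverables:

```python
# Hedged sketch of a deterministic greedy composer honoring the priority
# order 1. budget, 2. coverage, 3. novelty. Illustrative only.

def compose(candidates: list[dict], budget_min: int, history: list[str],
            novelty_horizon: int = 5) -> list[dict]:
    recent = set(history[-novelty_horizon:])
    # Deterministic order: prefer exercises not seen recently (novelty), then a
    # stable tie-break on external_id so identical inputs yield identical plans.
    ordered = sorted(candidates,
                     key=lambda e: (e["external_id"] in recent, e["external_id"]))
    plan, used, seen = [], 0, set()
    for ex in ordered:
        if ex["external_id"] in seen:                    # no duplicate exercises
            continue
        if used + ex["duration_minutes"] > budget_min:   # budget has top priority
            continue
        plan.append(ex)
        used += ex["duration_minutes"]
        seen.add(ex["external_id"])
    return plan
```

Because the ordering is a total, input-independent sort, shuffling the candidate list does not change the output — which is exactly the determinism acceptance criterion.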
---

## Prompt for the Developer Team (ready to use)

> **Role:** Developer team WP-17 – Retriever & Composer (core, no LLM)
> **Task:** Implement `/plan/generate`, which deterministically generates plans from plan_templates and exercises.
> Use filter and vector search in Qdrant, the scoring functions (coverage, diversity, novelty), and composer logic that respects time budgets.
> **Parameters:** discipline, age_group, target_group, goals, time_budget_minutes, novelty_horizon=5, coverage_threshold=0.8, strict_mode.
> **Requirements:** deterministic results, no duplicates, ≥95 % target attainment for budget/coverage, a working novelty penalty.
> **Scope:** Python 3.12, FastAPI, Qdrant, the existing plan_templates/plans/exercises collections.
> **Deliver:** code, unit and E2E tests, OpenAPI docs with examples.
> **Files:** see the list above.
llm-api/audit_ki_stack.sh (new file, 45 lines)
@@ -0,0 +1,45 @@
#!/usr/bin/env bash
set -euo pipefail

echo "=== SYSTEM ==="
uname -a || true
echo
echo "CPU/Mem:"
lscpu | egrep 'Model name|CPU\(s\)|Thread|Core|Socket' || true
free -h || true
echo
echo "Disk:"
df -hT | awk 'NR==1 || /\/(srv|opt|home|var|$)/'
echo

echo "=== DOCKER ==="
docker --version || true
docker compose version || docker-compose --version || true
echo
echo "Running containers:"
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Ports}}" || true
echo

echo "=== PYTHON ==="
python3 --version || true
python3.12 --version || true
pip --version || true
echo

echo "=== NODE/NPM (for n8n, if installed natively) ==="
node -v || true
npm -v || true
echo

echo "=== PORTS IN USE (run as root to see processes) ==="
for p in 8000 6333 11434 5678; do
  echo "--- Port $p ---"
  (sudo ss -ltnp | grep ":$p ") || echo "free"
done
echo

echo "=== SERVICES / NOTES ==="
systemctl list-units --type=service | egrep -i 'qdrant|ollama|n8n|uvicorn|gunicorn' || true
echo

echo "Done. Check whether the ports are free and which containers are already running."

llm-api/exercise_router.py
@@ -1,15 +1,17 @@
# -*- coding: utf-8 -*-
"""
exercise_router.py – v1.7.0
exercise_router.py – v1.7.1 (enriched Swagger docs)

New:
- Endpoint **POST /exercise/search**: combinable filters (discipline, duration, equipment any/all, keywords any/all,
  capability_geN / capability_eqN + names) plus an optional vector query (query text). Output includes the score.
- Facets extended: besides capability_ge1..ge5 now also capability_eq1..eq5.
- Idempotency fix & payload scroll (from v1.6.2) retained.
- API signatures of existing routes unchanged.
Added:
- Meaningful summary/description/response_description per endpoint
- Examples (x-codeSamples) for curl calls
- Pydantic fields with description + json_schema_extra (examples)
- No API signature/path changes, no prefix changes

Note: the "eq/ge" fields are set on upsert; run the backfill once for legacy points.
Note:
- Endpoints stay under /exercise/* (the route strings already contain /exercise/...).
- If you later set an APIRouter prefix, change the paths below from '/exercise/...' to relative paths,
  otherwise you get duplicated path segments.
"""

from fastapi import APIRouter, HTTPException, Query

@@ -27,77 +29,137 @@ from qdrant_client.models import (
    FieldCondition,
    MatchValue,
)
import logging
import os

router = APIRouter()
logger = logging.getLogger("exercise_router")
logger.setLevel(logging.INFO)

# Router without a prefix (the path strings already contain '/exercise/...')
router = APIRouter(tags=["exercise"])

# =========================
# Models
# =========================
class Exercise(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid4()))
    id: str = Field(default_factory=lambda: str(uuid4()), description="Internal UUID (Qdrant point ID)")
    # Upsert metadata
    external_id: Optional[str] = None
    fingerprint: Optional[str] = None
    source: Optional[str] = None
    imported_at: Optional[datetime] = None
    external_id: Optional[str] = Field(default=None, description="Upsert key (e.g. 'mw:{pageid}')")
    fingerprint: Optional[str] = Field(default=None, description="sha256 of the core fields for idempotency/diff")
    source: Optional[str] = Field(default=None, description="Source (e.g. 'mediawiki', 'pdf-import', …)")
    imported_at: Optional[datetime] = Field(default=None, description="Import timestamp (ISO-8601)")

    # Domain fields
    title: str
    summary: str
    short_description: str
    keywords: List[str] = []
    link: Optional[str] = None
    discipline: str
    group: Optional[str] = None
    age_group: str
    target_group: str
    min_participants: int
    duration_minutes: int
    capabilities: Dict[str, int] = {}
    category: str
    purpose: str
    execution: str
    notes: str
    preparation: str
    method: str
    equipment: List[str] = []
    title: str = Field(..., description="Exercise title")
    summary: str = Field(..., description="Short description/goal of the exercise")
    short_description: str = Field(..., description="Alternative short form / teaser")
    keywords: List[str] = Field(default_factory=list, description="Free-form keywords (normalized)")
    link: Optional[str] = Field(default=None, description="Canonical URL/permalink to the source")
    discipline: str = Field(..., description="Discipline (e.g. Karate)")
    group: Optional[str] = Field(default=None, description="Optional grouping/category")
    age_group: str = Field(..., description="Age group (e.g. children/pupils/teenagers/adults)")
    target_group: str = Field(..., description="Target group (e.g. recreational athletes)")
    min_participants: int = Field(..., ge=0, description="Minimum group size")
    duration_minutes: int = Field(..., ge=0, description="Duration in minutes")
    capabilities: Dict[str, int] = Field(default_factory=dict, description="Capability map: {name: level 1..5}")
    category: str = Field(..., description="Section/category (e.g. warm-up, basics, …)")
    purpose: str = Field(..., description="Purpose/intent")
    execution: str = Field(..., description="Execution steps (Markdown/wiki-like)")
    notes: str = Field(..., description="Notes/coaching cues")
    preparation: str = Field(..., description="Preparation/material")
    method: str = Field(..., description="Methodology/didactics")
    equipment: List[str] = Field(default_factory=list, description="Required equipment")

    model_config = {
        "json_schema_extra": {
            "example": {
                "external_id": "mw:218",
                "title": "Affenklatschen",
                "summary": "Koordination & Aufmerksamkeit mit Ballwechseln",
                "short_description": "Ballgewöhnung im Stand/Gehen/Laufen",
                "keywords": ["Hand-Auge-Koordination", "Reaktion"],
                "link": "https://www.karatetrainer.de/index.php?title=Affenklatschen",
                "discipline": "Karate",
                "age_group": "Teenager",
                "target_group": "Breitensportler",
                "min_participants": 4,
                "duration_minutes": 8,
                "capabilities": {"Reaktionsfähigkeit": 2, "Kopplungsfähigkeit": 2},
                "category": "Aufwärmen",
                "purpose": "Aufmerksamkeit & Reaktionskette aktivieren",
                "execution": "* Paarweise aufstellen …",
                "notes": "* nicht zu lange werden lassen",
                "preparation": "* Bälle bereit halten",
                "method": "* klare Regeln/Strafrunde",
                "equipment": ["Bälle"]
            }
        }
    }

class DeleteResponse(BaseModel):
    status: str
    count: int
    collection: str
    status: str = Field(..., description="Status message")
    count: int = Field(..., ge=0, description="Number of affected points")
    collection: str = Field(..., description="Qdrant collection name")

class ExerciseSearchRequest(BaseModel):
    # Optional semantic query (vector)
    query: Optional[str] = None
    limit: int = Field(default=20, ge=1, le=200)
    offset: int = Field(default=0, ge=0)
    query: Optional[str] = Field(default=None, description="Free text for the vector search (optional)")
    limit: int = Field(default=20, ge=1, le=200, description="Maximum number of hits")
    offset: int = Field(default=0, ge=0, description="Offset/pagination")

    # Simple filters
    discipline: Optional[str] = None
    target_group: Optional[str] = None
    age_group: Optional[str] = None
    max_duration: Optional[int] = Field(default=None, ge=0)
    discipline: Optional[str] = Field(default=None, description="e.g. Karate")
    target_group: Optional[str] = Field(default=None, description="e.g. recreational athletes")
    age_group: Optional[str] = Field(default=None, description="e.g. teenagers")
    max_duration: Optional[int] = Field(default=None, ge=0, description="Upper bound in minutes")

    # List filters
    equipment_any: Optional[List[str]] = None  # at least one must match
    equipment_all: Optional[List[str]] = None  # all must match
    keywords_any: Optional[List[str]] = None
    keywords_all: Optional[List[str]] = None
    equipment_any: Optional[List[str]] = Field(default=None, description="At least one must match")
    equipment_all: Optional[List[str]] = Field(default=None, description="All must match")
    keywords_any: Optional[List[str]] = Field(default=None, description="At least one must match")
    keywords_all: Optional[List[str]] = Field(default=None, description="All must match")

    # Capabilities (names + level operator)
    capability_names: Optional[List[str]] = None
    capability_ge_level: Optional[int] = Field(default=None, ge=1, le=5)
    capability_eq_level: Optional[int] = Field(default=None, ge=1, le=5)
    capability_names: Optional[List[str]] = Field(default=None, description="Capability names")
    capability_ge_level: Optional[int] = Field(default=None, ge=1, le=5, description="Level ≥ N")
    capability_eq_level: Optional[int] = Field(default=None, ge=1, le=5, description="Level == N")

    model_config = {
        "json_schema_extra": {
            "examples": [{
                "discipline": "Karate",
                "max_duration": 12,
                "equipment_any": ["Bälle"],
                "capability_names": ["Reaktionsfähigkeit"],
                "capability_ge_level": 2,
                "limit": 5
            }, {
                "query": "Aufwärmen Reaktionsfähigkeit 10min Teenager Bälle",
                "discipline": "Karate",
                "limit": 3
            }]
        }
    }

class ExerciseSearchHit(BaseModel):
    id: str
    score: Optional[float] = None
    payload: Exercise
    id: str = Field(..., description="Qdrant point ID")
    score: Optional[float] = Field(default=None, description="Similarity score (vector search only)")
    payload: Exercise = Field(..., description="Exercise data (payload)")

class ExerciseSearchResponse(BaseModel):
    hits: List[ExerciseSearchHit]
    hits: List[ExerciseSearchHit] = Field(..., description="List of hits")

    model_config = {
        "json_schema_extra": {
            "example": {
                "hits": [{
                    "id": "c1f1-…",
                    "score": 0.78,
                    "payload": Exercise.model_config["json_schema_extra"]["example"]
                }]
            }
        }
    }

# =========================
# Helpers

@@ -160,6 +222,12 @@ def _norm_list(xs: List[Any]) -> List[str]:

def _facet_capabilities(caps: Dict[str, Any]) -> Dict[str, List[str]]:
    """
    Derives facet fields from the capabilities map:
    - capability_keys: all names
    - capability_geN: names with level >= N (1..5)
    - capability_eqN: names with level == N (1..5)
    """
    caps = caps or {}

    def names_where(pred) -> List[str]:
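The facet derivation described in the docstring above can be reproduced standalone roughly like this. This is a simplified rewrite for illustration, not the module's actual helper:

```python
# Standalone sketch of the capability facet derivation (capability_keys,
# capability_ge1..5, capability_eq1..5). Simplified, illustrative only.
from typing import Any, Dict, List

def facet_capabilities(caps: Dict[str, Any]) -> Dict[str, List[str]]:
    caps = caps or {}
    out: Dict[str, List[str]] = {"capability_keys": sorted(caps)}
    for n in range(1, 6):
        out[f"capability_ge{n}"] = sorted(k for k, v in caps.items() if int(v) >= n)
        out[f"capability_eq{n}"] = sorted(k for k, v in caps.items() if int(v) == n)
    return out
```

Sorting the name lists keeps the facets deterministic regardless of dict insertion order.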
|
@@ -194,6 +262,7 @@ def _facet_capabilities(caps: Dict[str, Any]) -> Dict[str, List[str]]:

def _response_strip_extras(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Return only the defined Exercise fields (clean API surface)
    allowed = set(Exercise.model_fields.keys())
    return {k: v for k, v in payload.items() if k in allowed}

@@ -209,8 +278,7 @@ def _build_filter(req: ExerciseSearchRequest) -> Filter:
    if req.age_group:
        must.append(FieldCondition(key="age_group", match=MatchValue(value=req.age_group)))
    if req.max_duration is not None:
        # Range without importing extra models: Qdrant also accepts {'range': {'lte': n}} as JSON;
        # we skip the client model here because the filters are primarily used for keyword fields.
        # Range in Qdrant: via a raw JSON range expression
        must.append({"key": "duration_minutes", "range": {"lte": int(req.max_duration)}})

    # equipment
@@ -218,7 +286,6 @@ def _build_filter(req: ExerciseSearchRequest) -> Filter:
        for it in req.equipment_all:
            must.append(FieldCondition(key="equipment", match=MatchValue(value=it)))
    if req.equipment_any:
        # OR: via the 'should' list
        for it in req.equipment_any:
            should.append(FieldCondition(key="equipment", match=MatchValue(value=it)))
@@ -248,22 +315,55 @@ def _build_filter(req: ExerciseSearchRequest) -> Filter:

    flt = Filter(must=must)
    if should:
        # qdrant: 'should' with an implicit minimum_should_match=1
        # Qdrant: 'should' is equivalent to OR with minimum_should_match=1
        flt.should = should
    return flt
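The must/should semantics the comments above describe boil down to AND plus OR-with-minimum_should_match=1. A minimal evaluator over plain `(key, value)` pairs — a deliberate simplification of Qdrant's `FieldCondition` — makes the behavior concrete:

```python
# Tiny evaluator for Qdrant-style filters: 'must' is AND, 'should' is OR with
# an implicit minimum_should_match of 1. (key, value) pairs stand in for
# FieldCondition; list-valued payload fields match on membership.

def matches(payload: dict, must: list, should: list) -> bool:
    def hit(key, value):
        field = payload.get(key)
        return value in field if isinstance(field, list) else field == value
    if not all(hit(k, v) for k, v in must):
        return False
    if should and not any(hit(k, v) for k, v in should):
        return False
    return True
```

An empty `should` list imposes no constraint — the same convention the router relies on when only `must` conditions are present.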
# =========================
# Endpoints
# =========================
@router.get("/exercise/by-external-id")
def get_exercise_by_external_id(external_id: str = Query(..., min_length=3)):
@router.get(
    "/exercise/by-external-id",
    summary="Fetch an exercise by external_id",
    description=(
        "Returns the exercise with the given `external_id` (e.g. `mw:{pageid}`). "
        "Uses a Qdrant filter on the payload field `external_id`."
    ),
    response_description="Full Exercise payload, or 404 if not found.",
    openapi_extra={
        "x-codeSamples": [{
            "lang": "bash",
            "label": "curl",
            "source": "curl -s 'http://localhost:8000/exercise/by-external-id?external_id=mw:218' | jq ."
        }]
    }
)
def get_exercise_by_external_id(external_id: str = Query(..., min_length=3, description="Upsert key, e.g. 'mw:218'")):
    found = _lookup_by_external_id(external_id)
    if not found:
        raise HTTPException(status_code=404, detail="not found")
    return found

@router.post("/exercise", response_model=Exercise)
@router.post(
    "/exercise",
    response_model=Exercise,
    summary="Create/update (idempotent per external_id)",
    description=(
        "Creates or updates an exercise. If `external_id` is present and already exists in the collection, "
        "an **update** is performed on the existing point (upsert). `keywords`/`equipment` are normalized, "
        "capability facets (`capability_ge1..5`, `capability_eq1..5`, `capability_keys`) are derived automatically. "
        "The vector is computed from the core fields (title/summary/short_description/purpose/execution/notes)."
    ),
    response_description="Stored Exercise record (payload view).",
    openapi_extra={
        "x-codeSamples": [{
            "lang": "bash",
            "label": "curl",
            "source": "curl -s -X POST http://localhost:8000/exercise -H 'Content-Type: application/json' -d @exercise.json | jq ."
        }]
    }
)
def create_or_update_exercise(ex: Exercise):
    _ensure_collection()

@@ -290,7 +390,20 @@ def create_or_update_exercise(ex: Exercise):
    return Exercise(**_response_strip_extras(payload))

@router.get("/exercise/{exercise_id}", response_model=Exercise)
@router.get(
    "/exercise/{exercise_id}",
    response_model=Exercise,
    summary="Read an exercise by its internal ID (Qdrant point ID)",
    description="Scrolls by `id` and returns the payload as an Exercise.",
    response_description="Exercise payload, or 404 if not found.",
    openapi_extra={
        "x-codeSamples": [{
            "lang": "bash",
            "label": "curl",
            "source": "curl -s 'http://localhost:8000/exercise/1234-uuid' | jq ."
        }]
    }
)
def get_exercise(exercise_id: str):
    _ensure_collection()
    pts, _ = qdrant.scroll(

@@ -306,7 +419,32 @@ def get_exercise(exercise_id: str):
    return Exercise(**_response_strip_extras(payload))

@router.post("/exercise/search", response_model=ExerciseSearchResponse)
@router.post(
    "/exercise/search",
    response_model=ExerciseSearchResponse,
    summary="Search exercises (filters + optional vector)",
    description=(
        "Combinable filters on payload fields (`discipline`, `age_group`, `target_group`, `equipment`, `keywords`, "
        "`capability_geN/eqN`) and **optionally** vector search via `query`. "
        "`should` filters (equipment_any/keywords_any) act as OR (minimum_should_match=1). "
        "`max_duration` is applied as a range (lte). With vector search the result contains `score`, otherwise `null`."
    ),
    response_description="List of hits (payload + score for vector search).",
    openapi_extra={
        "x-codeSamples": [
            {
                "lang": "bash",
                "label": "Filter",
                "source": "curl -s -X POST http://localhost:8000/exercise/search -H 'Content-Type: application/json' -d '{\"discipline\":\"Karate\",\"max_duration\":12,\"equipment_any\":[\"Bälle\"],\"capability_names\":[\"Reaktionsfähigkeit\"],\"capability_ge_level\":2,\"limit\":5}' | jq ."
            },
            {
                "lang": "bash",
                "label": "Vector + filter",
                "source": "curl -s -X POST http://localhost:8000/exercise/search -H 'Content-Type: application/json' -d '{\"query\":\"Aufwärmen 10min Teenager Bälle\",\"discipline\":\"Karate\",\"limit\":3}' | jq ."
            }
        ]
    }
)
def search_exercises(req: ExerciseSearchRequest) -> ExerciseSearchResponse:
    _ensure_collection()
    flt = _build_filter(req)

@@ -314,7 +452,6 @@ def search_exercises(req: ExerciseSearchRequest) -> ExerciseSearchResponse:
    hits: List[ExerciseSearchHit] = []
    if req.query:
        vec = _make_vector_from_query(req.query)
        # qdrant_client.search supports offset/limit
        res = qdrant.search(
            collection_name=COLLECTION,
            query_vector=vec,

@@ -327,8 +464,7 @@ def search_exercises(req: ExerciseSearchRequest) -> ExerciseSearchResponse:
            payload.setdefault("id", str(h.id))
            hits.append(ExerciseSearchHit(id=str(h.id), score=float(h.score or 0.0), payload=Exercise(**_response_strip_extras(payload))))
    else:
        # Filter-only: via scroll (no score); simple pagination via offset/limit
        # Fetch offset+limit points and simulate score=None
        # Filter-only: scroll pagination, score=None
        collected = 0
        skipped = 0
        next_offset = None
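The filter-only branch pages through scroll batches, skipping `offset` matches before collecting `limit` hits. Stripped of Qdrant specifics, the skip/collect logic looks roughly like this; the batch iterable stands in for repeated `qdrant.scroll` calls (an illustrative assumption, not the router's exact code):

```python
# Sketch of offset/limit pagination over scroll batches: skip the first
# `offset` matches across batch boundaries, then collect up to `limit` hits.

def paginate_batches(batches, offset: int, limit: int) -> list:
    hits, skipped = [], 0
    for batch in batches:            # each batch ~ one qdrant.scroll() page
        for point in batch:
            if skipped < offset:     # discard the first `offset` matches
                skipped += 1
                continue
            hits.append(point)
            if len(hits) >= limit:
                return hits          # stop scrolling early once we have enough
    return hits
```

Returning early once `limit` is reached avoids scrolling further pages than necessary.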
@@ -357,8 +493,24 @@ def search_exercises(req: ExerciseSearchRequest) -> ExerciseSearchResponse:
    return ExerciseSearchResponse(hits=hits)

@router.delete("/exercise/delete-by-external-id", response_model=DeleteResponse)
def delete_by_external_id(external_id: str = Query(...)):
@router.delete(
    "/exercise/delete-by-external-id",
    response_model=DeleteResponse,
    summary="Deletes the points with the given external_id",
    description=(
        "Scrolls by `external_id` and deletes all matching points. "
        "Idempotent: if nothing is found → count=0. Careful: **deletes permanently**."
    ),
    response_description="Status + number of deleted points.",
    openapi_extra={
        "x-codeSamples": [{
            "lang": "bash",
            "label": "curl",
            "source": "curl -s 'http://localhost:8000/exercise/delete-by-external-id?external_id=mw:9999' | jq ."
        }]
    }
)
def delete_by_external_id(external_id: str = Query(..., description="Upsert key, e.g. 'mw:218'")):
    _ensure_collection()
    flt = Filter(must=[FieldCondition(key="external_id", match=MatchValue(value=external_id))])
    pts, _ = qdrant.scroll(collection_name=COLLECTION, scroll_filter=flt, limit=10000, with_payload=False)

@@ -369,8 +521,24 @@ def delete_by_external_id(external_id: str = Query(...)):
    return DeleteResponse(status="🗑️ deleted", count=len(ids), collection=COLLECTION)

@router.delete("/exercise/delete-collection", response_model=DeleteResponse)
def delete_collection(collection: str = Query(default=COLLECTION)):
@router.delete(
    "/exercise/delete-collection",
    response_model=DeleteResponse,
    summary="Drop the entire collection",
    description=(
        "Removes the whole collection from Qdrant. **Dangerous** – all exercises are gone afterwards. "
        "Use only in test environments or for a complete rebuild."
    ),
    response_description="Status. count=0 (not meaningful for a drop).",
    openapi_extra={
        "x-codeSamples": [{
            "lang": "bash",
            "label": "curl",
            "source": "curl -s 'http://localhost:8000/exercise/delete-collection?collection=exercises' | jq ."
        }]
    }
)
def delete_collection(collection: str = Query(default=COLLECTION, description="Collection name (default: 'exercises')")):
    if not qdrant.collection_exists(collection):
        raise HTTPException(status_code=404, detail=f"Collection '{collection}' not found.")
    qdrant.delete_collection(collection_name=collection)

@@ -384,7 +552,6 @@ TEST_DOC = """
Save as tests/test_exercise_search.py and run it with pytest.

import os, requests

BASE = os.getenv("API_BASE", "http://localhost:8000")

# 1) Filter-only


llm-api/llm_api.py
@@ -1,37 +1,161 @@
from dotenv import load_dotenv
load_dotenv()  # loads variables from .env into os.environ
# -*- coding: utf-8 -*-
"""
llm_api.py – v1.2.0 (central .env bootstrap, clean router includes, Swagger docs)

Changes vs. v1.1.6:
- Central .env bootstrapping BEFORE all router imports (locates the file robustly; sets LLMAPI_ENV_FILE/LLMAPI_ENV_BOOTSTRAPPED)
- Consistent Swagger description + tag metadata
- Routers included without duplicated prefixes (the prefixes are defined in the routers)
- Root /health and /version endpoints
- Defensive includes (router import errors no longer crash the server; logging instead of a crash)
- Global error handler kept (generic 500)

Note:
- wiki_router in the canvas (v1.4.2) already uses robust .env loading but respects the centrally set ENV variables.
- If your ENV file lives elsewhere, set `Environment=LLMAPI_ENV_FILE=/path/.env` in the systemd unit.

"""
from __future__ import annotations

import os
from pathlib import Path
from textwrap import dedent
from typing import Optional
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from clients import model, qdrant
from wiki_router import router as wiki_router
from embed_router import router as embed_router
from exercise_router import router as exercise_router
from plan_router import router as plan_router
from plan_session_router import router as plan_session_router
from fastapi.middleware.cors import CORSMiddleware

# Version
__version__ = "1.1.6"
# ----------------------
# Central .env bootstrap (run BEFORE the router imports!)
# ----------------------
def _bootstrap_env() -> Optional[str]:
    try:
        from dotenv import load_dotenv, find_dotenv
    except Exception:
        print("[env] python-dotenv not installed – skipping .env loading", flush=True)
        return None

    candidates: list[str] = []
    if os.getenv("LLMAPI_ENV_FILE"):
        candidates.append(os.getenv("LLMAPI_ENV_FILE") or "")
    fd = find_dotenv(".env", usecwd=True)
    if fd:
        candidates.append(fd)
    candidates += [
        str(Path.cwd() / ".env"),
        str(Path(__file__).parent / ".env"),
        str(Path.home() / ".env"),
        str(Path.home() / ".llm-api.env"),
        "/etc/llm-api.env",
    ]

    for p in candidates:
        try:
            if p and Path(p).exists():
                if load_dotenv(p, override=False):
                    os.environ["LLMAPI_ENV_FILE"] = p
                    os.environ["LLMAPI_ENV_BOOTSTRAPPED"] = "1"
                    print(f"[env] loaded: {p}", flush=True)
                    return p
        except Exception as e:
            print(f"[env] load failed for {p}: {e}", flush=True)
    print("[env] no .env found; using process env", flush=True)
    return None
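The candidate resolution above reduces to "explicit `LLMAPI_ENV_FILE` first, then the first existing fallback path". A simplified, testable helper capturing just that precedence (illustrative, not the module's code):

```python
# Precedence sketch: an explicitly configured env file wins; otherwise the
# first fallback path that actually exists is used; None if nothing exists.
from pathlib import Path
from typing import Optional, Sequence

def pick_env_file(explicit: Optional[str], fallbacks: Sequence[str]) -> Optional[str]:
    candidates = ([explicit] if explicit else []) + list(fallbacks)
    for p in candidates:
        if p and Path(p).exists():
            return p
    return None
```

Unlike the full bootstrap, this helper only resolves the path; loading the variables (and recording the source in `LLMAPI_ENV_FILE`) stays with the caller.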
_ENV_SRC = _bootstrap_env()

# ----------------------
# App + OpenAPI metadata
# ----------------------
__version__ = "1.2.0"
print(f"[DEBUG] llm_api.py version {__version__} loaded from {__file__}", flush=True)

TAGS = [
    {
        "name": "wiki",
        "description": dedent(
            """
            MediaWiki proxy (health, login, page info/parse, SMW ask).
            **ENV**: `WIKI_API_URL`, `WIKI_TIMEOUT`, `WIKI_RETRIES`, `WIKI_SLEEP_MS`, `WIKI_BATCH`.
            """
        ),
    },
    {
        "name": "exercise",
        "description": dedent(
            """
            Exercises (upsert, search, delete). Upsert key: `external_id` (e.g. `mw:{pageid}`).
            **ENV**: `EXERCISE_COLLECTION`, `QDRANT_HOST`, `QDRANT_PORT`.
            """
        ),
    },
    {
        "name": "plans",
        "description": "Training plans (templates/generate/export).",
    },
]

# FastAPI instance
app = FastAPI(
    title="KI Trainerassistent API",
    description="Modular API for training planning and MediaWiki import",
    description=dedent(
        f"""
        Modular API for training planning and MediaWiki import.

        **Version:** {__version__}

        ## Quickstart (CLI)
        ```bash
        python3 wiki_importer.py --all
        python3 wiki_importer.py --all --category "Übungen" --dry-run
        ```
        """
    ),
    version=__version__,
    openapi_tags=TAGS,
    swagger_ui_parameters={"docExpansion": "list", "defaultModelsExpandDepth": 0},
)

# Global error handler
# Optional: CORS for local UIs/tools
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
# ----------------------
# Global error handler (generic)
# ----------------------
@app.exception_handler(Exception)
async def unicorn_exception_handler(request, exc):
    return JSONResponse(status_code=500, content={"detail": "Internal server error."})

# Include routers
app.include_router(wiki_router)
app.include_router(embed_router)
app.include_router(exercise_router)
app.include_router(plan_router)
app.include_router(plan_session_router)
# ----------------------
# Include routers (IMPORTANT: do not set additional prefixes here)
# ----------------------
def _include_router_safely(name: str, import_path: str):
    try:
        module = __import__(import_path, fromlist=["router"])  # lazy import after the ENV bootstrap
        app.include_router(module.router)
        print(f"[router] {name} included", flush=True)
    except Exception as e:
        print(f"[router] {name} NOT included: {e}", flush=True)

_include_router_safely("wiki_router", "wiki_router")  # prefix defined in the file: /import/wiki
_include_router_safely("embed_router", "embed_router")
_include_router_safely("exercise_router", "exercise_router")
_include_router_safely("plan_router", "plan_router")
_include_router_safely("plan_session_router", "plan_session_router")
# ----------------------
|
||||
# Basis-Endpunkte
|
||||
# ----------------------
|
||||
@app.get("/health", tags=["wiki"], summary="API-Health (lokal)")
|
||||
def api_health():
|
||||
return {"status": "ok"}
|
||||
|
||||
@app.get("/version", tags=["wiki"], summary="API-Version & ENV-Quelle")
|
||||
def api_version():
|
||||
return {"version": __version__, "env_file": _ENV_SRC}
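The `_include_router_safely` helper added in this commit replaces the hard `app.include_router(...)` calls so that one broken router module no longer prevents API startup. A framework-free sketch of that pattern, assuming nothing beyond the stdlib (the `DummyApp` stand-in and the `demo_router` module name are illustrative, not part of the codebase):

```python
from importlib import import_module
import sys
import types


class DummyApp:
    """Stand-in for FastAPI so the pattern runs without the framework installed."""

    def __init__(self):
        self.routers = []

    def include_router(self, router):
        self.routers.append(router)


def include_router_safely(app, name: str, import_path: str) -> bool:
    """Import `import_path`, mount its `router`, and log instead of crashing."""
    try:
        module = import_module(import_path)
        app.include_router(module.router)
        print(f"[router] {name} included", flush=True)
        return True
    except Exception as e:  # broad on purpose: startup must survive one bad router
        print(f"[router] {name} NOT included: {e}", flush=True)
        return False


app = DummyApp()

# A router module that does not exist is logged, not fatal:
assert include_router_safely(app, "missing_router", "missing_router") is False

# A module that exists and exposes `router` gets mounted:
mod = types.ModuleType("demo_router")
mod.router = object()
sys.modules["demo_router"] = mod
assert include_router_safely(app, "demo_router", "demo_router") is True
assert len(app.routers) == 1
```

The trade-off: a typo in a router filename now degrades to a log line instead of a crash, so the startup log ("included"/"NOT included") is the only place such mistakes surface.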
@@ -1,20 +1,16 @@
 # -*- coding: utf-8 -*-
 """
-wiki_router.py – v1.4.2 (Swagger angereichert)
+wiki_router.py – v1.4.3 (Swagger + robustes .env + optionaler ENV-Login)

-Änderungen ggü. v1.4.1:
-- Alle Endpunkte mit aussagekräftigem `summary`/`description`/`response_description` versehen
-- Parameter-Beschreibungen ergänzt (z. B. `verbose`, `category`, `title`)
-- Beispiele über `x-codeSamples` (cURL) und `json_schema_extra`
-- **Keine API-Signaturänderungen**
+Änderungen ggü. v1.4.2:
+- **/login/env** hinzugefügt: Login mit WIKI_BOT_USER/WIKI_BOT_PASSWORD aus ENV (Secrets werden nie ausgegeben)
+- .env-Bootstrap robuster und **vor** dem ersten Aufruf geloggt
+- /meta/env/runtime um Credentials-Flags ergänzt (ohne Klartext)
+- response_description-Strings mit JSON-Beispielen sauber gequotet
+- Keine Breaking-Changes (Signaturen & Pfade unverändert)

 Ziele:
 - /semantic/pages reichert pageid/fullurl für ALLE Titel batchweise an (redirects=1, converttitles=1)
 - /info robust: 404 statt 500, mit Titel-Varianten (Leerzeichen/Unterstrich/Bindestrich)
 - Wiederholungen & Throttling gegen MediaWiki (WIKI_RETRIES, WIKI_SLEEP_MS)
 - Optional: Diagnose-Ausgaben (verbose) und Coverage-Kennzahlen (Logs)

-Hinweis Prefix:
-- Der Router setzt `prefix="/import/wiki"`. In `llm_api.py` **ohne** weiteren Prefix einbinden, sonst entstehen Doppelpfade.
+Prefix-Hinweis:
+- Der Router setzt `prefix="/import/wiki"`. In `llm_api.py` **ohne** weiteren Prefix einbinden.
 """

 from typing import Dict, Any, Optional, List
@@ -23,16 +19,64 @@ from pydantic import BaseModel, Field
 from textwrap import dedent
 import os, time, logging
 import requests
-from dotenv import load_dotenv
-
-load_dotenv()
+from dotenv import load_dotenv, find_dotenv
+from starlette.responses import PlainTextResponse

+# -------------------------------------------------
+# Logging **vor** .env-Bootstrap initialisieren
+# -------------------------------------------------
 logger = logging.getLogger("wiki_router")
 logger.setLevel(logging.INFO)

-router = APIRouter(prefix="/import/wiki", tags=["wiki"])
-
-# -------- Konfiguration --------
+# -------------------------------------------------
+# Robustes .env-Loading (findet Datei auch außerhalb des CWD)
+# -------------------------------------------------
+
+def _bootstrap_env() -> Optional[str]:
+    """Versucht mehrere typische Pfade für .env zu laden und loggt die Fundstelle.
+    Reihenfolge:
+    1) env `LLMAPI_ENV_FILE`
+    2) find_dotenv() relativ zum CWD
+    3) CWD/.env
+    4) Verzeichnis dieser Datei /.env
+    5) $HOME/.env
+    6) $HOME/.llm-api.env
+    7) /etc/llm-api.env
+    """
+    candidates: List[str] = []
+    if os.getenv("LLMAPI_ENV_FILE"):
+        candidates.append(os.getenv("LLMAPI_ENV_FILE") or "")
+    fd = find_dotenv(".env", usecwd=True)
+    if fd:
+        candidates.append(fd)
+    candidates += [
+        os.path.join(os.getcwd(), ".env"),
+        os.path.join(os.path.dirname(__file__), ".env"),
+        os.path.expanduser("~/.env"),
+        os.path.expanduser("~/.llm-api.env"),
+        "/etc/llm-api.env",
+    ]
+    for path in candidates:
+        try:
+            if path and os.path.exists(path):
+                loaded = load_dotenv(path, override=False)
+                if loaded:
+                    logger.info("wiki_router: .env geladen aus %s", path)
+                    return path
+        except Exception as e:
+            logger.warning("wiki_router: .env laden fehlgeschlagen (%s): %s", path, e)
+    logger.info("wiki_router: keine .env gefunden – verwende Prozess-Umgebung")
+    return None
+
+_BOOTSTRAP_ENV = _bootstrap_env()
+
+# -------------------------------------------------
+# Router & Konfiguration
+# -------------------------------------------------
+router = APIRouter(prefix="/import/wiki", tags=["wiki"])
+
+# Hinweis: Werte werden NACH dem .env-Bootstrap aus os.environ gelesen.
+# Änderungen an .env erfordern i. d. R. einen Neustart des Dienstes.
 WIKI_API_URL = os.getenv("WIKI_API_URL", "https://karatetrainer.net/api.php")
 WIKI_TIMEOUT = float(os.getenv("WIKI_TIMEOUT", "15"))
 WIKI_BATCH = int(os.getenv("WIKI_BATCH", "50"))
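`_request_with_retry` itself lies outside the shown hunks; the retry/throttle contract that `WIKI_RETRIES` and `WIKI_SLEEP_MS` describe can be sketched as below. The function and parameter names here are hypothetical — only the two ENV variables come from the code above, and the real implementation wraps a `requests` session rather than an injected callable:

```python
import os
import time

WIKI_RETRIES = int(os.getenv("WIKI_RETRIES", "1"))    # extra attempts beyond the first
WIKI_SLEEP_MS = int(os.getenv("WIKI_SLEEP_MS", "0"))  # throttle between attempts


def request_with_retry(do_request, attempts: int = WIKI_RETRIES + 1,
                       sleep_ms: int = WIKI_SLEEP_MS):
    """Call `do_request()` up to `attempts` times, sleeping between failures."""
    last_exc = None
    for _ in range(attempts):
        try:
            return do_request()
        except Exception as e:
            last_exc = e
            if sleep_ms > 0:
                time.sleep(sleep_ms / 1000.0)
    raise last_exc


# A request that fails once and then succeeds survives with attempts=2:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("transient upstream error")
    return {"status": "ok"}

assert request_with_retry(flaky, attempts=2) == {"status": "ok"}
assert calls["n"] == 2
```

With the defaults (`WIKI_RETRIES=1`, `WIKI_SLEEP_MS=0`) this gives two total attempts and no throttling, matching the documented ENV defaults.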
@@ -41,18 +85,15 @@ WIKI_SLEEPMS = int(os.getenv("WIKI_SLEEP_MS", "0"))  # Throttle zwischen Requ
 # Single Session (Cookies für Login)
 wiki_session = requests.Session()
-wiki_session.headers.update({"User-Agent": "local-llm-wiki-proxy/1.4.2"})
+wiki_session.headers.update({"User-Agent": "local-llm-wiki-proxy/1.4.3"})

-# -------- Schemas --------
+# -------------------------------------------------
+# Schemas
+# -------------------------------------------------
 class WikiLoginRequest(BaseModel):
     username: str = Field(..., description="MediaWiki-Benutzername (kein .env-Wert)")
     password: str = Field(..., description="MediaWiki-Passwort (kein .env-Wert)")

-    model_config = {
-        "json_schema_extra": {
-            "example": {"username": "Bot", "password": "••••••"}
-        }
-    }
+    model_config = {"json_schema_extra": {"example": {"username": "Bot", "password": "••••••"}}}

 class WikiLoginResponse(BaseModel):
     status: str = Field(..., description="'success' bei erfolgreichem Login")
@@ -62,25 +103,17 @@ class PageInfoResponse(BaseModel):
     pageid: int = Field(..., description="Eindeutige PageID der MediaWiki-Seite")
     title: str = Field(..., description="Aufgelöster Titel (kann von Eingabe abweichen, z. B. Redirect/Normalize)")
     fullurl: str = Field(..., description="Kanonische URL zur Seite")

-    model_config = {
-        "json_schema_extra": {
-            "example": {"pageid": 218, "title": "Affenklatschen", "fullurl": "https://…/index.php?title=Affenklatschen"}
-        }
-    }
+    model_config = {"json_schema_extra": {"example": {"pageid": 218, "title": "Affenklatschen", "fullurl": "https://…/index.php?title=Affenklatschen"}}}

 class PageContentResponse(BaseModel):
     pageid: int = Field(..., description="PageID der angefragten Seite")
     title: str = Field(..., description="Echo des mitgegebenen Titels (optional)")
     wikitext: str = Field(..., description="Roh-Wikitext (inkl. Templates), keine Sanitization")
+    model_config = {"json_schema_extra": {"example": {"pageid": 218, "title": "Affenklatschen", "wikitext": "{{ÜbungInfoBox|…}}"}}}

-    model_config = {
-        "json_schema_extra": {
-            "example": {"pageid": 218, "title": "Affenklatschen", "wikitext": "{{ÜbungInfoBox|…}}"}
-        }
-    }
-
-# -------- Utils --------
+# -------------------------------------------------
+# Utils
+# -------------------------------------------------

 def _sleep():
     if WIKI_SLEEPMS > 0:
@@ -163,7 +196,117 @@ def _fetch_pageinfo_batch(titles: List[str]) -> Dict[str, Dict[str, Any]]:
         _sleep()
     return out

-# -------- Endpoints --------
+# -------------------------------------------------
+# Doku-Konstanten (Markdown/.env)
+# -------------------------------------------------
+MANUAL_WIKI_IMPORTER = dedent("""
+# wiki_importer.py – Kurzanleitung
+
+## Voraussetzungen
+- API erreichbar: `GET /import/wiki/health` (Status `ok`)
+- .env:
+  - `API_BASE_URL=http://localhost:8000`
+  - `WIKI_BOT_USER`, `WIKI_BOT_PASSWORD`
+  - optional: `EXERCISE_COLLECTION=exercises`
+
+## Smoke-Test (3 Läufe)
+```bash
+python3 wiki_importer.py --title "Affenklatschen" --category "Übungen" --smoke-test
+```
+
+## Vollimport
+```bash
+python3 wiki_importer.py --all
+# optional:
+python3 wiki_importer.py --all --category "Übungen"
+python3 wiki_importer.py --all --dry-run
+```
+
+## Idempotenz-Logik
+- external_id = `mw:{pageid}`
+- Fingerprint (sha256) über: `title, summary, execution, notes, duration_minutes, capabilities, keywords`
+- Entscheid:
+  - not found → create
+  - fingerprint gleich → skip
+  - fingerprint ungleich → update (+ `imported_at`)
+
+## Mapping (Wiki → Exercise)
+- Schlüsselworte → `keywords` (`,`-getrennt, getrimmt, dedupliziert)
+- Hilfsmittel → `equipment`
+- Disziplin → `discipline`
+- Durchführung/Notizen/Vorbereitung/Methodik → `execution`, `notes`, `preparation`, `method`
+- Capabilities → `capabilities` (Level 1..5) + Facetten (`capability_ge1..5`, `capability_eq1..5`, `capability_keys`)
+- Metadaten → `external_id`, `source="mediawiki"`, `imported_at`
+
+## Troubleshooting
+- 404 bei `/import/wiki/info?...`: prüfe Prefix (kein Doppelprefix), Titelvarianten
+- 401 Login: echte User-Creds verwenden
+- 502 Upstream: `WIKI_API_URL`/TLS prüfen; Timeouts/Retry/Throttle (`WIKI_TIMEOUT`, `WIKI_RETRIES`, `WIKI_SLEEP_MS`)
+""")
+
+ENV_DOC = [
+    {"name": "WIKI_API_URL", "desc": "Basis-URL zur MediaWiki-API (z. B. http://…/w/api.php)"},
+    {"name": "WIKI_TIMEOUT", "desc": "Timeout in Sekunden (Default 15)"},
+    {"name": "WIKI_RETRIES", "desc": "Anzahl zusätzlicher Versuche (Default 1)"},
+    {"name": "WIKI_SLEEP_MS", "desc": "Throttle zwischen Requests in Millisekunden (Default 0)"},
+    {"name": "WIKI_BATCH", "desc": "Batchgröße für Titel-Enrichment (Default 50)"},
+    {"name": "WIKI_BOT_USER", "desc": "(optional) Benutzername für /login/env – **Wert wird nie im Klartext zurückgegeben**"},
+    {"name": "WIKI_BOT_PASSWORD", "desc": "(optional) Passwort für /login/env – **Wert wird nie im Klartext zurückgegeben**"},
+]
+
+# -------------------------------------------------
+# Doku-/Meta-Endpunkte
+# -------------------------------------------------
+@router.get(
+    "/manual/wiki_importer",
+    summary="Handbuch: wiki_importer.py (Markdown)",
+    description="Kompaktes Handbuch mit .env-Hinweisen, Aufrufen, Idempotenz und Troubleshooting.",
+    response_class=PlainTextResponse,
+    response_description="Markdown-Text.",
+    openapi_extra={
+        "x-codeSamples": [
+            {"lang": "bash", "label": "Vollimport (Standard)", "source": "python3 wiki_importer.py --all"},
+            {"lang": "bash", "label": "Dry-Run + Kategorie", "source": "python3 wiki_importer.py --all --category \"Übungen\" --dry-run"},
+        ]
+    },
+)
+def manual_wiki_importer():
+    return MANUAL_WIKI_IMPORTER
+
+
+@router.get(
+    "/meta/env",
+    summary=".env Referenz (Wiki-bezogen)",
+    description="Listet die relevanten Umgebungsvariablen für die Wiki-Integration auf (ohne Werte).",
+    response_description="Array aus {name, desc}.",
+)
+def meta_env() -> List[Dict[str, str]]:
+    return ENV_DOC
+
+
+@router.get(
+    "/meta/env/runtime",
+    summary=".env Runtime (wirksame Werte)",
+    description="Zeigt die aktuell wirksamen Konfigurationswerte für den Wiki-Router (ohne Secrets) und die geladene .env-Quelle.",
+    response_description="Objekt mit 'loaded_from' und 'env' (Key→Value).",
+)
+def meta_env_runtime() -> Dict[str, Any]:
+    keys = ["WIKI_API_URL", "WIKI_TIMEOUT", "WIKI_RETRIES", "WIKI_SLEEP_MS", "WIKI_BATCH"]
+    has_user = bool(os.getenv("WIKI_BOT_USER"))
+    has_pwd = bool(os.getenv("WIKI_BOT_PASSWORD"))
+    return {
+        "loaded_from": _BOOTSTRAP_ENV,
+        "env": {k: os.getenv(k) for k in keys},
+        "credentials": {
+            "WIKI_BOT_USER_set": has_user,
+            "WIKI_BOT_PASSWORD_set": has_pwd,
+            "ready_for_login_env": has_user and has_pwd,
+        },
+    }
+
+# -------------------------------------------------
+# API-Endpunkte
+# -------------------------------------------------
 @router.get(
     "/health",
     summary="Ping & Site-Info des MediaWiki-Upstreams",
@@ -179,12 +322,12 @@
     **Hinweis**: Je nach Wiki-Konfiguration sind detaillierte Infos (Generator/Sitename) nur **nach Login** sichtbar.
     """
     ),
-    response_description="`{\"status\":\"ok\"}` oder mit `wiki.sitename/generator` bei `verbose=1`.",
+    response_description='`{"status":"ok"}` oder mit `wiki.sitename/generator` bei `verbose=1`.',
     openapi_extra={
         "x-codeSamples": [
             {"lang": "bash", "label": "curl", "source": "curl -s 'http://localhost:8000/import/wiki/health?verbose=1' | jq ."}
         ]
-    }
+    },
 )
 def health(verbose: Optional[int] = Query(default=0, description="1 = Site-Metadaten (sitename/generator) mitsenden")) -> Dict[str, Any]:
     resp = _request_with_retry("GET", {"action": "query", "meta": "siteinfo", "format": "json"})
@@ -211,7 +354,7 @@ def health(verbose: Optional[int] = Query(default=0, description="1 = Site-Metad
     - Respektiert Retry/Throttle aus `.env`.
     """
     ),
-    response_description="`{\"status\":\"success\"}` bei Erfolg."
+    response_description='`{"status":"success"}` bei Erfolg.',
 )
 def login(data: WikiLoginRequest):
     # Token holen
@@ -249,6 +392,33 @@ def login(data: WikiLoginRequest):
     raise HTTPException(status_code=401, detail=f"Login fehlgeschlagen: {res}")


+@router.post(
+    "/login/env",
+    response_model=WikiLoginResponse,
+    summary="MediaWiki-Login mit .env-Credentials",
+    description=dedent(
+        """
+        Führt den Login mit **WIKI_BOT_USER/WIKI_BOT_PASSWORD** aus der Prozess-Umgebung durch.
+        Praktisch für geplante Jobs/CLI ohne Übergabe im Body. Secrets werden **nie** im Klartext zurückgegeben.
+
+        **Voraussetzung**: Beide Variablen sind gesetzt (siehe `/import/wiki/meta/env/runtime`).
+        """
+    ),
+    response_description='`{"status":"success"}` bei Erfolg.',
+    openapi_extra={
+        "x-codeSamples": [
+            {"lang": "bash", "label": "curl", "source": "curl -s -X POST http://localhost:8000/import/wiki/login/env | jq ."}
+        ]
+    },
+)
+def login_env():
+    user = os.getenv("WIKI_BOT_USER")
+    pwd = os.getenv("WIKI_BOT_PASSWORD")
+    if not user or not pwd:
+        raise HTTPException(status_code=400, detail="WIKI_BOT_USER/WIKI_BOT_PASSWORD nicht gesetzt")
+    return login(WikiLoginRequest(username=user, password=pwd))


 @router.get(
     "/semantic/pages",
     summary="SMW-Ask-Ergebnisse einer Kategorie mit PageID/URL anreichern",
@@ -270,16 +440,14 @@
         "x-codeSamples": [
             {"lang": "bash", "label": "curl", "source": "curl -s 'http://localhost:8000/import/wiki/semantic/pages?category=%C3%9Cbungen' | jq . | head"}
         ]
-    }
+    },
 )
 def semantic_pages(category: str = Query(..., description="Kategorie-Name **ohne** 'Category:' Präfix")) -> Dict[str, Any]:
-    # Rohdaten aus SMW (Ask)
     ask_query = f"[[Category:{category}]]|limit=50000"
     r = _request_with_retry("GET", {"action": "ask", "query": ask_query, "format": "json"})
     results = r.json().get("query", {}).get("results", {}) or {}
     titles = list(results.keys())

     # Batch-Anreicherung mit pageid/fullurl für ALLE Titel
     info_map = _fetch_pageinfo_batch(titles)

     enriched: Dict[str, Any] = {}
@@ -319,7 +487,7 @@ def semantic_pages(category: str = Query(..., description="Kategorie-Name **ohne
         "x-codeSamples": [
             {"lang": "bash", "label": "curl", "source": "curl -s 'http://localhost:8000/import/wiki/parsepage?pageid=218&title=Affenklatschen' | jq ."}
         ]
-    }
+    },
 )
 def parse_page(pageid: int = Query(..., description="Numerische PageID der Seite"), title: str = Query(None, description="Optional: Seitentitel (nur Echo)")):
     resp = _request_with_retry("GET", {"action": "parse", "pageid": pageid, "prop": "wikitext", "format": "json"})
@@ -347,16 +515,14 @@ def parse_page(pageid: int = Query(..., description="Numerische PageID der Seite
         "x-codeSamples": [
             {"lang": "bash", "label": "curl", "source": "curl -s 'http://localhost:8000/import/wiki/info?title=Affenklatschen' | jq ."}
         ]
-    }
+    },
 )
 def page_info(title: str = Query(..., description="Seitentitel (unscharf; Varianten werden versucht)")):
     # 1. Versuch: wie geliefert, mit redirects/converttitles
     res = _fetch_pageinfo_batch([title])
     if res.get(title):
         d = res[title]
         return PageInfoResponse(pageid=d["pageid"], title=title, fullurl=d.get("fullurl", ""))

     # 2. Varianten probieren
     for v in _normalize_variants(title):
         if v == title:
             continue
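`_normalize_variants` is referenced here but defined outside the shown hunks. Based on the module docstring ("Titel-Varianten (Leerzeichen/Unterstrich/Bindestrich)"), a plausible minimal sketch — the name and exact variant order are assumptions, not the file's actual implementation:

```python
def normalize_variants(title: str) -> list[str]:
    """Return plausible MediaWiki title variants, deduplicated, original first.

    Covers the documented space/underscore/hyphen confusions.
    """
    candidates = [
        title,
        title.replace("_", " "),   # underscores rendered as spaces
        title.replace(" ", "_"),   # spaces stored as underscores
        title.replace("-", " "),   # hyphen vs. space
        title.replace(" ", "-"),
        title.strip(),
    ]
    seen: list[str] = []
    for c in candidates:
        if c and c not in seen:
            seen.append(c)
    return seen


variants = normalize_variants("Affen_klatschen")
assert variants[0] == "Affen_klatschen"      # original tried first
assert "Affen klatschen" in variants         # space variant included
```

The caller above skips the variant equal to the original input, so keeping the original at index 0 is harmless and makes the list usable on its own.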
@@ -365,5 +531,4 @@ def page_info(title: str = Query(..., description="Seitentitel (unscharf; Varian
             d = res2[v]
             return PageInfoResponse(pageid=d["pageid"], title=v, fullurl=d.get("fullurl", ""))

-    # 3. sauber 404
     raise HTTPException(status_code=404, detail=f"Page not found: {title}")
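The idempotency rule documented in `MANUAL_WIKI_IMPORTER` (sha256 fingerprint over selected fields, then create/skip/update) can be sketched as below. `fingerprint`/`decide` and the dict shapes are illustrative, assuming only the field list and decision table stated in the manual — not the importer's actual API:

```python
import hashlib
import json
from typing import Optional

# Field list taken from the manual's "Idempotenz-Logik" section.
FINGERPRINT_FIELDS = ["title", "summary", "execution", "notes",
                      "duration_minutes", "capabilities", "keywords"]


def fingerprint(exercise: dict) -> str:
    """Stable sha256 over the import-relevant fields (sorted keys → order-independent)."""
    payload = {k: exercise.get(k) for k in FINGERPRINT_FIELDS}
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def decide(existing: Optional[dict], incoming: dict) -> str:
    """not found → create; same fingerprint → skip; changed fingerprint → update."""
    if existing is None:
        return "create"
    return "skip" if fingerprint(existing) == fingerprint(incoming) else "update"


old = {"title": "Affenklatschen", "summary": "…", "duration_minutes": 10}
assert decide(None, old) == "create"
assert decide(old, dict(old)) == "skip"
assert decide(old, {**old, "duration_minutes": 15}) == "update"
```

Because the fingerprint only covers the listed fields, touching metadata such as `imported_at` does not trigger a spurious update — which is exactly what makes re-running `wiki_importer.py --all` idempotent.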