# mindnet Retriever (WP-04 / Step 4a) – Short Documentation

## 1. Overview

The retriever consists of:

- **Semantic Retriever** (`semantic_retrieve`)
- **Hybrid Retriever** (`hybrid_retrieve`)
- a Qdrant backend (`*_chunks`, `*_edges`)
- optional edge expansion via `graph_adapter.expand`

The API uses the existing `/query` endpoint (FastAPI), which works with `QueryRequest` and `QueryResponse` from `app.models.dto`.

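To illustrate how a `/query` request reaches the two retrievers, here is a minimal sketch of the mode dispatch. It assumes a request object whose fields mirror the JSON examples in section 4 (`mode`, `query`, `top_k`, optional `expand`). `ExpandSpec`, `handle_query` and the retriever call signatures are hypothetical. They are not taken from `app.models.dto` or the actual FastAPI route.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ExpandSpec:
    # Hypothetical mirror of the "expand" object in the JSON examples.
    depth: int = 0
    edge_types: list[str] = field(default_factory=list)


@dataclass
class QueryRequest:
    # Hypothetical stand-in for app.models.dto.QueryRequest; the real model may differ.
    mode: str
    query: str
    top_k: int
    expand: Optional[ExpandSpec] = None


# A retriever takes the request and returns a ranked list of hits.
Retriever = Callable[[QueryRequest], list]


def handle_query(req: QueryRequest, semantic_retrieve: Retriever, hybrid_retrieve: Retriever) -> list:
    """Route a /query request to the retriever selected by req.mode."""
    if req.mode == "semantic":
        # Semantic mode: vector search only, no edge expansion.
        return semantic_retrieve(req)
    if req.mode == "hybrid":
        # Hybrid mode: vector search plus optional edge expansion.
        return hybrid_retrieve(req)
    raise ValueError(f"unsupported mode: {req.mode!r}")
```

Passing the retrievers in as callables keeps the sketch testable without Qdrant. In the real endpoint they are presumably imported directly.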
## 2. Scoring Formula

Each hit carries the following score components:

- `semantic_score` – taken directly from Qdrant (cosine or Euclidean, depending on the collection)
- `retriever_weight` – from the chunk payload (`types.yaml` / frontmatter)
- `edge_bonus` – bonus derived from the subgraph (e.g. number of incoming/outgoing edges, relation types)
- `centrality_bonus` – bonus from a simple centrality heuristic on the subgraph

Current formula (Step 4a):

```text
total_score = semantic_score * max(retriever_weight, 0.0)
            + edge_bonus
            + centrality_bonus
```

Results are sorted by `total_score` in descending order.

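The following sketch applies the Step 4a formula to a few hand-picked hits and sorts them. The `Hit` class, the default `retriever_weight` of 1.0 and the zero default bonuses are assumptions made for illustration. The sketch also assumes a similarity-style `semantic_score` where higher means more relevant, as with cosine.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chunk_id: str
    semantic_score: float          # assumed: higher = more similar (e.g. cosine)
    retriever_weight: float = 1.0  # assumed default when the payload has no weight
    edge_bonus: float = 0.0        # assumed 0.0 when no graph expansion ran
    centrality_bonus: float = 0.0


def total_score(hit: Hit) -> float:
    # Negative weights are clamped to 0.0: they can cancel the semantic part but never flip its sign.
    return (hit.semantic_score * max(hit.retriever_weight, 0.0)
            + hit.edge_bonus
            + hit.centrality_bonus)


def rank(hits: list[Hit]) -> list[Hit]:
    # Descending by total_score.
    return sorted(hits, key=total_score, reverse=True)


hits = [
    Hit("a", semantic_score=0.80),
    Hit("b", semantic_score=0.70, edge_bonus=0.15, centrality_bonus=0.05),
    Hit("c", semantic_score=0.90, retriever_weight=-0.5),
]
print([h.chunk_id for h in rank(hits)])  # → ['b', 'a', 'c']
```

In this example, `b` (0.70 + 0.15 + 0.05 = 0.90) overtakes `a` (0.80) because of its graph bonuses. `c` has the best semantic score but drops to 0.0, because its negative weight is clamped.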
## 3. Modes

### 3.1 Semantic Mode

- Function: `semantic_retrieve`
- Request field: `mode = "semantic"`
- No edge expansion, no graph call.
- Candidates come solely from the vector search in `<prefix>_chunks`.

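The semantic path can be sketched without a running Qdrant instance. In the sketch below, a brute-force cosine search over an in-memory list stands in for the vector search in `<prefix>_chunks`. The chunk dict layout, the name `semantic_retrieve_sketch` and the default weight of 1.0 are hypothetical. Semantic mode makes no graph call, so the sketch assumes both bonuses are zero and `total_score` reduces to `semantic_score * max(retriever_weight, 0.0)`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_retrieve_sketch(query_vec: list[float], chunks: list[dict], top_k: int) -> list[dict]:
    """Hypothetical in-memory version of semantic mode.

    chunks: dicts with "chunk_id", "vector" and an optional "retriever_weight"
    (assumed payload layout; the real data lives in <prefix>_chunks in Qdrant).
    """
    # 1. Vector search: keep the top_k most similar chunks (stand-in for the Qdrant query).
    scored = [(cosine(query_vec, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    results = []
    for sem, chunk in scored[:top_k]:
        weight = chunk.get("retriever_weight", 1.0)  # assumed default
        # 2. No graph call in semantic mode, so both bonuses are treated as 0.0.
        results.append({"chunk_id": chunk["chunk_id"], "semantic_score": sem,
                        "total_score": sem * max(weight, 0.0)})
    # 3. Final order by total_score, descending.
    results.sort(key=lambda r: r["total_score"], reverse=True)
    return results


chunks = [
    {"chunk_id": "c1", "vector": [1.0, 0.0]},
    {"chunk_id": "c2", "vector": [0.0, 1.0]},
    {"chunk_id": "c3", "vector": [1.0, 1.0], "retriever_weight": 2.0},
]
print([r["chunk_id"] for r in semantic_retrieve_sketch([1.0, 0.0], chunks, top_k=2)])  # → ['c3', 'c1']
```

Here `c3` is only the second-best vector match (cosine ≈ 0.707). Its `retriever_weight` of 2.0 still lifts it above `c1` (1.0 × 1.0).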
### 3.2 Hybrid Mode

- Function: `hybrid_retrieve`
- Request field: `mode = "hybrid"`
- Steps:
  1. Semantic chunk search, as in semantic mode.
  2. If `expand.depth > 0`:
     - Seeds = the hits' `chunk_id` (fallback: `note_id`)
     - Call `graph_adapter.expand(client, prefix, seed_ids, depth, edge_types)`
     - The resulting subgraph supplies `edge_bonus` and `centrality_bonus` per node
  3. Scores are combined using the formula above.

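The hybrid steps can be sketched in the same way. `expand_sketch` is an in-memory stand-in for `graph_adapter.expand`. It walks up to `depth` hops over `(source, target, kind)` tuples and treats edges as undirected. The following parts are hypothetical heuristics, because the document does not specify them:

- the edge tuple shape
- the per-type weights behind `edge_bonus`
- the normalized-degree `centrality_bonus`

The sketch only re-scores the semantic hits. It does not add nodes discovered during expansion.

```python
from collections import defaultdict


def seed_ids(hits: list[dict]) -> list[str]:
    # Seed = chunk_id of each hit, falling back to note_id.
    return [h.get("chunk_id") or h["note_id"] for h in hits]


def expand_sketch(edges, seeds, depth, edge_types):
    """Hypothetical in-memory stand-in for graph_adapter.expand.

    edges: (source, target, kind) tuples (assumed shape). Walks up to `depth`
    hops from the seeds, treating edges as undirected, and returns the subgraph's edges.
    """
    allowed = [e for e in edges if not edge_types or e[2] in edge_types]
    seen, frontier, subgraph = set(seeds), set(seeds), set()
    for _ in range(depth):
        next_frontier = set()
        for src, dst, kind in allowed:
            if src in frontier or dst in frontier:
                subgraph.add((src, dst, kind))
                next_frontier.update(n for n in (src, dst) if n not in seen)
        seen |= next_frontier
        frontier = next_frontier
    return subgraph


def graph_bonuses(subgraph, type_weights, centrality_weight=0.1):
    """Hypothetical heuristics: edge_bonus sums per-type weights of a node's edges,
    centrality_bonus is the node's degree normalized by the maximum degree."""
    edge_bonus, degree = defaultdict(float), defaultdict(int)
    for src, dst, kind in subgraph:
        for node in (src, dst):
            edge_bonus[node] += type_weights.get(kind, 0.0)
            degree[node] += 1
    max_degree = max(degree.values(), default=0)
    return {n: (edge_bonus[n], centrality_weight * degree[n] / max_degree) for n in degree}


def hybrid_retrieve_sketch(hits, edges, depth, edge_types, type_weights):
    """Re-scores the semantic hits; nodes found only via expansion are not added."""
    bonuses = {}
    if depth > 0:  # step 2 only runs when expand.depth > 0
        subgraph = expand_sketch(edges, seed_ids(hits), depth, edge_types)
        bonuses = graph_bonuses(subgraph, type_weights)
    ranked = []
    for hit in hits:
        e_bonus, c_bonus = bonuses.get(hit.get("chunk_id") or hit["note_id"], (0.0, 0.0))
        total = hit["semantic_score"] * max(hit.get("retriever_weight", 1.0), 0.0) + e_bonus + c_bonus
        ranked.append({**hit, "edge_bonus": e_bonus, "centrality_bonus": c_bonus, "total_score": total})
    return sorted(ranked, key=lambda r: r["total_score"], reverse=True)
```

Treating edges as undirected lets a seed collect a bonus from both incoming and outgoing edges. Whether `graph_adapter.expand` does the same is not documented here.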
## 4. Request Structure (JSON)

Example hybrid request:

```json
{
  "mode": "hybrid",
  "query": "embeddings und qdrant",
  "top_k": 10,
  "expand": {
    "depth": 1,
    "edge_types": ["references", "depends_on"]
  }
}
```

Minimal semantic request:

```json
{
  "mode": "semantic",
  "query": "karate trainingsplan",
  "top_k": 5
}
```

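A client or test can check a request body against the shape of these examples before sending it. The rules below are assumptions inferred from the examples, not the validation performed by `QueryRequest`:

- `mode` must be one of the allowed modes.
- `top_k` must be a positive integer.
- `expand.depth` must be a non-negative integer.
- `edge_types` must be a list of strings.

```python
import json

ALLOWED_MODES = {"semantic", "hybrid"}


def validate_query_request(raw: str) -> dict:
    """Parse a /query body and check it against the shape of the examples above.

    The rules are assumptions for illustration, not the validation done by QueryRequest.
    """
    body = json.loads(raw)
    if body.get("mode") not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    query = body.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    top_k = body.get("top_k")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(top_k, int) or isinstance(top_k, bool) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    expand = body.get("expand")
    if expand is not None:
        if not isinstance(expand, dict):
            raise ValueError("expand must be an object")
        depth = expand.get("depth", 0)
        if not isinstance(depth, int) or isinstance(depth, bool) or depth < 0:
            raise ValueError("expand.depth must be a non-negative integer")
        edge_types = expand.get("edge_types", [])
        if not isinstance(edge_types, list) or not all(isinstance(t, str) for t in edge_types):
            raise ValueError("expand.edge_types must be a list of strings")
    return body
```

Both example requests above pass these checks. A body such as `{"mode": "semantic", "query": "x", "top_k": 0}` is rejected.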
## 5. Smoke Test

For a quick end-to-end test, use the following script:

```bash
python scripts/test_retriever_smoke.py --query "karate" --mode hybrid --expand-depth 1 --top-k 5
```

The script calls `/query`, checks the status code and the JSON response, and prints the scores of the top-K hits.

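The contents of `scripts/test_retriever_smoke.py` are not shown here. The following is a minimal, hypothetical sketch of the same idea using only the standard library. These details are assumptions and should be adjusted to the actual `QueryResponse` and deployment:

- the base URL
- how the CLI flags map onto the request body
- the response shape (a `results` list with `chunk_id` and `total_score`)

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; point this at the running FastAPI app


def build_payload(query: str, mode: str, top_k: int, expand_depth: int = 0, edge_types=None) -> dict:
    """Map the smoke-test flags onto a /query body (assumed mapping)."""
    payload = {"mode": mode, "query": query, "top_k": top_k}
    if mode == "hybrid" and expand_depth > 0:
        payload["expand"] = {"depth": expand_depth}
        if edge_types:
            payload["expand"]["edge_types"] = list(edge_types)
    return payload


def post_query(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0):
    """POST the payload to /query and return (status code, decoded JSON body)."""
    request = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def format_scores(body: dict, top_k: int) -> list[str]:
    """One line per hit; assumes a hypothetical {"results": [{"chunk_id", "total_score"}]} shape."""
    return [
        f"{rank:>2}. {hit.get('chunk_id')}  total={hit.get('total_score', 0.0):.3f}"
        for rank, hit in enumerate(body.get("results", [])[:top_k], start=1)
    ]
```

A run roughly equivalent to the command above would be `status, body = post_query(build_payload("karate", "hybrid", 5, expand_depth=1))`, followed by printing `format_scores(body, 5)`. A non-2xx status usually surfaces as an `HTTPError` rather than as a return value.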
## 6. Tests

The key retriever tests are:

- `tests/test_retriever_basic.py` – basic functionality (semantic + hybrid)
- `tests/test_retriever_weight.py` – influence of `retriever_weight` on the ranking
- `tests/test_retriever_edges.py` – influence of the edge/centrality bonuses in hybrid mode
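For orientation, here is a hypothetical pytest-style test in the spirit of `tests/test_retriever_weight.py` and `tests/test_retriever_edges.py`. It checks the ranking effects directly on the formula from section 2. It does not reproduce the repository's tests: `total_score`, `rank` and the hit dicts are illustrative stand-ins that run without Qdrant.

```python
def total_score(semantic_score: float, retriever_weight: float,
                edge_bonus: float = 0.0, centrality_bonus: float = 0.0) -> float:
    # Step 4a formula; negative weights are clamped to 0.0.
    return semantic_score * max(retriever_weight, 0.0) + edge_bonus + centrality_bonus


def rank(hits: list[dict]) -> list[str]:
    """Return hit ids ordered by descending total_score (hypothetical hit dicts)."""
    ordered = sorted(
        hits,
        key=lambda h: total_score(h["sem"], h["w"], h.get("edge", 0.0), h.get("cent", 0.0)),
        reverse=True,
    )
    return [h["id"] for h in ordered]


def test_weight_reorders_equal_semantic_scores():
    hits = [{"id": "low", "sem": 0.8, "w": 0.5}, {"id": "high", "sem": 0.8, "w": 1.5}]
    assert rank(hits) == ["high", "low"]


def test_negative_weight_is_clamped_to_zero():
    assert total_score(0.9, -2.0) == 0.0


def test_edge_bonus_can_outrank_semantic_lead():
    # Hybrid-style case: a graph bonus lifts a weaker semantic match to the top.
    hits = [{"id": "plain", "sem": 0.80, "w": 1.0}, {"id": "linked", "sem": 0.70, "w": 1.0, "edge": 0.2}]
    assert rank(hits) == ["linked", "plain"]
```

With pytest installed, saving this as a `test_*.py` file and running `pytest` collects the three `test_` functions automatically.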