WP24c - Agentic Edge Validation & Chunk-Aware Multigraph-System (v4.5.8) #22
ANALYSE_TYPES_YAML_ZUGRIFFE.md (new file, 237 lines)
@@ -0,0 +1,237 @@
# Analysis: Accesses to config/types.yaml

## Summary

This analysis checks which scripts access `config/types.yaml` and whether they access elements that no longer exist in the current `types.yaml`.

**Date:** 2025-01-XX
**Version types.yaml:** 2.7.0

---

## ❌ CRITICAL ISSUES

### 1. `edge_defaults` is missing from types.yaml but still used in code

**Status:** ⚠️ **PROBLEM** - The code looks for `edge_defaults` in types.yaml, but this field no longer exists.

**Affected files:**

#### a) `app/core/graph/graph_utils.py` (lines 101-112)

```python
def get_edge_defaults_for(note_type: Optional[str], reg: dict) -> List[str]:
    """Determines default edges for a type."""
    types_map = reg.get("types", reg) if isinstance(reg, dict) else {}
    if note_type and isinstance(types_map, dict):
        t = types_map.get(note_type)
        if isinstance(t, dict) and isinstance(t.get("edge_defaults"), list):  # ❌ looks for edge_defaults
            return [str(x) for x in t["edge_defaults"] if isinstance(x, str)]
    for key in ("defaults", "default", "global"):
        v = reg.get(key)
        if isinstance(v, dict) and isinstance(v.get("edge_defaults"), list):  # ❌ looks for edge_defaults
            return [str(x) for x in v["edge_defaults"] if isinstance(x, str)]
    return []
```

**Problem:** The function always returns `[]`, because `edge_defaults` no longer exists in types.yaml.
#### b) `app/core/graph/graph_derive_edges.py` (line 64)

```python
defaults = get_edge_defaults_for(note_type, reg)  # ❌ is called, but returns []
```

**Problem:** No automatic default edges are created anymore.

#### c) `app/services/discovery.py` (line 212)

```python
defaults = type_def.get("edge_defaults")  # ❌ looks for edge_defaults
return defaults[0] if defaults else "related_to"
```

**Problem:** The fallback works, but does not use the new dynamic solution.

#### d) `tests/check_types_registry_edges.py` (line 170)

```python
eddefs = (tdef or {}).get("edge_defaults") or []  # ❌ looks for edge_defaults
```

**Problem:** The test no longer finds any `edge_defaults` and emits a warning.

**✅ Solution already implemented:**

- `app/core/ingestion/ingestion_note_payload.py` (WP-24c, lines 124-134) already uses the new dynamic lookup via `edge_registry.get_topology_info()`.

**Recommendation:**

- `get_edge_defaults_for()` in `graph_utils.py` should be migrated to the EdgeRegistry.
- `discovery.py` should use the EdgeRegistry as well.
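The recommended migration can be sketched as a registry-first lookup with a legacy fallback. The `get_topology_info()` signature and its `"typical"` return key are assumptions based on this analysis; the stub class below stands in for `app.services.edge_registry`:

```python
from typing import List, Optional

class StubEdgeRegistry:
    """Stand-in for app.services.edge_registry; the get_topology_info()
    signature and its 'typical' key are assumptions from this analysis."""
    def __init__(self, topology: dict):
        self._topology = topology

    def get_topology_info(self, note_type: str, direction: str) -> dict:
        return self._topology.get(note_type, {})

def get_edge_defaults_for(note_type: Optional[str], reg: dict,
                          edge_registry: StubEdgeRegistry) -> List[str]:
    # 1. Primary source: dynamic topology via the EdgeRegistry
    if note_type:
        typical = edge_registry.get_topology_info(note_type, "any").get("typical", [])
        if typical:
            return list(typical)
    # 2. Legacy fallback: static edge_defaults left over in types.yaml
    t = (reg.get("types", {}) or {}).get(note_type or "", {})
    if isinstance(t.get("edge_defaults"), list):
        return [str(x) for x in t["edge_defaults"]]
    return []

registry = StubEdgeRegistry({"concept": {"typical": ["references", "related_to"]}})
print(get_edge_defaults_for("concept", {}, registry))  # registry wins
print(get_edge_defaults_for("note", {"types": {"note": {"edge_defaults": ["mentions"]}}}, registry))
```

This keeps the legacy code path intact while making the dynamic schema the primary source, as the recommendation above suggests.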
---

### 2. Inconsistency: `chunk_profile` vs `chunking_profile`

**Status:** ⚠️ **WARNING** - Mostly caught by fallback logic.

**Problem:**

- `types.yaml` uses: `chunking_profile` ✅
- `app/core/type_registry.py` (line 88) looks for: `chunk_profile` ❌

```python
def effective_chunk_profile(note_type: Optional[str], reg: Dict[str, Any]) -> Optional[str]:
    cfg = get_type_config(note_type, reg)
    prof = cfg.get("chunk_profile")  # ❌ looks for "chunk_profile", but types.yaml has "chunking_profile"
    if isinstance(prof, str) and prof.strip():
        return prof.strip().lower()
    return None
```

**Affected files:**

- `app/core/type_registry.py` (line 88) - uses `chunk_profile` instead of `chunking_profile`

**✅ Handled well:**

- `app/core/ingestion/ingestion_chunk_payload.py` (line 33) - has a fallback: `t_cfg.get(key) or t_cfg.get(key.replace("ing", ""))`
- `app/core/ingestion/ingestion_note_payload.py` (line 120) - checks both variants

**Recommendation:**

- `type_registry.py` should also check `chunking_profile` (or both variants).
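A minimal sketch of the recommended fix: check the current key first and the legacy key second. This is standalone (without the real `get_type_config()` and its concept fallback), so treat it as an illustration, not the actual patch:

```python
from typing import Any, Dict, Optional

def effective_chunk_profile(note_type: Optional[str], reg: Dict[str, Any]) -> Optional[str]:
    """Resolves the chunking profile, accepting both key spellings."""
    types = (reg or {}).get("types", {})
    cfg = types.get((note_type or "concept").strip().lower(), {})
    # Prefer the current key, fall back to the legacy spelling
    prof = cfg.get("chunking_profile") or cfg.get("chunk_profile")
    if isinstance(prof, str) and prof.strip():
        return prof.strip().lower()
    return None

reg = {"types": {"concept": {"chunking_profile": "Medium"},
                 "log": {"chunk_profile": "short"}}}
print(effective_chunk_profile("concept", reg))  # → medium
print(effective_chunk_profile("log", reg))      # → short
```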
---

## ✅ CORRECTLY USED ELEMENTS

### 1. `chunking_profiles` ✅
- **Used in:**
  - `app/core/chunking/chunking_utils.py` (line 33) ✅
- **Status:** correctly present in types.yaml

### 2. `defaults` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 36) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 104) ✅
  - `app/core/chunking/chunking_utils.py` (line 35) ✅
- **Status:** correctly present in types.yaml

### 3. `ingestion_settings` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_note_payload.py` (line 105) ✅
- **Status:** correctly present in types.yaml

### 4. `llm_settings` ✅
- **Used in:**
  - `app/core/registry.py` (line 37) ✅
- **Status:** correctly present in types.yaml

### 5. `types` (main structure) ✅
- **Used in:** many files
- **Status:** correctly present in types.yaml

### 6. `types[].chunking_profile` ✅
- **Used in:**
  - `app/core/chunking/chunking_utils.py` (line 35) ✅
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 67) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 120) ✅
- **Status:** correctly present in types.yaml

### 7. `types[].retriever_weight` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 71) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 111) ✅
  - `app/core/retrieval/retriever_scoring.py` (line 87) ✅
- **Status:** correctly present in types.yaml

### 8. `types[].detection_keywords` ✅
- **Used in:**
  - `app/routers/chat.py` (lines 104, 150) ✅
- **Status:** correctly present in types.yaml

### 9. `types[].schema` ✅
- **Used in:**
  - `app/routers/chat.py` (presumably) ✅
- **Status:** correctly present in types.yaml
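All of the accesses above resolve against the same small set of top-level keys. A sketch of the typical lookup pattern, using a hypothetical registry excerpt (the real `config/types.yaml` v2.7.0 may differ):

```python
# Hypothetical registry excerpt mirroring the top-level keys listed above
# (chunking_profiles, defaults, types); the real config/types.yaml may differ.
reg = {
    "version": "2.7.0",
    "chunking_profiles": {"medium": {"max_tokens": 400, "overlap": 50}},
    "defaults": {"chunking_profile": "medium"},
    "types": {
        "concept": {
            "chunking_profile": "medium",
            "retriever_weight": 1.0,
            "detection_keywords": ["definition", "concept"],
        }
    },
}

def lookup(reg: dict, note_type: str, key: str, fallback=None):
    """Type-level value first, then the global 'defaults' block."""
    cfg = reg.get("types", {}).get(note_type, {})
    if key in cfg:
        return cfg[key]
    return reg.get("defaults", {}).get(key, fallback)

print(lookup(reg, "concept", "chunking_profile"))       # → medium
print(lookup(reg, "log", "chunking_profile"))           # → medium (global default)
print(lookup(reg, "concept", "retriever_weight", 1.0))  # → 1.0
```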
---

## 📋 SUMMARY OF ACCESSES

### Files that access types.yaml:

1. **app/core/type_registry.py** ⚠️
   - Uses: `types`, `chunk_profile` (should be `chunking_profile`)
   - Problem: looks for `chunk_profile` instead of `chunking_profile`

2. **app/core/registry.py** ✅
   - Uses: `llm_settings.cleanup_patterns`
   - Status: OK

3. **app/core/ingestion/ingestion_chunk_payload.py** ✅
   - Uses: `types`, `defaults`, `chunking_profile`, `retriever_weight`
   - Status: OK (has a fallback for chunk_profile/chunking_profile)

4. **app/core/ingestion/ingestion_note_payload.py** ✅
   - Uses: `types`, `defaults`, `ingestion_settings`, `chunking_profile`, `retriever_weight`
   - Status: OK (uses the new EdgeRegistry for edge_defaults)

5. **app/core/chunking/chunking_utils.py** ✅
   - Uses: `chunking_profiles`, `types`, `defaults.chunking_profile`
   - Status: OK

6. **app/core/retrieval/retriever_scoring.py** ✅
   - Uses: `retriever_weight` (from the payload, originally from types.yaml)
   - Status: OK

7. **app/core/graph/graph_utils.py** ❌
   - Uses: `types[].edge_defaults` (no longer exists!)
   - Problem: looks for `edge_defaults` in types.yaml

8. **app/core/graph/graph_derive_edges.py** ❌
   - Uses: `get_edge_defaults_for()` → looks for `edge_defaults`
   - Problem: no default edges anymore

9. **app/services/discovery.py** ⚠️
   - Uses: `types[].edge_defaults` (no longer exists!)
   - Problem: the fallback works, but does not use the new solution

10. **app/routers/chat.py** ✅
    - Uses: `types[].detection_keywords`
    - Status: OK

11. **tests/test_type_registry.py** ⚠️
    - Uses: `types[].chunk_profile`, `types[].edge_defaults`
    - Problem: the test uses the old structure

12. **tests/check_types_registry_edges.py** ❌
    - Uses: `types[].edge_defaults` (no longer exists!)
    - Problem: the test finds no edge_defaults

13. **scripts/payload_dryrun.py** ✅
    - Uses: types.yaml indirectly via `make_note_payload()` and `make_chunk_payloads()`
    - Status: OK

---

## 🔧 RECOMMENDED FIXES

### Priority 1 (critical):

1. **`app/core/graph/graph_utils.py` - `get_edge_defaults_for()`**
   - Should be migrated to `edge_registry.get_topology_info()`
   - Or: keep backwards compatibility, but use the EdgeRegistry as the primary source

2. **`app/core/graph/graph_derive_edges.py`**
   - Calls `get_edge_defaults_for()`; should work once graph_utils.py is fixed

3. **`app/services/discovery.py`**
   - Should use the EdgeRegistry for `edge_defaults`

### Priority 2 (warning):

4. **`app/core/type_registry.py` - `effective_chunk_profile()`**
   - Should also check `chunking_profile` (not just `chunk_profile`)

5. **`tests/test_type_registry.py`**
   - The test should be updated to use `chunking_profile` instead of `chunk_profile`

6. **`tests/check_types_registry_edges.py`**
   - The test should be migrated to the EdgeRegistry or marked as deprecated

---

## 📝 NOTES

- **WP-24c** already implements a solution for `edge_defaults`: a dynamic lookup via `edge_registry.get_topology_info()`
- The old approach (static `edge_defaults` in types.yaml) has been replaced by the dynamic lookup
- Code paths still using the old approach should be migrated
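The migration order described in these notes (manual frontmatter override first, then the dynamic schema lookup, then an empty list) can be sketched as follows; `resolve_edge_defaults` is an illustrative helper, not a function from the codebase:

```python
from typing import Any, Dict, List

def resolve_edge_defaults(frontmatter: Dict[str, Any],
                          typical_from_schema: List[str]) -> List[str]:
    """Priority chain: 1. manual frontmatter override,
    2. 'typical' edges from the graph schema, 3. empty list."""
    edge_defaults = frontmatter.get("edge_defaults")
    if edge_defaults is None:
        edge_defaults = typical_from_schema or []
    if not isinstance(edge_defaults, list):
        edge_defaults = [edge_defaults]
    return [str(x) for x in edge_defaults]

print(resolve_edge_defaults({"edge_defaults": ["supports"]}, ["references"]))  # frontmatter wins
print(resolve_edge_defaults({}, ["references", "related_to"]))                 # schema fallback
print(resolve_edge_defaults({}, []))                                           # → []
```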

app/core/graph/graph_utils.py
@@ -1,10 +1,14 @@
 """
 FILE: app/core/graph/graph_utils.py
 DESCRIPTION: Basic tooling, ID generation, and provenance configuration for the graph.
-AUDIT: Extended with parse_link_target for clean section splitting (WP-Fix).
+WP-24c: Integration of the EdgeRegistry for dynamic topology defaults.
+AUDIT: Extended with parse_link_target for clean section splitting.
+VERSION: 1.1.0 (WP-24c: Dynamic Topology Implementation)
+STATUS: Active
 """
 import os
 import hashlib
+import logging
 from typing import Iterable, List, Optional, Set, Any, Tuple

 try:
@@ -12,6 +16,11 @@ try:
 except ImportError:
     yaml = None

+# WP-24c: import of the central registry for topology queries
+from app.services.edge_registry import registry as edge_registry
+
+logger = logging.getLogger(__name__)
+
 # WP-15b: priority ranking for de-duplication
 PROVENANCE_PRIORITY = {
     "explicit:wikilink": 1.00,
@@ -22,7 +31,7 @@ PROVENANCE_PRIORITY = {
     "structure:order": 0.95,  # next/prev
     "explicit:note_scope": 1.00,
     "derived:backlink": 0.90,
-    "edge_defaults": 0.70  # heuristic (types.yaml)
+    "edge_defaults": 0.70  # heuristic (now via graph_schema.md)
 }

 def _get(d: dict, *keys, default=None):
@@ -52,7 +61,7 @@ def _mk_edge_id(kind: str, s: str, t: str, scope: str, rule_id: Optional[str] =
     if rule_id:
         base += f"|{rule_id}"
     if variant:
-        base += f"|{variant}"  # <--- this is what makes different sections unique
+        base += f"|{variant}"

     return hashlib.blake2s(base.encode("utf-8"), digest_size=12).hexdigest()

@@ -73,9 +82,6 @@ def parse_link_target(raw: str, current_note_id: Optional[str] = None) -> Tuple[
     """
     Splits a link (e.g. 'Note#Section') into target ID and section.
     Handles self-links ('#Section') by substituting current_note_id.
-
-    Returns:
-        (target_id, target_section)
     """
     if not raw:
         return "", None
@@ -84,7 +90,6 @@ def parse_link_target(raw: str, current_note_id: Optional[str] = None) -> Tuple[
     target = parts[0].strip()
     section = parts[1].strip() if len(parts) > 1 else None

-    # Handle self-link [[#Section]] -> target becomes current_note_id
     if not target and section and current_note_id:
         target = current_note_id

@@ -99,14 +104,30 @@ def load_types_registry() -> dict:
     except Exception: return {}

 def get_edge_defaults_for(note_type: Optional[str], reg: dict) -> List[str]:
-    """Determines default edges for a type."""
+    """
+    WP-24c: Determines default edges (typical edges) for a note type.
+    Uses the EdgeRegistry (graph_schema.md) as the primary source.
+    """
+    # 1. Dynamic query via the new topology engine (WP-24c)
+    # Fixes audit problems 1a/1b: look up graph_schema.md instead of types.yaml
+    if note_type:
+        topology = edge_registry.get_topology_info(note_type, "any")
+        typical = topology.get("typical", [])
+        if typical:
+            return typical
+
+    # 2. Legacy fallback: look in the loaded registry (types.yaml)
+    # Preserves full backwards compatibility if leftovers remain in types.yaml.
     types_map = reg.get("types", reg) if isinstance(reg, dict) else {}
     if note_type and isinstance(types_map, dict):
         t = types_map.get(note_type)
         if isinstance(t, dict) and isinstance(t.get("edge_defaults"), list):
             return [str(x) for x in t["edge_defaults"] if isinstance(x, str)]
+
+    # 3. Global default fallback from the registry
     for key in ("defaults", "default", "global"):
         v = reg.get(key)
         if isinstance(v, dict) and isinstance(v.get("edge_defaults"), list):
             return [str(x) for x in v["edge_defaults"] if isinstance(x, str)]

     return []

app/core/ingestion/ingestion_note_payload.py
@@ -1,10 +1,10 @@
 """
 FILE: app/core/ingestion/ingestion_note_payload.py
 DESCRIPTION: Builds the JSON object for mindnet_notes.
-FEATURES:
-- Multi-hash (body/full) for flexible change detection.
-- Fix v2.4.5: precise hash logic for profile changes.
-- Integration of the central registry (WP-14).
+WP-14: Integration of the central registry.
+WP-24c: Dynamic determination of edge_defaults from the graph schema.
+VERSION: 2.5.0 (WP-24c: Dynamic Topology Integration)
+STATUS: Active
 """
 from __future__ import annotations
 from typing import Any, Dict, Tuple, Optional
@@ -15,6 +15,8 @@ import hashlib

 # Import of the central registry logic
 from app.core.registry import load_type_registry
+# WP-24c: access to the dynamic graph schema
+from app.services.edge_registry import registry as edge_registry

 # ---------------------------------------------------------------------------
 # Helper
@@ -46,15 +48,14 @@ def _compute_hash(content: str) -> str:
 def _get_hash_source_content(n: Dict[str, Any], mode: str) -> str:
     """
     Generates the hash input string based on body or metadata.
-    Fix: now includes all decision-relevant profile parameters.
+    Includes all decision-relevant profile parameters.
     """
     body = str(n.get("body") or "").strip()
     if mode == "body": return body
     if mode == "full":
         fm = n.get("frontmatter") or {}
         meta_parts = []
-        # We include all fields that influence chunking or retrieval
-        # Any change here now necessarily leads to a new full hash
+        # All fields that influence chunking or retrieval
         keys = [
             "title", "type", "status", "tags",
             "chunking_profile", "chunk_profile",
@@ -87,7 +88,7 @@ def _cfg_defaults(reg: dict) -> dict:
 def make_note_payload(note: Any, *args, **kwargs) -> Dict[str, Any]:
     """
     Builds the note payload including multi-hash and audit validation.
-    WP-14: Uses the central registry for all fallbacks.
+    WP-24c: Uses the EdgeRegistry to dynamically resolve typical edges.
     """
     n = _as_dict(note)

@@ -120,10 +121,16 @@ def make_note_payload(note: Any, *args, **kwargs) -> Dict[str, Any]:
     if chunk_profile is None:
         chunk_profile = ingest_cfg.get("default_chunk_profile", cfg_def.get("chunking_profile", "sliding_standard"))

-    # --- edge_defaults audit ---
+    # --- WP-24c: dynamic edge_defaults ---
+    # 1st priority: manual definition in the frontmatter
     edge_defaults = fm.get("edge_defaults")
+
+    # 2nd priority: dynamic lookup of the 'typical edges' in the graph schema
     if edge_defaults is None:
-        edge_defaults = cfg_type.get("edge_defaults", cfg_def.get("edge_defaults", []))
+        topology = edge_registry.get_topology_info(note_type, "any")
+        edge_defaults = topology.get("typical", [])
+
+    # 3rd fallback: empty list if no schema entry exists
     edge_defaults = _ensure_list(edge_defaults)

     # --- base metadata ---

app/core/type_registry.py
@@ -1,11 +1,12 @@
 """
 FILE: app/core/type_registry.py
-DESCRIPTION: Loader for types.yaml. Note: in the current pipeline this is mostly bypassed by local loaders in 'ingestion.py' or 'note_payload.py'.
-VERSION: 1.0.0
-STATUS: Deprecated (Redundant)
+DESCRIPTION: Loader for types.yaml.
+WP-24c: Robustness fix for chunking_profile vs chunk_profile.
+WP-14: Support for central registry structures.
+VERSION: 1.1.0 (Audit fix: profile key consistency)
+STATUS: Active (support for legacy loaders)
 DEPENDENCIES: yaml, os, functools
 EXTERNAL_CONFIG: config/types.yaml
-LAST_ANALYSIS: 2025-12-15
 """
 from __future__ import annotations

@@ -18,12 +19,12 @@ try:
 except Exception:
     yaml = None  # only needed once a file is actually read

-# Conservative default - deliberately minimal
+# Conservative default - WP-24c: now consistently uses 'chunking_profile'
 _DEFAULT_REGISTRY: Dict[str, Any] = {
     "version": "1.0",
     "types": {
         "concept": {
-            "chunk_profile": "medium",
+            "chunking_profile": "medium",
             "edge_defaults": ["references", "related_to"],
             "retriever_weight": 1.0,
         }
@@ -33,7 +34,6 @@ _DEFAULT_REGISTRY: Dict[str, Any] = {
 }

 # Chunk profile -> overlap recommendations (only for synthetic windowing)
-# Absolute chunk lengths remain the chunker's job (assemble_chunks).
 _PROFILE_TO_OVERLAP: Dict[str, Tuple[int, int]] = {
     "short": (20, 30),
     "medium": (40, 60),
@@ -45,7 +45,7 @@ _PROFILE_TO_OVERLAP: Dict[str, Tuple[int, int]] = {
 def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:
     """
     Loads the registry from 'path'. On errors, a conservative default is returned.
-    The return value is cached *process-wide*.
+    The return value is cached process-wide.
     """
     if not path:
         return dict(_DEFAULT_REGISTRY)
@@ -54,7 +54,6 @@ def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:
         return dict(_DEFAULT_REGISTRY)

     if yaml is None:
-        # PyYAML missing -> fall back to the default
         return dict(_DEFAULT_REGISTRY)

     try:
@@ -71,6 +70,7 @@ def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:


 def get_type_config(note_type: Optional[str], reg: Dict[str, Any]) -> Dict[str, Any]:
+    """Extracts the configuration for a specific type."""
     t = (note_type or "concept").strip().lower()
     types = (reg or {}).get("types", {}) if isinstance(reg, dict) else {}
     return types.get(t) or types.get("concept") or _DEFAULT_REGISTRY["types"]["concept"]
@@ -84,8 +84,13 @@ def resolve_note_type(fm_type: Optional[str], reg: Dict[str, Any]) -> str:


 def effective_chunk_profile(note_type: Optional[str], reg: Dict[str, Any]) -> Optional[str]:
+    """
+    Determines the active chunking profile for a note type.
+    Fix (audit problem 2): checks both key variants for full compatibility.
+    """
     cfg = get_type_config(note_type, reg)
-    prof = cfg.get("chunk_profile")
+    # Check 'chunking_profile' (standard) OR 'chunk_profile' (legacy fallback)
+    prof = cfg.get("chunking_profile") or cfg.get("chunk_profile")
     if isinstance(prof, str) and prof.strip():
         return prof.strip().lower()
     return None
@@ -95,4 +100,4 @@ def profile_overlap(profile: Optional[str]) -> Tuple[int, int]:
     """Returns an overlap recommendation (low, high) for the profile."""
     if not profile:
         return _PROFILE_TO_OVERLAP["medium"]
     return _PROFILE_TO_OVERLAP.get(profile.strip().lower(), _PROFILE_TO_OVERLAP["medium"])

app/services/discovery.py
@@ -1,11 +1,12 @@
 """
 FILE: app/services/discovery.py
-DESCRIPTION: Service for WP-11. Analyzes texts, finds entities, and suggests typed connections ("matrix logic").
-VERSION: 0.6.0
+DESCRIPTION: Service for WP-11 (Discovery API). Analyzes drafts, finds entities,
+and suggests typed connections based on the topology.
+WP-24c: Full migration to the EdgeRegistry for dynamic suggestions.
+WP-15b: Support for hybrid search and alias detection.
+VERSION: 1.1.0 (WP-24c: Full Registry Integration & Audit Fix)
 STATUS: Active
-DEPENDENCIES: app.core.qdrant, app.models.dto, app.core.retriever
-EXTERNAL_CONFIG: config/types.yaml
-LAST_ANALYSIS: 2025-12-15
+COMPATIBILITY: 100% (identical API signature to v0.6.0)
 """
 import logging
 import asyncio
@@ -16,204 +17,181 @@ import yaml
 from app.core.database.qdrant import QdrantConfig, get_client
 from app.models.dto import QueryRequest
 from app.core.retrieval.retriever import hybrid_retrieve
+# WP-24c: central topology source
+from app.services.edge_registry import registry as edge_registry

 logger = logging.getLogger(__name__)

 class DiscoveryService:
     def __init__(self, collection_prefix: str = None):
+        """Initializes the discovery service with its Qdrant connection."""
         self.cfg = QdrantConfig.from_env()
         self.prefix = collection_prefix or self.cfg.prefix or "mindnet"
         self.client = get_client(self.cfg)

+        # The registry is loaded for type metadata (schema validation)
         self.registry = self._load_type_registry()

     async def analyze_draft(self, text: str, current_type: str) -> Dict[str, Any]:
         """
-        Analyzes the text and returns suggestions with context-sensitive edge types.
+        Analyzes a draft text for potential connections.
+        1. Finds exact matches (titles/aliases).
+        2. Runs semantic searches over different sections of the text.
+        3. Suggests topologically correct edge types.
         """
+        if not text or len(text.strip()) < 3:
+            return {"suggestions": [], "status": "empty_input"}
+
         suggestions = []
-        # Fallback if no specific rule applies
-        default_edge_type = self._get_default_edge_type(current_type)
+        seen_target_ids = set()

-        # Tracking sets for deduplication (we remember NOTE IDs)
-        seen_target_note_ids = set()
-
-        # ---------------------------------------------------------
-        # 1. Exact match: titles/aliases
-        # ---------------------------------------------------------
-        # Fetches titles, aliases AND types from the index
+        # --- PHASE 1: EXACT MATCHES (TITLES & ALIASES) ---
+        # Loads all known titles/aliases for a fast scan
         known_entities = self._fetch_all_titles_and_aliases()
-        found_entities = self._find_entities_in_text(text, known_entities)
+        exact_matches = self._find_entities_in_text(text, known_entities)

-        for entity in found_entities:
-            if entity["id"] in seen_target_note_ids:
+        for entity in exact_matches:
+            target_id = entity["id"]
+            if target_id in seen_target_ids:
                 continue
-            seen_target_note_ids.add(entity["id"])
+            seen_target_ids.add(target_id)

-            # INTELLIGENT EDGE LOGIC (MATRIX)
             target_type = entity.get("type", "concept")
-            smart_edge = self._resolve_edge_type(current_type, target_type)
+            # WP-24c: dynamic edge resolution instead of a hardcoded matrix
+            suggested_kind = self._resolve_edge_type(current_type, target_type)

             suggestions.append({
                 "type": "exact_match",
                 "text_found": entity["match"],
                 "target_title": entity["title"],
-                "target_id": entity["id"],
-                "suggested_edge_type": smart_edge,
-                "suggested_markdown": f"[[rel:{smart_edge} {entity['title']}]]",
+                "target_id": target_id,
+                "suggested_edge_type": suggested_kind,
+                "suggested_markdown": f"[[rel:{suggested_kind} {entity['title']}]]",
                 "confidence": 1.0,
-                "reason": f"Exact match: '{entity['match']}' ({target_type})"
+                "reason": f"Direct mention of '{entity['match']}' ({target_type})"
             })

-        # ---------------------------------------------------------
-        # 2. Semantic match: sliding window & footer focus
-        # ---------------------------------------------------------
+        # --- PHASE 2: SEMANTIC MATCHES (VECTOR SEARCH) ---
+        # Generates search queries for different windows of the text
         search_queries = self._generate_search_queries(text)

-        # Query asynchronously in parallel
+        # Parallel execution of the search queries (cloud performance)
         tasks = [self._get_semantic_suggestions_async(q) for q in search_queries]
         results_list = await asyncio.gather(*tasks)

-        # Process results
         for hits in results_list:
             for hit in hits:
|
||||||
note_id = hit.payload.get("note_id")
|
payload = hit.payload or {}
|
||||||
if not note_id: continue
|
target_id = payload.get("note_id")
|
||||||
|
|
||||||
# Deduplizierung (Notiz-Ebene)
|
if not target_id or target_id in seen_target_ids:
|
||||||
if note_id in seen_target_note_ids:
|
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# Score Check (Threshold 0.50 für nomic-embed-text)
|
# Relevanz-Threshold (Modell-spezifisch für nomic)
|
||||||
if hit.total_score > 0.50:
|
if hit.total_score > 0.55:
|
||||||
seen_target_note_ids.add(note_id)
|
seen_target_ids.add(target_id)
|
||||||
|
target_type = payload.get("type", "concept")
|
||||||
|
target_title = payload.get("title") or "Unbenannt"
|
||||||
|
|
||||||
target_title = hit.payload.get("title") or "Unbekannt"
|
# WP-24c: Nutzung der Topologie-Engine
|
||||||
|
suggested_kind = self._resolve_edge_type(current_type, target_type)
|
||||||
# INTELLIGENTE KANTEN-LOGIK (MATRIX)
|
|
||||||
# Den Typ der gefundenen Notiz aus dem Payload lesen
|
|
||||||
target_type = hit.payload.get("type", "concept")
|
|
||||||
smart_edge = self._resolve_edge_type(current_type, target_type)
|
|
||||||
|
|
||||||
suggestions.append({
|
suggestions.append({
|
||||||
"type": "semantic_match",
|
"type": "semantic_match",
|
||||||
"text_found": (hit.source.get("text") or "")[:60] + "...",
|
"text_found": (hit.source.get("text") or "")[:80] + "...",
|
||||||
"target_title": target_title,
|
"target_title": target_title,
|
||||||
"target_id": note_id,
|
"target_id": target_id,
|
||||||
"suggested_edge_type": smart_edge,
|
"suggested_edge_type": suggested_kind,
|
||||||
"suggested_markdown": f"[[rel:{smart_edge} {target_title}]]",
|
"suggested_markdown": f"[[rel:{suggested_kind} {target_title}]]",
|
||||||
"confidence": round(hit.total_score, 2),
|
"confidence": round(hit.total_score, 2),
|
||||||
"reason": f"Semantisch ähnlich zu {target_type} ({hit.total_score:.2f})"
|
"reason": f"Semantischer Bezug zu {target_type} ({int(hit.total_score*100)}%)"
|
||||||
})
|
})
|
||||||
|
|
||||||
# Sortieren nach Confidence
|
# Sortierung nach Konfidenz
|
||||||
suggestions.sort(key=lambda x: x["confidence"], reverse=True)
|
suggestions.sort(key=lambda x: x["confidence"], reverse=True)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"draft_length": len(text),
|
"draft_length": len(text),
|
||||||
"analyzed_windows": len(search_queries),
|
"analyzed_windows": len(search_queries),
|
||||||
"suggestions_count": len(suggestions),
|
"suggestions_count": len(suggestions),
|
||||||
"suggestions": suggestions[:10]
|
"suggestions": suggestions[:12] # Top 12 Vorschläge
|
||||||
}
|
}
|
||||||
|
|
||||||
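The two-phase flow above (exact matches first, then score-thresholded semantic hits, deduplicated by note ID across both phases) can be sketched standalone; `Hit` and the 0.55 threshold below are simplified stand-ins for the retriever result type, not the project's actual classes:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    note_id: str
    total_score: float

def merge_suggestions(exact_ids, semantic_hits, threshold=0.55):
    """Exact matches win; semantic hits only count above the threshold
    and only for notes not already suggested."""
    seen = set()
    suggestions = []
    for note_id in exact_ids:
        if note_id in seen:
            continue
        seen.add(note_id)
        suggestions.append((note_id, 1.0))  # exact match => confidence 1.0
    for hit in semantic_hits:
        if hit.note_id in seen or hit.total_score <= threshold:
            continue
        seen.add(hit.note_id)
        suggestions.append((hit.note_id, round(hit.total_score, 2)))
    suggestions.sort(key=lambda s: s[1], reverse=True)
    return suggestions
```

With `merge_suggestions(["a"], [Hit("a", 0.9), Hit("b", 0.7), Hit("c", 0.4)])` the duplicate `a` and the sub-threshold `c` are dropped, leaving the exact match ranked first.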
    # --- CENTRAL LOGIC (WP-24c) ---

    def _resolve_edge_type(self, source_type: str, target_type: str) -> str:
        """
        Determines the optimal edge type between two note types.
        Uses the EdgeRegistry (graph_schema.md) instead of a local matrix.
        """
        # 1. Specific check: is there a rule for source -> target?
        info = edge_registry.get_topology_info(source_type, target_type)
        typical = info.get("typical", [])
        if typical:
            return typical[0]  # First suggestion from the schema

        # 2. Fallback: what is generally typical for the source type? (source -> any)
        info_fallback = edge_registry.get_topology_info(source_type, "any")
        typical_fallback = info_fallback.get("typical", [])
        if typical_fallback:
            return typical_fallback[0]

        # 3. Global fallback (safety net)
        return "related_to"

    # --- HELPERS (FULLY PRESERVED) ---

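The three-step resolution above can be exercised against a stub registry. The only contract assumed here is that `get_topology_info` returns a dict with a `typical` list; `StubEdgeRegistry` and its rule table are hypothetical test fixtures:

```python
class StubEdgeRegistry:
    """Minimal stand-in for the EdgeRegistry used by _resolve_edge_type."""
    def __init__(self, rules):
        self.rules = rules  # (source, target) -> list of edge kinds

    def get_topology_info(self, source_type, target_type):
        return {"typical": self.rules.get((source_type, target_type), [])}

def resolve_edge_type(registry, source_type, target_type):
    # 1. Specific rule for source -> target
    typical = registry.get_topology_info(source_type, target_type).get("typical", [])
    if typical:
        return typical[0]
    # 2. Fallback: source -> any
    fallback = registry.get_topology_info(source_type, "any").get("typical", [])
    if fallback:
        return fallback[0]
    # 3. Global safety net
    return "related_to"
```

A specific rule beats the `any` fallback, and an unknown pair degrades to `related_to` rather than raising.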
    def _generate_search_queries(self, text: str) -> List[str]:
        """Generates overlapping windows for the vector search (sliding window)."""
        text_len = len(text)

        queries = []

        # Focus A: start of the document (context)
        queries.append(text[:600])

        # Focus B: end of the document (current writing focus)
        if text_len > 250:
            footer = text[-350:]
            if footer not in queries:
                queries.append(footer)

        # Focus C: intermediate sections for long texts
        if text_len > 1200:
            window_size = 500
            step = 1200
            for i in range(600, text_len - 400, step):
                chunk = text[i:i+window_size]
                if len(chunk) > 100:
                    queries.append(chunk)

        return queries

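Assuming the thresholds above (250 for the footer, 1200 for sliding windows), the number of queries per draft length is easy to verify with a standalone copy of the windowing logic:

```python
def generate_search_queries(text: str):
    """Standalone copy of the windowing logic above, for illustration."""
    text_len = len(text)
    queries = [text[:600]]                      # Focus A: document start
    if text_len > 250:                          # Focus B: document end
        footer = text[-350:]
        if footer not in queries:
            queries.append(footer)
    if text_len > 1200:                         # Focus C: intermediate windows
        for i in range(600, text_len - 400, 1200):
            chunk = text[i:i + 500]
            if len(chunk) > 100:
                queries.append(chunk)
    return queries
```

A 100-character draft yields a single query; a 3000-character draft yields the head, the footer, and two 500-character windows at offsets 600 and 1800.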
    async def _get_semantic_suggestions_async(self, text: str):
        """Runs an asynchronous vector search via the retriever."""
        req = QueryRequest(query=text, top_k=6, explain=False)
        try:
            # Uses hybrid_retrieve (WP-15b standard)
            res = hybrid_retrieve(req)
            return res.results
        except Exception as e:
            logger.error(f"Discovery retrieval error: {e}")
            return []

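Because each call above catches its own exceptions and degrades to `[]`, the `asyncio.gather` fan-out in `analyze_draft` never fails as a whole. A minimal sketch of that pattern, with a hypothetical `fetch` standing in for the retriever call:

```python
import asyncio

async def fetch(query: str):
    """Stand-in for _get_semantic_suggestions_async: errors degrade to []."""
    try:
        if query == "boom":
            raise RuntimeError("backend down")
        return [f"hit for {query}"]
    except Exception:
        return []

async def fan_out(queries):
    # Every task resolves (possibly to []), so gather never raises here
    results = await asyncio.gather(*(fetch(q) for q in queries))
    return [hit for hits in results for hit in hits]

hits = asyncio.run(fan_out(["a", "boom", "b"]))
```

The failing query simply contributes no hits; result order still follows query order.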
    def _load_type_registry(self) -> dict:
        """Loads types.yaml for the type definitions."""
        path = os.getenv("MINDNET_TYPES_FILE", "config/types.yaml")
        if not os.path.exists(path):
            return {}
        try:
            with open(path, "r", encoding="utf-8") as f:
                return yaml.safe_load(f) or {}
        except Exception:
            return {}

    def _fetch_all_titles_and_aliases(self) -> List[Dict]:
        """Fetches all note IDs, titles, and aliases for the exact-match scan."""
        entities = []
        next_page = None
        col = f"{self.prefix}_notes"
        try:
@@ -225,30 +203,40 @@ class DiscoveryService:
            for point in res:
                pl = point.payload or {}
                aliases = pl.get("aliases") or []
                if isinstance(aliases, str):
                    aliases = [aliases]

                entities.append({
                    "id": pl.get("note_id"),
                    "title": pl.get("title"),
                    "aliases": aliases,
                    "type": pl.get("type", "concept")
                })
            if next_page is None:
                break
        except Exception as e:
            logger.warning(f"Error fetching entities for discovery: {e}")
        return entities

    def _find_entities_in_text(self, text: str, entities: List[Dict]) -> List[Dict]:
        """Scans the text for mentions of known entities."""
        found = []
        text_lower = text.lower()
        for entity in entities:
            title = entity.get("title")
            # Title check
            if title and title.lower() in text_lower:
                found.append({
                    "match": title, "title": title,
                    "id": entity["id"], "type": entity["type"]
                })
                continue
            # Alias check
            for alias in entity.get("aliases", []):
                if str(alias).lower() in text_lower:
                    found.append({
                        "match": str(alias), "title": title,
                        "id": entity["id"], "type": entity["type"]
                    })
                    break
        return found

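The scan above is a plain case-insensitive substring check, and an alias only fires when the title did not. A quick standalone check with hypothetical sample entities:

```python
def find_entities_in_text(text, entities):
    """Standalone copy of the matching logic above."""
    found = []
    text_lower = text.lower()
    for entity in entities:
        title = entity.get("title")
        if title and title.lower() in text_lower:
            found.append({"match": title, "id": entity["id"], "type": entity["type"]})
            continue  # title match suppresses alias matches for this entity
        for alias in entity.get("aliases", []):
            if str(alias).lower() in text_lower:
                found.append({"match": str(alias), "id": entity["id"], "type": entity["type"]})
                break
    return found

entities = [
    {"id": "n1", "title": "MindNet", "aliases": ["the graph"], "type": "project"},
    {"id": "n2", "title": "Qdrant", "aliases": ["vector db"], "type": "concept"},
]
matches = find_entities_in_text("We store mindnet data in a vector db.", entities)
```

`MindNet` matches via its title despite the different casing; `Qdrant` matches only via its alias.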
@@ -23,7 +23,6 @@ chunking_profiles:
    overlap: [50, 100]

  # C. SMART FLOW (text flow)
  sliding_smart_edges:
    strategy: sliding_window
    enable_smart_edge_allocation: true
@@ -32,7 +31,6 @@ chunking_profiles:
    overlap: [50, 80]

  # D. SMART STRUCTURE (soft split)
  structured_smart_edges:
    strategy: by_heading
    enable_smart_edge_allocation: true
@@ -43,8 +41,6 @@ chunking_profiles:
    overlap: [50, 80]

  # E. SMART STRUCTURE STRICT (H2 hard split)
  structured_smart_edges_strict:
    strategy: by_heading
    enable_smart_edge_allocation: true
@@ -55,9 +51,6 @@ chunking_profiles:
    overlap: [50, 80]

  # F. SMART STRUCTURE DEEP (H3 hard split + merge check)
  structured_smart_edges_strict_L3:
    strategy: by_heading
    enable_smart_edge_allocation: true
@@ -73,22 +66,17 @@ chunking_profiles:
defaults:
  retriever_weight: 1.0
  chunking_profile: sliding_standard

# ==============================================================================
# 3. INGESTION SETTINGS (WP-14 Dynamization)
# ==============================================================================
ingestion_settings:
  ignore_statuses: ["system", "template", "archive", "hidden"]
  default_note_type: "concept"

# ==============================================================================
# 4. SUMMARY & SCAN SETTINGS
# ==============================================================================
summary_settings:
  max_summary_length: 500
  pre_scan_depth: 600
@@ -96,7 +84,6 @@ summary_settings:
# ==============================================================================
# 5. LLM SETTINGS
# ==============================================================================
llm_settings:
  cleanup_patterns: ["<s>", "</s>", "[OUT]", "[/OUT]", "```json", "```"]

@@ -108,8 +95,7 @@ types:

  experience:
    chunking_profile: sliding_smart_edges
    retriever_weight: 1.10
    detection_keywords: ["erleben", "reagieren", "handeln", "prägen", "reflektieren"]
    schema:
      - "Situation (Was ist passiert?)"
@@ -119,8 +105,7 @@ types:

  insight:
    chunking_profile: sliding_smart_edges
    retriever_weight: 1.20
    detection_keywords: ["beobachten", "erkennen", "verstehen", "analysieren", "schlussfolgern"]
    schema:
      - "Beobachtung (Was sehe ich?)"
@@ -131,7 +116,6 @@ types:
  project:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.97
    detection_keywords: ["umsetzen", "planen", "starten", "bauen", "abschließen"]
    schema:
      - "Mission & Zielsetzung"
@@ -141,7 +125,6 @@ types:
  decision:
    chunking_profile: structured_smart_edges_strict
    retriever_weight: 1.00
    detection_keywords: ["entscheiden", "wählen", "abwägen", "priorisieren", "festlegen"]
    schema:
      - "Kontext & Problemstellung"
@@ -149,12 +132,9 @@ types:
      - "Die Entscheidung"
      - "Begründung"

  value:
    chunking_profile: structured_smart_edges_strict
    retriever_weight: 1.00
    detection_keywords: ["werten", "achten", "verpflichten", "bedeuten"]
    schema:
      - "Definition"
@@ -164,7 +144,6 @@ types:
  principle:
    chunking_profile: structured_smart_edges_strict_L3
    retriever_weight: 0.95
    detection_keywords: ["leiten", "steuern", "ausrichten", "handhaben"]
    schema:
      - "Das Prinzip"
@@ -173,7 +152,6 @@ types:
  trait:
    chunking_profile: structured_smart_edges_strict
    retriever_weight: 1.10
    detection_keywords: ["begeistern", "können", "auszeichnen", "befähigen", "stärken"]
    schema:
      - "Eigenschaft / Talent"
@@ -183,7 +161,6 @@ types:
  obstacle:
    chunking_profile: structured_smart_edges_strict
    retriever_weight: 1.00
    detection_keywords: ["blockieren", "fürchten", "vermeiden", "hindern", "zweifeln"]
    schema:
      - "Beschreibung der Hürde"
@@ -194,7 +171,6 @@ types:
  belief:
    chunking_profile: sliding_short
    retriever_weight: 0.90
    detection_keywords: ["glauben", "meinen", "annehmen", "überzeugen"]
    schema:
      - "Der Glaubenssatz"
@@ -203,18 +179,15 @@ types:
  profile:
    chunking_profile: structured_smart_edges_strict
    retriever_weight: 0.70
    detection_keywords: ["verkörpern", "verantworten", "agieren", "repräsentieren"]
    schema:
      - "Rolle / Identität"
      - "Fakten & Daten"
      - "Historie"

  idea:
    chunking_profile: sliding_short
    retriever_weight: 0.70
    detection_keywords: ["einfall", "gedanke", "potenzial", "möglichkeit"]
    schema:
      - "Der Kerngedanke"
@@ -224,7 +197,6 @@ types:
  skill:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.90
    detection_keywords: ["lernen", "beherrschen", "üben", "fertigkeit", "kompetenz"]
    schema:
      - "Definition der Fähigkeit"
@@ -234,7 +206,6 @@ types:
  habit:
    chunking_profile: sliding_short
    retriever_weight: 0.85
    detection_keywords: ["gewohnheit", "routine", "automatismus", "immer wenn"]
    schema:
      - "Auslöser (Trigger)"
@@ -245,7 +216,6 @@ types:
  need:
    chunking_profile: sliding_smart_edges
    retriever_weight: 1.05
    detection_keywords: ["bedürfnis", "brauchen", "mangel", "erfüllung"]
    schema:
      - "Das Bedürfnis"
@@ -255,7 +225,6 @@ types:
  motivation:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.95
    detection_keywords: ["motivation", "antrieb", "warum", "energie"]
    schema:
      - "Der Antrieb"
@@ -265,86 +234,68 @@ types:
  bias:
    chunking_profile: sliding_short
    retriever_weight: 0.80
    detection_keywords: ["denkfehler", "verzerrung", "vorurteil", "falle"]
    schema: ["Beschreibung der Verzerrung", "Typische Situationen", "Gegenstrategie"]

  state:
    chunking_profile: sliding_short
    retriever_weight: 0.60
    detection_keywords: ["stimmung", "energie", "gefühl", "verfassung"]
    schema: ["Aktueller Zustand", "Auslöser", "Auswirkung auf den Tag"]

  boundary:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.90
    detection_keywords: ["grenze", "nein sagen", "limit", "schutz"]
    schema: ["Die Grenze", "Warum sie wichtig ist", "Konsequenz bei Verletzung"]

  goal:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.95
    schema: ["Zielzustand", "Zeitrahmen & KPIs", "Motivation"]

  risk:
    chunking_profile: sliding_short
    retriever_weight: 0.85
    detection_keywords: ["risiko", "gefahr", "bedrohung"]
    schema: ["Beschreibung des Risikos", "Auswirkungen", "Gegenmaßnahmen"]

  concept:
    chunking_profile: sliding_smart_edges
    retriever_weight: 0.60
    schema: ["Definition", "Kontext", "Verwandte Konzepte"]

  task:
    chunking_profile: sliding_short
    retriever_weight: 0.80
    schema: ["Aufgabe", "Kontext", "Definition of Done"]

  journal:
    chunking_profile: sliding_standard
    retriever_weight: 0.80
    schema: ["Log-Eintrag", "Gedanken"]

  source:
    chunking_profile: sliding_standard
    retriever_weight: 0.50
    schema: ["Metadaten", "Zusammenfassung", "Zitate"]

  glossary:
    chunking_profile: sliding_short
    retriever_weight: 0.40
    schema: ["Begriff", "Definition"]

  person:
    chunking_profile: sliding_standard
    retriever_weight: 0.50
    schema: ["Rolle", "Beziehung", "Kontext"]

  event:
    chunking_profile: sliding_standard
    retriever_weight: 0.60
    schema: ["Datum & Ort", "Teilnehmer", "Ergebnisse"]

  default:
    chunking_profile: sliding_standard
    retriever_weight: 1.00
    schema: ["Inhalt"]
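With `edge_defaults` removed from every type and from `defaults`, any code that still reads it (such as `get_edge_defaults_for` in `graph_utils.py`, flagged in the analysis above) now gets an empty result. A minimal sketch of that lookup against an inline sample of the new schema shape (the sample dict is illustrative, not the full file):

```python
# Inline sample of the new types.yaml shape: edge_defaults is gone everywhere
registry = {
    "types": {
        "experience": {"chunking_profile": "sliding_smart_edges", "retriever_weight": 1.10},
    },
    "defaults": {"retriever_weight": 1.0, "chunking_profile": "sliding_standard"},
}

def get_edge_defaults_for(note_type, reg):
    """Mirrors the lookup in graph_utils.get_edge_defaults_for."""
    t = reg.get("types", {}).get(note_type, {})
    if isinstance(t.get("edge_defaults"), list):
        return [str(x) for x in t["edge_defaults"]]
    d = reg.get("defaults", {})
    if isinstance(d.get("edge_defaults"), list):
        return [str(x) for x in d["edge_defaults"]]
    return []  # always reached with the new schema
```

Against the new schema the function silently returns `[]` for every type, which is exactly the problem this document flags.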