WP24c - Agentic Edge Validation & Chunk-Aware Multigraph-System (v4.5.8) #22
237
ANALYSE_TYPES_YAML_ZUGRIFFE.md
Normal file
@ -0,0 +1,237 @@
# Analysis: Accesses to config/types.yaml

## Summary

This analysis checks which scripts access `config/types.yaml` and whether they reference elements that no longer exist in the current `types.yaml`.

**Date:** 2025-01-XX
**Version types.yaml:** 2.7.0

---

## ❌ CRITICAL PROBLEMS

### 1. `edge_defaults` is missing from types.yaml but still used in code

**Status:** ⚠️ **PROBLEM** - Code looks up `edge_defaults` in types.yaml, but this field no longer exists.

**Affected files:**

#### a) `app/core/graph/graph_utils.py` (lines 101-112)
```python
def get_edge_defaults_for(note_type: Optional[str], reg: dict) -> List[str]:
    """Determines the default edges for a type."""
    types_map = reg.get("types", reg) if isinstance(reg, dict) else {}
    if note_type and isinstance(types_map, dict):
        t = types_map.get(note_type)
        if isinstance(t, dict) and isinstance(t.get("edge_defaults"), list):  # ❌ looks up edge_defaults
            return [str(x) for x in t["edge_defaults"] if isinstance(x, str)]
    for key in ("defaults", "default", "global"):
        v = reg.get(key)
        if isinstance(v, dict) and isinstance(v.get("edge_defaults"), list):  # ❌ looks up edge_defaults
            return [str(x) for x in v["edge_defaults"] if isinstance(x, str)]
    return []
```
**Problem:** The function always returns `[]` because `edge_defaults` does not exist in types.yaml.

#### b) `app/core/graph/graph_derive_edges.py` (line 64)
```python
defaults = get_edge_defaults_for(note_type, reg)  # ❌ still called, but returns []
```
**Problem:** Automatic default edges are no longer created.

#### c) `app/services/discovery.py` (line 212)
```python
defaults = type_def.get("edge_defaults")  # ❌ looks up edge_defaults
return defaults[0] if defaults else "related_to"
```
**Problem:** The `"related_to"` fallback works, but the code does not use the new dynamic solution.

#### d) `tests/check_types_registry_edges.py` (line 170)
```python
eddefs = (tdef or {}).get("edge_defaults") or []  # ❌ looks up edge_defaults
```
**Problem:** The test no longer finds any `edge_defaults` and emits a warning.

**✅ Solution already implemented:**
- `app/core/ingestion/ingestion_note_payload.py` (WP-24c, lines 124-134) already uses the new dynamic solution via `edge_registry.get_topology_info()`.

**Recommendation:**
- `get_edge_defaults_for()` in `graph_utils.py` should be migrated to the EdgeRegistry.
- `discovery.py` should use the EdgeRegistry as well.
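
The recommended migration could look roughly like the following sketch. It assumes the registry exposes `get_topology_info(source_type, target_type)` and returns a dict with a `typical` list, which is how `ingestion_note_payload.py` calls it. `StubEdgeRegistry`, its sample data, and the extra `edge_registry` parameter are hypothetical stand-ins so the example runs outside the application.

```python
from typing import Dict, List, Optional


class StubEdgeRegistry:
    """Hypothetical stand-in for app.services.edge_registry.registry."""

    def __init__(self, schema: Dict[str, List[str]]):
        self._schema = schema

    def get_topology_info(self, source_type: str, target_type: str) -> Dict[str, List[str]]:
        # Assumed return shape {"typical": [...]}; this stub ignores target_type.
        return {"typical": list(self._schema.get(source_type, []))}


def get_edge_defaults_for(note_type: Optional[str], reg: dict, edge_registry) -> List[str]:
    """Resolve default edges: EdgeRegistry first, legacy types.yaml second."""
    if note_type:
        typical = edge_registry.get_topology_info(note_type, "any").get("typical", [])
        if typical:
            return [str(x) for x in typical]

    # Legacy fallback: per-type edge_defaults that may remain in types.yaml.
    types_map = reg.get("types", reg) if isinstance(reg, dict) else {}
    if note_type and isinstance(types_map, dict):
        t = types_map.get(note_type)
        if isinstance(t, dict) and isinstance(t.get("edge_defaults"), list):
            return [str(x) for x in t["edge_defaults"] if isinstance(x, str)]
    return []


# Illustrative data only; the type and edge names are made up for this sketch.
registry = StubEdgeRegistry({"project": ["depends_on", "uses"]})
legacy = {"types": {"journal": {"edge_defaults": ["related_to"]}}}

project_edges = get_edge_defaults_for("project", legacy, registry)
journal_edges = get_edge_defaults_for("journal", legacy, registry)
unknown_edges = get_edge_defaults_for("unknown", legacy, registry)
```

Keeping the legacy lookup as a second step means existing `types.yaml` files that still carry `edge_defaults` continue to work during the migration.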

---

### 2. Inconsistency: `chunk_profile` vs `chunking_profile`

**Status:** ⚠️ **WARNING** - Mostly caught by fallback logic.

**Problem:**
- `types.yaml` uses the key `chunking_profile` ✅
- `app/core/type_registry.py` (line 88) looks up `chunk_profile` ❌

```python
def effective_chunk_profile(note_type: Optional[str], reg: Dict[str, Any]) -> Optional[str]:
    cfg = get_type_config(note_type, reg)
    prof = cfg.get("chunk_profile")  # ❌ looks up "chunk_profile", but types.yaml has "chunking_profile"
    if isinstance(prof, str) and prof.strip():
        return prof.strip().lower()
    return None
```

**Affected files:**
- `app/core/type_registry.py` (line 88) - uses `chunk_profile` instead of `chunking_profile`

**✅ Handled well:**
- `app/core/ingestion/ingestion_chunk_payload.py` (line 33) - has a fallback: `t_cfg.get(key) or t_cfg.get(key.replace("ing", ""))`
- `app/core/ingestion/ingestion_note_payload.py` (line 120) - checks both variants

**Recommendation:**
- `type_registry.py` should also check `chunking_profile` (or both variants).
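
A minimal sketch of the two-key lookup, under the assumption that the current key should win over the legacy one. For brevity the function takes the type config dict directly instead of `(note_type, reg)`; that simplified signature is an assumption of this example, not the signature in `type_registry.py`.

```python
from typing import Any, Dict, Optional


def effective_chunk_profile(cfg: Dict[str, Any]) -> Optional[str]:
    """Return the normalized profile name, preferring the current key."""
    # 'chunking_profile' is the key used in types.yaml;
    # 'chunk_profile' is the legacy spelling kept as a fallback.
    prof = cfg.get("chunking_profile") or cfg.get("chunk_profile")
    if isinstance(prof, str) and prof.strip():
        return prof.strip().lower()
    return None


current = effective_chunk_profile({"chunking_profile": " Sliding_Standard "})
legacy = effective_chunk_profile({"chunk_profile": "medium"})
missing = effective_chunk_profile({"retriever_weight": 1.0})

# The string-replace fallback in ingestion_chunk_payload.py derives the
# legacy key from the current one in the same spirit:
derived_key = "chunking_profile".replace("ing", "")
```

Note that `str.replace` removes every occurrence of `"ing"`, so the replace-based fallback is only safe for keys where the substring appears exactly once.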

---

## ✅ CORRECTLY USED ELEMENTS

### 1. `chunking_profiles` ✅
- **Used in:**
  - `app/core/chunking/chunking_utils.py` (line 33) ✅
- **Status:** Present in types.yaml

### 2. `defaults` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 36) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 104) ✅
  - `app/core/chunking/chunking_utils.py` (line 35) ✅
- **Status:** Present in types.yaml

### 3. `ingestion_settings` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_note_payload.py` (line 105) ✅
- **Status:** Present in types.yaml

### 4. `llm_settings` ✅
- **Used in:**
  - `app/core/registry.py` (line 37) ✅
- **Status:** Present in types.yaml

### 5. `types` (main structure) ✅
- **Used in:** Many files
- **Status:** Present in types.yaml

### 6. `types[].chunking_profile` ✅
- **Used in:**
  - `app/core/chunking/chunking_utils.py` (line 35) ✅
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 67) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 120) ✅
- **Status:** Present in types.yaml

### 7. `types[].retriever_weight` ✅
- **Used in:**
  - `app/core/ingestion/ingestion_chunk_payload.py` (line 71) ✅
  - `app/core/ingestion/ingestion_note_payload.py` (line 111) ✅
  - `app/core/retrieval/retriever_scoring.py` (line 87) ✅
- **Status:** Present in types.yaml

### 8. `types[].detection_keywords` ✅
- **Used in:**
  - `app/routers/chat.py` (lines 104, 150) ✅
- **Status:** Present in types.yaml

### 9. `types[].schema` ✅
- **Used in:**
  - `app/routers/chat.py` (presumably) ✅
- **Status:** Present in types.yaml

---

## 📋 SUMMARY OF ACCESSES

### Files that access types.yaml:

1. **app/core/type_registry.py** ⚠️
   - Uses: `types`, `chunk_profile` (should be `chunking_profile`)
   - Problem: Looks up `chunk_profile` instead of `chunking_profile`

2. **app/core/registry.py** ✅
   - Uses: `llm_settings.cleanup_patterns`
   - Status: OK

3. **app/core/ingestion/ingestion_chunk_payload.py** ✅
   - Uses: `types`, `defaults`, `chunking_profile`, `retriever_weight`
   - Status: OK (has a fallback for chunk_profile/chunking_profile)

4. **app/core/ingestion/ingestion_note_payload.py** ✅
   - Uses: `types`, `defaults`, `ingestion_settings`, `chunking_profile`, `retriever_weight`
   - Status: OK (uses the new EdgeRegistry for edge_defaults)

5. **app/core/chunking/chunking_utils.py** ✅
   - Uses: `chunking_profiles`, `types`, `defaults.chunking_profile`
   - Status: OK

6. **app/core/retrieval/retriever_scoring.py** ✅
   - Uses: `retriever_weight` (read from the payload, originally sourced from types.yaml)
   - Status: OK

7. **app/core/graph/graph_utils.py** ❌
   - Uses: `types[].edge_defaults` (no longer exists!)
   - Problem: Looks up `edge_defaults` in types.yaml

8. **app/core/graph/graph_derive_edges.py** ❌
   - Uses: `get_edge_defaults_for()` → looks up `edge_defaults`
   - Problem: No default edges are created anymore

9. **app/services/discovery.py** ⚠️
   - Uses: `types[].edge_defaults` (no longer exists!)
   - Problem: Fallback works, but the new solution is not used

10. **app/routers/chat.py** ✅
    - Uses: `types[].detection_keywords`
    - Status: OK

11. **tests/test_type_registry.py** ⚠️
    - Uses: `types[].chunk_profile`, `types[].edge_defaults`
    - Problem: Test uses the old structure

12. **tests/check_types_registry_edges.py** ❌
    - Uses: `types[].edge_defaults` (no longer exists!)
    - Problem: Test finds no edge_defaults

13. **scripts/payload_dryrun.py** ✅
    - Uses: Indirect access via `make_note_payload()` and `make_chunk_payloads()`
    - Status: OK

---

## 🔧 RECOMMENDED FIXES

### Priority 1 (critical):

1. **`app/core/graph/graph_utils.py` - `get_edge_defaults_for()`**
   - Should be migrated to `edge_registry.get_topology_info()`
   - Or: keep backward compatibility, but use the EdgeRegistry as the primary source

2. **`app/core/graph/graph_derive_edges.py`**
   - Calls `get_edge_defaults_for()`, so it should work again once graph_utils.py is fixed

3. **`app/services/discovery.py`**
   - Should use the EdgeRegistry for `edge_defaults`
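
For `discovery.py`, one way to use the EdgeRegistry is a three-step resolution: a specific source-to-target rule, then a generic source-to-`"any"` rule, then a global `"related_to"` default. The sketch below assumes the same `get_topology_info()` return shape as above. `StubEdgeRegistry` and the sample rules are hypothetical and exist only to make the example runnable.

```python
from typing import Dict, List, Tuple


class StubEdgeRegistry:
    """Hypothetical stand-in; the real registry is app.services.edge_registry."""

    def __init__(self, rules: Dict[Tuple[str, str], List[str]]):
        self._rules = rules

    def get_topology_info(self, source_type: str, target_type: str) -> Dict[str, List[str]]:
        # Assumed return shape: {"typical": [...]}.
        return {"typical": list(self._rules.get((source_type, target_type), []))}


def resolve_edge_type(edge_registry, source_type: str, target_type: str) -> str:
    """Pick an edge type: specific rule, then source -> any, then 'related_to'."""
    for target in (target_type, "any"):
        typical = edge_registry.get_topology_info(source_type, target).get("typical", [])
        if typical:
            return typical[0]  # first suggestion wins
    return "related_to"  # global safety net


# Illustrative rules only.
registry = StubEdgeRegistry({
    ("experience", "value"): ["based_on"],
    ("experience", "any"): ["part_of"],
})

specific = resolve_edge_type(registry, "experience", "value")
generic = resolve_edge_type(registry, "experience", "trip")
fallback = resolve_edge_type(registry, "note", "trip")
```

This keeps the `"related_to"` default that the current fallback in `discovery.py` already returns, so callers see the same result when the schema has no entry.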

### Priority 2 (warning):

4. **`app/core/type_registry.py` - `effective_chunk_profile()`**
   - Should also check `chunking_profile` (not only `chunk_profile`)

5. **`tests/test_type_registry.py`**
   - Test should be updated to use `chunking_profile` instead of `chunk_profile`

6. **`tests/check_types_registry_edges.py`**
   - Test should be migrated to the EdgeRegistry or marked as deprecated

---

## 📝 NOTES

- **WP-24c** has already implemented a solution for `edge_defaults`: a dynamic lookup via `edge_registry.get_topology_info()`
- The old solution (static `edge_defaults` in types.yaml) has been replaced by the dynamic solution
- Code locations that still use the old solution should be migrated
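
To find the remaining call sites, a simple text scan is often enough. The helper below is a hypothetical sketch: it works on an in-memory mapping of file name to source text and reports every line that mentions `edge_defaults`. Reading the files from disk is left out so the example stays self-contained.

```python
from typing import Dict, List, Tuple


def find_legacy_lookups(sources: Dict[str, str], key: str = "edge_defaults") -> List[Tuple[str, int]]:
    """Return (file name, 1-based line number) for each line mentioning `key`."""
    hits: List[Tuple[str, int]] = []
    for name in sorted(sources):
        for lineno, line in enumerate(sources[name].splitlines(), start=1):
            if key in line:
                hits.append((name, lineno))
    return hits


# Illustrative snippets, not the real file contents.
sample = {
    "discovery.py": 'type_def = {}\ndefaults = type_def.get("edge_defaults")\n',
    "chat.py": 'kw = cfg.get("detection_keywords")\n',
}
hits = find_legacy_lookups(sample)
```

Each hit still needs manual review, because a mention in a comment or in the intended legacy fallback is not necessarily a bug.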
@ -1,10 +1,14 @@
|
|||
"""
|
||||
FILE: app/core/graph/graph_utils.py
|
||||
DESCRIPTION: Basale Werkzeuge, ID-Generierung und Provenance-Konfiguration für den Graphen.
|
||||
AUDIT: Erweitert um parse_link_target für sauberes Section-Splitting (WP-Fix).
|
||||
WP-24c: Integration der EdgeRegistry für dynamische Topologie-Defaults.
|
||||
AUDIT: Erweitert um parse_link_target für sauberes Section-Splitting.
|
||||
VERSION: 1.1.0 (WP-24c: Dynamic Topology Implementation)
|
||||
STATUS: Active
|
||||
"""
|
||||
import os
|
||||
import hashlib
|
||||
import logging
|
||||
from typing import Iterable, List, Optional, Set, Any, Tuple
|
||||
|
||||
try:
|
||||
|
|
@ -12,6 +16,11 @@ try:
|
|||
except ImportError:
|
||||
yaml = None
|
||||
|
||||
# WP-24c: Import der zentralen Registry für Topologie-Abfragen
|
||||
from app.services.edge_registry import registry as edge_registry
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# WP-15b: Prioritäten-Ranking für die De-Duplizierung
|
||||
PROVENANCE_PRIORITY = {
|
||||
"explicit:wikilink": 1.00,
|
||||
|
|
@ -22,7 +31,7 @@ PROVENANCE_PRIORITY = {
|
|||
"structure:order": 0.95, # next/prev
|
||||
"explicit:note_scope": 1.00,
|
||||
"derived:backlink": 0.90,
|
||||
"edge_defaults": 0.70 # Heuristik (types.yaml)
|
||||
"edge_defaults": 0.70 # Heuristik (nun via graph_schema.md)
|
||||
}
|
||||
|
||||
def _get(d: dict, *keys, default=None):
|
||||
|
|
@ -52,7 +61,7 @@ def _mk_edge_id(kind: str, s: str, t: str, scope: str, rule_id: Optional[str] =
|
|||
if rule_id:
|
||||
base += f"|{rule_id}"
|
||||
if variant:
|
||||
base += f"|{variant}" # <--- Hier entsteht die Eindeutigkeit für verschiedene Sections
|
||||
base += f"|{variant}"
|
||||
|
||||
return hashlib.blake2s(base.encode("utf-8"), digest_size=12).hexdigest()
|
||||
|
||||
|
|
@ -73,9 +82,6 @@ def parse_link_target(raw: str, current_note_id: Optional[str] = None) -> Tuple[
|
|||
"""
|
||||
Zerlegt einen Link (z.B. 'Note#Section') in Target-ID und Section.
|
||||
Behandelt Self-Links ('#Section'), indem current_note_id eingesetzt wird.
|
||||
|
||||
Returns:
|
||||
(target_id, target_section)
|
||||
"""
|
||||
if not raw:
|
||||
return "", None
|
||||
|
|
@ -84,7 +90,6 @@ def parse_link_target(raw: str, current_note_id: Optional[str] = None) -> Tuple[
|
|||
target = parts[0].strip()
|
||||
section = parts[1].strip() if len(parts) > 1 else None
|
||||
|
||||
# Handle Self-Link [[#Section]] -> target wird zu current_note_id
|
||||
if not target and section and current_note_id:
|
||||
target = current_note_id
|
||||
|
||||
|
|
@ -99,14 +104,30 @@ def load_types_registry() -> dict:
|
|||
except Exception: return {}
|
||||
|
||||
def get_edge_defaults_for(note_type: Optional[str], reg: dict) -> List[str]:
|
||||
"""Ermittelt Standard-Kanten für einen Typ."""
|
||||
"""
|
||||
WP-24c: Ermittelt Standard-Kanten (Typical Edges) für einen Notiz-Typ.
|
||||
Nutzt die EdgeRegistry (graph_schema.md) als primäre Quelle.
|
||||
"""
|
||||
# 1. Dynamische Abfrage über die neue Topologie-Engine (WP-24c)
|
||||
# Behebt das Audit-Problem 1a/1b: Suche in graph_schema.md statt types.yaml
|
||||
if note_type:
|
||||
topology = edge_registry.get_topology_info(note_type, "any")
|
||||
typical = topology.get("typical", [])
|
||||
if typical:
|
||||
return typical
|
||||
|
||||
# 2. Legacy-Fallback: Suche in der geladenen Registry (types.yaml)
|
||||
# Sichert 100% Rückwärtskompatibilität, falls Reste in types.yaml verblieben sind.
|
||||
types_map = reg.get("types", reg) if isinstance(reg, dict) else {}
|
||||
if note_type and isinstance(types_map, dict):
|
||||
t = types_map.get(note_type)
|
||||
if isinstance(t, dict) and isinstance(t.get("edge_defaults"), list):
|
||||
return [str(x) for x in t["edge_defaults"] if isinstance(x, str)]
|
||||
|
||||
# 3. Globaler Default-Fallback aus der Registry
|
||||
for key in ("defaults", "default", "global"):
|
||||
v = reg.get(key)
|
||||
if isinstance(v, dict) and isinstance(v.get("edge_defaults"), list):
|
||||
return [str(x) for x in v["edge_defaults"] if isinstance(x, str)]
|
||||
|
||||
return []
|
||||
|
|
@ -1,10 +1,10 @@
|
|||
"""
|
||||
FILE: app/core/ingestion/ingestion_note_payload.py
|
||||
DESCRIPTION: Baut das JSON-Objekt für mindnet_notes.
|
||||
FEATURES:
|
||||
- Multi-Hash (body/full) für flexible Change Detection.
|
||||
- Fix v2.4.5: Präzise Hash-Logik für Profil-Änderungen.
|
||||
- Integration der zentralen Registry (WP-14).
|
||||
WP-14: Integration der zentralen Registry.
|
||||
WP-24c: Dynamische Ermittlung von edge_defaults aus dem Graph-Schema.
|
||||
VERSION: 2.5.0 (WP-24c: Dynamic Topology Integration)
|
||||
STATUS: Active
|
||||
"""
|
||||
from __future__ import annotations
|
||||
from typing import Any, Dict, Tuple, Optional
|
||||
|
|
@ -15,6 +15,8 @@ import hashlib
|
|||
|
||||
# Import der zentralen Registry-Logik
|
||||
from app.core.registry import load_type_registry
|
||||
# WP-24c: Zugriff auf das dynamische Graph-Schema
|
||||
from app.services.edge_registry import registry as edge_registry
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helper
|
||||
|
|
@ -46,15 +48,14 @@ def _compute_hash(content: str) -> str:
|
|||
def _get_hash_source_content(n: Dict[str, Any], mode: str) -> str:
|
||||
"""
|
||||
Generiert den Hash-Input-String basierend auf Body oder Metadaten.
|
||||
Fix: Inkludiert nun alle entscheidungsrelevanten Profil-Parameter.
|
||||
Inkludiert alle entscheidungsrelevanten Profil-Parameter.
|
||||
"""
|
||||
body = str(n.get("body") or "").strip()
|
||||
if mode == "body": return body
|
||||
if mode == "full":
|
||||
fm = n.get("frontmatter") or {}
|
||||
meta_parts = []
|
||||
# Wir inkludieren alle Felder, die das Chunking oder Retrieval beeinflussen
|
||||
# Jede Änderung hier führt nun zwingend zu einem neuen Full-Hash
|
||||
# Alle Felder, die das Chunking oder Retrieval beeinflussen
|
||||
keys = [
|
||||
"title", "type", "status", "tags",
|
||||
"chunking_profile", "chunk_profile",
|
||||
|
|
@ -87,7 +88,7 @@ def _cfg_defaults(reg: dict) -> dict:
|
|||
def make_note_payload(note: Any, *args, **kwargs) -> Dict[str, Any]:
|
||||
"""
|
||||
Baut das Note-Payload inklusive Multi-Hash und Audit-Validierung.
|
||||
WP-14: Nutzt die zentrale Registry für alle Fallbacks.
|
||||
WP-24c: Nutzt die EdgeRegistry zur dynamischen Auflösung von Typical Edges.
|
||||
"""
|
||||
n = _as_dict(note)
|
||||
|
||||
|
|
@ -120,10 +121,16 @@ def make_note_payload(note: Any, *args, **kwargs) -> Dict[str, Any]:
|
|||
if chunk_profile is None:
|
||||
chunk_profile = ingest_cfg.get("default_chunk_profile", cfg_def.get("chunking_profile", "sliding_standard"))
|
||||
|
||||
# --- edge_defaults Audit ---
|
||||
# --- WP-24c: edge_defaults Dynamisierung ---
|
||||
# 1. Priorität: Manuelle Definition im Frontmatter
|
||||
edge_defaults = fm.get("edge_defaults")
|
||||
|
||||
# 2. Priorität: Dynamische Abfrage der 'Typical Edges' aus dem Graph-Schema
|
||||
if edge_defaults is None:
|
||||
edge_defaults = cfg_type.get("edge_defaults", cfg_def.get("edge_defaults", []))
|
||||
topology = edge_registry.get_topology_info(note_type, "any")
|
||||
edge_defaults = topology.get("typical", [])
|
||||
|
||||
# 3. Fallback: Leere Liste, falls kein Schema-Eintrag existiert
|
||||
edge_defaults = _ensure_list(edge_defaults)
|
||||
|
||||
# --- Basis-Metadaten ---
|
||||
|
|
|
|||
|
|
@ -1,11 +1,12 @@
|
|||
"""
|
||||
FILE: app/core/type_registry.py
|
||||
DESCRIPTION: Loader für types.yaml. Achtung: Wird in der aktuellen Pipeline meist durch lokale Loader in 'ingestion.py' oder 'note_payload.py' umgangen.
|
||||
VERSION: 1.0.0
|
||||
STATUS: Deprecated (Redundant)
|
||||
DESCRIPTION: Loader für types.yaml.
|
||||
WP-24c: Robustheits-Fix für chunking_profile vs chunk_profile.
|
||||
WP-14: Support für zentrale Registry-Strukturen.
|
||||
VERSION: 1.1.0 (Audit-Fix: Profile Key Consistency)
|
||||
STATUS: Active (Support für Legacy-Loader)
|
||||
DEPENDENCIES: yaml, os, functools
|
||||
EXTERNAL_CONFIG: config/types.yaml
|
||||
LAST_ANALYSIS: 2025-12-15
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
|
|
@ -18,12 +19,12 @@ try:
|
|||
except Exception:
|
||||
yaml = None # wird erst benötigt, wenn eine Datei gelesen werden soll
|
||||
|
||||
# Konservativer Default – bewusst minimal
|
||||
# Konservativer Default – WP-24c: Nutzt nun konsistent 'chunking_profile'
|
||||
_DEFAULT_REGISTRY: Dict[str, Any] = {
|
||||
"version": "1.0",
|
||||
"types": {
|
||||
"concept": {
|
||||
"chunk_profile": "medium",
|
||||
"chunking_profile": "medium",
|
||||
"edge_defaults": ["references", "related_to"],
|
||||
"retriever_weight": 1.0,
|
||||
}
|
||||
|
|
@ -33,7 +34,6 @@ _DEFAULT_REGISTRY: Dict[str, Any] = {
|
|||
}
|
||||
|
||||
# Chunk-Profile → Overlap-Empfehlungen (nur für synthetische Fensterbildung)
|
||||
# Die absoluten Chunk-Längen bleiben Aufgabe des Chunkers (assemble_chunks).
|
||||
_PROFILE_TO_OVERLAP: Dict[str, Tuple[int, int]] = {
|
||||
"short": (20, 30),
|
||||
"medium": (40, 60),
|
||||
|
|
@ -45,7 +45,7 @@ _PROFILE_TO_OVERLAP: Dict[str, Tuple[int, int]] = {
|
|||
def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:
|
||||
"""
|
||||
Loads the registry from 'path'. On errors, a conservative default is returned.
|
||||
Die Rückgabe ist *prozessweit* gecached.
|
||||
Die Rückgabe ist prozessweit gecached.
|
||||
"""
|
||||
if not path:
|
||||
return dict(_DEFAULT_REGISTRY)
|
||||
|
|
@ -54,7 +54,6 @@ def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:
|
|||
return dict(_DEFAULT_REGISTRY)
|
||||
|
||||
if yaml is None:
|
||||
# PyYAML fehlt → auf Default zurückfallen
|
||||
return dict(_DEFAULT_REGISTRY)
|
||||
|
||||
try:
|
||||
|
|
@ -71,6 +70,7 @@ def load_type_registry(path: str = "config/types.yaml") -> Dict[str, Any]:
|
|||
|
||||
|
||||
def get_type_config(note_type: Optional[str], reg: Dict[str, Any]) -> Dict[str, Any]:
|
||||
"""Extrahiert die Konfiguration für einen spezifischen Typ."""
|
||||
t = (note_type or "concept").strip().lower()
|
||||
types = (reg or {}).get("types", {}) if isinstance(reg, dict) else {}
|
||||
return types.get(t) or types.get("concept") or _DEFAULT_REGISTRY["types"]["concept"]
|
||||
|
|
@ -84,8 +84,13 @@ def resolve_note_type(fm_type: Optional[str], reg: Dict[str, Any]) -> str:
|
|||
|
||||
|
||||
def effective_chunk_profile(note_type: Optional[str], reg: Dict[str, Any]) -> Optional[str]:
|
||||
"""
|
||||
Ermittelt das aktive Chunking-Profil für einen Notiz-Typ.
|
||||
Fix (Audit-Problem 2): Prüft beide Key-Varianten für 100% Kompatibilität.
|
||||
"""
|
||||
cfg = get_type_config(note_type, reg)
|
||||
prof = cfg.get("chunk_profile")
|
||||
# Check 'chunking_profile' (Standard) OR 'chunk_profile' (Legacy/Fallback)
|
||||
prof = cfg.get("chunking_profile") or cfg.get("chunk_profile")
|
||||
if isinstance(prof, str) and prof.strip():
|
||||
return prof.strip().lower()
|
||||
return None
|
||||
|
|
@ -95,4 +100,4 @@ def profile_overlap(profile: Optional[str]) -> Tuple[int, int]:
|
|||
"""Gibt eine Overlap-Empfehlung (low, high) für das Profil zurück."""
|
||||
if not profile:
|
||||
return _PROFILE_TO_OVERLAP["medium"]
|
||||
return _PROFILE_TO_OVERLAP.get(profile.strip().lower(), _PROFILE_TO_OVERLAP["medium"])
|
||||
return _PROFILE_TO_OVERLAP.get(profile.strip().lower(), _PROFILE_TO_OVERLAP["medium"])
|
||||
|
|
@ -1,11 +1,12 @@
|
|||
"""
|
||||
FILE: app/services/discovery.py
|
||||
DESCRIPTION: Service für WP-11. Analysiert Texte, findet Entitäten und schlägt typisierte Verbindungen vor ("Matrix-Logic").
|
||||
VERSION: 0.6.0
|
||||
DESCRIPTION: Service für WP-11 (Discovery API). Analysiert Entwürfe, findet Entitäten
|
||||
und schlägt typisierte Verbindungen basierend auf der Topologie vor.
|
||||
WP-24c: Vollständige Umstellung auf EdgeRegistry für dynamische Vorschläge.
|
||||
WP-15b: Unterstützung für hybride Suche und Alias-Erkennung.
|
||||
VERSION: 1.1.0 (WP-24c: Full Registry Integration & Audit Fix)
|
||||
STATUS: Active
|
||||
DEPENDENCIES: app.core.qdrant, app.models.dto, app.core.retriever
|
||||
EXTERNAL_CONFIG: config/types.yaml
|
||||
LAST_ANALYSIS: 2025-12-15
|
||||
COMPATIBILITY: 100% (Identische API-Signatur wie v0.6.0)
|
||||
"""
|
||||
import logging
|
||||
import asyncio
|
||||
|
|
@ -16,204 +17,181 @@ import yaml
|
|||
from app.core.database.qdrant import QdrantConfig, get_client
|
||||
from app.models.dto import QueryRequest
|
||||
from app.core.retrieval.retriever import hybrid_retrieve
|
||||
# WP-24c: Zentrale Topologie-Quelle
|
||||
from app.services.edge_registry import registry as edge_registry
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
class DiscoveryService:
|
||||
def __init__(self, collection_prefix: str = None):
|
||||
"""Initialisiert den Discovery Service mit Qdrant-Anbindung."""
|
||||
self.cfg = QdrantConfig.from_env()
|
||||
self.prefix = collection_prefix or self.cfg.prefix or "mindnet"
|
||||
self.client = get_client(self.cfg)
|
||||
|
||||
# Die Registry wird für Typ-Metadaten geladen (Schema-Validierung)
|
||||
self.registry = self._load_type_registry()
|
||||
|
||||
async def analyze_draft(self, text: str, current_type: str) -> Dict[str, Any]:
|
||||
"""
|
||||
Analysiert den Text und liefert Vorschläge mit kontext-sensitiven Kanten-Typen.
|
||||
Analysiert einen Textentwurf auf potenzielle Verbindungen.
|
||||
1. Findet exakte Treffer (Titel/Aliasse).
|
||||
2. Führt semantische Suchen für verschiedene Textabschnitte aus.
|
||||
3. Schlägt topologisch korrekte Kanten-Typen vor.
|
||||
"""
|
||||
if not text or len(text.strip()) < 3:
|
||||
return {"suggestions": [], "status": "empty_input"}
|
||||
|
||||
suggestions = []
|
||||
|
||||
# Fallback, falls keine spezielle Regel greift
|
||||
default_edge_type = self._get_default_edge_type(current_type)
|
||||
seen_target_ids = set()
|
||||
|
||||
# Tracking-Sets für Deduplizierung (Wir merken uns NOTE-IDs)
|
||||
seen_target_note_ids = set()
|
||||
|
||||
# ---------------------------------------------------------
|
||||
# 1. Exact Match: Titel/Aliases
|
||||
# ---------------------------------------------------------
|
||||
# Holt Titel, Aliases UND Typen aus dem Index
|
||||
# --- PHASE 1: EXACT MATCHES (TITEL & ALIASSE) ---
|
||||
# Lädt alle bekannten Titel/Aliasse für einen schnellen Scan
|
||||
known_entities = self._fetch_all_titles_and_aliases()
|
||||
found_entities = self._find_entities_in_text(text, known_entities)
|
||||
exact_matches = self._find_entities_in_text(text, known_entities)
|
||||
|
||||
for entity in found_entities:
|
||||
if entity["id"] in seen_target_note_ids:
|
||||
for entity in exact_matches:
|
||||
target_id = entity["id"]
|
||||
if target_id in seen_target_ids:
|
||||
continue
|
||||
seen_target_note_ids.add(entity["id"])
|
||||
|
||||
# INTELLIGENTE KANTEN-LOGIK (MATRIX)
|
||||
|
||||
seen_target_ids.add(target_id)
|
||||
target_type = entity.get("type", "concept")
|
||||
smart_edge = self._resolve_edge_type(current_type, target_type)
|
||||
|
||||
# WP-24c: Dynamische Kanten-Ermittlung statt Hardcoded Matrix
|
||||
suggested_kind = self._resolve_edge_type(current_type, target_type)
|
||||
|
||||
suggestions.append({
|
||||
"type": "exact_match",
|
||||
"text_found": entity["match"],
|
||||
"target_title": entity["title"],
|
||||
"target_id": entity["id"],
|
||||
"suggested_edge_type": smart_edge,
|
||||
"suggested_markdown": f"[[rel:{smart_edge} {entity['title']}]]",
|
||||
"target_id": target_id,
|
||||
"suggested_edge_type": suggested_kind,
|
||||
"suggested_markdown": f"[[rel:{suggested_kind} {entity['title']}]]",
|
||||
"confidence": 1.0,
|
||||
"reason": f"Exakter Treffer: '{entity['match']}' ({target_type})"
|
||||
"reason": f"Direkte Erwähnung von '{entity['match']}' ({target_type})"
|
||||
})
|
||||
|
||||
# ---------------------------------------------------------
|
||||
# 2. Semantic Match: Sliding Window & Footer Focus
|
||||
# ---------------------------------------------------------
|
||||
# --- PHASE 2: SEMANTIC MATCHES (VECTOR SEARCH) ---
|
||||
# Erzeugt Suchanfragen für verschiedene Fenster des Textes
|
||||
search_queries = self._generate_search_queries(text)
|
||||
|
||||
# Async parallel abfragen
|
||||
# Parallele Ausführung der Suchanfragen (Cloud-Performance)
|
||||
tasks = [self._get_semantic_suggestions_async(q) for q in search_queries]
|
||||
results_list = await asyncio.gather(*tasks)
|
||||
|
||||
# Ergebnisse verarbeiten
|
||||
for hits in results_list:
|
||||
for hit in hits:
|
||||
note_id = hit.payload.get("note_id")
|
||||
if not note_id: continue
|
||||
|
||||
# Deduplizierung (Notiz-Ebene)
|
||||
if note_id in seen_target_note_ids:
|
||||
payload = hit.payload or {}
|
||||
target_id = payload.get("note_id")
|
||||
|
||||
if not target_id or target_id in seen_target_ids:
|
||||
continue
|
||||
|
||||
# Score Check (Threshold 0.50 für nomic-embed-text)
|
||||
if hit.total_score > 0.50:
|
||||
seen_target_note_ids.add(note_id)
|
||||
# Relevanz-Threshold (Modell-spezifisch für nomic)
|
||||
if hit.total_score > 0.55:
|
||||
seen_target_ids.add(target_id)
|
||||
target_type = payload.get("type", "concept")
|
||||
target_title = payload.get("title") or "Unbenannt"
|
||||
|
||||
target_title = hit.payload.get("title") or "Unbekannt"
|
||||
|
||||
# INTELLIGENTE KANTEN-LOGIK (MATRIX)
|
||||
# Den Typ der gefundenen Notiz aus dem Payload lesen
|
||||
target_type = hit.payload.get("type", "concept")
|
||||
smart_edge = self._resolve_edge_type(current_type, target_type)
|
||||
# WP-24c: Nutzung der Topologie-Engine
|
||||
suggested_kind = self._resolve_edge_type(current_type, target_type)
|
||||
|
||||
suggestions.append({
|
||||
"type": "semantic_match",
|
||||
"text_found": (hit.source.get("text") or "")[:60] + "...",
|
||||
"text_found": (hit.source.get("text") or "")[:80] + "...",
|
||||
"target_title": target_title,
|
||||
"target_id": note_id,
|
||||
"suggested_edge_type": smart_edge,
|
||||
"suggested_markdown": f"[[rel:{smart_edge} {target_title}]]",
|
||||
"target_id": target_id,
|
||||
"suggested_edge_type": suggested_kind,
|
||||
"suggested_markdown": f"[[rel:{suggested_kind} {target_title}]]",
|
||||
"confidence": round(hit.total_score, 2),
|
||||
"reason": f"Semantisch ähnlich zu {target_type} ({hit.total_score:.2f})"
|
||||
"reason": f"Semantischer Bezug zu {target_type} ({int(hit.total_score*100)}%)"
|
||||
})
|
||||
|
||||
# Sortieren nach Confidence
|
||||
# Sortierung nach Konfidenz
|
||||
suggestions.sort(key=lambda x: x["confidence"], reverse=True)
|
||||
|
||||
return {
|
||||
"draft_length": len(text),
|
||||
"analyzed_windows": len(search_queries),
|
||||
"suggestions_count": len(suggestions),
|
||||
"suggestions": suggestions[:10]
|
||||
"suggestions": suggestions[:12] # Top 12 Vorschläge
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------
|
||||
# Core Logic: Die Matrix
|
||||
# ---------------------------------------------------------
|
||||
|
||||
# --- LOGIK-ZENTRALE (WP-24c) ---
|
||||
|
||||
def _resolve_edge_type(self, source_type: str, target_type: str) -> str:
|
||||
"""
|
||||
Entscheidungsmatrix für komplexe Verbindungen.
|
||||
Definiert, wie Typ A auf Typ B verlinken sollte.
|
||||
Ermittelt den optimalen Kanten-Typ zwischen zwei Notiz-Typen.
|
||||
Nutzt EdgeRegistry (graph_schema.md) statt lokaler Matrix.
|
||||
"""
|
||||
st = source_type.lower()
|
||||
tt = target_type.lower()
|
||||
# 1. Spezifische Prüfung: Gibt es eine Regel für Source -> Target?
|
||||
info = edge_registry.get_topology_info(source_type, target_type)
|
||||
typical = info.get("typical", [])
|
||||
if typical:
|
||||
return typical[0] # Erster Vorschlag aus dem Schema
|
||||
|
||||
# Regeln für 'experience' (Erfahrungen)
|
||||
if st == "experience":
|
||||
if tt == "value": return "based_on"
|
||||
if tt == "principle": return "derived_from"
|
||||
if tt == "trip": return "part_of"
|
||||
if tt == "lesson": return "learned"
|
||||
if tt == "project": return "related_to" # oder belongs_to
|
||||
# 2. Fallback: Was ist für den Quell-Typ generell typisch? (Source -> any)
|
||||
info_fallback = edge_registry.get_topology_info(source_type, "any")
|
||||
typical_fallback = info_fallback.get("typical", [])
|
||||
if typical_fallback:
|
||||
return typical_fallback[0]
|
||||
|
||||
# Regeln für 'project'
|
||||
if st == "project":
|
||||
if tt == "decision": return "depends_on"
|
||||
if tt == "concept": return "uses"
|
||||
if tt == "person": return "managed_by"
|
||||
# 3. Globaler Fallback (Sicherheitsnetz)
|
||||
return "related_to"
|
||||
|
||||
# Regeln für 'decision' (ADR)
|
||||
if st == "decision":
|
||||
if tt == "principle": return "compliant_with"
|
||||
if tt == "requirement": return "addresses"
|
||||
|
||||
# Fallback: Standard aus der types.yaml für den Source-Typ
|
||||
return self._get_default_edge_type(st)
|
||||
|
||||
-    # ---------------------------------------------------------
-    # Sliding Windows
-    # ---------------------------------------------------------
+    # --- HELPERS (FULLY PRESERVED) ---

     def _generate_search_queries(self, text: str) -> List[str]:
-        """
-        Generates intelligent windows plus a footer scan.
-        """
+        """Generates overlapping windows for vector search (sliding window)."""
         text_len = len(text)
         if not text: return []

         queries = []

-        # 1. Start / overall context
+        # Focus A: document start (context)
         queries.append(text[:600])

-        # 2. Footer scan (important for "project" references at the end)
-        if text_len > 150:
-            footer = text[-250:]
-            if footer not in queries:
+        # Focus B: document end (current writing focus)
+        if text_len > 250:
+            footer = text[-350:]
+            if footer not in queries:
                 queries.append(footer)

-        # 3. Sliding window for long texts
-        if text_len > 800:
+        # Focus C: intermediate sections of long texts
+        if text_len > 1200:
             window_size = 500
-            step = 1500
-            for i in range(window_size, text_len - window_size, step):
-                end_pos = min(i + window_size, text_len)
-                chunk = text[i:end_pos]
+            step = 1200
+            for i in range(600, text_len - 400, step):
+                chunk = text[i:i+window_size]
                 if len(chunk) > 100:
                     queries.append(chunk)

         return queries
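For reference, the new version of the query generation can be read as a standalone function. This is a sketch reconstructed from the added lines only; the free-function name `generate_search_queries` is hypothetical, and the thresholds are copied from the diff.

```python
from typing import List


def generate_search_queries(text: str) -> List[str]:
    """Builds query windows: document start, document end, and middle slices."""
    if not text:
        return []
    text_len = len(text)
    queries = [text[:600]]  # Focus A: document start

    # Focus B: document end, skipped if it equals an existing query.
    if text_len > 250:
        footer = text[-350:]
        if footer not in queries:
            queries.append(footer)

    # Focus C: 500-character slices every 1200 characters for long texts.
    if text_len > 1200:
        window_size = 500
        for i in range(600, text_len - 400, 1200):
            chunk = text[i:i + window_size]
            if len(chunk) > 100:
                queries.append(chunk)
    return queries
```

With a 500-character window and a step of 1200, consecutive middle slices sample the text rather than overlap each other; only the start and end windows can overlap on shorter documents.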

-    # ---------------------------------------------------------
-    # Standard Helpers
-    # ---------------------------------------------------------

     async def _get_semantic_suggestions_async(self, text: str):
-        req = QueryRequest(query=text, top_k=5, explain=False)
+        """Runs an asynchronous vector search via the retriever."""
+        req = QueryRequest(query=text, top_k=6, explain=False)
         try:
             # Uses hybrid_retrieve (WP-15b standard)
             res = hybrid_retrieve(req)
             return res.results
         except Exception as e:
-            logger.error(f"Semantic suggestion error: {e}")
+            logger.error(f"Discovery retrieval error: {e}")
             return []

     def _load_type_registry(self) -> dict:
+        """Loads types.yaml for the type definitions."""
         path = os.getenv("MINDNET_TYPES_FILE", "config/types.yaml")
         if not os.path.exists(path):
-            if os.path.exists("types.yaml"): path = "types.yaml"
-            else: return {}
+            return {}
         try:
-            with open(path, "r", encoding="utf-8") as f: return yaml.safe_load(f) or {}
-        except Exception: return {}
-
-    def _get_default_edge_type(self, note_type: str) -> str:
-        types_cfg = self.registry.get("types", {})
-        type_def = types_cfg.get(note_type, {})
-        defaults = type_def.get("edge_defaults")
-        return defaults[0] if defaults else "related_to"
+            with open(path, "r", encoding="utf-8") as f:
+                return yaml.safe_load(f) or {}
+        except Exception:
+            return {}

     def _fetch_all_titles_and_aliases(self) -> List[Dict]:
-        notes = []
+        """Fetches all note IDs, titles, and aliases for exact-match comparison."""
+        entities = []
         next_page = None
         col = f"{self.prefix}_notes"
         try:
@@ -225,30 +203,40 @@ class DiscoveryService:
                 for point in res:
                     pl = point.payload or {}
                     aliases = pl.get("aliases") or []
-                    if isinstance(aliases, str): aliases = [aliases]
+                    if isinstance(aliases, str):
+                        aliases = [aliases]

-                    notes.append({
+                    entities.append({
                         "id": pl.get("note_id"),
                         "title": pl.get("title"),
                         "aliases": aliases,
-                        "type": pl.get("type", "concept") # IMPORTANT: load the type for the matrix
+                        "type": pl.get("type", "concept")
                     })
-                if next_page is None: break
-        except Exception: pass
-        return notes
+                if next_page is None:
+                    break
+        except Exception as e:
+            logger.warning(f"Error fetching entities for discovery: {e}")
+        return entities

     def _find_entities_in_text(self, text: str, entities: List[Dict]) -> List[Dict]:
+        """Searches the text for mentions of known entities."""
         found = []
         text_lower = text.lower()
         for entity in entities:
-            # Title Check
             title = entity.get("title")
+            # Title check
             if title and title.lower() in text_lower:
-                found.append({"match": title, "title": title, "id": entity["id"], "type": entity["type"]})
+                found.append({
+                    "match": title, "title": title,
+                    "id": entity["id"], "type": entity["type"]
+                })
                 continue
-            # Alias Check
+            # Alias check
             for alias in entity.get("aliases", []):
                 if str(alias).lower() in text_lower:
-                    found.append({"match": alias, "title": title, "id": entity["id"], "type": entity["type"]})
+                    found.append({
+                        "match": str(alias), "title": title,
+                        "id": entity["id"], "type": entity["type"]
+                    })
                     break
         return found
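The matching in `_find_entities_in_text` is a case-insensitive substring test: the title is tried first, then the aliases, with at most one match per entity. A self-contained sketch of the new version follows; the free-function form and the sample `entities` are hypothetical and only illustrate the behavior.

```python
from typing import Dict, List


def find_entities_in_text(text: str, entities: List[Dict]) -> List[Dict]:
    """Returns one match per entity whose title or alias occurs in the text."""
    found = []
    text_lower = text.lower()
    for entity in entities:
        title = entity.get("title")
        # Title check: a title hit wins and skips the alias scan.
        if title and title.lower() in text_lower:
            found.append({"match": title, "title": title,
                          "id": entity["id"], "type": entity["type"]})
            continue
        # Alias check: record only the first alias that matches.
        for alias in entity.get("aliases", []):
            if str(alias).lower() in text_lower:
                found.append({"match": str(alias), "title": title,
                              "id": entity["id"], "type": entity["type"]})
                break
    return found


# Hypothetical sample data for illustration.
entities = [
    {"id": "n1", "title": "Mindnet", "aliases": ["MN"], "type": "project"},
    {"id": "n2", "title": "Stoicism", "aliases": ["Stoa"], "type": "concept"},
]
matches = find_entities_in_text("Notes on the stoa and on mindnet.", entities)
```

Because this is plain substring matching, a short alias such as `MN` also matches inside an unrelated word like "column". A word-boundary regular expression would be stricter, at the cost of missing inflected or compound forms.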
@@ -23,7 +23,6 @@ chunking_profiles:
     overlap: [50, 100]

   # C. SMART FLOW (text flow)
-  # Uses a sliding window, but with LLM edge analysis.
   sliding_smart_edges:
     strategy: sliding_window
     enable_smart_edge_allocation: true
@@ -32,7 +31,6 @@ chunking_profiles:
     overlap: [50, 80]

   # D. SMART STRUCTURE (Soft Split)
-  # Prefers to split at H2, but merges small sections ("soft mode").
   structured_smart_edges:
     strategy: by_heading
     enable_smart_edge_allocation: true
@@ -43,8 +41,6 @@ chunking_profiles:
     overlap: [50, 80]

   # E. SMART STRUCTURE STRICT (H2 Hard Split)
-  # ALWAYS splits at every H2.
-  # Prevents "Vater" and "Partner" (profiles) or values from merging.
   structured_smart_edges_strict:
     strategy: by_heading
     enable_smart_edge_allocation: true
@@ -55,9 +51,6 @@ chunking_profiles:
     overlap: [50, 80]

   # F. SMART STRUCTURE DEEP (H3 Hard Split + Merge-Check)
-  # Special case for "Leitbild Prinzipien":
-  # - Splits H1, H2, H3 hard.
-  # - But: merges "empty" H2 (Tier 2) with the following H3 (MP1).
   structured_smart_edges_strict_L3:
     strategy: by_heading
     enable_smart_edge_allocation: true
@@ -73,22 +66,17 @@ chunking_profiles:
 defaults:
   retriever_weight: 1.0
   chunking_profile: sliding_standard
-  edge_defaults: []

 # ==============================================================================
 # 3. INGESTION SETTINGS (WP-14 Dynamization)
 # ==============================================================================
-# Controls which notes are processed and how fallbacks behave.
 ingestion_settings:
-  # List of status values to ignore during import.
   ignore_statuses: ["system", "template", "archive", "hidden"]
-  # Default type if no type is given in the frontmatter.
   default_note_type: "concept"

 # ==============================================================================
 # 4. SUMMARY & SCAN SETTINGS
 # ==============================================================================
-# Controls the depth of the pre-scan for the context cache.
 summary_settings:
   max_summary_length: 500
   pre_scan_depth: 600
@@ -96,7 +84,6 @@ summary_settings:
 # ==============================================================================
 # 5. LLM SETTINGS
 # ==============================================================================
-# Control characters and patterns for cleaning up LLM responses.
 llm_settings:
   cleanup_patterns: ["<s>", "</s>", "[OUT]", "[/OUT]", "```json", "```"]

@@ -108,8 +95,7 @@ types:

   experience:
     chunking_profile: sliding_smart_edges
-    retriever_weight: 1.10 # Increased for biographical relevance
-    edge_defaults: ["derived_from", "references"]
+    retriever_weight: 1.10
     detection_keywords: ["erleben", "reagieren", "handeln", "prägen", "reflektieren"]
     schema:
       - "Situation (Was ist passiert?)"
@@ -119,8 +105,7 @@ types:

   insight:
     chunking_profile: sliding_smart_edges
-    retriever_weight: 1.20 # Weighted high for current steering
-    edge_defaults: ["references", "based_on"]
+    retriever_weight: 1.20
     detection_keywords: ["beobachten", "erkennen", "verstehen", "analysieren", "schlussfolgern"]
     schema:
       - "Beobachtung (Was sehe ich?)"
@@ -131,7 +116,6 @@ types:
   project:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.97
-    edge_defaults: ["references", "depends_on"]
     detection_keywords: ["umsetzen", "planen", "starten", "bauen", "abschließen"]
     schema:
       - "Mission & Zielsetzung"
@@ -141,7 +125,6 @@ types:
   decision:
     chunking_profile: structured_smart_edges_strict
     retriever_weight: 1.00
-    edge_defaults: ["caused_by", "references"]
     detection_keywords: ["entscheiden", "wählen", "abwägen", "priorisieren", "festlegen"]
     schema:
       - "Kontext & Problemstellung"
@@ -149,12 +132,9 @@ types:
       - "Die Entscheidung"
       - "Begründung"

-  # --- PERSONALITY & IDENTITY ---
-
   value:
     chunking_profile: structured_smart_edges_strict
     retriever_weight: 1.00
-    edge_defaults: ["related_to"]
     detection_keywords: ["werten", "achten", "verpflichten", "bedeuten"]
     schema:
       - "Definition"
@@ -164,7 +144,6 @@ types:
   principle:
     chunking_profile: structured_smart_edges_strict_L3
     retriever_weight: 0.95
-    edge_defaults: ["derived_from", "references"]
     detection_keywords: ["leiten", "steuern", "ausrichten", "handhaben"]
     schema:
       - "Das Prinzip"
@@ -173,7 +152,6 @@ types:
   trait:
     chunking_profile: structured_smart_edges_strict
     retriever_weight: 1.10
-    edge_defaults: ["related_to"]
     detection_keywords: ["begeistern", "können", "auszeichnen", "befähigen", "stärken"]
     schema:
       - "Eigenschaft / Talent"
@@ -183,7 +161,6 @@ types:
   obstacle:
     chunking_profile: structured_smart_edges_strict
     retriever_weight: 1.00
-    edge_defaults: ["blocks", "related_to"]
     detection_keywords: ["blockieren", "fürchten", "vermeiden", "hindern", "zweifeln"]
     schema:
       - "Beschreibung der Hürde"
@@ -194,7 +171,6 @@ types:
   belief:
     chunking_profile: sliding_short
     retriever_weight: 0.90
-    edge_defaults: ["related_to"]
     detection_keywords: ["glauben", "meinen", "annehmen", "überzeugen"]
     schema:
       - "Der Glaubenssatz"
@@ -203,18 +179,15 @@ types:
   profile:
     chunking_profile: structured_smart_edges_strict
     retriever_weight: 0.70
-    edge_defaults: ["references", "related_to"]
     detection_keywords: ["verkörpern", "verantworten", "agieren", "repräsentieren"]
     schema:
       - "Rolle / Identität"
       - "Fakten & Daten"
       - "Historie"

-
   idea:
     chunking_profile: sliding_short
     retriever_weight: 0.70
-    edge_defaults: ["leads_to", "references"]
     detection_keywords: ["einfall", "gedanke", "potenzial", "möglichkeit"]
     schema:
       - "Der Kerngedanke"
@@ -224,7 +197,6 @@ types:
   skill:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.90
-    edge_defaults: ["references", "related_to"]
     detection_keywords: ["lernen", "beherrschen", "üben", "fertigkeit", "kompetenz"]
     schema:
       - "Definition der Fähigkeit"
@@ -234,7 +206,6 @@ types:
   habit:
     chunking_profile: sliding_short
     retriever_weight: 0.85
-    edge_defaults: ["related_to", "triggered_by"]
     detection_keywords: ["gewohnheit", "routine", "automatismus", "immer wenn"]
     schema:
       - "Auslöser (Trigger)"
@@ -245,7 +216,6 @@ types:
   need:
     chunking_profile: sliding_smart_edges
     retriever_weight: 1.05
-    edge_defaults: ["related_to", "impacts"]
     detection_keywords: ["bedürfnis", "brauchen", "mangel", "erfüllung"]
     schema:
       - "Das Bedürfnis"
@@ -255,7 +225,6 @@ types:
   motivation:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.95
-    edge_defaults: ["drives", "references"]
     detection_keywords: ["motivation", "antrieb", "warum", "energie"]
     schema:
       - "Der Antrieb"
@@ -265,86 +234,68 @@ types:
   bias:
     chunking_profile: sliding_short
     retriever_weight: 0.80
-    edge_defaults: ["affects", "related_to"]
     detection_keywords: ["denkfehler", "verzerrung", "vorurteil", "falle"]
     schema: ["Beschreibung der Verzerrung", "Typische Situationen", "Gegenstrategie"]

   state:
     chunking_profile: sliding_short
     retriever_weight: 0.60
-    edge_defaults: ["impacts"]
     detection_keywords: ["stimmung", "energie", "gefühl", "verfassung"]
     schema: ["Aktueller Zustand", "Auslöser", "Auswirkung auf den Tag"]

   boundary:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.90
-    edge_defaults: ["protects", "related_to"]
     detection_keywords: ["grenze", "nein sagen", "limit", "schutz"]
     schema: ["Die Grenze", "Warum sie wichtig ist", "Konsequenz bei Verletzung"]
-  # --- STRATEGY & RISK ---

   goal:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.95
-    edge_defaults: ["depends_on", "related_to"]
     schema: ["Zielzustand", "Zeitrahmen & KPIs", "Motivation"]

   risk:
     chunking_profile: sliding_short
     retriever_weight: 0.85
-    edge_defaults: ["related_to", "blocks"]
     detection_keywords: ["risiko", "gefahr", "bedrohung"]
     schema: ["Beschreibung des Risikos", "Auswirkungen", "Gegenmaßnahmen"]

-  # --- BASICS & KNOWLEDGE ---
-
   concept:
     chunking_profile: sliding_smart_edges
     retriever_weight: 0.60
-    edge_defaults: ["references", "related_to"]
     schema: ["Definition", "Kontext", "Verwandte Konzepte"]

   task:
     chunking_profile: sliding_short
     retriever_weight: 0.80
-    edge_defaults: ["depends_on", "part_of"]
     schema: ["Aufgabe", "Kontext", "Definition of Done"]

   journal:
     chunking_profile: sliding_standard
     retriever_weight: 0.80
-    edge_defaults: ["references", "related_to"]
     schema: ["Log-Eintrag", "Gedanken"]

   source:
     chunking_profile: sliding_standard
     retriever_weight: 0.50
-    edge_defaults: []
     schema: ["Metadaten", "Zusammenfassung", "Zitate"]

   glossary:
     chunking_profile: sliding_short
     retriever_weight: 0.40
-    edge_defaults: ["related_to"]
     schema: ["Begriff", "Definition"]

   person:
     chunking_profile: sliding_standard
     retriever_weight: 0.50
-    edge_defaults: ["related_to"]
     schema: ["Rolle", "Beziehung", "Kontext"]

   event:
     chunking_profile: sliding_standard
     retriever_weight: 0.60
-    edge_defaults: ["related_to"]
     schema: ["Datum & Ort", "Teilnehmer", "Ergebnisse"]

-  # --- FALLBACK ---
-
   default:
     chunking_profile: sliding_standard
     retriever_weight: 1.00
-    edge_defaults: ["references"]
     schema: ["Inhalt"]
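After this change, each type entry carries only `chunking_profile`, `retriever_weight`, optional `detection_keywords`, and `schema`. A lookup such as `type_def.get("edge_defaults")` therefore returns `None` for every type. The sketch below shows one way a consumer might resolve the remaining per-type settings. The merge order (global `defaults` overridden by the type entry) and the function name `resolve_type_settings` are assumptions, since the diff does not show how the application combines these sections. The values in `REGISTRY` are an excerpt copied from the post-change config.

```python
from typing import Any, Dict, Optional

# Excerpt of the post-change config as a plain dict (values from the diff).
REGISTRY: Dict[str, Any] = {
    "defaults": {"retriever_weight": 1.0, "chunking_profile": "sliding_standard"},
    "ingestion_settings": {"default_note_type": "concept"},
    "types": {
        "concept": {"chunking_profile": "sliding_smart_edges", "retriever_weight": 0.60},
        "belief": {"chunking_profile": "sliding_short", "retriever_weight": 0.90},
    },
}


def resolve_type_settings(note_type: Optional[str], reg: Dict[str, Any]) -> Dict[str, Any]:
    """Merges global defaults with the per-type entry (assumed precedence)."""
    # A missing type falls back to ingestion_settings.default_note_type.
    if not note_type:
        note_type = reg.get("ingestion_settings", {}).get("default_note_type", "concept")
    settings = dict(reg.get("defaults", {}))
    # Per-type values override the global defaults; unknown types keep defaults.
    settings.update(reg.get("types", {}).get(note_type, {}))
    return settings
```

For example, `resolve_type_settings("belief", REGISTRY)` picks up the `sliding_short` profile, while an unknown type keeps the global defaults.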