Lars a9ce5a445b Enhance graph schema validation and edge handling

- Improved edge type extraction by refining the `load_graph_schema` function to utilize a comprehensive schema.
- Added new functions for validating intra-note edges against the schema and retrieving topology information.
- Enhanced logging for validation processes and updated documentation to reflect these changes.

2026-01-26 10:34:59 +01:00


WP-26 Requirements Checklist

Version: 1.3
Date: January 25, 2026
Status: Implementation complete

Phase 1: Section Types & Parsing

✅ FA-01: New callout format `[!section]`

Status: ✅ Implemented

Implementation:

chunking_parser.py: regex for detecting [!section] callouts
State machine for current_section_type and section_introduced_at_level
Retroactive propagation via _propagate_section_type_backwards()

Files:

app/core/chunking/chunking_parser.py
app/core/chunking/chunking_models.py (RawBlock, Chunk)

Tests:

tests/test_wp26_section_types.py::TestSectionTypeRecognition
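
The callout detection can be pictured with a minimal sketch. It assumes the callout is written as a blockquote line such as `> [!section] insight`; the pattern, the helper name `match_section_callout` and the example type `insight` are illustrative assumptions, and the actual regex in chunking_parser.py may differ.

```python
import re
from typing import Optional

# Assumed syntax: "> [!section] <type>". The real pattern may be stricter or looser.
SECTION_CALLOUT_RE = re.compile(
    r"^\s*>\s*\[!section\]\s*(?P<type>[\w-]+)\s*$", re.IGNORECASE
)


def match_section_callout(line: str) -> Optional[str]:
    """Return the declared section type, or None if the line is no [!section] callout."""
    m = SECTION_CALLOUT_RE.match(line)
    return m.group("type").lower() if m else None
```

A parser loop would call this helper for every line and update current_section_type whenever it returns a value.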

✅ FA-01b: Nested edge callouts

Status: ✅ Implemented

Implementation:

graph_derive_edges.py: extract_callout_relations() supports nested callouts
The nesting level (>>) is detected correctly

Files:

app/core/graph/graph_derive_edges.py

Tests:

tests/test_wp26_section_types.py::TestNestedEdgeCallouts
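
As a rough illustration of how the nesting level of a callout line could be determined, the sketch below counts the leading `>` markers. The helper name `callout_depth` and the `[!edge]` example line are assumptions made for this sketch, not the actual code in extract_callout_relations().

```python
import re
from typing import Tuple

# One or more ">" markers, optionally separated by spaces (">>" or "> >").
QUOTE_PREFIX_RE = re.compile(r"^\s*((?:>\s*)+)")


def callout_depth(line: str) -> Tuple[int, str]:
    """Return (nesting depth, remaining text) for a blockquote line."""
    m = QUOTE_PREFIX_RE.match(line)
    if not m:
        return 0, line.strip()
    depth = m.group(1).count(">")
    return depth, line[m.end():].strip()
```

A depth of 2 then marks a callout nested inside another callout, while a depth of 0 means the line is ordinary text.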

✅ FA-02: Scope termination

Status: ✅ Implemented

Implementation:

A scope ends at a heading of the same or a higher level
Tracking via section_introduced_at_level

Files:

app/core/chunking/chunking_parser.py

Tests:

tests/test_wp26_section_types.py::TestSectionTypeScope
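
The scope rule can be sketched as a small state holder. The field names current_section_type and section_introduced_at_level come from the checklist; the class `SectionScope`, its methods and the `last_heading_level` field are hypothetical, and the sketch only models the reset of the section type, not the full parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SectionScope:
    current_section_type: Optional[str] = None
    section_introduced_at_level: Optional[int] = None
    last_heading_level: int = 0  # hypothetical helper field

    def on_heading(self, level: int) -> None:
        """A heading at the same or a higher level (smaller number) ends the scope."""
        introduced = self.section_introduced_at_level
        if introduced is not None and level <= introduced:
            # Following chunks fall back to note_type until a new callout appears.
            self.current_section_type = None
        self.last_heading_level = level

    def on_section_callout(self, section_type: str) -> None:
        """A [!section] callout opens a scope at the level of the enclosing heading."""
        self.current_section_type = section_type
        self.section_introduced_at_level = self.last_heading_level
```

Deeper headings (for example level 3 inside a scope introduced at level 2) leave the section type untouched in this sketch.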

✅ FA-02b: Automatic section detection

Status: ✅ Implemented

Implementation:

A new heading at section_introduced_at_level automatically starts a new section
Falls back to note_type when no [!section] callout is present

Files:

app/core/chunking/chunking_parser.py

Tests:

tests/test_wp26_section_types.py::TestAutomaticSectionRecognition

✅ FA-03: Populating the `type` field

Status: ✅ Implemented

Implementation:

effective_type = section_type if section_type else note_type
Computed in ingestion_chunk_payload.py
The type field always contains the effective type

Files:

app/core/ingestion/ingestion_chunk_payload.py

Tests:

tests/test_wp26_section_types.py (implicit)
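
To make the fallback rule concrete, here is a minimal sketch of a payload builder. Only the expression for effective_type and the field names type and note_type are taken from this checklist; the function `build_chunk_payload` and the `text` key are assumptions, and the real payload in ingestion_chunk_payload.py contains more fields.

```python
from typing import Any, Dict, Optional


def build_chunk_payload(
    note_type: str, section_type: Optional[str], text: str
) -> Dict[str, Any]:
    """Build a reduced chunk payload with the effective type in the type field."""
    # The section type wins; None or an empty string falls back to the note type.
    effective_type = section_type if section_type else note_type
    return {
        "type": effective_type,
        "note_type": note_type,  # original note type, kept alongside (FA-04)
        "text": text,            # hypothetical key for the chunk text
    }
```

Because the fallback is resolved at ingestion time, consumers of the payload can read type without knowing whether a [!section] callout was present.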

✅ FA-03b: Body section handling

Status: ✅ Implemented

Implementation:

Text blocks before the first [!section] callout get section: "body"
section_type: None (falls back to note_type)

Files:

app/core/chunking/chunking_parser.py

✅ FA-04: Optional field `note_type`

Status: ✅ Implemented

Implementation:

New note_type field in the chunk payload
Keyword index created in Qdrant

Files:

app/core/ingestion/ingestion_chunk_payload.py
scripts/setup_mindnet_collections.py

Tests:

tests/test_wp26_section_types.py (implicit)

✅ FA-05: Block reference as link format

Status: ✅ Implemented

Implementation:

parse_link_target() extracts the block ID from [[#^block-id]]
Also supports the [[#Section Name ^block-id]] format

Files:

app/core/graph/graph_utils.py

Tests:

tests/test_wp26_section_types.py::TestBlockIdParsing
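
The two link formats can be parsed along the following lines. This is a sketch under assumptions: the real signature and return type of parse_link_target() are not shown in this checklist, so the function below is deliberately named `parse_link_target_sketch`, and the `LinkTarget` tuple is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class LinkTarget(NamedTuple):
    note: Optional[str]      # None means "same note"
    section: Optional[str]   # heading text, if any
    block_id: Optional[str]  # ID after "^", if any


# [[Note#Anchor|Alias]] where every part except the brackets is optional.
LINK_RE = re.compile(
    r"^\[\[(?P<note>[^\]#|]*)(?:#(?P<anchor>[^\]|]*))?(?:\|[^\]]*)?\]\]$"
)


def parse_link_target_sketch(link: str) -> LinkTarget:
    m = LINK_RE.match(link.strip())
    if not m:
        raise ValueError(f"not a wikilink: {link!r}")
    note = m.group("note").strip() or None
    anchor = (m.group("anchor") or "").strip()
    section: Optional[str] = None
    block_id: Optional[str] = None
    if "^" in anchor:
        # "Section Name ^block-id" or just "^block-id"
        head, _, tail = anchor.partition("^")
        section = head.strip() or None
        block_id = tail.strip() or None
    elif anchor:
        section = anchor
    return LinkTarget(note, section, block_id)
```

A link without a note part, such as `[[#^block-id]]`, yields `note=None`, which a caller can treat as an intra-note reference.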

✅ FA-06: Section-to-chunk mapping

Status: ✅ Implemented

Implementation:

Mapping happens implicitly via block IDs and heading matches
parse_link_target() resolves section references

Files:

app/core/graph/graph_derive_edges.py
app/core/graph/graph_utils.py

✅ FA-07: Edge creation for intra-note links

Status: ✅ Implemented

Implementation:

Intra-note links become chunk-scope edges
scope: "chunk" for intra-note edges

Files:

app/core/graph/graph_derive_edges.py

✅ FA-07b: Metadata extension (`is_internal` flag)

Status: ✅ Implemented

Implementation:

is_internal: True for edges within the same note
Computed automatically in graph_utils._edge()
Boolean index in Qdrant

Files:

app/core/graph/graph_utils.py
scripts/setup_mindnet_collections.py

Tests:

tests/test_wp26_section_types.py::TestIsInternalFlag
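
How the flag could be derived is sketched below. The signature of graph_utils._edge() is not documented here, so the function `make_edge` and the keys `kind`, `source_note_id` and `target_note_id` are assumptions chosen for illustration.

```python
from typing import Any, Dict


def make_edge(
    kind: str, source_note_id: str, target_note_id: str, **extra: Any
) -> Dict[str, Any]:
    """Build an edge payload and derive is_internal from the two note IDs."""
    edge: Dict[str, Any] = {
        "kind": kind,
        "source_note_id": source_note_id,
        "target_note_id": target_note_id,
        **extra,
    }
    # Set after merging extras so callers cannot override the derived value.
    edge["is_internal"] = source_note_id == target_note_id
    return edge
```

Deriving the flag in one central constructor keeps it consistent across every code path that creates edges.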

✅ FA-08: Default edges from graph_schema.md

Status: ✅ Implemented

Implementation:

get_typical_edge_for() determines the default edge from the schema
Automatic edge creation at section transitions
provenance: "rule", rule_id: "inferred:section_transition"

Files:

app/core/graph/graph_derive_edges.py
app/core/graph/graph_utils.py

Tests:

tests/test_wp26_section_types.py::TestAutomaticIntraNoteEdges
tests/test_wp26_section_types.py::TestGraphSchemaParser
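
The derivation of default edges at section transitions might look like the sketch below. It assumes the schema has already been parsed into a dictionary keyed by (source type, target type); that structure, the helper names, the chunk keys and the example types are all hypothetical. Only the provenance and rule_id values are taken from the checklist.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of graph_schema.md:
# (source type, target type) -> typical edge kinds, most typical first.
Schema = Dict[Tuple[str, str], List[str]]


def typical_edge_for(source_type: str, target_type: str, schema: Schema) -> Optional[str]:
    """Return the first typical edge kind for the type pair, or None."""
    kinds = schema.get((source_type, target_type), [])
    return kinds[0] if kinds else None


def section_transition_edges(chunks: List[dict], schema: Schema) -> List[dict]:
    """Derive one rule-based edge per transition between consecutive chunks."""
    edges = []
    for prev, nxt in zip(chunks, chunks[1:]):
        if prev["type"] == nxt["type"]:
            continue  # assumption: same effective type means no transition
        kind = typical_edge_for(prev["type"], nxt["type"], schema)
        if kind is None:
            continue  # no default edge defined for this type pair
        edges.append({
            "kind": kind,
            "source_id": prev["chunk_id"],
            "target_id": nxt["chunk_id"],
            "scope": "chunk",
            "is_internal": True,
            "provenance": "rule",
            "rule_id": "inferred:section_transition",
        })
    return edges
```

Tagging these edges with a dedicated rule_id makes it possible to tell inferred edges apart from edges the author wrote explicitly.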

Phase 2: Retriever adjustments

✅ FA-09: Edge weighting for intra-note edges

Status: ✅ Implemented

Implementation:

internal_edge_boost and external_edge_boost in retriever.yaml
The boost is applied in Subgraph.add_edge()

Files:

app/core/graph/graph_subgraph.py
config/retriever.yaml

Tests:

tests/test_wp26_phase2_retriever.py::TestIsInternalBoost
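
A minimal sketch of the boost follows. It assumes the boost is a multiplicative factor on the base edge weight; the checklist does not state how Subgraph.add_edge() combines the values, and the function name and default values below are illustrative only.

```python
def boosted_weight(
    base_weight: float,
    is_internal: bool,
    internal_edge_boost: float = 1.0,  # placeholder default, not the shipped value
    external_edge_boost: float = 1.0,  # placeholder default, not the shipped value
) -> float:
    """Scale an edge weight depending on whether the edge stays inside one note."""
    factor = internal_edge_boost if is_internal else external_edge_boost
    return base_weight * factor
```

With both factors at 1.0 the weights are unchanged, so the boost only takes effect once retriever.yaml sets different values.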

✅ FA-09b: Retrieval prioritization (section type before note type)

Status: ✅ Implemented

Implementation:

effective_type is used for the retriever_weight lookup
The type field already contains the effective type

Files:

app/core/ingestion/ingestion_chunk_payload.py

✅ FA-10: Optional chunk-level deduplication

Status: ✅ Implemented

Implementation:

aggregation.level in retriever.yaml ("note" or "chunk")
max_chunks_per_note for note-level limiting
Implemented in retriever._score_and_pool_hits()

Files:

app/core/retrieval/retriever.py
config/retriever.yaml

Tests:

tests/test_wp26_phase2_retriever.py::TestNoteLevelAggregation
tests/test_wp26_phase2_retriever.py::TestChunkLevelAggregation
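
The two aggregation levels can be illustrated with the sketch below. It assumes that "note" keeps at most max_chunks_per_note of the best-scoring chunks per note and that "chunk" keeps every hit; the function `pool_hits` and the hit keys `note_id` and `score` are hypothetical, and the real logic in retriever._score_and_pool_hits() may differ.

```python
from collections import defaultdict
from typing import Dict, List


def pool_hits(hits: List[dict], level: str = "note", max_chunks_per_note: int = 1) -> List[dict]:
    """Rank hits by score and, at note level, cap the number of chunks per note."""
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    if level == "chunk":
        return ranked  # every chunk competes on its own
    if level != "note":
        raise ValueError(f"unknown aggregation level: {level!r}")
    kept: List[dict] = []
    per_note: Dict[str, int] = defaultdict(int)
    for hit in ranked:
        if per_note[hit["note_id"]] < max_chunks_per_note:
            kept.append(hit)
            per_note[hit["note_id"]] += 1
    return kept
```

Chunk-level aggregation is useful when a single note contains several sections of different types that should each be retrievable.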

Phase 3: Schema validation

✅ FA-12: Schema validation against the effective chunk type

Status: ✅ Implemented

Implementation:

validate_intra_note_edge() checks against graph_schema.md
Uses the effective_type (type field) of both chunks
get_topology_info() returns the typical and prohibited lists
Integrated into the ingestion pipeline (after LLM validation)

Files:

app/core/ingestion/ingestion_validation.py
app/core/graph/graph_utils.py
app/core/ingestion/ingestion_processor.py

Tests:

tests/test_wp26_phase3_validation.py

Behavior:

Edge in prohibited → ❌ Rejected (confidence: 0.0)
Edge in typical → ✅ Allowed (confidence: 1.0)
Atypical edge → ✅ Allowed (confidence: 0.7)
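
The decision rules above can be summarized in a short sketch. The confidence values and the typical/prohibited lists come from this checklist; the function name, its signature, the example edge kinds and the confidence of 0.0 for a strict-mode rejection are assumptions, since the actual signature of validate_intra_note_edge() is not shown here.

```python
from typing import Dict, List, Tuple


def validate_edge_sketch(
    kind: str, topology: Dict[str, List[str]], strict: bool = False
) -> Tuple[bool, float]:
    """Return (allowed, confidence) for an edge kind between two chunk types."""
    if kind in topology.get("prohibited", []):
        return False, 0.0
    if kind in topology.get("typical", []):
        return True, 1.0
    # Atypical: neither typical nor prohibited for this type pair.
    if strict:
        return False, 0.0  # assumed confidence for strict-mode rejection
    return True, 0.7
```

Here `topology` stands for the result of looking up the effective types of both chunks in the schema, in the shape that get_topology_info() is described as returning.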

Backward compatibility

✅ FA-11: Fallback behavior

Status: ✅ Implemented

Guarantees:

Notes without [!section] callouts work unchanged
Chunk.type = note_type (as before)
No breaking changes for existing notes

Summary

Phase	Requirements	Status
Phase 1	FA-01 to FA-08	✅ 8/8
Phase 2	FA-09, FA-09b, FA-10	✅ 3/3
Phase 3	FA-12	✅ 1/1
Compatibility	FA-11	✅ 1/1
TOTAL		✅ 13/13

Manual tests

1. Run the comprehensive test script

cd c:\Dev\cursor\mindnet
python scripts/test_wp26_comprehensive.py

2. Run the unit tests

# All WP-26 tests
python -m pytest tests/test_wp26_section_types.py tests/test_wp26_phase2_retriever.py tests/test_wp26_phase3_validation.py -v

# Individual phases
python -m pytest tests/test_wp26_section_types.py -v
python -m pytest tests/test_wp26_phase2_retriever.py -v
python -m pytest tests/test_wp26_phase3_validation.py -v

3. Integration test with a real note

Create a test note in the vault (see 05_WP26_Manual_Testing.md)
Import it via scripts/import_markdown.py
Check the chunks and edges in Qdrant

Known limitations

Block ID stability: Obsidian does not update block IDs automatically on rename
Heading links: [[#Section Name]] works, but [[#^block-id]] is preferred
Strict mode: in strict mode, schema validation rejects atypical edges (default: False)

End of checklist
