Commit Graph

1149 Commits

Author SHA1 Message Date
af3cc0a254 Enhance chunking strategies and graph utilities for section-type transitions
- Implemented WP-26 v1.1: a section-type change always forces a new chunk, ensuring consistent chunking behavior across different section_types.
- Introduced automatic intra-note edges between sections of different types to capture semantic relationships.
- Updated graph utilities to support automatic edge type derivation based on section transitions.
- Added unit tests for section-type changes and automatic edge generation to ensure functionality and reliability.
2026-01-25 17:36:57 +01:00
cc258008dc Refactor provenance handling in EdgeDTO and graph utilities
- Updated provenance priorities and introduced a mapping from internal provenance values to EdgeDTO-compliant literals.
- Added a new function `normalize_provenance` to standardize internal provenance strings.
- Enhanced the `_edge` function to include an `is_internal` flag and provenance normalization.
- Modified the `EdgeDTO` model to include a new `source_hint` field for detailed provenance information and an `is_internal` flag for intra-note edges.
- Reduced the provenance options in `EdgeDTO` to valid literals, improving data integrity.
2026-01-25 16:27:09 +01:00
0d61a9e191 Update types.yaml to change chunking profiles and enhance detection keywords
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 3s
- Replaced 'sliding_smart_edges' with 'structured_smart_edges' for multiple types to improve data processing.
- Added detection keywords for 'goal', 'concept', 'task', 'journal', 'source', 'glossary', 'person', and 'event' to enhance retrieval capabilities.
- Adjusted retriever weights for consistency across types.
2026-01-21 07:17:20 +01:00
55d1a7e290 Update decision_engine.yaml to add new relationship attributes for enhanced edge configuration
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 3s
- Introduced 'upholds', 'violates', 'aligned_with', 'conflicts_with', 'supports', and 'contradicts' attributes to improve the decision engine's relationship handling.
- Added 'followed_by' and 'preceded_by' attributes to the facts_stream for better context in data relationships.
2026-01-20 12:36:10 +01:00
4537e65428 Update decision_engine.yaml to rename 'enforced_by' to 'depends_on' for clarity in edge boost configuration
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
2026-01-20 11:34:39 +01:00
43327c1f6d Update documentation for causal retrieval concept
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
- Added additional spacing for improved readability in the document.
- Ensured consistent formatting throughout the section on causal retrieval for Mindnet.
2026-01-15 11:49:31 +01:00
39a6998123 Implement Phase 3 Agentic Edge Validation and Chunk-Aware Multigraph-System
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
- Introduced final validation gate for edges with candidate: prefix.
- Enabled automatic generation of mirror edges for explicit connections.
- Added support for Note-Scope zones to facilitate global connections.
- Enhanced section-based links in the multigraph system for improved edge handling.
- Updated documentation and added new ENV variables for configuration.
- Ensured no breaking changes for end users, maintaining full backward compatibility.
2026-01-14 22:26:12 +01:00
273c4c6919 Update default COLLECTION_PREFIX to "mindnet" for production environments, requiring explicit setting of COLLECTION_PREFIX=mindnet_dev in .env for development. This change enhances clarity and ensures proper environment configuration.
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 3s
2026-01-12 15:49:44 +01:00
2ed4488cf6 Enhance timeout handling and diagnostics in runtime service verification
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 5s
- Increased the timeout for LLM calls from 30 to 60 seconds to accommodate longer processing times.
- Added informative messages for potential timeout causes and troubleshooting tips to improve user awareness.
- Updated error handling to provide clearer feedback on query failures, emphasizing the resolution of the EdgeDTO issue.
2026-01-12 15:37:12 +01:00
36490425c5 Implement runtime check for EdgeDTO version support in health service
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
- Added a verification step in the health endpoint to check if the service supports 'explicit:callout' for EdgeDTO, providing clearer diagnostics.
- Updated the health response to include messages based on the EdgeDTO version support status, enhancing user awareness of potential issues.
- Adjusted the test query endpoint to reflect the correct path for improved functionality.
2026-01-12 15:34:56 +01:00
b8cb8bb89b Add runtime check for EdgeDTO version support in health endpoint
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
- Implemented a check in the health endpoint to determine if EdgeDTO supports explicit callouts, enhancing diagnostic capabilities.
- The check is non-critical and handles exceptions gracefully, ensuring the health response remains robust.
- Updated health response to include the new `edge_dto_supports_callout` field for better insight into EdgeDTO capabilities.
2026-01-12 15:31:38 +01:00
6d268d9dfb Enhance .env loading mechanism and EdgeDTO creation with error handling
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
- Updated the .env loading process to first check for an explicit path, improving reliability in different working directories.
- Added logging for successful .env loading and fallback mechanisms.
- Enhanced EdgeDTO creation with robust error handling, including fallbacks for unsupported provenance values and logging of errors for better traceability.
2026-01-12 15:27:23 +01:00
df5f9b3fe4 Align the default prefix for the collections
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 3s
2026-01-12 15:02:23 +01:00
5e67cd470c Merge pull request 'Update deterministic sorting of semantic_groups in build_edges_for_note to handle None values correctly. Introduced a custom sort function to ensure consistent edge extraction across batches, preventing variance in edge counts.' (#23) from WP24c_BugFix into main
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 3s
Reviewed-on: #23
2026-01-12 11:42:01 +01:00
0b2a1f1a63 Update deterministic sorting of semantic_groups in build_edges_for_note to handle None values correctly. Introduced a custom sort function to ensure consistent edge extraction across batches, preventing variance in edge counts.
2026-01-12 11:31:20 +01:00
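The custom sort function itself is not shown in this log. As a minimal sketch of the idea, assuming the groups are optional strings, a key function can make None comparable so the order is stable from batch to batch. The names `group_sort_key` and `sort_semantic_groups` are hypothetical, and placing None last is an assumption.

```python
from typing import List, Optional, Tuple


def group_sort_key(group: Optional[str]) -> Tuple[int, str]:
    """Return a key that is comparable for both strings and None."""
    # Assumption: None sorts after every named group; the commit message
    # does not say where None values are placed.
    if group is None:
        return (1, "")
    return (0, group)


def sort_semantic_groups(groups: List[Optional[str]]) -> List[Optional[str]]:
    """Sort group names deterministically without comparing None to str."""
    return sorted(groups, key=group_sort_key)


ordered = sort_semantic_groups(["beta", None, "alpha"])
print(ordered)  # ['alpha', 'beta', None]
```

Sorting the raw list without a key would raise a TypeError in Python 3 as soon as a None is compared with a string, which is one plausible reason a custom function was needed.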
d0012355b9 Merge pull request 'WP24c - Agentic Edge Validation & Chunk-Aware Multigraph-System (v4.5.8)' (#22) from WP24c into main
All checks were successful
Deploy mindnet to llm-node / deploy (push) Successful in 4s
feat: Phase 3 Agentic Edge Validation & Chunk-Aware Multigraph-System (v4.5.8)

### Phase 3 Agentic Edge Validation
- Final validation gate for edges with the candidate: prefix
- LLM-based semantic check against context (note scope vs. chunk scope)
- Differentiated error handling: transient errors let the edge through, permanent errors reject it
- Context optimization: note scope uses the note summary/text, chunk scope uses the specific chunk text
- Implemented in app/core/ingestion/ingestion_validation.py (v2.14.0)
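
The differentiated error handling can be sketched roughly as follows. This is an illustration only, not the code in ingestion_validation.py: the exception classes, the `passes_validation_gate` function, and the `rule_id` dictionary key are hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class TransientValidationError(Exception):
    """Hypothetical: a temporary failure such as a timeout."""


class PermanentValidationError(Exception):
    """Hypothetical: a failure that retrying would not fix."""


def passes_validation_gate(
    edge: Dict[str, str],
    validator: Callable[[Dict[str, str]], bool],
) -> bool:
    """Decide whether an edge survives the final validation gate."""
    # Only edges marked with the candidate: prefix are gated at all.
    if not edge.get("rule_id", "").startswith("candidate:"):
        return True
    try:
        return bool(validator(edge))
    except TransientValidationError:
        # Transient errors let the edge through.
        return True
    except PermanentValidationError:
        # Permanent errors reject the edge.
        return False
```

Letting transient failures pass trades some precision for robustness: a temporary LLM outage then does not silently drop edges from the graph.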

### Automatic Mirror Edges (Inverse Logic)
- Automatic generation of mirror edges for explicit connections
- Phase 2 batch injection at the end of the import
- Authority check: explicit edges take precedence (no duplicates)
- Provenance firewall: system edges cannot be overwritten manually
- Implemented in app/core/ingestion/ingestion_processor.py (v2.13.12)
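
A rough sketch of mirror-edge generation with an authority check, under stated assumptions: the `INVERSE` table, the dictionary keys, and the `mirror_edges` function are hypothetical, and all input edges are taken to be explicit. Only the `followed_by`/`preceded_by` pair and the `virtual` flag appear elsewhere in this log.

```python
from typing import Dict, List

# Hypothetical inverse table; the real mapping is not shown in this log.
INVERSE = {"followed_by": "preceded_by", "preceded_by": "followed_by"}


def mirror_edges(edges: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Build inverse edges, skipping any that already exist explicitly."""
    existing = {(e["source"], e["target"], e["kind"]) for e in edges}
    mirrors: List[Dict[str, object]] = []
    for edge in edges:
        inverse_kind = INVERSE.get(str(edge["kind"]))
        if inverse_kind is None:
            continue  # this edge type has no inverse
        key = (edge["target"], edge["source"], inverse_kind)
        if key in existing:
            continue  # authority check: the explicit edge wins
        existing.add(key)
        mirrors.append(
            {
                "source": edge["target"],
                "target": edge["source"],
                "kind": inverse_kind,
                "virtual": True,  # marks the edge as system-generated
            }
        )
    return mirrors
```

Running this once over the complete edge set matches the batch-injection idea: every explicit edge is known before any mirror is created, so no mirror can shadow an explicit edge.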

### Note-Scope Zones (v4.2.0)
- Global connections for entire notes (scope: note)
- Configurable header names via ENV variables
- Highest priority among duplicates
- Phase 3 validation uses the note summary/text for better precision
- Implemented in app/core/graph/graph_derive_edges.py (v1.1.2)

### Chunk-Aware Multigraph-System
- Section-based links: [[Note#Section]] is split precisely into target_id and target_section
- Multigraph support: multiple edges between the same nodes are possible (different sections)
- Semantic deduplication based on the src->tgt:kind@sec key
- Metadata persistence: target_section, provenance, and confidence are preserved
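
To illustrate the link splitting and the deduplication key, here is a minimal sketch. The function names, the regular expression, and the dictionary keys are assumptions for the example, as is the use of an empty section part when a link has no `#Section`.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches [[Note]] and [[Note#Section]]; alias syntax is not handled here.
WIKILINK = re.compile(r"\[\[([^\]#]+)(?:#([^\]]+))?\]\]")


def split_link(link: str) -> Tuple[str, Optional[str]]:
    """Split a wikilink into (target_id, target_section)."""
    match = WIKILINK.fullmatch(link.strip())
    if match is None:
        raise ValueError(f"not a wikilink: {link!r}")
    section = match.group(2)
    return match.group(1).strip(), section.strip() if section else None


def edge_key(src: str, tgt: str, kind: str, sec: Optional[str]) -> str:
    """Build a key in the src->tgt:kind@sec shape."""
    return f"{src}->{tgt}:{kind}@{sec or ''}"


def dedupe(edges: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Keep the first edge per key, so different sections stay distinct."""
    unique: Dict[str, Dict[str, Optional[str]]] = {}
    for edge in edges:
        key = edge_key(str(edge["src"]), str(edge["tgt"]), str(edge["kind"]), edge.get("sec"))
        unique.setdefault(key, edge)
    return list(unique.values())
```

Because the section is part of the key, two links from the same source to different sections of one note survive as separate edges, which is what makes the structure a multigraph.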

### Code Components
- app/core/ingestion/ingestion_validation.py: v2.14.0 (Phase 3 validation, context optimization)
- app/core/ingestion/ingestion_processor.py: v2.13.12 (automatic mirror edges, authority check)
- app/core/graph/graph_derive_edges.py: v1.1.2 (note-scope zones, LLM validation zones)
- app/core/chunking/chunking_processor.py: v2.13.0 (LLM validation zone detection)
- app/core/chunking/chunking_parser.py: v2.12.0 (header-level detection, zone extraction)

### Configuration
- New ENV variables for configurable headers:
  - MINDNET_LLM_VALIDATION_HEADERS (default: "Unzugeordnete Kanten,Edge Pool,Candidates")
  - MINDNET_LLM_VALIDATION_HEADER_LEVEL (default: 3)
  - MINDNET_NOTE_SCOPE_ZONE_HEADERS (default: "Smart Edges,Relationen,Global Links,Note-Level Relations,Globale Verbindungen")
  - MINDNET_NOTE_SCOPE_HEADER_LEVEL (default: 2)
- config/llm_profiles.yaml: ingest_validator profile for Phase 3 validation (temperature 0.0)
- config/prompts.yaml: edge_validation prompt for Phase 3 validation
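
Reading these variables might look like the sketch below. The variable names and defaults are taken from the list above; the `load_zone_config` function, its return shape, and the comma-splitting behavior are assumptions, not the project's actual loader.

```python
import os
from typing import Dict, List, Mapping, Union


def _split_headers(raw: str) -> List[str]:
    """Split a comma-separated header list and drop empty entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def load_zone_config(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[List[str], int]]:
    """Read the zone settings, falling back to the documented defaults."""
    return {
        "llm_validation_headers": _split_headers(
            env.get(
                "MINDNET_LLM_VALIDATION_HEADERS",
                "Unzugeordnete Kanten,Edge Pool,Candidates",
            )
        ),
        "llm_validation_header_level": int(
            env.get("MINDNET_LLM_VALIDATION_HEADER_LEVEL", "3")
        ),
        "note_scope_zone_headers": _split_headers(
            env.get(
                "MINDNET_NOTE_SCOPE_ZONE_HEADERS",
                "Smart Edges,Relationen,Global Links,"
                "Note-Level Relations,Globale Verbindungen",
            )
        ),
        "note_scope_header_level": int(
            env.get("MINDNET_NOTE_SCOPE_HEADER_LEVEL", "2")
        ),
    }
```

Passing the environment as a parameter keeps the function easy to test: `load_zone_config({})` returns the defaults without touching the real process environment.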

### Documentation
- 01_knowledge_design.md: automatic mirror edges, Phase 3 validation, note-scope zones
- NOTE_SCOPE_ZONEN.md: Phase 3 validation integrated
- LLM_VALIDIERUNG_VON_LINKS.md: Phase 3 instead of global_pool, context optimization
- 02_concept_graph_logic.md: Phase 3 validation, automatic mirror edges, note scope vs. chunk scope
- 03_tech_data_model.md: candidate: prefix, verified status, virtual flag, scope field
- 03_tech_configuration.md: new ENV variables documented
- 04_admin_operations.md: troubleshooting for Phase 3 validation and note-scope links
- 05_testing_guide.md: WP-24c test scenarios added
- 00_quality_checklist.md: WP-24c features added to the checklist
- README.md: version updated to v4.5.8, WP-24c features linked

### Breaking Changes
- No breaking changes for end users
- Full backward compatibility
- Existing notes work without changes

### Migration
- No migration required
- The system works without changes
- Optional: ENV variables can be configured for custom headers

---

**Status:** WP-24c is 100% implemented and audit-verified.
**Next step:** WP-25c (context budgeting & extended prompt optimization).

---

## Summary

This merge introduces **Phase 3 Agentic Edge Validation** and the **Chunk-Aware Multigraph-System** into MindNet. The system now automatically validates edges with the `candidate:` prefix, automatically generates mirror edges for explicit connections, and supports note-scope zones for global connections.

**Core features:**
- Phase 3 Agentic Edge Validation (final validation gate)
- Automatic mirror edges (inverse logic)
- Note-scope zones (global connections)
- Chunk-Aware Multigraph-System (section-based links)

**Technical integrity:**
- All edges with the candidate: prefix pass through Phase 3 validation
- Mirror edges are generated automatically (Phase 2)
- Note-scope links have the highest priority
- Context optimization for better validation accuracy

**Documentation:**
- Full update of all relevant documents
- New ENV variables documented
- Troubleshooting guide extended
- Test scenarios added

**Deployment:**
- No breaking changes
- Optional: configure ENV variables for custom headers
- The system works without changes
2026-01-12 10:53:19 +01:00
1056078e6a Refactor ID collision logging in ingestion_processor.py for improved clarity and structure
Update the logging mechanism for ID collisions to include more structured metadata, enhancing the clarity of logged information. This change aims to facilitate easier analysis of conflicts during the ingestion process and improve overall traceability.
2026-01-12 10:07:24 +01:00
c42a76b3d7 Add dedicated logging for ID collisions in ingestion_processor.py
Implement a new method to log ID collisions into a separate file (logs/id_collisions.log) for manual analysis. This update captures relevant metadata in JSONL format, enhancing traceability during the ingestion process. The logging occurs when a conflict is detected between existing and new files sharing the same note_id, improving error handling and diagnostics.
2026-01-12 09:04:36 +01:00
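As a minimal sketch of JSONL collision logging: each conflict becomes one JSON object on its own line, appended to the log file. The log path comes from the commit message above; the function name `log_id_collision` and the record fields are assumptions, since the message does not list the metadata that is actually captured.

```python
import json
from pathlib import Path
from typing import Dict


def log_id_collision(
    note_id: str,
    existing_path: str,
    new_path: str,
    log_file: Path = Path("logs/id_collisions.log"),
) -> Dict[str, str]:
    """Append one JSON object per collision to the log file."""
    # Hypothetical record layout for illustration.
    record = {
        "note_id": note_id,
        "existing_path": existing_path,
        "new_path": new_path,
    }
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

One object per line keeps the file appendable and lets it be read back line by line with `json.loads` for manual analysis.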
ec9b3c68af Implement ID collision detection and enhance logging in ingestion_processor.py
Add a check for ID collisions during the ingestion process to prevent multiple files from using the same note_id. Update logging levels to DEBUG for detailed diagnostics on hash comparisons, body lengths, and frontmatter keys, improving traceability and debugging capabilities in the ingestion workflow.
2026-01-12 08:56:28 +01:00
f9118a36f8 Enhance logging in ingestion_processor.py to include normalized file path and note title
Update the logging statement to provide additional context during the ingestion process by including the normalized file path and note title. This change aims to improve traceability and debugging capabilities in the ingestion workflow.
2026-01-12 08:33:11 +01:00
e52eed40ca Refactor hash input handling in ingestion_processor.py to use dictionary format
Update the ingestion process to convert the parsed object to a dictionary before passing it to the hash input function. This change ensures compatibility with the updated function requirements and improves the accuracy of hash comparisons during ingestion workflows.
2026-01-12 08:21:21 +01:00
43641441ef Refactor hash input and body/frontmatter handling in ingestion_processor.py for improved accuracy
Update the ingestion process to utilize the parsed object instead of note_pl for hash input, body, and frontmatter extraction. This change ensures that the correct content is used for comparisons, enhancing the reliability of change detection diagnostics and improving overall ingestion accuracy.
2026-01-12 08:19:43 +01:00
c613d81846 Enhance logging in ingestion_processor.py for detailed change detection diagnostics
Add comprehensive logging for hash input, body length comparisons, and frontmatter key checks in the change detection process. This update aims to improve traceability and facilitate debugging by providing insights into potential discrepancies between new and old payloads during ingestion workflows.
2026-01-12 08:16:03 +01:00
de5db09b51 Update logging levels in ingestion_processor.py and import_markdown.py for improved visibility
Change debug logs to info and warning levels in ingestion_processor.py to enhance the visibility of change detection processes, including hash comparisons and artifact checks. Additionally, ensure .env is loaded before logging setup in import_markdown.py to correctly read the DEBUG environment variable. These adjustments aim to improve traceability and debugging during ingestion workflows.
2026-01-12 08:13:26 +01:00
7cb8fd6602 Enhance logging in ingestion_processor.py for improved change detection diagnostics
Add detailed debug and warning logs to the change detection process, providing insights into hash comparisons and artifact checks. This update aims to facilitate better traceability and debugging during ingestion, particularly when handling hash changes and missing hashes. The changes ensure that the ingestion workflow is more transparent and easier to troubleshoot.
2026-01-12 08:08:29 +01:00
6047e94964 Refactor edge processing in graph_derive_edges.py and ingestion_processor.py for consistency and efficiency
Implement deterministic sorting of semantic groups in graph_derive_edges.py to ensure consistent edge extraction across batches. Update ingestion_processor.py to enhance change detection logic, ensuring that hash checks are performed before artifact checks to prevent redundant processing. These changes improve the reliability and efficiency of the edge building and ingestion workflows.
2026-01-12 08:04:28 +01:00
78fbc9b31b Enhance ingestion_processor.py with path normalization and strict change detection
Implement path normalization to ensure consistent hash checks by converting file paths to absolute paths. Update change detection logic to handle hash comparisons more robustly, treating missing hashes as content changes for safety. This prevents redundant processing and improves efficiency in the ingestion workflow.
2026-01-12 07:53:03 +01:00
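The two rules described in this commit can be sketched as below, assuming hashes are stored as optional strings. The function names `normalize_path` and `content_changed` are hypothetical and not taken from ingestion_processor.py.

```python
import os
from typing import Optional


def normalize_path(path: str) -> str:
    """Convert a file path to an absolute, normalized form."""
    return os.path.abspath(path)


def content_changed(old_hash: Optional[str], new_hash: Optional[str]) -> bool:
    """Report a change unless both hashes exist and are equal."""
    if not old_hash or not new_hash:
        # A missing hash is treated as a content change, to be safe.
        return True
    return old_hash != new_hash
```

Treating a missing hash as a change errs on the side of reprocessing a note once too often rather than skipping a real modification.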
742792770c Implement Phase 3 Agentic Edge Validation in ingestion_processor.py and related documentation updates
Introduce a new method for persisting rejected edges for audit purposes, enhancing traceability and validation logic. Update the decision engine to utilize a generic fallback template for improved error handling during LLM validation. Revise documentation across multiple files to reflect the new versioning, context, and features related to Phase 3 validation, including automatic mirror edges and note-scope zones. This update ensures better graph integrity and validation accuracy in the ingestion process.
2026-01-12 07:45:54 +01:00
b19f91c3ee Refactor edge validation process in ingestion_processor.py
Remove LLM validation from the candidate edge processing loop, shifting it to a later phase for improved context handling. Introduce a new validation mechanism that aggregates note text for better decision-making and optimizes the validation criteria to include both rule IDs and provenance. Update logging to reflect the new validation phases and ensure rejected edges are not processed further. This enhances the overall efficiency and accuracy of edge validation during ingestion.
2026-01-11 21:47:11 +01:00
9b0d8c18cb Implement LLM validation for candidate edges in ingestion_processor.py
Enhance the edge validation process by introducing logic to validate edges with rule IDs starting with "candidate:". This includes extracting target IDs, validating against the entire note text, and updating rule IDs upon successful validation. Rejected edges are logged for traceability, improving the overall handling of edge data during ingestion.
2026-01-11 21:27:07 +01:00
f2a2f4d2df Refine LLM validation zone handling in graph_derive_edges.py
Enhance the extraction logic to store the zone status before header updates, ensuring accurate context during callout processing. Initialize the all_chunk_callout_keys set prior to its usage to prevent potential UnboundLocalError. These improvements contribute to more reliable edge construction and better handling of LLM validation zones.
2026-01-11 21:09:07 +01:00
ea0fd951f2 Enhance LLM validation zone extraction in graph_derive_edges.py
Implement support for H2 headers in LLM validation zone detection, allowing for improved flexibility in header recognition. Update the extraction logic to track zones during callout processing, ensuring accurate differentiation between LLM validation and standard zones. This enhancement improves the handling of callouts and their associated metadata, contributing to more precise edge construction.
2026-01-11 20:58:33 +01:00
c8c828c8a8 Add LLM validation zone extraction and configuration support in graph_derive_edges.py
Implement functions to extract LLM validation zones from Markdown, allowing for configurable header identification via environment variables. Enhance the existing note scope zone extraction to differentiate between note scope and LLM validation zones. Update edge building logic to handle LLM validation edges with a 'candidate:' prefix, ensuring proper processing and avoiding duplicates in global scans. This update improves the overall handling of edge data and enhances the flexibility of the extraction process.
2026-01-11 20:19:12 +01:00
716a063849 Enhance decision_engine.py to support context reuse during compression failures. Implement error handling to return original content when compression fails, ensuring robust fallback mechanisms without re-retrieval. Update logging for better traceability of compression and fallback processes, improving overall reliability in stream handling.
2026-01-11 19:14:15 +01:00
3dc81ade0f Update logging in decision_engine.py and retriever.py to use node_id as chunk_id and total_score instead of score for improved accuracy in debug statements. This change aligns with the new data structure introduced in version 4.5.4, enhancing traceability in retrieval processes.
2026-01-11 18:55:13 +01:00
1df89205ac Update EdgeDTO to support extended provenance values and modify explanation building in retriever.py to accommodate new provenance types. This enhances the handling of edge data for improved accuracy in retrieval processes.
2026-01-11 17:54:33 +01:00
2445f7cb2b Implement chunk-aware graph traversal in hybrid_retrieve: Extract both note_id and chunk_id from hits to enhance seed coverage for edge retrieval. Combine direct and additional chunk IDs for improved accuracy in subgraph expansion. Update debug logging to reflect the new seed and chunk ID handling, ensuring better traceability in graph retrieval processes.
2026-01-11 17:48:30 +01:00
47fdcf8eed Update logging in retriever.py for version 4.5.1: Modify edge count logging to utilize the adjacency list instead of the non-existent .edges attribute in the subgraph, enhancing accuracy in debug statements related to graph retrieval processes.
2026-01-11 17:44:20 +01:00
3e27c72b80 Enhance logging capabilities across multiple modules for version 4.5.0: Introduce detailed debug statements in decision_engine.py, retriever_scoring.py, retriever.py, and logging_setup.py to improve traceability during retrieval processes. Implement dynamic log level configuration based on environment variables, allowing for more flexible debugging and monitoring of application behavior.
2026-01-11 17:30:34 +01:00
2d87f9d816 Enhance compatibility in chunking and edge processing for version 4.4.1: Harmonize handling of "to" and "target_id" across chunking_processor.py, graph_derive_edges.py, and ingestion_processor.py. Ensure consistent validation and processing of explicit callouts, improving integration and reliability in edge candidate handling.
2026-01-11 15:39:03 +01:00
d7d6155203 Refactor logging in graph_derive_edges.py for version 4.4.0: Move logger initialization to module level for improved accessibility across functions. This change enhances debugging capabilities and maintains consistency in logging practices.
2026-01-11 15:28:14 +01:00
f8506c0bb2 Refactor logging in graph_derive_edges.py and ingestion_chunk_payload.py: Remove redundant logging import and ensure consistent logger initialization for improved debugging capabilities. This change enhances traceability in edge processing and chunk ingestion.
2026-01-11 15:25:57 +01:00
c91910ee9f Enhance logging and debugging in chunking_processor.py, graph_derive_edges.py, and ingestion_chunk_payload.py for version 4.4.0: Introduce detailed debug statements to trace chunk extraction, global scan comparisons, and payload transfers. Improve visibility into candidate pool handling and decision-making processes for callout edges, ensuring better traceability and debugging capabilities.
2026-01-11 15:21:46 +01:00
ee91583614 Update graph_derive_edges.py to version 4.3.1: Introduce precision prioritization for chunk scope, ensuring chunk candidates are favored over note scope. Adjust confidence values for explicit callouts and enhance key generation for consistent deduplication. Improve edge processing logic to reinforce the precedence of chunk scope in decision-making.
2026-01-11 15:08:08 +01:00
3a17b646e1 Update graph_derive_edges.py and ingestion_chunk_payload.py for version 4.3.0: Introduce debug logging for data transfer audits and candidate pool handling to address potential data loss. Ensure candidate_pool is explicitly retained for accurate chunk attribution, enhancing traceability and reliability in edge processing.
2026-01-11 14:51:38 +01:00
727de50290 Refine edge parsing and chunk attribution in chunking_parser.py and graph_derive_edges.py for version 4.2.9: Ensure current_edge_type persists across empty lines in callout blocks for accurate link processing. Implement two-phase synchronization for chunk authority, collecting explicit callout keys before the global scan to prevent duplicates. Enhance callout extraction logic to respect existing chunk callouts, improving deduplication and processing efficiency.
2026-01-11 14:30:16 +01:00
a780104b3c Enhance edge processing in graph_derive_edges.py for version 4.2.9: Finalize chunk attribution with synchronization to "Semantic First" signal. Collect callout keys from candidate pool before text scan to prevent duplicates. Update callout extraction logic to ensure strict adherence to existing chunk callouts, improving deduplication and processing efficiency.
2026-01-11 14:07:16 +01:00
f51e1cb2c4 Fix regex pattern in parse_edges_robust to support multiple leading '>' characters for edge callouts, enhancing flexibility in edge parsing.
2026-01-11 12:03:36 +01:00
20fb1e92e2 Enhance chunking functionality in version 4.2.8: Update callout pattern to support additional syntax for edge and abstract callouts. Modify get_chunk_config to allow fallback to chunk_profile if chunking_profile is not present. Ensure explicit passing of chunk_profile in make_chunk_payloads for improved payload handling. Update type hints in chunking_parser for better clarity.
2026-01-11 11:49:16 +01:00
1d66ca0649 Update chunking_utils.py to include Optional type hint: Add Optional to the import statement for improved type annotations, enhancing code clarity and maintainability.
2026-01-11 11:16:30 +01:00