Commit Graph

47 Commits

Author SHA1 Message Date
eba2e5d602 Add virtual attribute for automatically generated edges
- Introduced a `virtual` attribute for automatically generated section transitions and backlinks in the graph edge definitions, indicating their virtual nature.
- Updated documentation to reflect the addition of the `virtual` attribute for both section transitions and backlinks, clarifying its implications for scoring in the retriever.
- Specified that these automatically generated edges receive a penalty during scoring.
2026-01-26 18:37:09 +01:00
52ed079067 Implement automatic backlinks for intra-note edges (WP-26 v1.4)
- Added functionality to automatically create inverse backlinks for intra-note edges at the chunk level, ensuring that backlinks are generated only when they do not already exist.
- Updated the documentation to outline the requirements and rules for backlink creation, including conditions for deduplication and scope.
- Introduced unit tests to validate the creation of backlinks and ensure correct behavior when existing backlinks are present.
- Incremented version to 4.4.0 to reflect the new feature addition.
2026-01-26 11:16:40 +01:00
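
The backlink rule in this commit (create the inverse edge only when it does not already exist) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the function name `add_backlinks`, the edge dictionary keys, and the `INVERSE_KIND` table are all hypothetical. Only the `virtual` marker comes from the log itself.

```python
from typing import Dict, List

# Hypothetical inverse table; the real edge vocabulary is not shown in this log.
INVERSE_KIND = {"supports": "supported_by", "supported_by": "supports"}


def add_backlinks(edges: List[Dict]) -> List[Dict]:
    """Return the edges plus one inverse edge for each edge that lacks one."""
    # Index the existing edges so that an explicit backlink is never duplicated.
    existing = {(e["source"], e["target"], e["kind"]) for e in edges}
    result = list(edges)
    for e in edges:
        inverse = INVERSE_KIND.get(e["kind"])
        if inverse is None:
            continue  # no known inverse for this edge type
        key = (e["target"], e["source"], inverse)
        if key in existing:
            continue  # backlink already present, skip it
        existing.add(key)
        result.append(
            {
                "source": e["target"],
                "target": e["source"],
                "kind": inverse,
                "virtual": True,  # marks the edge as automatically generated
            }
        )
    return result
```

Running the function on its own output adds nothing, because every generated backlink is already present on the second pass.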
509efc9393 Implement WP-26 v1.3 (Phase 3): Enhance graph schema validation and edge handling
- Introduced a new function `load_graph_schema_full` to parse and cache both typical and prohibited edge types from the graph schema.
- Updated `load_graph_schema` to utilize the full schema for improved edge type extraction.
- Added `get_topology_info` to retrieve typical and prohibited edges for source/target pairs.
- Implemented `validate_intra_note_edge` and `validate_edge_against_schema` for schema validation of intra-note edges.
- Enhanced logging for schema validation outcomes and edge handling.
- Updated documentation to reflect new validation features and testing procedures.
2026-01-26 10:18:31 +01:00
c5215e22e7 Implement WP-26 v1.0 - Phase 2: Enhance edge scoring and aggregation configuration
- Introduced configurable edge scoring with internal and external boosts for intra-note edges.
- Added aggregation configuration to support note-level and chunk-level retrieval strategies.
- Updated retriever and graph subgraph modules to utilize new scoring and aggregation logic.
- Enhanced YAML configuration to include new parameters for edge scoring and aggregation levels.
- Added boolean indexing for filtering based on edge properties in the setup script.
2026-01-25 21:06:13 +01:00
52fdc425f7 Enhance chunking strategies and graph utilities for section-type transitions and block ID extraction
- Implemented WP-26 v1.1: a section-type change now forces a split even in SMART MODE (step 2), improving the chunking logic.
- Updated `parse_link_target` to extract block IDs from section strings, ensuring accurate handling of links with block references.
- Added unit tests to validate section-type change behavior and block ID extraction functionality, enhancing overall reliability.
2026-01-25 17:47:22 +01:00
af3cc0a254 Enhance chunking strategies and graph utilities for section-type transitions
- Implemented WP-26 v1.1: a section-type change always forces a new chunk, ensuring consistent chunking behavior across different section_types.
- Introduced automatic intra-note edges between sections of different types to capture semantic relationships.
- Updated graph utilities to support automatic edge type derivation based on section transitions.
- Added unit tests for section-type changes and automatic edge generation to ensure functionality and reliability.
2026-01-25 17:36:57 +01:00
cc258008dc Refactor provenance handling in EdgeDTO and graph utilities
- Updated provenance priorities and introduced a mapping from internal provenance values to EdgeDTO-compliant literals.
- Added a new function `normalize_provenance` to standardize internal provenance strings.
- Enhanced the `_edge` function to include an `is_internal` flag and provenance normalization.
- Modified the `EdgeDTO` model to include a new `source_hint` field for detailed provenance information and an `is_internal` flag for intra-note edges.
- Reduced the provenance options in `EdgeDTO` to valid literals, improving data integrity.
2026-01-25 16:27:09 +01:00
0b2a1f1a63 Update deterministic sorting of semantic_groups in build_edges_for_note to handle None values correctly. Introduced a custom sort function to ensure consistent edge extraction across batches, preventing variance in edge counts. 2026-01-12 11:31:20 +01:00
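
The None-safe ordering described in this commit can be sketched as follows. This is a minimal illustration, not the code from `build_edges_for_note`: the helper names `none_safe_key` and `ordered_groups`, and the assumption that `semantic_groups` is a dict keyed by optional strings, are hypothetical.

```python
from typing import Optional


def none_safe_key(key: Optional[str]) -> tuple:
    """Sort key that never compares None with str; None sorts after all strings."""
    return (key is None, key or "")


def ordered_groups(semantic_groups: dict) -> list:
    """Return (key, edges) pairs in an order that does not depend on insertion order."""
    return sorted(semantic_groups.items(), key=lambda item: none_safe_key(item[0]))
```

Sorting on a `(is_none, value)` tuple avoids the `TypeError` that Python 3 raises when `None` is compared with a string, and it yields the same order whichever batch produced the groups.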
6047e94964 Refactor edge processing in graph_derive_edges.py and ingestion_processor.py for consistency and efficiency
Implement deterministic sorting of semantic groups in graph_derive_edges.py to ensure consistent edge extraction across batches. Update ingestion_processor.py to enhance change detection logic, ensuring that hash checks are performed before artifact checks to prevent redundant processing. These changes improve the reliability and efficiency of the edge building and ingestion workflows.
2026-01-12 08:04:28 +01:00
f2a2f4d2df Refine LLM validation zone handling in graph_derive_edges.py
Enhance the extraction logic to store the zone status before header updates, ensuring accurate context during callout processing. Initialize the all_chunk_callout_keys set prior to its usage to prevent potential UnboundLocalError. These improvements contribute to more reliable edge construction and better handling of LLM validation zones.
2026-01-11 21:09:07 +01:00
ea0fd951f2 Enhance LLM validation zone extraction in graph_derive_edges.py
Implement support for H2 headers in LLM validation zone detection, allowing for improved flexibility in header recognition. Update the extraction logic to track zones during callout processing, ensuring accurate differentiation between LLM validation and standard zones. This enhancement improves the handling of callouts and their associated metadata, contributing to more precise edge construction.
2026-01-11 20:58:33 +01:00
c8c828c8a8 Add LLM validation zone extraction and configuration support in graph_derive_edges.py
Implement functions to extract LLM validation zones from Markdown, allowing for configurable header identification via environment variables. Enhance the existing note scope zone extraction to differentiate between note scope and LLM validation zones. Update edge building logic to handle LLM validation edges with a 'candidate:' prefix, ensuring proper processing and avoiding duplicates in global scans. This update improves the overall handling of edge data and enhances the flexibility of the extraction process.
2026-01-11 20:19:12 +01:00
2d87f9d816 Enhance compatibility in chunking and edge processing for version 4.4.1: Harmonize handling of "to" and "target_id" across chunking_processor.py, graph_derive_edges.py, and ingestion_processor.py. Ensure consistent validation and processing of explicit callouts, improving integration and reliability in edge candidate handling. 2026-01-11 15:39:03 +01:00
d7d6155203 Refactor logging in graph_derive_edges.py for version 4.4.0: Move logger initialization to module level for improved accessibility across functions. This change enhances debugging capabilities and maintains consistency in logging practices. 2026-01-11 15:28:14 +01:00
f8506c0bb2 Refactor logging in graph_derive_edges.py and ingestion_chunk_payload.py: Remove redundant logging import and ensure consistent logger initialization for improved debugging capabilities. This change enhances traceability in edge processing and chunk ingestion. 2026-01-11 15:25:57 +01:00
c91910ee9f Enhance logging and debugging in chunking_processor.py, graph_derive_edges.py, and ingestion_chunk_payload.py for version 4.4.0: Introduce detailed debug statements to trace chunk extraction, global scan comparisons, and payload transfers. Improve visibility into candidate pool handling and decision-making processes for callout edges, ensuring better traceability and debugging capabilities. 2026-01-11 15:21:46 +01:00
ee91583614 Update graph_derive_edges.py to version 4.3.1: Introduce precision prioritization for chunk scope, ensuring chunk candidates are favored over note scope. Adjust confidence values for explicit callouts and enhance key generation for consistent deduplication. Improve edge processing logic to reinforce the precedence of chunk scope in decision-making. 2026-01-11 15:08:08 +01:00
3a17b646e1 Update graph_derive_edges.py and ingestion_chunk_payload.py for version 4.3.0: Introduce debug logging for data transfer audits and candidate pool handling to address potential data loss. Ensure candidate_pool is explicitly retained for accurate chunk attribution, enhancing traceability and reliability in edge processing. 2026-01-11 14:51:38 +01:00
727de50290 Refine edge parsing and chunk attribution in chunking_parser.py and graph_derive_edges.py for version 4.2.9: Ensure current_edge_type persists across empty lines in callout blocks for accurate link processing. Implement two-phase synchronization for chunk authority, collecting explicit callout keys before the global scan to prevent duplicates. Enhance callout extraction logic to respect existing chunk callouts, improving deduplication and processing efficiency. 2026-01-11 14:30:16 +01:00
a780104b3c Enhance edge processing in graph_derive_edges.py for version 4.2.9: Finalize chunk attribution with synchronization to "Semantic First" signal. Collect callout keys from candidate pool before text scan to prevent duplicates. Update callout extraction logic to ensure strict adherence to existing chunk callouts, improving deduplication and processing efficiency. 2026-01-11 14:07:16 +01:00
55b64c331a Enhance chunking system with WP-24c v4.2.6 and v4.2.7 updates: Introduce is_meta_content flag for callouts in RawBlock, ensuring they are chunked but later removed for clean context. Update parse_blocks and propagate_section_edges to handle callout edges with explicit provenance for chunk attribution. Implement clean-context logic to remove callout syntax post-processing, maintaining chunk integrity. Adjust get_chunk_config to prioritize frontmatter overrides for chunking profiles. Update documentation to reflect these changes. 2026-01-11 11:14:31 +01:00
6131b315d7 Update graph_derive_edges.py to version 4.2.2: Implement semantic de-duplication with improved scope decision-making. Enhance edge ID calculation by prioritizing semantic grouping before scope assignment, ensuring accurate edge representation across different contexts. Update documentation to reflect changes in edge processing logic and prioritization strategy. 2026-01-10 22:20:13 +01:00
dfff46e45c Update graph_derive_edges.py to version 4.2.1: Implement Clean-Context enhancements, including consolidated callout extraction and smart scope prioritization. Refactor callout handling to avoid duplicates and improve processing efficiency. Update documentation to reflect changes in edge extraction logic and prioritization strategy. 2026-01-10 22:17:03 +01:00
003a270548 Implement WP-24c v4.2.0: Introduce configurable header names and levels for LLM validation and Note-Scope zones in the chunking system. Update chunking models, parser, and processor to support exclusion of edge zones during chunking. Enhance documentation and configuration files to reflect new environment variables for improved flexibility in Markdown processing. 2026-01-10 21:46:51 +01:00
39fd15b565 Update graph_db_adapter.py, graph_derive_edges.py, graph_subgraph.py, graph_utils.py, ingestion_processor.py, and retriever.py to version 4.1.0: Introduce Scope-Awareness and Section-Filtering features, enhancing edge retrieval and processing. Implement Note-Scope Zones extraction from Markdown, improve edge ID generation with target_section, and prioritize Note-Scope Links during de-duplication. Update documentation for clarity and consistency across modules. 2026-01-10 19:55:51 +01:00
2da98e8e37 Update graph_derive_edges.py and graph_utils.py to version 4.1.0: Enhance edge ID generation by incorporating target_section into the ID calculation, allowing for distinct edges across different sections. Update documentation to reflect changes in ID structure and improve clarity on edge handling during de-duplication. 2026-01-10 15:45:26 +01:00
a852975811 Update qdrant_points.py, graph_utils.py, graph_derive_edges.py, and ingestion_processor.py to version 4.0.0: Implement GOLD-STANDARD identity with strict 4-parameter ID generation, eliminating rule_id and variant from ID calculations. Enhance documentation for clarity and consistency across modules, addressing ID drift and ensuring compatibility in the ingestion workflow. 2026-01-10 15:19:46 +01:00
b0f4309a29 Update qdrant_points.py, graph_utils.py, ingestion_processor.py, and import_markdown.py: Enhance ID generation and error handling, centralize identity logic to prevent ID drift, and improve documentation clarity. Update versioning to reflect changes in functionality and maintain compatibility across modules. 2026-01-10 14:00:12 +01:00
c33b1c644a Update graph_utils.py to version 1.6.1: Restore '_edge' function to address ImportError, revert to UUIDv5 for Qdrant compatibility, and maintain section logic in ID generation. Enhance documentation for clarity and refine edge ID generation process. 2026-01-10 10:58:44 +01:00
7cc823e2f4 RESTART from scratch with a fresh codebase
Update qdrant_points.py, graph_utils.py, ingestion_db.py, ingestion_processor.py, and import_markdown.py: Enhance UUID generation for edge IDs, improve error handling, and refine documentation for clarity. Implement atomic consistency in batch upserts and ensure strict phase separation in the ingestion workflow. Update versioning to reflect changes in functionality and maintain compatibility with the ingestion service.
2026-01-10 10:56:47 +01:00
008a470f02 Refactor graph_utils.py and ingestion_processor.py: Update documentation for deterministic UUIDs to enhance Qdrant compatibility. Improve logging and ID validation in ingestion_processor.py, including adjustments to edge processing logic and batch import handling for better clarity and robustness. Version updates to 1.2.0 and 3.1.9 respectively. 2026-01-09 22:05:50 +01:00
7ed82ad82e Update graph_utils.py and ingestion_processor.py to versions 1.2.0 and 3.1.9 respectively: Transition to deterministic UUIDs for edge ID generation to ensure Qdrant compatibility and prevent HTTP 400 errors. Enhance ID validation and streamline edge processing logic to improve robustness and prevent collisions with known system types. Adjust versioning and documentation accordingly. 2026-01-09 21:46:47 +01:00
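
A deterministic edge ID of the kind this commit describes can be built with `uuid.uuid5` from the standard library. Qdrant accepts only unsigned integers or UUIDs as point IDs, which is why a UUID-shaped ID is used here. The sketch is an assumption-laden illustration: the namespace value, the function name `edge_id`, and the four fields chosen are hypothetical, since the log does not list the actual ID parameters.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EDGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/edges")


def edge_id(kind: str, source_id: str, target_id: str, scope: str) -> str:
    """Derive a stable UUID string from the fields that identify an edge."""
    # Assumes the fields never contain "|"; otherwise two different edges
    # could produce the same joined key.
    key = "|".join((kind, source_id, target_id, scope))
    return str(uuid.uuid5(EDGE_NAMESPACE, key))
```

Because the ID depends only on its inputs, re-importing the same note yields the same edge IDs, so an upsert overwrites the existing points instead of creating duplicates.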
a392dc2786 Update type_registry, graph_utils, ingestion_note_payload, and discovery services for dynamic edge handling: Integrate EdgeRegistry for improved edge defaults and topology management (WP-24c). Enhance type loading and edge resolution logic to ensure backward compatibility while transitioning to a more robust architecture. Version bumps to 1.1.0 for type_registry, 1.1.0 for graph_utils, 2.5.0 for ingestion_note_payload, and 1.1.0 for discovery service. 2026-01-09 15:20:12 +01:00
d17c966301 Enhance callout extraction in graph_extractors.py: Update regex to support nested [!edge] callouts and improve handling of indentation levels. This allows for more flexible parsing of callout structures in the input text.
2026-01-06 10:20:55 +01:00
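
A regex that tolerates nested callouts and leading indentation, as described in this commit, might look like the sketch below. It assumes `[!edge]` callouts use Markdown blockquote markers (`>`) and that the edge kind follows the marker on the same line. The pattern and the function name `parse_callout_header` are hypothetical and are not taken from `graph_extractors.py`.

```python
import re

# Matches "> [!edge] kind" at any blockquote depth, e.g. "> > [!edge] kind".
CALLOUT_RE = re.compile(r"^\s*((?:>\s*)+)\[!edge\]\s*(\S.*)?$")


def parse_callout_header(line: str):
    """Return (nesting_depth, kind) for an [!edge] header line, else None."""
    m = CALLOUT_RE.match(line)
    if m is None:
        return None
    depth = m.group(1).count(">")  # number of blockquote markers before the tag
    kind = (m.group(2) or "").strip()  # empty string when no kind is given
    return depth, kind
```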
d35bdc64b9 Implement WP-15c enhancements across graph and retrieval modules, including full metadata support for Super-Edge aggregation and Note-Level Diversity Pooling. Update scoring logic to reflect new edge handling and improve retrieval accuracy. Version updates to reflect these changes. 2025-12-30 21:47:18 +01:00
4327fc939c Revert to the previous version to test the LLM checks 2025-12-30 09:40:30 +01:00
ef1046c6f5 Enhance callout relation extraction by ensuring correct termination on new headers. Update regex for simple kinds to support hyphens. Refactor block processing logic for improved clarity and functionality. 2025-12-30 09:26:38 +01:00
b7d1bcce3d Revert to the previous version, in which 2 edge types were created 2025-12-29 18:04:14 +01:00
03d3173ca6 New deduplication for callout edges 2025-12-29 12:42:26 +01:00
38a61d7b50 Fix: semantic deduplication in graph_derive_edges.py 2025-12-29 12:21:57 +01:00
0a429e1f7b Adjustments to edge comparison 2025-12-29 11:45:25 +01:00
857ba953e3 bug fix 2025-12-29 11:00:00 +01:00
303efefcb7 bug fix 2025-12-29 08:19:40 +01:00
feeb7c2d92 Initial WP4d 2025-12-29 07:58:20 +01:00
7fa9ce81bd Final adjustments 2025-12-27 20:30:24 +01:00
8490911958 Modularization 2025-12-27 20:26:00 +01:00
19c96fd00f Graph refactored 2025-12-27 14:44:44 +01:00