Commit Graph

560 Commits

Author SHA1 Message Date
0b2a1f1a63 Update deterministic sorting of semantic_groups in build_edges_for_note to handle None values correctly. Introduced a custom sort function to ensure consistent edge extraction across batches, preventing variance in edge counts. 2026-01-12 11:31:20 +01:00
c42a76b3d7 Add dedicated logging for ID collisions in ingestion_processor.py
Implement a new method to log ID collisions into a separate file (logs/id_collisions.log) for manual analysis. This update captures relevant metadata in JSONL format, enhancing traceability during the ingestion process. The logging occurs when a conflict is detected between existing and new files sharing the same note_id, improving error handling and diagnostics.
2026-01-12 09:04:36 +01:00
ec9b3c68af Implement ID collision detection and enhance logging in ingestion_processor.py
Add a check for ID collisions during the ingestion process to prevent multiple files from using the same note_id. Update logging levels to DEBUG for detailed diagnostics on hash comparisons, body lengths, and frontmatter keys, improving traceability and debugging capabilities in the ingestion workflow.
2026-01-12 08:56:28 +01:00
f9118a36f8 Enhance logging in ingestion_processor.py to include normalized file path and note title
Update the logging statement to provide additional context during the ingestion process by including the normalized file path and note title. This change aims to improve traceability and debugging capabilities in the ingestion workflow.
2026-01-12 08:33:11 +01:00
e52eed40ca Refactor hash input handling in ingestion_processor.py to use dictionary format
Update the ingestion process to convert the parsed object to a dictionary before passing it to the hash input function. This change ensures compatibility with the updated function requirements and improves the accuracy of hash comparisons during ingestion workflows.
2026-01-12 08:21:21 +01:00
43641441ef Refactor hash input and body/frontmatter handling in ingestion_processor.py for improved accuracy
Update the ingestion process to utilize the parsed object instead of note_pl for hash input, body, and frontmatter extraction. This change ensures that the correct content is used for comparisons, enhancing the reliability of change detection diagnostics and improving overall ingestion accuracy.
2026-01-12 08:19:43 +01:00
c613d81846 Enhance logging in ingestion_processor.py for detailed change detection diagnostics
Add comprehensive logging for hash input, body length comparisons, and frontmatter key checks in the change detection process. This update aims to improve traceability and facilitate debugging by providing insights into potential discrepancies between new and old payloads during ingestion workflows.
2026-01-12 08:16:03 +01:00
de5db09b51 Update logging levels in ingestion_processor.py and import_markdown.py for improved visibility
Change debug logs to info and warning levels in ingestion_processor.py to enhance the visibility of change detection processes, including hash comparisons and artifact checks. Additionally, ensure .env is loaded before logging setup in import_markdown.py to correctly read the DEBUG environment variable. These adjustments aim to improve traceability and debugging during ingestion workflows.
2026-01-12 08:13:26 +01:00
7cb8fd6602 Enhance logging in ingestion_processor.py for improved change detection diagnostics
Add detailed debug and warning logs to the change detection process, providing insights into hash comparisons and artifact checks. This update aims to facilitate better traceability and debugging during ingestion, particularly when handling hash changes and missing hashes. The changes ensure that the ingestion workflow is more transparent and easier to troubleshoot.
2026-01-12 08:08:29 +01:00
6047e94964 Refactor edge processing in graph_derive_edges.py and ingestion_processor.py for consistency and efficiency
Implement deterministic sorting of semantic groups in graph_derive_edges.py to ensure consistent edge extraction across batches. Update ingestion_processor.py to enhance change detection logic, ensuring that hash checks are performed before artifact checks to prevent redundant processing. These changes improve the reliability and efficiency of the edge building and ingestion workflows.
2026-01-12 08:04:28 +01:00
78fbc9b31b Enhance ingestion_processor.py with path normalization and strict change detection
Implement path normalization to ensure consistent hash checks by converting file paths to absolute paths. Update change detection logic to handle hash comparisons more robustly, treating missing hashes as content changes for safety. This prevents redundant processing and improves efficiency in the ingestion workflow.
2026-01-12 07:53:03 +01:00
742792770c Implement Phase 3 Agentic Edge Validation in ingestion_processor.py and related documentation updates
Introduce a new method for persisting rejected edges for audit purposes, enhancing traceability and validation logic. Update the decision engine to utilize a generic fallback template for improved error handling during LLM validation. Revise documentation across multiple files to reflect the new versioning, context, and features related to Phase 3 validation, including automatic mirror edges and note-scope zones. This update ensures better graph integrity and validation accuracy in the ingestion process.
2026-01-12 07:45:54 +01:00
b19f91c3ee Refactor edge validation process in ingestion_processor.py
Remove LLM validation from the candidate edge processing loop, shifting it to a later phase for improved context handling. Introduce a new validation mechanism that aggregates note text for better decision-making and optimizes the validation criteria to include both rule IDs and provenance. Update logging to reflect the new validation phases and ensure rejected edges are not processed further. This enhances the overall efficiency and accuracy of edge validation during ingestion.
2026-01-11 21:47:11 +01:00
9b0d8c18cb Implement LLM validation for candidate edges in ingestion_processor.py
Enhance the edge validation process by introducing logic to validate edges with rule IDs starting with "candidate:". This includes extracting target IDs, validating against the entire note text, and updating rule IDs upon successful validation. Rejected edges are logged for traceability, improving the overall handling of edge data during ingestion.
2026-01-11 21:27:07 +01:00
f2a2f4d2df Refine LLM validation zone handling in graph_derive_edges.py
Enhance the extraction logic to store the zone status before header updates, ensuring accurate context during callout processing. Initialize the all_chunk_callout_keys set prior to its usage to prevent potential UnboundLocalError. These improvements contribute to more reliable edge construction and better handling of LLM validation zones.
2026-01-11 21:09:07 +01:00
ea0fd951f2 Enhance LLM validation zone extraction in graph_derive_edges.py
Implement support for H2 headers in LLM validation zone detection, allowing for improved flexibility in header recognition. Update the extraction logic to track zones during callout processing, ensuring accurate differentiation between LLM validation and standard zones. This enhancement improves the handling of callouts and their associated metadata, contributing to more precise edge construction.
2026-01-11 20:58:33 +01:00
c8c828c8a8 Add LLM validation zone extraction and configuration support in graph_derive_edges.py
Implement functions to extract LLM validation zones from Markdown, allowing for configurable header identification via environment variables. Enhance the existing note scope zone extraction to differentiate between note scope and LLM validation zones. Update edge building logic to handle LLM validation edges with a 'candidate:' prefix, ensuring proper processing and avoiding duplicates in global scans. This update improves the overall handling of edge data and enhances the flexibility of the extraction process.
2026-01-11 20:19:12 +01:00
716a063849 Enhance decision_engine.py to support context reuse during compression failures. Implement error handling to return original content when compression fails, ensuring robust fallback mechanisms without re-retrieval. Update logging for better traceability of compression and fallback processes, improving overall reliability in stream handling. 2026-01-11 19:14:15 +01:00
3dc81ade0f Update logging in decision_engine.py and retriever.py to use node_id as chunk_id and total_score instead of score for improved accuracy in debug statements. This change aligns with the new data structure introduced in version 4.5.4, enhancing traceability in retrieval processes. 2026-01-11 18:55:13 +01:00
1df89205ac Update EdgeDTO to support extended provenance values and modify explanation building in retriever.py to accommodate new provenance types. This enhances the handling of edge data for improved accuracy in retrieval processes. 2026-01-11 17:54:33 +01:00
2445f7cb2b Implement chunk-aware graph traversal in hybrid_retrieve: Extract both note_id and chunk_id from hits to enhance seed coverage for edge retrieval. Combine direct and additional chunk IDs for improved accuracy in subgraph expansion. Update debug logging to reflect the new seed and chunk ID handling, ensuring better traceability in graph retrieval processes. 2026-01-11 17:48:30 +01:00
47fdcf8eed Update logging in retriever.py for version 4.5.1: Modify edge count logging to utilize the adjacency list instead of the non-existent .edges attribute in the subgraph, enhancing accuracy in debug statements related to graph retrieval processes. 2026-01-11 17:44:20 +01:00
3e27c72b80 Enhance logging capabilities across multiple modules for version 4.5.0: Introduce detailed debug statements in decision_engine.py, retriever_scoring.py, retriever.py, and logging_setup.py to improve traceability during retrieval processes. Implement dynamic log level configuration based on environment variables, allowing for more flexible debugging and monitoring of application behavior. 2026-01-11 17:30:34 +01:00
2d87f9d816 Enhance compatibility in chunking and edge processing for version 4.4.1: Harmonize handling of "to" and "target_id" across chunking_processor.py, graph_derive_edges.py, and ingestion_processor.py. Ensure consistent validation and processing of explicit callouts, improving integration and reliability in edge candidate handling. 2026-01-11 15:39:03 +01:00
d7d6155203 Refactor logging in graph_derive_edges.py for version 4.4.0: Move logger initialization to module level for improved accessibility across functions. This change enhances debugging capabilities and maintains consistency in logging practices. 2026-01-11 15:28:14 +01:00
f8506c0bb2 Refactor logging in graph_derive_edges.py and ingestion_chunk_payload.py: Remove redundant logging import and ensure consistent logger initialization for improved debugging capabilities. This change enhances traceability in edge processing and chunk ingestion. 2026-01-11 15:25:57 +01:00
c91910ee9f Enhance logging and debugging in chunking_processor.py, graph_derive_edges.py, and ingestion_chunk_payload.py for version 4.4.0: Introduce detailed debug statements to trace chunk extraction, global scan comparisons, and payload transfers. Improve visibility into candidate pool handling and decision-making processes for callout edges, ensuring better traceability and debugging capabilities. 2026-01-11 15:21:46 +01:00
ee91583614 Update graph_derive_edges.py to version 4.3.1: Introduce precision prioritization for chunk scope, ensuring chunk candidates are favored over note scope. Adjust confidence values for explicit callouts and enhance key generation for consistent deduplication. Improve edge processing logic to reinforce the precedence of chunk scope in decision-making. 2026-01-11 15:08:08 +01:00
3a17b646e1 Update graph_derive_edges.py and ingestion_chunk_payload.py for version 4.3.0: Introduce debug logging for data transfer audits and candidate pool handling to address potential data loss. Ensure candidate_pool is explicitly retained for accurate chunk attribution, enhancing traceability and reliability in edge processing. 2026-01-11 14:51:38 +01:00
727de50290 Refine edge parsing and chunk attribution in chunking_parser.py and graph_derive_edges.py for version 4.2.9: Ensure current_edge_type persists across empty lines in callout blocks for accurate link processing. Implement two-phase synchronization for chunk authority, collecting explicit callout keys before the global scan to prevent duplicates. Enhance callout extraction logic to respect existing chunk callouts, improving deduplication and processing efficiency. 2026-01-11 14:30:16 +01:00
a780104b3c Enhance edge processing in graph_derive_edges.py for version 4.2.9: Finalize chunk attribution with synchronization to "Semantic First" signal. Collect callout keys from candidate pool before text scan to prevent duplicates. Update callout extraction logic to ensure strict adherence to existing chunk callouts, improving deduplication and processing efficiency. 2026-01-11 14:07:16 +01:00
f51e1cb2c4 Fix regex pattern in parse_edges_robust to support multiple leading '>' characters for edge callouts, enhancing flexibility in edge parsing. 2026-01-11 12:03:36 +01:00
20fb1e92e2 Enhance chunking functionality in version 4.2.8: Update callout pattern to support additional syntax for edge and abstract callouts. Modify get_chunk_config to allow fallback to chunk_profile if chunking_profile is not present. Ensure explicit passing of chunk_profile in make_chunk_payloads for improved payload handling. Update type hints in chunking_parser for better clarity. 2026-01-11 11:49:16 +01:00
1d66ca0649 Update chunking_utils.py to include Optional type hint: Add Optional to the import statement for improved type annotations, enhancing code clarity and maintainability. 2026-01-11 11:16:30 +01:00
55b64c331a Enhance chunking system with WP-24c v4.2.6 and v4.2.7 updates: Introduce is_meta_content flag for callouts in RawBlock, ensuring they are chunked but later removed for clean context. Update parse_blocks and propagate_section_edges to handle callout edges with explicit provenance for chunk attribution. Implement clean-context logic to remove callout syntax post-processing, maintaining chunk integrity. Adjust get_chunk_config to prioritize frontmatter overrides for chunking profiles. Update documentation to reflect these changes. 2026-01-11 11:14:31 +01:00
4d43cc526e Update ingestion_processor.py to version 4.2.4: Implement hash-based change detection for content integrity verification. Restore iterative matching based on content hashes, enhancing the accuracy of change detection. Update documentation to reflect changes in the processing logic and versioning. 2026-01-11 08:08:30 +01:00
6131b315d7 Update graph_derive_edges.py to version 4.2.2: Implement semantic de-duplication with improved scope decision-making. Enhance edge ID calculation by prioritizing semantic grouping before scope assignment, ensuring accurate edge representation across different contexts. Update documentation to reflect changes in edge processing logic and prioritization strategy. 2026-01-10 22:20:13 +01:00
dfff46e45c Update graph_derive_edges.py to version 4.2.1: Implement Clean-Context enhancements, including consolidated callout extraction and smart scope prioritization. Refactor callout handling to avoid duplicates and improve processing efficiency. Update documentation to reflect changes in edge extraction logic and prioritization strategy. 2026-01-10 22:17:03 +01:00
003a270548 Implement WP-24c v4.2.0: Introduce configurable header names and levels for LLM validation and Note-Scope zones in the chunking system. Update chunking models, parser, and processor to support exclusion of edge zones during chunking. Enhance documentation and configuration files to reflect new environment variables for improved flexibility in Markdown processing. 2026-01-10 21:46:51 +01:00
39fd15b565 Update graph_db_adapter.py, graph_derive_edges.py, graph_subgraph.py, graph_utils.py, ingestion_processor.py, and retriever.py to version 4.1.0: Introduce Scope-Awareness and Section-Filtering features, enhancing edge retrieval and processing. Implement Note-Scope Zones extraction from Markdown, improve edge ID generation with target_section, and prioritize Note-Scope Links during de-duplication. Update documentation for clarity and consistency across modules. 2026-01-10 19:55:51 +01:00
be2bed9927 Update qdrant_points.py, ingestion_processor.py, and import_markdown.py to version 4.1.0: Enhance edge ID generation by incorporating target_section for improved multigraph support and symmetry integrity. Update documentation and logging for clarity, ensuring consistent ID generation across phases and compatibility with the ingestion workflow. 2026-01-10 17:03:44 +01:00
2da98e8e37 Update graph_derive_edges.py and graph_utils.py to version 4.1.0: Enhance edge ID generation by incorporating target_section into the ID calculation, allowing for distinct edges across different sections. Update documentation to reflect changes in ID structure and improve clarity on edge handling during de-duplication. 2026-01-10 15:45:26 +01:00
a852975811 Update qdrant_points.py, graph_utils.py, graph_derive_edges.py, and ingestion_processor.py to version 4.0.0: Implement GOLD-STANDARD identity with strict 4-parameter ID generation, eliminating rule_id and variant from ID calculations. Enhance documentation for clarity and consistency across modules, addressing ID drift and ensuring compatibility in the ingestion workflow. 2026-01-10 15:19:46 +01:00
8fd7ef804d Update ingestion_processor.py to version 3.4.3: Remove incompatible edge_registry initialization, maintain strict two-phase strategy, and fix ID generation issues. Enhance logging and comments for clarity, ensuring compatibility and improved functionality in the ingestion workflow. 2026-01-10 14:02:10 +01:00
b0f4309a29 Update qdrant_points.py, graph_utils.py, ingestion_processor.py, and import_markdown.py: Enhance ID generation and error handling, centralize identity logic to prevent ID drift, and improve documentation clarity. Update versioning to reflect changes in functionality and maintain compatibility across modules. 2026-01-10 14:00:12 +01:00
c33b1c644a Update graph_utils.py to version 1.6.1: Restore '_edge' function to address ImportError, revert to UUIDv5 for Qdrant compatibility, and maintain section logic in ID generation. Enhance documentation for clarity and refine edge ID generation process. 2026-01-10 10:58:44 +01:00
7cc823e2f4 NEUSTART von vorne mit frischer Codebasis
Update qdrant_points.py, graph_utils.py, ingestion_db.py, ingestion_processor.py, and import_markdown.py: Enhance UUID generation for edge IDs, improve error handling, and refine documentation for clarity. Implement atomic consistency in batch upserts and ensure strict phase separation in the ingestion workflow. Update versioning to reflect changes in functionality and maintain compatibility with the ingestion service.
2026-01-10 10:56:47 +01:00
7e00344b84 Update ingestion_processor.py to version 3.3.8: Address Ghost-ID issues, enhance Pydantic safety, and improve logging clarity. Refine symmetry injection logic and ensure strict phase separation for authority checks. Adjust comments for better understanding and maintainability. 2026-01-10 08:32:59 +01:00
ec89d83916 Update ingestion_db.py, ingestion_processor.py, and import_markdown.py: Enhance documentation and logging clarity, improve artifact purging and symmetry injection logic, and implement stricter authority checks. Update versioning to 2.6.0 and 3.3.7 to reflect changes in functionality and maintain compatibility with the ingestion service. 2026-01-10 08:06:07 +01:00
57656bbaaf Refactor ingestion_db.py and ingestion_processor.py: Enhance documentation and logging clarity, integrate cloud resilience and error handling, and improve artifact purging logic. Update versioning to 3.3.6 to reflect changes in functionality, including strict phase separation and authority checks for explicit edges. 2026-01-10 07:45:43 +01:00
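
Note on commit 0b2a1f1a63: the None-safe deterministic sort it describes could look roughly like the sketch below. This is an illustration only, not the code in build_edges_for_note. The helper names (group_sort_key, sort_semantic_groups) and the sample group names are hypothetical, and it assumes semantic groups are identified by optional strings. Comparing None with a str raises TypeError in Python 3, so the key maps each value to a tuple that always compares cleanly and places None last.

```python
from typing import Optional


def group_sort_key(group: Optional[str]) -> tuple[bool, str]:
    """Hypothetical sort key: real group names sort alphabetically, None sorts last."""
    # The leading bool is False for real names and True for None, so None goes to the end.
    return (group is None, group or "")


def sort_semantic_groups(groups: list[Optional[str]]) -> list[Optional[str]]:
    """Return the groups in an order that does not depend on the input order."""
    return sorted(groups, key=group_sort_key)


if __name__ == "__main__":
    # The same groups arriving in a different order produce the same result.
    print(sort_semantic_groups(["zone_b", None, "zone_a"]))
    print(sort_semantic_groups([None, "zone_a", "zone_b"]))
```

Because the key never compares None with a string directly, batches that deliver the same groups in a different order are processed in the same sequence, which is the property the commit message relies on to keep edge counts stable.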