mitai-jinkendo/docs/PLACEHOLDER_METADATA_VALIDATION.md
Lars 650313347f
feat: Placeholder Metadata V2 - Normative Implementation + ZIP Export Fix
MAJOR CHANGES:
- Enhanced metadata schema with 7 QA fields
- Deterministic derivation logic (no guessing)
- Conservative inference (prefer unknown over wrong)
- Real source tracking (skip safe wrappers)
- Legacy mismatch detection
- Activity quality filter policies
- Completeness scoring (0-100)
- Unresolved fields tracking
- Fixed ZIP/JSON export auth (query param support)

FILES CHANGED:
- backend/placeholder_metadata.py (schema extended)
- backend/placeholder_metadata_enhanced.py (NEW, 418 lines)
- backend/generate_complete_metadata_v2.py (NEW, 334 lines)
- backend/tests/test_placeholder_metadata_v2.py (NEW, 302 lines)
- backend/routers/prompts.py (V2 integration + auth fix)
- docs/PLACEHOLDER_METADATA_VALIDATION.md (NEW, 541 lines)

PROBLEMS FIXED:
✓ value_raw extraction (type-aware, JSON parsing)
✓ Units for dimensionless values (scores, correlations)
✓ Safe wrappers as sources (now skipped)
✓ Time window guessing (confidence flags)
✓ Legacy inconsistencies (marked with flag)
✓ Missing quality filters (activity placeholders)
✓ No completeness metric (0-100 score)
✓ Orphaned placeholders (tracked)
✓ Unresolved fields (explicit list)
✓ ZIP/JSON export auth (query token support for downloads)

AUTH FIX:
- export-catalog-zip now accepts token via query param (?token=xxx)
- export-values-extended now accepts token via query param
- Allows browser downloads without custom headers

Concept: docs/PLACEHOLDER_METADATA_REQUIREMENTS_V2_NORMATIVE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 21:23:37 +02:00


# Placeholder Metadata Validation Logic
**Version:** 2.0.0
**Generated:** 2026-03-29
**Status:** Normative
---
## Purpose
This document defines the **deterministic derivation logic** for all placeholder metadata fields. It ensures that metadata extraction is **reproducible, testable, and auditable**.
---
## 1. Type Classification (`PlaceholderType`)
### Decision Logic
```python
def determine_type(key, description, output_type, value_display):
    # JSON/Markdown outputs are typically raw_data
    if output_type in [JSON, MARKDOWN]:
        return RAW_DATA
    # Scores and percentages are atomic
    if any(x in key for x in ['score', 'pct', 'adequacy']):
        return ATOMIC
    # Summaries and details are raw_data
    if any(x in key for x in ['summary', 'detail', 'verteilung']):
        return RAW_DATA
    # Goals and focus areas (if derived from prompts)
    if any(x in key for x in ['goal', 'focus', 'top_']):
        # Check if from KI/Prompt stage
        if is_from_prompt_stage(key):
            return INTERPRETED
        else:
            return ATOMIC  # Just database values
    # Correlations are interpreted
    if 'correlation' in key or 'plateau' in key or 'driver' in key:
        return INTERPRETED
    # Default: atomic
    return ATOMIC
```
### Rules
1. **ATOMIC**: Single values (numbers, strings, dates) from database or simple computation
2. **RAW_DATA**: Structured data (JSON, arrays, markdown) representing multiple values
3. **INTERPRETED**: Values derived from AI/Prompt stages or complex interpretation
4. **LEGACY_UNKNOWN**: Only for existing unclear placeholders (never for new ones)
### Validation
- `interpreted` requires evidence of prompt/stage origin
- Calculated scores/aggregations are NOT automatically `interpreted`
---
## 2. Unit Inference
### Decision Logic
```python
def infer_unit(key, description, output_type, type):
    # NO units for:
    if output_type in [JSON, MARKDOWN, ENUM]:
        return None
    if any(x in key for x in ['score', 'correlation', 'adequacy']):
        return None  # Dimensionless
    if any(x in key for x in ['pct', 'ratio', 'balance']):
        return None  # Dimensionless percentage/ratio
    # Weight/mass
    if any(x in key for x in ['weight', 'gewicht', 'fm_', 'lbm_']):
        return 'kg'
    # Circumferences
    if 'umfang' in key or any(x in key for x in ['waist', 'hip', 'chest']):
        return 'cm'
    # Time
    if 'duration' in key or 'dauer' in key or 'debt' in key:
        if 'hours' in description or 'stunden' in description:
            return 'Stunden'
        elif 'minutes' in description:
            return 'Minuten'
        return None  # Unclear
    # Heart rate
    if 'rhr' in key or ('hr' in key and 'hrv' not in key):
        return 'bpm'
    # HRV
    if 'hrv' in key:
        return 'ms'
    # VO2 Max
    if 'vo2' in key:
        return 'ml/kg/min'
    # Calories
    if 'kcal' in key or 'energy' in key:
        return 'kcal'
    # Macros
    if any(x in key for x in ['protein', 'carb', 'fat']) and 'g' in description:
        return 'g'
    # Default: None (conservative)
    return None
```
### Rules
1. **NO units** for dimensionless values (scores, correlations, percentages, ratios)
2. **NO units** for JSON/Markdown/Enum outputs
3. **NO units** for classifications (e.g., "recomposition_quadrant")
4. **Conservative**: Only assign unit if certain from key or description
### Examples
**Correct:**
- `weight_aktuell` → `kg`
- `goal_progress_score` → `None` (dimensionless 0-100)
- `correlation_energy_weight_lag` → `None` (dimensionless)
- `activity_summary` → `None` (text/JSON)
**Incorrect:**
- `goal_progress_score` → `%` (wrong - it's 0-100 dimensionless)
- `waist_hip_ratio` → any unit (wrong - dimensionless ratio)
---
## 3. Time Window Detection
### Decision Logic (Priority Order)
```python
def detect_time_window(key, description, semantic_contract, resolver_name):
    """Returns (window, certain, note)."""
    # 1. Explicit suffix (highest confidence)
    if '_7d' in key: return DAYS_7, True, None
    if '_28d' in key: return DAYS_28, True, None
    if '_30d' in key: return DAYS_30, True, None
    if '_90d' in key: return DAYS_90, True, None
    # 2. Latest/current keywords
    if any(x in key for x in ['aktuell', 'latest', 'current']):
        return LATEST, True, None
    # 3. Semantic contract (high confidence)
    if '7 tag' in semantic_contract or '7d' in semantic_contract:
        # Check for description mismatch; also sets legacy_contract_mismatch
        if '30' in description or '28' in description:
            return DAYS_7, True, "Description disagrees with semantic contract"
        return DAYS_7, True, None
    # 4. Description patterns (medium confidence)
    if 'letzte 7' in description or '7 tag' in description:
        return DAYS_7, False, None
    # 5. Heuristics (low confidence)
    if 'avg' in key or 'durchschn' in key:
        return DAYS_30, False, "Assumed 30d for average"
    if 'trend' in key:
        return DAYS_28, False, "Assumed 28d for trend"
    # 6. Unknown
    return UNKNOWN, False, "Could not determine"
```
### Legacy Mismatch Detection
If description says "7d" but semantic contract (implementation) says "28d":
- Set `time_window = DAYS_28` (actual implementation)
- Set `legacy_contract_mismatch = True`
- Add to `known_issues`: "Description says 7d but implementation is 28d"
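This precedence rule can be sketched as a small helper (a sketch; the function name and return shape are illustrative, not the actual implementation):

```python
def reconcile_time_window(description_window, contract_window):
    """Semantic contract (actual implementation) wins over the legacy description.
    Returns (time_window, legacy_contract_mismatch, issue_note)."""
    if contract_window and description_window and contract_window != description_window:
        note = f"Description says {description_window} but implementation is {contract_window}"
        return contract_window, True, note
    return contract_window or description_window, False, None
```

For example, a description of `7d` against a contract of `28d` resolves to `28d` with the mismatch flag set.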
### Rules
1. **Actual implementation** takes precedence over legacy description
2. **Suffix in key** is most reliable indicator
3. **Semantic contract** (if documented) reflects actual implementation
4. **Unknown** if cannot be determined with confidence
---
## 4. Value Raw Extraction
### Decision Logic
```python
def extract_value_raw(value_display, output_type, type):
    """Returns (value_raw, success)."""
    # No value
    if value_display in ['nicht verfügbar', '', None]:
        return None, True
    # JSON output
    if output_type == JSON:
        try:
            return json.loads(value_display), True
        except (TypeError, ValueError):
            # Try to find JSON embedded in the string
            match = re.search(r'(\{.*\}|\[.*\])', value_display, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1)), True
                except ValueError:
                    pass
            return None, False  # Failed
    # Markdown
    if output_type == MARKDOWN:
        return value_display, True  # Keep as string
    # Number
    if output_type in [NUMBER, INTEGER]:
        match = re.search(r'([-+]?\d+\.?\d*)', value_display)
        if match:
            val = float(match.group(1))
            return (int(val) if output_type == INTEGER else val), True
        return None, False
    # Date
    if output_type == DATE:
        if re.match(r'\d{4}-\d{2}-\d{2}', value_display):
            return value_display, True  # ISO format
        return value_display, False  # Unknown format
    # String/Enum
    return value_display, True
```
### Rules
1. **JSON outputs**: Must be valid JSON objects/arrays, not strings
2. **Numeric outputs**: Extract number without unit
3. **Markdown/String**: Keep as-is
4. **Dates**: Prefer ISO format (YYYY-MM-DD)
5. **Failure**: Set `value_raw = None` and mark in `unresolved_fields`
### Examples
**Correct:**
- `active_goals_json` (JSON) → `{"goals": [...]}` (object)
- `weight_aktuell` (NUMBER) → `85.8` (number, no unit)
- `datum_heute` (DATE) → `"2026-03-29"` (ISO string)
**Incorrect:**
- `active_goals_json` (JSON) → `"[Fehler: ...]"` (string, not JSON)
- `weight_aktuell` (NUMBER) → `"85.8"` (string, not number)
- `weight_aktuell` (NUMBER) → `85` (extracted from "85.8 kg" incorrectly)
---
## 5. Source Provenance
### Decision Logic
```python
def resolve_source(resolver_name):
    """Returns (function, data_layer_module, source_tables, kind)."""
    # Skip safe wrappers - they are not real sources
    if resolver_name in ['_safe_int', '_safe_float', '_safe_json', '_safe_str']:
        return None, None, [], 'wrapper'  # also marked unresolved
    # Known mappings
    if resolver_name in SOURCE_MAP:
        function, data_layer_module, tables, kind = SOURCE_MAP[resolver_name]
        return function, data_layer_module, tables, kind
    # Goals formatting
    if resolver_name.startswith('_format_goals'):
        return None, None, ['goals'], 'interpreted'
    # Unknown
    return None, None, [], 'unknown'  # also marked unresolved
```
### Source Kinds
- **direct**: Direct database read (e.g., `get_latest_weight`)
- **computed**: Calculated from data (e.g., `calculate_bmi`)
- **aggregated**: Aggregation over time/records (e.g., `get_nutrition_avg`)
- **derived**: Derived from other metrics (e.g., `protein_g_per_kg`)
- **interpreted**: AI/prompt stage output
- **wrapper**: Safe wrapper (not a real source)
### Rules
1. **Safe wrappers** (`_safe_*`) are NOT valid source functions
2. Must trace to **real data layer function** or **database table**
3. Mark as `unresolved` if cannot trace to real source
---
## 6. Used By Tracking
### Decision Logic
```python
def track_usage(placeholder_key, ai_prompts_table):
    used_by = UsedBy(prompts=[], pipelines=[], charts=[])
    for prompt in ai_prompts_table:
        # Check template
        if placeholder_key in prompt.template:
            if prompt.type == 'pipeline':
                used_by.pipelines.append(prompt.name)
            else:
                used_by.prompts.append(prompt.name)
        # Check stages
        for stage in prompt.stages:
            for stage_prompt in stage.prompts:
                if placeholder_key in stage_prompt.template:
                    used_by.pipelines.append(prompt.name)
    # Check charts (future)
    # if placeholder_key in chart_endpoints:
    #     used_by.charts.append(chart_name)
    return used_by
```
### Orphaned Detection
If `used_by.prompts` + `used_by.pipelines` + `used_by.charts` are all empty:
- Set `orphaned_placeholder = True`
- Consider for deprecation
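A minimal sketch of the orphan check, assuming `used_by` is held as a dict of lists:

```python
def is_orphaned(used_by):
    """True when no prompt, pipeline, or chart references the placeholder."""
    return not (used_by.get("prompts") or used_by.get("pipelines") or used_by.get("charts"))
```

A placeholder referenced by even a single prompt or pipeline is not orphaned.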
---
## 7. Quality Filter Policy (Activity Placeholders)
### Decision Logic
```python
def create_quality_policy(key):
    # Activity-related placeholders need quality policies
    if any(x in key for x in ['activity', 'training', 'load', 'volume', 'ability']):
        return QualityFilterPolicy(
            enabled=True,
            default_filter_level="quality",  # quality | acceptable | all
            null_quality_handling="exclude",  # exclude | include_as_uncategorized
            includes_poor=False,
            includes_excluded=False,
            notes="Filters for quality='quality' by default. NULL quality excluded."
        )
    return None
```
### Rules
1. **Activity metrics** require quality filter policies
2. **Default filter**: `quality='quality'` (acceptable and above)
3. **NULL handling**: Excluded by default
4. **Poor quality**: Not included unless explicit
5. **Excluded**: Not included
---
## 8. Confidence Logic
### Decision Logic
```python
def create_confidence_logic(key, data_layer_module):
    # Data layer functions have confidence
    if data_layer_module:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data availability and thresholds",
            thresholds={"min_data_points": 1},
            notes=f"Determined by {data_layer_module}"
        )
    # Scores
    if 'score' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data completeness for components",
            notes="Correlates with input data availability"
        )
    # Correlations
    if 'correlation' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Pearson correlation with significance",
            thresholds={"min_data_points": 7}
        )
    return None
```
### Rules
1. **Data layer placeholders**: Have confidence logic
2. **Scores**: Confidence correlates with data availability
3. **Correlations**: Require minimum data points
4. **Simple lookups**: May not need confidence logic
---
## 9. Metadata Completeness Score
### Calculation
```python
def calculate_completeness(metadata):
    score = 0
    # Required fields (30 points)
    if metadata.category != 'Unknown': score += 5
    if metadata.description and 'No description' not in metadata.description: score += 5
    if metadata.semantic_contract: score += 10
    if metadata.source.resolver != 'unknown': score += 10
    # Type specification (20 points)
    if metadata.type != 'legacy_unknown': score += 10
    if metadata.time_window != 'unknown': score += 10
    # Output specification (20 points)
    if metadata.output_type != 'unknown': score += 10
    if metadata.format_hint: score += 10
    # Source provenance (20 points)
    if metadata.source.data_layer_module: score += 10
    if metadata.source.source_tables: score += 10
    # Quality policies (10 points)
    if metadata.quality_filter_policy: score += 5
    if metadata.confidence_logic: score += 5
    return min(score, 100)
```
### Schema Status
Based on completeness score:
- **90-100%** + no unresolved → `validated`
- **50-89%** → `draft`
- **0-49%** → `incomplete`
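The mapping can be expressed as a small function (a sketch; names are illustrative). Note that a score of 90+ with unresolved fields falls back to `draft`:

```python
def schema_status(completeness_score, unresolved_fields):
    """Map completeness score and unresolved fields to a schema status."""
    if completeness_score >= 90 and not unresolved_fields:
        return "validated"
    if completeness_score >= 50:
        return "draft"
    return "incomplete"
```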
---
## 10. Validation Tests
### Required Tests
```python
def test_value_raw_extraction():
    # Test each output_type
    assert extract_value_raw('{"key": "val"}', JSON) == {"key": "val"}
    assert extract_value_raw('85.8 kg', NUMBER) == 85.8
    assert extract_value_raw('2026-03-29', DATE) == '2026-03-29'

def test_unit_inference():
    # No units for scores
    assert infer_unit('goal_progress_score', ..., NUMBER) == None
    # Correct units for measurements
    assert infer_unit('weight_aktuell', ..., NUMBER) == 'kg'
    # No units for JSON
    assert infer_unit('active_goals_json', ..., JSON) == None

def test_time_window_detection():
    # Explicit suffix
    assert detect_time_window('weight_7d_median', ...) == DAYS_7
    # Latest
    assert detect_time_window('weight_aktuell', ...) == LATEST
    # Legacy mismatch detection
    tw, mismatch = detect_time_window('weight_trend', desc='7d', contract='28d')
    assert tw == DAYS_28
    assert mismatch == True

def test_source_provenance():
    # Skip wrappers
    assert resolve_source('_safe_int') == (None, None, [], 'wrapper')
    # Real sources
    func, module, tables, kind = resolve_source('get_latest_weight')
    assert func == 'get_latest_weight_data'
    assert module == 'body_metrics'
    assert 'weight_log' in tables

def test_quality_filter_for_activity():
    # Activity placeholders need quality filter
    policy = create_quality_policy('activity_summary')
    assert policy is not None
    assert policy.default_filter_level == "quality"
    # Non-activity placeholders don't
    policy = create_quality_policy('weight_aktuell')
    assert policy is None
```
---
## 11. Continuous Validation
### Pre-Commit Checks
```bash
# Run validation before commit; block the commit on a high QA failure rate
# (assumes the generator exits non-zero when the QA report shows one)
python backend/generate_complete_metadata_v2.py || exit 1
```
### CI/CD Integration
```yaml
- name: Validate Placeholder Metadata
  run: |
    python backend/generate_complete_metadata_v2.py
    python backend/tests/test_placeholder_metadata_v2.py
```
---
## Summary
This validation logic ensures:
1. **Reproducible**: Same input → same output
2. **Testable**: All logic has unit tests
3. **Auditable**: Clear decision paths
4. **Conservative**: Prefer `unknown` over wrong guesses
5. **Normative**: Actual implementation > legacy description