MAJOR CHANGES:
- Enhanced metadata schema with 7 QA fields
- Deterministic derivation logic (no guessing)
- Conservative inference (prefer unknown over wrong)
- Real source tracking (skip safe wrappers)
- Legacy mismatch detection
- Activity quality filter policies
- Completeness scoring (0-100)
- Unresolved fields tracking
- Fixed ZIP/JSON export auth (query param support)

FILES CHANGED:
- backend/placeholder_metadata.py (schema extended)
- backend/placeholder_metadata_enhanced.py (NEW, 418 lines)
- backend/generate_complete_metadata_v2.py (NEW, 334 lines)
- backend/tests/test_placeholder_metadata_v2.py (NEW, 302 lines)
- backend/routers/prompts.py (V2 integration + auth fix)
- docs/PLACEHOLDER_METADATA_VALIDATION.md (NEW, 541 lines)

PROBLEMS FIXED:
✓ value_raw extraction (type-aware, JSON parsing)
✓ Units for dimensionless values (scores, correlations)
✓ Safe wrappers as sources (now skipped)
✓ Time window guessing (confidence flags)
✓ Legacy inconsistencies (marked with flag)
✓ Missing quality filters (activity placeholders)
✓ No completeness metric (0-100 score)
✓ Orphaned placeholders (tracked)
✓ Unresolved fields (explicit list)
✓ ZIP/JSON export auth (query token support for downloads)

AUTH FIX:
- export-catalog-zip now accepts token via query param (?token=xxx)
- export-values-extended now accepts token via query param
- Allows browser downloads without custom headers

Concept: docs/PLACEHOLDER_METADATA_REQUIREMENTS_V2_NORMATIVE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Placeholder Metadata Validation Logic

**Version:** 2.0.0
**Generated:** 2026-03-29
**Status:** Normative

---

## Purpose

This document defines the **deterministic derivation logic** for all placeholder metadata fields. It ensures that metadata extraction is **reproducible, testable, and auditable**.

---
## 1. Type Classification (`PlaceholderType`)

### Decision Logic

```python
def determine_type(key, description, output_type, value_display):
    # JSON/Markdown outputs are typically raw_data
    if output_type in [JSON, MARKDOWN]:
        return RAW_DATA

    # Scores and percentages are atomic
    if any(x in key for x in ['score', 'pct', 'adequacy']):
        return ATOMIC

    # Summaries and details are raw_data
    if any(x in key for x in ['summary', 'detail', 'verteilung']):
        return RAW_DATA

    # Goals and focus areas are interpreted only if derived from prompts
    if any(x in key for x in ['goal', 'focus', 'top_']):
        if is_from_prompt_stage(key):
            return INTERPRETED
        return ATOMIC  # Just database values

    # Correlations are interpreted
    if 'correlation' in key or 'plateau' in key or 'driver' in key:
        return INTERPRETED

    # Default: atomic
    return ATOMIC
```

### Rules

1. **ATOMIC**: Single values (numbers, strings, dates) from the database or simple computation
2. **RAW_DATA**: Structured data (JSON, arrays, markdown) representing multiple values
3. **INTERPRETED**: Values derived from AI/prompt stages or complex interpretation
4. **LEGACY_UNKNOWN**: Only for existing unclear placeholders (never for new ones)

### Validation

- `interpreted` requires evidence of prompt/stage origin
- Calculated scores/aggregations are NOT automatically `interpreted`

---
## 2. Unit Inference

### Decision Logic
```python
import re

def infer_unit(key, description, output_type, placeholder_type):
    # NO units for structured or enumerated outputs
    if output_type in [JSON, MARKDOWN, ENUM]:
        return None

    if any(x in key for x in ['score', 'correlation', 'adequacy']):
        return None  # Dimensionless

    if any(x in key for x in ['pct', 'ratio', 'balance']):
        return None  # Dimensionless percentage/ratio

    # Weight/mass
    if any(x in key for x in ['weight', 'gewicht', 'fm_', 'lbm_']):
        return 'kg'

    # Circumferences
    if 'umfang' in key or any(x in key for x in ['waist', 'hip', 'chest']):
        return 'cm'

    # Time
    if 'duration' in key or 'dauer' in key or 'debt' in key:
        if 'hours' in description or 'stunden' in description:
            return 'Stunden'
        elif 'minutes' in description:
            return 'Minuten'
        return None  # Unclear

    # Heart rate
    if 'rhr' in key or ('hr' in key and 'hrv' not in key):
        return 'bpm'

    # HRV
    if 'hrv' in key:
        return 'ms'

    # VO2 Max
    if 'vo2' in key:
        return 'ml/kg/min'

    # Calories
    if 'kcal' in key or 'energy' in key:
        return 'kcal'

    # Macros: require "g" as a standalone token, not just the letter anywhere
    if any(x in key for x in ['protein', 'carb', 'fat']) and re.search(r'\bg\b', description):
        return 'g'

    # Default: None (conservative)
    return None
```
### Rules

1. **NO units** for dimensionless values (scores, correlations, percentages, ratios)
2. **NO units** for JSON/Markdown/Enum outputs
3. **NO units** for classifications (e.g., "recomposition_quadrant")
4. **Conservative**: Only assign a unit if it is certain from the key or description

### Examples

✅ **Correct:**
- `weight_aktuell` → `kg`
- `goal_progress_score` → `None` (dimensionless 0-100)
- `correlation_energy_weight_lag` → `None` (dimensionless)
- `activity_summary` → `None` (text/JSON)

❌ **Incorrect:**
- `goal_progress_score` → `%` (wrong - it is a dimensionless 0-100 value)
- `waist_hip_ratio` → any unit (wrong - dimensionless ratio)

---
## 3. Time Window Detection

### Decision Logic (Priority Order)
```python
def detect_time_window(key, description='', semantic_contract='', resolver_name=''):
    """Returns (window, certain, note, legacy_mismatch)."""
    # 1. Explicit suffix (highest confidence)
    if '_7d' in key:  return (DAYS_7, True, None, False)
    if '_28d' in key: return (DAYS_28, True, None, False)
    if '_30d' in key: return (DAYS_30, True, None, False)
    if '_90d' in key: return (DAYS_90, True, None, False)

    # 2. Latest/current keywords
    if any(x in key for x in ['aktuell', 'latest', 'current']):
        return (LATEST, True, None, False)

    # 3. Semantic contract (high confidence); the contract reflects the implementation
    if '7 tag' in semantic_contract or '7d' in semantic_contract:
        # Description mismatch -> mark as legacy inconsistency
        if '30' in description or '28' in description:
            return (DAYS_7, True, "Description disagrees with contract (7d)", True)
        return (DAYS_7, True, None, False)

    if '28 tag' in semantic_contract or '28d' in semantic_contract:
        if '7' in description and '28' not in description:
            return (DAYS_28, True, "Description says 7d but implementation is 28d", True)
        return (DAYS_28, True, None, False)

    # 4. Description patterns (medium confidence)
    if 'letzte 7' in description or '7 tag' in description:
        return (DAYS_7, False, None, False)

    # 5. Heuristics (low confidence)
    if 'avg' in key or 'durchschn' in key:
        return (DAYS_30, False, "Assumed 30d for average", False)

    if 'trend' in key:
        return (DAYS_28, False, "Assumed 28d for trend", False)

    # 6. Unknown
    return (UNKNOWN, False, "Could not determine", False)
```
### Legacy Mismatch Detection

If the description says "7d" but the semantic contract (implementation) says "28d":

- Set `time_window = DAYS_28` (actual implementation)
- Set `legacy_contract_mismatch = True`
- Add to `known_issues`: "Description says 7d but implementation is 28d"

### Rules

1. **Actual implementation** takes precedence over the legacy description
2. **Suffix in key** is the most reliable indicator
3. **Semantic contract** (if documented) reflects the actual implementation
4. **Unknown** if the window cannot be determined with confidence
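The mismatch handling above can be sketched as a small self-contained helper. `TimeWindowResult` and `annotate_mismatch` are illustrative names for this sketch, not the real schema (which lives in `backend/placeholder_metadata.py`):

```python
from dataclasses import dataclass, field

@dataclass
class TimeWindowResult:
    """Illustrative container for the outcome of time window detection."""
    time_window: str                       # e.g. "28d" -- the actual implementation
    certain: bool                          # confidence flag
    legacy_contract_mismatch: bool = False
    known_issues: list = field(default_factory=list)

def annotate_mismatch(description_window: str, contract_window: str) -> TimeWindowResult:
    # The implementation (semantic contract) wins over the legacy description
    result = TimeWindowResult(time_window=contract_window, certain=True)
    if description_window != contract_window:
        result.legacy_contract_mismatch = True
        result.known_issues.append(
            f"Description says {description_window} but implementation is {contract_window}")
    return result
```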
---
## 4. Value Raw Extraction

### Decision Logic
```python
import json
import re

def extract_value_raw(value_display, output_type, placeholder_type=None):
    """Returns (value_raw, success)."""
    # No value ('nicht verfügbar' = "not available")
    if value_display in ['nicht verfügbar', '', None]:
        return None, True

    # JSON output
    if output_type == JSON:
        try:
            return json.loads(value_display), True
        except (json.JSONDecodeError, TypeError):
            # Try to find an embedded JSON object/array in the string
            match = re.search(r'(\{.*\}|\[.*\])', value_display, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1)), True
                except json.JSONDecodeError:
                    pass
            return None, False  # Failed

    # Markdown
    if output_type == MARKDOWN:
        return value_display, True  # Keep as string

    # Number
    if output_type in [NUMBER, INTEGER]:
        match = re.search(r'([-+]?\d+\.?\d*)', value_display)
        if match:
            val = float(match.group(1))
            return (int(val) if output_type == INTEGER else val), True
        return None, False

    # Date
    if output_type == DATE:
        if re.match(r'\d{4}-\d{2}-\d{2}', value_display):
            return value_display, True  # ISO format
        return value_display, False  # Unknown format

    # String/Enum
    return value_display, True
```
### Rules

1. **JSON outputs**: Must be valid JSON objects/arrays, not strings
2. **Numeric outputs**: Extract the number without its unit
3. **Markdown/String**: Keep as-is
4. **Dates**: Prefer ISO format (YYYY-MM-DD)
5. **Failure**: Set `value_raw = None` and record the field in `unresolved_fields`

### Examples

✅ **Correct:**
- `active_goals_json` (JSON) → `{"goals": [...]}` (object)
- `weight_aktuell` (NUMBER) → `85.8` (number, no unit)
- `datum_heute` (DATE) → `"2026-03-29"` (ISO string)

❌ **Incorrect:**
- `active_goals_json` (JSON) → `"[Fehler: ...]"` (string, not JSON)
- `weight_aktuell` (NUMBER) → `"85.8"` (string, not number)
- `weight_aktuell` (NUMBER) → `85` (number truncated when extracting from "85.8 kg")

---
## 5. Source Provenance

### Decision Logic
```python
def resolve_source(resolver_name):
    """Returns (function, data_layer_module, source_tables, kind)."""
    # Skip safe wrappers -- they are not real sources.
    # Callers must mark the field as unresolved when kind == 'wrapper'.
    if resolver_name in ['_safe_int', '_safe_float', '_safe_json', '_safe_str']:
        return None, None, [], 'wrapper'

    # Known mappings
    if resolver_name in SOURCE_MAP:
        function, data_layer_module, tables, kind = SOURCE_MAP[resolver_name]
        return function, data_layer_module, tables, kind

    # Goals formatting
    if resolver_name.startswith('_format_goals'):
        return None, None, ['goals'], 'interpreted'

    # Unknown -- callers must mark the field as unresolved
    return None, None, [], 'unknown'
```
### Source Kinds

- **direct**: Direct database read (e.g., `get_latest_weight`)
- **computed**: Calculated from data (e.g., `calculate_bmi`)
- **aggregated**: Aggregation over time/records (e.g., `get_nutrition_avg`)
- **derived**: Derived from other metrics (e.g., `protein_g_per_kg`)
- **interpreted**: AI/prompt stage output
- **wrapper**: Safe wrapper (not a real source)

### Rules

1. **Safe wrappers** (`_safe_*`) are NOT valid source functions
2. Every placeholder must trace to a **real data layer function** or **database table**
3. Mark the field as `unresolved` if it cannot be traced to a real source
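`SOURCE_MAP` itself is not reproduced in this document. As a sketch, each entry maps a resolver name to its provenance tuple; the entries below are illustrative examples, not the real table:

```python
# Illustrative SOURCE_MAP entries:
#   resolver -> (function, data_layer_module, source_tables, kind)
SOURCE_MAP = {
    'get_latest_weight': ('get_latest_weight_data', 'body_metrics',
                          ['weight_log'], 'direct'),
    'get_nutrition_avg': ('get_nutrition_averages', 'nutrition',
                          ['nutrition_log'], 'aggregated'),
}

def lookup_source(resolver_name):
    # Fall back to an explicit 'unknown' entry so callers can mark the field unresolved
    return SOURCE_MAP.get(resolver_name, (None, None, [], 'unknown'))
```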
---
## 6. Used By Tracking

### Decision Logic

```python
def track_usage(placeholder_key, ai_prompts_table):
    used_by = UsedBy(prompts=[], pipelines=[], charts=[])

    for prompt in ai_prompts_table:
        # Check template
        if placeholder_key in prompt.template:
            if prompt.type == 'pipeline':
                used_by.pipelines.append(prompt.name)
            else:
                used_by.prompts.append(prompt.name)

        # Check stages
        for stage in prompt.stages:
            for stage_prompt in stage.prompts:
                if placeholder_key in stage_prompt.template:
                    used_by.pipelines.append(prompt.name)

    # Check charts (future)
    # if placeholder_key in chart_endpoints:
    #     used_by.charts.append(chart_name)

    return used_by
```
### Orphaned Detection

If `used_by.prompts`, `used_by.pipelines`, and `used_by.charts` are all empty:

- Set `orphaned_placeholder = True`
- Consider the placeholder for deprecation
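A minimal sketch of this check, assuming `UsedBy` carries the three lists shown above:

```python
from dataclasses import dataclass, field

@dataclass
class UsedBy:
    prompts: list = field(default_factory=list)
    pipelines: list = field(default_factory=list)
    charts: list = field(default_factory=list)

def is_orphaned(used_by: UsedBy) -> bool:
    # A placeholder with no consumers anywhere is a deprecation candidate
    return not (used_by.prompts or used_by.pipelines or used_by.charts)
```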
---
## 7. Quality Filter Policy (Activity Placeholders)

### Decision Logic

```python
def create_quality_policy(key):
    # Activity-related placeholders need quality policies
    if any(x in key for x in ['activity', 'training', 'load', 'volume', 'ability']):
        return QualityFilterPolicy(
            enabled=True,
            default_filter_level="quality",    # quality | acceptable | all
            null_quality_handling="exclude",   # exclude | include_as_uncategorized
            includes_poor=False,
            includes_excluded=False,
            notes="Filters for quality='quality' by default. NULL quality excluded."
        )
    return None
```
### Rules

1. **Activity metrics** require quality filter policies
2. **Default filter**: `quality='quality'` (only records rated `quality`)
3. **NULL handling**: Excluded by default
4. **Poor quality**: Not included unless explicitly requested
5. **Excluded records**: Never included
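Applied to a set of activity records, the default policy behaves roughly as follows. `apply_quality_policy` and the record shape are illustrative sketches, not the production filter:

```python
from types import SimpleNamespace

def apply_quality_policy(activities, policy):
    """Illustrative filter: keep only records matching the policy's quality level."""
    if policy is None or not policy.enabled:
        return list(activities)
    kept = []
    for a in activities:
        q = a.get('quality')
        if q is None:
            continue  # NULL quality is excluded by default
        if q == 'excluded' and not policy.includes_excluded:
            continue
        if q == 'poor' and not policy.includes_poor:
            continue
        if policy.default_filter_level == 'quality' and q != 'quality':
            continue
        kept.append(a)
    return kept

# A default policy with the fields shown in the schema above
default_policy = SimpleNamespace(
    enabled=True, default_filter_level='quality',
    null_quality_handling='exclude', includes_poor=False, includes_excluded=False)
```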
---
## 8. Confidence Logic

### Decision Logic

```python
def create_confidence_logic(key, data_layer_module):
    # Data layer functions have confidence
    if data_layer_module:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data availability and thresholds",
            thresholds={"min_data_points": 1},
            notes=f"Determined by {data_layer_module}"
        )

    # Scores
    if 'score' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data completeness for components",
            notes="Correlates with input data availability"
        )

    # Correlations
    if 'correlation' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Pearson correlation with significance",
            thresholds={"min_data_points": 7}
        )

    return None
```
### Rules

1. **Data layer placeholders**: Have confidence logic
2. **Scores**: Confidence correlates with data availability
3. **Correlations**: Require a minimum number of data points
4. **Simple lookups**: May not need confidence logic

---
## 9. Metadata Completeness Score

### Calculation
```python
def calculate_completeness(m):
    """Score metadata completeness from 0 to 100 points."""
    score = 0

    # Required fields (30 points)
    if m.category != 'Unknown': score += 5
    if m.description and 'No description' not in m.description: score += 5
    if m.semantic_contract: score += 10
    if m.source.resolver != 'unknown': score += 10

    # Type specification (20 points)
    if m.type != 'legacy_unknown': score += 10
    if m.time_window != 'unknown': score += 10

    # Output specification (20 points)
    if m.output_type != 'unknown': score += 10
    if m.format_hint: score += 10

    # Source provenance (20 points)
    if m.source.data_layer_module: score += 10
    if m.source.source_tables: score += 10

    # Quality policies (10 points)
    if m.quality_filter_policy: score += 5
    if m.confidence_logic: score += 5

    return min(score, 100)
```
### Schema Status

Based on the completeness score (points, not percent):

- **90-100** + no unresolved fields → `validated`
- **50-89** → `draft`
- **0-49** → `incomplete`
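A minimal sketch of the mapping. The rules above do not say what happens when the score exceeds 90 but unresolved fields remain; this sketch assumes such metadata falls back to `draft`:

```python
def derive_schema_status(completeness_score, unresolved_fields):
    """Map a 0-100 completeness score plus unresolved fields to a schema status."""
    if completeness_score >= 90 and not unresolved_fields:
        return 'validated'
    if completeness_score >= 50:
        return 'draft'  # also the fallback for high scores with unresolved fields
    return 'incomplete'
```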
---
## 10. Validation Tests

### Required Tests
```python
def test_value_raw_extraction():
    # Each extraction returns (value_raw, success)
    assert extract_value_raw('{"key": "val"}', JSON) == ({"key": "val"}, True)
    assert extract_value_raw('85.8 kg', NUMBER) == (85.8, True)
    assert extract_value_raw('2026-03-29', DATE) == ('2026-03-29', True)

def test_unit_inference():
    # No units for scores
    assert infer_unit('goal_progress_score', '', NUMBER, ATOMIC) is None

    # Correct units for measurements
    assert infer_unit('weight_aktuell', '', NUMBER, ATOMIC) == 'kg'

    # No units for JSON
    assert infer_unit('active_goals_json', '', JSON, RAW_DATA) is None

def test_time_window_detection():
    # Explicit suffix
    assert detect_time_window('weight_7d_median')[0] == DAYS_7

    # Latest
    assert detect_time_window('weight_aktuell')[0] == LATEST

    # Legacy mismatch detection
    tw, certain, note, mismatch = detect_time_window(
        'weight_trend', description='7d', semantic_contract='28d')
    assert tw == DAYS_28
    assert mismatch is True

def test_source_provenance():
    # Skip wrappers
    assert resolve_source('_safe_int') == (None, None, [], 'wrapper')

    # Real sources
    func, module, tables, kind = resolve_source('get_latest_weight')
    assert func == 'get_latest_weight_data'
    assert module == 'body_metrics'
    assert 'weight_log' in tables

def test_quality_filter_for_activity():
    # Activity placeholders need a quality filter policy
    policy = create_quality_policy('activity_summary')
    assert policy is not None
    assert policy.default_filter_level == "quality"

    # Non-activity placeholders don't
    policy = create_quality_policy('weight_aktuell')
    assert policy is None
```
## 11. Continuous Validation

### Pre-Commit Checks
```bash
# Run validation before commit
python backend/generate_complete_metadata_v2.py || exit 1

# Fail the commit if the QA report shows a high failure rate
python backend/tests/test_placeholder_metadata_v2.py || exit 1
```
### CI/CD Integration

```yaml
- name: Validate Placeholder Metadata
  run: |
    python backend/generate_complete_metadata_v2.py
    python backend/tests/test_placeholder_metadata_v2.py
```
---

## Summary

This validation logic ensures that metadata derivation is:

1. **Reproducible**: Same input → same output
2. **Testable**: All logic has unit tests
3. **Auditable**: Clear decision paths
4. **Conservative**: Prefer `unknown` over wrong guesses
5. **Normative**: Actual implementation takes precedence over legacy description