Placeholder Metadata Validation Logic
Version: 2.0.0 Generated: 2026-03-29 Status: Normative
Purpose
This document defines the deterministic derivation logic for all placeholder metadata fields. It ensures that metadata extraction is reproducible, testable, and auditable.
1. Type Classification (PlaceholderType)
Decision Logic
def determine_type(key, description, output_type, value_display):
    # JSON/Markdown outputs are typically raw_data
    if output_type in [JSON, MARKDOWN]:
        return RAW_DATA

    # Scores and percentages are atomic
    if any(x in key for x in ['score', 'pct', 'adequacy']):
        return ATOMIC

    # Summaries and details are raw_data
    if any(x in key for x in ['summary', 'detail', 'verteilung']):
        return RAW_DATA

    # Goals and focus areas (if derived from prompts)
    if any(x in key for x in ['goal', 'focus', 'top_']):
        # Check if from AI/prompt stage
        if is_from_prompt_stage(key):
            return INTERPRETED
        else:
            return ATOMIC  # Just database values

    # Correlations are interpreted
    if 'correlation' in key or 'plateau' in key or 'driver' in key:
        return INTERPRETED

    # Default: atomic
    return ATOMIC
Rules
- ATOMIC: Single values (numbers, strings, dates) from database or simple computation
- RAW_DATA: Structured data (JSON, arrays, markdown) representing multiple values
- INTERPRETED: Values derived from AI/Prompt stages or complex interpretation
- LEGACY_UNKNOWN: Only for existing unclear placeholders (never for new ones)
Validation
- `interpreted` requires evidence of prompt/stage origin
- Calculated scores/aggregations are NOT automatically `interpreted`
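The decision logic above can be sketched as a small runnable function. Enum members are plain strings here, and prompt-stage origin is passed in as a flag rather than looked up (an assumption made to keep the example self-contained):

```python
# Minimal sketch of the type-classification rules; the real code uses the
# PlaceholderType enum and an is_from_prompt_stage() lookup.

def determine_type(key: str, output_type: str, from_prompt_stage: bool = False) -> str:
    key = key.lower()
    if output_type in ("json", "markdown"):       # structured outputs
        return "raw_data"
    if any(x in key for x in ("score", "pct", "adequacy")):
        return "atomic"                           # scores/percentages
    if any(x in key for x in ("summary", "detail", "verteilung")):
        return "raw_data"
    if any(x in key for x in ("goal", "focus", "top_")):
        return "interpreted" if from_prompt_stage else "atomic"
    if any(x in key for x in ("correlation", "plateau", "driver")):
        return "interpreted"
    return "atomic"                               # conservative default
```

Note that rule order matters: `goal_progress_score` hits the score rule before the goal rule and is therefore classified `atomic`.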
2. Unit Inference
Decision Logic
def infer_unit(key, description, output_type, type):
    # NO units for:
    if output_type in [JSON, MARKDOWN, ENUM]:
        return None
    if any(x in key for x in ['score', 'correlation', 'adequacy']):
        return None  # Dimensionless
    if any(x in key for x in ['pct', 'ratio', 'balance']):
        return None  # Dimensionless percentage/ratio

    # Weight/mass
    if any(x in key for x in ['weight', 'gewicht', 'fm_', 'lbm_']):
        return 'kg'

    # Circumferences
    if 'umfang' in key or any(x in key for x in ['waist', 'hip', 'chest']):
        return 'cm'

    # Time
    if 'duration' in key or 'dauer' in key or 'debt' in key:
        if 'hours' in description or 'stunden' in description:
            return 'Stunden'
        elif 'minutes' in description:
            return 'Minuten'
        return None  # Unclear

    # Heart rate
    if 'rhr' in key or ('hr' in key and 'hrv' not in key):
        return 'bpm'

    # HRV
    if 'hrv' in key:
        return 'ms'

    # VO2 Max
    if 'vo2' in key:
        return 'ml/kg/min'

    # Calories
    if 'kcal' in key or 'energy' in key:
        return 'kcal'

    # Macros
    if any(x in key for x in ['protein', 'carb', 'fat']) and 'g' in description:
        return 'g'

    # Default: None (conservative)
    return None
Rules
- NO units for dimensionless values (scores, correlations, percentages, ratios)
- NO units for JSON/Markdown/Enum outputs
- NO units for classifications (e.g., "recomposition_quadrant")
- Conservative: Only assign unit if certain from key or description
Examples
✅ Correct:
- `weight_aktuell` → `kg`
- `goal_progress_score` → `None` (dimensionless 0-100)
- `correlation_energy_weight_lag` → `None` (dimensionless)
- `activity_summary` → `None` (text/JSON)

❌ Incorrect:
- `goal_progress_score` → `%` (wrong: it is 0-100 dimensionless)
- `waist_hip_ratio` → any unit (wrong: dimensionless ratio)
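Because the rules are first-match-wins, they can be condensed into a data-driven table. A simplified runnable sketch (the `hr`-vs-`hrv` disambiguation and the description-based time units are omitted here for brevity):

```python
from typing import Optional

# Dimensionless keys and JSON/Markdown/Enum outputs get no unit; otherwise
# the first matching substring rule wins. Simplified sketch of the logic above.

UNIT_RULES = [
    (("score", "correlation", "adequacy", "pct", "ratio", "balance"), None),
    (("weight", "gewicht", "fm_", "lbm_"), "kg"),
    (("umfang", "waist", "hip", "chest"), "cm"),
    (("rhr",), "bpm"),
    (("hrv",), "ms"),
    (("vo2",), "ml/kg/min"),
    (("kcal", "energy"), "kcal"),
]

def infer_unit(key: str, output_type: str) -> Optional[str]:
    if output_type in ("json", "markdown", "enum"):
        return None
    key = key.lower()
    for needles, unit in UNIT_RULES:
        if any(n in key for n in needles):
            return unit
    return None  # conservative default: unknown means no unit
```

Putting the dimensionless rule first guarantees that `waist_hip_ratio` never picks up `cm` from the circumference rule below it.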
3. Time Window Detection
Decision Logic (Priority Order)
def detect_time_window(key, description, semantic_contract, resolver_name):
    # 1. Explicit suffix (highest confidence)
    if '_7d' in key: return DAYS_7, certain=True
    if '_28d' in key: return DAYS_28, certain=True
    if '_30d' in key: return DAYS_30, certain=True
    if '_90d' in key: return DAYS_90, certain=True

    # 2. Latest/current keywords
    if any(x in key for x in ['aktuell', 'latest', 'current']):
        return LATEST, certain=True

    # 3. Semantic contract (high confidence)
    if '7 tag' in semantic_contract or '7d' in semantic_contract:
        # Check for description mismatch
        if '30' in description or '28' in description:
            mark_legacy_mismatch = True
        return DAYS_7, certain=True, mismatch_note

    # 4. Description patterns (medium confidence)
    if 'letzte 7' in description or '7 tag' in description:
        return DAYS_7, certain=False

    # 5. Heuristics (low confidence)
    if 'avg' in key or 'durchschn' in key:
        return DAYS_30, certain=False, "Assumed 30d for average"
    if 'trend' in key:
        return DAYS_28, certain=False, "Assumed 28d for trend"

    # 6. Unknown
    return UNKNOWN, certain=False, "Could not determine"
Legacy Mismatch Detection
If description says "7d" but semantic contract (implementation) says "28d":
- Set `time_window = DAYS_28` (actual implementation)
- Set `legacy_contract_mismatch = True`
- Add to `known_issues`: "Description says 7d but implementation is 28d"
Rules
- Actual implementation takes precedence over legacy description
- Suffix in key is most reliable indicator
- Semantic contract (if documented) reflects actual implementation
- Unknown if cannot be determined with confidence
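A runnable sketch of the priority order (step 3, the semantic-contract check, is omitted for brevity; window names are plain strings here rather than enum members):

```python
import re

# Generalizes the suffix rules with one regex: _7d, _28d, _30d, _90d, etc.
DAY_SUFFIX = re.compile(r"_(\d+)d(?:_|$)")

def detect_time_window(key: str, description: str = ""):
    """Return (window, certain, note) following the priority order above."""
    key, desc = key.lower(), description.lower()
    m = DAY_SUFFIX.search(key)
    if m:                                          # 1. explicit suffix
        return f"days_{m.group(1)}", True, ""
    if any(x in key for x in ("aktuell", "latest", "current")):
        return "latest", True, ""                  # 2. latest/current keywords
    if "letzte 7" in desc or "7 tag" in desc:      # 4. description pattern
        return "days_7", False, "from description"
    if "avg" in key or "durchschn" in key:         # 5. heuristics
        return "days_30", False, "assumed 30d for average"
    if "trend" in key:
        return "days_28", False, "assumed 28d for trend"
    return "unknown", False, "could not determine" # 6. unknown
```

The `certain` flag is what downstream QA uses to distinguish reliable windows (suffix, keyword) from heuristic guesses.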
4. Value Raw Extraction
Decision Logic
def extract_value_raw(value_display, output_type, type):
    # No value
    if value_display in ['nicht verfügbar', '', None]:
        return None, success=True

    # JSON output
    if output_type == JSON:
        try:
            return json.loads(value_display), success=True
        except:
            # Try to find JSON in string
            match = re.search(r'(\{.*\}|\[.*\])', value_display, DOTALL)
            if match:
                try:
                    return json.loads(match.group(1)), success=True
                except:
                    pass
            return None, success=False  # Failed

    # Markdown
    if output_type == MARKDOWN:
        return value_display, success=True  # Keep as string

    # Number
    if output_type in [NUMBER, INTEGER]:
        match = re.search(r'([-+]?\d+\.?\d*)', value_display)
        if match:
            val = float(match.group(1))
            return int(val) if output_type == INTEGER else val, success=True
        return None, success=False

    # Date
    if output_type == DATE:
        if re.match(r'\d{4}-\d{2}-\d{2}', value_display):
            return value_display, success=True  # ISO format
        return value_display, success=False  # Unknown format

    # String/Enum
    return value_display, success=True
Rules
- JSON outputs: Must be valid JSON objects/arrays, not strings
- Numeric outputs: Extract number without unit
- Markdown/String: Keep as-is
- Dates: Prefer ISO format (YYYY-MM-DD)
- Failure: Set `value_raw = None` and mark the field in `unresolved_fields`
Examples
✅ Correct:
- `active_goals_json` (JSON) → `{"goals": [...]}` (object)
- `weight_aktuell` (NUMBER) → `85.8` (number, no unit)
- `datum_heute` (DATE) → `"2026-03-29"` (ISO string)

❌ Incorrect:
- `active_goals_json` (JSON) → `"[Fehler: ...]"` (string, not JSON)
- `weight_aktuell` (NUMBER) → `"85.8"` (string, not number)
- `weight_aktuell` (NUMBER) → `85` (number extracted incorrectly from "85.8 kg")
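The extraction pseudocode translates almost directly into runnable Python; a sketch with string output-type names (the real resolver may use enums and richer error reporting):

```python
import json
import re

def extract_value_raw(value_display, output_type: str):
    """Return (value_raw, success) for a rendered placeholder value."""
    if value_display in ("nicht verfügbar", "", None):  # "not available"
        return None, True
    if output_type == "json":
        try:
            return json.loads(value_display), True
        except (json.JSONDecodeError, TypeError):
            # Fall back to the first JSON-looking object/array in the string
            m = re.search(r"(\{.*\}|\[.*\])", value_display, re.DOTALL)
            if m:
                try:
                    return json.loads(m.group(1)), True
                except json.JSONDecodeError:
                    pass
            return None, False
    if output_type == "markdown":
        return value_display, True  # keep as string
    if output_type in ("number", "integer"):
        m = re.search(r"([-+]?\d+\.?\d*)", value_display)
        if m:
            val = float(m.group(1))
            return (int(val) if output_type == "integer" else val), True
        return None, False
    if output_type == "date":
        if re.match(r"\d{4}-\d{2}-\d{2}", value_display):
            return value_display, True  # ISO format
        return value_display, False     # unknown format
    return value_display, True          # string / enum: keep as-is
```

A failed extraction reports `(None, False)` rather than guessing, matching the conservative policy above.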
5. Source Provenance
Decision Logic
def resolve_source(resolver_name):
    # Skip safe wrappers - not real sources
    if resolver_name in ['_safe_int', '_safe_float', '_safe_json', '_safe_str']:
        return wrapper=True, mark_unresolved

    # Known mappings
    if resolver_name in SOURCE_MAP:
        function, data_layer_module, tables, kind = SOURCE_MAP[resolver_name]
        return function, data_layer_module, tables, kind

    # Goals formatting
    if resolver_name.startswith('_format_goals'):
        return None, None, ['goals'], kind=INTERPRETED

    # Unknown
    return None, None, [], kind=UNKNOWN, mark_unresolved
Source Kinds
- `direct`: Direct database read (e.g., `get_latest_weight`)
- `computed`: Calculated from data (e.g., `calculate_bmi`)
- `aggregated`: Aggregation over time/records (e.g., `get_nutrition_avg`)
- `derived`: Derived from other metrics (e.g., `protein_g_per_kg`)
- `interpreted`: AI/prompt stage output
- `wrapper`: Safe wrapper (not a real source)
Rules
- Safe wrappers (`_safe_*`) are NOT valid source functions
- Must trace to a real data layer function or database table
- Mark as `unresolved` if the resolver cannot be traced to a real source
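A runnable sketch of the wrapper-skipping lookup. The single `SOURCE_MAP` entry mirrors the `get_latest_weight` example used in the validation tests below; entries in the real map will differ:

```python
# Wrappers are never sources; known resolvers map to real provenance;
# everything else is conservatively marked unknown.

SAFE_WRAPPERS = {"_safe_int", "_safe_float", "_safe_json", "_safe_str"}

SOURCE_MAP = {
    "get_latest_weight": ("get_latest_weight_data", "body_metrics",
                          ["weight_log"], "direct"),
}

def resolve_source(resolver_name: str):
    """Return (function, data_layer_module, source_tables, kind)."""
    if resolver_name in SAFE_WRAPPERS:
        return None, None, [], "wrapper"   # not a real source
    if resolver_name in SOURCE_MAP:
        return SOURCE_MAP[resolver_name]
    return None, None, [], "unknown"       # caller marks as unresolved
```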
6. Used By Tracking
Decision Logic
def track_usage(placeholder_key, ai_prompts_table):
    used_by = UsedBy(prompts=[], pipelines=[], charts=[])

    for prompt in ai_prompts_table:
        # Check template
        if placeholder_key in prompt.template:
            if prompt.type == 'pipeline':
                used_by.pipelines.append(prompt.name)
            else:
                used_by.prompts.append(prompt.name)

        # Check stages
        for stage in prompt.stages:
            for stage_prompt in stage.prompts:
                if placeholder_key in stage_prompt.template:
                    used_by.pipelines.append(prompt.name)

    # Check charts (future)
    # if placeholder_key in chart_endpoints:
    #     used_by.charts.append(chart_name)

    return used_by
Orphaned Detection
If `used_by.prompts`, `used_by.pipelines`, and `used_by.charts` are all empty:
- Set `orphaned_placeholder = True`
- Consider the placeholder for deprecation
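The orphan rule is trivial once `used_by` is a structured record; a minimal sketch of the `UsedBy` shape assumed by the pseudocode above:

```python
from dataclasses import dataclass, field

@dataclass
class UsedBy:
    prompts: list = field(default_factory=list)
    pipelines: list = field(default_factory=list)
    charts: list = field(default_factory=list)

    @property
    def orphaned(self) -> bool:
        """True when no prompt, pipeline, or chart references the placeholder."""
        return not (self.prompts or self.pipelines or self.charts)
```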
7. Quality Filter Policy (Activity Placeholders)
Decision Logic
def create_quality_policy(key):
    # Activity-related placeholders need quality policies
    if any(x in key for x in ['activity', 'training', 'load', 'volume', 'ability']):
        return QualityFilterPolicy(
            enabled=True,
            default_filter_level="quality",   # quality | acceptable | all
            null_quality_handling="exclude",  # exclude | include_as_uncategorized
            includes_poor=False,
            includes_excluded=False,
            notes="Filters for quality='quality' by default. NULL quality excluded."
        )
    return None
Rules
- Activity metrics require quality filter policies
- Default filter: `quality='quality'`
- NULL handling: excluded by default
- Poor quality: not included unless explicitly requested
- Excluded activities: not included
8. Confidence Logic
Decision Logic
def create_confidence_logic(key, data_layer_module):
    # Data layer functions have confidence
    if data_layer_module:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data availability and thresholds",
            thresholds={"min_data_points": 1},
            notes=f"Determined by {data_layer_module}"
        )

    # Scores
    if 'score' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data completeness for components",
            notes="Correlates with input data availability"
        )

    # Correlations
    if 'correlation' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Pearson correlation with significance",
            thresholds={"min_data_points": 7}
        )

    return None
Rules
- Data layer placeholders: Have confidence logic
- Scores: Confidence correlates with data availability
- Correlations: Require minimum data points
- Simple lookups: May not need confidence logic
9. Metadata Completeness Score
Calculation
def calculate_completeness(metadata):
    score = 0

    # Required fields (30 points)
    if category != 'Unknown': score += 5
    if description and 'No description' not in description: score += 5
    if semantic_contract: score += 10
    if source.resolver != 'unknown': score += 10

    # Type specification (20 points)
    if type != 'legacy_unknown': score += 10
    if time_window != 'unknown': score += 10

    # Output specification (20 points)
    if output_type != 'unknown': score += 10
    if format_hint: score += 10

    # Source provenance (20 points)
    if source.data_layer_module: score += 10
    if source.source_tables: score += 10

    # Quality policies (10 points)
    if quality_filter_policy: score += 5
    if confidence_logic: score += 5

    return min(score, 100)
Schema Status
Based on completeness score:
- 90-100% with no unresolved fields → `validated`
- 50-89% → `draft`
- 0-49% → `incomplete`
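The point budget above (30 required + 20 type + 20 output + 20 provenance + 10 quality = 100) can be expressed as a table keyed by field, which makes it easy to audit and unit-test. A sketch over plain dicts; the real code operates on schema objects, and the description check is simplified here:

```python
# Weight per metadata field; a field scores only when present and not an
# "unknown" sentinel value.

WEIGHTS = {
    "category": 5, "description": 5, "semantic_contract": 10, "resolver": 10,
    "type": 10, "time_window": 10,
    "output_type": 10, "format_hint": 10,
    "data_layer_module": 10, "source_tables": 10,
    "quality_filter_policy": 5, "confidence_logic": 5,
}

UNKNOWN_VALUES = {"category": "Unknown", "resolver": "unknown",
                  "type": "legacy_unknown", "time_window": "unknown",
                  "output_type": "unknown"}

def calculate_completeness(meta: dict) -> int:
    score = sum(pts for fld, pts in WEIGHTS.items()
                if meta.get(fld) and meta.get(fld) != UNKNOWN_VALUES.get(fld))
    return min(score, 100)

def schema_status(score: int, unresolved: list) -> str:
    if score >= 90 and not unresolved:
        return "validated"
    return "draft" if score >= 50 else "incomplete"
```

Note that any unresolved field blocks `validated` even at a score of 100, matching the schema-status rule above.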
10. Validation Tests
Required Tests
def test_value_raw_extraction():
    # Test each output_type
    assert extract_value_raw('{"key": "val"}', JSON) == {"key": "val"}
    assert extract_value_raw('85.8 kg', NUMBER) == 85.8
    assert extract_value_raw('2026-03-29', DATE) == '2026-03-29'

def test_unit_inference():
    # No units for scores
    assert infer_unit('goal_progress_score', ..., NUMBER) == None
    # Correct units for measurements
    assert infer_unit('weight_aktuell', ..., NUMBER) == 'kg'
    # No units for JSON
    assert infer_unit('active_goals_json', ..., JSON) == None

def test_time_window_detection():
    # Explicit suffix
    assert detect_time_window('weight_7d_median', ...) == DAYS_7
    # Latest
    assert detect_time_window('weight_aktuell', ...) == LATEST
    # Legacy mismatch detection
    tw, mismatch = detect_time_window('weight_trend', desc='7d', contract='28d')
    assert tw == DAYS_28
    assert mismatch == True

def test_source_provenance():
    # Skip wrappers
    assert resolve_source('_safe_int') == (None, None, [], 'wrapper')
    # Real sources
    func, module, tables, kind = resolve_source('get_latest_weight')
    assert func == 'get_latest_weight_data'
    assert module == 'body_metrics'
    assert 'weight_log' in tables

def test_quality_filter_for_activity():
    # Activity placeholders need a quality filter policy
    policy = create_quality_policy('activity_summary')
    assert policy is not None
    assert policy.default_filter_level == "quality"
    # Non-activity placeholders don't
    policy = create_quality_policy('weight_aktuell')
    assert policy is None
11. Continuous Validation
Pre-Commit Checks
# Run validation before commit
python backend/generate_complete_metadata_v2.py

# Check for errors
if QA report shows high failure rate:
    FAIL commit
CI/CD Integration
- name: Validate Placeholder Metadata
  run: |
    python backend/generate_complete_metadata_v2.py
    python backend/tests/test_placeholder_metadata_v2.py
Summary
This validation logic ensures:
- Reproducible: Same input → same output
- Testable: All logic has unit tests
- Auditable: Clear decision paths
- Conservative: Prefer `unknown` over wrong guesses
- Normative: Actual implementation > legacy description