
Placeholder Metadata Validation Logic

Version: 2.0.0 Generated: 2026-03-29 Status: Normative


Purpose

This document defines the deterministic derivation logic for all placeholder metadata fields. It ensures that metadata extraction is reproducible, testable, and auditable.


1. Type Classification (PlaceholderType)

Decision Logic

def determine_type(key, description, output_type, value_display):
    # JSON/Markdown outputs are typically raw_data
    if output_type in [JSON, MARKDOWN]:
        return RAW_DATA

    # Scores and percentages are atomic
    if any(x in key for x in ['score', 'pct', 'adequacy']):
        return ATOMIC

    # Summaries and details are raw_data
    if any(x in key for x in ['summary', 'detail', 'verteilung']):
        return RAW_DATA

    # Goals and focus areas (if derived from prompts)
    if any(x in key for x in ['goal', 'focus', 'top_']):
        # Check if from KI/Prompt stage
        if is_from_prompt_stage(key):
            return INTERPRETED
        else:
            return ATOMIC  # Just database values

    # Correlations are interpreted
    if 'correlation' in key or 'plateau' in key or 'driver' in key:
        return INTERPRETED

    # Default: atomic
    return ATOMIC

Rules

  1. ATOMIC: Single values (numbers, strings, dates) from database or simple computation
  2. RAW_DATA: Structured data (JSON, arrays, markdown) representing multiple values
  3. INTERPRETED: Values derived from AI/Prompt stages or complex interpretation
  4. LEGACY_UNKNOWN: Only for existing unclear placeholders (never for new ones)

Validation

  • interpreted requires evidence of prompt/stage origin
  • Calculated scores/aggregations are NOT automatically interpreted
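The decision order above can be condensed into a runnable sketch. This is an illustration, not the production code: enum values are plain strings here, and `from_prompt_stage` stands in for the hypothetical `is_from_prompt_stage` lookup.

```python
def determine_type(key, output_type, from_prompt_stage=False):
    """Classify a placeholder following the priority order above (sketch)."""
    if output_type in ("json", "markdown"):
        return "raw_data"                 # structured outputs
    if any(x in key for x in ("score", "pct", "adequacy")):
        return "atomic"                   # score check fires before goal check
    if any(x in key for x in ("summary", "detail", "verteilung")):
        return "raw_data"
    if any(x in key for x in ("goal", "focus", "top_")):
        return "interpreted" if from_prompt_stage else "atomic"
    if any(x in key for x in ("correlation", "plateau", "driver")):
        return "interpreted"
    return "atomic"                       # conservative default
```

Note that `goal_progress_score` classifies as atomic even though it contains `goal`: the score branch comes first, so branch order is normative.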

2. Unit Inference

Decision Logic

def infer_unit(key, description, output_type, type):
    # NO units for:
    if output_type in [JSON, MARKDOWN, ENUM]:
        return None

    if any(x in key for x in ['score', 'correlation', 'adequacy']):
        return None  # Dimensionless

    if any(x in key for x in ['pct', 'ratio', 'balance']):
        return None  # Dimensionless percentage/ratio

    # Weight/mass
    if any(x in key for x in ['weight', 'gewicht', 'fm_', 'lbm_']):
        return 'kg'

    # Circumferences
    if 'umfang' in key or any(x in key for x in ['waist', 'hip', 'chest']):
        return 'cm'

    # Time
    if 'duration' in key or 'dauer' in key or 'debt' in key:
        if 'hours' in description or 'stunden' in description:
            return 'Stunden'
        elif 'minutes' in description:
            return 'Minuten'
        return None  # Unclear

    # Heart rate
    if 'rhr' in key or ('hr' in key and 'hrv' not in key):
        return 'bpm'

    # HRV
    if 'hrv' in key:
        return 'ms'

    # VO2 Max
    if 'vo2' in key:
        return 'ml/kg/min'

    # Calories
    if 'kcal' in key or 'energy' in key:
        return 'kcal'

    # Macros
    if any(x in key for x in ['protein', 'carb', 'fat']) and 'g' in description:
        return 'g'

    # Default: None (conservative)
    return None

Rules

  1. NO units for dimensionless values (scores, correlations, percentages, ratios)
  2. NO units for JSON/Markdown/Enum outputs
  3. NO units for classifications (e.g., "recomposition_quadrant")
  4. Conservative: Only assign unit if certain from key or description

Examples

Correct:

  • weight_aktuell → kg
  • goal_progress_score → None (dimensionless 0-100)
  • correlation_energy_weight_lag → None (dimensionless)
  • activity_summary → None (text/JSON)

Incorrect:

  • goal_progress_score → % (wrong - it's 0-100 dimensionless)
  • waist_hip_ratio → any unit (wrong - dimensionless ratio)
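Condensed into runnable form (string enums for `output_type` are an assumption), the conservative default is what keeps ratios and scores unitless:

```python
def infer_unit(key, description="", output_type="number"):
    """Conservative unit inference per the rules above (condensed sketch)."""
    if output_type in ("json", "markdown", "enum"):
        return None
    if any(x in key for x in ("score", "correlation", "adequacy",
                              "pct", "ratio", "balance")):
        return None  # dimensionless
    if any(x in key for x in ("weight", "gewicht", "fm_", "lbm_")):
        return "kg"
    if "hrv" in key:
        return "ms"  # check before the generic 'hr' match
    if "rhr" in key or "hr" in key:
        return "bpm"
    if "vo2" in key:
        return "ml/kg/min"
    return None  # conservative: no unit unless certain
```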

3. Time Window Detection

Decision Logic (Priority Order)

def detect_time_window(key, description='', semantic_contract='', resolver_name=None):
    # Returns (window, certain, note)

    # 1. Explicit suffix (highest confidence)
    if '_7d' in key: return DAYS_7, True, None
    if '_28d' in key: return DAYS_28, True, None
    if '_30d' in key: return DAYS_30, True, None
    if '_90d' in key: return DAYS_90, True, None

    # 2. Latest/current keywords
    if any(x in key for x in ['aktuell', 'latest', 'current']):
        return LATEST, True, None

    # 3. Semantic contract (high confidence) - implementation wins,
    #    flag a legacy mismatch if the description disagrees
    if '28 tag' in semantic_contract or '28d' in semantic_contract:
        note = 'legacy_contract_mismatch' if '7' in description else None
        return DAYS_28, True, note
    if '7 tag' in semantic_contract or '7d' in semantic_contract:
        note = None
        if '30' in description or '28' in description:
            note = 'legacy_contract_mismatch'
        return DAYS_7, True, note

    # 4. Description patterns (medium confidence)
    if 'letzte 7' in description or '7 tag' in description:
        return DAYS_7, False, None

    # 5. Heuristics (low confidence)
    if 'avg' in key or 'durchschn' in key:
        return DAYS_30, False, "Assumed 30d for average"
    if 'trend' in key:
        return DAYS_28, False, "Assumed 28d for trend"

    # 6. Unknown
    return UNKNOWN, False, "Could not determine"

Legacy Mismatch Detection

If description says "7d" but semantic contract (implementation) says "28d":

  • Set time_window = DAYS_28 (actual implementation)
  • Set legacy_contract_mismatch = True
  • Add to known_issues: "Description says 7d but implementation is 28d"

Rules

  1. Actual implementation takes precedence over legacy description
  2. Suffix in key is most reliable indicator
  3. Semantic contract (if documented) reflects actual implementation
  4. Unknown if cannot be determined with confidence
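The priority order can be exercised with a self-contained condensation. Windows are plain strings here (an assumption for illustration; the real code uses an enum), and the function returns `(window, certain, mismatch)`:

```python
def detect_time_window(key, description="", contract=""):
    """Priority-ordered time window detection (sketch)."""
    # 1. Explicit suffix (highest confidence)
    for suffix in ("7d", "28d", "30d", "90d"):
        if f"_{suffix}" in key:
            return suffix, True, False
    # 2. Latest/current keywords
    if any(x in key for x in ("aktuell", "latest", "current")):
        return "latest", True, False
    # 3. Semantic contract wins; flag when the description disagrees
    if "28d" in contract or "28 tag" in contract:
        return "28d", True, "7" in description
    if "7d" in contract or "7 tag" in contract:
        return "7d", True, ("30" in description or "28" in description)
    # 4./5. Heuristics (low confidence)
    if "avg" in key or "durchschn" in key:
        return "30d", False, False
    if "trend" in key:
        return "28d", False, False
    # 6. Unknown
    return "unknown", False, False
```

For example, a key like `weight_trend` with a "7d" description but a "28d" semantic contract resolves to 28d with the mismatch flag set.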

4. Value Raw Extraction

Decision Logic

def extract_value_raw(value_display, output_type):
    # Returns (value_raw, success)

    # No value
    if value_display in ['nicht verfügbar', '', None]:
        return None, True

    # JSON output
    if output_type == JSON:
        try:
            return json.loads(value_display), True
        except (json.JSONDecodeError, TypeError):
            # Try to find embedded JSON in the string
            match = re.search(r'(\{.*\}|\[.*\])', value_display, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1)), True
                except json.JSONDecodeError:
                    pass
            return None, False  # failed

    # Markdown: keep as string
    if output_type == MARKDOWN:
        return value_display, True

    # Number
    if output_type in [NUMBER, INTEGER]:
        match = re.search(r'([-+]?\d+\.?\d*)', value_display)
        if match:
            val = float(match.group(1))
            return (int(val) if output_type == INTEGER else val), True
        return None, False

    # Date
    if output_type == DATE:
        if re.match(r'\d{4}-\d{2}-\d{2}', value_display):
            return value_display, True  # ISO format
        return value_display, False  # unknown format

    # String/Enum
    return value_display, True

Rules

  1. JSON outputs: Must be valid JSON objects/arrays, not strings
  2. Numeric outputs: Extract number without unit
  3. Markdown/String: Keep as-is
  4. Dates: Prefer ISO format (YYYY-MM-DD)
  5. Failure: Set value_raw = None and mark in unresolved_fields

Examples

Correct:

  • active_goals_json (JSON) → {"goals": [...]} (object)
  • weight_aktuell (NUMBER) → 85.8 (number, no unit)
  • datum_heute (DATE) → "2026-03-29" (ISO string)

Incorrect:

  • active_goals_json (JSON) → "[Fehler: ...]" (string, not JSON)
  • weight_aktuell (NUMBER) → "85.8" (string, not number)
  • weight_aktuell (NUMBER) → 85 (extracted from "85.8 kg" incorrectly)
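The correct/incorrect examples above can be reproduced with a minimal runnable version of the extraction rules (string enums for `output_type` assumed):

```python
import json
import re

def extract_value_raw(value_display, output_type):
    """Return (value_raw, success) per the rules above (sketch)."""
    if value_display in ("nicht verfügbar", "", None):
        return None, True          # absent value is a valid outcome
    if output_type == "json":
        try:
            return json.loads(value_display), True
        except (json.JSONDecodeError, TypeError):
            return None, False     # error strings are not JSON
    if output_type in ("number", "integer"):
        match = re.search(r"([-+]?\d+\.?\d*)", value_display)
        if match:
            val = float(match.group(1))
            return (int(val) if output_type == "integer" else val), True
        return None, False
    return value_display, True     # markdown / string / enum / ISO date
```

Note that `"85.8 kg"` yields the number `85.8` with the unit stripped, while `"[Fehler: ...]"` fails JSON parsing and lands in `unresolved_fields`.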

5. Source Provenance

Decision Logic

def resolve_source(resolver_name):
    # Returns (function, data_layer_module, tables, kind)

    # Safe wrappers are not real sources - mark as unresolved
    if resolver_name in ['_safe_int', '_safe_float', '_safe_json', '_safe_str']:
        return None, None, [], 'wrapper'

    # Known mappings
    if resolver_name in SOURCE_MAP:
        return SOURCE_MAP[resolver_name]  # (function, module, tables, kind)

    # Goals formatting
    if resolver_name.startswith('_format_goals'):
        return None, None, ['goals'], 'interpreted'

    # Unknown resolver - mark as unresolved
    return None, None, [], 'unknown'

Source Kinds

  • direct: Direct database read (e.g., get_latest_weight)
  • computed: Calculated from data (e.g., calculate_bmi)
  • aggregated: Aggregation over time/records (e.g., get_nutrition_avg)
  • derived: Derived from other metrics (e.g., protein_g_per_kg)
  • interpreted: AI/prompt stage output
  • wrapper: Safe wrapper (not a real source)

Rules

  1. Safe wrappers (_safe_*) are NOT valid source functions
  2. Must trace to real data layer function or database table
  3. Mark as unresolved if cannot trace to real source
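A self-contained sketch of the lookup; the single `SOURCE_MAP` entry shown mirrors the expectations in the test section of this document, while the full map lives in the backend:

```python
# Hypothetical excerpt: resolver -> (function, data_layer_module, tables, kind)
SOURCE_MAP = {
    "get_latest_weight": ("get_latest_weight_data", "body_metrics",
                          ["weight_log"], "direct"),
}

SAFE_WRAPPERS = {"_safe_int", "_safe_float", "_safe_json", "_safe_str"}

def resolve_source(resolver_name):
    """Trace a resolver to a real source; wrappers never qualify (sketch)."""
    if resolver_name in SAFE_WRAPPERS:
        return None, None, [], "wrapper"      # mark as unresolved
    if resolver_name in SOURCE_MAP:
        return SOURCE_MAP[resolver_name]
    if resolver_name.startswith("_format_goals"):
        return None, None, ["goals"], "interpreted"
    return None, None, [], "unknown"          # mark as unresolved
```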

6. Used By Tracking

Decision Logic

def track_usage(placeholder_key, ai_prompts_table):
    used_by = UsedBy(prompts=[], pipelines=[], charts=[])

    for prompt in ai_prompts_table:
        # Check template
        if placeholder_key in prompt.template:
            if prompt.type == 'pipeline':
                used_by.pipelines.append(prompt.name)
            else:
                used_by.prompts.append(prompt.name)

        # Check stages
        for stage in prompt.stages:
            for stage_prompt in stage.prompts:
                if placeholder_key in stage_prompt.template:
                    used_by.pipelines.append(prompt.name)

    # Check charts (future)
    # if placeholder_key in chart_endpoints:
    #     used_by.charts.append(chart_name)

    return used_by

Orphaned Detection

If used_by.prompts, used_by.pipelines, and used_by.charts are all empty:

  • Set orphaned_placeholder = True
  • Consider for deprecation
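Once usage is tracked, the orphan check itself is a one-liner; `UsedBy` is sketched as a namedtuple here for illustration:

```python
from collections import namedtuple

UsedBy = namedtuple("UsedBy", ["prompts", "pipelines", "charts"])

def is_orphaned(used_by):
    # Orphaned = no prompt, pipeline, or chart references the placeholder
    return not (used_by.prompts or used_by.pipelines or used_by.charts)
```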

7. Quality Filter Policy (Activity Placeholders)

Decision Logic

def create_quality_policy(key):
    # Activity-related placeholders need quality policies
    if any(x in key for x in ['activity', 'training', 'load', 'volume', 'ability']):
        return QualityFilterPolicy(
            enabled=True,
            default_filter_level="quality",  # quality | acceptable | all
            null_quality_handling="exclude",  # exclude | include_as_uncategorized
            includes_poor=False,
            includes_excluded=False,
            notes="Filters for quality='quality' by default. NULL quality excluded."
        )
    return None

Rules

  1. Activity metrics require quality filter policies
  2. Default filter: quality='quality' (the strictest of quality | acceptable | all)
  3. NULL handling: Excluded by default
  4. Poor quality: Not included unless explicit
  5. Excluded: Not included

8. Confidence Logic

Decision Logic

def create_confidence_logic(key, data_layer_module):
    # Data layer functions have confidence
    if data_layer_module:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data availability and thresholds",
            thresholds={"min_data_points": 1},
            notes=f"Determined by {data_layer_module}"
        )

    # Scores
    if 'score' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data completeness for components",
            notes="Correlates with input data availability"
        )

    # Correlations
    if 'correlation' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Pearson correlation with significance",
            thresholds={"min_data_points": 7}
        )

    return None

Rules

  1. Data layer placeholders: Have confidence logic
  2. Scores: Confidence correlates with data availability
  3. Correlations: Require minimum data points
  4. Simple lookups: May not need confidence logic

9. Metadata Completeness Score

Calculation

def calculate_completeness(metadata):
    score = 0

    # Required fields (30 points)
    if metadata.category != 'Unknown': score += 5
    if metadata.description and 'No description' not in metadata.description: score += 5
    if metadata.semantic_contract: score += 10
    if metadata.source.resolver != 'unknown': score += 10

    # Type specification (20 points)
    if metadata.type != 'legacy_unknown': score += 10
    if metadata.time_window != 'unknown': score += 10

    # Output specification (20 points)
    if metadata.output_type != 'unknown': score += 10
    if metadata.format_hint: score += 10

    # Source provenance (20 points)
    if metadata.source.data_layer_module: score += 10
    if metadata.source.source_tables: score += 10

    # Quality policies (10 points)
    if metadata.quality_filter_policy: score += 5
    if metadata.confidence_logic: score += 5

    return min(score, 100)

Schema Status

Based on completeness score:

  • 90-100% + no unresolved → validated
  • 50-89% → draft
  • 0-49% → incomplete
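The score-to-status mapping can be sketched directly. How a 90+ score with unresolved fields is handled is not spelled out above, so this sketch assumes it falls back to draft:

```python
def schema_status(completeness_score, has_unresolved_fields):
    """Map a 0-100 completeness score to a schema status (sketch)."""
    if completeness_score >= 90 and not has_unresolved_fields:
        return "validated"
    if completeness_score >= 50:
        return "draft"      # includes high scores blocked by unresolved fields
    return "incomplete"
```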

10. Validation Tests

Required Tests

def test_value_raw_extraction():
    # Each output_type yields (value_raw, success)
    assert extract_value_raw('{"key": "val"}', JSON) == ({"key": "val"}, True)
    assert extract_value_raw('85.8 kg', NUMBER) == (85.8, True)
    assert extract_value_raw('2026-03-29', DATE) == ('2026-03-29', True)

def test_unit_inference():
    # No units for scores
    assert infer_unit('goal_progress_score', ..., NUMBER) == None

    # Correct units for measurements
    assert infer_unit('weight_aktuell', ..., NUMBER) == 'kg'

    # No units for JSON
    assert infer_unit('active_goals_json', ..., JSON) == None

def test_time_window_detection():
    # Explicit suffix
    assert detect_time_window('weight_7d_median', ...)[0] == DAYS_7

    # Latest
    assert detect_time_window('weight_aktuell', ...)[0] == LATEST

    # Legacy mismatch: implementation (contract) wins over description
    tw, certain, note = detect_time_window(
        'weight_trend', description='7d', semantic_contract='28d')
    assert tw == DAYS_28
    assert note == 'legacy_contract_mismatch'

def test_source_provenance():
    # Skip wrappers
    assert resolve_source('_safe_int') == (None, None, [], 'wrapper')

    # Real sources
    func, module, tables, kind = resolve_source('get_latest_weight')
    assert func == 'get_latest_weight_data'
    assert module == 'body_metrics'
    assert 'weight_log' in tables

def test_quality_filter_for_activity():
    # Activity placeholders need quality filter
    policy = create_quality_policy('activity_summary')
    assert policy is not None
    assert policy.default_filter_level == "quality"

    # Non-activity placeholders don't
    policy = create_quality_policy('weight_aktuell')
    assert policy is None

11. Continuous Validation

Pre-Commit Checks

# Run validation before commit
python backend/generate_complete_metadata_v2.py

# Check for errors: fail the commit if the QA report
# shows a high failure rate

CI/CD Integration

- name: Validate Placeholder Metadata
  run: |
    python backend/generate_complete_metadata_v2.py
    python backend/tests/test_placeholder_metadata_v2.py    

Summary

This validation logic ensures:

  1. Reproducible: Same input → same output
  2. Testable: All logic has unit tests
  3. Auditable: Clear decision paths
  4. Conservative: Prefer unknown over wrong guesses
  5. Normative: Actual implementation > legacy description