mitai-jinkendo/docs/PLACEHOLDER_METADATA_VALIDATION.md
Lars 650313347f
feat: Placeholder Metadata V2 - Normative Implementation + ZIP Export Fix
MAJOR CHANGES:
- Enhanced metadata schema with 7 QA fields
- Deterministic derivation logic (no guessing)
- Conservative inference (prefer unknown over wrong)
- Real source tracking (skip safe wrappers)
- Legacy mismatch detection
- Activity quality filter policies
- Completeness scoring (0-100)
- Unresolved fields tracking
- Fixed ZIP/JSON export auth (query param support)

FILES CHANGED:
- backend/placeholder_metadata.py (schema extended)
- backend/placeholder_metadata_enhanced.py (NEW, 418 lines)
- backend/generate_complete_metadata_v2.py (NEW, 334 lines)
- backend/tests/test_placeholder_metadata_v2.py (NEW, 302 lines)
- backend/routers/prompts.py (V2 integration + auth fix)
- docs/PLACEHOLDER_METADATA_VALIDATION.md (NEW, 541 lines)

PROBLEMS FIXED:
✓ value_raw extraction (type-aware, JSON parsing)
✓ Units for dimensionless values (scores, correlations)
✓ Safe wrappers as sources (now skipped)
✓ Time window guessing (confidence flags)
✓ Legacy inconsistencies (marked with flag)
✓ Missing quality filters (activity placeholders)
✓ No completeness metric (0-100 score)
✓ Orphaned placeholders (tracked)
✓ Unresolved fields (explicit list)
✓ ZIP/JSON export auth (query token support for downloads)

AUTH FIX:
- export-catalog-zip now accepts token via query param (?token=xxx)
- export-values-extended now accepts token via query param
- Allows browser downloads without custom headers

Concept: docs/PLACEHOLDER_METADATA_REQUIREMENTS_V2_NORMATIVE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 21:23:37 +02:00


# Placeholder Metadata Validation Logic
**Version:** 2.0.0
**Generated:** 2026-03-29
**Status:** Normative
---
## Purpose
This document defines the **deterministic derivation logic** for all placeholder metadata fields. It ensures that metadata extraction is **reproducible, testable, and auditable**.
---
## 1. Type Classification (`PlaceholderType`)
### Decision Logic
```python
def determine_type(key, description, output_type, value_display):
    # JSON/Markdown outputs are typically raw_data
    if output_type in [JSON, MARKDOWN]:
        return RAW_DATA
    # Scores and percentages are atomic
    if any(x in key for x in ['score', 'pct', 'adequacy']):
        return ATOMIC
    # Summaries and details are raw_data
    if any(x in key for x in ['summary', 'detail', 'verteilung']):
        return RAW_DATA
    # Goals and focus areas (if derived from prompts)
    if any(x in key for x in ['goal', 'focus', 'top_']):
        # Check if from KI/Prompt stage
        if is_from_prompt_stage(key):
            return INTERPRETED
        else:
            return ATOMIC  # Just database values
    # Correlations are interpreted
    if 'correlation' in key or 'plateau' in key or 'driver' in key:
        return INTERPRETED
    # Default: atomic
    return ATOMIC
```
### Rules
1. **ATOMIC**: Single values (numbers, strings, dates) from database or simple computation
2. **RAW_DATA**: Structured data (JSON, arrays, markdown) representing multiple values
3. **INTERPRETED**: Values derived from AI/Prompt stages or complex interpretation
4. **LEGACY_UNKNOWN**: Only for existing unclear placeholders (never for new ones)
### Validation
- `interpreted` requires evidence of prompt/stage origin
- Calculated scores/aggregations are NOT automatically `interpreted`
---
## 2. Unit Inference
### Decision Logic
```python
def infer_unit(key, description, output_type, type):
    # NO units for:
    if output_type in [JSON, MARKDOWN, ENUM]:
        return None
    if any(x in key for x in ['score', 'correlation', 'adequacy']):
        return None  # Dimensionless
    if any(x in key for x in ['pct', 'ratio', 'balance']):
        return None  # Dimensionless percentage/ratio
    # Weight/mass
    if any(x in key for x in ['weight', 'gewicht', 'fm_', 'lbm_']):
        return 'kg'
    # Circumferences
    if 'umfang' in key or any(x in key for x in ['waist', 'hip', 'chest']):
        return 'cm'
    # Time
    if 'duration' in key or 'dauer' in key or 'debt' in key:
        if 'hours' in description or 'stunden' in description:
            return 'Stunden'
        elif 'minutes' in description:
            return 'Minuten'
        return None  # Unclear
    # Heart rate
    if 'rhr' in key or ('hr' in key and 'hrv' not in key):
        return 'bpm'
    # HRV
    if 'hrv' in key:
        return 'ms'
    # VO2 Max
    if 'vo2' in key:
        return 'ml/kg/min'
    # Calories
    if 'kcal' in key or 'energy' in key:
        return 'kcal'
    # Macros
    if any(x in key for x in ['protein', 'carb', 'fat']) and 'g' in description:
        return 'g'
    # Default: None (conservative)
    return None
```
### Rules
1. **NO units** for dimensionless values (scores, correlations, percentages, ratios)
2. **NO units** for JSON/Markdown/Enum outputs
3. **NO units** for classifications (e.g., "recomposition_quadrant")
4. **Conservative**: Only assign unit if certain from key or description
### Examples
**Correct:**
- `weight_aktuell` → `kg`
- `goal_progress_score` → `None` (dimensionless 0-100)
- `correlation_energy_weight_lag` → `None` (dimensionless)
- `activity_summary` → `None` (text/JSON)
**Incorrect:**
- `goal_progress_score` → `%` (wrong - it's 0-100 dimensionless)
- `waist_hip_ratio` → any unit (wrong - dimensionless ratio)
---
## 3. Time Window Detection
### Decision Logic (Priority Order)
```python
def detect_time_window(key, description, semantic_contract, resolver_name):
    """Returns (window, certain, note)."""
    # 1. Explicit suffix (highest confidence)
    if '_7d' in key: return DAYS_7, True, None
    if '_28d' in key: return DAYS_28, True, None
    if '_30d' in key: return DAYS_30, True, None
    if '_90d' in key: return DAYS_90, True, None
    # 2. Latest/current keywords
    if any(x in key for x in ['aktuell', 'latest', 'current']):
        return LATEST, True, None
    # 3. Semantic contract (high confidence)
    if '7 tag' in semantic_contract or '7d' in semantic_contract:
        # Check for description mismatch; also sets legacy_contract_mismatch
        if '30' in description or '28' in description:
            return DAYS_7, True, "Description disagrees with semantic contract"
        return DAYS_7, True, None
    # 4. Description patterns (medium confidence)
    if 'letzte 7' in description or '7 tag' in description:
        return DAYS_7, False, None
    # 5. Heuristics (low confidence)
    if 'avg' in key or 'durchschn' in key:
        return DAYS_30, False, "Assumed 30d for average"
    if 'trend' in key:
        return DAYS_28, False, "Assumed 28d for trend"
    # 6. Unknown
    return UNKNOWN, False, "Could not determine"
```
### Legacy Mismatch Detection
If description says "7d" but semantic contract (implementation) says "28d":
- Set `time_window = DAYS_28` (actual implementation)
- Set `legacy_contract_mismatch = True`
- Add to `known_issues`: "Description says 7d but implementation is 28d"
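This precedence rule can be sketched as a small helper (a sketch; the function name and return shape are illustrative, not the actual implementation):

```python
def reconcile_time_window(description_window, contract_window):
    """Semantic contract (actual implementation) wins over the legacy description.
    Returns (time_window, legacy_contract_mismatch, issue_note)."""
    if contract_window and description_window and contract_window != description_window:
        note = f"Description says {description_window} but implementation is {contract_window}"
        return contract_window, True, note
    return contract_window or description_window, False, None
```

For example, a description of `7d` against a contract of `28d` resolves to `28d` with the mismatch flag set.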
### Rules
1. **Actual implementation** takes precedence over legacy description
2. **Suffix in key** is most reliable indicator
3. **Semantic contract** (if documented) reflects actual implementation
4. **Unknown** if cannot be determined with confidence
---
## 4. Value Raw Extraction
### Decision Logic
```python
def extract_value_raw(value_display, output_type, type):
    """Returns (value_raw, success)."""
    # No value
    if value_display in ['nicht verfügbar', '', None]:
        return None, True
    # JSON output
    if output_type == JSON:
        try:
            return json.loads(value_display), True
        except (TypeError, ValueError):
            # Try to find JSON embedded in the string
            match = re.search(r'(\{.*\}|\[.*\])', value_display, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1)), True
                except ValueError:
                    pass
            return None, False  # Failed
    # Markdown
    if output_type == MARKDOWN:
        return value_display, True  # Keep as string
    # Number
    if output_type in [NUMBER, INTEGER]:
        match = re.search(r'([-+]?\d+\.?\d*)', value_display)
        if match:
            val = float(match.group(1))
            return (int(val) if output_type == INTEGER else val), True
        return None, False
    # Date
    if output_type == DATE:
        if re.match(r'\d{4}-\d{2}-\d{2}', value_display):
            return value_display, True  # ISO format
        return value_display, False  # Unknown format
    # String/Enum
    return value_display, True
```
### Rules
1. **JSON outputs**: Must be valid JSON objects/arrays, not strings
2. **Numeric outputs**: Extract number without unit
3. **Markdown/String**: Keep as-is
4. **Dates**: Prefer ISO format (YYYY-MM-DD)
5. **Failure**: Set `value_raw = None` and mark in `unresolved_fields`
### Examples
**Correct:**
- `active_goals_json` (JSON) → `{"goals": [...]}` (object)
- `weight_aktuell` (NUMBER) → `85.8` (number, no unit)
- `datum_heute` (DATE) → `"2026-03-29"` (ISO string)
**Incorrect:**
- `active_goals_json` (JSON) → `"[Fehler: ...]"` (string, not JSON)
- `weight_aktuell` (NUMBER) → `"85.8"` (string, not number)
- `weight_aktuell` (NUMBER) → `85` (extracted from "85.8 kg" incorrectly)
---
## 5. Source Provenance
### Decision Logic
```python
def resolve_source(resolver_name):
    """Returns (function, data_layer_module, source_tables, kind)."""
    # Skip safe wrappers - they are not real sources
    if resolver_name in ['_safe_int', '_safe_float', '_safe_json', '_safe_str']:
        return None, None, [], 'wrapper'  # also marked unresolved
    # Known mappings
    if resolver_name in SOURCE_MAP:
        function, data_layer_module, tables, kind = SOURCE_MAP[resolver_name]
        return function, data_layer_module, tables, kind
    # Goals formatting
    if resolver_name.startswith('_format_goals'):
        return None, None, ['goals'], 'interpreted'
    # Unknown
    return None, None, [], 'unknown'  # also marked unresolved
```
### Source Kinds
- **direct**: Direct database read (e.g., `get_latest_weight`)
- **computed**: Calculated from data (e.g., `calculate_bmi`)
- **aggregated**: Aggregation over time/records (e.g., `get_nutrition_avg`)
- **derived**: Derived from other metrics (e.g., `protein_g_per_kg`)
- **interpreted**: AI/prompt stage output
- **wrapper**: Safe wrapper (not a real source)
### Rules
1. **Safe wrappers** (`_safe_*`) are NOT valid source functions
2. Must trace to **real data layer function** or **database table**
3. Mark as `unresolved` if cannot trace to real source
---
## 6. Used By Tracking
### Decision Logic
```python
def track_usage(placeholder_key, ai_prompts_table):
    used_by = UsedBy(prompts=[], pipelines=[], charts=[])
    for prompt in ai_prompts_table:
        # Check template
        if placeholder_key in prompt.template:
            if prompt.type == 'pipeline':
                used_by.pipelines.append(prompt.name)
            else:
                used_by.prompts.append(prompt.name)
        # Check stages
        for stage in prompt.stages:
            for stage_prompt in stage.prompts:
                if placeholder_key in stage_prompt.template:
                    used_by.pipelines.append(prompt.name)
    # Check charts (future)
    # if placeholder_key in chart_endpoints:
    #     used_by.charts.append(chart_name)
    return used_by
```
### Orphaned Detection
If `used_by.prompts` + `used_by.pipelines` + `used_by.charts` are all empty:
- Set `orphaned_placeholder = True`
- Consider for deprecation
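A minimal sketch of the orphan check, assuming `used_by` is held as a dict of lists:

```python
def is_orphaned(used_by):
    """True when no prompt, pipeline, or chart references the placeholder."""
    return not (used_by.get("prompts") or used_by.get("pipelines") or used_by.get("charts"))
```

A placeholder referenced by even a single prompt or pipeline is not orphaned.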
---
## 7. Quality Filter Policy (Activity Placeholders)
### Decision Logic
```python
def create_quality_policy(key):
    # Activity-related placeholders need quality policies
    if any(x in key for x in ['activity', 'training', 'load', 'volume', 'ability']):
        return QualityFilterPolicy(
            enabled=True,
            default_filter_level="quality",  # quality | acceptable | all
            null_quality_handling="exclude",  # exclude | include_as_uncategorized
            includes_poor=False,
            includes_excluded=False,
            notes="Filters for quality='quality' by default. NULL quality excluded."
        )
    return None
```
### Rules
1. **Activity metrics** require quality filter policies
2. **Default filter**: `quality='quality'` (acceptable and above)
3. **NULL handling**: Excluded by default
4. **Poor quality**: Not included unless explicit
5. **Excluded**: Not included
---
## 8. Confidence Logic
### Decision Logic
```python
def create_confidence_logic(key, data_layer_module):
    # Data layer functions have confidence
    if data_layer_module:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data availability and thresholds",
            thresholds={"min_data_points": 1},
            notes=f"Determined by {data_layer_module}"
        )
    # Scores
    if 'score' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Based on data completeness for components",
            notes="Correlates with input data availability"
        )
    # Correlations
    if 'correlation' in key:
        return ConfidenceLogic(
            supported=True,
            calculation="Pearson correlation with significance",
            thresholds={"min_data_points": 7}
        )
    return None
```
### Rules
1. **Data layer placeholders**: Have confidence logic
2. **Scores**: Confidence correlates with data availability
3. **Correlations**: Require minimum data points
4. **Simple lookups**: May not need confidence logic
---
## 9. Metadata Completeness Score
### Calculation
```python
def calculate_completeness(metadata):
    score = 0
    # Required fields (30 points)
    if metadata.category != 'Unknown': score += 5
    if metadata.description and 'No description' not in metadata.description: score += 5
    if metadata.semantic_contract: score += 10
    if metadata.source.resolver != 'unknown': score += 10
    # Type specification (20 points)
    if metadata.type != 'legacy_unknown': score += 10
    if metadata.time_window != 'unknown': score += 10
    # Output specification (20 points)
    if metadata.output_type != 'unknown': score += 10
    if metadata.format_hint: score += 10
    # Source provenance (20 points)
    if metadata.source.data_layer_module: score += 10
    if metadata.source.source_tables: score += 10
    # Quality policies (10 points)
    if metadata.quality_filter_policy: score += 5
    if metadata.confidence_logic: score += 5
    return min(score, 100)
```
### Schema Status
Based on completeness score:
- **90-100%** + no unresolved → `validated`
- **50-89%** → `draft`
- **0-49%** → `incomplete`
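The mapping can be expressed as a small function (a sketch; names are illustrative). Note that a score of 90+ with unresolved fields falls back to `draft`:

```python
def schema_status(completeness_score, unresolved_fields):
    """Map completeness score and unresolved fields to a schema status."""
    if completeness_score >= 90 and not unresolved_fields:
        return "validated"
    if completeness_score >= 50:
        return "draft"
    return "incomplete"
```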
---
## 10. Validation Tests
### Required Tests
```python
def test_value_raw_extraction():
    # Test each output_type
    assert extract_value_raw('{"key": "val"}', JSON) == {"key": "val"}
    assert extract_value_raw('85.8 kg', NUMBER) == 85.8
    assert extract_value_raw('2026-03-29', DATE) == '2026-03-29'

def test_unit_inference():
    # No units for scores
    assert infer_unit('goal_progress_score', ..., NUMBER) == None
    # Correct units for measurements
    assert infer_unit('weight_aktuell', ..., NUMBER) == 'kg'
    # No units for JSON
    assert infer_unit('active_goals_json', ..., JSON) == None

def test_time_window_detection():
    # Explicit suffix
    assert detect_time_window('weight_7d_median', ...) == DAYS_7
    # Latest
    assert detect_time_window('weight_aktuell', ...) == LATEST
    # Legacy mismatch detection
    tw, mismatch = detect_time_window('weight_trend', desc='7d', contract='28d')
    assert tw == DAYS_28
    assert mismatch == True

def test_source_provenance():
    # Skip wrappers
    assert resolve_source('_safe_int') == (None, None, [], 'wrapper')
    # Real sources
    func, module, tables, kind = resolve_source('get_latest_weight')
    assert func == 'get_latest_weight_data'
    assert module == 'body_metrics'
    assert 'weight_log' in tables

def test_quality_filter_for_activity():
    # Activity placeholders need quality filter
    policy = create_quality_policy('activity_summary')
    assert policy is not None
    assert policy.default_filter_level == "quality"
    # Non-activity placeholders don't
    policy = create_quality_policy('weight_aktuell')
    assert policy is None
```
---
## 11. Continuous Validation
### Pre-Commit Checks
```bash
# Run validation before commit; block the commit on a high QA failure rate
# (assumes the generator exits non-zero when the QA report shows one)
python backend/generate_complete_metadata_v2.py || exit 1
```
### CI/CD Integration
```yaml
- name: Validate Placeholder Metadata
  run: |
    python backend/generate_complete_metadata_v2.py
    python backend/tests/test_placeholder_metadata_v2.py
```
---
## Summary
This validation logic ensures:
1. **Reproducible**: Same input → same output
2. **Testable**: All logic has unit tests
3. **Auditable**: Clear decision paths
4. **Conservative**: Prefer `unknown` over wrong guesses
5. **Normative**: Actual implementation > legacy description