# Issue #21: Universal CSV Parser – Requirements Analysis & Concept

**As of:** 2026-04-09

**Author:** Claude Code Agent

**Status:** Concept phase (awaiting user approval)

---

## 1. Background

### 1.1 Existing CSV import implementations

There are currently **4 separate CSV import functions**:

| Module | File | Format | Notes |
|--------|------|--------|-------|
| **Nutrition** | `nutrition.py:34` | FDDB | Delimiter `;`, hardcoded columns, aggregation per day |
| **Activity** | `activity.py:344` | Apple Health | **Learning mapping** via `activity_type_mappings`, update-or-insert |
| **Blood Pressure** | `blood_pressure.py:293` | Omron | Multiple column-name variants (DE/EN), context tagging |
| **ZIP Import** | `importdata.py:30` | Custom format | `Profile.json` + CSV bundle |

### 1.2 Shared patterns (already present)

✅ **Encoding detection:**

```python
try:
    text = raw.decode('utf-8')
except UnicodeDecodeError:
    text = raw.decode('latin-1')  # latin-1 maps every byte, so this cannot fail
if text.startswith('\ufeff'):  # strip a UTF-8 byte-order mark (BOM)
    text = text[1:]
```

✅ **Duplicate detection:**

- Nutrition: `ON CONFLICT (profile_id, date) DO UPDATE`
- Activity: `SELECT ... WHERE profile_id=%s AND date=%s AND start_time=%s`
- Blood Pressure: timestamp-based

✅ **Type conversion** (scattered across modules):

- Date formats: FDDB (`dd.mm.yyyy`), Apple Health (ISO), Omron (several)
- Decimal separator: `,` → `.`
- Units: kJ → kcal

❌ **Missing patterns:**

- No **unified mapping system** (except in Activity)
- No **user interface for adjusting mappings**
- No **automatic format detection**
- No **suggestions for unknown columns**

---

## 2. Requirements (from the user request)

### 2.1 Functional requirements

| # | Requirement | Priority |
|---|-------------|----------|
| **F1** | **Universal parser:** one parser for all modules (Nutrition, Activity, Weight, Circumference, Caliper, Vitals, Sleep) | MUST |
| **F2** | **Learning system:** automatic recognition of known CSV structures based on column signatures | MUST |
| **F3** | **User-adjustable mapping:** UI for manually assigning CSV columns to DB fields | MUST |
| **F4** | **Intelligent suggestions:** the system proposes mappings based on column names, sample data, and statistics | SHOULD |
| **F5** | **Type conversion:** automatic conversion of date formats, decimal separators, text→number, and units | MUST |
| **F6** | **Mapping persistence:** saved mappings can be reused (per user, per module, global) | MUST |
| **F7** | **Format templates:** predefined templates for known formats (FDDB, Apple Health, Omron, Garmin, etc.) | SHOULD |
| **F8** | **Validation:** pre-import validation with error report and preview (first 5 rows) | SHOULD |
| **F9** | **Rollback:** faulty imports can be undone | NICE |

### 2.2 Non-functional requirements

| # | Requirement | Priority |
|---|-------------|----------|
| **NF1** | **Backward compatibility:** existing CSV import endpoints keep working (as wrappers around the new parser) | MUST |
| **NF2** | **Performance:** importing 1,000 rows takes < 5 seconds | SHOULD |
| **NF3** | **Extensibility:** new modules/fields can be added without changing parser code, only by adding a registry entry (registry pattern) | MUST |
| **NF4** | **Security:** users can only see/change their own mappings (except admins) | MUST |

---

## 3. Data model

### 3.1 New DB tables

#### **`csv_field_mappings`** (central mapping registry)

```sql
CREATE TABLE csv_field_mappings (
    id SERIAL PRIMARY KEY,
    profile_id INTEGER REFERENCES profiles(id),  -- NULL = system template
    is_system BOOLEAN DEFAULT false,             -- true = read-only template
    module VARCHAR(50) NOT NULL,                 -- 'nutrition', 'activity', etc.
    mapping_name VARCHAR(100) NOT NULL,          -- "FDDB Export", "Apple Health"
    description TEXT,                            -- "Standard format for FDDB CSV exports"

    -- CSV signature (for auto-detection)
    column_signature TEXT[],                     -- column names (normalized, sorted)
    delimiter VARCHAR(10) DEFAULT ',',           -- CSV delimiter
    encoding VARCHAR(20) DEFAULT 'utf-8',
    has_header BOOLEAN DEFAULT true,

    -- Mapping definition (JSONB)
    field_mappings JSONB NOT NULL,               -- { "csv_column": "db_field" }
    type_conversions JSONB,                      -- { "db_field": {"type": "date", "format": "dd.mm.yyyy"} }

    -- Statistics (for ranking)
    usage_count INTEGER DEFAULT 0,
    last_used_at TIMESTAMP,
    success_rate FLOAT DEFAULT 1.0,              -- successful imports / total imports

    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),

    -- Note: PostgreSQL treats NULLs as distinct here, so this constraint does not
    -- prevent duplicate names among system templates (profile_id IS NULL).
    UNIQUE(profile_id, module, mapping_name),
    CHECK (
        -- System templates have profile_id = NULL
        (is_system = true AND profile_id IS NULL) OR
        (is_system = false AND profile_id IS NOT NULL) OR
        (is_system = false AND profile_id IS NULL)
    )
);

CREATE INDEX idx_csv_mappings_lookup ON csv_field_mappings(module, profile_id);
CREATE INDEX idx_csv_mappings_signature ON csv_field_mappings USING GIN (column_signature);
CREATE INDEX idx_csv_mappings_system ON csv_field_mappings(is_system, module) WHERE is_system = true;

COMMENT ON TABLE csv_field_mappings IS 'Mapping registry: system templates (is_system=true) + user mappings (profile_id NOT NULL)';
COMMENT ON COLUMN csv_field_mappings.is_system IS 'System templates are read-only and available to all users';
COMMENT ON COLUMN csv_field_mappings.profile_id IS 'NULL = system template, NOT NULL = user-specific mapping';
```

**Example entries:**

**System template (available to all users):**

```json
{
  "id": 1,
  "profile_id": null,
  "is_system": true,
  "module": "nutrition",
  "mapping_name": "FDDB Export (Standard)",
  "description": "Standard format for FDDB.de CSV exports (German)",
  "column_signature": ["datum_tag_monat_jahr_stunde_minute", "fett_g", "kh_g", "kj", "protein_g"],
  "delimiter": ";",
  "encoding": "utf-8",
  "has_header": true,
  "field_mappings": {
    "datum_tag_monat_jahr_stunde_minute": "date",
    "kj": "kcal",
    "fett_g": "fat_g",
    "kh_g": "carbs_g",
    "protein_g": "protein_g"
  },
  "type_conversions": {
    "date": {
      "type": "date",
      "format": "dd.mm.yyyy HH:MM",
      "extract": "date_only"
    },
    "kcal": {
      "type": "float",
      "source_unit": "kJ",
      "target_unit": "kcal",
      "conversion_factor": 0.239
    },
    "fat_g": {
      "type": "float",
      "decimal_separator": ","
    }
  },
  "usage_count": 1523,
  "success_rate": 0.99
}
```

**User-specific mapping (only for user ID 42):**

```json
{
  "id": 123,
  "profile_id": 42,
  "is_system": false,
  "module": "nutrition",
  "mapping_name": "My FDDB Export (customized)",
  "description": "FDDB export with a note column",
  "column_signature": ["datum_tag_monat_jahr_stunde_minute", "fett_g", "kh_g", "kj", "protein_g", "notiz"],
  "delimiter": ";",
  "encoding": "utf-8",
  "has_header": true,
  "field_mappings": {
    "datum_tag_monat_jahr_stunde_minute": "date",
    "kj": "kcal",
    "fett_g": "fat_g",
    "kh_g": "carbs_g",
    "protein_g": "protein_g",
    "notiz": "note"
  },
  "type_conversions": {
    "date": {
      "type": "date",
      "format": "dd.mm.yyyy HH:MM",
      "extract": "date_only"
    },
    "kcal": {
      "type": "float",
      "source_unit": "kJ",
      "target_unit": "kcal",
      "conversion_factor": 0.239
    }
  },
  "usage_count": 8,
  "success_rate": 1.0
}
```

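The `column_signature` values above are normalized header names. The concept only says "normalized, sorted", so the normalization here is an assumption: lowercase the header, collapse every run of non-word characters into `_`, and trim leading and trailing underscores. The raw FDDB header text in the usage line is also assumed, and `normalize_column` and `column_signature` are hypothetical names.

```python
import re


def normalize_column(name: str) -> str:
    """Lowercase a header and collapse runs of non-word characters into '_'."""
    # A leading BOM (U+FEFF) is not a word character, so it is removed as well.
    return re.sub(r"\W+", "_", name.lower()).strip("_")


def column_signature(headers: list[str]) -> list[str]:
    """Sorted, de-duplicated normalized header names; empty headers are dropped."""
    return sorted({normalize_column(h) for h in headers} - {""})


# Assumed raw FDDB headers:
headers = ["Datum (Tag.Monat.Jahr Stunde:Minute)", "kJ", "Fett (g)", "KH (g)", "Protein (g)"]
print(column_signature(headers))
# ['datum_tag_monat_jahr_stunde_minute', 'fett_g', 'kh_g', 'kj', 'protein_g']
```

Sorting makes the signature independent of column order. A CSV whose columns were only rearranged therefore still produces the same signature.
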
#### **`csv_import_log`** (import history for rollback)

```sql
CREATE TABLE csv_import_log (
    id SERIAL PRIMARY KEY,
    profile_id INTEGER REFERENCES profiles(id),
    mapping_id INTEGER REFERENCES csv_field_mappings(id),
    module VARCHAR(50) NOT NULL,

    filename VARCHAR(255),
    rows_total INTEGER,
    rows_imported INTEGER,
    rows_updated INTEGER,
    rows_skipped INTEGER,
    rows_errors INTEGER,

    error_details JSONB,                   -- [{"row": 5, "error": "Invalid date"}]

    started_at TIMESTAMP DEFAULT NOW(),
    finished_at TIMESTAMP,
    status VARCHAR(20) DEFAULT 'running',  -- 'running', 'success', 'failed'

    -- For rollback
    affected_ids JSONB                     -- {"nutrition_log": [123, 456, ...]}
);

CREATE INDEX idx_csv_import_profile ON csv_import_log(profile_id, module);
```

### 3.2 System templates (seed data)

**The following system templates are created during installation/migration:**

#### **Nutrition**

1. **FDDB Export (Standard)**
   - Delimiter: `;`
   - Encoding: `utf-8`
   - Columns: `datum_tag_monat_jahr_stunde_minute`, `kj`, `fett_g`, `kh_g`, `protein_g`
   - Special handling: kJ → kcal conversion

2. **MyFitnessPal Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Date`, `Calories`, `Carbohydrates (g)`, `Fat (g)`, `Protein (g)`

3. **Cronometer Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Day`, `Energy (kcal)`, `Protein (g)`, `Net Carbs (g)`, `Fat (g)`

#### **Activity**

1. **Apple Health Workout Export (English)**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Workout Type`, `Start`, `End`, `Duration`, `Distance (km)`, `Active Energy (kcal)`, `Heart Rate Average (bpm)`
   - Special handling: automatic training-type mapping

2. **Apple Health Workout Export (German)**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Trainingsart`, `Start`, `Ende`, `Dauer`, `Strecke (km)`, `Aktive Energie (kcal)`, `Durchschnittliche Herzfrequenz (bpm)`

3. **Garmin Connect Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Activity Type`, `Date`, `Time`, `Duration`, `Distance`, `Calories`, `Avg HR`

#### **Blood Pressure**

1. **Omron Export (German)**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Datum`, `Zeit`, `Systolisch (mmHg)`, `Diastolisch (mmHg)`, `Puls (bpm)`

2. **Omron Export (English)**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Date`, `Time`, `Systolic (mmHg)`, `Diastolic (mmHg)`, `Pulse (bpm)`

#### **Vitals**

1. **Apple Health Vitals Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Start`, `Resting Heart Rate (bpm)`, `Heart Rate Variability (ms)`, `Respiratory Rate (breaths/min)`, `Oxygen Saturation (%)`

#### **Weight**

1. **Apple Health Weight Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Start`, `Body Mass (kg)`

2. **Withings Export**
   - Delimiter: `,`
   - Encoding: `utf-8`
   - Columns: `Date`, `Weight (kg)`, `Body Fat (%)`, `Muscle Mass (kg)`

**TOTAL:** ~12-15 initial system templates (11 are listed above)

**Migration:** `backend/migrations/XXX_csv_parser_seed_templates.sql`

### 3.3 Module registry (backend code)

**`backend/csv_parser/module_registry.py`**

Defines the following for each module:

- Available DB fields
- Data types
- Validation rules
- Required fields
- Duplicate strategy

```python
MODULE_DEFINITIONS = {
    "nutrition": {
        "table": "nutrition_log",
        "fields": {
            "date": {"type": "date", "required": True},
            "kcal": {"type": "float", "required": True, "min": 0, "max": 10000},
            "protein_g": {"type": "float", "required": False, "min": 0},
            "fat_g": {"type": "float", "required": False, "min": 0},
            "carbs_g": {"type": "float", "required": False, "min": 0},
            "note": {"type": "string", "required": False, "max_length": 500},
        },
        "duplicate_key": ["profile_id", "date"],  # used for ON CONFLICT
        "duplicate_strategy": "update",  # "update" | "skip" | "error"
    },
    "activity": {
        "table": "activity_log",
        "fields": {
            "date": {"type": "date", "required": True},
            "start_time": {"type": "time", "required": False},
            "activity_type": {"type": "string", "required": True},
            "duration_min": {"type": "float", "required": True, "min": 0},
            "kcal_active": {"type": "float", "required": False},
            "distance_km": {"type": "float", "required": False},
            "hr_avg": {"type": "int", "required": False, "min": 30, "max": 220},
        },
        # start_time is optional: rows where it is NULL never match each other,
        # neither via a UNIQUE constraint nor via "start_time = %s".
        "duplicate_key": ["profile_id", "date", "start_time"],
        "duplicate_strategy": "update",
    },
    # ... further modules
}
```

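The *Validator* component from section 4.1 can be driven entirely by these field definitions. The sketch below assumes the type converter has already turned raw strings into Python `date`, `time`, `float`, `int` or `str` values. It copies in only part of the nutrition entry, and `validate_row`, `NUTRITION_FIELDS` and `PY_TYPES` are hypothetical names.

```python
from datetime import date, time
from typing import Any

# Excerpt of MODULE_DEFINITIONS["nutrition"]["fields"] from above.
NUTRITION_FIELDS: dict[str, dict[str, Any]] = {
    "date": {"type": "date", "required": True},
    "kcal": {"type": "float", "required": True, "min": 0, "max": 10000},
    "protein_g": {"type": "float", "required": False, "min": 0},
    "note": {"type": "string", "required": False, "max_length": 500},
}

# Registry type name -> accepted Python type(s) after conversion.
PY_TYPES: dict[str, Any] = {
    "date": date, "time": time, "float": (int, float), "int": int, "string": str,
}


def validate_row(row: dict[str, Any], fields: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable errors for one converted row (empty list = valid)."""
    errors: list[str] = []
    for name, spec in fields.items():
        value = row.get(name)
        if value is None:
            if spec.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        if not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
            continue  # range checks make no sense on the wrong type
        if "min" in spec and value < spec["min"]:
            errors.append(f"{name}: {value} is below the minimum of {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{name}: {value} is above the maximum of {spec['max']}")
        if "max_length" in spec and len(value) > spec["max_length"]:
            errors.append(f"{name}: longer than {spec['max_length']} characters")
    return errors
```

Because the rules live in the registry, supporting a new module only requires a new `MODULE_DEFINITIONS` entry, in line with NF3.
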
---

## 4. Architecture

### 4.1 System components

```
┌─────────────────────────────────────────────────────────────┐
│ Frontend (React)                                            │
├─────────────────────────────────────────────────────────────┤
│ 1. CSV upload component                                     │
│    - File upload + format detection                         │
│    - Preview (first 5 rows)                                 │
│                                                             │
│ 2. Mapping editor                                           │
│    - Column-to-field assignment (drag & drop)               │
│    - Type conversion configuration                          │
│    - Preview of converted values                            │
│                                                             │
│ 3. Mapping library                                          │
│    - Show/select saved mappings                             │
│    - Templates (FDDB, Apple Health, etc.)                   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Backend (FastAPI)                                           │
├─────────────────────────────────────────────────────────────┤
│ 1. CSV parser engine                                        │
│    - Encoding detection (UTF-8, Latin-1, etc.)              │
│    - Delimiter detection (`,` `;` `\t`)                     │
│    - Column signature computation                           │
│                                                             │
│ 2. Mapping engine                                           │
│    - Auto-detection (columns → known mappings)              │
│    - Intelligent suggestions (fuzzy match, sample analysis) │
│    - Mapping persistence (save/load in DB)                  │
│                                                             │
│ 3. Type converter                                           │
│    - Date parser (20+ formats)                              │
│    - Number parser (decimal and thousands separators)       │
│    - Unit converter (kJ↔kcal, km↔mi, etc.)                  │
│    - Text normalizer (trim, lowercase, etc.)                │
│                                                             │
│ 4. Validator                                                │
│    - Type validation (INT, FLOAT, DATE, etc.)               │
│    - Range validation (min/max)                             │
│    - Required-field check                                   │
│    - Custom validators per module                           │
│                                                             │
│ 5. Import executor                                          │
│    - Batch insert within a transaction                      │
│    - Duplicate handling (update/skip/error)                 │
│    - Rollback on error                                      │
│    - Progress tracking (for large files)                    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL                                                  │
├─────────────────────────────────────────────────────────────┤
│ - csv_field_mappings (mapping registry)                     │
│ - csv_import_log (import history)                           │
│ - nutrition_log, activity_log, ... (data tables)            │
└─────────────────────────────────────────────────────────────┘
```

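The first stage of the parser engine can be sketched with the standard library alone. That stage covers encoding detection, delimiter detection, and header and sample extraction. `csv.Sniffer` and `csv.DictReader` are real APIs of Python's `csv` module. The function names, the 20-line sniffing window, and the fallback to `,` when sniffing fails are assumptions.

```python
import csv
import io


def decode_csv_bytes(raw: bytes) -> tuple[str, str]:
    """Decode as UTF-8 (dropping a BOM if present), else fall back to Latin-1."""
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


def detect_delimiter(text: str, candidates: str = ",;\t") -> str:
    """Guess the delimiter from the first lines; default to ',' if unsure."""
    sample = "\n".join(text.splitlines()[:20])
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return ","


def read_header_and_samples(raw: bytes, n_samples: int = 5) -> dict[str, object]:
    """Return encoding, delimiter, column names and the first sample rows."""
    text, encoding = decode_csv_bytes(raw)
    delimiter = detect_delimiter(text)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # zip() stops after n_samples rows without reading the rest of the file.
    rows = [row for _, row in zip(range(n_samples), reader)]
    return {
        "encoding": encoding,
        "delimiter": delimiter,
        "columns": reader.fieldnames or [],
        "sample_rows": rows,
    }
```

Decoding with `utf-8-sig` removes a leading BOM in the same step. This replaces the manual `'\ufeff'` check from section 1.2.
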
### 4.2 Workflow (happy path)

```
1. User selects a file
   ↓
2. Frontend: POST /api/csv/analyze
   - Upload the file
   - Backend: detect encoding + delimiter
   - Backend: compute column signature
   - Backend: auto-detection
     1. Search user mappings (profile_id = current_user)
     2. Search system templates (is_system = true)
   ↓
3. Backend responds:
   {
     "detected_mapping": {
       "id": 1,
       "name": "FDDB Export (Standard)",
       "is_system": true,
       "confidence": 0.98,
       "match_type": "exact_signature"
     },
     "columns": ["date", "kcal", "protein"],
     "sample_rows": [...],
     "suggestions": {
       "date": ["date", "created_at"],  // suggestions
       "kcal": ["kcal", "energy"]
     }
   }
   ↓
4. Frontend: mapping editor
   - User sees: "System template detected: FDDB Export (Standard)"
   - User can adjust the mapping (for a system template, this creates
     a user copy after confirmation, see 4.3)
   - User tests the type conversion (preview)
   ↓
5. Frontend: POST /api/csv/import
   {
     "mapping_id": 1,         // use an existing mapping, OR:
     "mapping": {...},        // custom mapping
     "module": "nutrition",
     "save_mapping": true,    // save as a user mapping?
     "mapping_name": "FDDB Export 2024"
   }
   ↓
6. Backend: run the import
   - Validation
   - Start transaction
   - Import row by row
   - On error: rollback
   - On success: usage_count++ for the mapping used
   ↓
7. Backend: response
   {
     "success": true,
     "imported": 100,
     "updated": 5,
     "skipped": 2,
     "errors": [{"row": 7, "error": "Invalid date"}],
     "import_log_id": 456  // for rollback
   }
```

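Step 6 validates, runs a single transaction, and rolls back on error. It might look roughly like the sketch below. To stay runnable without a server, it uses SQLite from the standard library as a stand-in for PostgreSQL and handles only `nutrition_log` with `date` and `kcal`. It implements the registry's `update`/`skip`/`error` strategies with a `SELECT` before each write. In PostgreSQL, the `ON CONFLICT (profile_id, date) DO UPDATE` already used by the nutrition import would be the natural replacement. All names are hypothetical.

```python
import sqlite3


def execute_import(conn: sqlite3.Connection, profile_id: int, rows: list[dict],
                   strategy: str = "update") -> tuple[dict[str, int], list[int]]:
    """Import nutrition rows in one transaction.

    Returns (stats, inserted_ids); inserted_ids is what csv_import_log.affected_ids
    would record. Any exception rolls back every row of this import.
    """
    stats = {"imported": 0, "updated": 0, "skipped": 0}
    inserted_ids: list[int] = []
    with conn:  # commits on success, rolls back if an exception escapes
        for i, row in enumerate(rows, start=1):
            existing = conn.execute(
                "SELECT id FROM nutrition_log WHERE profile_id = ? AND date = ?",
                (profile_id, row["date"]),
            ).fetchone()
            if existing is None:
                cur = conn.execute(
                    "INSERT INTO nutrition_log (profile_id, date, kcal) VALUES (?, ?, ?)",
                    (profile_id, row["date"], row["kcal"]),
                )
                inserted_ids.append(cur.lastrowid)
                stats["imported"] += 1
            elif strategy == "update":
                conn.execute("UPDATE nutrition_log SET kcal = ? WHERE id = ?",
                             (row["kcal"], existing[0]))
                stats["updated"] += 1
            elif strategy == "skip":
                stats["skipped"] += 1
            else:  # "error": abort the whole import
                raise ValueError(f"row {i}: entry for {row['date']} already exists")
    return stats, inserted_ids
```

In this sketch, any exception aborts the whole file, matching "On error: rollback". The per-row `errors` list in the example response suggests that row-level validation would happen before this step and that invalid rows would be skipped rather than written.
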
### 4.3 System templates vs. user mappings

**Hierarchy (auto-detection order):**

1. **User mappings** (profile_id = current_user)
   - Highest priority
   - Exact match → used immediately
   - Partial match → offered as a suggestion

2. **System templates** (is_system = true, profile_id = NULL)
   - Fallback if no user mapping matches
   - Read-only (users cannot change them)
   - Users can, however, **create a copy** and adjust it

**Permissions:**

| Action | User mappings | System templates |
|--------|---------------|------------------|
| **View** | ✅ Own | ✅ All |
| **Use** | ✅ Own | ✅ All |
| **Create** | ✅ Yes | ❌ Admin/migration only |
| **Edit** | ✅ Own | ❌ No (create a copy) |
| **Delete** | ✅ Own | ❌ No |
| **Copy** | ✅ Yes | ✅ Yes → user mapping |

**Workflow "customize a system template":**

```
User selects system template "FDDB Export (Standard)"
→ User changes the mapping (e.g., adds a column)
→ Frontend asks: "System templates cannot be modified.
                  Create a copy? [Yes] [Cancel]"
→ User clicks [Yes]
→ New user mapping with is_system=false, profile_id=current_user
```

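The permission table can be expressed as a few predicates, which is roughly what the planned `csv_parser/permissions.py` would need to enforce. This is a sketch only. The `Mapping` dataclass, the function names, and the `is_admin` flag are assumptions. Admin rights follow NF4 and the DELETE rules in section 6.1: admins may modify any non-system mapping, and nobody may modify a system template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    id: int
    profile_id: Optional[int]  # None = system template
    is_system: bool
    module: str
    mapping_name: str


def can_view(m: Mapping, user_id: int, is_admin: bool = False) -> bool:
    """System templates are visible to everyone; user mappings to owner/admin."""
    return m.is_system or m.profile_id == user_id or is_admin


def can_modify(m: Mapping, user_id: int, is_admin: bool = False) -> bool:
    """Edit/delete: never for system templates; own mappings, or any as admin."""
    if m.is_system:
        return False
    return m.profile_id == user_id or is_admin


def copy_to_user(m: Mapping, user_id: int, new_id: int) -> Mapping:
    """Create an editable user copy (is_system=False, owned by user_id)."""
    if not can_view(m, user_id):
        raise PermissionError("mapping is not visible to this user")
    return Mapping(id=new_id, profile_id=user_id, is_system=False,
                   module=m.module, mapping_name=f"{m.mapping_name} - Copy")
```

Because of `UNIQUE(profile_id, module, mapping_name)`, copying the same template twice for one user would collide on the name. A real implementation would therefore need a numbered suffix or a similar scheme.
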
---

## 5. Intelligent features

### 5.1 Auto-detection (column-signature matching)

**Algorithm:**

1. **Exact signature:** column names (normalized, sorted) are identical → 100% match
   ```
   ["date", "kcal", "protein_g"] → mapping ID 123
   ```

2. **Partial match:** ≥70% overlap → suggestion
   ```
   CSV: ["date", "calories", "protein"]
   DB:  ["date", "kcal", "protein_g"]
   → only "date" is identical: 1 of 3 columns (33%) → below the 70% threshold,
     no partial match; the columns are passed on to step 3
   ```

3. **Fuzzy match:** Levenshtein distance < 3 (compared in lowercase)
   ```
   "Datum" → "date" (distance 2: "datum" vs. "date") → match
   "Kalorien" → "kcal" (distance 6) → not caught by fuzzy matching
   ```

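The three tiers can be sketched as follows, assuming signatures are already normalized as described in section 3.1. Two choices here are assumptions. Overlap is measured as the share of the *stored mapping's* columns found in the CSV; other measures, such as Jaccard similarity, give different numbers. The plain dynamic-programming Levenshtein function stands in for whatever library might be used. All names are hypothetical.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def signature_match(csv_sig: list[str], mapping_sig: list[str],
                    threshold: float = 0.7) -> tuple[str, float]:
    """Tiers 1 and 2: exact signature, partial overlap, or no match."""
    csv_cols, mapping_cols = set(csv_sig), set(mapping_sig)
    if csv_cols == mapping_cols:
        return "exact_signature", 1.0
    # Overlap = share of the stored mapping's columns present in the CSV.
    overlap = len(csv_cols & mapping_cols) / len(mapping_cols) if mapping_cols else 0.0
    return ("partial", overlap) if overlap >= threshold else ("none", overlap)


def fuzzy_field(column: str, fields: list[str], max_distance: int = 2) -> Optional[str]:
    """Tier 3: the closest DB field if its distance is < 3, else None."""
    name = column.lower()
    best = min(fields, key=lambda f: levenshtein(name, f))
    return best if levenshtein(name, best) <= max_distance else None
```

Under this overlap definition, the user mapping from section 3.1 (the FDDB columns plus `notiz`) counts as a 100% *partial* match for the FDDB system template. Every template column is present, but the sets are not identical.
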
### 5.2 Intelligent suggestions

**Sample-based analysis:**

1. **Date detection:** regex patterns for 20+ formats
   ```python
   SAMPLES = ["01.01.2024", "02.01.2024", "03.01.2024"]
   # → pattern: dd.mm.yyyy
   # → suggestion: column "Datum" → field "date"
   ```

2. **Number detection:** statistics over the sample values
   ```python
   SAMPLES = ["1500,5", "2000,3", "1800,0"]
   # → decimal separator: ","
   # → range: 1000-3000 → fits "kcal"
   ```

3. **Unit detection:** keyword search in column names
   ```python
   # "Active Energy (kJ)" → unit: kJ → field: kcal (with conversion)
   ```

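The three heuristics can be prototyped with `datetime.strptime` and regular expressions. This sketch rests on several assumptions. Only the six date patterns listed in section 5.3 are tried, the decimal-separator check ignores thousands separators, and units are only recognized in parentheses. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# The six patterns listed in section 5.3.
DATE_PATTERNS = ["%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%m/%d/%Y",
                 "%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M"]
UNIT_IN_PARENS = re.compile(r"\((kJ|kcal|km|mi|kg|lb)\)", re.IGNORECASE)


def _parses(value: str, fmt: str) -> bool:
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False


def detect_date_formats(samples: list[str]) -> list[str]:
    """Every pattern that parses all samples; more than one means ambiguous."""
    return [fmt for fmt in DATE_PATTERNS if all(_parses(s, fmt) for s in samples)]


def detect_decimal_separator(samples: list[str]) -> Optional[str]:
    """',' for values like '1500,5', '.' for '1500.5', None if undecidable."""
    values = [s.strip() for s in samples]
    for sep, pattern in ((",", r"-?\d+(,\d+)?"), (".", r"-?\d+(\.\d+)?")):
        if all(re.fullmatch(pattern, v) for v in values) and any(sep in v for v in values):
            return sep
    return None  # integers only, thousands separators, or mixed formats


def detect_unit(column: str) -> Optional[str]:
    """Unit written in parentheses in the header, e.g. 'Active Energy (kJ)'."""
    match = UNIT_IN_PARENS.search(column)
    return match.group(1) if match else None
```

Returning *every* matching date pattern exposes the day/month ambiguity between `%d/%m/%Y` and `%m/%d/%Y`. The UI can then ask the user instead of guessing.
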
### 5.3 Type conversion (20+ formats)

**Date formats:**
```python
DATE_PATTERNS = [
    "%Y-%m-%d",           # 2024-01-15 (ISO)
    "%d.%m.%Y",           # 15.01.2024 (DE)
    "%d/%m/%Y",           # 15/01/2024 (UK)
    "%m/%d/%Y",           # 01/15/2024 (US); ambiguous with UK when the day is <= 12
    "%Y-%m-%d %H:%M:%S",  # full timestamp
    "%d.%m.%Y %H:%M",     # FDDB format
    # ... 15 more
]
```

**Number conversion:**
```python
def parse_number(value: str, decimal_sep: str = ',', thousands_sep: str = '.') -> float:
    """Parse a localized number, e.g. "1.500,50" -> 1500.5."""
    value = value.strip().replace(thousands_sep, '')  # drop thousands separators first
    value = value.replace(decimal_sep, '.')           # then normalize the decimal separator
    return float(value)
```

**Unit conversion:**
```python
UNIT_CONVERSIONS = {
    ("kJ", "kcal"): lambda x: x / 4.184,
    ("kcal", "kJ"): lambda x: x * 4.184,
    ("km", "mi"): lambda x: x * 0.621371,
    ("mi", "km"): lambda x: x * 1.60934,
    ("kg", "lb"): lambda x: x * 2.20462,
    ("lb", "kg"): lambda x: x * 0.453592,
}
```

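One `type_conversions` entry, as stored in `csv_field_mappings`, could be applied to a raw cell as shown below. Several details are assumptions:

- the token syntax: `dd`, `mm`, `yyyy`, `HH`, `MM`, plus `ss` for seconds;
- the default `.` decimal separator;
- the rule that `conversion_factor` takes precedence over the `source_unit`/`target_unit` pair when both are present.

`to_strptime` and `convert_value` are hypothetical names.

```python
import re
from datetime import datetime
from typing import Any

# Concept-style format tokens -> strptime directives (case-sensitive: mm = month, MM = minute).
FORMAT_TOKENS = {"yyyy": "%Y", "dd": "%d", "mm": "%m", "HH": "%H", "MM": "%M", "ss": "%S"}
UNIT_CONVERSIONS = {
    ("kJ", "kcal"): lambda x: x / 4.184,
    ("kcal", "kJ"): lambda x: x * 4.184,
}


def to_strptime(fmt: str) -> str:
    """'dd.mm.yyyy HH:MM' -> '%d.%m.%Y %H:%M'."""
    return re.sub("|".join(FORMAT_TOKENS), lambda m: FORMAT_TOKENS[m.group()], fmt)


def convert_value(raw: str, spec: dict[str, Any]) -> Any:
    """Convert one raw CSV cell according to a type_conversions entry."""
    raw = raw.strip()
    if spec["type"] == "date":
        parsed = datetime.strptime(raw, to_strptime(spec["format"]))
        return parsed.date() if spec.get("extract") == "date_only" else parsed
    if spec["type"] in ("float", "int"):
        decimal_sep = spec.get("decimal_separator", ".")
        thousands_sep = "." if decimal_sep == "," else ","
        value = float(raw.replace(thousands_sep, "").replace(decimal_sep, "."))
        if "conversion_factor" in spec:  # an explicit factor wins
            value *= spec["conversion_factor"]
        elif "source_unit" in spec:
            value = UNIT_CONVERSIONS[(spec["source_unit"], spec["target_unit"])](value)
        return round(value) if spec["type"] == "int" else value
    return raw  # strings pass through unchanged
```

The factor `0.239` and `1 / 4.184` differ slightly: 8000 kJ gives exactly 1912.0 kcal with the factor and about 1912.05 kcal with the division. Templates that store both should therefore agree on which one is authoritative.
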
---

## 6. API endpoints

### 6.1 New endpoints

#### **POST /api/csv/analyze**

Analyzes an uploaded CSV file and suggests mappings.

**Request:**
```
Content-Type: multipart/form-data

file: <csv-file>
module: "nutrition"
```

**Response:**
```json
{
  "encoding": "utf-8",
  "delimiter": ";",
  "columns": ["Datum", "Kalorien (kJ)", "Protein (g)", "Fett (g)"],
  "sample_rows": [
    {"Datum": "01.01.2024", "Kalorien (kJ)": "8000", "Protein (g)": "80", "Fett (g)": "60"},
    {"Datum": "02.01.2024", "Kalorien (kJ)": "9000", "Protein (g)": "90", "Fett (g)": "70"}
  ],
  "detected_mappings": [
    {
      "mapping_id": 123,
      "mapping_name": "FDDB Export",
      "confidence": 0.95,
      "match_type": "exact_signature"
    }
  ],
  "suggestions": {
    "Datum": {
      "suggested_field": "date",
      "confidence": 0.98,
      "type": "date",
      "detected_format": "dd.mm.yyyy",
      "sample_conversions": ["2024-01-01", "2024-01-02"]
    },
    "Kalorien (kJ)": {
      "suggested_field": "kcal",
      "confidence": 0.85,
      "type": "float",
      "requires_conversion": true,
      "source_unit": "kJ",
      "target_unit": "kcal",
      "sample_conversions": [1912.0, 2151.1]
    }
  },
  "available_fields": {
    "date": {"type": "date", "required": true},
    "kcal": {"type": "float", "required": true, "min": 0, "max": 10000},
    "protein_g": {"type": "float", "required": false},
    "fat_g": {"type": "float", "required": false},
    "carbs_g": {"type": "float", "required": false}
  }
}
```

#### **POST /api/csv/import**

Runs the import with the confirmed mapping.

**Request:**
```json
{
  "file_data": "<base64-encoded-csv>",  // or file_id from /analyze
  "module": "nutrition",
  "mapping": {
    "field_mappings": {
      "Datum": "date",
      "Kalorien (kJ)": "kcal",
      "Protein (g)": "protein_g"
    },
    "type_conversions": {
      "date": {"type": "date", "format": "dd.mm.yyyy"},
      "kcal": {"type": "float", "source_unit": "kJ", "conversion_factor": 0.239}
    }
  },
  "save_mapping": true,
  "mapping_name": "FDDB Export 2024"
}
```

**Response:**
```json
{
  "success": true,
  "import_log_id": 456,
  "stats": {
    "total_rows": 100,
    "imported": 95,
    "updated": 3,
    "skipped": 2,
    "errors": 0
  },
  "error_details": [],
  "duration_ms": 1234
}
```

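The request accepts either a stored `mapping_id` or an inline `mapping`. Below is a framework-independent sketch of checks the endpoint would presumably run before importing; in FastAPI this would more naturally be a Pydantic model. Two rules are read off the example requests and are not specified in the concept: exactly one of the two mapping fields must be present, and `save_mapping` requires a `mapping_name`. All names are hypothetical.

```python
import base64
import binascii
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ImportRequest:
    module: str
    file_data: str  # base64-encoded CSV
    mapping_id: Optional[int] = None
    mapping: Optional[dict[str, Any]] = None
    save_mapping: bool = False
    mapping_name: Optional[str] = None


def validate_import_request(req: ImportRequest, known_modules: set[str]) -> bytes:
    """Check field combinations and return the decoded CSV bytes."""
    if req.module not in known_modules:
        raise ValueError(f"unknown module: {req.module}")
    if (req.mapping_id is None) == (req.mapping is None):
        raise ValueError("provide exactly one of mapping_id or mapping")
    if req.save_mapping and not req.mapping_name:
        raise ValueError("mapping_name is required when save_mapping is true")
    try:
        return base64.b64decode(req.file_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("file_data is not valid base64") from exc
```

Base64 encodes every 3 bytes as 4 characters, so `file_data` is about a third larger than the file itself. Reusing a `file_id` from `/analyze` would avoid uploading the file twice.
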
#### **GET /api/csv/mappings**

Lists saved mappings (user mappings + system templates).

**Query parameters:**
- `module`: filter by module (optional)

**Response:**
```json
{
  "system_templates": [
    {
      "id": 1,
      "module": "nutrition",
      "name": "FDDB Export (Standard)",
      "description": "Standard format for FDDB.de CSV exports",
      "is_system": true,
      "usage_count": 1523,
      "success_rate": 0.99,
      "created_at": "2024-01-01T00:00:00"
    },
    {
      "id": 2,
      "module": "activity",
      "name": "Apple Health Workout Export",
      "description": "Apple Health CSV export (English)",
      "is_system": true,
      "usage_count": 5043,
      "success_rate": 0.98,
      "created_at": "2024-01-01T00:00:00"
    }
  ],
  "user_mappings": [
    {
      "id": 123,
      "module": "nutrition",
      "name": "My FDDB Export (customized)",
      "description": "FDDB with notes",
      "is_system": false,
      "usage_count": 8,
      "success_rate": 1.0,
      "last_used_at": "2024-01-15T10:30:00",
      "created_at": "2024-01-10T12:00:00"
    }
  ]
}
```

**Sorting:**
- System templates: by `usage_count DESC` (most popular first)
- User mappings: by `last_used_at DESC` (most recently used first)

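The two sort orders can be sketched in Python as below. The function name is hypothetical, and mappings are plain dicts. One detail matters if the sorting is done in SQL instead. PostgreSQL sorts NULLs *first* for `DESC` by default, so never-used user mappings would appear at the top unless the query says `ORDER BY last_used_at DESC NULLS LAST`.

```python
from datetime import datetime
from typing import Any


def sort_mappings(mappings: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Split into system/user lists and apply the documented sort orders."""
    system = [m for m in mappings if m["is_system"]]
    user = [m for m in mappings if not m["is_system"]]
    system.sort(key=lambda m: m["usage_count"], reverse=True)
    # Most recently used first; never-used mappings (last_used_at is None) last.
    user.sort(
        key=lambda m: (m.get("last_used_at") is not None,
                       m.get("last_used_at") or datetime.min),
        reverse=True,
    )
    return {"system_templates": system, "user_mappings": user}
```
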
#### **POST /api/csv/mappings/{mapping_id}/copy**

Creates a user copy of a system template (for customization).

**Response:**
```json
{
  "new_mapping_id": 124,
  "message": "Copy created: 'FDDB Export (Standard)' → 'FDDB Export (Standard) - Copy'"
}
```

#### **DELETE /api/csv/mappings/{mapping_id}**

Deletes a saved mapping.

**Permissions:**
- Users can only delete their **own** mappings (profile_id = current_user)
- System templates (is_system = true) **cannot** be deleted
- Admins can delete any mapping (except system templates)

#### **POST /api/csv/rollback/{import_log_id}**

Undoes an import (deletes the imported entries).

**NICE-TO-HAVE:** only if time permits.

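The rollback could turn `csv_import_log.affected_ids` into one parameterized `DELETE` per table. This sketch assumes psycopg2-style `%s` placeholders, as in the existing duplicate check, and relies on psycopg2 adapting Python lists to arrays for `= ANY(%s)`. `ALLOWED_TABLES` and the function name are hypothetical. Table names cannot be passed as query parameters, so they are checked against a whitelist. Deleting by ID only reverts *inserted* rows. Rows changed by the `update` duplicate strategy would need their previous values stored, and `affected_ids` does not hold those.

```python
import json

# Tables a rollback may touch (assumed to mirror the tables in MODULE_DEFINITIONS).
ALLOWED_TABLES = {"nutrition_log", "activity_log"}


def build_rollback_statements(affected_ids_json: str,
                              profile_id: int) -> list[tuple[str, tuple]]:
    """Turn affected_ids into (sql, params) pairs for a DB-API cursor."""
    affected = json.loads(affected_ids_json)
    statements = []
    for table, ids in affected.items():
        if table not in ALLOWED_TABLES:  # never interpolate unchecked names
            raise ValueError(f"table not allowed in rollback: {table}")
        if ids:
            statements.append(
                (f"DELETE FROM {table} WHERE id = ANY(%s) AND profile_id = %s",
                 (list(ids), profile_id))
            )
    return statements
```

The extra `profile_id` condition ensures that a rollback can only ever delete the requesting user's own rows.
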
### 6.2 Existing endpoints (wrappers)

The existing endpoints **remain functional** as thin wrappers:

```python
# backend/routers/nutrition.py

@router.post("/import-csv")
async def import_nutrition_csv(file: UploadFile, ...):
    """
    LEGACY: FDDB-specific import (backward compatibility).
    Internally uses the universal parser with the predefined FDDB template.
    """
    # Wrapper around the universal parser:
    from csv_parser import universal_import

    # pid: the current profile ID, presumably provided by the endpoint's
    # existing auth dependency (elided above as "...").
    mapping = get_predefined_mapping("nutrition", "fddb")
    result = await universal_import(
        file=file,
        module="nutrition",
        mapping=mapping,
        profile_id=pid,
    )

    # Keep the legacy response format (updated rows are not reported in it):
    return {
        "imported": result["stats"]["imported"],
        "skipped": result["stats"]["skipped"],
    }
```

---

## 7. Frontend UI (sketch)

### 7.1 CSV upload page

```
┌─────────────────────────────────────────────────────────┐
│ Import data › CSV upload                                │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Step 1: Upload file                                     │
│  ┌───────────────────────────────────────────────────┐  │
│  │ [📁 Select file]  nutrition-export.csv            │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│ Step 2: Select module                                   │
│ ○ Nutrition  ○ Activity  ○ Weight  ○ Vitals             │
│                                                         │
│                                                [Next →] │
└─────────────────────────────────────────────────────────┘
```

### 7.2 Mapping editor

```
┌──────────────────────────────────────────────────────────────┐
│ CSV import › Edit mapping                                    │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│ ✓ Format detected: FDDB Export (95% match)                   │
│                                                              │
│ Column mapping:                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ CSV column          →  Database field                  │  │
│  ├────────────────────────────────────────────────────────┤  │
│  │ "Datum"             →  [date ▼]          ✓             │  │
│  │ "Kalorien (kJ)"     →  [kcal ▼]          ⚠️            │  │
│  │    └─ Conversion: kJ → kcal (÷4.184)                   │  │
│  │ "Protein (g)"       →  [protein_g ▼]     ✓             │  │
│  │ "Fett (g)"          →  [fat_g ▼]         ✓             │  │
│  │ "Produkt"           →  [not mapped ▼]                  │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│ Preview (first rows):                                        │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ date       │ kcal    │ protein_g │ fat_g               │  │
│  ├────────────────────────────────────────────────────────┤  │
│  │ 2024-01-01 │ 1912.0  │ 80.0      │ 60.0                │  │
│  │ 2024-01-02 │ 2151.1  │ 90.0      │ 70.0                │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│ ☐ Save mapping as: [FDDB Export 2024________]                │
│                                                              │
│ [← Back]                                    [Start import →] │
└──────────────────────────────────────────────────────────────┘
```

### 7.3 Import progress

```
┌─────────────────────────────────────────────────────────┐
│ CSV import running...                                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ ████████████████████░░░░░ 80% (80/100 rows)             │
│                                                         │
│ ✓ 75 entries imported                                   │
│ ↻ 3 entries updated                                     │
│ ⊗ 2 errors                                              │
│                                                         │
│                                                [Cancel] │
└─────────────────────────────────────────────────────────┘
```

---

## 8. Implementation phases

### Phase 1: Foundation (week 1) **← START HERE**

**Goal:** parser engine + module registry + system templates

- [ ] **Migration:**
  - `XXX_csv_parser_tables.sql` – `csv_field_mappings` and `csv_import_log` tables
  - `XXX_csv_parser_seed_templates.sql` – create 12-15 system templates
- [ ] **Backend:**
  - `csv_parser/core.py` – encoding/delimiter detection
  - `csv_parser/module_registry.py` – module definitions
  - `csv_parser/type_converter.py` – date/number/unit converter (20+ formats)
  - `csv_parser/permissions.py` – read-only check for system templates
- [ ] **Testing:** unit tests for the type converter + system-template seed

**Output:**
- Working parser (without auto-detection, without UI)
- 12-15 system templates available in the DB
- Users can load templates (but not change them)

---

### Phase 2: Mapping system (week 2)

**Goal:** auto-detection + mapping persistence

- [ ] **Backend:**
  - `csv_parser/mapping_engine.py` – auto-detection, fuzzy matching
  - `csv_parser/suggestions.py` – intelligent suggestions
  - API: `/api/csv/analyze`, `/api/csv/mappings`, `/api/csv/mappings/{id}/copy`
- [ ] **Permissions:** read-only enforcement for system templates
- [ ] **Testing:**
  - Auto-detection tests with real CSV files (all system templates)
  - User vs. system permissions (users cannot change system templates)
  - Copy workflow (system template → user mapping)

**Output:**
- Auto-detection works (user mappings > system templates)
- Users can copy and adjust system templates
- Permissions are correct (system templates are read-only)

---

### Phase 3: Import executor + API (weeks 2-3)

**Goal:** complete import workflow

- [ ] **Backend:**
  - `csv_parser/executor.py` – batch insert, validation, rollback
  - API: `/api/csv/import`, `/api/csv/mappings`
- [ ] **Migration:** switch the existing import endpoints to wrappers
- [ ] **Testing:** end-to-end tests (Nutrition, Activity)

**Output:** import works via the API; legacy endpoints remain functional

---

### Phase 4: Frontend (weeks 3-4)

**Goal:** user interface for the mapping editor

- [ ] **Frontend:**
  - `CSVUploadPage.jsx` – upload + module selection
  - `CSVMappingEditor.jsx` – column-to-field assignment
  - `CSVImportProgress.jsx` – progress display
  - `CSVMappingLibrary.jsx` – show/select saved mappings
- [ ] **UX:** drag & drop for column assignment
- [ ] **Testing:** E2E tests (Playwright)

**Output:** complete UI; users can create their own mappings

---

### Phase 5: Rollout (week 4)

**Goal:** all modules migrated, legacy code removed

- [ ] All modules migrated to the universal parser (Weight, Circumference, Caliper, Sleep)
- [ ] Legacy import code removed (after a deprecation period)
- [ ] Documentation updated
- [ ] Gitea issue #21 closed

---

## 9. Open questions (for user approval)

1. **Scope:** all modules at once or step by step? (Recommendation: start with Nutrition + Activity)
2. **Rollback:** important enough for phases 1-3, or NICE-TO-HAVE?
3. **UI complexity:** drag & drop or simple dropdowns? (Recommendation: dropdowns first, drag & drop later)
4. **Performance:** import limit per file? (Recommendation: 10,000 rows; beyond that, batch upload)
5. **Migration:** wrap the legacy endpoints immediately or run them in parallel?

---

## 10. Effort estimate

| Phase | Effort | Components |
|-------|--------|------------|
| **Phase 1** | 8-12 h | Parser engine, type converter, migrations |
| **Phase 2** | 6-8 h | Auto-detection, mapping engine, suggestions |
| **Phase 3** | 8-10 h | Import executor, API endpoints, wrappers |
| **Phase 4** | 12-16 h | Frontend UI (3-4 components) |
| **Phase 5** | 4-6 h | Migration of all modules, cleanup |
| **TOTAL** | **38-52 h** | ~5-7 working days |

**Critical path:** Phase 1 → Phase 2 → Phase 3 (the backend must be complete before the frontend)

---

## 11. Risks & mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| **Date-format variety:** 20+ formats are hard to parse | HIGH | MEDIUM | Fall back to manual input; the user can specify the format |
| **Performance:** large files (>10k rows) are slow | MEDIUM | MEDIUM | Batch processing + background job (Celery) |
| **Backward compatibility:** legacy code breaks | LOW | HIGH | Parallel operation + feature flag |
| **UX complexity:** mapping editor too complex | MEDIUM | LOW | Wizard flow, step by step, good defaults |

---

## 12. Success criteria

✅ **Users can upload a CSV file without any coding knowledge**

✅ **The system detects known formats automatically (≥80% accuracy)**

✅ **Users can save and reuse their own mappings**

✅ **Import error rate < 5% on valid data**

✅ **Performance: 1,000 rows in < 5 seconds**

✅ **All existing CSV imports keep working (wrappers)**

---

**Next step:** user approval of the concept + start of Phase 1 (Foundation)

**Estimated start-to-finish:** 5-7 working days (with focused work and no interruptions)