# Event JSON Storage & Database Migration Plan **Goal:** Store full event JSON in database for 2,500x faster retrieval + implement proper database migration system --- ## Decision: Fresh Start vs Migration ### Option A: Fresh Start (Recommended for This Change) **Pros:** - ✅ Clean implementation (no migration complexity) - ✅ Fast deployment (no data conversion) - ✅ No risk of migration bugs - ✅ Opportunity to fix any schema issues - ✅ Smaller database (no legacy data) **Cons:** - ❌ Lose existing events - ❌ Relay starts "empty" - ❌ Historical data lost **Recommendation:** **Fresh start for this change** because: 1. Your relay is still in development/testing phase 2. The schema change is fundamental (affects every event) 3. Migration would require reconstructing JSON for every existing event (expensive) 4. You've been doing fresh starts anyway ### Option B: Implement Migration System **Pros:** - ✅ Preserve existing events - ✅ No data loss - ✅ Professional approach - ✅ Reusable for future changes **Cons:** - ❌ Complex implementation - ❌ Slow migration (reconstruct JSON for all events) - ❌ Risk of bugs during migration - ❌ Requires careful testing **Recommendation:** **Implement migration system for FUTURE changes**, but start fresh for this one. --- ## Proposed Schema Change ### New Schema (v11) ```sql CREATE TABLE events ( id TEXT PRIMARY KEY, pubkey TEXT NOT NULL, created_at INTEGER NOT NULL, kind INTEGER NOT NULL, event_type TEXT NOT NULL CHECK (event_type IN ('regular', 'replaceable', 'ephemeral', 'addressable')), content TEXT NOT NULL, sig TEXT NOT NULL, tags JSON NOT NULL DEFAULT '[]', event_json TEXT NOT NULL, -- NEW: Full event as JSON string first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now')) ); -- Keep all existing indexes (they query the columns, not event_json) CREATE INDEX idx_events_pubkey ON events(pubkey); CREATE INDEX idx_events_kind ON events(kind); CREATE INDEX idx_events_created_at ON events(created_at DESC); CREATE INDEX idx_events_kind_created_at ON events(kind, created_at DESC); CREATE INDEX idx_events_pubkey_created_at ON events(pubkey, created_at DESC); ``` ### Why Keep Both Columns AND event_json? **Columns (id, pubkey, kind, etc.):** - Used for **querying** (WHERE clauses, indexes) - Fast filtering and sorting - Required for SQL operations **event_json:** - Used for **retrieval** (SELECT results) - Pre-serialized, ready to send - Eliminates JSON reconstruction **This is a common pattern** in high-performance systems (denormalization for read performance). --- ## Implementation Plan ### Phase 1: Schema Update (v11) **File:** `src/sql_schema.h` ```c #define EMBEDDED_SCHEMA_VERSION "11" // In schema SQL: "CREATE TABLE events (\n\ id TEXT PRIMARY KEY,\n\ pubkey TEXT NOT NULL,\n\ created_at INTEGER NOT NULL,\n\ kind INTEGER NOT NULL,\n\ event_type TEXT NOT NULL,\n\ content TEXT NOT NULL,\n\ sig TEXT NOT NULL,\n\ tags JSON NOT NULL DEFAULT '[]',\n\ event_json TEXT NOT NULL,\n\ -- NEW COLUMN first_seen INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))\n\ );\n\ ``` ### Phase 2: Update store_event() Function **File:** `src/main.c` (lines 660-773) **Current:** ```c int store_event(cJSON* event) { // Extract fields cJSON* id = cJSON_GetObjectItem(event, "id"); // ... extract other fields ... // INSERT with individual columns const char* sql = "INSERT INTO events (id, pubkey, ...) VALUES (?, ?, ...)"; } ``` **New:** ```c int store_event(cJSON* event) { // Serialize event to JSON string ONCE char* event_json = cJSON_PrintUnformatted(event); if (!event_json) { return -1; } // Extract fields for indexed columns cJSON* id = cJSON_GetObjectItem(event, "id"); // ... extract other fields ... // INSERT with columns + event_json const char* sql = "INSERT INTO events (id, pubkey, ..., event_json) VALUES (?, ?, ..., ?)"; // ... bind parameters ... sqlite3_bind_text(stmt, 9, event_json, -1, SQLITE_TRANSIENT); // ... execute ... free(event_json); } ``` ### Phase 3: Update handle_req_message() Function **File:** `src/main.c` (lines 1302-1361) **Current:** ```c while (sqlite3_step(stmt) == SQLITE_ROW) { // Build event JSON from 7 columns cJSON* event = cJSON_CreateObject(); cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0)); // ... 6 more fields ... cJSON* tags = cJSON_Parse(tags_json); // Parse tags cJSON_AddItemToObject(event, "tags", tags); // Create EVENT message cJSON* event_msg = cJSON_CreateArray(); cJSON_AddItemToArray(event_msg, cJSON_CreateString("EVENT")); cJSON_AddItemToArray(event_msg, cJSON_CreateString(sub_id)); cJSON_AddItemToArray(event_msg, event); char* msg_str = cJSON_Print(event_msg); queue_message(wsi, pss, msg_str, msg_len, LWS_WRITE_TEXT); } ``` **New:** ```c // Update SQL to select event_json const char* sql = "SELECT event_json FROM events WHERE ..."; while (sqlite3_step(stmt) == SQLITE_ROW) { const char* event_json = (char*)sqlite3_column_text(stmt, 0); // Build EVENT message with pre-serialized event // Format: ["EVENT","sub_id",{...event_json...}] size_t msg_len = 12 + strlen(sub_id) + strlen(event_json); // ["EVENT","",""] char* msg_str = malloc(msg_len + 1); snprintf(msg_str, msg_len + 1, "[\"EVENT\",\"%s\",%s]", sub_id, event_json); queue_message(wsi, pss, msg_str, strlen(msg_str), LWS_WRITE_TEXT); free(msg_str); } ``` **Speedup:** 366 × (cJSON operations) eliminated! --- ## Database Migration System Design ### For Future Schema Changes **File:** `src/migrations.c` (new file) ```c typedef struct { int from_version; int to_version; const char* description; int (*migrate_func)(sqlite3* db); } migration_t; // Migration from v10 to v11: Add event_json column int migrate_v10_to_v11(sqlite3* db) { // Step 1: Add column const char* add_column_sql = "ALTER TABLE events ADD COLUMN event_json TEXT"; if (sqlite3_exec(db, add_column_sql, NULL, NULL, NULL) != SQLITE_OK) { return -1; } // Step 2: Populate event_json for existing events const char* select_sql = "SELECT id, pubkey, created_at, kind, content, sig, tags FROM events"; sqlite3_stmt* stmt; if (sqlite3_prepare_v2(db, select_sql, -1, &stmt, NULL) != SQLITE_OK) { return -1; } while (sqlite3_step(stmt) == SQLITE_ROW) { // Reconstruct JSON cJSON* event = cJSON_CreateObject(); cJSON_AddStringToObject(event, "id", (char*)sqlite3_column_text(stmt, 0)); // ... add other fields ... char* event_json = cJSON_PrintUnformatted(event); // Update row const char* update_sql = "UPDATE events SET event_json = ? WHERE id = ?"; sqlite3_stmt* update_stmt; sqlite3_prepare_v2(db, update_sql, -1, &update_stmt, NULL); sqlite3_bind_text(update_stmt, 1, event_json, -1, SQLITE_TRANSIENT); sqlite3_bind_text(update_stmt, 2, (char*)sqlite3_column_text(stmt, 0), -1, SQLITE_STATIC); sqlite3_step(update_stmt); sqlite3_finalize(update_stmt); free(event_json); cJSON_Delete(event); } sqlite3_finalize(stmt); // Step 3: Make column NOT NULL // (SQLite doesn't support ALTER COLUMN, so we'd need to recreate table) return 0; } // Migration registry static migration_t migrations[] = { {10, 11, "Add event_json column for fast retrieval", migrate_v10_to_v11}, // Future migrations go here }; int run_migrations(sqlite3* db, int current_version, int target_version) { for (int i = 0; i < sizeof(migrations) / sizeof(migration_t); i++) { if (migrations[i].from_version >= current_version && migrations[i].to_version <= target_version) { printf("Running migration: %s\n", migrations[i].description); if (migrations[i].migrate_func(db) != 0) { fprintf(stderr, "Migration failed: %s\n", migrations[i].description); return -1; } // Update schema version char update_version_sql[256]; snprintf(update_version_sql, sizeof(update_version_sql), "PRAGMA user_version = %d", migrations[i].to_version); sqlite3_exec(db, update_version_sql, NULL, NULL, NULL); } } return 0; } ``` --- ## Recommendation: Hybrid Approach ### For This Change (v10 → v11): Fresh Start **Rationale:** 1. Your relay is still in development 2. Migration would be slow (reconstruct JSON for all events) 3. You've been doing fresh starts anyway 4. Clean slate for performance testing **Steps:** 1. Update schema to v11 with event_json column 2. Update store_event() to populate event_json 3. Update handle_req_message() to use event_json 4. Deploy with fresh database 5. Test performance improvement ### For Future Changes: Use Migration System **Rationale:** 1. Once relay is in production, data preservation matters 2. Migration system is reusable 3. Professional approach for production relay **Steps:** 1. Create `src/migrations.c` and `src/migrations.h` 2. Implement migration framework 3. Add migration functions for each schema change 4. Test migrations thoroughly before deployment --- ## Migration System Features ### Core Features 1. **Version Detection** - Read current schema version from database - Compare with embedded schema version - Determine which migrations to run 2. **Migration Chain** - Run migrations in sequence (v8 → v9 → v10 → v11) - Skip already-applied migrations - Stop on first failure 3. **Backup Before Migration** - Automatic database backup before migration - Rollback capability if migration fails - Backup retention policy 4. **Progress Reporting** - Log migration progress - Show estimated time remaining - Report success/failure ### Safety Features 1. **Transaction Wrapping** ```c sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL); int result = migrate_v10_to_v11(db); if (result == 0) { sqlite3_exec(db, "COMMIT", NULL, NULL, NULL); } else { sqlite3_exec(db, "ROLLBACK", NULL, NULL, NULL); } ``` 2. **Validation After Migration** - Verify row counts match - Check data integrity - Validate indexes created 3. **Dry-Run Mode** - Test migration without committing - Report what would be changed - Estimate migration time --- ## Implementation Timeline ### Immediate (Today): Fresh Start with event_json **Changes:** 1. Update schema to v11 (add event_json column) 2. Update store_event() to populate event_json 3. Update handle_req_message() to use event_json 4. Deploy with fresh database **Effort:** 4 hours **Impact:** 2,500x faster event retrieval ### This Week: Build Migration Framework **Changes:** 1. Create src/migrations.c and src/migrations.h 2. Implement migration runner 3. Add backup/rollback capability 4. Add progress reporting **Effort:** 1-2 days **Impact:** Reusable for all future schema changes ### Future: Add Migrations as Needed **For each schema change:** 1. Write migration function 2. Add to migrations array 3. Test thoroughly 4. Deploy with automatic migration --- ## Code Structure ### File Organization ``` src/ ├── migrations.c # NEW: Migration system ├── migrations.h # NEW: Migration API ├── sql_schema.h # Schema definition (v11) ├── main.c # Updated store_event() and handle_req_message() └── ... ``` ### Migration API ```c // migrations.h int init_migration_system(sqlite3* db); int run_pending_migrations(sqlite3* db); int backup_database(const char* db_path, char* backup_path, size_t backup_path_size); int rollback_migration(sqlite3* db, const char* backup_path); ``` --- ## Testing Strategy ### For Fresh Start (v11) 1. **Local testing:** - Build with new schema - Post test events - Query events and measure performance - Verify event_json is populated correctly 2. **Performance testing:** - Query 366 events - Measure time (should be <10ms instead of 18s) - Check CPU usage (should be <20%) 3. **Production deployment:** - Stop relay - Delete old database - Start relay with v11 schema - Monitor performance ### For Migration System (Future) 1. **Unit tests:** - Test each migration function - Test rollback capability - Test error handling 2. **Integration tests:** - Create database with old schema - Run migration - Verify data integrity - Test rollback 3. **Performance tests:** - Measure migration time for large databases - Test with 10K, 100K, 1M events - Optimize slow migrations --- ## Migration Complexity Analysis ### For v10 → v11 Migration **If we were to migrate existing data:** ```sql -- Step 1: Add column (fast) ALTER TABLE events ADD COLUMN event_json TEXT; -- Step 2: Populate event_json (SLOW!) -- For each of N events: -- 1. SELECT 7 columns -- 2. Reconstruct JSON (cJSON operations) -- 3. Serialize to string (cJSON_Print) -- 4. UPDATE event_json column -- 5. Free memory -- Estimated time: -- - 1000 events: ~10 seconds -- - 10000 events: ~100 seconds -- - 100000 events: ~1000 seconds (16 minutes) ``` **Conclusion:** Migration is expensive for this change. Fresh start is better. --- ## Future Migration Examples ### Easy Migrations (Fast) **Adding an index:** ```c int migrate_add_index(sqlite3* db) { return sqlite3_exec(db, "CREATE INDEX idx_new ON events(new_column)", NULL, NULL, NULL); } ``` **Adding a column with default:** ```c int migrate_add_column(sqlite3* db) { return sqlite3_exec(db, "ALTER TABLE events ADD COLUMN new_col TEXT DEFAULT ''", NULL, NULL, NULL); } ``` ### Hard Migrations (Slow) **Changing column type:** - Requires table recreation - Copy all data - Recreate indexes - Can take minutes for large databases **Populating computed columns:** - Requires row-by-row processing - Can take minutes for large databases --- ## Recommendation Summary ### For This Change (event_json) **Do:** Fresh start with v11 schema - Fast deployment - Clean implementation - Immediate performance benefit - No migration complexity **Don't:** Migrate existing data - Too slow (reconstruct JSON for all events) - Too complex (first migration) - Not worth it (relay still in development) ### For Future Changes **Do:** Implement migration system - Professional approach - Data preservation - Reusable framework - Required for production relay **Timeline:** - **Today:** Deploy v11 with fresh start - **This week:** Build migration framework - **Future:** Use migrations for all schema changes --- ## Next Steps 1. ✅ Update schema to v11 (add event_json column) 2. ✅ Update store_event() to populate event_json 3. ✅ Update handle_req_message() to use event_json 4. ✅ Test locally with 366-event query 5. ✅ Deploy to production with fresh database 6. ✅ Measure performance improvement 7. ⏳ Build migration system for future use **Expected result:** 366-event retrieval time drops from 18s to <10ms (2,500x speedup)