Files
c-relay/query_analysis_report.md

343 lines
10 KiB
Markdown

# Database Query Performance Analysis Report
**Analysis Date:** 2026-02-02
**Log Duration:** ~6 minutes (15:24:50 - 15:30:58)
**Total Queries:** 366 queries
**Data Source:** serverlog.txt
---
## Executive Summary
The relay is experiencing moderate performance issues with an average query time of **10.4ms** and a maximum query time of **672ms**. The primary bottlenecks are:
1. **Tag-based searches using `json_each()`** - 53% of all queries (194/366)
2. **Monitoring system queries** - Taking 540-550ms each
3. **Multiple pubkey lookups** - Kind 10002 queries with 15-50 pubkeys
---
## Query Performance Metrics
### Overall Statistics
- **Total Queries:** 366
- **Average Query Time:** 10,440 μs (10.4 ms)
- **Minimum Query Time:** 14 μs
- **Maximum Query Time:** 672,846 μs (672.8 ms)
- **Slow Queries (>10ms):** 8 queries (2.2%)
### Query Type Breakdown
| Type | Count | Percentage |
|------|-------|------------|
| REQ | 359 | 98.1% |
| MONITOR | 7 | 1.9% |
---
## Critical Performance Issues
### 1. **SLOWEST QUERY: 672ms Tag Search (IP: 192.42.116.178)**
```sql
SELECT id, pubkey, created_at, kind, content, sig, tags
FROM events
WHERE 1=1
AND (kind < 20000 OR kind >= 30000)
AND kind IN (5,6300,7000,2004,1622)
AND EXISTS (
SELECT 1 FROM json_each(json(tags))
WHERE json_extract(value, '$[0]') = ?
AND json_extract(value, '$[1]') IN (?)
)
ORDER BY created_at DESC
LIMIT 100
```
**Problem:** Full table scan with JSON parsing for every row
**Impact:** 672ms for 0 results (wasted computation)
**Root Cause:** No index on tag values, requires scanning all events
---
### 2. **Monitoring System Queries: 540-550ms Each**
```sql
SELECT * FROM active_subscriptions_log
ORDER BY created_at DESC
```
**Occurrences:** 4 queries in 6 minutes
**Average Time:** 545ms
**Rows Returned:** 20-52 rows
**Problem:** Extremely slow for small result sets
**Root Cause:** Likely missing index on `created_at` column
---
### 3. **Tag-Based Searches (json_each) - 53% of All Queries**
- **Total:** 194 queries (53% of all queries)
- **Pattern:** `EXISTS (SELECT 1 FROM json_each(json(tags)) WHERE ...)`
- **Most Common:** Kind 1984 (105 queries), Kind 1111 (47 queries)
- **Problem:** Every tag search requires full JSON parsing
**Example Pattern:**
```sql
WHERE kind IN (1984)
AND pubkey IN (?)
AND EXISTS (
SELECT 1 FROM json_each(json(tags))
WHERE json_extract(value, '$[0]') = ?
AND json_extract(value, '$[1]') IN (?,?,?,?,?,?,?,?,?,?,?)
)
```
---
### 4. **Multiple Pubkey Lookups (Kind 10002)**
- **Total:** 64 queries for kind 10002
- **Average Time:** 2,500-3,300 μs (2.5-3.3ms)
- **Pattern:** Queries with 15-50 pubkeys in `IN` clause
- **Problem:** Large `IN` clauses without proper indexing
**Example:**
```sql
WHERE kind IN (10002)
AND pubkey IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
```
---
## Client Activity Analysis
### Top Query Sources (by IP)
| IP Address | Query Count | Percentage | Notes |
|------------|-------------|------------|-------|
| 45.84.107.222 | 101 | 27.6% | **Highest activity** |
| 23.234.109.54 | 69 | 18.9% | Second highest |
| 185.220.101.38 | 56 | 15.3% | Third highest |
| 192.42.116.178 | 51 | 13.9% | **Source of 672ms query** |
| 149.22.80.85 | 34 | 9.3% | |
| 174.138.53.241 | 24 | 6.6% | |
| Others | 31 | 8.5% | 6 other IPs |
**Observation:** Top 3 IPs account for 61.8% of all queries
---
## Most Common Query Patterns
| Kind Filter | Query Count | Percentage |
|-------------|-------------|------------|
| kind IN (1984) | 105 | 28.7% |
| kind IN (10002) | 64 | 17.5% |
| kind IN (1111) | 47 | 12.8% |
| kind IN (0,2,3,10002) | 24 | 6.6% |
| kind IN (9735) | 23 | 6.3% |
| kind IN (0,30315,10002,10050) | 20 | 5.5% |
| Others | 83 | 22.7% |
---
## Optimization Recommendations
### Priority 1: Critical (Immediate Action Required)
#### 1.1 Add Index on `active_subscriptions_log.created_at`
**Impact:** Will reduce monitoring queries from 540ms to <10ms
**Effort:** Low
**SQL:**
```sql
CREATE INDEX IF NOT EXISTS idx_active_subscriptions_created_at
ON active_subscriptions_log(created_at DESC);
```
#### 1.2 Implement Tag Indexing System
**Impact:** Will reduce tag searches from 100-600ms to <10ms
**Effort:** High
**Options:**
- **Option A:** Create separate `event_tags` table with indexes
```sql
CREATE TABLE event_tags (
event_id TEXT NOT NULL,
tag_name TEXT NOT NULL,
tag_value TEXT NOT NULL,
FOREIGN KEY (event_id) REFERENCES events(id)
);
CREATE INDEX idx_event_tags_lookup ON event_tags(tag_name, tag_value);
```
- **Option B:** Use SQLite JSON1 extension with generated columns (if available)
- **Option C:** Implement application-level tag caching
**Recommended:** Option A (most reliable and performant)
---
### Priority 2: High (Implement Within Week)
#### 2.1 Optimize Multiple Pubkey Queries
**Current:** `pubkey IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)`
**Problem:** Large IN clauses are inefficient
**Solution:**
- Add composite index: `CREATE INDEX idx_events_kind_pubkey ON events(kind, pubkey, created_at DESC);`
- Consider query rewriting for >10 pubkeys
#### 2.2 Add Query Result Caching
**Target Queries:**
- Kind 0 (profile) lookups - frequently repeated
- Kind 10002 (relay lists) - relatively static
- Kind 1984 (reports) - can be cached for 30-60 seconds
**Implementation:**
- Use in-memory LRU cache (e.g., 1000 entries)
- Cache key: hash of SQL + parameters
- TTL: 30-60 seconds for most queries
---
### Priority 3: Medium (Optimize Over Time)
#### 3.1 Disable or Throttle Monitoring Queries
**Current:** Monitoring queries run every ~60 seconds
**Impact:** Each query takes 540ms
**Options:**
- Increase throttle to 300 seconds (5 minutes)
- Disable monitoring entirely if not actively used
- Optimize `active_subscriptions_log` table structure
#### 3.2 Implement Query Complexity Limits
**Problem:** Some queries scan entire table (e.g., 672ms query returned 0 results)
**Solution:**
- Reject queries without time bounds (require `since` or `until`)
- Limit number of kinds in single query (max 10)
- Limit number of pubkeys in single query (max 20)
#### 3.3 Add Composite Indexes for Common Patterns
```sql
-- For kind + created_at queries (most common pattern)
CREATE INDEX idx_events_kind_created ON events(kind, created_at DESC);
-- For kind + pubkey + created_at queries
CREATE INDEX idx_events_kind_pubkey_created ON events(kind, pubkey, created_at DESC);
```
---
## Database Schema Recommendations
### Current Issues
1. **No tag indexing** - Forces full table scans with JSON parsing
2. **Missing created_at indexes** - Monitoring queries are extremely slow
3. **No composite indexes** - Multi-condition queries inefficient
### Recommended Schema Changes
```sql
-- 1. Add tag indexing table
CREATE TABLE IF NOT EXISTS event_tags (
event_id TEXT NOT NULL,
tag_name TEXT NOT NULL,
tag_value TEXT NOT NULL,
FOREIGN KEY (event_id) REFERENCES events(id) ON DELETE CASCADE
);
CREATE INDEX idx_event_tags_lookup ON event_tags(tag_name, tag_value);
CREATE INDEX idx_event_tags_event_id ON event_tags(event_id);
-- 2. Add monitoring table index
CREATE INDEX IF NOT EXISTS idx_active_subscriptions_created_at
ON active_subscriptions_log(created_at DESC);
-- 3. Add composite indexes for common query patterns
CREATE INDEX IF NOT EXISTS idx_events_kind_created
ON events(kind, created_at DESC);
CREATE INDEX IF NOT EXISTS idx_events_kind_pubkey_created
ON events(kind, pubkey, created_at DESC);
-- 4. Add index for pubkey lookups
CREATE INDEX IF NOT EXISTS idx_events_pubkey_created
ON events(pubkey, created_at DESC);
```
---
## Expected Performance Improvements
| Optimization | Current Avg | Expected Avg | Improvement |
|--------------|-------------|--------------|-------------|
| Tag searches (with event_tags table) | 100-600ms | 5-20ms | **95-97%** |
| Monitoring queries (with index) | 540ms | 5-10ms | **98%** |
| Multiple pubkey queries (with index) | 2.5-3.3ms | 0.5-1ms | **70-80%** |
| Overall average query time | 10.4ms | 2-3ms | **70-80%** |
---
## Client Behavior Analysis
### Potentially Abusive Patterns
#### IP: 192.42.116.178 (51 queries)
- **Issue:** Generated the slowest query (672ms)
- **Pattern:** Complex tag searches with multiple kinds
- **Recommendation:** Monitor for repeated expensive queries
#### IP: 45.84.107.222 (101 queries - 27.6% of total)
- **Issue:** Highest query volume
- **Pattern:** Mix of kind 10002, 1984, and various other kinds
- **Recommendation:** Acceptable if queries are efficient; monitor for abuse
### Normal Behavior
- Most queries are <1ms (fast)
- Majority return 0-10 rows (reasonable)
- Query patterns match typical Nostr client behavior
---
## Action Plan
### Immediate (Today)
1. ✅ Add index on `active_subscriptions_log.created_at`
2. ✅ Increase monitoring throttle from 60s to 300s (or disable)
3. ✅ Monitor IP 192.42.116.178 for repeated expensive queries
### This Week
1. ⏳ Design and implement `event_tags` table
2. Add composite indexes for common query patterns
3. Implement query complexity limits (require time bounds)
### This Month
1. Implement query result caching (LRU cache)
2. Add query cost estimation and rejection
3. Optimize subscription matching algorithm
---
## Monitoring Recommendations
### Key Metrics to Track
1. **Average query time** - Target: <5ms
2. **P95 query time** - Target: <50ms
3. **P99 query time** - Target: <100ms
4. **Queries >100ms** - Target: <1% of queries
5. **Tag search percentage** - Target: <30% after optimization
### Alert Thresholds
- **Critical:** Average query time >20ms for 5 minutes
- **Warning:** Any single query >1000ms (1 second)
- **Info:** Client making >100 queries/minute
---
## Conclusion
The relay is experiencing performance issues primarily due to:
1. **Lack of tag indexing** (53% of queries affected)
2. **Missing indexes on monitoring tables** (540ms queries)
3. **Inefficient multiple pubkey lookups**
Implementing the recommended optimizations will reduce average query time from **10.4ms to 2-3ms** (70-80% improvement) and eliminate the 500-600ms slow queries entirely.
**Priority:** Implement tag indexing system and add missing indexes within 1 week to prevent performance degradation as database grows.