Add performance analysis and optimization roadmap to documentation

2025-10-04 07:53:16 -03:00
parent 221b68be49
commit 6edbaa0b82

@@ -333,6 +333,61 @@ curl http://localhost:8080/health
- **Cluster Agnostic**: Works on any OpenShift 4.x cluster
- **Production Tested**: Deployed on OCP 4.15, 4.18, and 4.19
### **Performance Analysis & Optimization Roadmap**
**📊 Current Performance Analysis:**
- **Query Efficiency**: Currently using individual queries per workload (6 queries × N workloads)
- **Response Time**: 30-60 seconds for 10 workloads
- **Cache Strategy**: No caching implemented
- **Batch Processing**: Sequential workload processing
**🎯 Performance Optimization Plan:**
- **Phase 1**: Aggregated Queries (10x performance improvement)
- **Phase 2**: Intelligent Caching (5x performance improvement)
- **Phase 3**: Batch Processing (3x performance improvement)
- **Phase 4**: Advanced Queries with MAX_OVER_TIME and percentiles
**Expected Results**: Roughly 10x faster response times (from 30-60s to 3-6s for 10 workloads)
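A minimal sketch of what Phase 1 could look like: one PromQL expression covering every workload namespace, grouped by namespace and pod, instead of a separate query per workload. The helper name `build_aggregated_cpu_query`, the 5m rate window, and the choice to select workloads by namespace are assumptions made for illustration, not ORU Analyzer's actual code. `container_cpu_usage_seconds_total` is the standard cAdvisor CPU metric.

```python
import re

# Kubernetes namespace names are DNS labels (lowercase alphanumerics and "-"),
# so they can be joined into a regex alternation without escaping.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def build_aggregated_cpu_query(namespaces, rate_window="5m"):
    """Return one PromQL query for CPU usage across all given namespaces.

    Hypothetical helper: the result is split per workload by the
    `namespace` and `pod` labels on each returned series.
    """
    unique = sorted(set(namespaces))
    if not unique:
        raise ValueError("at least one namespace is required")
    for ns in unique:
        if not _DNS_LABEL.match(ns):
            raise ValueError(f"invalid namespace name: {ns!r}")

    selector = "|".join(unique)
    # container!="" drops the pod-level aggregate series that cAdvisor also exports.
    return (
        "sum by (namespace, pod) ("
        f'rate(container_cpu_usage_seconds_total{{namespace=~"{selector}", container!=""}}[{rate_window}])'
        ")"
    )
```

With this shape, the number of queries depends on the number of metrics (6) rather than on 6 × N, and the per-workload values are recovered by grouping the returned series on their labels.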
### **🔍 Performance Analysis: ORU Analyzer vs thanos-metrics-analyzer**
**Our Current Approach:**
```python
# ✅ STRENGTHS:
# - Dynamic step calculation based on time range
# - Async queries with aiohttp
# - Individual workload precision
# - OpenShift-specific queries
# ❌ WEAKNESSES:
# - 6 queries per workload (60 queries for 10 workloads)
# - No caching mechanism
# - Sequential processing
# - No batch optimization
```
**thanos-metrics-analyzer Approach:**
```python
# ✅ STRENGTHS:
# - MAX_OVER_TIME for peak usage analysis
# - Batch processing with cluster grouping
# - Aggregated queries for multiple workloads
# - Efficient data processing with pandas
# ❌ WEAKNESSES:
# - Synchronous queries (prometheus_api_client)
# - Fixed resolution (10m step)
# - No intelligent caching
# - Less granular workload analysis
```
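For reference, peak and percentile analysis of this kind can be expressed in PromQL with `max_over_time` and `quantile_over_time` applied to a subquery. The sketch below only builds the query strings; the helper names `peak_over` and `percentile_over` and the default 1d lookback with 5m resolution are assumptions for illustration, not values taken from either tool.

```python
def peak_over(inner_expr, lookback="1d", resolution="5m"):
    """Wrap an instant-vector expression so it reports its peak over `lookback`.

    Uses PromQL subquery syntax: (<expr>)[<range>:<resolution>].
    """
    return f"max_over_time(({inner_expr})[{lookback}:{resolution}])"


def percentile_over(inner_expr, quantile=0.95, lookback="1d", resolution="5m"):
    """Wrap an expression so it reports the given quantile over `lookback`."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    return f"quantile_over_time({quantile}, ({inner_expr})[{lookback}:{resolution}])"
```

Because the subquery resolution is a parameter here, the same helpers could be combined with a dynamically calculated step rather than a fixed 10m one.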
**🚀 Optimization Strategy:**
1. **Aggregated Queries**: Single query for all workloads instead of N×6 queries
2. **Intelligent Caching**: 5-minute TTL cache for repeated queries
3. **Batch Processing**: Process workloads in groups of 5
4. **Advanced Queries**: Implement MAX_OVER_TIME and percentiles, as thanos-metrics-analyzer does
5. **Async + Batch**: Combine our async approach with thanos-metrics-analyzer's batch processing
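The caching and batching items above can be sketched together as follows. This is a minimal illustration under stated assumptions: `TTLCache`, `analyze_in_batches`, and the `fetch` coroutine are hypothetical names, the cache is in-memory only, and `fetch` stands in for whatever async function actually queries Prometheus for one workload.

```python
import asyncio
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expiry_time, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


async def analyze_in_batches(workloads, fetch, cache, batch_size=5):
    """Return {workload: result}, fetching cache misses `batch_size` at a time.

    `fetch` is an assumed async callable taking one workload name.
    A result of None is treated as a cache miss and fetched again.
    """
    results = {}
    pending = []
    for name in dict.fromkeys(workloads):  # de-duplicate, keep order
        cached = cache.get(name)
        if cached is not None:
            results[name] = cached
        else:
            pending.append(name)

    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        # Queries within a batch run concurrently; batches run one after another.
        fetched = await asyncio.gather(*(fetch(name) for name in batch))
        for name, value in zip(batch, fetched):
            cache.set(name, value)
            results[name] = value
    return results
```

Running batches one after another caps the number of concurrent queries at `batch_size`, while the 5-minute TTL lets repeated requests for the same workload skip the query entirely.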
## 📝 Roadmap
### 🎯 **PRAGMATIC ROADMAP - Resource Governance Focus**