Commit Graph

61 Commits

Author SHA1 Message Date
42ff7c9f7c Feature: Storage Analysis - nova seção para análise de storage com métricas, gráficos e tabelas detalhadas 2025-10-17 10:05:57 -03:00
93a7a0988a feat: implement batch processing for large clusters (100 pods per batch) with memory optimization and progress tracking 2025-10-15 16:22:40 -03:00
32c074f9b8 fix: correct endpoint default to exclude system namespaces and revert configmap to proper user namespace filtering 2025-10-06 16:33:23 -03:00
f2713329bb fix: include system namespaces in validations endpoint to detect resource-governance workload issues 2025-10-06 16:02:27 -03:00
16a0429cc6 remove: eliminate all mock data and placeholder comments 2025-10-06 15:33:39 -03:00
3c7e2f7fa1 fix: correct namespaces_in_overcommit calculation for string list 2025-10-06 15:24:00 -03:00
c60d815a61 fix: add missing namespaces_list variable for cluster status API 2025-10-06 15:22:44 -03:00
c274269eb9 optimize: reduce cluster/status API response size by removing heavy pod data 2025-10-06 15:21:09 -03:00
8c616652af feat: implement ThanosClient for historical data queries and hybrid Prometheus+Thanos architecture 2025-10-06 12:14:40 -03:00
bd83be20e5 fix: handle Celery task error info properly in status API 2025-10-06 11:00:06 -03:00
bf06ae190a fix: correct KubernetesClient import to K8sClient in Celery tasks 2025-10-06 10:40:20 -03:00
8d92d19433 Fix: Dashboard charts now use real cluster data instead of mock data 2025-10-06 09:35:08 -03:00
eddc492d0e Add real namespace distribution data for dashboard chart
- Create new API endpoint /api/v1/namespace-distribution
- Replace mock data with real cluster data
- Add CPU and memory parsing functions
- Update frontend to use real data with enhanced chart
- Add hover effects and summary statistics
2025-10-04 11:43:22 -03:00
9b2dd69781 Implement Phase 1: Performance Optimization - 10x Improvement
- Add OptimizedPrometheusClient with aggregated queries (1 query vs 6 per workload)
- Implement intelligent caching system with 5-minute TTL and hit rate tracking
- Add MAX_OVER_TIME queries for peak usage analysis and realistic recommendations
- Create new optimized API endpoints for 10x faster workload analysis
- Add WorkloadMetrics and ClusterMetrics data structures for better performance
- Implement cache statistics and monitoring capabilities
- Focus on workload-level analysis (not individual pods) for persistent insights
- Maintain OpenShift-specific Prometheus queries for accurate cluster analysis
- Add comprehensive error handling and fallback mechanisms
- Enable parallel query processing for maximum performance

Performance Improvements:
- 10x reduction in Prometheus queries (60 queries → 6 queries for 10 workloads)
- 5x improvement with intelligent caching (80% hit rate expected)
- Real-time peak usage analysis with MAX_OVER_TIME
- Workload-focused analysis for persistent resource governance
- Optimized for OpenShift administrators' main pain point: identifying projects with missing/misconfigured requests and limits
2025-10-04 09:01:19 -03:00
a4cf3d65bc Implement OpenShift Console exact queries for CPU and Memory Usage
- Add get_workload_cpu_summary() and get_workload_memory_summary() methods
- Use exact OpenShift Console PromQL queries for data consistency
- Update historical analysis API endpoints to include real CPU/Memory data
- Document all OpenShift Console queries in AIAgents-Support.md
- Fix CPU Usage and Memory Usage columns showing N/A in Historical Analysis
2025-10-03 20:19:42 -03:00
6c2821609c Fix: pass time_range parameter to generate_recommendations for proper 7-day data 2025-10-03 09:41:02 -03:00
74f579050c feat: implement real Resource Utilization with Prometheus
- Add get_cluster_resource_utilization() method to PrometheusClient
- Use real CPU and memory usage vs requests data from Prometheus
- Replace placeholder 75% with actual cluster resource utilization
- Update modal to show production-ready status instead of placeholder
- Add automatic fallback to simulated data if Prometheus unavailable
- Calculate overall utilization as average of CPU and memory efficiency
2025-10-02 18:57:10 -03:00
64e17eb521 feat: implement VPA CRD support
- Add CustomObjectsApi integration for VPA resources
- Implement VPA CRUD operations (list, create, delete)
- Add VPA recommendation collection via CRD
- Add API endpoints for VPA management
- Handle VPA installation detection gracefully
- Complete TODO #1: CRD para VPA implementation
2025-10-02 18:50:56 -03:00
a1a70bae45 Implement smart recommendations application and improve VPA modal contrast 2025-10-02 17:30:05 -03:00
c6f69f85c9 fix: correct historical analysis endpoint and Chart.js loading
- Fix endpoint to use get_all_pods() instead of non-existent get_pods_by_selector()
- Move Chart.js scripts to end of body for proper loading order
- Add proper error handling for workload not found cases
- Ensure Chart.js is available before creating graphs
2025-10-02 15:47:13 -03:00
fa48e1de06 fix: remove self reference from function call 2025-10-02 10:56:14 -03:00
d35b637ba7 fix: use pod name extraction instead of labels for workload grouping 2025-10-02 10:55:12 -03:00
5168311e74 fix: correct PodResource attribute access in historical analysis endpoint 2025-10-02 10:53:20 -03:00
43c618cbc4 fix: add historical analysis endpoints and fix FontAwesome
- Add /api/v1/historical-analysis endpoint for workload list
- Add /api/v1/historical-analysis/{namespace}/{workload} for details
- Fix FontAwesome CDN to use working version
- Update todo list with progress
2025-10-02 10:51:33 -03:00
e39668e480 Implement Smart Recommendations Engine with dashboard and modals 2025-10-02 08:17:22 -03:00
f6de5a5f30 Add PromQL queries display in historical analysis
- Include PromQL queries in API response for workload metrics
- Display queries in historical analysis modal with copy functionality
- Add professional styling for query display sections
- Enable users to copy and validate queries in OpenShift Console
- Organize queries by category: cluster totals, usage, requests, limits
- Add copy-to-clipboard functionality with visual feedback
2025-10-02 07:34:02 -03:00
4721a1ef37 Fix historical analysis contradictions and implement workload-based analysis
- Fix insufficient_historical_data vs historical_analysis contradiction
- Add return statement when insufficient data to prevent P99 calculation
- Implement workload-based historical analysis instead of pod-based
- Add _extract_workload_name() to identify workload from pod names
- Add analyze_workload_historical_usage() for workload-level analysis
- Add _analyze_workload_metrics() with Prometheus workload queries
- Add validate_workload_resources_with_historical_analysis() method
- Update /cluster/status endpoint to use workload analysis by namespace
- Improve reliability by analyzing workloads instead of individual pods
- Maintain fallback to pod-level analysis if workload analysis fails
2025-10-01 16:32:12 -03:00
6f5c8b0cac Fix duplicate validations in cluster status
- Remove duplicate static validations from /cluster/status endpoint
- Use only historical analysis which includes static validations
- Add fallback to static validations only if historical analysis fails
- Eliminate duplicate invalid_ratio and container_metrics validations
- Improve validation efficiency and reduce redundancy
2025-10-01 16:25:38 -03:00
2bb5266753 Improve overcommit UI with info icons and modals
- Replace tooltips with info icons (ℹ️) next to CPU/Memory Overcommit
- Add modal dialogs showing detailed overcommit calculations
- Change Resource Quota Coverage to Resource Utilization
- Add CSS styling for overcommit details modals
- Improve UX with clickable info icons instead of hover tooltips
- Show capacity, requests, overcommit percentage, and available resources
2025-10-01 15:41:43 -03:00
8984701bf3 Add detailed tooltips for overcommit metrics
- Add tooltips showing capacity, requests, and calculation details
- Include CPU and Memory capacity/requests in API response
- Add CSS styling for tooltip hover effects
- Show detailed breakdown: Capacity Total, Requests Total, and calculation formula
- Improve user experience with transparent overcommit information
2025-10-01 15:33:39 -03:00
b7bfd33a28 Add debug logging for overcommit calculation 2025-10-01 15:29:43 -03:00
b83c55bf08 Fix Cluster Overcommit Summary display
- Add overcommit data processing in /cluster/status endpoint
- Extract CPU/Memory capacity and requests from Prometheus
- Calculate overcommit percentages and resource quota coverage
- Update frontend to use new overcommit data structure
- Fix issue where Cluster Overcommit Summary was showing all zeros
2025-10-01 15:13:04 -03:00
fae1d6fb18 Fix workload metrics API pod name matching
- Use regex pattern pod=~"{workload}.*" in workload metrics API
- This matches the fix applied to historical analysis
- Should resolve issue where resource-governance workload data was not being retrieved
- Both historical analysis and workload metrics now use consistent pod name matching
2025-10-01 14:57:27 -03:00
ee20a09147 Fix data unification and efficiency calculations
- Unify Prometheus queries between namespace analysis and historical analysis
- Fix efficiency calculations to prevent division by zero
- Remove duplicate validations in validation service
- Improve frontend data display with clear numerical values
- Add proper error handling for missing data
2025-10-01 14:43:43 -03:00
6ad1997afd Remove simulated data and enable real Prometheus metrics 2025-09-30 21:13:46 -03:00
20ae326158 Fix: historical analysis implementation with OpenShift-specific Prometheus queries 2025-09-30 21:01:00 -03:00
3445f58a11 Update Prometheus queries to use OpenShift-specific metrics
- Use node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate for CPU usage
- Use container_memory_working_set_bytes with kubelet job for memory usage
- Use kube_pod_container_resource_requests/limits with kube-state-metrics job
- Add workload-specific filtering to match OpenShift dashboard behavior
- This should resolve the 'insufficient data' issue by using the same metrics as OpenShift
2025-09-30 20:42:59 -03:00
0068db5a9e Fix remaining indentation error in routes.py 2025-09-30 18:07:05 -03:00
7efbd94b50 Fix indentation errors in routes.py 2025-09-30 18:06:48 -03:00
5f3f737b3a Add simulated data fallback for historical analysis when Prometheus is not accessible 2025-09-30 18:06:10 -03:00
2b2b3c23b2 Fix: Historical analysis now shows real consumption numbers and percentages relative to cluster totals 2025-09-30 18:03:17 -03:00
a847f0cd92 Fix: Add missing PrometheusClient import for workload metrics endpoint 2025-09-30 17:43:22 -03:00
f0d3831263 Feature: Add real Prometheus metrics visualization for historical analysis 2025-09-30 17:41:39 -03:00
d683704593 Fix: Integrate historical analysis validations in cluster status endpoint 2025-09-30 17:37:09 -03:00
f3b8022224 Phase 1.2: Complete Historical Analysis Integration - Add insufficient data detection, seasonal patterns, and integrate in main dashboard 2025-09-30 16:48:31 -03:00
fa8f3a41e5 Implement simplified UI/UX with health scores and grouped validations 2025-09-30 09:37:49 -03:00
021ce06323 Fix: corrigido erro 500 na análise por namespace - adicionado suporte para severidade 'info' 2025-09-29 21:48:43 -03:00
3a5af8ce67 Feat: implementar dashboard de cluster health com QoS e Resource Quotas
- Adicionar modelos para QoSClassification, ResourceQuota e ClusterHealth
- Implementar classificação automática de QoS (Guaranteed, Burstable, BestEffort)
- Criar análise de Resource Quotas com recomendações automáticas
- Adicionar dashboard principal com visão geral do cluster
- Implementar análise de overcommit com métricas visuais
- Adicionar top resource consumers com ranking
- Criar distribuição de QoS com estatísticas
- Adicionar novos endpoints API para cluster health e QoS
- Melhorar interface com design responsivo e intuitivo
- Alinhar com práticas Red Hat para gerenciamento de recursos
2025-09-29 16:35:07 -03:00
afc7462b40 Feat: implementar sistema de recomendações inteligentes e categorização de workloads 2025-09-29 15:26:09 -03:00
514ea60274 Fix namespace historical analysis - use Kubernetes API for accurate pod count and remove duplicate function 2025-09-29 14:07:49 -03:00