openshift-resource-governance

Author	SHA1	Message	Date
andersonid	42ff7c9f7c	Feature: Storage Analysis - nova seção para análise de storage com métricas, gráficos e tabelas detalhadas	2025-10-17 10:05:57 -03:00
andersonid	93a7a0988a	feat: implement batch processing for large clusters (100 pods per batch) with memory optimization and progress tracking	2025-10-15 16:22:40 -03:00
andersonid	32c074f9b8	fix: correct endpoint default to exclude system namespaces and revert configmap to proper user namespace filtering	2025-10-06 16:33:23 -03:00
andersonid	f2713329bb	fix: include system namespaces in validations endpoint to detect resource-governance workload issues	2025-10-06 16:02:27 -03:00
andersonid	16a0429cc6	remove: eliminate all mock data and placeholder comments	2025-10-06 15:33:39 -03:00
andersonid	3c7e2f7fa1	fix: correct namespaces_in_overcommit calculation for string list	2025-10-06 15:24:00 -03:00
andersonid	c60d815a61	fix: add missing namespaces_list variable for cluster status API	2025-10-06 15:22:44 -03:00
andersonid	c274269eb9	optimize: reduce cluster/status API response size by removing heavy pod data	2025-10-06 15:21:09 -03:00
andersonid	8c616652af	feat: implement ThanosClient for historical data queries and hybrid Prometheus+Thanos architecture	2025-10-06 12:14:40 -03:00
andersonid	bd83be20e5	fix: handle Celery task error info properly in status API	2025-10-06 11:00:06 -03:00
andersonid	bf06ae190a	fix: correct KubernetesClient import to K8sClient in Celery tasks	2025-10-06 10:40:20 -03:00
andersonid	8d92d19433	Fix: Dashboard charts now use real cluster data instead of mock data	2025-10-06 09:35:08 -03:00
andersonid	eddc492d0e	Add real namespace distribution data for dashboard chart - Create new API endpoint /api/v1/namespace-distribution - Replace mock data with real cluster data - Add CPU and memory parsing functions - Update frontend to use real data with enhanced chart - Add hover effects and summary statistics	2025-10-04 11:43:22 -03:00
andersonid	9b2dd69781	Implement Phase 1: Performance Optimization - 10x Improvement - Add OptimizedPrometheusClient with aggregated queries (1 query vs 6 per workload) - Implement intelligent caching system with 5-minute TTL and hit rate tracking - Add MAX_OVER_TIME queries for peak usage analysis and realistic recommendations - Create new optimized API endpoints for 10x faster workload analysis - Add WorkloadMetrics and ClusterMetrics data structures for better performance - Implement cache statistics and monitoring capabilities - Focus on workload-level analysis (not individual pods) for persistent insights - Maintain OpenShift-specific Prometheus queries for accurate cluster analysis - Add comprehensive error handling and fallback mechanisms - Enable parallel query processing for maximum performance Performance Improvements: - 10x reduction in Prometheus queries (60 queries → 6 queries for 10 workloads) - 5x improvement with intelligent caching (80% hit rate expected) - Real-time peak usage analysis with MAX_OVER_TIME - Workload-focused analysis for persistent resource governance - Optimized for OpenShift administrators' main pain point: identifying projects with missing/misconfigured requests and limits	2025-10-04 09:01:19 -03:00
andersonid	a4cf3d65bc	Implement OpenShift Console exact queries for CPU and Memory Usage - Add get_workload_cpu_summary() and get_workload_memory_summary() methods - Use exact OpenShift Console PromQL queries for data consistency - Update historical analysis API endpoints to include real CPU/Memory data - Document all OpenShift Console queries in AIAgents-Support.md - Fix CPU Usage and Memory Usage columns showing N/A in Historical Analysis	2025-10-03 20:19:42 -03:00
andersonid	6c2821609c	Fix: pass time_range parameter to generate_recommendations for proper 7-day data	2025-10-03 09:41:02 -03:00
andersonid	74f579050c	feat: implement real Resource Utilization with Prometheus - Add get_cluster_resource_utilization() method to PrometheusClient - Use real CPU and memory usage vs requests data from Prometheus - Replace placeholder 75% with actual cluster resource utilization - Update modal to show production-ready status instead of placeholder - Add automatic fallback to simulated data if Prometheus unavailable - Calculate overall utilization as average of CPU and memory efficiency	2025-10-02 18:57:10 -03:00
andersonid	64e17eb521	feat: implement VPA CRD support - Add CustomObjectsApi integration for VPA resources - Implement VPA CRUD operations (list, create, delete) - Add VPA recommendation collection via CRD - Add API endpoints for VPA management - Handle VPA installation detection gracefully - Complete TODO #1: CRD para VPA implementation	2025-10-02 18:50:56 -03:00
andersonid	a1a70bae45	Implement smart recommendations application and improve VPA modal contrast	2025-10-02 17:30:05 -03:00
andersonid	c6f69f85c9	fix: correct historical analysis endpoint and Chart.js loading - Fix endpoint to use get_all_pods() instead of non-existent get_pods_by_selector() - Move Chart.js scripts to end of body for proper loading order - Add proper error handling for workload not found cases - Ensure Chart.js is available before creating graphs	2025-10-02 15:47:13 -03:00
andersonid	fa48e1de06	fix: remove self reference from function call	2025-10-02 10:56:14 -03:00
andersonid	d35b637ba7	fix: use pod name extraction instead of labels for workload grouping	2025-10-02 10:55:12 -03:00
andersonid	5168311e74	fix: correct PodResource attribute access in historical analysis endpoint	2025-10-02 10:53:20 -03:00
andersonid	43c618cbc4	fix: add historical analysis endpoints and fix FontAwesome - Add /api/v1/historical-analysis endpoint for workload list - Add /api/v1/historical-analysis/{namespace}/{workload} for details - Fix FontAwesome CDN to use working version - Update todo list with progress	2025-10-02 10:51:33 -03:00
andersonid	e39668e480	Implement Smart Recommendations Engine with dashboard and modals	2025-10-02 08:17:22 -03:00
andersonid	f6de5a5f30	Add PromQL queries display in historical analysis - Include PromQL queries in API response for workload metrics - Display queries in historical analysis modal with copy functionality - Add professional styling for query display sections - Enable users to copy and validate queries in OpenShift Console - Organize queries by category: cluster totals, usage, requests, limits - Add copy-to-clipboard functionality with visual feedback	2025-10-02 07:34:02 -03:00
andersonid	4721a1ef37	Fix historical analysis contradictions and implement workload-based analysis - Fix insufficient_historical_data vs historical_analysis contradiction - Add return statement when insufficient data to prevent P99 calculation - Implement workload-based historical analysis instead of pod-based - Add _extract_workload_name() to identify workload from pod names - Add analyze_workload_historical_usage() for workload-level analysis - Add _analyze_workload_metrics() with Prometheus workload queries - Add validate_workload_resources_with_historical_analysis() method - Update /cluster/status endpoint to use workload analysis by namespace - Improve reliability by analyzing workloads instead of individual pods - Maintain fallback to pod-level analysis if workload analysis fails	2025-10-01 16:32:12 -03:00
andersonid	6f5c8b0cac	Fix duplicate validations in cluster status - Remove duplicate static validations from /cluster/status endpoint - Use only historical analysis which includes static validations - Add fallback to static validations only if historical analysis fails - Eliminate duplicate invalid_ratio and container_metrics validations - Improve validation efficiency and reduce redundancy	2025-10-01 16:25:38 -03:00
andersonid	2bb5266753	Improve overcommit UI with info icons and modals - Replace tooltips with info icons (ℹ️) next to CPU/Memory Overcommit - Add modal dialogs showing detailed overcommit calculations - Change Resource Quota Coverage to Resource Utilization - Add CSS styling for overcommit details modals - Improve UX with clickable info icons instead of hover tooltips - Show capacity, requests, overcommit percentage, and available resources	2025-10-01 15:41:43 -03:00
andersonid	8984701bf3	Add detailed tooltips for overcommit metrics - Add tooltips showing capacity, requests, and calculation details - Include CPU and Memory capacity/requests in API response - Add CSS styling for tooltip hover effects - Show detailed breakdown: Capacity Total, Requests Total, and calculation formula - Improve user experience with transparent overcommit information	2025-10-01 15:33:39 -03:00
andersonid	b7bfd33a28	Add debug logging for overcommit calculation	2025-10-01 15:29:43 -03:00
andersonid	b83c55bf08	Fix Cluster Overcommit Summary display - Add overcommit data processing in /cluster/status endpoint - Extract CPU/Memory capacity and requests from Prometheus - Calculate overcommit percentages and resource quota coverage - Update frontend to use new overcommit data structure - Fix issue where Cluster Overcommit Summary was showing all zeros	2025-10-01 15:13:04 -03:00
andersonid	fae1d6fb18	Fix workload metrics API pod name matching - Use regex pattern pod=~"{workload}.*" in workload metrics API - This matches the fix applied to historical analysis - Should resolve issue where resource-governance workload data was not being retrieved - Both historical analysis and workload metrics now use consistent pod name matching	2025-10-01 14:57:27 -03:00
andersonid	ee20a09147	Fix data unification and efficiency calculations - Unify Prometheus queries between namespace analysis and historical analysis - Fix efficiency calculations to prevent division by zero - Remove duplicate validations in validation service - Improve frontend data display with clear numerical values - Add proper error handling for missing data	2025-10-01 14:43:43 -03:00
andersonid	6ad1997afd	Remove simulated data and enable real Prometheus metrics	2025-09-30 21:13:46 -03:00
andersonid	20ae326158	Fix: historical analysis implementation with OpenShift-specific Prometheus queries	2025-09-30 21:01:00 -03:00
andersonid	3445f58a11	Update Prometheus queries to use OpenShift-specific metrics - Use node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate for CPU usage - Use container_memory_working_set_bytes with kubelet job for memory usage - Use kube_pod_container_resource_requests/limits with kube-state-metrics job - Add workload-specific filtering to match OpenShift dashboard behavior - This should resolve the 'insufficient data' issue by using the same metrics as OpenShift	2025-09-30 20:42:59 -03:00
andersonid	0068db5a9e	Fix remaining indentation error in routes.py	2025-09-30 18:07:05 -03:00
andersonid	7efbd94b50	Fix indentation errors in routes.py	2025-09-30 18:06:48 -03:00
andersonid	5f3f737b3a	Add simulated data fallback for historical analysis when Prometheus is not accessible	2025-09-30 18:06:10 -03:00
andersonid	2b2b3c23b2	Fix: Historical analysis now shows real consumption numbers and percentages relative to cluster totals	2025-09-30 18:03:17 -03:00
andersonid	a847f0cd92	Fix: Add missing PrometheusClient import for workload metrics endpoint	2025-09-30 17:43:22 -03:00
andersonid	f0d3831263	Feature: Add real Prometheus metrics visualization for historical analysis	2025-09-30 17:41:39 -03:00
andersonid	d683704593	Fix: Integrate historical analysis validations in cluster status endpoint	2025-09-30 17:37:09 -03:00
andersonid	f3b8022224	Phase 1.2: Complete Historical Analysis Integration - Add insufficient data detection, seasonal patterns, and integrate in main dashboard	2025-09-30 16:48:31 -03:00
andersonid	fa8f3a41e5	Implement simplified UI/UX with health scores and grouped validations	2025-09-30 09:37:49 -03:00
andersonid	021ce06323	Fix: corrigido erro 500 na análise por namespace - adicionado suporte para severidade 'info'	2025-09-29 21:48:43 -03:00
andersonid	3a5af8ce67	Feat: implementar dashboard de cluster health com QoS e Resource Quotas - Adicionar modelos para QoSClassification, ResourceQuota e ClusterHealth - Implementar classificação automática de QoS (Guaranteed, Burstable, BestEffort) - Criar análise de Resource Quotas com recomendações automáticas - Adicionar dashboard principal com visão geral do cluster - Implementar análise de overcommit com métricas visuais - Adicionar top resource consumers com ranking - Criar distribuição de QoS com estatísticas - Adicionar novos endpoints API para cluster health e QoS - Melhorar interface com design responsivo e intuitivo - Alinhar com práticas Red Hat para gerenciamento de recursos	2025-09-29 16:35:07 -03:00
andersonid	afc7462b40	Feat: implementar sistema de recomendações inteligentes e categorização de workloads	2025-09-29 15:26:09 -03:00
andersonid	514ea60274	Fix namespace historical analysis - use Kubernetes API for accurate pod count and remove duplicate function	2025-09-29 14:07:49 -03:00

1 2

61 Commits