Files
openshift-resource-governance/README.md

8.8 KiB

OpenShift Resource Governance Tool

A resource governance tool for OpenShift clusters that goes beyond what Metrics Server and VPA offer, providing validations, reports and consolidated recommendations.

🚀 Features

  • Automatic Collection: Collects requests/limits from all pods/containers in the cluster
  • Red Hat Validations: Validates capacity management best practices
  • VPA Integration: Consumes VPA recommendations in Off mode
  • Prometheus Integration: Collects real consumption metrics
  • Consolidated Reports: Generates reports in JSON, CSV and PDF
  • Web UI: Simple interface for visualization and interaction
  • Recommendation Application: Allows approving and applying recommendations

📋 Requirements

  • OpenShift 4.x
  • Prometheus (native in OCP)
  • VPA (optional, for recommendations)
  • Python 3.11+
  • Docker
  • OpenShift CLI (oc)

🛠️ Installation

# 1. Clone the repository
git clone <repository-url>
cd RequestsAndLimits

# 2. Login to OpenShift
oc login <cluster-url>

# 3. Complete deploy (creates everything automatically)
./scripts/deploy-complete.sh

📋 Manual Deploy

1. Image Build

# Local build
./scripts/build.sh

# Build with specific tag
./scripts/build.sh v1.0.0

# Build for specific registry
./scripts/build.sh latest your-username

2. Deploy to OpenShift

# Apply all resources
oc apply -f k8s/

# Wait for deployment
oc rollout status deployment/resource-governance -n resource-governance
# 1. Configure GitHub secrets
./scripts/setup-github-secrets.sh

# 2. Commit and push
git add .
git commit -m "New feature"
git push origin main

# 3. GitHub Actions will do automatic deploy!

Automatic Flow:

  • Push to main → GitHub Actions detects change
  • Automatic build → New image on Docker Hub
  • Automatic deploy → OpenShift updates deployment
  • Rolling Update → Zero downtime
  • Health Checks → Automatic validation

🔧 Manual Deploy (Development)

# Deploy with Blue-Green strategy
./scripts/blue-green-deploy.sh

# Deploy with specific tag
./scripts/blue-green-deploy.sh v1.2.0

# Test CI/CD flow locally
./scripts/test-ci-cd.sh

Development Scripts:

  • Full control over the process
  • Fast iteration during development
  • Easier debugging
  • Local tests before pushing

Complete Deploy (Initial)

# Complete deploy with ImagePullSecret (first time)
./scripts/deploy-complete.sh

This script will:

  • Create namespace and RBAC
  • Configure ImagePullSecret for Docker Hub
  • Deploy application
  • Configure Service and Route
  • Verify everything is working

Manual Deploy

# Default deploy
./scripts/deploy.sh

# Deploy with specific tag
./scripts/deploy.sh v1.0.0

# Deploy to specific registry
./scripts/deploy.sh latest your-username

Undeploy

# Completely remove application
./scripts/undeploy-complete.sh

3. Application Access

After deploy, access the application through the created route:

# Get route URL
oc get route resource-governance-route -n resource-governance

# Access via browser
# https://resource-governance-route-resource-governance.apps.openshift.local

🔧 Configuration

ConfigMap

The application is configured through the ConfigMap resource-governance-config:

data:
  CPU_LIMIT_RATIO: "3.0"                    # Default limit:request ratio for CPU
  MEMORY_LIMIT_RATIO: "3.0"                 # Default limit:request ratio for memory
  MIN_CPU_REQUEST: "10m"                    # Minimum CPU request
  MIN_MEMORY_REQUEST: "32Mi"                # Minimum memory request
  CRITICAL_NAMESPACES: |                    # Critical namespaces for VPA
    openshift-monitoring
    openshift-ingress
    openshift-apiserver
  PROMETHEUS_URL: "http://prometheus.openshift-monitoring.svc.cluster.local:9090"

Environment Variables

  • KUBECONFIG: Path to kubeconfig (used in development)
  • PROMETHEUS_URL: Prometheus URL
  • CPU_LIMIT_RATIO: CPU limit:request ratio
  • MEMORY_LIMIT_RATIO: Memory limit:request ratio
  • MIN_CPU_REQUEST: Minimum CPU request
  • MIN_MEMORY_REQUEST: Minimum memory request

📊 Usage

API Endpoints

Cluster Status

GET /api/v1/cluster/status

Namespace Status

GET /api/v1/namespace/{namespace}/status

Validations

GET /api/v1/validations?namespace=default&severity=error

VPA Recommendations

GET /api/v1/vpa/recommendations?namespace=default

Export Report

POST /api/v1/export
Content-Type: application/json

{
  "format": "json",
  "namespaces": ["default", "kube-system"],
  "includeVPA": true,
  "includeValidations": true
}

Usage Examples

1. Check Cluster Status

curl https://resource-governance-route-resource-governance.apps.openshift.local/api/v1/cluster/status

2. Export CSV Report

curl -X POST https://resource-governance-route-resource-governance.apps.openshift.local/api/v1/export \
  -H "Content-Type: application/json" \
  -d '{"format": "csv", "includeVPA": true}'

3. View Critical Validations

curl "https://resource-governance-route-resource-governance.apps.openshift.local/api/v1/validations?severity=critical"

🔍 Implemented Validations

1. Required Requests

  • Problem: Pods without defined requests
  • Severity: Error
  • Recommendation: Define CPU and memory requests
  • Problem: Pods without defined limits
  • Severity: Warning
  • Recommendation: Define limits to avoid excessive consumption

3. Limit:Request Ratio

  • Problem: Ratio too high or low
  • Severity: Warning/Error
  • Recommendation: Adjust to 3:1 ratio

4. Minimum Values

  • Problem: Requests too low
  • Severity: Warning
  • Recommendation: Increase to minimum values

5. Overcommit

  • Problem: Requests exceed cluster capacity
  • Severity: Critical
  • Recommendation: Reduce requests or add nodes

📈 Reports

JSON Format

{
  "timestamp": "2024-01-15T10:30:00Z",
  "total_pods": 150,
  "total_namespaces": 25,
  "total_nodes": 3,
  "validations": [...],
  "vpa_recommendations": [...],
  "summary": {
    "total_validations": 45,
    "critical_issues": 5,
    "warnings": 25,
    "errors": 15
  }
}

CSV Format

Pod Name,Namespace,Container Name,Validation Type,Severity,Message,Recommendation
pod-1,default,nginx,missing_requests,error,Container without defined requests,Define CPU and memory requests

🔐 Security

RBAC

The application uses a dedicated ServiceAccount with minimal permissions:

  • Pods: get, list, watch, patch, update
  • Namespaces: get, list, watch
  • Nodes: get, list, watch
  • VPA: get, list, watch
  • Deployments/ReplicaSets: get, list, watch, patch, update

Security Context

  • Runs as non-root user (UID 1000)
  • Uses SecurityContext with runAsNonRoot: true
  • Limits resources with requests/limits

🐛 Troubleshooting

Check Logs

oc logs -f daemonset/resource-governance -n resource-governance

Check Pod Status

oc get pods -n resource-governance
oc describe pod <pod-name> -n resource-governance

Check RBAC

oc auth can-i get pods --as=system:serviceaccount:resource-governance:resource-governance-sa

Test Connectivity

# Health check
curl https://resource-governance-route-resource-governance.apps.openshift.local/health

# API test
curl https://resource-governance-route-resource-governance.apps.openshift.local/api/v1/cluster/status

🚀 Development

Run Locally

# Install dependencies
pip install -r requirements.txt

# Run application
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8080

Run with Docker

# Build
docker build -t resource-governance .

# Run
docker run -p 8080:8080 resource-governance

Tests

# Test import
python -c "import app.main; print('OK')"

# Test API
curl http://localhost:8080/health

📝 Roadmap

Upcoming Versions

  • Web UI with interactive charts
  • PDF reports with charts
  • Custom rules per namespace
  • GitOps integration (ArgoCD)
  • Slack/Teams notifications
  • Custom Prometheus metrics
  • Multi-cluster support

🤝 Contributing

  1. Fork the project
  2. Create a branch for your feature (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is under the MIT license. See the LICENSE file for details.

📞 Support

For support and questions:

  • Open an issue on GitHub
  • Consult OpenShift documentation
  • Check application logs