Complete observability solution for GitHub Actions workflows using OpenTelemetry Collector, Prometheus, and Grafana. Monitor your CI/CD pipelines with distributed tracing, rich metrics, and real-time dashboards.
- Complete Metrics: Workflow execution rates, durations, success/failure rates
- Distributed Traces: Every workflow run as a detailed trace with job and step spans
- Rich Dashboards: 6 pre-built Grafana dashboards for different analysis needs
- Real-time Data: Live updates via GitHub webhooks with <15s latency
- VCS Insights: Pull request metrics, code change analysis, merge times
- Failure Analysis: Identify problematic workflows, steps, and patterns
```mermaid
flowchart TD
    %% GitHub Source
    GH["GitHub Repository<br/><b>Actions Triggered</b>"]

    %% Webhook Flow
    GH -->|"workflow_run<br/>workflow_job events"| WH["GitHub Webhook<br/><b>POST /events</b>"]
    WH -->|"HTTPS POST<br/>JSON payload"| CF["Cloudflare Tunnel<br/><b>Secure Proxy</b>"]

    %% Collector Entry Point
    CF -->|"Forward to<br/>localhost:9504"| COLLECTOR

    %% Main Processing Engine
    subgraph COLLECTOR ["OpenTelemetry Collector"]
        direction LR
        GHR["GitHub Receiver<br/>Port 9504"]
        PIPELINE["Processing Pipeline<br/>Resource → Attributes → Batch → Span Metrics"]
        PE["Prometheus Exporter<br/>Port 9464"]
        GHR --> PIPELINE --> PE
    end

    %% Parallel API Data Source
    subgraph API ["GitHub API Scraping"]
        direction TB
        AUTH["Bearer Token Auth<br/>GitHub PAT"]
        SCRAPER["GitHub Scraper<br/>REST/GraphQL API"]
        VCS["VCS Metrics<br/>Repos • PRs • Changes"]
        AUTH --> SCRAPER --> VCS
    end

    %% Connect API to Collector
    VCS -.->|"Additional<br/>metrics"| PE

    %% Data Storage
    PE -->|"Scrape metrics<br/>:9464/metrics"| PROM["Prometheus<br/><b>Time-Series Database</b><br/>30-day retention"]

    %% Visualization Layer
    PROM -->|"PromQL<br/>queries"| GRAFANA
    subgraph GRAFANA ["Grafana Dashboards"]
        direction LR
        D1["Overview"] --- D2["Health"] --- D3["Metrics"] --- D4["Performance"]
    end

    %% User Access
    USER["User Browser<br/><b>localhost:3000</b>"] --> GRAFANA

    %% Enhanced Styling
    classDef source fill:#24292e,stroke:#f9826c,stroke-width:3px,color:#fff
    classDef webhook fill:#f38020,stroke:#fff,stroke-width:2px,color:#fff
    classDef collector fill:#326ce5,stroke:#fff,stroke-width:2px,color:#fff
    classDef storage fill:#e6522c,stroke:#fff,stroke-width:2px,color:#fff
    classDef dashboard fill:#f46800,stroke:#fff,stroke-width:2px,color:#fff
    classDef user fill:#00d924,stroke:#fff,stroke-width:3px,color:#fff

    class GH source
    class WH,CF webhook
    class GHR,PIPELINE,PE,AUTH,SCRAPER,VCS collector
    class PROM storage
    class D1,D2,D3,D4 dashboard
    class USER user
```
- Docker Desktop: Version 4.0+ with Docker Compose
- Cloudflared: For secure webhook tunneling
  ```bash
  # macOS (Homebrew)
  brew install cloudflare/cloudflare/cloudflared

  # Other platforms: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/
  ```
- Repository: A GitHub repository where you want to monitor workflows
- Personal Access Token: With these scopes:
  - `repo` (Full read repository access)
  - `workflow` (Read GitHub Action workflows)
  - `read:org` (Read organization membership)
- Webhook Secret: A secure random string (we'll generate this)
- RAM: 4GB minimum (8GB recommended)
- Disk: 10GB free space for metrics storage
- Network: Stable internet for GitHub API and tunnel
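
Before continuing, you can quickly confirm the tooling prerequisites are in place (this assumes Docker Compose v2 and a package-manager install of cloudflared):

```bash
# Verify Docker, Docker Compose v2, and cloudflared are installed
docker --version
docker compose version
cloudflared --version
```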
```bash
# Clone this repository
git clone <your-repo-url>
cd otel-official

# Make scripts executable
chmod +x start-tunnel.sh
```

- Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click "Generate new token (classic)"
- Select scopes: `repo`, `workflow`, `read:org`
- Copy the token (you won't see it again!)
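
To sanity-check the token before wiring it into the stack, you can call the GitHub API directly; classic tokens report their scopes in the `X-OAuth-Scopes` response header (the `GITHUB_TOKEN` variable here is simply the token you just created):

```bash
# Should return the repo, workflow, and read:org scopes
curl -s -o /dev/null -D - -H "Authorization: Bearer $GITHUB_TOKEN" \
  https://api.github.com/user | grep -i "x-oauth-scopes"
```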
```bash
# Generate a secure webhook secret
openssl rand -hex 32
# Create environment file
cat > .env << EOF
# GitHub Personal Access Token (with repo, workflow, read:org scopes)
GITHUB_TOKEN=ghp_your_token_here
# Webhook secret for validating GitHub requests (use output from openssl command above)
GITHUB_WEBHOOK_SECRET=your_generated_secret_here
EOF
```

```bash
# Start all services (Collector, Prometheus, Grafana)
docker compose up -d
# Check all services are healthy
docker compose ps
```

Expected output:

```text
NAME                          STATUS
otel-official-collector-1     Up 2 seconds
otel-official-grafana-1       Up 2 seconds
otel-official-prometheus-1    Up 2 seconds
```
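
You can also ask the collector itself whether its webhook endpoint is healthy, using the health path the stack exposes on port 9504:

```bash
# Local health check against the collector's webhook receiver
curl -s http://localhost:9504/health
```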
```bash
# Start the Cloudflare tunnel
./start-tunnel.sh
```

This will output something like:

```text
Starting Cloudflare Tunnel for GitHub Webhooks
==================================================
Tunnel created: https://abc-def-123.trycloudflare.com

GitHub Webhook Configuration:
  URL: https://abc-def-123.trycloudflare.com/events
  Content-Type: application/json
  Secret: (use GITHUB_WEBHOOK_SECRET from .env)
  Events: Workflow runs, Workflow jobs

Collector Health: http://localhost:9504/health
Grafana: http://localhost:3000 (admin/admin)
```
Keep this terminal open! The tunnel only works while this script is running.
- Go to the GitHub repository you want to track
- Navigate to Settings → Webhooks
- Click Add webhook
- Configure:
  - Payload URL: `https://your-tunnel-url.trycloudflare.com/events`
  - Content type: `application/json`
  - Secret: Paste your `GITHUB_WEBHOOK_SECRET` from `.env`
  - Events selection: Choose "Let me select individual events"
  - Select events: Workflow runs, Workflow jobs
- Click Add webhook
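
If you don't want to wait for a real workflow run, you can replay a delivery by hand. The sketch below signs an arbitrary JSON payload the way GitHub does (HMAC-SHA256 over the raw body, sent in the `X-Hub-Signature-256` header); `payload.json` and the tunnel URL are placeholders, and a real payload can be copied from the webhook's "Recent Deliveries" page:

```bash
# Compute the signature GitHub would send for this payload
SIG=$(openssl dgst -sha256 -hmac "$GITHUB_WEBHOOK_SECRET" < payload.json | sed 's/^.*= //')

# Replay the delivery against the tunnel endpoint
curl -X POST "https://your-tunnel-url.trycloudflare.com/events" \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: workflow_run" \
  -H "X-Hub-Signature-256: sha256=$SIG" \
  --data-binary @payload.json
```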
```bash
# Test webhook endpoint health
curl https://your-tunnel-url.trycloudflare.com/health

# Trigger a workflow in your repository
git push origin main   # or trigger manually from the GitHub Actions tab
```

Open http://localhost:3000 in your browser:
- Username: `admin`
- Password: `admin` (change on first login)
The Prometheus dashboard is available at http://localhost:9090.
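
If either UI fails to load, their standard health endpoints are a quick way to tell whether the containers are serving at all:

```bash
# Grafana and Prometheus built-in health endpoints
curl -s http://localhost:3000/api/health
curl -s http://localhost:9090/-/healthy
```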
- GitHub Actions Overview & Observability - Executive summary and KPIs
- GitHub Actions - Workflow Exploration - Detailed drill-down analysis
- GitHub Actions - Workflow Analysis - Scalable pattern analysis
- GitHub Actions - Repository Performance - Strategic performance metrics
- GitHub Actions - Workflow Health Overview - Health monitoring
- GitHub Actions - Complete Metrics - Every available data point visualized
The OpenTelemetry Collector is configured with:
- GitHub Receiver: Webhook on port 9504, API scraping limited to your repository
- Span Metrics Processor: Generates RED metrics from traces
- Prometheus Exporter: Metrics on port 9464
- Resource Processor: Adds service metadata
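
For orientation, the overall shape of such a configuration looks roughly like the sketch below. This is illustrative only: the `collector-config.yaml` shipped in this repo is authoritative, and exact key names (especially for the GitHub receiver and the span metrics component) vary between collector-contrib versions.

```yaml
receivers:
  github:
    webhook:
      endpoint: 0.0.0.0:9504       # where GitHub webhook events arrive
      path: /events
      health_path: /health
    scrapers:
      scraper:
        github_org: your-org       # scope of API scraping

processors:
  resource:
    attributes:
      - key: service.name
        value: github-actions
        action: upsert
  batch: {}

connectors:
  spanmetrics: {}                  # turns workflow/job spans into RED metrics

exporters:
  prometheus:
    endpoint: 0.0.0.0:9464         # scraped by Prometheus

service:
  pipelines:
    traces:
      receivers: [github]
      processors: [resource, batch]
      exporters: [spanmetrics]
    metrics:
      receivers: [github, spanmetrics]
      processors: [resource, batch]
      exporters: [prometheus]
```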
From Webhooks (Traces → Span Metrics):

```text
github_actions_traces_span_metrics_calls_total{service_name="github-actions"}
github_actions_traces_span_metrics_duration_seconds{service_name="github-actions"}
```

From GitHub API Scraping:

```text
github_actions_vcs_repository_count
github_actions_vcs_change_count{vcs_repository_name="your-repo"}
github_actions_vcs_change_duration_seconds{vcs_repository_name="your-repo"}
github_actions_vcs_ref_count{vcs_repository_name="your-repo"}
```
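
A quick way to confirm these series are actually being produced is to grep the exporter's output directly (metric names taken from the list above):

```bash
# The Prometheus exporter serves plain-text metrics on port 9464
curl -s http://localhost:9464/metrics | grep "github_actions_vcs_"
```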
- Check webhook delivery:
  - GitHub repo → Settings → Webhooks → Recent Deliveries
  - Look for 200 responses
- Verify the collector is receiving data:
  ```bash
  docker compose logs collector | grep "github"
  ```
- Test tunnel connectivity:
  ```bash
  curl https://your-tunnel-url.trycloudflare.com/health
  ```
- Check Prometheus targets at http://localhost:9090/targets; the `otel-collector` target should be "UP"
- Allow time for data to appear: it takes a minute or two after the first workflow runs (see the combined status check below)
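
The checks above can be rolled into a single pass; this convenience sketch assumes the default ports used throughout this setup:

```bash
# Containers, collector webhook health, exporter output, and Prometheus target state
docker compose ps
curl -s http://localhost:9504/health
curl -s http://localhost:9464/metrics | grep -c "^github_actions" && echo "metric series present"
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
```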
```mermaid
sequenceDiagram
participant GH as GitHub Actions
participant WH as GitHub Webhook
participant CF as Cloudflare Tunnel
participant OC as OpenTelemetry Collector
participant GS as GitHub Scraper
participant PR as Prometheus
participant GR as Grafana
participant US as User
%% Real-time Webhook Flow
Note over GH: Workflow Triggered
GH->>WH: POST workflow_run event
WH->>CF: HTTPS request to tunnel
CF->>OC: Forward to localhost:9504/events
%% Collector Processing
Note over OC: Event Processing Pipeline
OC->>OC: Parse GitHub payload
OC->>OC: Add resource metadata
OC->>OC: Generate span metrics
OC->>PR: Export metrics (:9464/metrics)
%% Parallel API Scraping (every 60s)
Note over GS: Periodic Scraping
GS->>GH: GitHub API calls (authenticated)
GH-->>GS: Repository & VCS data
GS->>OC: VCS metrics
OC->>PR: Export VCS metrics
%% User Queries Dashboards
Note over US: Dashboard Access
US->>GR: Access dashboard
GR->>PR: PromQL queries
PR-->>GR: Metrics data
GR-->>US: Rendered dashboard
%% Health Checks
US->>CF: GET /health
CF->>OC: Health check
OC-->>CF: {"status": "healthy"}
CF-->>US: 200 OK
```
| Service | Port | Purpose | Data Volume |
|---|---|---|---|
| collector | 9504, 9464 | OpenTelemetry data collection | otel_data |
| prometheus | 9090 | Metrics storage (30 days) | prometheus_data |
| grafana | 3000 | Dashboards and visualization | grafana_data |
Startup dependencies: `grafana` depends on `prometheus`, which depends on `collector`.
All services are connected via the observability Docker network.
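
To confirm the three containers actually share that network, you can inspect it; note that Compose may prefix the network name with the project name (e.g. `otel-official_observability`):

```bash
# List the containers attached to the observability network
docker network ls | grep observability
docker network inspect observability --format '{{range .Containers}}{{.Name}} {{end}}'
```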
After setup, you should see:
- GitHub Webhook: Green checkmark in GitHub webhook settings
- Collector Logs: Messages about receiving webhook events
- Prometheus Metrics: Data visible at http://localhost:9090/graph
- Grafana Dashboards: Populated charts with workflow data
Test these in Prometheus (http://localhost:9090):
```promql
# Workflow execution rate
rate(github_actions_traces_span_metrics_calls_total[5m])

# Repository activity
github_actions_vcs_change_count{vcs_repository_name="your-repo"}

# Average workflow duration
rate(github_actions_traces_span_metrics_duration_seconds_sum[5m]) /
rate(github_actions_traces_span_metrics_duration_seconds_count[5m])
```
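
The feature list also mentions success/failure rates; a query along these lines gives the failing share of workflow runs, with the caveat that the exact status label name (`status_code` here) depends on the span metrics component version in use:

```promql
# Fraction of workflow spans that ended in error over the last 15 minutes
sum(rate(github_actions_traces_span_metrics_calls_total{status_code="STATUS_CODE_ERROR"}[15m]))
/
sum(rate(github_actions_traces_span_metrics_calls_total[15m]))
```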
Edit `collector-config.yaml` to limit API scraping to specific repositories:

```yaml
scrapers:
  scraper:
    github_org: your-org
    search_query: "repo:your-org/repo1 OR repo:your-org/repo2"
```

To change the webhook and health-check paths:

```yaml
github:
  webhook:
    path: "/custom-webhook-path"
    health_path: "/custom-health"
```

Edit `docker-compose.yml` to change how long Prometheus keeps metrics:

```yaml
prometheus:
  command:
    - "--storage.tsdb.retention.time=90d"  # 90 days instead of 30
```

Security recommendations:

- Use fine-grained Personal Access Tokens when possible
- Limit token scope to only required repositories
- Rotate tokens regularly (every 90 days)
- Store tokens in secure password managers
- Always use a strong, random webhook secret
- Regularly rotate webhook secrets
- Monitor webhook delivery logs for suspicious activity
- Use strong Grafana admin password
- Consider implementing reverse proxy with authentication
- Monitor collector logs for unauthorized access attempts
- Regularly update Docker images
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This project uses GitHub Copilot instructions to ensure consistent code quality and standards. Please review the .github/copilot-instructions.md file before contributing.