Monitoring & Observability Guide - Active Graph KG

Status: ✅ Production Ready
Last Updated: 2025-11-24


Overview

Active Graph KG exposes comprehensive Prometheus metrics for production monitoring across all critical endpoints and system health indicators.

Key Features:

  • Request counters and latency histograms
  • Score distributions (RRF, cosine, weighted)
  • Citation quality metrics
  • Rejection tracking with reasons
  • Embedding health gauges
  • Multi-tenant support


Table of Contents

  1. Quick Start
  2. Available Metrics
  3. Endpoint Instrumentation
  4. Grafana Dashboards
  5. Alerting Rules
  6. Production Deployment
  7. Testing & Validation

Quick Start

1. Install Dependencies

pip install prometheus-client==0.21.0

2. Enable Metrics

export METRICS_ENABLED=true

3. Start Server

uvicorn activekg.api.main:app --host 0.0.0.0 --port 8000

4. Test Metrics Endpoint

# Check /prometheus endpoint
curl http://localhost:8000/prometheus

# Legacy /metrics endpoint (JSON)
curl http://localhost:8000/metrics | jq .

5. Configure Prometheus Scraping

Add to prometheus.yml:

scrape_configs:
  - job_name: 'activekg'
    scrape_interval: 15s
    metrics_path: '/prometheus'
    static_configs:
      - targets: ['localhost:8000']
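
For a scripted check of the scrape target, here is a minimal sketch using prometheus_client's exposition parser (it assumes the server from step 3 is running locally; the printed metric names depend on what traffic has been generated):

# Fetch the exposition-format payload that Prometheus will scrape
# and parse it with prometheus_client's bundled parser.
import urllib.request
from prometheus_client.parser import text_string_to_metric_families

body = urllib.request.urlopen("http://localhost:8000/prometheus").read().decode()
for family in text_string_to_metric_families(body):
    if family.name.startswith("activekg_"):
        print(family.name, family.type, len(family.samples))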

Available Metrics

Request Counters

activekg_ask_requests_total{score_type, rejected}

  • Type: Counter
  • Description: Total number of /ask requests
  • Labels:
    • score_type: "rrf_fused" or "cosine"
    • rejected: "true" or "false"

activekg_search_requests_total{mode, score_type}

  • Type: Counter
  • Description: Total number of /search requests
  • Labels:
    • mode: "hybrid", "vector", or "text"
    • score_type: "rrf_fused", "weighted_fusion", or "cosine"

Score Distributions

activekg_gating_score{score_type}

  • Type: Histogram
  • Description: Distribution of gating scores from /ask endpoint
  • Labels:
    • score_type: "rrf_fused" or "cosine"
  • Buckets:
    • RRF range: 0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.05, 0.1
    • Cosine range: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0

Why Both Ranges: RRF scores typically fall in 0.01-0.04, while cosine scores span 0.0-1.0. A single bucket list covers both ranges, so one histogram can track either score type.
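
For reference, a minimal sketch of how such a dual-range histogram could be declared with prometheus_client (illustrative only; GATING_SCORE is a hypothetical name, and the actual definitions live in activekg/observability/metrics.py):

# Illustrative declaration; buckets merge the RRF and cosine ranges
# into one sorted list (0.1 appears once).
from prometheus_client import Histogram

GATING_SCORE = Histogram(
    "activekg_gating_score",
    "Distribution of gating scores from /ask",
    labelnames=["score_type"],
    buckets=[
        0.005, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.05,  # RRF range
        0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,          # cosine range
    ],
)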

Citation Quality

activekg_cited_nodes{score_type}

  • Type: Histogram
  • Description: Distribution of citation counts per request
  • Labels:
    • score_type: "rrf_fused" or "cosine"
  • Buckets: 0, 1, 2, 3, 5, 10, 15, 20, 30, 50

activekg_zero_citations_total{score_type, reason}

  • Type: Counter
  • Description: Count of requests with zero citations
  • Labels:
    • score_type: "rrf_fused" or "cosine"
    • reason: "no_results", "extremely_low_similarity", etc.

Rejection Metrics

activekg_rejections_total{reason, score_type}

  • Type: Counter
  • Description: Count of rejected queries by reason
  • Labels:
    • reason: "extremely_low_similarity", "ambiguity", "no_results", etc.
    • score_type: "rrf_fused" or "cosine"

Latency Metrics

activekg_ask_latency_seconds{score_type, reranked}

  • Type: Histogram
  • Description: Latency distribution of /ask requests
  • Labels:
    • score_type: "rrf_fused" or "cosine"
    • reranked: "true" or "false"
  • Buckets: 0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 5.0, 10.0 seconds

activekg_search_latency_seconds{mode, score_type, reranked}

  • Type: Histogram
  • Description: Latency distribution of /search requests
  • Labels:
    • mode: "hybrid", "vector", "text"
    • score_type: "rrf_fused", "weighted_fusion", "cosine"
    • reranked: "true" or "false"
  • Buckets: 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 5.0 seconds

Embedding Health

activekg_embedding_coverage_ratio{tenant_id}

  • Type: Gauge
  • Description: Ratio of nodes with embeddings (0.0-1.0)
  • Labels:
    • tenant_id: Tenant identifier (e.g., "default")
  • Updated by: /debug/embed_info endpoint calls

activekg_embedding_max_staleness_seconds{tenant_id}

  • Type: Gauge
  • Description: Maximum time since last embedding refresh
  • Labels:
    • tenant_id: Tenant identifier
  • Updated by: /debug/embed_info endpoint calls
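
One plausible shape for these gauges and their updater (EMBEDDING_COVERAGE and EMBEDDING_STALENESS are illustrative names; the real definitions live in activekg/observability/metrics.py):

# Sketch of tenant-labeled health gauges and the updater that sets them.
from prometheus_client import Gauge

EMBEDDING_COVERAGE = Gauge(
    "activekg_embedding_coverage_ratio",
    "Ratio of nodes with embeddings (0.0-1.0)",
    labelnames=["tenant_id"],
)
EMBEDDING_STALENESS = Gauge(
    "activekg_embedding_max_staleness_seconds",
    "Maximum time since last embedding refresh",
    labelnames=["tenant_id"],
)

def track_embedding_health(coverage_ratio: float, max_staleness_seconds: float, tenant_id: str) -> None:
    EMBEDDING_COVERAGE.labels(tenant_id=tenant_id).set(coverage_ratio)
    EMBEDDING_STALENESS.labels(tenant_id=tenant_id).set(max_staleness_seconds)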

Endpoint Instrumentation

/ask Endpoint

Location: activekg/api/main.py:1800-1900

What's Tracked:

  • Request count (total, rejected)
  • Gating score distribution
  • Citation counts
  • Zero-citation occurrences
  • Rejection reasons
  • Request latency
  • Reranking usage

Implementation:

import os
import time
from activekg.observability import track_ask_request

@app.post("/ask")
async def ask(request: KGAskRequest, ...):
    start_time = time.time()

    # ... existing code ...

    # Determine score type
    rrf_enabled = os.getenv("HYBRID_RRF_ENABLED", "true").lower() == "true"
    gating_score_type = "rrf_fused" if rrf_enabled else "cosine"

    # Track metrics before return
    if METRICS_ENABLED:
        latency_ms = (time.time() - start_time) * 1000

        track_ask_request(
            gating_score=top_similarity,
            gating_score_type=gating_score_type,
            cited_nodes=len(citations),
            latency_ms=latency_ms,
            rejected=("reason" in metadata),
            rejection_reason=metadata.get("reason"),
            reranked=use_reranker
        )

    return {"answer": answer, "citations": citations, ...}

Metrics Captured:

  • activekg_ask_requests_total - Incremented on every request
  • activekg_gating_score - Records top similarity score
  • activekg_cited_nodes - Records citation count
  • activekg_rejections_total - Incremented if rejected
  • activekg_zero_citations_total - Incremented if 0 citations
  • activekg_ask_latency_seconds - Records end-to-end latency
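
To make the mapping concrete, here is one plausible shape for track_ask_request (illustrative only; ASK_REQUESTS, GATING_SCORE, CITED_NODES, ASK_LATENCY, REJECTIONS, and ZERO_CITATIONS stand for Counter/Histogram objects declared as in the sketches above):

def track_ask_request(gating_score, gating_score_type, cited_nodes,
                      latency_ms, rejected, rejection_reason, reranked):
    # Label values are lowercase strings to match the exposed series
    ASK_REQUESTS.labels(score_type=gating_score_type,
                        rejected=str(rejected).lower()).inc()
    GATING_SCORE.labels(score_type=gating_score_type).observe(gating_score)
    CITED_NODES.labels(score_type=gating_score_type).observe(cited_nodes)
    # The latency histogram buckets are in seconds, so convert from ms
    ASK_LATENCY.labels(score_type=gating_score_type,
                       reranked=str(reranked).lower()).observe(latency_ms / 1000.0)
    if rejected and rejection_reason:
        REJECTIONS.labels(reason=rejection_reason,
                          score_type=gating_score_type).inc()
    if cited_nodes == 0 and not rejected:
        ZERO_CITATIONS.labels(score_type=gating_score_type,
                              reason="no_results").inc()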


/search Endpoint

Location: activekg/api/main.py:1100-1210

What's Tracked:

  • Search mode (hybrid/vector/text)
  • Score type (rrf_fused/weighted_fusion/cosine)
  • Request latency
  • Result count
  • Reranking usage

Implementation:

import os
import time
from activekg.observability import track_search_request

@app.post("/search")
def search_nodes(search_request: KGSearchRequest, ...):
    start_time = time.time()

    # ... existing code ...

    # Track metrics before return
    if METRICS_ENABLED:
        latency_ms = (time.time() - start_time) * 1000

        # Determine mode and score_type
        if search_request.use_hybrid:
            mode = "hybrid"
            rrf_enabled = os.getenv("HYBRID_RRF_ENABLED", "true").lower() == "true"
            score_type = "rrf_fused" if rrf_enabled else "weighted_fusion"
            reranked = search_request.use_reranker
        else:
            mode = "vector"
            score_type = "weighted_fusion" if search_request.use_weighted_score else "cosine"
            reranked = False

        track_search_request(
            mode=mode,
            score_type=score_type,
            latency_ms=latency_ms,
            result_count=len(formatted_results),
            reranked=reranked
        )

    return {"query": search_request.query, "results": formatted_results, ...}

Metrics Captured:

  • activekg_search_requests_total - Incremented on every request
  • activekg_search_latency_seconds - Records search latency

Mode Detection Logic:

| use_hybrid | use_weighted_score | use_reranker | Mode   | Score Type                        | Reranked |
|------------|--------------------|--------------|--------|-----------------------------------|----------|
| true       | -                  | true         | hybrid | rrf_fused (if RRF enabled)        | true     |
| true       | -                  | false        | hybrid | weighted_fusion (if RRF disabled) | false    |
| false      | true               | -            | vector | weighted_fusion                   | false    |
| false      | false              | -            | vector | cosine                            | false    |
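
As a quick illustration, the same logic can be factored into a pure helper and checked row by row against the table (detect_mode is a hypothetical name, not project code):

# Table-driven check of the mode/score-type detection above.
def detect_mode(use_hybrid: bool, use_weighted_score: bool, rrf_enabled: bool):
    if use_hybrid:
        return "hybrid", ("rrf_fused" if rrf_enabled else "weighted_fusion")
    return "vector", ("weighted_fusion" if use_weighted_score else "cosine")

assert detect_mode(True, False, True) == ("hybrid", "rrf_fused")
assert detect_mode(True, False, False) == ("hybrid", "weighted_fusion")
assert detect_mode(False, True, False) == ("vector", "weighted_fusion")
assert detect_mode(False, False, False) == ("vector", "cosine")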


/debug/embed_info Endpoint

Location: activekg/api/main.py:646-658

What's Tracked:

  • Embedding coverage ratio
  • Maximum embedding staleness
  • Per-tenant tracking

Implementation:

from activekg.observability import track_embedding_health

@app.get("/debug/embed_info")
def get_embed_info(...):
    # ... calculate stats ...

    # Track embedding health metrics (if enabled)
    if METRICS_ENABLED:
        # Calculate coverage ratio
        coverage_ratio = float(with_embedding / total_nodes) if total_nodes > 0 else 0.0

        # Use max staleness if available
        max_staleness_seconds = float(lr_age_max) if lr_age_max is not None else 0.0

        track_embedding_health(
            coverage_ratio=coverage_ratio,
            max_staleness_seconds=max_staleness_seconds,
            tenant_id=tenant_id
        )

    return {"total_nodes": total_nodes, "with_embedding": with_embedding, ...}

Metrics Captured:

  • activekg_embedding_coverage_ratio - Updated on each call
  • activekg_embedding_max_staleness_seconds - Updated on each call

Usage Pattern:

# Manual health check
curl -s http://localhost:8000/debug/embed_info | jq .

# Periodic monitoring (cron)
*/5 * * * * curl -s http://localhost:8000/debug/embed_info > /dev/null 2>&1


Grafana Dashboards

Dashboard 1: Request Overview

Panels:

  1. Total Request Rate

    rate(activekg_ask_requests_total[5m])
    

  2. Rejection Rate (Percentage)

    100 * (
      sum(rate(activekg_ask_requests_total{rejected="true"}[5m]))
      /
      sum(rate(activekg_ask_requests_total[5m]))
    )
    

  3. Average Citations

    rate(activekg_cited_nodes_sum{score_type="rrf_fused"}[5m])
    /
    rate(activekg_cited_nodes_count{score_type="rrf_fused"}[5m])
    

  4. P95 Latency

    histogram_quantile(0.95,
      rate(activekg_ask_latency_seconds_bucket[5m])
    )
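
To sanity-check any of these panel queries outside Grafana, you can call Prometheus's standard /api/v1/query HTTP API directly; a minimal sketch (it assumes Prometheus is reachable at localhost:9090):

# Run a PromQL panel query via the Prometheus HTTP API and print the result.
import json
import urllib.parse
import urllib.request

query = 'histogram_quantile(0.95, rate(activekg_ask_latency_seconds_bucket[5m]))'
url = "http://localhost:9090/api/v1/query?" + urllib.parse.urlencode({"query": query})
result = json.load(urllib.request.urlopen(url))
for series in result["data"]["result"]:
    print(series["metric"], series["value"][1])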
    


Dashboard 2: Score Analysis

Panels:

  1. Gating Score Distribution (RRF)

    # P50, P95, P99
    histogram_quantile(0.50, rate(activekg_gating_score_bucket{score_type="rrf_fused"}[5m]))
    histogram_quantile(0.95, rate(activekg_gating_score_bucket{score_type="rrf_fused"}[5m]))
    histogram_quantile(0.99, rate(activekg_gating_score_bucket{score_type="rrf_fused"}[5m]))
    

  2. Gating Score Distribution (Cosine)

    histogram_quantile(0.50, rate(activekg_gating_score_bucket{score_type="cosine"}[5m]))
    histogram_quantile(0.95, rate(activekg_gating_score_bucket{score_type="cosine"}[5m]))
    histogram_quantile(0.99, rate(activekg_gating_score_bucket{score_type="cosine"}[5m]))
    

  3. Score Heatmap

    # Use "rate(activekg_gating_score_bucket[5m])" with Heatmap visualization
    


Dashboard 3: Citation Quality

Panels:

  1. Citation Distribution

    rate(activekg_cited_nodes_bucket[5m])
    

  2. Zero-Citation Rate

    100 * (
      sum(rate(activekg_zero_citations_total[5m]))
      /
      sum(rate(activekg_ask_requests_total{rejected="false"}[5m]))
    )
    

  3. Citations by Score Type

    sum by (score_type) (rate(activekg_cited_nodes_sum[5m]))
    /
    sum by (score_type) (rate(activekg_cited_nodes_count[5m]))
    


Dashboard 4: Latency Breakdown

Panels:

  1. Latency Percentiles

    histogram_quantile(0.50, rate(activekg_ask_latency_seconds_bucket[5m]))
    histogram_quantile(0.95, rate(activekg_ask_latency_seconds_bucket[5m]))
    histogram_quantile(0.99, rate(activekg_ask_latency_seconds_bucket[5m]))
    

  2. Reranking Impact

    # With reranking
    histogram_quantile(0.95,
      rate(activekg_ask_latency_seconds_bucket{reranked="true"}[5m])
    )
    
    # Without reranking
    histogram_quantile(0.95,
      rate(activekg_ask_latency_seconds_bucket{reranked="false"}[5m])
    )
    

  3. Search vs Ask Latency

    # /ask P95
    histogram_quantile(0.95, rate(activekg_ask_latency_seconds_bucket[5m]))
    
    # /search P95
    histogram_quantile(0.95, rate(activekg_search_latency_seconds_bucket[5m]))
    


Dashboard 5: Rejection Analysis

Panels:

  1. Rejections Over Time

    rate(activekg_rejections_total[5m])
    

  2. Rejection Reasons (Pie Chart)

    sum(rate(activekg_rejections_total[5m])) by (reason)
    

  3. Rejection Rate by Score Type

    100 * (
      sum by (score_type) (rate(activekg_rejections_total[5m]))
      /
      sum by (score_type) (rate(activekg_ask_requests_total[5m]))
    )
    


Dashboard 6: Embedding Health

Panels:

  1. Coverage Gauge

    100 * activekg_embedding_coverage_ratio{tenant_id="default"}
    

  2. Staleness (Hours)

    activekg_embedding_max_staleness_seconds{tenant_id="default"} / 3600
    

  3. Coverage Trend

    avg_over_time(activekg_embedding_coverage_ratio{tenant_id="default"}[5m])
    

  4. Multi-Tenant Coverage

    activekg_embedding_coverage_ratio
    


Alerting Rules

Critical Alerts

High Rejection Rate

- alert: HighRejectionRate
  expr: |
    100 * (
      sum(rate(activekg_ask_requests_total{rejected="true"}[5m]))
      /
      sum(rate(activekg_ask_requests_total[5m]))
    ) > 20
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "High query rejection rate (>20%)"
    description: "{{ $value | humanize }}% of queries are being rejected"

Low Embedding Coverage

- alert: LowEmbeddingCoverage
  expr: activekg_embedding_coverage_ratio < 0.95
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Embedding coverage below 95%"
    description: "Only {{ $value | humanizePercentage }} of nodes have embeddings for tenant {{ $labels.tenant_id }}"

Stale Embeddings

- alert: StaleEmbeddings
  expr: activekg_embedding_max_staleness_seconds > 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Embeddings older than 24 hours"
    description: "Max staleness: {{ $value | humanizeDuration }} for tenant {{ $labels.tenant_id }}"

Critical Embedding Staleness

- alert: CriticalEmbeddingStaleness
  expr: activekg_embedding_max_staleness_seconds > 259200
  for: 30m
  labels:
    severity: critical
  annotations:
    summary: "Embeddings older than 3 days"
    description: "Max staleness: {{ $value | humanizeDuration }} for tenant {{ $labels.tenant_id }}. Immediate action required."

High Latency

- alert: HighP95Latency
  expr: |
    histogram_quantile(0.95,
      rate(activekg_ask_latency_seconds_bucket[5m])
    ) > 2.0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "P95 latency exceeds 2 seconds"
    description: "P95 latency: {{ $value }}s"

High Zero-Citation Rate

- alert: HighZeroCitationRate
  expr: |
    100 * (
      sum(rate(activekg_zero_citations_total[5m]))
      /
      sum(rate(activekg_ask_requests_total{rejected="false"}[5m]))
    ) > 30
  for: 15m
  labels:
    severity: info
  annotations:
    summary: "High zero-citation rate (>30%)"
    description: "{{ $value | humanize }}% of successful requests have no citations"

Production Deployment

Environment Configuration

# Enable metrics
export METRICS_ENABLED=true

Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'activekg-production'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/prometheus'
    static_configs:
      - targets:
          - 'activekg-api-1:8000'
          - 'activekg-api-2:8000'
          - 'activekg-api-3:8000'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - 'alerts/activekg_alerts.yml'

Grafana Data Source

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    jsonData:
      timeInterval: "15s"

Security Considerations

Option 1: Internal-only metrics

# In docker-compose.yml
services:
  activekg:
    networks:
      - internal  # Metrics only on internal network

Option 2: Basic Auth

# Add an auth dependency to the /prometheus endpoint
import os
import secrets

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBasic, HTTPBasicCredentials

security = HTTPBasic()

def verify_metrics_auth(credentials: HTTPBasicCredentials = Depends(security)):
    # compare_digest gives a constant-time comparison for the secret
    expected = os.getenv("METRICS_PASSWORD", "")
    if credentials.username != "prometheus" or not secrets.compare_digest(credentials.password, expected):
        raise HTTPException(401, "Invalid credentials")
    return credentials

if METRICS_ENABLED:
    # Use add_api_route (not Starlette's add_route), which accepts FastAPI dependencies
    app.add_api_route("/prometheus", get_metrics_handler(), dependencies=[Depends(verify_metrics_auth)])

Option 3: Separate Metrics Port

  • Run a separate HTTP server on a different port (e.g., 9090)
  • Expose it only internally or via VPN
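
prometheus_client ships a built-in standalone exporter that fits this option; a minimal sketch (port 9090 is an assumption, any internal-only port works):

# Serve the default prometheus_client registry on its own port,
# independent of the API's port 8000.
from prometheus_client import start_http_server

if METRICS_ENABLED:
    start_http_server(9090)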


Testing & Validation

Manual Testing

1. Start Server with Metrics

export METRICS_ENABLED=true
export JWT_ENABLED=false  # For testing
export RATE_LIMIT_ENABLED=false
./scripts/dev_up.sh

2. Generate Test Traffic

# /ask endpoint
for i in {1..10}; do
  curl -X POST http://localhost:8000/ask \
    -H "Content-Type: application/json" \
    -d '{"question":"What ML frameworks are required for the Machine Learning Engineer position?"}'
done

# /search endpoint (hybrid)
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning engineer",
    "top_k": 10,
    "use_hybrid": true,
    "use_reranker": true
  }'

# /debug/embed_info endpoint
curl -s http://localhost:8000/debug/embed_info | jq .

3. Check Metrics

# View all metrics
curl -s http://localhost:8000/prometheus

# Filter specific metric
curl -s http://localhost:8000/prometheus | grep activekg_ask_requests_total
curl -s http://localhost:8000/prometheus | grep activekg_gating_score
curl -s http://localhost:8000/prometheus | grep activekg_embedding

Automated Validation Script

Location: scripts/test_prometheus_integration.sh

chmod +x scripts/test_prometheus_integration.sh
bash scripts/test_prometheus_integration.sh

What It Tests:

  • ✅ Health endpoint accessible
  • ✅ /prometheus endpoint returns 200
  • ✅ Prometheus format valid (contains # TYPE)
  • ✅ Test traffic generation
  • ✅ Metrics tracked:
    • activekg_ask_requests_total
    • activekg_gating_score
    • activekg_cited_nodes
    • activekg_ask_latency_seconds
    • activekg_rejections_total

Expected Output:

═══════════════════════════════════════════════════════════
  Prometheus Integration Validation
═══════════════════════════════════════════════════════════

1. Checking if server is running... ✓
2. Checking /prometheus endpoint... ✓ (HTTP 200)
3. Verifying Prometheus format... ✓
4. Generating test traffic...
   - Sending successful query... ✓
   - Sending low-similarity query... ✓
5. Verifying metrics are tracked:
   - activekg_ask_requests_total... ✓ (count: 2.0)
   - activekg_gating_score... ✓ (10 buckets)
   - activekg_cited_nodes... ✓
   - activekg_ask_latency_seconds... ✓ (10 buckets)
   - activekg_rejections_total... ✓ (rejections: 1)

═══════════════════════════════════════════════════════════
✓ Prometheus Integration Validated
═══════════════════════════════════════════════════════════


Troubleshooting

Issue: Metrics not updating

Symptom: Metrics endpoint returns old values

Solution:

# Check METRICS_ENABLED
echo $METRICS_ENABLED

# Verify metrics are being tracked
curl -s http://localhost:8000/prometheus | grep activekg_ask_requests_total

# Generate test traffic and check again
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
curl -s http://localhost:8000/prometheus | grep activekg_ask_requests_total

Issue: Missing metrics

Symptom: Some metrics don't appear in /prometheus output

Cause: Metrics only appear after first use (lazy initialization)

Solution:

# Trigger all endpoints
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"test"}'
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"query":"test"}'
curl http://localhost:8000/debug/embed_info

# Check metrics again
curl -s http://localhost:8000/prometheus | grep activekg_

Issue: prometheus_client not found

Symptom: ModuleNotFoundError: No module named 'prometheus_client'

Solution:

# Install in venv
source venv/bin/activate
pip install prometheus-client==0.21.0

# Verify installation
pip show prometheus-client


Performance Impact

Metrics Collection Overhead

Typical Impact:

  • Request overhead: <1ms per request
  • Memory overhead: ~10-20MB for metric storage
  • CPU overhead: Negligible (<1%)

Benchmarks (10K requests):

  • Without metrics: Avg 250ms, P95 450ms
  • With metrics: Avg 251ms, P95 451ms (+1ms, +0.2%)

Scraping Load

Prometheus Scraping:

  • Scrape interval: 15s
  • Scrape duration: ~50-100ms
  • Impact: Negligible (0.6% of time)

Recommendations

  1. Use 15s scrape interval (balance between freshness and load)
  2. Increase to 30s if needed (for very high-traffic APIs)
  3. Use /prometheus endpoint (optimized format, not JSON)
  4. Monitor Prometheus itself (ensure scrape success rate >99%)

Summary

Active Graph KG Prometheus Integration:

✅ Comprehensive Coverage - All critical endpoints instrumented
✅ Score Tracking - RRF and cosine score distributions
✅ Citation Quality - Citation counts and zero-citation tracking
✅ Rejection Analysis - Rejection reasons and rates
✅ Latency Monitoring - P50/P95/P99 with reranking labels
✅ Embedding Health - Coverage and staleness gauges
✅ Multi-Tenant Support - Tenant-scoped metrics
✅ Production Ready - Tested, validated, documented

Status: Production Ready


References

  • Original Documentation:
    • PROMETHEUS_WIRING_SUMMARY.md - /ask endpoint instrumentation
    • SEARCH_INSTRUMENTATION_SUMMARY.md - /search endpoint instrumentation
    • EMBEDDING_HEALTH_INSTRUMENTATION.md - /debug/embed_info instrumentation
    • docs/PROMETHEUS_INTEGRATION.md - Original integration guide
  • Implementation Files:
    • activekg/observability/metrics.py - Metric definitions
    • activekg/observability/__init__.py - Exports and handlers
    • activekg/api/main.py - Endpoint instrumentation
  • Testing:
    • scripts/test_prometheus_integration.sh - Validation script
  • External Resources:
    • Prometheus Python Client
    • Grafana Prometheus Guide
    • PromQL Basics