Active Graph KG API Reference¶
Status: ✅ Production Ready
Last Updated: 2025-11-17
Overview¶
Active Graph KG is a drift-aware knowledge graph API built on PostgreSQL and pgvector. It provides semantic search, LLM-powered Q&A with citations, automatic embedding refresh, lineage tracking, and semantic triggers.
Key Features:
- Semantic search with hybrid BM25+vector fusion and cross-encoder reranking
- LLM-powered Q&A with grounded citations and confidence scoring
- Drift-aware automatic refresh with configurable policies
- Multi-tenant support with Row-Level Security (RLS)
- JWT authentication and rate limiting
- Lineage tracking with provenance chains
- Semantic trigger patterns
- Prometheus metrics integration
- Dual ANN indexing (IVFFLAT/HNSW)
- DSN fallback for PaaS (DATABASE_URL for Railway/Heroku)
Authentication¶
JWT Authentication (Production)¶
When JWT_ENABLED=true, endpoints require JWT authentication unless their entry states otherwise:
- Header: Authorization: Bearer <token>
- Supported Algorithms: RS256 (public key), HS256 (shared secret)
- Claims:
  - tenant_id (required): Tenant identifier for RLS
  - actor_id (optional): User/service identifier
  - scopes (optional): Permission scopes (e.g., admin:refresh)
Admin Scopes:
- admin:refresh: Required for /admin/refresh and debug endpoints
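For local experiments with HS256, a token carrying the claims above can be assembled with the Python standard library alone. This is a minimal sketch under stated assumptions: the helper name make_dev_token, the actor_id value, the representation of scopes as a JSON array, and the iat/exp claims are illustrative and not confirmed by this reference, and a deployment may require additional claims.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_dev_token(secret: str, tenant_id: str, scopes: list[str], ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT carrying tenant_id, actor_id, and scopes."""
    header = {"alg": "HS256", "typ": "JWT"}
    now = int(time.time())
    claims = {
        "tenant_id": tenant_id,      # required claim
        "actor_id": "local-script",  # optional claim (hypothetical value)
        "scopes": scopes,            # optional claim (array form is an assumption)
        "iat": now,
        "exp": now + ttl_seconds,
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


token = make_dev_token("change-me", "acme_corp", ["admin:refresh"])
print(f"Authorization: Bearer {token}")
```

The secret passed here would have to match the server's JWT_SECRET_KEY for the signature to verify.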
Development Mode¶
When JWT_ENABLED=false, authentication is disabled and tenant_id can be provided in request bodies.
Warning: Only use development mode for local testing. Always enable JWT in production.
Base URL¶
Default: http://localhost:8000
Configure via environment variables:
- ACTIVEKG_DSN: PostgreSQL connection string (fallback: DATABASE_URL for PaaS)
- EMBEDDING_BACKEND: Embedding provider (default: sentence-transformers)
- EMBEDDING_MODEL: Model name (default: all-MiniLM-L6-v2)
- LLM_BACKEND: LLM provider for /ask (default: groq)
- RUN_SCHEDULER: Run scheduler on exactly one instance (default: true)
- PGVECTOR_INDEXES: ANN index types (e.g., ivfflat,hnsw)
- SEARCH_DISTANCE: Distance metric (default: cosine)
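The DSN fallback and the comma-separated index list described above could be read as follows. This is an illustrative sketch, not the service's own startup code; resolve_dsn and parse_index_types are hypothetical helper names.

```python
import os
from typing import Mapping, Optional


def resolve_dsn(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return ACTIVEKG_DSN if set and non-empty, else DATABASE_URL, else None."""
    return env.get("ACTIVEKG_DSN") or env.get("DATABASE_URL") or None


def parse_index_types(raw: str) -> list[str]:
    """Split a PGVECTOR_INDEXES value such as 'ivfflat,hnsw' into clean names."""
    return [name.strip().lower() for name in raw.split(",") if name.strip()]


if __name__ == "__main__":
    print(resolve_dsn())
    print(parse_index_types(os.environ.get("PGVECTOR_INDEXES", "ivfflat,hnsw")))
```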
Rate Limiting¶
When RATE_LIMIT_ENABLED=true, endpoints are rate-limited per tenant:
Headers (included in responses):
- X-RateLimit-Limit: Maximum requests allowed
- X-RateLimit-Remaining: Remaining requests
- X-RateLimit-Reset: Unix timestamp when limit resets
HTTP 429 Response:
- Header: Retry-After: <seconds>
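A client can use these headers to decide how long to pause before its next request. The sketch below assumes the headers arrive in a mapping keyed exactly as spelled above (real HTTP clients usually offer case-insensitive lookup); seconds_to_wait is a hypothetical helper name.

```python
from typing import Mapping


def seconds_to_wait(status_code: int, headers: Mapping[str, str], now: float) -> float:
    """Return how long a client should pause before its next request.

    Prefers Retry-After on a 429; otherwise, if the remaining quota is zero,
    waits until the X-RateLimit-Reset Unix timestamp.
    """
    if status_code == 429 and "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining <= 0 and "X-RateLimit-Reset" in headers:
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    return 0.0
```

Passing the current time explicitly (for example time.time()) keeps the function easy to test.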
Endpoints¶
Health & Metrics¶
GET /health¶
Health check endpoint with system status.
Parameters: None
Response:
{
"status": "ok",
"timestamp": "2025-11-24T12:00:00Z",
"version": "1.0.0",
"uptime_seconds": 3600.0,
"components": {
"db": {"status": "unknown"}
},
"llm_backend": "groq",
"llm_model": "llama-3.1-8b-instant"
}
Status Codes:
- 200 OK: Service healthy
Example:
GET /metrics¶
Get metrics in JSON format.
Parameters: None
Response:
{
"counters": {
"search_requests_total": 1234.0,
"ask_requests_total": 567.0
},
"gauges": {
"embedding_coverage_ratio": 0.95
},
"histograms": {
"search_latency_ms": {
"count": 1234,
"sum": 45678.9,
"p50": 35.2,
"p95": 120.5,
"p99": 250.3
}
},
"timestamp": "2025-11-11T12:00:00Z"
}
Status Codes:
- 200 OK: Metrics retrieved
Example:
GET /prometheus¶
Get metrics in Prometheus exposition format.
Parameters: None
Response: Plain text in Prometheus format
Status Codes:
- 200 OK: Metrics retrieved
- 503 Service Unavailable: Metrics disabled
Example:
Nodes¶
POST /nodes¶
Create a new knowledge graph node.
Authentication: Required when JWT enabled
Request Body:
{
"classes": ["Job", "Posting"],
"props": {
"text": "Senior ML Engineer position requiring PyTorch expertise...",
"title": "Senior ML Engineer",
"location": "San Francisco"
},
"payload_ref": "s3://bucket/job_123.json",
"metadata": {
"source": "linkedin",
"posted_date": "2025-11-01"
},
"refresh_policy": {
"interval_seconds": 86400,
"drift_threshold": 0.1
},
"triggers": ["ml_engineer_pattern"],
"tenant_id": "acme_corp"
}
| Field | Type | Required | Description |
|---|---|---|---|
| classes | array[string] | Yes | Node class labels (1-10, max 100 chars each) |
| props | object | Yes | Node properties (arbitrary JSON) |
| payload_ref | string | No | External payload reference (URL, S3 key, max 500 chars) |
| metadata | object | No | Additional metadata (arbitrary JSON) |
| refresh_policy | object | No | Auto-refresh configuration |
| triggers | array[string] | No | Trigger pattern names to activate |
| tenant_id | string | No | Tenant ID (dev mode only, max 100 chars) |
Response:
Status Codes:
- 200 OK: Node created
- 400 Bad Request: Invalid input
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Creation failed
Example:
curl -X POST http://localhost:8000/nodes \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"classes": ["Job"],
"props": {"text": "ML Engineer position", "title": "ML Engineer"},
"metadata": {"source": "linkedin"}
}'
Notes:
- If AUTO_EMBED_ON_CREATE=true, embedding is generated asynchronously
- tenant_id from JWT overrides request body in production
- Node ID is auto-generated UUID
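Checking a payload against the documented field limits before sending it can avoid a 400 response. The sketch below mirrors only the constraints listed in the field table (1-10 class labels of at most 100 characters, an object for props, payload_ref of at most 500 characters); validate_node_payload is a hypothetical client-side helper, and the server may enforce further rules.

```python
def validate_node_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means the payload looks valid."""
    problems = []
    classes = payload.get("classes")
    if not isinstance(classes, list) or not 1 <= len(classes) <= 10:
        problems.append("classes must be a list of 1-10 labels")
    elif any(not isinstance(c, str) or len(c) > 100 for c in classes):
        problems.append("each class label must be a string of at most 100 chars")
    if not isinstance(payload.get("props"), dict):
        problems.append("props must be an object")
    ref = payload.get("payload_ref")
    if ref is not None and (not isinstance(ref, str) or len(ref) > 500):
        problems.append("payload_ref must be a string of at most 500 chars")
    return problems


if __name__ == "__main__":
    print(validate_node_payload({"classes": ["Job"], "props": {"text": "ML Engineer position"}}))
```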
GET /nodes/{node_id}¶
Retrieve a node by ID.
Authentication: Required when JWT enabled
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| node_id | string | Yes | Node UUID |

Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | No | Tenant ID (dev mode only, ignored in production) |
Response:
{
"id": "01234567-89ab-cdef-0123-456789abcdef",
"classes": ["Job"],
"props": {
"text": "Senior ML Engineer position...",
"title": "Senior ML Engineer"
},
"payload_ref": "s3://bucket/job_123.json",
"metadata": {
"source": "linkedin"
},
"refresh_policy": {
"interval_seconds": 86400,
"drift_threshold": 0.1
},
"triggers": ["ml_engineer_pattern"],
"version": 1
}
Status Codes:
- 200 OK: Node found
- 401 Unauthorized: Missing/invalid JWT
- 404 Not Found: Node not found or not visible to tenant
- 429 Too Many Requests: Rate limit exceeded
Example:
curl http://localhost:8000/nodes/01234567-89ab-cdef-0123-456789abcdef \
-H "Authorization: Bearer <token>"
POST /nodes/{node_id}/refresh¶
Manually refresh a node's embedding.
Authentication: Required when JWT enabled
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| node_id | string | Yes | Node UUID to refresh |

Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| tenant_id | string | No | Tenant ID (dev mode only, ignored in production) |
Response:
{
"id": "01234567-89ab-cdef-0123-456789abcdef",
"drift_score": 0.12,
"last_refreshed": "2025-11-11T12:00:00Z",
"event_id": "event_123"
}
| Field | Type | Description |
|---|---|---|
| id | string | Node ID |
| drift_score | float | Cosine distance from previous embedding (0.0-1.0) |
| last_refreshed | string | ISO 8601 timestamp |
| event_id | string | Event ID if drift exceeded threshold, null otherwise |
Status Codes:
- 200 OK: Refresh completed
- 401 Unauthorized: Missing/invalid JWT
- 404 Not Found: Node not found
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Refresh failed
Example:
curl -X POST http://localhost:8000/nodes/01234567-89ab-cdef-0123-456789abcdef/refresh \
-H "Authorization: Bearer <token>"
Notes:
- Computes drift against the previous embedding as cosine distance (1 - cosine similarity)
- Emits refreshed event if drift > refresh_policy.drift_threshold
- Updates embedding, drift_score, and last_refreshed fields
- Writes to embedding_history table
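The drift check in these notes can be illustrated in a few lines. This is a sketch of the idea only, assuming drift is the cosine distance between the old and new embedding and that a refreshed event is emitted when drift is strictly greater than the threshold; the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norm


def should_emit_refreshed(
    old: Sequence[float], new: Sequence[float], drift_threshold: float
) -> tuple[float, bool]:
    """Return (drift_score, emit_event) for a refresh from `old` to `new`."""
    drift = cosine_distance(old, new)
    return drift, drift > drift_threshold


if __name__ == "__main__":
    print(should_emit_refreshed([1.0, 0.0], [0.9, 0.1], drift_threshold=0.1))
```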
GET /nodes/{node_id}/versions¶
Get embedding version history for a node.
Authentication: Required when JWT enabled
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| node_id | string | Yes | Node UUID |

Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | integer | No | Max versions to return (default: 10, max: 100) |
Response:
{
"node_id": "01234567-89ab-cdef-0123-456789abcdef",
"versions": [
{
"version_index": 3,
"drift_score": 0.12,
"created_at": "2025-11-11T12:00:00Z",
"embedding_ref": "s3://bucket/job_123.json"
},
{
"version_index": 2,
"drift_score": 0.08,
"created_at": "2025-11-10T12:00:00Z",
"embedding_ref": "s3://bucket/job_123.json"
}
],
"count": 2
}
Status Codes:
- 200 OK: Versions retrieved
- 400 Bad Request: Invalid limit
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Query failed
Example:
curl "http://localhost:8000/nodes/01234567-89ab-cdef-0123-456789abcdef/versions?limit=20" \
-H "Authorization: Bearer <token>"
Edges¶
POST /edges¶
Create a relationship between two nodes.
Authentication: Required when JWT enabled
Request Body:
{
"src": "node_123",
"dst": "node_456",
"rel": "DERIVED_FROM",
"props": {
"confidence": 0.95,
"timestamp": "2025-11-11T12:00:00Z"
},
"tenant_id": "acme_corp"
}
| Field | Type | Required | Description |
|---|---|---|---|
| src | string | Yes | Source node ID (max 100 chars) |
| dst | string | Yes | Target node ID (max 100 chars) |
| rel | string | Yes | Relationship type (max 100 chars) |
| props | object | No | Edge properties (arbitrary JSON) |
| tenant_id | string | No | Tenant ID (dev mode only, max 100 chars) |
Common Relationship Types:
- DERIVED_FROM: Provenance/lineage (used by /lineage endpoint)
- WORKS_WITH: Collaboration
- REPORTS_TO: Hierarchy
- SIMILAR_TO: Similarity
Response:
Status Codes:
- 200 OK: Edge created
- 400 Bad Request: Invalid input
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Creation failed
Example:
curl -X POST http://localhost:8000/edges \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"src": "node_123",
"dst": "node_456",
"rel": "DERIVED_FROM",
"props": {"confidence": 0.95}
}'
Search¶
POST /search¶
Semantic search across knowledge graph nodes.
Authentication: Required when JWT enabled
Request Body:
{
"query": "ML engineer with PyTorch experience",
"top_k": 10,
"metadata_filters": {
"source": "linkedin"
},
"compound_filter": {
"metadata": {"job_type": "full_time"}
},
"tenant_id": "acme_corp",
"use_weighted_score": true,
"use_hybrid": true,
"use_reranker": true,
"decay_lambda": 0.01,
"drift_beta": 0.1
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query text (1-2000 chars) |
| top_k | integer | No | 10 | Number of results (1-100) |
| metadata_filters | object | No | null | Simple equality filters (key-value pairs) |
| compound_filter | object | No | null | JSONB containment filter for nested queries |
| tenant_id | string | No | null | Tenant ID (dev mode only, max 100 chars) |
| use_weighted_score | boolean | No | false | Apply recency/drift weighting |
| use_hybrid | boolean | No | false | Use BM25+vector fusion (recommended) |
| use_reranker | boolean | No | true | Apply cross-encoder reranking (hybrid only) |
| decay_lambda | float | No | 0.01 | Age decay rate (0.0-1.0) |
| drift_beta | float | No | 0.1 | Drift penalty weight (0.0-1.0) |
Search Modes:
- Vector-only (use_hybrid=false): Pure semantic similarity using embeddings
- Hybrid (use_hybrid=true): BM25 + vector fusion with optional reranking
  - RRF fusion (default): Reciprocal rank fusion, scores 0.01-0.04
  - Weighted fusion (HYBRID_RRF_ENABLED=false): Linear combination, scores 0.0-1.0
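Reciprocal rank fusion combines ranked lists by summing 1 / (k + rank) for each list a result appears in. The sketch below uses the conventional constant k = 60, which is an assumption here; it is at least consistent with the quoted 0.01-0.04 range when two lists are fused. The function name and node IDs are illustrative.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["node_a", "node_b", "node_c"]  # hypothetical vector ranking
bm25_hits = ["node_a", "node_d", "node_b"]    # hypothetical BM25 ranking
fused = rrf_fuse([vector_hits, bm25_hits])
for node_id, score in fused:
    print(f"{node_id}: {score:.4f}")
```

Because the scores depend only on ranks, they are not comparable to cosine similarities, which is why thresholds tuned for one score type do not carry over to the other.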
Weighted Scoring Formula (when use_weighted_score=true):
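As an illustration only, one plausible weighting multiplies the similarity by an exponential age decay controlled by decay_lambda and by a linear drift penalty controlled by drift_beta. The multiplicative form, the use of days as the age unit, and the function name are all assumptions, not something this reference confirms.

```python
import math


def weighted_score(
    similarity: float,
    age_days: float,
    drift_score: float,
    decay_lambda: float = 0.01,
    drift_beta: float = 0.1,
) -> float:
    """Assumed form: similarity * exp(-lambda * age) * (1 - beta * drift)."""
    recency = math.exp(-decay_lambda * age_days)    # newer nodes keep more of their score
    drift_penalty = 1.0 - drift_beta * drift_score  # higher drift lowers the score
    return similarity * recency * drift_penalty


if __name__ == "__main__":
    print(weighted_score(0.85, age_days=30.0, drift_score=0.12))
```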
Response:
{
"query": "ML engineer with PyTorch experience",
"results": [
{
"id": "node_123",
"classes": ["Resume"],
"props": {
"text": "Experienced ML engineer specializing in PyTorch...",
"name": "Jane Doe"
},
"payload_ref": "s3://bucket/resume_123.pdf",
"metadata": {
"source": "linkedin"
},
"similarity": 0.8542,
"text": "Experienced ML engineer specializing in PyTorch..."
}
],
"count": 10
}
Status Codes:
- 200 OK: Search completed
- 400 Bad Request: Invalid query
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Search failed
Example (Vector-only):
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"query": "ML engineer with PyTorch",
"top_k": 5
}'
Example (Hybrid with reranking):
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"query": "ML engineer with PyTorch",
"top_k": 10,
"use_hybrid": true,
"use_reranker": true
}'
Notes:
- Hybrid search automatically falls back to vector-only if BM25 index unavailable
- Reranker uses cross-encoder model for higher precision (slower)
- tenant_id from JWT overrides request body in production
- Empty results may indicate missing embeddings (check /debug/embed_info)
Ask (LLM Q&A)¶
POST /ask¶
LLM-powered Q&A with grounded citations from knowledge graph.
Authentication: Required when JWT enabled
Request Body:
{
"question": "What ML frameworks does the ML engineer position require?",
"max_results": 5,
"tenant_id": "acme_corp",
"use_weighted_score": true
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| question | string | Yes | - | Question to answer (1-1000 chars) |
| max_results | integer | No | 5 | Max context nodes to retrieve (1-20) |
| tenant_id | string | No | null | Tenant ID (dev mode only, max 100 chars) |
| use_weighted_score | boolean | No | true | Use recency/drift weighting |
Response:
{
"answer": "The ML engineer position requires PyTorch and TensorFlow [0], along with experience in scikit-learn [1].",
"citations": [
{
"node_id": "job_123",
"classes": ["Job"],
"drift_score": 0.08,
"age_days": 1.2,
"lineage": [
{
"ancestor": "linkedin_scrape_456",
"depth": 1
}
]
}
],
"confidence": 0.92,
"metadata": {
"searched_nodes": 20,
"filtered_nodes": 3,
"cited_nodes": 2,
"top_similarity": 0.854,
"gating_score": 0.854,
"gating_score_type": "rrf_fused",
"first_citation_idx": 0,
"citation_at_1_precision": 1.0,
"llm_path": "fast",
"routing_reason": "high_confidence_sim=0.854",
"intent_detected": "entity_job",
"intent_type": "entity_job",
"classes_filter": ["Job"],
"must_have_terms": ["machine learning engineer"],
"structured_results_count": 0
}
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| answer | string | LLM-generated answer with citation markers [0], [1], etc. |
| citations | array | Cited nodes with lineage and freshness metadata |
| confidence | float | Answer confidence score (0.0-1.0) |
| metadata | object | Search diagnostics and routing info |
Citation Fields:
| Field | Type | Description |
|---|---|---|
| node_id | string | Cited node UUID |
| classes | array[string] | Node class labels |
| drift_score | float | Latest drift score (0.0-1.0) |
| age_days | float | Days since last refresh |
| lineage | array | Provenance chain (DERIVED_FROM edges) |
Metadata Fields:
| Field | Type | Description |
|---|---|---|
| searched_nodes | integer | Total nodes retrieved |
| filtered_nodes | integer | Nodes after similarity filtering |
| cited_nodes | integer | Nodes actually cited in answer |
| top_similarity | float | Highest similarity score |
| gating_score | float | Score used for gating decision |
| gating_score_type | string | Score type: rrf_fused, weighted_fusion, or cosine |
| first_citation_idx | integer | Index of first citation (0-based) |
| citation_at_1_precision | float | 1.0 if first citation is top result, 0.0 otherwise |
| llm_path | string | LLM used: fast or fallback |
| routing_reason | string | Why fast/fallback was chosen |
| intent_detected | string | Detected query intent type |
| intent_type | string | Intent category (e.g., entity_job, open_positions) |
| classes_filter | array[string] | Node classes filtered by intent |
| must_have_terms | array[string] | Required terms for intent-based filtering |
Intent Detection:
The /ask endpoint detects structured query intents and applies specialized retrieval:
- entity_job: Job posting queries → filters to Job class
- entity_resume: Resume/experience queries → filters to Resume class
- entity_article: Article/knowledge queries → filters to Article class
- open_positions: Open positions queries → uses structured SQL query
- performance_issues: Performance issue queries → uses structured SQL query
Hybrid Routing:
When HYBRID_ROUTING_ENABLED=true, the system routes to fast or fallback LLM:
- Fast path (llama-3.1-8b-instant): High-confidence queries (top_sim >= 0.70)
- Fallback path (gpt-4o-mini): Complex queries, low confidence, reasoning
Gating & Quality:
- Extremely low similarity: Returns "I don't have enough information" if top_sim < threshold
- Ambiguity gating: Rejects if top 3 results are too similar (< 0.02 gap)
- Low similarity fallback: Limits to top-1 result, caps confidence at 0.6
- Citation quality: Tracks first citation precision (is top result cited?)
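The routing and gating rules above can be read as a small decision function. This is a simplified sketch under several assumptions: the low-similarity cutoff is taken to be 0.30, the ambiguity gap is measured between the first and third result, the reason string ambiguous_top_results is invented for illustration, and the low-similarity fallback (top-1 only, confidence capped at 0.6) is omitted.

```python
def route_and_gate(
    similarities: list[float],
    sim_threshold: float = 0.30,   # assumed cutoff for "extremely low similarity"
    fast_path_min: float = 0.70,   # fast path when top_sim >= 0.70
    ambiguity_gap: float = 0.02,   # top 3 results closer than this are ambiguous
) -> dict:
    """Classify a retrieval result using a simplified reading of the gating rules."""
    ranked = sorted(similarities, reverse=True)
    if not ranked or ranked[0] < sim_threshold:
        return {"action": "refuse", "reason": "extremely_low_similarity"}
    if len(ranked) >= 3 and ranked[0] - ranked[2] < ambiguity_gap:
        return {"action": "refuse", "reason": "ambiguous_top_results"}
    llm_path = "fast" if ranked[0] >= fast_path_min else "fallback"
    return {"action": "answer", "llm_path": llm_path}


if __name__ == "__main__":
    print(route_and_gate([0.854, 0.60, 0.40]))
```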
Status Codes:
- 200 OK: Question answered (even if low confidence)
- 400 Bad Request: Invalid question
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded (includes Retry-After header)
- 503 Service Unavailable: LLM disabled (LLM_ENABLED=false)
- 500 Internal Server Error: Processing failed
Example:
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"question": "What ML frameworks does the position require?",
"max_results": 5
}'
Example (Low Confidence Response):
{
"answer": "I don't have enough information to answer this question confidently.",
"citations": [],
"confidence": 0.2,
"metadata": {
"searched_nodes": 5,
"cited_nodes": 0,
"filtered_nodes": 0,
"top_similarity": 0.12,
"gating_score": 0.12,
"gating_score_type": "rrf_fused",
"reason": "extremely_low_similarity"
}
}
Notes:
- Uses hybrid search (BM25+vector) with cross-encoder reranking by default
- Citations include up to 3 ancestors in lineage chain
- Confidence calculated from citation coverage, similarity, and intent match
- Cached responses (TTL: 600s) for identical questions
- Max concurrency: 3 concurrent requests per tenant
- tenant_id from JWT overrides request body in production
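On the client side, the citation markers in answer can be tied back to entries in citations. The sketch assumes that marker [n] refers to the element at index n of the citations array, which the sample response suggests but does not state; resolve_citations is a hypothetical helper name.

```python
import re


def resolve_citations(answer: str, citations: list[dict]) -> dict[int, str]:
    """Map each [n] marker in the answer to the node_id at citations[n]."""
    resolved = {}
    for match in re.finditer(r"\[(\d+)\]", answer):
        index = int(match.group(1))
        if index < len(citations):  # ignore markers with no matching citation
            resolved[index] = citations[index]["node_id"]
    return resolved


if __name__ == "__main__":
    sample_answer = "The position requires PyTorch [0] and scikit-learn [1]."
    sample_citations = [{"node_id": "job_123"}, {"node_id": "job_456"}]
    print(resolve_citations(sample_answer, sample_citations))
```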
POST /ask/stream¶
Server-Sent Events streaming for LLM Q&A.
Authentication: Required when JWT enabled
Request Body: Same as /ask
Response: Server-Sent Events stream with three event types in normal operation, plus an error event on failure:
- context event (initial)
- token events (streaming)
- final event (last)
- error event (on failure)
Status Codes:
- 200 OK: Stream started
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 503 Service Unavailable: LLM disabled
Example:
curl -X POST http://localhost:8000/ask/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"question": "What are the ML frameworks?"}' \
--no-buffer
Notes:
- Max concurrency: 2 concurrent /ask/stream requests per tenant
- Stricter rate limits than /ask
- Use --no-buffer with curl to see streaming tokens
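A client consuming the stream needs to split it into events. The sketch below parses the standard SSE framing (event: and data: fields, with a blank line ending each event) from an iterable of lines. The event names come from the list above, but the JSON payload shapes in the sample are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs; a blank line terminates each event block."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ':') and other fields are ignored here.


# Hypothetical stream content; the real payload fields may differ.
sample = [
    "event: token\n", 'data: {"text": "Py"}\n', "\n",
    "event: token\n", 'data: {"text": "Torch"}\n', "\n",
    "event: final\n", 'data: {"confidence": 0.92}\n', "\n",
]
answer = "".join(
    json.loads(data)["text"] for event, data in iter_sse_events(sample) if event == "token"
)
print(answer)
```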
Triggers & Patterns¶
POST /triggers¶
Register a semantic trigger pattern.
Authentication: Required when JWT enabled
Request Body:
{
"name": "ml_engineer_pattern",
"example_text": "machine learning engineer position requiring PyTorch and TensorFlow",
"description": "Trigger for ML engineer job postings"
}
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Pattern name (unique identifier) |
| example_text | string | Yes | Example text to embed as pattern |
| description | string | No | Human-readable description |
Response:
{
"status": "registered",
"name": "ml_engineer_pattern",
"description": "Trigger for ML engineer job postings"
}
Status Codes:
- 200 OK: Pattern registered
- 400 Bad Request: Missing required fields
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Registration failed
Example:
curl -X POST http://localhost:8000/triggers \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"name": "ml_engineer_pattern",
"example_text": "machine learning engineer position",
"description": "ML engineer jobs"
}'
Notes:
- Pattern embedding generated from example_text
- Patterns are global (not tenant-scoped)
- Triggers fire when node embeddings are similar to pattern
GET /triggers¶
List all registered trigger patterns.
Authentication: Optional (rate limited)
Parameters: None
Response:
{
"patterns": [
{
"name": "ml_engineer_pattern",
"description": "ML engineer jobs",
"created_at": "2025-11-10T12:00:00Z"
}
],
"count": 1
}
Status Codes:
- 200 OK: Patterns listed
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Listing failed
Example:
DELETE /triggers/{name}¶
Delete a trigger pattern by name.
Authentication: Required when JWT enabled
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Pattern name to delete |
Response:
Status Codes:
- 200 OK: Pattern deleted
- 401 Unauthorized: Missing/invalid JWT
- 404 Not Found: Pattern not found
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Deletion failed
Example:
curl -X DELETE http://localhost:8000/triggers/ml_engineer_pattern \
-H "Authorization: Bearer <token>"
Events¶
GET /events¶
List events with optional filtering.
Authentication: Required when JWT enabled
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| node_id | string | No | Filter by node ID |
| event_type | string | No | Filter by event type |
| tenant_id | string | No | Tenant ID (dev mode only, ignored in production) |
| limit | integer | No | Max events to return (default: 100, max: 1000) |
Event Types:
- refreshed: Node embedding refreshed (drift > threshold)
- trigger_fired: Semantic trigger matched
- created: Node created
- updated: Node updated
Response:
{
"events": [
{
"id": "event_123",
"node_id": "node_456",
"type": "refreshed",
"payload": {
"drift_score": 0.12,
"last_refreshed": "2025-11-11T12:00:00Z",
"manual_trigger": true
},
"created_at": "2025-11-11T12:00:00Z"
}
],
"count": 1
}
Status Codes:
- 200 OK: Events retrieved
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Listing failed
Example:
Notes:
- Events are ordered by created_at DESC
- tenant_id from JWT applies RLS filtering in production
Lineage¶
GET /lineage/{node_id}¶
Retrieve provenance lineage for a node.
Authentication: Required when JWT enabled
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| node_id | string | Yes | Node UUID |

Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| max_depth | integer | No | Max lineage depth (default: 5) |
| tenant_id | string | No | Tenant ID (dev mode only, ignored in production) |
Response:
{
"node_id": "node_123",
"ancestors": [
{
"id": "node_456",
"depth": 1,
"edge_props": {
"confidence": 0.95
}
},
{
"id": "node_789",
"depth": 2,
"edge_props": {}
}
],
"depth": 2
}
Status Codes:
- 200 OK: Lineage retrieved
- 401 Unauthorized: Missing/invalid JWT
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Retrieval failed
Example:
Notes:
- Traverses DERIVED_FROM edges recursively
- depth=1 means direct parent, depth=2 means grandparent, etc.
- tenant_id from JWT applies RLS filtering in production
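The traversal described in these notes can be pictured with an in-memory edge map. This sketch assumes that an edge with src=A, dst=B, rel=DERIVED_FROM makes B an ancestor of A, and it walks breadth-first up to max_depth; the ancestors function and the dictionary layout are illustrative, not the server's query.

```python
def ancestors(node_id: str, derived_from: dict[str, list[str]], max_depth: int = 5) -> list[dict]:
    """Walk DERIVED_FROM edges level by level, recording the depth of each ancestor."""
    found, seen, frontier = [], {node_id}, [node_id]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for parent in derived_from.get(current, []):
                if parent not in seen:  # guards against cycles and duplicates
                    seen.add(parent)
                    found.append({"id": parent, "depth": depth})
                    next_frontier.append(parent)
        frontier = next_frontier
        if not frontier:
            break
    return found


if __name__ == "__main__":
    edges = {"node_123": ["node_456"], "node_456": ["node_789"]}
    print(ancestors("node_123", edges))
```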
Admin¶
POST /admin/refresh¶
Trigger on-demand refresh cycle.
Authentication: Required (scope: admin:refresh)
Request Body:
Option 1 (specific nodes):
Option 2 (array shorthand):
Option 3 (all due nodes):
Response (specific nodes):
Response (all nodes):
Status Codes:
- 200 OK: Refresh completed
- 401 Unauthorized: Missing/invalid JWT
- 403 Forbidden: Missing admin:refresh scope
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Refresh failed
Example (specific nodes):
curl -X POST http://localhost:8000/admin/refresh \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"node_ids": ["node_123", "node_456"]}'
Example (all due nodes):
curl -X POST http://localhost:8000/admin/refresh \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d 'null'
Notes:
- Requires admin:refresh scope in JWT claims
- Emits refreshed event if drift > threshold
- Writes to embedding_history table
- tenant_id from JWT applies RLS filtering
Connector Admin API¶
GET /_admin/connectors/cache/health¶
Check connector config cache subscriber health status.
Authentication: JWT required when JWT_ENABLED=true
Parameters: None
Response:
{
"status": "ok",
"subscriber": {
"connected": true,
"last_message_ts": "2025-11-11T15:08:16.411197Z",
"reconnects": 0
}
}
Status Codes:
- 200 OK: Health check successful
Example:
Notes:
- status: "ok" when subscriber connected and operational
- status: "degraded" when subscriber down or disconnected
- See OPERATIONS.md for complete operations guide
POST /_admin/connectors/rotate_keys¶
Rotate encryption keys for connector configurations.
Authentication: JWT required when JWT_ENABLED=true
Request Body:
Parameters:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| providers | array | No | null | Filter by provider names |
| tenants | array | No | null | Filter by tenant IDs |
| batch_size | integer | No | 100 | Rows per batch |
| dry_run | boolean | No | false | Count only, no changes |
Response:
Dry-Run Response:
Status Codes:
- 200 OK: Rotation completed
- 401 Unauthorized: Missing/invalid JWT
- 500 Internal Server Error: Rotation failed
Examples:
Dry-run to preview candidates:
curl -X POST http://localhost:8000/_admin/connectors/rotate_keys \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
Rotate all configs:
curl -X POST http://localhost:8000/_admin/connectors/rotate_keys \
-H "Content-Type: application/json" \
-d '{"dry_run": false}'
Rotate specific provider:
curl -X POST http://localhost:8000/_admin/connectors/rotate_keys \
-H "Content-Type: application/json" \
-d '{"providers": ["s3"], "dry_run": false}'
With JWT authentication:
curl -X POST http://localhost:8000/_admin/connectors/rotate_keys \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
Notes:
- Selects rows where key_version != ACTIVE_VERSION
- Decrypts with old key, re-encrypts with active key
- Invalidates cache and publishes Redis pub/sub notification
- Per-row error handling (one failure doesn't stop batch)
- See OPERATIONS.md for complete runbook
GET /admin/anomalies¶
Detect operational anomalies in the knowledge graph.
Authentication: Optional (rate limited)
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| types | string | No | all | Comma-separated anomaly types |
| lookback_hours | integer | No | 24 | Hours to look back |
| drift_spike_threshold | float | No | 2.0 | Drift multiplier threshold |
| trigger_storm_threshold | integer | No | 50 | Min trigger events for storm |
| scheduler_lag_multiplier | float | No | 2.0 | Lag multiplier for overdue |
| tenant_id | string | No | null | Tenant ID filter |
Anomaly Types:
- drift_spike: Nodes with drift > 2x mean for 3+ consecutive refreshes
- trigger_storm: >50 trigger events in 1 hour (runaway triggers)
- scheduler_lag: Nodes overdue for refresh (>2x expected interval)
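Two of these rules reduce to simple arithmetic. The sketch below assumes the drift spike check looks at the most recent three drift scores and that the mean is supplied by the caller (which population the mean is taken over is not specified here); the function names are hypothetical.

```python
def has_drift_spike(
    recent_drifts: list[float],
    mean_drift: float,
    multiplier: float = 2.0,
    min_consecutive: int = 3,
) -> bool:
    """True if the latest `min_consecutive` drift scores all exceed multiplier * mean."""
    if len(recent_drifts) < min_consecutive:
        return False
    return all(d > multiplier * mean_drift for d in recent_drifts[-min_consecutive:])


def lag_multiplier(expected_interval_seconds: float, actual_interval_seconds: float) -> float:
    """Ratio of actual to expected refresh interval, rounded to two decimals."""
    return round(actual_interval_seconds / expected_interval_seconds, 2)


def is_overdue(expected_interval_seconds: float, actual_interval_seconds: float,
               threshold: float = 2.0) -> bool:
    """True if the node has gone more than `threshold` times its expected interval."""
    return actual_interval_seconds / expected_interval_seconds > threshold


if __name__ == "__main__":
    print(has_drift_spike([0.42, 0.48, 0.44], mean_drift=0.10))
    print(lag_multiplier(86400, 180000), is_overdue(86400, 180000))
```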
Response:
{
"anomalies": {
"drift_spike": [
{
"node_id": "node_123",
"avg_drift": 0.45,
"recent_drifts": [0.42, 0.48, 0.44],
"threshold": 0.20
}
],
"trigger_storm": [
{
"trigger_name": "ml_engineer_pattern",
"event_count": 75,
"time_window_hours": 1.0
}
],
"scheduler_lag": [
{
"node_id": "node_456",
"expected_interval_seconds": 86400,
"actual_interval_seconds": 180000,
"lag_multiplier": 2.08
}
]
},
"summary": {
"total": 3,
"by_type": {
"drift_spike": 1,
"trigger_storm": 1,
"scheduler_lag": 1
},
"lookback_hours": 24
}
}
Status Codes:
- 200 OK: Anomalies detected
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Detection failed
Example:
curl "http://localhost:8000/admin/anomalies?types=drift_spike,trigger_storm&lookback_hours=48" \
-H "Authorization: Bearer <token>"
Debug Endpoints¶
Debug endpoints require admin:refresh scope when JWT enabled.
GET /debug/dbinfo¶
Inspect database and tenant context.
Authentication: Required (scope: admin:refresh) when JWT enabled
Response:
{
"database": "activekg",
"tenant_context": "acme_corp",
"server_host": "10.0.1.5",
"server_port": 5432
}
Example:
GET /debug/search_sanity¶
Retrieval sanity checks for diagnosing empty search results.
Authentication: Required (scope: admin:refresh) when JWT enabled
Response:
{
"tenant_id": "acme_corp",
"total_nodes": 1000,
"nodes_with_embeddings": 950,
"nodes_with_text_search": 950,
"embedding_coverage_pct": 95.0,
"text_search_coverage_pct": 95.0,
"sample_nodes_with_embedding": [
{"id": "node_123", "classes": ["Job"], "has_text": true}
],
"sample_nodes_without_embedding": [
{"id": "node_456", "classes": ["Resume"], "has_text": false}
]
}
Example:
POST /debug/search_explain¶
Detailed search result triage with similarity scores and snippets.
Authentication: Required (scope: admin:refresh) when JWT enabled
Request Body:
Response:
{
"query": "ML engineer",
"mode": "hybrid",
"score_type": "rrf_fused",
"score_range": "0.01-0.04 (low)",
"result_count": 10,
"results": [
{
"node_id": "node_123",
"similarity": 0.0342,
"score_type": "rrf_fused",
"classes": ["Job"],
"snippet": "Senior ML Engineer position requiring...",
"metadata": {"source": "linkedin"},
"has_embedding": true,
"has_text_search": true
}
],
"threshold_info": {
"recommended_min": 0.15,
"recommended_max": 0.28,
"top_similarity": 0.0342,
"bottom_similarity": 0.0089
},
"scoring_notes": {
"rrf_fused": "RRF scores range 0.01-0.04 (rank-based fusion of vector+BM25)",
"weighted_fusion": "Weighted scores range 0.0-1.0 (linear combination of vector+BM25)",
"cosine": "Cosine similarity range 0.0-1.0 (vector-only)"
}
}
Example:
curl -X POST http://localhost:8000/debug/search_explain \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"query": "ML engineer", "use_hybrid": true, "top_k": 10}'
GET /debug/embed_info¶
Inspect embedding configuration and stored vectors.
Authentication: Required (scope: admin:refresh) when JWT enabled
Response:
{
"embedding_backend": "sentence-transformers",
"embedding_model": "all-MiniLM-L6-v2",
"counts": {
"total_nodes": 1000,
"with_embedding": 950,
"without_embedding": 50
},
"vector_dimension": {
"db_type": "vector(384)",
"db_dim": 384,
"sampled_dims": [384]
},
"sample": {
"n": 100,
"norm_min": 0.998,
"norm_max": 1.002,
"norm_mean": 1.0,
"example_ids": ["node_123", "node_456"]
},
"last_refreshed": {
"count": 950,
"age_seconds": {
"min": 120.5,
"avg": 43200.2,
"max": 86400.8
}
}
}
Example:
GET /debug/intent¶
Test intent detection without running full /ask.
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| q | string | Yes | Query to test |
Response:
{
"query": "What ML frameworks does the position require",
"normalized": "what machine learning frameworks does the position require",
"intent_type": "entity_job",
"params": {
"expected_classes": ["Job"],
"must_have_terms": ["machine learning engineer"]
}
}
Example:
curl "http://localhost:8000/debug/intent?q=What%20ML%20frameworks%20does%20the%20position%20require"
Demo Console¶
GET /demo¶
HTML demo console for testing API functionality.
Authentication: None
Response: HTML page with interactive forms for:
- Search
- Trigger management
- Event listing
- Lineage exploration
- Anomaly detection
Example:
Error Responses¶
All endpoints return consistent error responses:
Common status codes:
- 400 Bad Request: Invalid input (validation error)
- 401 Unauthorized: Missing or invalid JWT token
- 403 Forbidden: Insufficient permissions (missing scope)
- 404 Not Found: Resource not found
- 429 Too Many Requests: Rate limit exceeded (includes Retry-After header)
- 500 Internal Server Error: Server error
- 503 Service Unavailable: Service disabled (e.g., LLM_ENABLED=false)
Configuration¶
Key environment variables:
Database & Embedding¶
- ACTIVEKG_DSN: PostgreSQL connection string
- EMBEDDING_BACKEND: sentence-transformers, openai, cohere
- EMBEDDING_MODEL: Model name (default: all-MiniLM-L6-v2)
LLM (Q&A)¶
- LLM_ENABLED: Enable /ask endpoint (default: true)
- LLM_BACKEND: groq, openai, litellm
- LLM_MODEL: Model name (default: llama-3.1-8b-instant)
Hybrid Routing¶
- HYBRID_ROUTING_ENABLED: Enable fast/fallback routing (default: false)
- ASK_FAST_BACKEND: Fast LLM backend (default: groq)
- ASK_FAST_MODEL: Fast model (default: llama-3.1-8b-instant)
- ASK_FALLBACK_BACKEND: Fallback backend (default: openai)
- ASK_FALLBACK_MODEL: Fallback model (default: gpt-4o-mini)
Search & Retrieval¶
- WEIGHTED_SEARCH_CANDIDATE_FACTOR: Candidate multiplier for weighted search (default: 2.0)
- ASK_SIM_THRESHOLD: Similarity cutoff for /ask (default: 0.30)
- ASK_USE_RERANKER: Enable cross-encoder reranking (default: true)
- RERANK_SKIP_TOPSIM: Skip reranking if top_sim >= threshold (default: 0.80)
Authentication & Rate Limiting¶
- JWT_ENABLED: Enable JWT authentication (default: false)
- JWT_SECRET_KEY: Shared secret for HS256 (required if HS256)
- JWT_PUBLIC_KEY_PATH: Public key for RS256 (required if RS256)
- RATE_LIMIT_ENABLED: Enable rate limiting (default: false)
- REDIS_URL: Redis URL for rate limiting (required if enabled)
Operations¶
- AUTO_EMBED_ON_CREATE: Auto-embed new nodes (default: true)
- RUN_SCHEDULER: Start refresh scheduler (default: true)
- METRICS_ENABLED: Enable Prometheus metrics (default: true)
Rate Limits¶
Default rate limits per tenant (when RATE_LIMIT_ENABLED=true):
| Endpoint | Limit | Window | Concurrency |
|---|---|---|---|
| /search | 100 req | 1 min | - |
| /ask | 20 req | 1 min | 3 concurrent |
| /ask/stream | 10 req | 1 min | 2 concurrent |
| /nodes | 50 req | 1 min | - |
| /edges | 50 req | 1 min | - |
| /triggers | 20 req | 1 min | - |
| /admin/refresh | 10 req | 1 min | - |
| default | 100 req | 1 min | - |
Concurrency limits prevent resource exhaustion from parallel requests.
Best Practices¶
1. Use Hybrid Search with Reranking¶
For best retrieval quality:
2. Monitor Embedding Coverage¶
Check /debug/embed_info regularly to ensure high coverage:
Target: >95% embedding coverage for optimal search quality.
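The coverage figure can be derived from the counts object that /debug/embed_info returns. This sketch assumes the response has already been fetched and decoded into a dictionary shaped like the sample shown for that endpoint; the helper names are hypothetical.

```python
def embedding_coverage_pct(embed_info: dict) -> float:
    """Percentage of nodes that have an embedding, from the `counts` object."""
    counts = embed_info["counts"]
    if counts["total_nodes"] == 0:
        return 0.0
    return 100.0 * counts["with_embedding"] / counts["total_nodes"]


def meets_target(embed_info: dict, target_pct: float = 95.0) -> bool:
    """True only if coverage is strictly above the target, matching the '>95%' wording."""
    return embedding_coverage_pct(embed_info) > target_pct


if __name__ == "__main__":
    info = {"counts": {"total_nodes": 1000, "with_embedding": 950, "without_embedding": 50}}
    print(embedding_coverage_pct(info), meets_target(info))
```

With the sample counts, coverage is exactly 95%, which does not satisfy a strict greater-than target.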
3. Set Appropriate Refresh Policies¶
For frequently changing content:
4. Use Lineage for Provenance¶
Always link derived nodes to sources:
curl -X POST http://localhost:8000/edges \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{"src": "resume_v2", "dst": "resume_v1", "rel": "DERIVED_FROM"}'
5. Monitor Anomalies¶
Check for drift spikes and trigger storms:
6. Inspect Low-Confidence Answers¶
Use /ask metadata to diagnose quality issues:
- top_similarity < 0.30: Likely irrelevant context
- cited_nodes = 0: No citations found (trust accordingly)
- ambiguity_reason: Results too similar (query needs refinement)
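These checks are easy to automate against the metadata object of an /ask response. The sketch applies the three rules listed above; the default cutoff of 0.30 follows the first bullet, while the function name and the finding messages are illustrative.

```python
def diagnose_ask_metadata(metadata: dict, sim_threshold: float = 0.30) -> list[str]:
    """Return human-readable findings for a single /ask response's metadata."""
    findings = []
    if metadata.get("top_similarity", 0.0) < sim_threshold:
        findings.append("low top_similarity: retrieved context is likely irrelevant")
    if metadata.get("cited_nodes", 0) == 0:
        findings.append("no citations: treat the answer as ungrounded")
    if metadata.get("ambiguity_reason"):
        findings.append("ambiguous results: refine the query")
    return findings


if __name__ == "__main__":
    low_confidence = {"top_similarity": 0.12, "cited_nodes": 0, "reason": "extremely_low_similarity"}
    for finding in diagnose_ask_metadata(low_confidence):
        print(finding)
```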
Changelog¶
v0.1.0 (Current)¶
- Initial release with core KG functionality
- Hybrid search with BM25+vector fusion
- LLM Q&A with grounded citations
- Multi-tenant RLS support
- JWT authentication
- Rate limiting with Redis
- Prometheus metrics
- Drift-aware refresh scheduler
- Semantic triggers
- Lineage tracking
Support¶
For issues and questions:
- GitHub Issues: Active Graph KG
- GitHub Discussions: Community
- Documentation: Active Graph KG Docs
Generated: 2025-11-24 Version: 1.0.0 License: MIT