Distributed Infrastructure
Scale GraphMem to handle millions of documents and thousands of concurrent users.
Full Guide Available
For the complete distributed infrastructure guide with code examples, see DISTRIBUTED_INFRASTRUCTURE.md in the repository.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              DISTRIBUTED GRAPHMEM                               │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌────────────────┐     ┌─────────────────────┐     ┌──────────────────────┐    │
│  │   PRODUCERS    │────▶│    MESSAGE QUEUE    │────▶│     WORKER POOL      │    │
│  │                │     │  (Redpanda/Kafka)   │     │                      │    │
│  │  • Datasets    │     │                     │     │  ┌──────┐ ┌──────┐   │    │
│  │  • APIs        │     │  • ingest_queue     │     │  │Worker│ │Worker│   │    │
│  │  • Files       │     │  • embed_queue      │     │  │  1   │ │  2   │   │    │
│  └────────────────┘     │  • extract_queue    │     │  └──────┘ └──────┘   │    │
│                         └─────────────────────┘     └──────────────────────┘    │
│                                    │                           │                │
│                                    ▼                           ▼                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                             PROCESSING LAYER                             │   │
│  │                                                                          │   │
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────┐   │   │
│  │  │ EMBEDDING POOL  │  │ LLM EXTRACTION  │  │    GRAPH OPERATIONS     │   │   │
│  │  │                 │  │                 │  │                         │   │   │
│  │  │ • GPU Workers   │  │ • Async Batch   │  │ • Entity Resolution     │   │   │
│  │  │ • Batch 1000+   │  │ • Rate Limiting │  │ • Community Detection   │   │   │
│  │  │ • Local Models  │  │ • Retry Logic   │  │ • Evolution             │   │   │
│  │  └─────────────────┘  └─────────────────┘  └─────────────────────────┘   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                       │                                         │
│                                       ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                              STORAGE LAYER                               │   │
│  │                                                                          │   │
│  │         ┌─────────────┐     ┌─────────────┐     ┌─────────────┐          │   │
│  │         │    Neo4j    │     │    Redis    │     │    Turso    │          │   │
│  │         │   Cluster   │     │   Cluster   │     │  (Backup)   │          │   │
│  │         └─────────────┘     └─────────────┘     └─────────────┘          │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘
Performance Targets
| Metric | Target | How |
|---|---|---|
| Ingestion | 10,000 docs/sec | GPU embeddings + async LLM |
| Query p50 | < 100ms | Redis cache + read replicas |
| Query p99 | < 500ms | Connection pooling |
| Concurrent Users | 10,000+ | Horizontal scaling |
| Document Capacity | 100M+ | Neo4j sharding |
Key Components
Message Queue (Redpanda/Kafka)
Decouples producers from consumers for reliability and scaling:
topics:
  - graphmem.ingest.raw      # Incoming documents
  - graphmem.ingest.embed    # For embedding
  - graphmem.ingest.extract  # For LLM extraction
  - graphmem.graph.resolve   # Entity resolution
  - graphmem.query.pending   # Query queue
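To illustrate how a document might move through these topics, here is a small in-memory sketch. The hand-off order (raw, then embed, then extract, then resolve) is an assumption, as are the `NEXT_TOPIC` table and the `publish`, `advance`, and `run_pipeline` helpers. A real deployment would publish to Redpanda or Kafka rather than to Python deques.

```python
from collections import deque

# Topic names come from the list above; the stage order is an assumption.
# graphmem.query.pending is left out because it serves queries, not ingestion.
NEXT_TOPIC = {
    "graphmem.ingest.raw": "graphmem.ingest.embed",
    "graphmem.ingest.embed": "graphmem.ingest.extract",
    "graphmem.ingest.extract": "graphmem.graph.resolve",
    "graphmem.graph.resolve": None,  # end of the ingest pipeline
}

# In-memory stand-in for the broker: one FIFO queue per topic.
broker = {topic: deque() for topic in NEXT_TOPIC}


def publish(topic, message):
    """Append a message to the given topic's queue."""
    broker[topic].append(message)


def advance(topic):
    """Consume one message from `topic` and forward it to the next stage."""
    message = broker[topic].popleft()
    message = {**message, "stages_done": message["stages_done"] + [topic]}
    next_topic = NEXT_TOPIC[topic]
    if next_topic is not None:
        publish(next_topic, message)
    return message, next_topic


def run_pipeline(doc_id):
    """Push one document through every ingest stage; return the final message."""
    topic = "graphmem.ingest.raw"
    publish(topic, {"doc_id": doc_id, "stages_done": []})
    message = None
    while topic is not None:
        message, topic = advance(topic)
    return message
```

Because each stage only reads from its own topic and writes to the next, any stage can be scaled or restarted without the others needing to know.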
GPU Embedding Cluster
Local GPU models for high-throughput embedding:
- Single A100: ~5,000 embeddings/sec
- Single H100: ~10,000 embeddings/sec
- Roughly 100x faster than calling a remote embedding API
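As a back-of-the-envelope check, the throughput figures above can be turned into a GPU count for a target ingestion rate. The `chunks_per_doc` parameter is an assumption (this page does not say how many embeddings one document produces), and the helper name `gpus_needed` is illustrative.

```python
import math

# Approximate single-GPU throughput figures quoted above (embeddings/sec).
EMBEDDINGS_PER_SEC = {"A100": 5_000, "H100": 10_000}


def gpus_needed(docs_per_sec, chunks_per_doc, gpu):
    """Estimate the GPUs required to sustain a document ingestion rate.

    Assumes each document yields `chunks_per_doc` embeddings and that
    throughput scales linearly with the number of GPUs.
    """
    embeddings_per_sec = docs_per_sec * chunks_per_doc
    return math.ceil(embeddings_per_sec / EMBEDDINGS_PER_SEC[gpu])
```

For example, at the 10,000 docs/sec ingestion target with an assumed 4 chunks per document, `gpus_needed(10_000, 4, "H100")` gives 4 and `gpus_needed(10_000, 4, "A100")` gives 8.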
Worker Pool
Horizontally scalable workers:
- Embed Workers: GPU-accelerated embedding
- Extract Workers: LLM-based extraction
- Graph Workers: Entity resolution, evolution
- Query Workers: Query processing
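The deployment examples on this page start workers with a `--type` argument. A minimal sketch of an entrypoint that dispatches on that flag might look like the following. The handler functions are hypothetical placeholders, and the `graph` and `query` type names are assumptions, since only `embed` and `extract` appear in the examples.

```python
import argparse


def run_embed():
    return "embed worker started"


def run_extract():
    return "extract worker started"


def run_graph():
    return "graph worker started"


def run_query():
    return "query worker started"


# One handler per worker type; real handlers would start a consumer loop.
HANDLERS = {
    "embed": run_embed,
    "extract": run_extract,
    "graph": run_graph,
    "query": run_query,
}


def parse_worker_type(argv=None):
    """Parse `--type` and reject anything that has no registered handler."""
    parser = argparse.ArgumentParser(description="GraphMem worker (sketch)")
    parser.add_argument(
        "--type",
        dest="worker_type",
        required=True,
        choices=sorted(HANDLERS),
    )
    return parser.parse_args(argv).worker_type


def main(argv=None):
    """Look up the handler for the requested worker type and run it."""
    return HANDLERS[parse_worker_type(argv)]()
```

Keeping every worker type behind a single entrypoint lets one container image serve all roles, with only the command differing per service.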
Docker Compose (Development)
version: '3.8'
services:
  redpanda:
    image: redpandadata/redpanda:latest  # formerly published as vectorized/redpanda
    ports:
      - "9092:9092"
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
  neo4j:
    image: neo4j:5-enterprise
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes  # the enterprise image will not start without this
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
  gateway:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - redpanda
      - redis
      - neo4j
  embed-worker:
    build: .
    command: ["--type", "embed"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  extract-worker:
    build: .
    command: ["--type", "extract"]
    deploy:
      replicas: 5
Kubernetes Deployment
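apiVersion: apps/v1
kind: Deployment
metadata:
  name: graphmem-worker
spec:
  replicas: 10  # starting count; the autoscaler below adjusts it
  selector:
    matchLabels:
      app: graphmem-worker  # label chosen for this example; apps/v1 requires a selector
  template:
    metadata:
      labels:
        app: graphmem-worker  # must match the selector above
    spec:
      containers:
        - name: worker
          image: graphmem/worker:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: graphmem-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: graphmem-worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumer_lag  # must be exposed through an external metrics adapter
        target:
          type: AverageValue
          averageValue: "1000"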
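For an `External` metric with an `AverageValue` target, Kubernetes divides the metric value by the current pod count before comparing it with the target. The desired replica count therefore works out to roughly the total consumer lag divided by 1000, clamped to the configured minimum and maximum. A small sketch of that arithmetic follows. It ignores stabilization windows and scaling policies, the 10% tolerance is the Kubernetes default rather than something set in this manifest, and `desired_replicas` is an illustrative name.

```python
import math


def desired_replicas(total_lag, current_replicas, target_avg=1000,
                     min_replicas=5, max_replicas=50, tolerance=0.1):
    """Approximate the HPA decision for an AverageValue external metric."""
    current_avg = total_lag / current_replicas
    ratio = current_avg / target_avg
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the autoscaler leaves the count alone.
        desired = current_replicas
    else:
        # current_replicas * ratio simplifies to total_lag / target_avg.
        desired = math.ceil(total_lag / target_avg)
    return max(min_replicas, min(max_replicas, desired))
```

With 10 replicas and a total lag of 20,000 messages, this yields 20 replicas. A lag of 100,000 messages would ask for 100 replicas and is capped at the maximum of 50.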