Technical Architecture
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Streaming | Apache Flink 2.1 | Real-time event processing |
| Batch | Apache Spark | Batch processing and analytics |
| Lakehouse | Apache Iceberg | Cold storage table format |
| Hot Cache | Hazelcast or Dragonfly | Low-latency profile reads |
| Profile Store | PostgreSQL + ScyllaDB | Structured profiles |
| Identity Graph | ScyllaDB | Identity-graph edges stored as wide rows for traversal at scale |
| Event Store | Iceberg on S3/MinIO | Historical events |
| Vector Store | OpenSearch | Embedding storage and k-NN similarity search |
| Message Queue | Apache Kafka | Event streaming |
| Schema Registry | Apicurio | Schema management |
| Orchestration | Temporal | Durable workflows |
| Approval Workflows | Flowable | Human approval processes (BPMN) |
| Secrets | HashiCorp Vault | Key management |
| Monitoring | Prometheus | Metrics and alerting |
| UI | React + React Flow | Visual canvas |
| MLOps | MLflow | Model registry |
| LLM Agents | LangGraph | LLM agent orchestration |
| Catalog | DataHub | Metadata management |
| OLAP | ClickHouse | Analytics queries |
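ScyllaDB has no built-in graph query language, so identity-graph traversal is presumably done in application code over adjacency data. The following is a minimal sketch, assuming each identifier is a partition key whose wide row lists its neighboring identifiers. The `EdgeTable` alias, the `resolve_identity` function, the `max_hops` limit, and the `email:`/`device:`/`cookie:` key prefixes are hypothetical, and a plain dict stands in for the ScyllaDB table.

```python
from collections import deque

# Hypothetical in-memory stand-in for a ScyllaDB edge table: each partition key
# (an identifier such as a DTX_ID, email hash, or device ID) maps to its wide
# row of neighboring identifiers, one clustering row per edge.
EdgeTable = dict[str, list[str]]


def resolve_identity(edges: EdgeTable, start: str, max_hops: int = 3) -> set[str]:
    """Breadth-first traversal returning every identifier reachable from
    `start` within `max_hops` edges; one partition read per expanded node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget reached: do not expand this node further
        for neighbor in edges.get(node, []):  # read one wide row
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

For example, with edges `email:a -> device:1 -> cookie:x`, `resolve_identity(edges, "email:a")` returns all three identifiers, while `max_hops=1` stops at `device:1`. Bounding the hop count keeps the number of partition reads per resolution predictable, even if a large or over-merged cluster exists in the graph.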
Deployment Models
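Sovereign On-Prem: The full platform is deployed inside the customer's environment on customer-managed Kubernetes, with an optional air-gapped mode.
Private Cloud: The platform runs in the customer's own AWS, Azure, or GCP tenancy; no infrastructure is shared with other customers.
Hybrid: Core services run on-prem; analytics may run in the cloud where the customer permits it, with the two sides connected through a DMZ proxy.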
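The three models differ mainly in where each workload is allowed to run. The following is a minimal sketch of that placement rule, assuming only two workload classes ("core" and "analytics") and a per-customer permission flag. `DeploymentModel`, `placement`, and `cloud_analytics_permitted` are hypothetical names, not part of any existing configuration.

```python
from enum import Enum


class DeploymentModel(Enum):
    SOVEREIGN_ON_PREM = "sovereign_on_prem"
    PRIVATE_CLOUD = "private_cloud"
    HYBRID = "hybrid"


def placement(model: DeploymentModel, workload: str,
              cloud_analytics_permitted: bool = False) -> str:
    """Return "on_prem" or "cloud" for a workload ("core" or "analytics")."""
    if workload not in ("core", "analytics"):
        raise ValueError(f"unknown workload: {workload}")
    if model is DeploymentModel.SOVEREIGN_ON_PREM:
        return "on_prem"  # everything stays inside the customer's environment
    if model is DeploymentModel.PRIVATE_CLOUD:
        return "cloud"  # everything runs in the customer's own cloud tenancy
    # Hybrid: core stays on-prem; analytics moves to the cloud only if permitted.
    if workload == "analytics" and cloud_analytics_permitted:
        return "cloud"
    return "on_prem"
```

Defaulting the permission flag to `False` keeps a hybrid deployment fully on-prem unless the customer explicitly opts in.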
Scalability Targets
| Metric | Target | Technology |
|---|---|---|
| Profiles | 100M+ per tenant | ScyllaDB, partitioned by DTX_ID |
| Event throughput | 100K+ events/s | Flink + Kafka |
| Identity graph edges | 1B+ | ScyllaDB wide rows |
| Segment evaluation latency | <500 ms p95 | Flink streaming |
| Profile lookup latency | <50 ms p95 | Hot cache with profile-store fallback |
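The profile-lookup target relies on serving most reads from the hot cache and falling back to the profile store only on a miss. The following is a minimal cache-aside sketch, assuming TTL-based expiry. `ProfileLookup`, `store_get`, `ttl_s`, and `p95` are hypothetical names, and an in-process dict stands in for the Hazelcast or Dragonfly client. The percentile helper uses the nearest-rank method to check measured latencies against a p95 target.

```python
import time
from typing import Callable, Optional

Profile = dict


class ProfileLookup:
    """Cache-aside profile reads keyed by DTX_ID (hypothetical sketch)."""

    def __init__(self, store_get: Callable[[str], Optional[Profile]],
                 ttl_s: float = 300.0) -> None:
        # In-process dict standing in for the hot cache:
        # dtx_id -> (expiry time on the monotonic clock, profile)
        self._cache: dict[str, tuple[float, Profile]] = {}
        self._store_get = store_get  # fallback read against the profile store
        self._ttl_s = ttl_s
        self.hits = 0
        self.misses = 0

    def get(self, dtx_id: str) -> Optional[Profile]:
        now = time.monotonic()
        entry = self._cache.get(dtx_id)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]  # fresh cache hit
        self.misses += 1
        profile = self._store_get(dtx_id)  # miss or expired: read the store
        if profile is not None:
            self._cache[dtx_id] = (now + self._ttl_s, profile)  # repopulate
        return profile


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]
```

A latency check against the target would then be `p95(measured_ms) < 50.0`. The TTL bounds how stale a cached profile can become. Invalidating entries when the profile store is updated is an alternative, and this sketch does not show it.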