The Blueprint: Hybrid GraphRAG Ingestion and Retrieval

[INGESTION]
[Raw Data (S3)] ──> [Lambda Orchestrator] ──> [Amazon Bedrock (Entity Extraction)]
│
┌────────────────────────┴────────────────────────┐
▼ ▼
[Amazon OpenSearch (Vectors)] [Amazon Neptune (Graph DB)]

[RETRIEVAL]
[User Query] ──> [API Gateway] ──> [Lambda Resolver] ──> [Query OpenSearch + Neptune]
│
▼ (Context Compactor)
[Amazon Bedrock (LLM)] ──> [User Response]

Phase 1: The Ingestion Pipeline (Building the Graph)

The Storage Layer (Amazon S3):

Unstructured enterprise data (PDFs, wiki pages, markdown files) is uploaded to an Amazon S3 bucket.
- An S3 Event Notification triggers an asynchronous processing pipeline.

The Processing Layer (AWS Lambda / AWS Glue):

An AWS Lambda function (or an AWS Glue job for massive data volumes) chunks the documents.
- Instead of just saving the raw text chunks, Lambda calls Amazon Bedrock using a lightweight, fast model to perform Named Entity Recognition (NER) and relationship mapping.

The Hybrid Storage Layer (OpenSearch + Neptune):

The Text Vector: The Lambda function generates text embeddings and stores the chunks in Amazon OpenSearch Service for semantic keyword similarity.
- The Entity Graph: Simultaneously, the extracted entities (e.g., Product X) and relationships (e.g., DEPENDS_ON) are written as nodes and edges into Amazon Neptune (AWS’s managed graph database).

Phase 2: The Retrieval Pipeline (The Low-Token Search)

The Ingress Layer (API Gateway):

The user asks a complex question (e.g., “If I upgrade System A, what downstream services are impacted?”) via Amazon API Gateway.

The Hybrid Resolver (AWS Lambda):

API Gateway invokes a central Lambda Resolver function.
- Instead of performing a massive keyword search that returns pages of text, the Lambda runs a hybrid query:
- It hits Amazon OpenSearch for basic semantic context.
  - It queries Amazon Neptune using a graph query language (like openCypher or Gremlin) to pull the exact dependency path.
1. The Context Compactor (The Token Saver):
The Lambda function combines the small, highly targeted vector text chunk with the precise structural relationships from Neptune.
- It strips out all irrelevant conversational text, assembling a highly dense, ultra-compact context payload.

The Generation Layer (Amazon Bedrock):

The compacted context is sent to a frontier model in Amazon Bedrock.
- Because the context is highly refined, the model returns a perfectly accurate, hallucination-free answer while consuming up to 70% fewer tokens than a traditional vector-only setup.