# SPARKNET Implementation Report
## Agentic Document Intelligence Platform

**Report Date:** January 2025

**Version:** 0.1.0

---

## Executive Summary

SPARKNET is an **Agentic Document Intelligence Platform** built on engineering practices common at large technology companies (FAANG):
- **Modular Architecture**: Clean separation of concerns with well-defined interfaces
- **Local-First Privacy**: All model inference runs locally via Ollama by default; cloud providers are opt-in
- **Evidence Grounding**: Every extraction includes verifiable source references (page, bounding box, snippet, confidence)
- **Production-Ready**: Type-safe (Pydantic models), tested, configurable through YAML files, and designed to scale

---

## 1. What Has Been Implemented

### 1.1 Core Subsystems

| Subsystem | Location | Status | Description |
|-----------|----------|--------|-------------|
| **Document Intelligence** | `src/document_intelligence/` | Complete | Vision-first document understanding |
| **Legacy Document Pipeline** | `src/document/` | Complete | OCR, layout, chunking pipeline |
| **RAG Subsystem** | `src/rag/` | Complete | Vector search with grounded retrieval |
| **Multi-Agent System** | `src/agents/` | Complete | ReAct-style agents with tools |
| **LLM Integration** | `src/llm/` | Complete | Ollama client with routing |
| **CLI** | `src/cli/` | Complete | Full command-line interface |
| **API** | `api/` | Complete | FastAPI REST endpoints |
| **Demo UI** | `demo/` | Complete | Streamlit dashboard |

### 1.2 Document Intelligence Module (`src/document_intelligence/`)

**Architecture (FAANG-inspired: Google DocAI pattern):**

```
src/document_intelligence/
├── chunks/                 # Core data models (BoundingBox, DocumentChunk, TableChunk)
│   ├── models.py           # Pydantic models with full type safety
│   └── __init__.py
├── io/                     # Document loading with caching
│   ├── base.py             # Abstract interfaces
│   ├── pdf.py              # PyMuPDF-based PDF loading
│   ├── image.py            # PIL image loading
│   └── cache.py            # LRU page caching
├── models/                 # ML model interfaces
│   ├── base.py             # BaseModel, BatchableModel
│   ├── ocr.py              # OCRModel interface
│   ├── layout.py           # LayoutModel interface
│   ├── table.py            # TableModel interface
│   └── vlm.py              # VisionLanguageModel interface
├── parsing/                # Document parsing pipeline
│   ├── parser.py           # DocumentParser orchestrator
│   └── chunking.py         # SemanticChunker
├── grounding/              # Visual evidence
│   ├── evidence.py         # EvidenceBuilder, EvidenceTracker
│   └── crops.py            # Image cropping utilities
├── extraction/             # Field extraction
│   ├── schema.py           # ExtractionSchema, FieldSpec
│   ├── extractor.py        # FieldExtractor
│   └── validator.py        # ExtractionValidator
├── tools/                  # Agent tools
│   ├── document_tools.py   # ParseDocumentTool, ExtractFieldsTool, etc.
│   └── rag_tools.py        # IndexDocumentTool, RetrieveChunksTool, RAGAnswerTool
└── agent_adapter.py        # EnhancedDocumentAgent integration
```

**Key Features:**
- **Zero-Shot Capability**: Works across document formats without task-specific training
- **Schema-Driven Extraction**: Fields are defined with JSON Schema or Pydantic models
- **Abstention Policy**: Abstains instead of guessing when extraction confidence is low
- **Visual Grounding**: Every extraction includes its page, bounding box (bbox), text snippet, and confidence score

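As a minimal sketch of how the abstention and grounding rules above fit together, the code below attaches a page, bounding box, snippet, and confidence to every accepted value and abstains when evidence is missing or weak. `BoundingBox`, `Evidence`, `ExtractedField`, and `resolve_field` are hypothetical stand-ins, not the project's classes (the real models are Pydantic-based and live in `chunks/` and `extraction/`). Standard-library dataclasses are used so the example runs anywhere, and the 0.5 threshold is an assumed default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical page-space box: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Evidence:
    """Where a value was found: page, region, verbatim snippet, and confidence."""
    page: int
    bbox: BoundingBox
    snippet: str
    confidence: float


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: Optional[str]          # None when the extractor abstains
    evidence: Optional[Evidence]  # always present when value is not None
    abstained: bool
    reason: str = ""


def resolve_field(name: str, candidate: Optional[str],
                  evidence: Optional[Evidence],
                  threshold: float = 0.5) -> ExtractedField:
    """Accept a candidate only if it is grounded and confident enough.

    Assumed policy (illustrative): missing evidence, or confidence below
    `threshold`, means abstain rather than return a guess.
    """
    if candidate is None or evidence is None:
        return ExtractedField(name, None, None, True, "no grounded evidence")
    if evidence.confidence < threshold:
        # Keep the weak evidence so a reviewer can see what was found.
        return ExtractedField(name, None, evidence, True,
                              f"confidence {evidence.confidence:.2f} < {threshold}")
    return ExtractedField(name, candidate, evidence, False)


if __name__ == "__main__":
    ev = Evidence(page=1, bbox=BoundingBox(400, 700, 520, 720),
                  snippet="Total: $1,250.00", confidence=0.92)
    print(resolve_field("total", "1250.00", ev))
```

Keeping the evidence on a low-confidence abstention, instead of discarding it, lets a reviewer see what the extractor found and why it declined to answer.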
### 1.3 RAG Subsystem (`src/rag/`)

**Architecture (FAANG-inspired: Meta FAISS + Google Vertex AI pattern):**

```
src/rag/
├── store.py           # VectorStore interface + ChromaVectorStore
├── embeddings.py      # OllamaEmbedding + OpenAIEmbedding (feature-flagged)
├── indexer.py         # DocumentIndexer for chunked documents
├── retriever.py       # DocumentRetriever with evidence support
├── generator.py       # GroundedGenerator with citations
├── docint_bridge.py   # Bridge to document_intelligence subsystem
└── __init__.py        # Clean exports
```

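To make the vector-store contract concrete, here is a minimal in-memory sketch that scores chunks by cosine similarity and supports metadata filters on `document_id`, `chunk_type`, and a page range. The names `VectorStore`, `InMemoryVectorStore`, `StoredChunk`, `add`, and `search` are hypothetical and not taken from `store.py`. `ChromaVectorStore` presumably delegates this work to ChromaDB instead.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredChunk:
    chunk_id: str
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. document_id, chunk_type, page


class VectorStore(ABC):
    """Hypothetical minimal interface; the real one in src/rag/store.py may differ."""

    @abstractmethod
    def add(self, chunk: StoredChunk) -> None: ...

    @abstractmethod
    def search(self, query: list[float], k: int = 5,
               document_id: Optional[str] = None,
               chunk_type: Optional[str] = None,
               page_range: Optional[tuple[int, int]] = None
               ) -> list[tuple[float, StoredChunk]]: ...


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(VectorStore):
    def __init__(self) -> None:
        self._chunks: list[StoredChunk] = []

    def add(self, chunk: StoredChunk) -> None:
        self._chunks.append(chunk)

    def search(self, query, k=5, document_id=None, chunk_type=None, page_range=None):
        def keep(c: StoredChunk) -> bool:
            # Apply metadata filters before scoring (pre-filtering).
            m = c.metadata
            if document_id is not None and m.get("document_id") != document_id:
                return False
            if chunk_type is not None and m.get("chunk_type") != chunk_type:
                return False
            if page_range is not None:
                page = m.get("page")
                if page is None or not (page_range[0] <= page <= page_range[1]):
                    return False
            return True

        scored = [(cosine(query, c.vector), c) for c in self._chunks if keep(c)]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return scored[:k]
```

Filtering on metadata before scoring keeps results inside the requested document or page span. ChromaDB exposes the same idea through the `where` argument of `Collection.query`.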
**Key Features:**
- **Local-First Embeddings**: Ollama `nomic-embed-text` by default
- **Cloud Opt-In**: OpenAI embeddings are feature-flagged and disabled by default
- **Metadata Filtering**: Filter results by `document_id`, `chunk_type`, and `page_range`
- **Citation Generation**: Answers cite retrieved chunks with numbered references such as `[1]`, `[2]`
- **Confidence-Based Abstention**: Returns "I don't know" when confidence falls below the configured `abstain_threshold`

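Citation numbering and confidence-based abstention can be sketched together. Retrieved chunks are numbered in the prompt so the model can cite them as `[1]`, `[2]`, and the generator abstains before calling the LLM when the best retrieval score is below a threshold. `build_grounded_prompt` and `answer_or_abstain` are hypothetical helpers, and the prompt wording is invented for illustration. The 0.3 default mirrors `abstain_threshold` in `configs/rag.yaml`, but applying it to retrieval scores is an assumption about how that setting is used.

```python
from typing import Callable

ABSTAIN_MESSAGE = "I don't know based on the indexed documents."


def build_grounded_prompt(question: str, chunks: list[tuple[float, str, str]]) -> str:
    """chunks: (score, source_label, text) tuples, best first.

    Each chunk gets a 1-based marker so the answer can cite it as [n].
    """
    context = "\n\n".join(
        f"[{i}] ({source}) {text}" for i, (_, source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "Cite every claim with its marker, e.g. [1]. "
        f"If the context is insufficient, reply exactly: {ABSTAIN_MESSAGE}\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer_or_abstain(question: str, chunks: list[tuple[float, str, str]],
                      llm: Callable[[str], str], threshold: float = 0.3) -> str:
    # Abstain without calling the model when nothing relevant was retrieved.
    if not chunks or max(score for score, _, _ in chunks) < threshold:
        return ABSTAIN_MESSAGE
    return llm(build_grounded_prompt(question, chunks))
```

In practice `llm` would wrap an Ollama call. Passing it in as a plain function keeps the abstention logic testable without a running model.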
### 1.4 Multi-Agent System (`src/agents/`)

**Agents Implemented:**

| Agent | Purpose | Model |
|-------|---------|-------|
| `ExecutorAgent` | Task execution with tools | llama3.1:8b |
| `DocumentAgent` | ReAct-style document analysis | llama3.1:8b |
| `PlannerAgent` | Task decomposition | mistral |
| `CriticAgent` | Output validation | phi3 |
| `MemoryAgent` | Context management | llama3.2 |
| `VisionOCRAgent` | Vision-based OCR | llava (optional) |

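The ReAct pattern used by `DocumentAgent` alternates model "thoughts" with tool calls and feeds each tool result back into the prompt as an observation. The sketch below shows that loop on its own. The `Action: tool[input]` / `Final Answer:` text protocol, the `retrieve` tool, and the scripted stand-in for the LLM are assumptions for illustration, not the format `src/agents/` actually uses.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react_loop(question: str, llm: Callable[[str], str],
               tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Minimal ReAct loop: Thought -> Action -> Observation, until Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: no valid action found.\n"
            continue
        name, arg = match.group(1), match.group(2)
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached without a final answer."


def scripted_llm(steps: list[str]) -> Callable[[str], str]:
    """Stand-in for an Ollama model that replays canned responses in order."""
    it = iter(steps)
    return lambda _prompt: next(it)


if __name__ == "__main__":
    tools = {"retrieve": lambda q: "Invoice total: $1,250.00 (page 1)"}
    llm = scripted_llm([
        "Thought: I should look up the total.\nAction: retrieve[invoice total]",
        "Thought: The observation answers it.\nFinal Answer: $1,250.00 (page 1)",
    ])
    print(react_loop("What is the invoice total?", llm, tools))  # prints: $1,250.00 (page 1)
```

The step limit matters with small local models, which sometimes never emit a final answer.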
### 1.5 CLI Commands

```bash
# Document Intelligence
sparknet docint parse document.pdf -o result.json
sparknet docint extract invoice.pdf --preset invoice
sparknet docint ask document.pdf "What is the total?"
sparknet docint classify document.pdf

# RAG Operations
sparknet docint index document.pdf                # Index into vector store
sparknet docint index-stats                       # Show index statistics
sparknet docint retrieve "payment terms" -k 10    # Semantic search
sparknet docint ask doc.pdf "question" --use-rag  # RAG-powered Q&A

# Legacy Document Commands
sparknet document parse invoice.pdf
sparknet document extract contract.pdf -f "party_name"
sparknet rag index *.pdf --collection my_docs
sparknet rag search "query" --top 10
```

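For readers extending the CLI, the command shapes above map naturally onto nested `argparse` subparsers. The sketch below covers three `docint` subcommands using only flags that appear in the examples (`-o`, `-k`, `--use-rag`). It is a hypothetical illustration, not the actual `src/cli/` code, which may use a different framework, and the `-k` default of 5 is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sparknet")
    groups = parser.add_subparsers(dest="group", required=True)

    # `sparknet docint <command> ...`
    docint = groups.add_parser("docint", help="Document intelligence commands")
    cmds = docint.add_subparsers(dest="command", required=True)

    parse = cmds.add_parser("parse", help="Parse a document into chunks")
    parse.add_argument("path")
    parse.add_argument("-o", dest="output", help="Write the JSON result to this file")

    ask = cmds.add_parser("ask", help="Ask a question about a document")
    ask.add_argument("path")
    ask.add_argument("question")
    ask.add_argument("--use-rag", action="store_true",
                     help="Answer from the vector index instead of the raw document")

    retrieve = cmds.add_parser("retrieve", help="Semantic search over the index")
    retrieve.add_argument("query")
    retrieve.add_argument("-k", type=int, default=5, help="Number of chunks to return")
    return parser
```

For example, `build_parser().parse_args(["docint", "retrieve", "payment terms", "-k", "10"])` yields a namespace with `command="retrieve"` and `k=10`, which a dispatcher could route to the matching handler.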
---

## 2. How to Execute SPARKNET

### 2.1 Prerequisites

```bash
# 1. System requirements
#    - Python 3.10+
#    - NVIDIA GPU with CUDA 12.0+ (optional but recommended)
#    - 16GB+ RAM
#    - 50GB+ disk space

# 2. Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# 3. Start the Ollama server (runs in the foreground; skip this step if
#    the installer already set Ollama up as a background service)
ollama serve
```

### 2.2 Installation

```bash
cd /home/mhamdan/SPARKNET

# Option A: Use existing virtual environment
source sparknet/bin/activate

# Option B: Create new environment
python3 -m venv sparknet
source sparknet/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install -r demo/requirements.txt

# Install SPARKNET in development mode
pip install -e .
```

### 2.3 Download Required Models

```bash
# Embedding model (required for RAG)
ollama pull nomic-embed-text:latest

# LLM models (at least one required; sizes are approximate)
ollama pull llama3.2:latest     # Fast, ~2GB
ollama pull llama3.1:8b         # General purpose, ~5GB
ollama pull mistral:latest      # Good reasoning, ~4GB

# Agent-specific models (see Section 1.4); llava is needed only for VisionOCRAgent
ollama pull phi3:latest         # CriticAgent
ollama pull llava:latest        # VisionOCRAgent (optional)

# Optional: larger models for complex tasks
ollama pull qwen2.5:14b         # Complex reasoning, ~9GB
```

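Before starting the agents, it helps to confirm that every model they expect is installed. Ollama's `GET /api/tags` endpoint (also used in the Troubleshooting section) returns JSON with a `models` list whose entries carry a `name` such as `mistral:latest`. The sketch below compares that list with the agent-to-model table from Section 1.4 using only the standard library. `AGENT_MODELS`, `installed_models`, and `missing_models` are hypothetical names for illustration, not SPARKNET's routing code in `src/llm/`.

```python
import json
import urllib.request

# Illustrative mapping copied from the agent table; adjust to your configuration.
AGENT_MODELS = {
    "ExecutorAgent": "llama3.1:8b",
    "DocumentAgent": "llama3.1:8b",
    "PlannerAgent": "mistral",
    "CriticAgent": "phi3",
    "MemoryAgent": "llama3.2",
}
EMBEDDING_MODEL = "nomic-embed-text"


def normalize(name: str) -> str:
    """Ollama treats an untagged model name as ':latest'."""
    return name if ":" in name else f"{name}:latest"


def installed_models(base_url: str = "http://localhost:11434") -> set[str]:
    """Return the model names reported by Ollama's GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return {normalize(m["name"]) for m in payload.get("models", [])}


def missing_models(installed: set[str]) -> dict[str, str]:
    """Map each agent (or 'embeddings') to its required model if it is not installed."""
    required = dict(AGENT_MODELS, embeddings=EMBEDDING_MODEL)
    return {who: model for who, model in required.items()
            if normalize(model) not in installed}


if __name__ == "__main__":
    try:
        missing = missing_models(installed_models())
    except OSError as exc:  # URLError and connection errors subclass OSError
        print(f"Ollama not reachable: {exc}")
    else:
        for who, model in missing.items():
            print(f"{who}: run `ollama pull {model}`")
        if not missing:
            print("All required models are installed.")
```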
### 2.4 Running the Demo UI

**Method 1: Using the launcher script**
```bash
cd /home/mhamdan/SPARKNET
./run_demo.sh 8501
```

**Method 2: Direct Streamlit command**
```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
streamlit run demo/app.py --server.port 8501
```

**Method 3: Bind to a specific IP (for remote access)**
```bash
streamlit run demo/app.py \
  --server.address 172.24.50.21 \
  --server.port 8501 \
  --server.headless true
```

**Access at:** http://172.24.50.21:8501 or http://localhost:8501

### 2.5 Running the API Server

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
# --reload restarts the server on code changes; omit it outside development
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
```

**API Endpoints:**
- `GET /health` - Health check
- `POST /api/documents/parse` - Parse document
- `POST /api/documents/extract` - Extract fields
- `POST /api/rag/index` - Index document
- `POST /api/rag/query` - Query RAG

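A small standard-library client for these endpoints could look like the sketch below. Only `GET /health` is used exactly as documented. The JSON body for `POST /api/rag/query` (`question`, `top_k`) is a hypothetical schema, so check the request models in `api/` before relying on it. The base URL assumes the uvicorn command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn --port 8000 command


def health(base_url: str = BASE_URL) -> int:
    """Return the HTTP status of GET /health (200 when the server is up)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.status


def build_rag_query(question: str, top_k: int = 5,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /api/rag/query request. The body fields are assumed, not documented."""
    body = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/rag/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rag_query(question: str, top_k: int = 5, base_url: str = BASE_URL) -> dict:
    """Send the query and decode the JSON response."""
    with urllib.request.urlopen(build_rag_query(question, top_k, base_url),
                                timeout=60) as resp:
        return json.load(resp)
```

Because uvicorn is started with `--host 0.0.0.0`, the same calls work from other machines when `base_url` points at the server's address.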
### 2.6 Running Examples

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Document Intelligence Demo
python examples/document_intelligence_demo.py

# RAG End-to-End Pipeline
python examples/document_rag_end_to_end.py

# Simple Agent Task
python examples/simple_task.py

# Document Agent
python examples/document_agent.py
```

### 2.7 Running Tests

```bash
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate

# Run all tests
pytest tests/ -v

# Run specific test suites
pytest tests/unit/test_document_intelligence.py -v
pytest tests/unit/test_rag_integration.py -v

# Run with coverage (requires the pytest-cov plugin; HTML report goes to htmlcov/)
pytest tests/ --cov=src --cov-report=html
```

---

## 3. Configuration

### 3.1 RAG Configuration (`configs/rag.yaml`)

```yaml
vector_store:
  type: chroma
  chroma:
    persist_directory: "./.sparknet/chroma_db"
    collection_name: "sparknet_documents"
    distance_metric: cosine

embeddings:
  provider: ollama  # Local-first
  ollama:
    model: nomic-embed-text
    base_url: "http://localhost:11434"
  openai:
    enabled: false  # Disabled by default

generator:
  provider: ollama
  ollama:
    model: llama3.2
  abstain_on_low_confidence: true
  abstain_threshold: 0.3
```

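Settings like these are easy to mistype, so it can be worth validating them at load time. The sketch below assumes the file has already been parsed into a nested dictionary (for example with PyYAML's `yaml.safe_load`) following the key layout these settings imply. It checks two rules suggested by the comments in the file: the abstention threshold lies in [0, 1], and OpenAI cannot be selected as the embedding provider unless it is explicitly enabled. `RAGSettings` and `load_rag_settings` are hypothetical names, and the fallback defaults are copied from the values shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RAGSettings:
    persist_directory: str
    collection_name: str
    embedding_provider: str
    embedding_model: str
    openai_enabled: bool
    generator_model: str
    abstain_on_low_confidence: bool
    abstain_threshold: float


def load_rag_settings(raw: dict) -> RAGSettings:
    """Flatten the nested config dict, falling back to the defaults shown above."""
    chroma = raw.get("vector_store", {}).get("chroma", {})
    emb = raw.get("embeddings", {})
    gen = raw.get("generator", {})
    settings = RAGSettings(
        persist_directory=chroma.get("persist_directory", "./.sparknet/chroma_db"),
        collection_name=chroma.get("collection_name", "sparknet_documents"),
        embedding_provider=emb.get("provider", "ollama"),
        embedding_model=emb.get("ollama", {}).get("model", "nomic-embed-text"),
        openai_enabled=bool(emb.get("openai", {}).get("enabled", False)),
        generator_model=gen.get("ollama", {}).get("model", "llama3.2"),
        abstain_on_low_confidence=bool(gen.get("abstain_on_low_confidence", True)),
        abstain_threshold=float(gen.get("abstain_threshold", 0.3)),
    )
    # Validate rules implied by the config comments.
    if not 0.0 <= settings.abstain_threshold <= 1.0:
        raise ValueError("generator.abstain_threshold must be between 0 and 1")
    if settings.embedding_provider == "openai" and not settings.openai_enabled:
        raise ValueError("embeddings.provider is 'openai' but embeddings.openai.enabled is false")
    return settings
```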
### 3.2 Document Configuration (`config/document.yaml`)

```yaml
ocr:
  engine: paddleocr  # or tesseract
  languages: ["en"]
  confidence_threshold: 0.5

layout:
  enabled: true
  reading_order: true

chunking:
  min_chunk_chars: 10
  max_chunk_chars: 4000
  target_chunk_chars: 500
```

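The three `chunking` limits interact. Paragraphs are packed together until a chunk reaches roughly `target_chunk_chars`, any single paragraph longer than `max_chunk_chars` is hard-split, and fragments shorter than `min_chunk_chars` are discarded as noise (for example, stray page numbers). The greedy packer below is one plausible reading of those settings, written for illustration. It is not the algorithm used by `SemanticChunker`, which may also respect layout and sentence boundaries.

```python
def chunk_text(text: str, min_chars: int = 10, max_chars: int = 4000,
               target_chars: int = 500) -> list[str]:
    """Greedy paragraph packer (illustrative, not SemanticChunker's algorithm)."""
    pieces: list[str] = []
    for para in (p.strip() for p in text.split("\n\n")):
        # Hard-split any single paragraph longer than max_chars.
        while len(para) > max_chars:
            pieces.append(para[:max_chars])
            para = para[max_chars:]
        if para:
            pieces.append(para)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        # Close the current chunk once adding the next piece would pass the target.
        if current and len(candidate) > target_chars:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Drop fragments too short to be useful.
    return [c for c in chunks if len(c) >= min_chars]
```

Since `target_chunk_chars` is well below `max_chunk_chars`, packed chunks never exceed the maximum; only the hard split can produce a chunk of exactly `max_chunk_chars`.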
---

## 4. FAANG Best Practices Applied

### 4.1 Google-Inspired Patterns
- **DocAI Architecture**: Modular, vision-first document understanding
- **Structured Output**: Schema-driven extraction with validation
- **Abstention Policy**: Return "I don't know" instead of answering when confidence is low, to reduce hallucinations

### 4.2 Meta-Inspired Patterns
- **FAISS Integration**: Fast similarity search (optional alongside ChromaDB)
- **RAG Pipeline**: Retrieve-then-generate with citations

### 4.3 Amazon-Inspired Patterns
- **Textract-like API**: Structured field extraction with confidence scores
- **Evidence Grounding**: Every output is traceable to its source

### 4.4 Microsoft-Inspired Patterns
- **Form Recognizer Pattern**: Pre-built schemas for invoices and contracts
- **Confidence Thresholds**: Configurable abstention levels

### 4.5 Apple-Inspired Patterns
- **Privacy-First**: All processing is local by default
- **Opt-In Cloud**: OpenAI and other cloud services are disabled by default

---

## 5. Quick Start Commands

```bash
# === SETUP ===
cd /home/mhamdan/SPARKNET
source sparknet/bin/activate
ollama serve &  # Start in background

# === DEMO UI ===
# Blocks this shell; run it in a separate terminal
streamlit run demo/app.py --server.port 8501

# === CLI USAGE ===
# Invoke the CLI as a module (no installed entry point needed)
# Parse a document
python -m src.cli.main docint parse Dataset/IBM*.pdf -o result.json

# Index for RAG
python -m src.cli.main docint index Dataset/*.pdf

# Ask questions with RAG
python -m src.cli.main docint ask Dataset/IBM*.pdf "What is this document about?" --use-rag

# === PYTHON API ===
python -c "
from src.document_intelligence import DocumentParser
parser = DocumentParser()
result = parser.parse('Dataset/IBM N_A.pdf')
print(f'Parsed {len(result.chunks)} chunks')
"

# === RUN TESTS ===
pytest tests/unit/ -v
```

---

## 6. Troubleshooting

### Issue: Ollama not running
```bash
# Check status (returns a JSON list of installed models when the server is up)
curl http://localhost:11434/api/tags

# Start Ollama
ollama serve

# If port 11434 is already in use by a stale process
pkill ollama && ollama serve

# If Ollama was installed as a systemd service, restart it instead
sudo systemctl restart ollama
```

### Issue: Missing models
```bash
ollama list                   # See installed models
ollama pull nomic-embed-text  # Install embedding model
ollama pull llama3.2          # Install LLM
```

### Issue: ChromaDB errors
```bash
# Reset the vector store from the project root
# (deletes all indexed documents; re-index afterwards)
rm -rf .sparknet/chroma_db
```

### Issue: Import errors
```bash
# Ensure you are in the project root
cd /home/mhamdan/SPARKNET

# Ensure the virtual environment is activated
source sparknet/bin/activate

# Reinstall in editable mode
pip install -e .
```

---

## 7. Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────┐
│                          SPARKNET Platform                          │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │  Streamlit  │  │   FastAPI   │  │     CLI     │  Interfaces      │
│  │    Demo     │  │     API     │  │  Commands   │                  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘                  │
├─────────┴────────────────┴────────────────┴─────────────────────────┤
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                          Agent Layer                          │  │
│  │   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │  │
│  │   │ Document │   │ Executor │   │ Planner  │   │  Critic  │   │  │
│  │   │  Agent   │   │  Agent   │   │  Agent   │   │  Agent   │   │  │
│  │   └────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘   │  │
│  └────────┴──────────────┴──────────────┴──────────────┴─────────┘  │
│                                                                     │
│  ┌─────────────────────────────┐   ┌─────────────────────────────┐  │
│  │    Document Intelligence    │   │        RAG Subsystem        │  │
│  │ ┌───────────┐ ┌───────────┐ │   │ ┌───────────┐ ┌───────────┐ │  │
│  │ │ Parser    │ │ Extractor │ │   │ │ Indexer   │ │ Retriever │ │  │
│  │ └───────────┘ └───────────┘ │   │ └───────────┘ └───────────┘ │  │
│  │ ┌───────────┐ ┌───────────┐ │   │ ┌───────────┐ ┌───────────┐ │  │
│  │ │ Grounding │ │ Validator │ │   │ │ Embedder  │ │ Generator │ │  │
│  │ └───────────┘ └───────────┘ │   │ └───────────┘ └───────────┘ │  │
│  └─────────────────────────────┘   └─────────────────────────────┘  │
│                                                                     │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                        Infrastructure                         │  │
│  │   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   │  │
│  │   │  Ollama  │   │ ChromaDB │   │   GPU    │   │  Cache   │   │  │
│  │   │  Client  │   │  Store   │   │ Manager  │   │  Layer   │   │  │
│  │   └──────────┘   └──────────┘   └──────────┘   └──────────┘   │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 8. Files Modified/Created in Recent Session

| File | Action | Description |
|------|--------|-------------|
| `src/rag/docint_bridge.py` | Created | Bridge between document_intelligence and RAG |
| `src/document_intelligence/tools/rag_tools.py` | Created | RAG tools for agents |
| `src/document_intelligence/tools/__init__.py` | Modified | Added RAG tool exports |
| `src/document_intelligence/tools/document_tools.py` | Modified | Enhanced AnswerQuestionTool with RAG |
| `src/cli/docint.py` | Modified | Added index, retrieve, delete-index commands |
| `src/rag/__init__.py` | Modified | Added bridge exports |
| `configs/rag.yaml` | Created | RAG configuration file |
| `tests/unit/test_rag_integration.py` | Created | RAG integration tests |
| `examples/document_rag_end_to_end.py` | Created | End-to-end RAG example |

---

**Report Complete**

For questions or issues, refer to the troubleshooting section above or check the test files for usage examples.