| --- |
| license: mit |
| tags: |
| - cancer-genomics |
| - bioinformatics |
| - graph-database |
| - neo4j |
| - distributed-computing |
| - boinc |
| - healthcare |
| - genomics |
| - fastq |
| - blast |
| - variant-calling |
| - gdc-portal |
| - tcga |
| library_name: cancer-at-home-v2 |
| pipeline_tag: other |
| --- |
| |
| # Cancer@Home v2 |
|
|
| <div align="center"> |
| <img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version"> |
| <img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License"> |
| <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python"> |
| <img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j"> |
| </div> |
|
|
| ## Overview
|
|
| Cancer@Home v2 is a distributed computing platform for cancer genomics research. It combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** in a single system.
|
|
| Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual. |
|
|
| ## Key Features
|
|
| - **Interactive Web Dashboard** - Modern UI with real-time visualizations
| - **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
| - **BOINC Integration** - Distributed computing for intensive analyses
| - **GraphQL API** - Flexible data querying
| - **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
| - **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
| - **Quick Setup** - Running in under 5 minutes
|
|
| ## Architecture
|
|
| ``` |
| ┌────────────────────────────────────────────┐
| │      Web Dashboard (D3.js + Chart.js)      │
| ├────────────────────────────────────────────┤
| │      FastAPI Backend (REST + GraphQL)      │
| ├──────┬──────┬──────┬──────┬────────────────┤
| │Neo4j │BOINC │ GDC  │FASTQ │ BLAST/Variant  │
| │Graph │Client│ API  │  QC  │    Calling     │
| └──────┴──────┴──────┴──────┴────────────────┘
| ``` |
|
|
| ## Installation
|
|
| ### Prerequisites |
| - Python 3.8+ |
| - Docker Desktop |
| - 8GB RAM (16GB recommended) |
|
|
| ### Quick Start |
|
|
| **Windows:** |
| ```powershell |
| git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2 |
| cd CancerAtHomeV2 |
| .\setup.ps1 |
| python run.py |
| ``` |
|
|
| **Linux/Mac:** |
| ```bash |
| git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2 |
| cd CancerAtHomeV2 |
| chmod +x setup.sh |
| ./setup.sh |
| python run.py |
| ``` |
|
|
| Then open: **http://localhost:5000** |
|
|
| ## Usage
|
|
| ### Web Dashboard |
| The interactive dashboard at http://localhost:5000 provides:
| - **Dashboard Tab**: Overview statistics and mutation charts |
| - **Neo4j Visualization**: Interactive graph of cancer relationships |
| - **BOINC Tasks**: Submit and monitor distributed computing tasks |
| - **GDC Data**: Browse and download cancer datasets |
| - **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling |
|
|
| ### GraphQL API |
|
|
| Query cancer data at http://localhost:5000/graphql |
|
|
| **Example: Get mutations in TP53 gene** |
| ```graphql |
| query { |
| mutations(gene: "TP53") { |
| mutation_id |
| chromosome |
| position |
| consequence |
| } |
| } |
| ``` |
|
|
| **Example: Get patient statistics** |
| ```graphql |
| query { |
| cancerStatistics(cancer_type_id: "BRCA") { |
| total_patients |
| total_mutations |
| avg_mutations_per_patient |
| } |
| } |
| ``` |
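The same queries can be sent from a script. The following is a minimal sketch using only the Python standard library; it assumes the endpoint accepts the usual GraphQL-over-HTTP JSON body (`query` plus `variables`), and the `$gene: String!` variable type is a guess about the schema rather than something confirmed by the project. The helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint as listed in this section; adjust if the server runs elsewhere.
GRAPHQL_URL = "http://localhost:5000/graphql"

# The TP53 query from above, with the gene passed as a variable.
# The variable type (String!) is an assumption about the schema.
MUTATIONS_QUERY = """
query Mutations($gene: String!) {
  mutations(gene: $gene) {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_graphql(query, variables=None, url=GRAPHQL_URL, timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_graphql_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python run.py`, `run_graphql(MUTATIONS_QUERY, {"gene": "TP53"})` should return the response as a Python dict. Passing the gene as a variable avoids building query strings by hand.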
|
|
| ### REST API |
|
|
| **Database Summary:** |
| ```bash |
| curl http://localhost:5000/api/neo4j/summary |
| ``` |
|
|
| **Submit BOINC Task:** |
| ```bash |
| curl -X POST http://localhost:5000/api/boinc/submit \ |
| -H "Content-Type: application/json" \ |
| -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}' |
| ``` |
|
|
| ### Python API |
|
|
| **FASTQ Processing:** |
| ```python |
| from backend.pipeline import FASTQProcessor |
| |
| processor = FASTQProcessor() |
| stats = processor.calculate_statistics("input.fastq") |
| filtered = processor.quality_filter("input.fastq") |
| ``` |
|
|
| **Variant Calling:** |
| ```python |
| from backend.pipeline import VariantCaller, VariantAnalyzer |
| |
| caller = VariantCaller() |
| vcf_file = caller.call_variants("alignment.bam", "reference.fa") |
| variants = caller.filter_variants(vcf_file) |
| |
| analyzer = VariantAnalyzer() |
| cancer_variants = analyzer.identify_cancer_variants(variants) |
| tmb = analyzer.calculate_mutation_burden(variants) |
| ``` |
|
|
| **Neo4j Queries:** |
| ```python |
| from backend.neo4j import DatabaseManager |
| |
| db = DatabaseManager() |
| query = """ |
| MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation) |
| RETURN m.position, m.consequence |
| """ |
| results = db.execute_query(query) |
| db.close() |
| ``` |
|
|
| ## Data Model
|
|
| ### Neo4j Graph Schema |
|
|
| **Nodes:** |
| - **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.) |
| - **Mutation**: Genetic variants with position and consequence |
| - **Patient**: Individual cases with demographics |
| - **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM) |
|
|
| **Relationships:** |
| - `Mutation → AFFECTS → Gene`
| - `Patient → HAS_MUTATION → Mutation`
| - `Patient → DIAGNOSED_WITH → CancerType`
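To show how these relationships combine in a query, here is a small sketch that builds parameterized Cypher for two common lookups. It follows the `AFFECTS` direction used in the Neo4j query example in the Usage section (`Mutation` to `Gene`). The property names `patient_id` and `cancer_type_id` and both function names are assumptions made for illustration; check them against the actual schema.

```python
def mutations_for_gene(symbol):
    """Return a (query, parameters) pair listing mutations that affect a gene."""
    query = (
        "MATCH (g:Gene {symbol: $symbol})<-[:AFFECTS]-(m:Mutation) "
        "RETURN m.position AS position, m.consequence AS consequence"
    )
    return query, {"symbol": symbol}


def patients_with_gene_mutation(symbol, cancer_type_id):
    """Return a (query, parameters) pair for patients of one cancer type
    who carry a mutation in the given gene.

    `patient_id` and `cancer_type_id` are assumed property names.
    """
    query = (
        "MATCH (p:Patient)-[:HAS_MUTATION]->(m:Mutation)"
        "-[:AFFECTS]->(g:Gene {symbol: $symbol}) "
        "MATCH (p)-[:DIAGNOSED_WITH]->"
        "(c:CancerType {cancer_type_id: $cancer_type_id}) "
        "RETURN DISTINCT p.patient_id AS patient_id"
    )
    return query, {"symbol": symbol, "cancer_type_id": cancer_type_id}
```

Keeping values in a parameter dict rather than in the query string lets Neo4j cache the query plan and avoids quoting problems. The official `neo4j` Python driver accepts such a dict in `session.run(query, parameters)`; whether `DatabaseManager.execute_query` forwards parameters the same way is not stated here.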
|
|
| ### Sample Data Included |
|
|
| - **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR |
| - **5 Mutations**: Cancer-associated variants |
| - **5 Patients**: Representative TCGA cases |
| - **4 Cancer Types**: BRCA, LUAD, COAD, GBM |
|
|
| ## Technology Stack
|
|
| - **Backend**: FastAPI, Python 3.8+ |
| - **Database**: Neo4j 5.13 (Graph Database) |
| - **API**: GraphQL (Strawberry), REST |
| - **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js |
| - **Bioinformatics**: Biopython, BLAST+ |
| - **Data Source**: GDC Portal API (TCGA/TARGET) |
| - **Infrastructure**: Docker, Docker Compose |
| - **Distributed Computing**: BOINC Framework |
|
|
| ## Documentation
|
|
| - [README.md](README.md) - Complete project overview |
| - [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide |
| - [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation |
| - [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples |
| - [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture |
| - [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview |
|
|
| ## Use Cases
|
|
| 1. **Cancer Research**: Analyze genomics data with distributed computing |
| 2. **Education**: Learn cancer genetics and bioinformatics |
| 3. **Data Visualization**: Explore gene-mutation-patient relationships |
| 4. **Pipeline Development**: Test bioinformatics workflows |
| 5. **Graph Analytics**: Query complex biological networks |
|
|
| ## Supported Cancer Projects
|
|
| - **TCGA-BRCA**: Breast Cancer (1,098 cases) |
| - **TCGA-LUAD**: Lung Adenocarcinoma (585 cases) |
| - **TCGA-COAD**: Colon Adenocarcinoma (461 cases) |
| - **TCGA-GBM**: Glioblastoma (617 cases) |
| - **TARGET-AML**: Acute Myeloid Leukemia (238 cases) |
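For orientation, the sketch below builds a request URL for the public GDC API `cases` endpoint, filtered to one or more of the projects listed above. It shows the GDC filter syntax only and is not the project's own GDC client; the function name and the default field list are illustrative choices.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for case metadata.
GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_cases_url(project_ids,
                    fields=("case_id", "submitter_id", "project.project_id"),
                    size=10):
    """Build a GDC /cases URL restricted to the given project IDs."""
    # GDC filters are JSON objects with an operator and a content block.
    filters = {
        "op": "in",
        "content": {
            "field": "project.project_id",
            "value": list(project_ids),
        },
    }
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_CASES_ENDPOINT}?{urlencode(params)}"
```

For example, `build_cases_url(["TCGA-BRCA", "TCGA-LUAD"], size=5)` yields a URL that can be opened with `curl` or `urllib.request.urlopen` to fetch the first five matching cases.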
|
|
| ## Bioinformatics Pipeline
|
|
| ### FASTQ Processing |
| - Quality control and filtering |
| - Adapter trimming |
| - Statistics calculation |
| - QC report generation |
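As a rough illustration of the quality-filtering step, the sketch below keeps reads that meet a minimum length and a minimum mean Phred score. It is not the project's `FASTQProcessor`: the function names are hypothetical, Phred+33 encoding is assumed, and the defaults mirror the `quality_threshold` and `min_length` values shown in `config.yml`.

```python
def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream.

    Stops at end of input or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def quality_filter(records, quality_threshold=20, min_length=50):
    """Yield only records that pass both the length and mean-quality checks."""
    for header, sequence, quality in records:
        long_enough = len(sequence) >= min_length
        good_enough = mean_phred(quality) >= quality_threshold
        if long_enough and good_enough:
            yield header, sequence, quality
```

Typical use would be `with open("input.fastq") as fh: kept = list(quality_filter(read_fastq(fh)))`. Filtering on the mean score is one simple policy; per-base trimming is a common alternative.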
|
|
| ### BLAST Alignment |
| - BLASTN for nucleotide sequences |
| - BLASTP for protein sequences |
| - Hit filtering by identity/e-value |
| - Homology detection |
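Hit filtering by identity and e-value can be sketched as follows, assuming BLAST+ was run with tabular output (`-outfmt 6`) and the default twelve columns. The function names and the 90% identity default are illustrative; the e-value default matches `config.yml`.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def parse_outfmt6(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        # Convert the fields used for filtering to numbers.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def filter_hits(hits, min_identity=90.0, max_evalue=0.001):
    """Keep hits with identity >= min_identity and e-value <= max_evalue."""
    return [
        hit for hit in hits
        if hit["pident"] >= min_identity and hit["evalue"] <= max_evalue
    ]
```

Given a results file, `filter_hits(parse_outfmt6(open("hits.tsv")))` returns the hits that pass both thresholds.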
|
|
| ### Variant Calling |
| - VCF generation from alignments |
| - Quality filtering |
| - Cancer variant identification |
| - Tumor mutation burden (TMB) calculation |
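TMB is commonly reported as the number of somatic mutations per megabase of sequenced region. The sketch below shows that arithmetic only; how `VariantAnalyzer.calculate_mutation_burden` counts variants or chooses the region size is not documented here, so both inputs are left to the caller.

```python
def tumor_mutation_burden(variant_count, covered_bases):
    """Return mutations per megabase.

    variant_count: number of (already filtered) somatic variants.
    covered_bases: size of the sequenced or callable region, in bases.
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    if variant_count < 0:
        raise ValueError("variant_count must not be negative")
    megabases = covered_bases / 1_000_000
    return variant_count / megabases
```

For instance, 150 variants over a 30 Mb region gives `tumor_mutation_burden(150, 30_000_000)`, which is 5.0 mutations per megabase.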
|
|
| ## Access Points
|
|
| - **Application**: http://localhost:5000 |
| - **API Docs**: http://localhost:5000/docs (Swagger UI) |
| - **GraphQL**: http://localhost:5000/graphql |
| - **Neo4j Browser**: http://localhost:7474 (neo4j/cancer123) |
|
|
| ## Configuration
|
|
| Edit `config.yml` to customize: |
|
|
| ```yaml |
| neo4j: |
| uri: "bolt://localhost:7687" |
| password: "cancer123" |
| |
| gdc: |
| download_dir: "./data/gdc" |
| projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"] |
| |
| pipeline: |
| fastq: |
| quality_threshold: 20 |
| min_length: 50 |
| blast: |
| evalue: 0.001 |
| num_threads: 4 |
| ``` |
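When overriding only a few of these settings, it can help to merge them over defaults and check the result before starting a run. The sketch below works on plain dictionaries and is an illustration, not the project's loader; the function names and validation rules are assumptions, and the defaults copy the `pipeline` values shown above.

```python
import copy

# Defaults copied from the pipeline section of config.yml above.
DEFAULTS = {
    "pipeline": {
        "fastq": {"quality_threshold": 20, "min_length": 50},
        "blast": {"evalue": 0.001, "num_threads": 4},
    }
}


def merge_config(defaults, overrides):
    """Recursively merge overrides into a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def validate_pipeline(config):
    """Return a list of problems; an empty list means the settings look sane."""
    errors = []
    fastq = config["pipeline"]["fastq"]
    blast = config["pipeline"]["blast"]
    if fastq["quality_threshold"] < 0:
        errors.append("pipeline.fastq.quality_threshold must be >= 0")
    if fastq["min_length"] < 1:
        errors.append("pipeline.fastq.min_length must be >= 1")
    if blast["evalue"] <= 0:
        errors.append("pipeline.blast.evalue must be > 0")
    if blast["num_threads"] < 1:
        errors.append("pipeline.blast.num_threads must be >= 1")
    return errors
```

If PyYAML is installed, the overrides dictionary can come from `yaml.safe_load` applied to the contents of `config.yml`.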
|
|
| ## Contributing
|
|
| Contributions are welcome! This project is open source under the MIT License. |
|
|
| ### Development Setup |
| ```bash |
| python -m venv venv |
| source venv/bin/activate # or venv\Scripts\activate on Windows |
| pip install -r requirements.txt |
| pytest test_cancer_at_home.py |
| ``` |
|
|
| ## License
|
|
| MIT License - See [LICENSE](LICENSE) file |
|
|
| Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal |
|
|
| ## Acknowledgments
|
|
| ### Inspiration |
| - [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge |
| - [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) |
|
|
| ### Data Sources |
| - [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/) |
| - The Cancer Genome Atlas (TCGA) Program |
| - Therapeutically Applicable Research to Generate Effective Treatments (TARGET) |
|
|
| ### Technologies |
| - Neo4j Graph Database |
| - BOINC Distributed Computing Project |
| - Biopython Community |
| - FastAPI Framework |
|
|
| ## Authors
|
|
| - **OpenPeer AI** - Core development and architecture |
| - **Riemann Computing Inc.** - Distributed computing integration |
| - **Bleunomics** - Bioinformatics pipeline and genomics expertise |
| - **Andrew Magdy Kamal** - Graph database design and visualization |
|
|
| ## Support
|
|
| - **Documentation**: See project documentation files |
| - **Issues**: Check logs in `logs/cancer_at_home.log` |
| - **Configuration**: Review `config.yml` |
| - **Health Check**: http://localhost:5000/api/health |
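Scripts that start the server and then call the API may want to wait for the health check first. This is a minimal polling sketch using the standard library; it assumes that an HTTP 200 from `/api/health` means the service is ready, since the response body format is not described here. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Health endpoint as listed above.
HEALTH_URL = "http://localhost:5000/api/health"


def is_healthy(url=HEALTH_URL, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(check=is_healthy, attempts=30, delay=1.0,
                       sleep=time.sleep):
    """Poll `check` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` after `python run.py` has been launched returns `True` once the service responds, or `False` after roughly 30 seconds. The `check` and `sleep` parameters exist so the loop can be exercised without a running server.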
|
|
| ## Roadmap
|
|
| ### Planned Features |
| - Machine learning for mutation prediction |
| - Multi-omics data integration (RNA-seq, proteomics) |
| - Survival analysis and clinical outcomes |
| - Advanced graph algorithms (PageRank, community detection) |
| - Cloud deployment support (AWS, Azure, GCP) |
| - Mobile-responsive design |
| - User authentication and authorization |
|
|
| ## Statistics
|
|
| - **Lines of Code**: 5,000+
| - **Modules**: 9 Python modules
| - **API Endpoints**: 15+ across REST and GraphQL
| - **Documentation**: 2,500+ lines |
| - **Setup Time**: < 5 minutes |
| - **Sample Data**: 7 genes, 5 mutations, 5 patients |
|
|
| ## Citation
|
|
| If you use Cancer@Home v2 in your research, please cite: |
|
|
| ```bibtex |
| @software{cancer_at_home_v2, |
| title = {Cancer@Home v2: Distributed Cancer Genomics Research Platform}, |
| author = {OpenPeer AI and Riemann Computing Inc. and Bleunomics and Andrew Magdy Kamal}, |
| year = {2025}, |
| url = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2}, |
| license = {MIT} |
| } |
| ``` |
|
|
| ## Tags
|
|
| `cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology` |
|
|
| --- |
|
|
| **Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
|
|
| **For cancer research, by researchers, accessible to all.** |
|
|