Background: Advances in cancer biology are increasingly dependent on integration of heterogeneous datasets. Large-scale efforts have systematically mapped many aspects of cancer cell biology; however, it remains challenging for individual scientists to effectively integrate and understand these data.
Results: We have developed a new data retrieval and indexing framework that allows us to integrate publicly available data from different sources and to combine publicly available data with new or bespoke datasets. Our approach, which we have named the cancer data integrator (CanDI), is straightforward to implement, well documented, and continuously updated, which should enable individual users to take full advantage of efforts to map cancer cell biology. We show that CanDI enabled the generation of testable hypotheses about new synthetic lethal gene pairs, genes associated with sex disparity, and immunotherapy targets in cancer.
Conclusions: CanDI provides a flexible approach for large-scale data integration in cancer research, enabling rapid generation of hypotheses. CanDI is available at https://github.com/GilbertLabUCSF/CanDI.
Keywords: Data integration; Functional genomics; Multiomics; Synthetic lethality.
© 2021. The Author(s).