More than 6 million Americans are at risk for Alzheimer's disease-related dementias (ADRD), most of whom are 65 or older. The clock drawing test (CDT) is a quick, simple, and effective technique that offers potential advantages for self-management and screening of ADRD. Existing CDT-based ADRD screening studies focus mainly on efficacy: they rely on many handcrafted features, overlook other data modalities, and lack validation. This paper proposes a unified telemedicine framework for effective early ADRD screening, in both fully automatic and semi-automatic settings, based on multimodal and agile data fusion. The framework emphasizes model interpretability and validation through gradient-weighted class activation mapping (Grad-CAM) and locally linear embedding (LLE). The dataset for this work comprises 1,662 samples of CDT images with associated demographic and cognitive information. In binary screening, the fully automatic case, which uses only CDT images, achieves an AUC of up to 81% with a 75% recall rate. The semi-automatic case, which uses multimodal data fusion, achieves an AUC of up to 90% with an 83% recall rate. Visualization of the convolutional neural network (CNN) shows that it automatically captures critical information about the outline, scale, and clock hands in CDT images, and analysis of the structured features shows that the memory test is key to effective ADRD screening.
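For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUC and recall can be computed for a binary screening model. The labels, scores, and the 0.5 decision threshold are made-up illustration values, not data or settings from this study, and the function names are hypothetical.

```python
def recall(y_true, y_pred):
    """Fraction of true positive cases that the model flags as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc(y_true, scores):
    """Probability that a random positive case scores above a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = screened positive, 0 = screened negative.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]          # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_recall = recall(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(f"recall={toy_recall:.3f}, AUC={toy_auc:.3f}")
```

Recall depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are reported together.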