Antidepressants exhibit similar efficacy, but varying tolerability, in randomized controlled trials. Predicting tolerability in real-world clinical populations may facilitate personalization of treatment and maximize adherence. This retrospective longitudinal cohort study aimed to determine the extent to which incorporating patient history from electronic health records improved prediction of unplanned treatment discontinuation at index antidepressant prescription. Clinical data were analyzed for individuals in health networks affiliated with two large academic medical centers between March 1, 2008 and December 31, 2014. In total, the study cohorts included 51,683 patients who had at least one International Classification of Diseases diagnostic code for major depressive disorder or depressive disorder not otherwise specified and who initiated antidepressant treatment. Of 70,121 total medication changes, 16,665 (23.77%) were followed by failure to return; risk was highest with paroxetine (27.71% discontinuation) and lowest with venlafaxine (20.78% discontinuation); Mantel-Haenszel χ² (8 df) = 126.44, p = 1.54e-23. Models incorporating diagnostic and procedure codes and medication prescriptions improved per-medication areas under the curve (AUCs) to a mean of 0.69 [0.64-0.73] (ranging from 0.62 for paroxetine to 0.80 for escitalopram), with similar performance in the second, replication health system. Machine learning applied to coded electronic health records facilitates identification of individuals at high risk of treatment dropout following a change in antidepressant medication. Such methods may assist primary care physicians and psychiatrists in the clinic in personalizing antidepressant treatment on the basis of tolerability as well as efficacy.
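The summary statistics reported above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it recomputes the overall discontinuation proportion from the two reported counts and evaluates the upper-tail probability of a chi-square statistic of 126.44 on 8 degrees of freedom, using the standard closed form that holds for even degrees of freedom. The function name `chi2_sf_even_df` is hypothetical, and the Mantel-Haenszel statistic itself is not recomputed because the per-medication counts are not given here.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Counts and test statistic as reported in the text above.
discontinued, changes = 16_665, 70_121
rate_pct = 100.0 * discontinued / changes
p_value = chi2_sf_even_df(126.44, 8)

print(f"discontinuation rate: {rate_pct:.2f}%")
print(f"upper-tail p-value:   {p_value:.2e}")
```

Under these assumptions the proportion rounds to 23.77%, and the tail probability comes out at roughly 1.5e-23, in line with the reported p-value; small differences in the last digit are expected because the statistic is quoted to two decimal places.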