In early-stage development of therapeutic monoclonal antibodies, assessing the viability and ease of their purification typically requires extensive experimentation. However, the work required for upstream protein expression and downstream purification development often conflicts with timeline pressures and material constraints, limiting the number of molecules and process conditions that can reasonably be assessed. Recently, high-throughput batch-binding screen data, along with improved molecular descriptors, have enabled development of robust quantitative structure-property relationship (QSPR) models that predict monoclonal antibody chromatographic binding behavior from the amino acid sequence. Here, we describe a QSPR strategy for in silico monoclonal antibody purification process fit assessment. Principal Component Analysis is applied to extract a one-dimensional basis for comparison of molecular chromatographic binding behavior from multi-dimensional high-throughput batch-binding screen data. Kernel Ridge Regression is then used to predict the first principal component for new molecular sequences. This workflow is demonstrated with a set of 97 monoclonal antibodies for five chromatography resins in two salt types across a range of pH and salt concentrations. Model development benchmarks four descriptor sets derived from biophysical structural models and protein language models. The investigation illustrates the value QSPR models can provide for purification process fit assessment and for the selection of resins and operating conditions from sequence alone.
Keywords: Antibody; QSAR; QSPR; biophysical; chromatography; computational; developability; machine learning; manufacturability; purification.
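The two-step workflow summarized in the abstract (principal component analysis to reduce the batch-binding screen to one score per molecule, then kernel ridge regression from sequence-derived descriptors to that score) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic, and the number of screening conditions (40), the descriptor dimension (16), the RBF kernel, the values of `alpha` and `gamma`, and the 80/17 train/test split are all assumptions chosen for illustration; only the count of 97 molecules comes from the text.

```python
import numpy as np

def first_principal_component(X):
    """First-PC scores, loading vector, and column means of X (rows = molecules)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each screening condition
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loading = Vt[0]                                # unit vector; its sign is arbitrary
    scores = Xc @ loading                          # one number per molecule
    return scores, loading, mean

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(D, y, alpha, gamma):
    """Kernel ridge regression: solve (K + alpha * I) c = y for the dual weights c."""
    K = rbf_kernel(D, D, gamma)
    return np.linalg.solve(K + alpha * np.eye(len(D)), y)

def krr_predict(D_train, dual, D_new, gamma):
    """Predict targets for new descriptor rows from the fitted dual weights."""
    return rbf_kernel(D_new, D_train, gamma) @ dual

# --- Synthetic stand-in data (ASSUMPTION: not the study's measurements) ---
rng = np.random.default_rng(0)
n_mabs, n_conditions, n_descriptors = 97, 40, 16   # 40 and 16 are hypothetical
descriptors = rng.normal(size=(n_mabs, n_descriptors))
latent = descriptors @ rng.normal(size=n_descriptors)        # hidden binding propensity
condition_pattern = rng.normal(size=n_conditions)
binding = np.outer(latent, condition_pattern)
binding += 0.1 * rng.normal(size=(n_mabs, n_conditions))     # measurement noise

# Step 1: collapse the multi-condition screen to one score per molecule.
scores, loading, mean = first_principal_component(binding)

# Step 2: learn descriptors -> PC1 score, then predict held-out molecules.
alpha, gamma = 1.0, 1.0 / n_descriptors            # hypothetical hyperparameters
train, test = np.arange(80), np.arange(80, n_mabs)
dual = krr_fit(descriptors[train], scores[train], alpha, gamma)
pred = krr_predict(descriptors[train], dual, descriptors[test], gamma)

print("held-out correlation:", np.corrcoef(pred, scores[test])[0, 1])
```

In practice the regularization strength and kernel width would be tuned by cross-validation, and the PCA would be fitted on training molecules only so that held-out scores do not leak into the basis. The sketch omits both for brevity.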