Many vision-based applications rely on logistic regression to embed classification in a probabilistic framework, such as recognition in images and videos or identification of disease-specific image phenotypes from neuroimages. Logistic regression, however, often performs poorly when the training data are noisy, contain irrelevant features, or are imbalanced across classes, all common occurrences in visual recognition tasks. To deal with these issues, researchers generally rely on ad-hoc regularization techniques or address only a subset of them. We instead propose a mathematically sound logistic regression model that selects a subset of relevant features and an informative, balanced subset of samples during training. The model does so by applying cardinality constraints (via l0-'norm' sparsity) to both the features and the samples. The l0-'norm' is the natural mathematical definition of sparsity, but in practice it is usually approximated (e.g., by l1 or its variants) for computational simplicity. We prove that a local minimum of the non-convex optimization problem induced by the cardinality constraints can be computed by combining block coordinate descent with penalty decomposition. On synthetic, image recognition, and neuroimaging datasets, we show that the accuracy of our method is higher than that of alternative methods and classifiers commonly used in the literature.
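
To make the role of the cardinality constraints concrete, a minimal sketch of this kind of objective is given below, assuming a logistic loss $\ell$, a feature-weight vector $w$, binary sample-selection variables $v_i$, and illustrative budgets $k_f$ (features) and $k_s$ (samples); the exact formulation in the paper may differ:

\[
\min_{w,\; v \in \{0,1\}^n} \;\; \sum_{i=1}^{n} v_i\, \ell\!\left(y_i,\, w^{\top} x_i\right)
\quad \text{s.t.} \quad \|w\|_0 \le k_f, \qquad \|v\|_0 \le k_s,
\]

where $\|\cdot\|_0$ counts non-zero entries, so at most $k_f$ features and $k_s$ samples are active in the trained model; class-wise budgets on $v$ would additionally enforce balance across classes.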