ABSTRACT
Genetic toxicity data from various sources were integrated into a rigorously designed database using the ToxML schema. The public data sources include U.S. Food and Drug Administration (FDA) submission data from approved new drug applications, food contact notifications, and generally recognized as safe (GRAS) food ingredients, as well as chemicals from the NTP and CCRIS databases. These public data were then combined with data from private industry according to ToxML criteria. The resulting integrated database, which is enriched in pharmaceuticals, was used for data mining analysis. Structural features of the database chemicals were used to differentiate the chemical spaces of drugs/candidates, food ingredients, and industrial chemicals. In general, drug/candidate and food ingredient structures are associated with lower frequencies of mutagenicity and clastogenicity, whereas industrial chemicals as a group contain a much higher proportion of positives. Structural features were then selected to analyze the endpoint outcomes of the genetic toxicity studies. Although most of the well-known structural alerts for genotoxic carcinogenicity were identified, some discrepancies from the classic Ashby-Tennant alerts were observed. Using these influential features as independent variables, the results of four types of genotoxicity studies were correlated. High Pearson correlations were found between the results of Salmonella mutagenicity and mouse lymphoma assay testing, as well as with those of in vitro chromosome aberration studies. This paper demonstrates the usefulness of representing a chemical by its structural features and of using these features to profile a battery of tests, rather than relying on a single toxicity test for a given chemical. It also presents data mining/profiling methods applied in a weight-of-evidence approach to assess the potential for genetic toxicity and to guide the development of intelligent testing strategies.
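The abstract does not say exactly how the assay results were correlated through the structural features. The sketch below assumes one plausible reading and is illustrative only. Each assay is summarized by the positive rate of the chemicals carrying each structural feature, and a Pearson coefficient is then computed between two assays across those features. The feature names, the assay keys (`"ames"` for Salmonella, `"mla"` for mouse lymphoma), and the toy records are all hypothetical. None of them come from the database.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def positive_rate(chemicals, feature, assay):
    """Fraction of chemicals carrying `feature` that were positive (1) in `assay`,
    or None if no chemical with that feature was tested in that assay."""
    calls = [c["results"][assay] for c in chemicals
             if feature in c["features"] and assay in c["results"]]
    return sum(calls) / len(calls) if calls else None


def assay_correlation(chemicals, features, assay_a, assay_b):
    """Correlate two assays across structural features (hypothetical scheme):
    each feature contributes one (rate_a, rate_b) pair; features lacking
    data in either assay are skipped."""
    xs, ys = [], []
    for f in features:
        ra = positive_rate(chemicals, f, assay_a)
        rb = positive_rate(chemicals, f, assay_b)
        if ra is not None and rb is not None:
            xs.append(ra)
            ys.append(rb)
    return pearson(xs, ys)


# Hypothetical toy records: structural features plus binary calls (1 = positive).
TOY_CHEMICALS = [
    {"features": {"aromatic_nitro"}, "results": {"ames": 1, "mla": 1}},
    {"features": {"aromatic_nitro", "carboxylic_acid"}, "results": {"ames": 1, "mla": 0}},
    {"features": {"epoxide"}, "results": {"ames": 1, "mla": 1}},
    {"features": {"carboxylic_acid"}, "results": {"ames": 0, "mla": 0}},
]
FEATURES = ["aromatic_nitro", "epoxide", "carboxylic_acid"]

if __name__ == "__main__":
    r = assay_correlation(TOY_CHEMICALS, FEATURES, "ames", "mla")
    print(round(r, 3))  # prints 0.866 for this toy data
```

In this sketch, profiling by feature means that two assays count as "correlated" when the same structural features tend to drive positives in both. That matches the abstract's idea of profiling a battery of tests rather than a single test, though the paper's actual procedure may differ.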