We consider the evaluation of the treatment effect on an ordered categorical cytogenetic response in patients treated for chronic myelogenous leukaemia (CML) in a clinical trial initiated by the East German Group for Hematology and Oncology. A simulation model for the cytogenetic response (percentage of Philadelphia chromosome-positive metaphases), measured serially in CML patients, was constructed to reflect approximately the sparse information available in the medical literature. The model was used to construct a summary measure of response and to formulate the treatment effect as a regression on ordered categorical data with a U-shaped distribution. Two simple models (a vertical shift model and a pooled conditional response model) were designed specifically to capture the treatment effect 'observed' in a simulated 'pilot' data set. Their power was compared with that of the traditional proportional odds and binary models. The comparison was based both on repeated sampling from the simulation model and on bootstrap resampling of the 'given' pilot data set. We show that the specific models, which address the treatment effect directly (as anticipated from the pilot data), can gain power over the traditional proportional odds model when evaluated by bootstrap. However, the proportional odds model appears to perform better under repeated sampling from the simulation model. To explain this discrepancy, we generated 'pilot data sets' repeatedly from the simulation model and showed that, for reasonably complex models, the ordering of the bootstrap power estimates is unstable and depends on the random variation between pilot data sets. This phenomenon clearly limits the usefulness of subtly modelling the form of the treatment difference observed in a small pilot data set.
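The contrast between the two ways of estimating power can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the U-shaped category probabilities, the sample sizes, and the function names are hypothetical and are not taken from the trial. The vertical shift and pooled conditional response models are not reproduced here. Instead, a dichotomised two-proportion test stands in for the binary model, and a tie-corrected Wilcoxon rank-sum test serves as a simple stand-in for a proportional odds analysis.

```python
import numpy as np

# Hypothetical U-shaped category probabilities (illustrative values only,
# not taken from the trial): most mass sits in the extreme categories.
P_CONTROL = np.array([0.10, 0.05, 0.05, 0.10, 0.70])
P_TREATED = np.array([0.30, 0.08, 0.07, 0.10, 0.45])
Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def draw_trial(rng, n_per_arm):
    """Sample ordered category codes 0..K-1 for both arms from the assumed model."""
    k = len(P_CONTROL)
    return (rng.choice(k, size=n_per_arm, p=P_CONTROL),
            rng.choice(k, size=n_per_arm, p=P_TREATED))


def binary_test(control, treated):
    """Two-proportion z-test after dichotomising at category code 0."""
    n0, n1 = len(control), len(treated)
    p0, p1 = np.mean(control == 0), np.mean(treated == 0)
    pooled = (p0 * n0 + p1 * n1) / (n0 + n1)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    return bool(se > 0 and abs(p1 - p0) / se > Z_CRIT)


def rank_test(control, treated):
    """Wilcoxon rank-sum test with mid-ranks and tie-corrected variance."""
    both = np.concatenate([control, treated])
    n0, n1, n = len(control), len(treated), len(both)
    counts = np.bincount(both)
    # Mid-rank of a category: number of lower ranks plus (count + 1) / 2.
    midranks = np.cumsum(counts) - counts + (counts + 1) / 2
    w = midranks[treated].sum()
    mean_w = n1 * (n + 1) / 2
    tie_term = (counts**3 - counts).sum() / (n * (n - 1))
    var_w = n0 * n1 / 12 * ((n + 1) - tie_term)
    return bool(var_w > 0 and abs(w - mean_w) / np.sqrt(var_w) > Z_CRIT)


def power_by_sampling(test, rng, n_per_arm, n_rep=2000):
    """Power under repeated sampling from the simulation model."""
    return np.mean([test(*draw_trial(rng, n_per_arm)) for _ in range(n_rep)])


def power_by_bootstrap(test, rng, pilot, n_per_arm, n_boot=2000):
    """Power estimated by resampling one 'given' pilot data set with replacement."""
    control, treated = pilot
    hits = [test(rng.choice(control, size=n_per_arm),
                 rng.choice(treated, size=n_per_arm))
            for _ in range(n_boot)]
    return np.mean(hits)


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    n_pilot, n_trial = 30, 60  # hypothetical sizes per arm
    tests = {"binary": binary_test, "rank-sum": rank_test}
    for name, test in tests.items():
        print(name, "sampling power:", power_by_sampling(test, rng, n_trial))
    # Repeat the pilot draw to see how the bootstrap ordering varies.
    for i in range(5):
        pilot = draw_trial(rng, n_pilot)
        est = {name: power_by_bootstrap(test, rng, pilot, n_trial)
               for name, test in tests.items()}
        print("pilot", i, est, "best:", max(est, key=est.get))
```

Running the script prints one power estimate per test under repeated sampling, followed by the bootstrap estimates for several independently drawn pilot data sets. Comparing the "best" label across pilots gives a rough impression of how much the ranking of methods can depend on the particular pilot sample. The sketch is not intended to reproduce the results reported here.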