Molecular docking is an essential tool in structure-based drug discovery, widely used to model ligand-protein interactions and enrich potential hits. Among the different docking strategies, semiflexible docking (a rigid-receptor, flexible-ligand model) is the most popular, owing to its balance of docking accuracy and speed. However, this approach ignores the conformational changes of proteins and hence requires suitable protein conformations as input. When the binding interaction follows an induced-fit model, flexible methods such as molecular dynamics simulation can be used, but they are computationally demanding. Flexible docking offers a compromise between speed and accuracy, as exemplified by AutoDock Vina and AutoDockFR, which treat selected protein side chains as flexible. However, the efficiency of flexible docking methods still needs improvement before they are practical for virtual screening. In this article, we introduce DSDPFlex, an improved flexible-receptor docking method accelerated by GPU parallelization. Beyond acceleration, DSDPFlex implements optimizations in sampling, scoring, and search space to further improve its performance in flexible-receptor tasks. In cross-docking evaluation, DSDPFlex is more accurate than AutoDock Vina and 100 times faster than Vina in flexible-receptor tasks. We also show the advantage of flexible-receptor methods on suboptimal pockets and validate the advantage of DSDPFlex in screening on apo and AlphaFold2-predicted structures. With improvements in both efficiency and accuracy, DSDPFlex holds potential for future docking-based studies.