A Twist on Traditional Computational Methods to Advance Mathematical Modeling
The group of Dr. Khanh N. Dinh, Associate Research Scientist in the Irving Institute for Cancer Dynamics and the Department of Statistics, has developed a more efficient random forest-based Approximate Bayesian Computation method to enable more reliable and robust mathematical modeling across many fields of science. They share their findings in a new study published in Statistics and Computing. The work was conducted in collaboration with the lab of Professor Simon Tavaré, Director of the Irving Institute for Cancer Dynamics.
Challenges of Traditional ABC Methods in Mathematical Modeling
Scientists use mathematical modeling to explain complex biological phenomena, such as the evolution of a disease. Making these models as accurate as possible requires identifying the right parameters that best match real-world data.
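Approximate Bayesian Computation (ABC) is one of the basic approaches for estimating parameters in mathematical models whose likelihood function is unavailable or difficult to compute. Instead of evaluating the likelihood, ABC simulates data under candidate parameter values and favors the values whose simulations resemble the observed data. As such, ABC plays an important role in developing data-centric models in science.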
“Traditional ABC methods require a number of hyperparameters that the users need to tune for specific problems in order to balance between accuracy and computational runtime. This can result in a lot of trials and errors,” explains Dinh.
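To make the tuning burden concrete, here is a minimal sketch of the simplest traditional variant, rejection ABC, on a toy problem. It is an illustration under stated assumptions, not the method from the study: the simulator (normal data with unknown mean), the uniform prior on [-5, 5], the sample-mean summary statistic, and the names `simulate` and `abc_rejection` are all hypothetical. The `tolerance` and `n_sims` arguments are examples of the hyperparameters a user would have to tune.

```python
import random
import statistics

def simulate(mu, n, rng):
    """Toy simulator (assumed): n draws from a normal with mean mu, sd 1."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def abc_rejection(observed, n_sims=5000, tolerance=0.1, seed=0):
    """Keep prior draws whose simulated summary lands near the observed one."""
    rng = random.Random(seed)
    obs_stat = statistics.fmean(observed)  # summary statistic: sample mean
    accepted = []
    for _ in range(n_sims):
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the assumed prior
        sim_stat = statistics.fmean(simulate(mu, len(observed), rng))
        # Accept the candidate only if the simulation resembles the data.
        if abs(sim_stat - obs_stat) <= tolerance:
            accepted.append(mu)
    return accepted

# Synthetic "observed" data generated with a known mean of 2.0.
observed = simulate(2.0, 50, random.Random(42))
posterior_sample = abc_rejection(observed)
posterior_mean = statistics.fmean(posterior_sample)
```

A smaller `tolerance` gives a sharper approximation but rejects more simulations, while a larger one is cheaper but less accurate. That trade-off is the kind of balancing act described in the quote above.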
To overcome this issue, researchers started implementing a machine-learning technique called random forest (RF) methodology within ABC. RF-based methods are significantly less sensitive to noise, enabling the use of many different statistics to fully characterize the data.
However, building RFs can be computationally expensive, especially for complex problems that require large numbers of simulations. Thus, the development of a faster and more efficient RF-based ABC method promises to facilitate more reliable and robust models across many different fields of science.
Combining Random Forests with ABC Sequential Monte Carlo
To address these computational challenges, Dinh’s group incorporated RFs into the framework of ABC Sequential Monte Carlo (SMC) to improve efficiency. In this approach, the algorithm repeatedly uses the parameter estimates from its latest iteration to produce new simulations, then trains a new RF on these results in the next cycle.
As the process continues, the sampled parameters get closer to the true values. This approach also reaches more accurate results faster and requires less computing power than existing RF-based methods.
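The loop structure described above can be sketched on a toy problem. This is only an illustration of the iterate, simulate, refit cycle, not the ABC-SMC-(D)RF algorithm itself: no random forest is built here. A simple keep-the-closest step stands in for the RF, importance weights are omitted, and the simulator, prior, and all function names are hypothetical.

```python
import random
import statistics

def simulate_mean(mu, rng, n=50):
    """Toy simulator (assumed): sample mean of n normal draws with mean mu."""
    return statistics.fmean([rng.gauss(mu, 1.0) for _ in range(n)])

def iterative_refinement(obs_stat, n_rounds=4, n_particles=500,
                         keep_frac=0.2, seed=1):
    rng = random.Random(seed)
    # Round 0: parameters come from an assumed uniform prior on [-5, 5].
    params = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    spreads = []
    for _ in range(n_rounds):
        # Simulate once from every parameter in the current set.
        scored = [(abs(simulate_mean(mu, rng) - obs_stat), mu) for mu in params]
        # Stand-in for the RF step: keep the parameters whose simulations
        # land closest to the observed statistic.
        scored.sort()
        kept = [mu for _, mu in scored[: int(keep_frac * n_particles)]]
        spreads.append(statistics.pstdev(kept))
        # The latest iteration seeds the next: jitter the kept parameters.
        jitter = max(statistics.pstdev(kept), 1e-6)
        params = [rng.choice(kept) + rng.gauss(0.0, jitter)
                  for _ in range(n_particles)]
    return kept, spreads

final_params, spreads = iterative_refinement(obs_stat=2.0)
final_mean = statistics.fmean(final_params)
```

In this sketch, `spreads` records how widely the retained parameters are scattered in each round. The scatter shrinks as later rounds concentrate simulations near values consistent with the data, which mirrors the behavior described above.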
“We observed that ABC-SMC-(D)RF converges to the posterior distributions more quickly than previous RF methods, thus saving both computational runtime and storage associated with RF formation in practical applications,” notes Dinh.
Applying the New ABC-SMC-(D)RF to Study Cancer Evolution
Dinh’s team is applying the new methodology to develop mathematical models for genomic instability, mutation progression, and cancer evolution. The ABC-SMC-(D)RF approach estimates parameters in these different applications more efficiently, requiring fewer simulations and limited adjustments.
For collaborator Simon Tavaré, “the need for parameter estimation transcends cancer research or even biology. We expect that ABC-SMC-(D)RF and other RF-based methodologies will be of help for modeling in different scientific areas.”
“Early results indicate that our method works well for these applications with minimal modifications. We therefore anticipate that it will play an integral role in many of our future modeling efforts,” adds Dinh.
Looking Ahead
Dinh sees several ways to further enhance the ABC-SMC-(D)RF technique and explore its applications in mathematical modeling. For instance, the algorithm could finish early and save time if it were able to detect more reliably when the results have converged. The method has also shown some potential for model selection, that is, comparing different mathematical models and identifying which one best fits the real data.
Publication Details
The paper was a collaboration between Khanh Dinh, Cécile Liu (former IICD intern), Xijin Xiang (former IICD Research Staff Member), Zhihan Liu (former Statistics intern), and Simon Tavaré. The paper was published in Volume 35, article number 219 of Statistics and Computing (DOI: 10.1007/s11222-025-10748-x).
