Teaching artificial intelligence about complicated structures

While text and image generators like ChatGPT and DALL-E are changing the writing and art worlds, artificial intelligence (AI) is also reshaping the scientific landscape by making it possible to analyze extremely large data sets. However, AI is still unreliable when inferring patterns and making decisions in highly interconnected systems, such as the relationships among thousands of people in a community or the hundreds of genes that underlie a single disease. In a paper published earlier this month in the prestigious Journal of the Royal Statistical Society Series B, Xiaowu Dai, Professor of Statistics and Data Science and of Biostatistics at UCLA, presents a new statistical technique for using AI to make accurate scientific estimates of these massive, complicated systems.

When scientists use AI to study complex systems, they aren’t just seeking a single prediction – they want to understand how different variables interact. But as the number of interacting variables increases, the complexity of the model grows exponentially. This leads to a well-known challenge called the “curse of interaction,” in which traditional AI methods often produce unreliable estimates of complex structures. In many cases, AI simplifies or even ignores these interactions to make the problem more tractable. In fields like genomics, however, where understanding gene interactions is critical, this can lead to inaccurate conclusions. “In problems with many interacting variables, the curse of interaction makes it difficult to achieve fast and accurate estimates,” says Professor Dai. “Often, there isn’t enough data at the edge between the scientific known and the unknown, forcing AI to make trade-offs between interpretability and accuracy.”

The new technique Professor Dai developed, called nonparametric estimation via partial derivatives, allows scientists to estimate interactions between variables without being hindered by the curse of interaction. The approach requires data on how the variables change in relation to one another, whether observed or estimated, paired with the original data. By combining these two sources of information, the method can provide more accurate estimates of underlying structures and, as a result, more reliable predictions. “This method has the potential to address a wide range of questions across multiple disciplines,” says Professor Dai. “We believe this method is a useful tool for practitioners in data-driven fields, ranging from science and social science to engineering.”
