Delphi-2M Predicts Which Disease Will Strike You In the Next 20 Years
Here’s how a GPT-style model, Delphi-2M, can predict your future disease risks, based on your current medical record.
I want to introduce you to Ayushi as the guest author for this week's newsletter. Ayushi writes a newsletter where she posts insightful articles about how technology can improve people’s health and lives.
Medical professionals are heroes. They perform extraordinary work, saving lives and healing patients every day. However, they cannot foresee which diseases a patient might develop over the coming years.
I mean, who can predict what is going to happen in the coming 20 years?
But what if it were possible? What if you could know today which serious diseases are likely to appear in your body over the next 20 years?
An AI model called Delphi-2M aims to make this possible.
Scientists have recently developed a GPT-based model, trained on health data, that can predict the likelihood of developing more than 1,000 different diseases based on an individual’s health history.
It not only tells you that you might get an “xyz” disease, but it also provides the estimated timing of when the disease may occur, for example, 1 year, 5 years, or 10 years.
Delphi-2M showed an AUC score of 0.76 (out of 1.0) across all diseases, which is pretty impressive.
Let’s now learn how scientists developed this model, which may revolutionize the concept of preventive medicine.
Development of Delphi-2M
The data used to train the model primarily came from the UK Biobank, a huge health database of about 500k people from across the UK.
For Delphi-2M, the researchers used a specific database called the “First occurrence data.” It records the very first time someone got diagnosed with each disease.
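To make the idea of “first occurrence” concrete, here is a minimal sketch that keeps only the earliest date on which each diagnosis appears in a person’s records. The record format, the ICD-10-style codes and the helper name `first_occurrences` are hypothetical illustrations, not how the UK Biobank actually builds its fields.

```python
from datetime import date

# Hypothetical raw diagnosis records: (diagnosis_code, date_of_diagnosis).
# The ICD-10-style codes are illustrative only.
records = [
    ("I10", date(2015, 3, 1)),   # hypertension
    ("E11", date(2017, 6, 12)),  # type 2 diabetes
    ("I10", date(2012, 9, 20)),  # hypertension, recorded earlier
    ("E11", date(2019, 1, 5)),   # repeat diabetes diagnosis
]

def first_occurrences(recs):
    """Keep only the earliest date on which each diagnosis code appears."""
    first = {}
    for code, when in recs:
        if code not in first or when < first[code]:
            first[code] = when
    return first

print(first_occurrences(records))
```

Repeat diagnoses are dropped, so each disease contributes a single dated event to the person’s history.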
Furthermore, to check that Delphi-2M works beyond UK data, the scientists also tested it on data from Denmark’s national health registries.
Denmark records every citizen’s health events throughout their lifetime using a unique personal ID number. The Danish data covers about 796 different types of diagnoses from approximately 1.9 million people.
Delphi-2M is built on the same foundation as the technology behind ChatGPT. The difference is that it is designed to predict health problems instead of generating text.
Just as GPT understands and learns how language works by reading millions of texts, Delphi-2M learns how diseases develop by studying health records.
To train the model, each medical event in the data was treated like a token (a word) in language modelling. During training, the model learned to predict both the next event and the waiting time until that event.
For example: “High blood pressure — 2 years later” or “Cardiac arrest — 5 years later”
Each of these entries became a bivariate pair (event, time), which lets the model predict not only which event will happen but also when it is likely to occur.
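As a rough sketch of the (event, time) pairing, the snippet below orders a person’s first diagnoses chronologically and attaches the gap since the previous event. The history, the helper name `to_event_time_pairs`, and the choice to measure the first gap from birth are assumptions for illustration; the real model ties timing to the person’s age, and its exact encoding is described in the Nature paper.

```python
from datetime import date

# Hypothetical first-occurrence history for one person.
history = {
    "Hypertension": date(2012, 9, 20),
    "Type 2 diabetes": date(2017, 6, 12),
    "Heart attack": date(2020, 2, 3),
}

def to_event_time_pairs(first_dates, birth):
    """Turn a history into chronologically ordered (event, years_since_previous) pairs.

    The gap for the first event is measured from birth, which is a toy choice.
    """
    pairs = []
    previous = birth
    for event, when in sorted(first_dates.items(), key=lambda kv: kv[1]):
        gap_years = (when - previous).days / 365.25
        pairs.append((event, round(gap_years, 1)))
        previous = when
    return pairs

for event, gap in to_event_time_pairs(history, birth=date(1960, 1, 1)):
    print(f"{event}: {gap} years later")
```

The printed lines mirror the “High blood pressure, 2 years later” style of entries the model learns from.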
The model also uses a mechanism called competing exponentials, which helps it capture both the likelihood of each disease and the expected time until it appears.
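For readers who want the underlying math, the standard competing-exponentials model can be written as below. This is the textbook formulation; in Delphi-2M the rates $\lambda_i$ are computed by the transformer from a person’s history, and any further details of the parameterisation should be taken from the Nature paper.

```latex
% One exponential "timer" T_i per possible event i, with rate \lambda_i
T_i \sim \mathrm{Exp}(\lambda_i), \qquad P(T_i > t) = e^{-\lambda_i t}, \qquad \Lambda = \sum_j \lambda_j

% The next event is whichever timer rings first
P(\text{next event} = i) = P\big(T_i = \min_j T_j\big) = \frac{\lambda_i}{\Lambda}

% The waiting time until that event is also exponential
T = \min_j T_j \sim \mathrm{Exp}(\Lambda), \qquad \mathbb{E}[T] = \frac{1}{\Lambda}

% Probability that event i comes next and happens within t years
P(\text{next} = i,\ T \le t) = \frac{\lambda_i}{\Lambda}\left(1 - e^{-\Lambda t}\right)
```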
Think of the exponentials as a set of stopwatches or timers, one for each disease a person might develop: one timer for heart disease, one for cancer, one for kidney disease, and so on.
When a person’s medical history is fed into the model, it assigns a “tick speed” to each disease’s timer.
Each timer’s speed depends on the person’s medical history and risk factors. For example, if someone has high cholesterol, the heart disease timer may tick faster, while timers for rare diseases tick more slowly.
Next, all the timers start at the same moment and race each other to see which rings first. Whichever timer rings first determines the event the model predicts will happen next.
For example, if the “cancer timer” rings before the “heart disease timer,” the model predicts that cancer will appear next and also estimates when that is likely to happen.
Thus, using the competing exponentials, the model captures the competition between diseases, as a person may be at risk for multiple conditions simultaneously in real life.
Additionally, diseases don’t occur at regular intervals in real life. Sometimes, nothing happens for years, and then a diagnosis may appear suddenly. The exponential timers handle these irregular gaps as well.
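The timer race can be simulated directly. This minimal sketch assumes three hypothetical yearly rates (not real Delphi-2M output), starts one exponential timer per disease, and records which one rings first. Over many runs, each disease wins with probability equal to its rate divided by the total rate, and the average wait is one over the total rate.

```python
import random

# Hypothetical yearly rates ("tick speeds") for one person; not real Delphi-2M output.
rates = {"heart disease": 0.02, "cancer": 0.01, "kidney disease": 0.005}

def next_event(rates, rng):
    """Start one exponential timer per disease and return the first to ring."""
    times = {d: rng.expovariate(r) for d, r in rates.items()}
    first = min(times, key=times.get)
    return first, times[first]

rng = random.Random(0)
n = 100_000
counts = {d: 0 for d in rates}
total_wait = 0.0
for _ in range(n):
    disease, wait = next_event(rates, rng)
    counts[disease] += 1
    total_wait += wait

total_rate = sum(rates.values())
for d, r in rates.items():
    # Simulated share of "first to ring" versus the theoretical r / total_rate.
    print(f"{d}: simulated {counts[d] / n:.3f}, theory {r / total_rate:.3f}")
print(f"mean wait: simulated {total_wait / n:.1f} y, theory {1 / total_rate:.1f} y")
```

The waiting times are continuous and can be very long or very short, which is how the exponential timers accommodate the irregular gaps between diagnoses.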
The interesting part about Delphi is that it starts from general health patterns in the population and then adjusts those predictions for each person based on their own health history. For conditions like asthma or joint pain, for example, the predictions are almost the same as the average for people of the same age and sex.
Here, personal history doesn’t matter much. For severe infections, however, the predicted risk can differ considerably from person to person.
A person who has weak immunity or has diabetes may have a higher chance of developing a severe infection than a person who is generally healthy. Thus, Delphi’s predictions vary between individuals, showing that it can spot important differences in individual health risks.
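To illustrate “population baseline, then personal adjustment” in the simplest possible way, here is a toy multiplicative model. It is not how Delphi-2M computes risk (the transformer learns these effects internally); the baseline rates, the multipliers and the function name `personal_rate` are all hypothetical.

```python
# Toy illustration only. Delphi-2M learns these effects inside a transformer;
# the rates, conditions, and multiplicative form below are all hypothetical.

# Hypothetical baseline yearly rates for one age/sex group.
baseline_rate = {"asthma": 0.004, "severe infection": 0.006}

# Hypothetical multipliers: how much a prior condition scales each outcome's rate.
multiplier = {
    ("diabetes", "severe infection"): 2.0,  # strong personal effect
    ("diabetes", "asthma"): 1.05,           # almost no personal effect
}

def personal_rate(outcome, history):
    """Start from the group baseline, then scale by each relevant prior condition."""
    rate = baseline_rate[outcome]
    for condition in history:
        rate *= multiplier.get((condition, outcome), 1.0)
    return rate

for outcome in baseline_rate:
    healthy = personal_rate(outcome, [])
    diabetic = personal_rate(outcome, ["diabetes"])
    print(f"{outcome}: healthy {healthy:.4f}, with diabetes {diabetic:.4f} "
          f"({diabetic / healthy:.2f}x)")
```

In this toy, asthma risk barely moves with diabetes while severe-infection risk doubles, mirroring the pattern described above.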
Let’s now dive into the impressive results shown by the Delphi-2M model.
How Well Does Delphi-2M Actually Work?
To check how accurate Delphi-2M’s predictions are, the scientists used the AUC score mentioned earlier. AUC (area under the ROC curve) measures how often the model assigns a higher risk to a person who goes on to develop a disease than to one who does not: 0.5 is no better than random guessing, and 1.0 is perfect.
The model scores 0.76 on average across all diseases, which is quite impressive.
For 97% of all diseases, it performs better than random guessing, which suggests that most diseases follow some predictable pattern.
Surprisingly, the model’s most accurate prediction is for death. With an AUC of 0.97, Delphi shows that it is extremely good at distinguishing people who are likely to die from those who are not.
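The AUC has a simple pairwise interpretation, sketched below: compare every person who developed the disease with every person who did not, and count how often the model gave the first a higher risk (ties count as half). The risk scores are made up, and the paper’s evaluation involves more detail than this bare calculation.

```python
def auc(scores_cases, scores_controls):
    """Probability that a random case gets a higher risk score than a random control.

    Ties count as half. This pairwise form equals the area under the ROC curve.
    """
    wins = 0.0
    for c in scores_cases:
        for k in scores_controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))

cases = [0.9, 0.7, 0.4]          # hypothetical risks for people who developed the disease
controls = [0.8, 0.3, 0.2, 0.1]  # hypothetical risks for people who did not
print(auc(cases, controls))
```

Here 10 of the 12 case/control pairs are ranked correctly, giving an AUC of about 0.83.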

Another huge advantage of Delphi-2M is that it can predict about 1000 different diseases simultaneously at any point in someone’s life.
Most of the tools and computer programs available today can predict specific diseases, such as heart problems or cancer, but very few can predict the full range of human diseases. Delphi-2M outperforms or matches many current single-disease prediction tools while offering the unique advantage of being able to predict a person’s risk for almost every human disease.

Here is another interesting feature of Delphi: it can generate complete, realistic synthetic health histories for imaginary people.
When the researchers trained a version of Delphi only on this synthetic data and tested it on real data, it achieved an AUC of 0.74, only slightly below the 0.76 of the model trained on real data.
This could be a major advantage for researchers, as they could train powerful health prediction AI systems on entirely artificial data. This could help protect real patients’ sensitive information while delivering nearly the same performance.
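Generating a synthetic history amounts to running the timer race over and over. The sketch below assumes fixed, hypothetical rates; in Delphi-2M the rates would be recomputed by the transformer after every new event, which this toy skips. The event names and the helper `sample_trajectory` are illustrative only.

```python
import random

# Hypothetical fixed yearly rates; in Delphi-2M the rates would be recomputed
# by the transformer after every new event, which this toy skips.
RATES = {"hypertension": 0.03, "type 2 diabetes": 0.01, "death": 0.01}

def sample_trajectory(start_age, max_age, rng):
    """Generate one synthetic life history by repeatedly racing exponential timers."""
    age = start_age
    remaining = dict(RATES)
    trajectory = []
    while remaining:
        times = {e: rng.expovariate(r) for e, r in remaining.items()}
        event = min(times, key=times.get)
        age += times[event]
        if age > max_age:
            break  # stop at the end of the simulated age range
        trajectory.append((event, round(age, 1)))
        if event == "death":
            break
        del remaining[event]  # first occurrences only: each diagnosis appears once
    return trajectory

rng = random.Random(42)
for _ in range(3):
    print(sample_trajectory(40.0, 90.0, rng))
```

Restarting all remaining timers after each event is valid here because exponential timers are memoryless: how long a timer has already run does not change its future behaviour.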

Let’s now understand how scientists tracked the influence of past diseases on future predictions.
To do this, they used SHAP analysis.
Let me break it down in simple terms.
SHAP, or Shapley Additive Explanations, is simply a way to answer the question of ‘which past event made the AI model predict that?’
Let’s take an example. Imagine the prediction is a team result, and every past event (like a diagnosis, a lab test, or a medication) is a player on that team.
SHAP asks: if we add the players to the team one by one, in every possible order, how much does each player change the team’s final score on average?
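The “all possible orders” idea can be computed exactly for a small team. This sketch enumerates every join order and averages each player’s marginal contribution, which is the Shapley value. The risk function and its numbers are hypothetical; practical SHAP implementations use approximations or model-specific shortcuts because the number of orders grows factorially.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders.

    `value` maps a frozenset of players to the team's score (here, a predicted risk).
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        team = frozenset()
        for p in order:
            totals[p] += value(team | {p}) - value(team)
            team = team | {p}
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}

def risk(events):
    """Hypothetical predicted risk given a set of past events."""
    r = 0.05  # baseline risk with no history
    if "diabetes" in events:
        r += 0.04
    if "high blood pressure" in events:
        r += 0.03
    if "diabetes" in events and "high blood pressure" in events:
        r += 0.02  # interaction: the pair together adds extra risk
    if "regular exercise" in events:
        r -= 0.02  # protective signal pushes risk down
    return r

players = ["diabetes", "high blood pressure", "regular exercise"]
print({p: round(v, 3) for p, v in shapley_values(players, risk).items()})
```

The interaction bonus is split evenly between diabetes and high blood pressure, the protective signal gets a negative value, and the values add up to the full prediction minus the baseline.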
The results of SHAP analysis indicate that recent events matter the most, long-standing illnesses increase the risk, and protective or healthy signals push this risk down.

From helping health professionals catch diseases early to planning healthcare for an ageing population, Delphi could transform medicine in the future.
Instead of looking at each disease separately, we can now view them as interconnected health events that build upon one another over a person’s lifetime. Instead of waiting for the symptoms, we might catch the disease years earlier.
While Delphi-2M still needs improvement, it is a big step forward in using AI to understand and predict human health.
References
Research paper titled “Learning the natural history of human disease with generative transformers” from Nature
Article titled “Which diseases will you have in 20 years? This AI accurately predicts your risks.”
I’d again like to thank Ayushi for writing this article for ‘Into AI’. Don’t forget to subscribe to her newsletter, where she shares insightful articles about how technology can improve people’s health and lives.