Researchers unveiled how artificial intelligence is advancing rheumatology – from sharper imaging and cancer prediction to remote monitoring and robotic patient support.
Artificial intelligence (AI) is rapidly transitioning from theoretical promise to practical reality in healthcare — and rheumatology is no exception.
At this year’s EULAR Congress in Barcelona, researchers from across the globe showed how AI tools are reshaping the diagnosis, monitoring and management of complex rheumatic diseases.
Smarter imaging for better monitoring
High-resolution computed tomography (HRCT) is the gold standard for evaluating interstitial lung disease (ILD), a serious complication in systemic sclerosis (SSc).
But new research led by Francesca Motta suggests that AI-assisted interpretation may significantly outperform traditional radiologist assessment. Her observational study of 33 patients with SSc-ILD found that AI was more precise in detecting disease progression and aligned better with pulmonary function test results, potentially enabling earlier and more accurate intervention.
Of the 33 patients, 79% were female, with a median age of 57 years (IQR 47-73) and a disease duration of five years (IQR 2-6); 76% were anti-Scl70 positive, 9% anti-centromere, 6% anti-RNA pol III, 6% anti-Pm/Scl and 3% anti-nuclear antibody positive.
HRCT images were evaluated at two time points one year apart. Forced vital capacity (FVC), forced expiratory volume in one second (FEV1) and diffusing capacity for carbon monoxide (DLCO), also adjusted for alveolar volume, were assessed at both time points. Visual scoring was performed by two radiologists with expertise in thoracic imaging, who evaluated the images independently and resolved any disagreements through consensus.
AI-assisted analysis was performed using Thoracic VCAR software to quantify volumes and percentages of ground glass opacities, fibrotic lung involvement and normal lung. Patients were classified as having progressive or non-progressive ILD based on Erice criteria. Treatment changes between the two time points were also recorded.
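At its core, this kind of quantitative analysis reduces to counting labelled voxels in a segmented HRCT volume. As a minimal sketch (the label codes and function below are illustrative assumptions, not the Thoracic VCAR implementation), the percentages of ground glass, fibrotic and normal lung can be derived like this:

```python
import numpy as np

# Hypothetical label convention for a segmented HRCT volume:
# 0 = outside lung, 1 = normal lung, 2 = ground glass opacity, 3 = fibrosis.
LABELS = {"normal": 1, "ground_glass": 2, "fibrosis": 3}

def tissue_percentages(segmentation: np.ndarray) -> dict:
    """Return each tissue class as a percentage of total lung volume."""
    lung_voxels = np.isin(segmentation, list(LABELS.values())).sum()
    if lung_voxels == 0:
        raise ValueError("segmentation contains no lung voxels")
    return {name: 100.0 * (segmentation == code).sum() / lung_voxels
            for name, code in LABELS.items()}

# Toy 1-D "volume": 6 normal, 3 ground-glass and 1 fibrotic voxel.
toy = np.array([1] * 6 + [2] * 3 + [3])
print(tissue_percentages(toy))  # {'normal': 60.0, 'ground_glass': 30.0, 'fibrosis': 10.0}
```

The difficult part in practice is the segmentation itself; once each voxel carries a tissue label, the longitudinal comparison between two scans is straightforward arithmetic on these percentages.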
The researchers found that AI-assisted quantitative HRCT analysis outperformed visual scoring in assessing the progression of fibrosis in patients with SSc-ILD and showed stronger correlations with PFT values.
“Subtle changes detected by AI-assisted quantitative HRCT analysis may better suggest disease progression in patients with SSc-ILD over time,” they wrote.
“Longitudinal prospective validation is needed to integrate AI into clinical practice for SSc-ILD management.”
Similarly, Seulkee Lee presented findings on AI in diagnosis, investigating a deep learning model that analysed MRI scans of the sacroiliac joints to identify axial spondyloarthritis. Using advanced algorithms to assess both inflammatory and structural changes, the system not only performed with high sensitivity and specificity but also flagged patients who met clinical criteria but fell outside standard imaging definitions – underscoring AI’s potential to bridge diagnostic gaps.
In ultrasound, Claus Juergen Bauer explored how a supervised deep learning model could classify lesions indicative of giant cell arteritis. With a training dataset of 3800 images from 244 patients, the model excelled in identifying abnormalities in key arteries, though challenges remained in smaller branches. Future efforts will focus on increasing dataset diversity and validation across multiple centres.
Predicting risks and improving outcomes
AI’s potential isn’t limited to imaging. Antonio Tonutti’s presentation showcased two machine learning models aimed at predicting “interceptable” cancers (those diagnosed synchronously or after the first non-Raynaud symptom) in SSc patients. Drawing on a wide range of clinical and serological data, the models achieved accuracy of 73-79%, although the two differed in sensitivity, precision and specificity, with neither model outperforming the other on all parameters.
Breast cancer was the most common malignancy (32%), followed by lung (16%), gynaecological (8%), colorectal (7.5%) and haematological cancers (7%). Key predictive features included ILD, digital ulcers and high CRP levels, while treatment with mycophenolate mofetil appeared protective. These findings suggest a new frontier in personalised cancer screening for at-risk populations and an improvement in early cancer detection in SSc patients.
Another group shared work on using large language models for risk assessments. Pallavi Vij and her team assessed large language models (LLMs) for their ability to provide osteoporosis care recommendations. These tools showed promise in risk stratification and referral decisions but lagged behind in treatment advice, reinforcing the need for clinical oversight and ongoing validation.
The researchers evaluated the performance of Claude (Anthropic) in analysing 50 clinician-developed fictional osteoporosis cases. The LLM was configured with the National Osteoporosis Guideline Group (NOGG) framework through prompt engineering for risk stratification and treatment recommendations.
Cases were independently assessed by two osteoporosis specialists (each with more than five years’ experience). Primary outcomes included accuracy of risk stratification (low, high and very high risk), treatment concordance and referral decision recommendations. Risk categories were defined as: low/high risk (manageable in primary care with lifestyle modifications or oral bisphosphonates) and very high risk (requiring specialist assessment).
Performance metrics included precision, recall and time efficiency measurements. The researchers additionally evaluated the system’s performance across varying case complexities, including challenging scenarios such as breast cancer patients on hormonal therapy.
The LLM demonstrated 90% accuracy in risk stratification compared to specialist assessment, with balanced precision and recall across risk categories (low/high risk: precision 0.92, recall 0.88; very high risk: precision 0.89, recall 0.91). Treatment recommendation concordance was 44% overall, improving to 60% for primary care cases, reflecting better alignment in less complex scenarios.
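Precision and recall, as reported here, are standard classification metrics derived from confusion-matrix counts. A minimal sketch (the counts below are invented for illustration and merely consistent with the reported figures, not taken from the study):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Illustrative counts only (the abstract reports metrics, not raw counts):
# 46 correctly flagged cases, 4 false alarms, 6 missed cases.
p, r = precision_recall(tp=46, fp=4, fn=6)
print(round(p, 2), round(r, 2))  # 0.92 0.88
```

High precision means few false alarms; high recall means few missed cases. For risk triage, recall on the very-high-risk category is usually the metric clinicians care about most, since a missed referral is costlier than an unnecessary one.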
Referral decision concordance reached 90%, suggesting reliable triage capabilities. Time efficiency analysis showed marked differences: specialists required 10-12 minutes per case (approximately 24 hours total including administrative tasks), while the LLM completed all assessments in seven minutes. Detailed analysis revealed consistent performance across varying case complexities, with particularly strong correlation in identifying high-risk cases requiring immediate intervention.
Performance varied in scenarios lacking specific NOGG guidelines, such as breast cancer patients on hormonal therapy, where treatment decisions relied heavily on clinical judgment.
“LLMs demonstrate promising utility for osteoporosis risk stratification and referral triage, potentially reducing administrative burden while maintaining clinical accuracy,” the authors concluded.
“However, lower concordance in treatment recommendations highlights the continuing necessity of clinical expertise for therapeutic decision-making. Further validation studies are needed to evaluate real-world implementation, workflow integration, and cost-effectiveness.”
AI on your smartphone – and at your side
Technology is also transforming how disease activity is tracked outside the clinic. In a collaborative study spanning Lausanne, Bari and Bern, Marco Capodiferro demonstrated how smartphone cameras and deep learning algorithms could assess hand movement in rheumatoid arthritis patients.
By analysing finger flexion captured on video, the AI system predicted disease activity levels with strong accuracy, offering a potential breakthrough in remote monitoring, the researchers said.
“Computer vision-based hand motion tracking can effectively distinguish RA disease activity states and correlates with clinical and patient-reported outcome measures,” they wrote.
“This technology offers a promising avenue for telemedicine and remote monitoring, complementing subjective assessments with objective metrics. Future research should investigate potential confounding factors, such as hand deformities or osteoarthritis, and explore the integration of hand motion tracking into comprehensive remote monitoring platforms for RA patients.”
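Measuring finger flexion from video typically reduces to computing joint angles between keypoints produced by a hand-tracking model. As a hedged sketch of that geometric step only (the landmark names are assumptions; the study’s actual pipeline is not described in this detail):

```python
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle in degrees at joint b formed by points a-b-c (e.g. MCP-PIP-DIP
    landmarks from a hand-tracking model). 180 degrees = fully extended."""
    ba = np.asarray(a, float) - np.asarray(b, float)
    bc = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

print(joint_angle((0, 0), (1, 0), (2, 0)))  # 180.0 – a straight finger
print(joint_angle((0, 0), (1, 0), (1, 1)))  # 90.0 – flexed to a right angle
```

Tracking how such angles change over a flexion-extension cycle yields objective range-of-motion curves, which a downstream classifier could then relate to disease activity scores.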
One of the most novel AI sessions came from Daan van Gorssel, who introduced a social robot designed to assist rheumatology patients with common questions and healthcare information.
The AI-powered robot, trained through natural language processing, was generally well received by both patients and clinicians. However, concerns were raised about adapting communication styles to varying education levels and maintaining the human touch in emotionally sensitive conversations.
“Both patients with rheumatic diseases and HCPs see the AI-reinforced Social Robot as a potentially valuable tool in a rheumatology outpatient clinical setting, particularly for providing information and supporting patient education,” the researchers concluded.
“However, both groups also highlighted the importance of maintaining human interaction and emphasised that the robot should complement, rather than replace, the role of healthcare professionals.
“While the robot’s information was generally viewed as accurate and relevant, there were concerns about its complexity and the need for human oversight.”
Across all the presentations, a common theme emerged – AI tools are poised to enhance but not replace clinical decision-making.
The general consensus was that while these tools offer speed, accuracy and scalability, their integration into real-world practice will require rigorous validation, ethical oversight and close collaboration with clinicians.
Annals of the Rheumatic Diseases, June 2025