Standardization and interpretable analysis of geological database using retrieval-augmented large language model
Published in Geodata and AI, 2025
Urban geological characterization requires standardizing heterogeneous borehole data, which are subject to interpretive variability across engineering practices. Current linguistic models struggle with geological datasets that are dynamic yet limited in size, and they lack transparent interpretation. This study develops a novel framework that incorporates large language models (LLMs) for intelligent standardization of formation types in urban geological databases. The Macao geological database, comprising 100 boreholes from 21 construction projects, has been established by engineering geologists. Input strategies, model uncertainty, and prediction states are analyzed to reveal performance-semantic relationships and to optimize expert-computer interaction. The key contributions are as follows: (1) an expert-validated geological database across Macao is first developed for benchmarking; (2) a domain-adapted retrieval-augmented LLM framework is proposed for geological standardization; (3) interpretability analysis is performed to link retrievals to model behaviors; (4) the interpretability results guide additional knowledge inputs to enhance performance. The study addresses critical gaps in making LLMs transparent, supporting reliable human-AI collaboration for advanced geotechnical applications.
