abstract
- Many clinical studies depend heavily on the efficient identification of relevant datasets. Candidate datasets can be selected from existing health data catalogues by searching their metadata, and question-answering interfaces can streamline this search by helping researchers explore the available data. However, the lack of metadata harmonisation across catalogues creates bottlenecks when searching several of them. This paper presents a methodology that enables semantic search over multiple biomedical database catalogues by extracting their information using shared domain knowledge. The resulting pipeline allows the converted data to be published as FAIR endpoints and provides an end-user interface that accepts natural language questions.
- This work has received support from the EU/EFPIA Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 806968. AP and JRA are funded by the FCT - Foundation for Science and Technology (national funds) under the grants PD/BD/142877/2018 and SFRH/BD/147837/2019 respectively.