VU Researchers Develop a Highly Versatile, Innovative Tool for Spatial Data Analysis

Created: 22 July 2024

Researchers at Vilnius University (VU), PhD student Liudas Daumantas and Professor Andrej Spiridonov, have developed an innovative machine learning-based methodological approach, together with software that implements it, and presented both in the journal "Palaeontology". The approach emphasises flexibility: it can be applied to a wide range of scientific problems and to data of various types, structures, and origins, extending beyond its intended niche in palaeontology and biogeography.


"This approach and its software provide new perspectives and opportunities for bioregionalisation studies in palaeobiogeography. The machine learning-based methodological approach is called 'HespDiv', short for hierarchical spatial data subdivision. On the other hand, due to its versatility, the method can be applied in such diverse fields as epidemiology, criminology, and economics," explain VU researchers Prof. Spiridonov and Daumantas.


An Innovative Method for Understanding the Spatiotemporal Evolution of Life


The 'HespDiv' methodology is designed to determine the boundaries of bioregions by analysing spatial data on species composition and dividing these data into contiguous geographic regions (see Fig. 1). This tool is particularly useful in palaeobiogeography for studying the historical distribution of life on Earth. "With its help, we can understand how nature in the past caused entire groups of species to split into separate bioregions over potentially long time scales, where generations of these species had to coexist and co-evolve, and how these bioregions changed and interacted over geological time," says Prof. Spiridonov.


"The importance of the 'HespDiv' methodology in paleontological research lies in its ability to identify bioregions corresponding to units of the Bretskyan hierarchy. This new scheme of life hierarchy has received much attention, but until now, there was no method adapted to identify such eco-genealogical biota units," adds Prof. Spiridonov.


Main Principles and Advantages of the Methodology


"The main advantages of this methodology are flexibility and hierarchical structure. In short, 'HespDiv' works by repeated spatial subdivision of data from one study area into two increasingly smaller parts. Various forms of lines are used for subdivisions, which, following the selected criteria, best separate the data spatially. Such a process results in a neatly, hierarchically divided space corresponding to the desired data properties. Since hierarchical structures are very widespread in both natural and human systems, the ability to analyse various data origins in this way is very promising. We have strived to realise this possibility to the fullest by creating the most flexible software, allowing various data to be examined from different perspectives," says Daumantas.


According to the scientists, this research method also ensures the spatial contiguity of the identified territories and makes it easy to assess the reliability of the results. The spatial contiguity of the identified territories is important because geographical proximity makes everything more interconnected within a contiguous area. For example, communities with identical species compositions that are separated in space by a geographical barrier (e.g., mountains, rivers, seas) will eventually lose their similarity and "tread their own evolutionary pathways" because of their different genetic compositions, their connections with different local natural environments, and various geological or historical events. In other words, biological systems of spatially non-contiguous territories behave differently and, therefore, should be treated as different despite existing similarities. The same principle applies to human systems.


"Moreover, unlike many others, the 'HespDiv' method allows for assessing the reliability of the results. In paleobiogeography, the subjectivity of bioregional boundaries is a serious problem: different scientists favour different bioregionalization schemes, and different bioregionalization methods or their applications often lead to different results. The 'HespDiv' methodology stands out because it allows the evaluation of the statistical significance of each obtained subdivision and can show how bioregion boundaries vary depending on the method application (parameter values used) and data. These integrated automated methods for revealing the reliability and stability of results ensure the objectivity and ease of use of 'HespDiv'."

Figure 1: The main steps of the 'HespDiv' method: 1) generation and testing of straight subdivisions; 2) identification of the best straight subdivision; 3) preparation for non-linear subdivisions evaluation; 4) generation and testing of non-linear subdivisions; 5) identification of the best subdivision; 6) using the best subdivision to divide the study area into two regions and repeating steps 1–6 with the data of these regions.
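
As a rough illustration of the recursive idea in Fig. 1, the following Python sketch splits a set of occurrence records in two, keeps the best split, and repeats the procedure inside each part. It is not the 'HespDiv' software, which is an R package: the sketch tests only a fixed grid of straight lines (roughly steps 1, 2 and 6), skips the non-linear subdivisions entirely, and uses a simple Jaccard distance between taxon sets as a stand-in criterion. The names `Obs`, `candidate_lines`, `best_split` and `subdivide`, the example data, and all parameter values are hypothetical.

```python
import math
from collections import namedtuple

# One occurrence record: coordinates plus a taxon name (hypothetical structure).
Obs = namedtuple("Obs", "x y taxon")

def jaccard_distance(a, b):
    """Stand-in criterion: 1 - Jaccard similarity of the two taxon sets."""
    sa, sb = {o.taxon for o in a}, {o.taxon for o in b}
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0

def candidate_lines(obs, n_angles=8, n_offsets=9):
    """Yield straight lines (nx, ny, c); a point is 'left' if nx*x + ny*y < c."""
    for i in range(n_angles):
        theta = math.pi * i / n_angles
        nx, ny = math.cos(theta), math.sin(theta)
        proj = [nx * o.x + ny * o.y for o in obs]
        lo, hi = min(proj), max(proj)
        for j in range(1, n_offsets + 1):
            yield nx, ny, lo + (hi - lo) * j / (n_offsets + 1)

def best_split(obs, criterion, min_size):
    """Return (score, left, right) for the best-scoring line, or None."""
    best = None
    for nx, ny, c in candidate_lines(obs):
        left = [o for o in obs if nx * o.x + ny * o.y < c]
        right = [o for o in obs if nx * o.x + ny * o.y >= c]
        if len(left) < min_size or len(right) < min_size:
            continue  # skip splits that leave too few records on one side
        score = criterion(left, right)
        if best is None or score > best[0]:
            best = (score, left, right)
    return best

def subdivide(obs, criterion=jaccard_distance, min_size=3, min_score=0.5, rank=1):
    """Recursively split the records; return a nested dict describing the hierarchy."""
    split = best_split(obs, criterion, min_size)
    if split is None or split[0] < min_score:
        return {"rank": rank, "n": len(obs), "children": []}
    score, left, right = split
    children = [subdivide(part, criterion, min_size, min_score, rank + 1)
                for part in (left, right)]
    return {"rank": rank, "n": len(obs), "score": score, "children": children}

# Toy data: two western taxa and two eastern taxa that never co-occur.
west = [Obs(x, y, t) for x in (0, 1, 2) for y in (0, 1) for t in ("A", "B")]
east = [Obs(x, y, t) for x in (8, 9, 10) for y in (0, 1) for t in ("C", "D")]
tree = subdivide(west + east)
print(tree["score"], [child["n"] for child in tree["children"]])
```

In this toy case the first subdivision separates the western records from the eastern ones, and neither half is split further because the taxa inside each half are mixed evenly.
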

To Be Applied by Epidemiologists, Criminologists, and Economists


"We can apply the 'HespDiv' method wherever we have more abundant spatial data, whose properties vary depending on the location, and there is a scientific or practical interest in understanding this spatial variation in properties. Thus, although the method was primarily developed to find objective boundaries between regions with the most different fossil compositions (relevant in paleobiogeography and, overall, in understanding past life), it can be perfectly applied in fields as diverse as epidemiology, criminology, or economics. For example, having spatial data on crimes and related socio-economic, demographic, and infrastructure indicators, using the developed methodology, it would be possible to reveal the boundaries between city parts where different factors determine crime rates," lists L. Daumantas.


According to the scientists, such an approach can help reveal that, for example, in some areas, crime is most related to the seclusion of the area, in others – to underdeveloped infrastructure (e.g., poor lighting, lack of cameras) and elsewhere – to gatherings of people. Thus, the method would help find the most effective ways to reduce crime rates, taking the area's specificity into account. Similarly, the method could be applied to examine the spatial statistics of diseases in epidemiology and the economic indicators (e.g., profitability of establishments) in economics.


Application of the 'HespDiv' Methodology in Paleobiogeography


The capabilities of 'HespDiv' have already been demonstrated using data on Miocene mammal distribution in the United States. The methodology revealed a hierarchical system of bioregions, with the three main bioregions shaped by geographical dispersal barriers, the spatial variation of natural conditions, and the stability of this variation over time (see Fig. 2). The strongest subdivisions are associated with topographically distinctive areas of the Miocene USA (such as the Rocky Mountains and the Basin and Range Province), active volcanoes in the Northwest, and the coastal climate and biota influenced by the oceans.

Figure 2: The hierarchical system of bioregions identified using the 'HespDiv' methodology in the Miocene mammal distribution data in the United States. The labels of the subdivisions indicate the index number of the subdivision, the hierarchical rank of the subdivision (Roman numeral), the strength of the subdivision (Morisita-Horn similarity index value between the separated regions' data; the smaller, the stronger the subdivision), the statistical significance of the subdivision (asterisk). The most significant bioregions were distinguished by the first and fourth subdivisions (West Coast – Northwest, Central Plains, and Southeast territory).
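
To make the labels in Fig. 2 more concrete, the Python sketch below computes the Morisita-Horn similarity index in its standard textbook form for two lists of taxon occurrences; as in the figure, a smaller value means a stronger subdivision. The article does not describe how 'HespDiv' evaluates statistical significance, so the label-shuffling permutation test shown here is only a generic stand-in, and the function names, taxon names, and parameter values are hypothetical.

```python
import random
from collections import Counter

def morisita_horn(taxa_a, taxa_b):
    """Morisita-Horn similarity of two occurrence lists (0 = no shared taxa, 1 = same proportions)."""
    counts_a, counts_b = Counter(taxa_a), Counter(taxa_b)
    total_a, total_b = len(taxa_a), len(taxa_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each region needs at least one occurrence")
    # Sum of products of abundances over the taxa present in region A.
    cross = sum(n * counts_b[t] for t, n in counts_a.items())
    d_a = sum(n * n for n in counts_a.values()) / total_a ** 2
    d_b = sum(n * n for n in counts_b.values()) / total_b ** 2
    return 2 * cross / ((d_a + d_b) * total_a * total_b)

def permutation_p_value(taxa_a, taxa_b, n_perm=199, seed=0):
    """Share of random reassignments that are at least as dissimilar as the observed split.

    Assumption: a plain label-shuffling test, not necessarily what 'HespDiv' does.
    """
    rng = random.Random(seed)
    observed = morisita_horn(taxa_a, taxa_b)
    pooled = list(taxa_a) + list(taxa_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = morisita_horn(pooled[:len(taxa_a)], pooled[len(taxa_a):])
        if shuffled <= observed:  # smaller similarity = stronger subdivision
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Hypothetical occurrence lists for two regions separated by one subdivision.
west = ["horse"] * 6 + ["camel"] * 6
east = ["rhino"] * 6 + ["beaver"] * 6

print(morisita_horn(west, east))        # 0.0: the regions share no taxa
print(morisita_horn(west, west))        # 1.0: identical composition
print(permutation_p_value(west, east))  # small p-value: a strong, unlikely-by-chance split
```

Because the index compares proportions rather than raw counts, two regions with the same relative abundances score 1 even if one of them has been sampled far more heavily.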


According to the researchers, objectively determined bioregion boundaries and their hierarchical structures are very important for various paleontological studies, as they provide fundamental contextual information on the spatial structure of biological diversity. The bioregions identified by this methodology can be used as territorial units for sampling in comparative spatial studies.


"Providing a tool that integrates spatial contiguity and hierarchical structure, 'HespDiv' helps scientists explore the complex interplay between ecological dynamics and evolutionary history. Thus, 'HespDiv' is a significant step forward in biogeography, offering scientists a powerful tool to reveal the spatial structures and their evolution in geological time. The developed software is open-source and freely available to all interested users. It can be downloaded and installed in the R software environment," say VU researchers.


The research was funded under the project S-MIP-21-9 ‘Major Transitions in Macroevolution – The Role of Spatial Structuring.’