
This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The movement of atmospheric air masses can be seen as a continuous flow of gases and particles hovering over our planet, and it can be locally simplified by means of three-dimensional trajectories. These trajectories can hence be seen as a way of connecting distant areas of the globe during a given period of time. In this paper we present a mathematical formalism to construct spatial and spatiotemporal networks where the nodes represent the subsets of a partition of a geographical area and the links between them are inferred from sampled trajectories of air masses passing over and across them. We propose different estimators of the intensity of the links, relying on different bio-physical hypotheses and covering adjustable time periods. This construction leads to a new definition of spatiotemporal networks characterized by adjacency matrices giving, e.g., the probability of connection between distant areas during a chosen period of time. We applied our methodology to characterize tropospheric connectivity in two real geographical contexts: the watersheds of the French region Provence-Alpes-Côte d’Azur and the coastline of the Mediterranean Sea. The analysis of the constructed networks revealed a marked seasonal pattern in air mass movements in the two study areas. Although our methodology is applied here to samples of air-mass trajectories, with potential implications in aerobiology and plant epidemiology, it could also be applied to other types of trajectories, such as animal trajectories, to characterize connectivity between different components of the landscape hosting the animals.

Atmospheric air masses are volumes of air with a defined temperature and water vapor content that have long been known to rule fundamental atmospheric phenomena like weather and air currents. They are composed mostly of inert gases, but both organic and inorganic particles have been found to linger in high-altitude air as a consequence of the constant interaction of air masses with the earth’s surface below them. A non-exhaustive list includes gases and minerals like wildfire smoke, radioactive material, dust, sand, volcanic ash and sea salt, but also living organisms such as pollen, fungal spores, bacteria, viruses and small insects. Despite the relatively sparse density of these particles with respect to the volume of an air mass, their presence and transportation across the planet have proven to have strong effects on many phenomena impacting human health and safety (pollen [

The rise in the number of publications on these subjects suggests a growing interest of the scientific community in the effects of air-mass movements on the biosphere, which has surely been boosted by recently available developments, such as the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT [

The vast majority of studies have focused on isolated events, such as dust storms or peaks of air pollutants, that are rather concentrated in time (from a few hours to a few weeks) and/or space (just a few locations such as cities). Nonetheless, the movement of air masses is expected to have impacts on a broader spatiotemporal scale, as reviewed in recent studies [

From a generic statistical point of view, we aim to (i) estimate the weighted and directed edges of a graph using a sample of trajectories of

In the following sections, we first introduce the definitions and properties that will allow us to describe and then estimate connections between points/areas in space via spatiotemporal trajectories. Then, we propose several types of measures to model diverse types of connections. The expected output consists of a spatiotemporal graph describing the network of links induced by trajectories. It is worth noting that our approach is meant to infer connectivity induced by air-mass movements and is readily applicable to HYSPLIT-type data, but we have maintained a sufficient level of generality for it to be applied to other phenomena, provided that trajectory data are available (e.g., animal trajectories). Finally, we apply our method to two case studies concerning the coastline of the Mediterranean Sea and the French region of Provence-Alpes-Côte d’Azur. The two case studies have different spatiotemporal granularities and are used to provide examples of application of the proposed methodology.

In this section we show how a set of trajectories evolving within space during a finite time interval can be used to construct pertinent spatiotemporal networks. We first recall some basic definitions related to networks (

Network theory (a.k.a. graph theory) is a mathematical formalism introduced by Leonhard Euler to describe the famous Königsberg bridge problem [_{ij}, is non-zero an edge exists between

In this paper, a network is said to be

Types of temporal networks: The time of activation is indicated within the grey bar next to the edges (ranging between 0 and 8). For contact networks

We consider a function

In general, a flow is defined with respect to a possibly time-dependent vector field

Definition 2.1. The

Example 2.1. The notions of flow and trajectory segment can be adapted to cope with air mass trajectories over the Earth surface. In this case, the spatial domain

Example 2.2. Animal movements and behaviour activities can also be represented with the notions of flows and trajectory segments, providing, for instance, the animal locations and the covariate value indicating whether animals are feeding or not. In this case,

Trajectory-based networks are grounded on the notion of

Definition 2.2. Let

Diverse types of the pointwise connectivity can be constructed, either using trajectory segments generated by

Illustration of contact-based, duration-based and length-based pointwise connectivities:

Example 2.3. The contact-based pointwise connectivity is defined by:

Remark 1. This example, based on the simple contact between sets, can be considered too strict from a statistical and measure-theoretic perspective, since the length or the duration of a contact may be null. Instead, a positivity constraint on the contact length, for example, can be used to define another version of the contact-based pointwise connectivity:

Example 2.4. The duration-based pointwise connectivity is defined by:

Example 2.5. The length-based pointwise connectivity is defined by:

Example 2.6. The pointwise connectivity based on local volume is defined by:
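As a rough numerical illustration of the contact-based, duration-based and length-based pointwise connectivities (Examples 2.3 to 2.5), the following Python sketch evaluates the three quantities for a single discretized trajectory and a circular target set. It rests on assumptions that are not part of the formalism above: the trajectory is a time-ordered list of hypothetical `(t, x, y)` samples in planar coordinates, the target set is a disc, and a segment contributes to the duration and length only when both of its end points lie in the disc. The names `inside` and `pointwise_connectivities` are introduced here for illustration only.

```python
import math


def inside(x, y, center, radius):
    """Return True if the point (x, y) lies in the disc (center, radius)."""
    return math.hypot(x - center[0], y - center[1]) <= radius


def pointwise_connectivities(traj, center, radius):
    """Contact, duration and length of one sampled trajectory within a disc.

    traj is a time-ordered list of (t, x, y) samples (assumed format).
    """
    flags = [inside(x, y, center, radius) for _, x, y in traj]
    # Contact-based: 1 if at least one sampled point falls in the disc.
    contact = 1 if any(flags) else 0
    duration = 0.0
    length = 0.0
    for k in range(len(traj) - 1):
        # Assumption: count a segment only if both end points are inside.
        if flags[k] and flags[k + 1]:
            t0, x0, y0 = traj[k]
            t1, x1, y1 = traj[k + 1]
            duration += abs(t1 - t0)  # abs() also covers backward trajectories
            length += math.hypot(x1 - x0, y1 - y0)
    return contact, duration, length


# Toy trajectory moving along the x-axis with unit time steps.
traj = [(0, 0.0, 0.0), (1, 1.0, 0.0), (2, 2.0, 0.0), (3, 3.0, 0.0)]
print(pointwise_connectivities(traj, center=(1.5, 0.0), radius=0.6))
# prints (1, 1.0, 1.0)
```

With finer time steps, the duration and length computed in this way approach the time spent and the distance travelled inside the target set; with coarse steps, short contacts may be missed entirely.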

More sophisticated specifications of the pointwise connectivity can be proposed by incorporating spatio-temporal covariates in its formulation, like in the following examples.

Example 2.7. Let

Example 2.8. Let

Illustration of pointwise connectivity based on a covariate measured along the trajectory: (see Example 2.8). In this illustration, the passage of the particle in the elliptic spatial domain

Remark 2. If in Example 2.8, the altitude of the air mass is incorporated as the third coordinate of

Remark 3. Example 2.8 could be generalized by considering a measure, say μ, over

Each pointwise connectivity defined above can be used for defining the integrated connectivity, which measures the quantitative directional link between two subsets

Definition 2.3. Let

Definition 2.3 encompasses connectivities generated by either forward or backward trajectories, depending on the sign of δ. The use of a unique duration

The measure ν in Definition 2.3 can be continuous, discrete or hybrid over

Definition 2.4. A

Definition 2.4 corresponds to a spatial trajectory-based network evaluated over the fixed temporal domain

In practice, the integral defining the integrated connectivities between subsets of

If ν is constant, other classical numerical approaches can be applied to approximate the integral, such as a hybrid approach in which the mid-point rule is applied in time and a regular point process is used in space. In such a case, the integrated connectivity estimator is also given by

Example 3.1. Using

Example 3.2. Using

Example 3.3. Using

In this section, we apply our general framework to the flow of air mass movements. Indeed, these movements, compiled over years, have been used to characterize climatic patterns [

The first study region corresponds to the coast of the Mediterranean Sea, ranging approximately 1,600 km from north to south and 3,860 km from east to west. The temperate climate of the chosen region is strongly influenced by the presence of the Mediterranean Sea, with mild winters, hot summers and relatively scarce precipitation events. The landscape is characterized by coastal vegetation, typically shrubs and pines, and densely populated areas with intensive crop production of wheat, barley, vegetables and fruits, especially olives, grapes and citrus. In this paper, we characterize recurrent movements of air masses through the Mediterranean region by defining a grid with mesh size 74 km covering the coastline from 5 up to 250 km inland from the coast, including the four largest islands (namely Sicily, Sardinia, Cyprus and Corsica). Thus, we divide the region into 604 cells, where the centroids of the cells are used as arrival locations of air-mass trajectories and correspond to the nodes of the constructed network.

The second study region corresponds to the French region of Provence-Alpes-Côte d’Azur (PACA, hereafter), located in the south-eastern part of France and characterized by a rather complex landscape formed by a densely populated coastline, agricultural lands (high-value crops with fruit and olive orchards, vineyards, vegetable cultivation and horticulture), and natural, mostly alpine areas. The choice of this particular region is justified in the context of a research project aimed at assessing the potential long-distance dissemination of phytopathogenic bacterial populations that are known to be transported by air currents. The bacteria of interest (e.g.,

Once the arrival points for the two study regions had been established, we turned to the computation of air-mass trajectories arriving at the prescribed locations using the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT [

The final step for the construction of the networks is the estimation of the adjacency matrices of the networks, based on the methodology presented in the previous sections. To do that, for each pair of subsets of the spatial domain, we used the daily 48 h backward trajectories arriving at the locations sampled within the receptor subset, and computed the contact-based estimator (see Example 3.1). The subsets of the spatial domain are the watersheds for PACA and circular buffers of radius 20 km for the Mediterranean region, as in [
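The estimation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the case studies: each backward trajectory is reduced to a list of hypothetical planar `(x, y)` points, every subset is a circular buffer given as `(center, radius)`, the contact-based estimate for a pair of subsets is taken to be the proportion of trajectories arriving in the receptor that touch the other subset, and the edge is oriented from the crossed subset to the receptor. The function names `touches` and `contact_adjacency` are introduced here for illustration only.

```python
import math


def touches(traj, center, radius):
    """Return True if any sampled point of traj lies in the circular buffer."""
    return any(math.hypot(x - center[0], y - center[1]) <= radius
               for x, y in traj)


def contact_adjacency(trajectories, buffers):
    """Estimate a contact-based adjacency matrix from backward trajectories.

    trajectories: dict mapping a receptor index j to the list of backward
        trajectories arriving in subset j (each a list of (x, y) points).
    buffers: list of (center, radius) pairs, one per subset (assumed discs).
    Returns A with A[i][j] = share of trajectories arriving in j that
    touched subset i (assumed orientation: from i to j).
    """
    n = len(buffers)
    A = [[0.0] * n for _ in range(n)]
    for j, trajs in trajectories.items():
        if not trajs:
            continue  # no sampled trajectory for this receptor
        for i, (center, radius) in enumerate(buffers):
            if i == j:
                continue  # assumption: self-connections are left at zero
            hits = sum(touches(tr, center, radius) for tr in trajs)
            A[i][j] = hits / len(trajs)
    return A


# Two toy subsets and three toy backward trajectories.
buffers = [((0.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]
trajectories = {
    0: [[(0.0, 0.0), (0.0, 5.0)]],
    1: [[(10.0, 0.0), (5.0, 0.0), (0.0, 0.0)],
        [(10.0, 0.0), (10.0, 5.0)]],
}
print(contact_adjacency(trajectories, buffers))
# prints [[0.0, 0.5], [0.0, 0.0]]
```

In this toy example, one of the two trajectories arriving in the second subset passed over the first one, which gives a weight of 0.5 for the corresponding directed edge.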

In this work we will consider networks corresponding to three temporal contexts: (i) the spatial networks obtained when _{1} encompasses the year 2011, _{2} encompasses 2012 and so on, and (iii) monthly spatiotemporal networks formed by the twelve spatial networks obtained when _{1} represents every January from 2011 to 2017, _{2} every February from 2011 to 2017, and so on. In all these cases, we consider that the length of the time interval was one to easily compare the inferred networks (i.e.,

The networks we constructed are directed and weighted by contact-based connectivities generated by air mass trajectories and estimated with

• Density (Dens) is computed as the ratio between the sum of all edge weights and the number of all possible edges [

• Transitivity (Trans), or clustering, is computed by averaging the weighted clustering coefficient proposed by [

• Strength correlation (SC) is measured as the correlation between the incoming and outgoing strengths, computed as the sum of the weights of the edges pointing to or from a given node, respectively. Networks with positive (resp. negative) strength correlation are known to foster (resp. hamper) epidemic spread [
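Two of the indices listed above, density and strength correlation, can be computed directly from a weighted adjacency matrix, as in the Python sketch below. The sketch makes assumptions that go beyond the text: the matrix `A` is a list of lists with `A[i][j]` the weight of the edge from node i to node j, self-loops are ignored, the number of possible edges is taken to be n(n − 1), and the correlation is the Pearson coefficient. Transitivity is not included because it depends on the specific weighted clustering coefficient adopted. All function names are introduced here for illustration only.

```python
import math


def density(A):
    """Sum of edge weights divided by the number of possible directed edges."""
    n = len(A)
    total = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def strengths(A):
    """Return (incoming, outgoing) strengths of every node."""
    n = len(A)
    out_s = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    in_s = [sum(A[i][j] for i in range(n) if i != j) for j in range(n)]
    return in_s, out_s


def pearson(u, v):
    """Pearson correlation; returns nan if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return math.nan
    return cov / math.sqrt(var_u * var_v)


def strength_correlation(A):
    """Correlation between incoming and outgoing node strengths."""
    in_s, out_s = strengths(A)
    return pearson(in_s, out_s)


# Toy network in which node 0 only emits and nodes 1 and 2 only receive.
A = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
print(density(A), strength_correlation(A))
```

In this toy network the nodes that send do not receive, so the strength correlation is strongly negative, the situation described above as hampering epidemic spread.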

Other measures that are usually adopted to characterize network topology quantify the length or cost associated with moving between nodes along the edges of the network. Under the current framework, the weight computed between two nodes is proportional to the number of air-mass trajectories that connect them. Hence, higher values of the edge weight are associated with a higher connectivity between nodes. This is nonetheless incompatible with existing network search algorithms used to identify the shortest path between nodes, since they usually treat the weight of an edge as a kind of distance or cost: the higher the weight, the less likely the connection between the nodes (e.g., Dijkstra’s algorithm for weighted directed networks). On the other hand, it suffices to transform our weights into effective distances

• the average shortest path between every pair of nodes, computed as the average shortest effective distance (ED);

• the average number of nodes in the shortest paths (ASP);

• the network diameter (Diam), computed as the maximum effective distance of a network, and the number of nodes that have to be crossed in this longest path;

• the small worldness (SW) property, computed as the ratio between the clustering and the average shortest path distance [
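The path-based indices listed above can be sketched in Python once the connectivity weights have been turned into effective distances. The transformation used below, the reciprocal of the weight, is an assumption chosen only because it decreases with the weight; it is not necessarily the one adopted in this paper. The sketch also assumes that path length is counted in edges and that pairs of nodes with no connecting path are excluded from the averages. Small worldness is omitted because it requires the clustering coefficient. All function names are introduced here for illustration only.

```python
import heapq
import math


def effective_distances(A):
    """Assumed transform: distance 1/w for each positive weight w."""
    n = len(A)
    return [[(1.0 / A[i][j] if i != j and A[i][j] > 0 else None)
             for j in range(n)] for i in range(n)]


def dijkstra(D, source):
    """Shortest effective distances and edge counts from source."""
    n = len(D)
    dist = [math.inf] * n
    hops = [0] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v in range(n):
            w = D[u][v]
            if w is None:
                continue  # no edge from u to v
            if d + w < dist[v]:
                dist[v] = d + w
                hops[v] = hops[u] + 1
                heapq.heappush(heap, (dist[v], v))
    return dist, hops


def path_indices(A):
    """Average effective distance, average path length and diameter."""
    D = effective_distances(A)
    n = len(A)
    pairs = []  # (effective distance, number of edges) per reachable pair
    for s in range(n):
        dist, hops = dijkstra(D, s)
        pairs.extend((dist[t], hops[t]) for t in range(n)
                     if t != s and dist[t] < math.inf)
    if not pairs:
        return None  # no pair of nodes is connected
    longest = max(pairs)
    return {
        "ED": sum(p[0] for p in pairs) / len(pairs),
        "ASP": sum(p[1] for p in pairs) / len(pairs),
        "Diam": longest[0],
        "Diam_edges": longest[1],
    }


# Toy chain 0 -> 1 -> 2 with weights 0.5 and 0.25.
print(path_indices([[0, 0.5, 0], [0, 0, 0.25], [0, 0, 0]]))
```

For the toy chain, the effective distances of the two edges are 2 and 4, so the longest shortest path (from node 0 to node 2) has effective distance 6 and crosses two edges.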

The two spatial trajectory-based networks representing the strength of tropospheric connections in the Mediterranean region and PACA during the entire period 2012 to 2017 are represented in

Networks weighted by contact-based connectivities generated by air mass trajectories: The connectivities are generated by air mass trajectories between ^{2}) of each watershed in

Interestingly, we can observe that the indices provided in

Network indices [Diameter (Diam), density (Dens), transitivity (Trans), average shortest path (ASP), effective distance (ED), small worldness (SW), strength correlation (SC)] calculated from the networks covering the Mediterranean region and estimated in three temporal contexts: the entire period 2012–2017, yearly time periods from 2012 to 2017 and monthly time periods.

Mediterranean region | Diam | Dens (×10^{−3}) | Trans | ASP | ED | SW | SC |
---|---|---|---|---|---|---|---|
2012–2017 | 7 | 0.30 | 0.74 | 3.11 | 14.13 | 2.34 | 0.05 |
2012 | 8 | 11.0 | 0.78 | 3.66 | 14.06 | 2.35 | −0.04 |
2013 | 8 | 11.4 | 0.77 | 3.50 | 13.91 | 2.35 | 0.04 |
2014 | 16 | 11.6 | 0.78 | 4.18 | 17.90 | 2.25 | 0.06 |
2015 | 13 | 11.1 | 0.78 | 3.80 | 15.57 | 2.34 | 0.08 |
2016 | 9 | 14.0 | 0.77 | 3.61 | 14.65 | 2.28 | 0.08 |
2017 | 9 | 11.0 | 0.77 | 3.51 | 13.99 | 2.37 | 0.02 |
January | 11 | 0.31 | 0.74 | 3.74 | 12.44 | 5.05 | −0.11 |
February | 22 | 0.30 | 0.74 | 4.16 | 21.67 | 5.65 | −0.12 |
March | 12 | 0.30 | 0.74 | 4.14 | 15.34 | 5.60 | −0.03 |
April | 13 | 0.31 | 0.75 | 4.16 | 16.11 | 5.53 | −0.01 |
May | 11 | 0.32 | 0.75 | 4.13 | 14.71 | 5.50 | −0.18 |
June | 14 | 0.32 | 0.73 | 4.69 | 18.10 | 6.46 | −0.10 |
July | 15 | 0.31 | 0.71 | 4.54 | 16.80 | 6.36 | −0.20 |
August | 12 | 0.29 | 0.72 | 4.50 | 16.87 | 6.23 | −0.18 |
September | 11 | 0.29 | 0.73 | 4.50 | 16.72 | 6.18 | −0.06 |
October | 19 | 0.28 | 0.74 | 5.14 | 22.42 | 6.94 | −0.15 |
November | 18 | 0.27 | 0.74 | 4.98 | 21.58 | 6.78 | −0.05 |
December | 12 | 0.30 | 0.72 | 3.62 | 10.08 | 4.99 | −0.13 |

Network indices [diameter (Diam), density (Dens), transitivity (Trans), average shortest path (ASP), effective distance (ED), small worldness (SW), strength correlation (SC)] calculated from the networks covering PACA and estimated in three temporal contexts: the entire period 2012–2017, yearly time periods from 2012 to 2017 and monthly time periods.

PACA | Diam | Dens (×10^{−3}) | Trans | ASP | ED | SW | SC |
---|---|---|---|---|---|---|---|
2012–2017 | 5 | 2.51 | 0.99 | 2.47 | 4.78 | 0.11 | 0.22 |
2012 | 5 | 0.99 | 0.91 | 2.6 | 4.73 | 2.85 | 0.13 |
2013 | 5 | 1.01 | 0.92 | 2.6 | 4.57 | 2.83 | −0.07 |
2014 | 8 | 1.00 | 0.92 | 2.86 | 4.58 | 3.12 | 0.004 |
2015 | 7 | 1.02 | 0.92 | 2.67 | 4.60 | 2.90 | −0.01 |
2016 | 8 | 1.01 | 0.91 | 2.81 | 4.58 | 3.09 | 0.02 |
2017 | 5 | 1.01 | 0.92 | 2.54 | 4.66 | 2.76 | 0.11 |
January | 19 | 2.35 | 0.77 | 4.09 | 5.90 | 5.30 | −0.54 |
February | 19 | 2.16 | 0.81 | 4.60 | 6.29 | 5.71 | −0.50 |
March | 27 | 2.30 | 0.78 | 6.05 | 5.13 | 7.75 | −0.52 |
April | 15 | 2.40 | 0.97 | 3.54 | 4.08 | 3.64 | −0.59 |
May | 22 | 2.67 | 0.87 | 5.66 | 4.62 | 6.49 | −0.64 |
June | 17 | 2.55 | 0.79 | 4.93 | 6.48 | 6.22 | −0.61 |
July | 18 | 2.63 | 0.80 | 4.82 | 6.53 | 6.06 | −0.62 |
August | 21 | 2.63 | 0.80 | 4.91 | 6.32 | 6.15 | −0.62 |
September | 18 | 2.64 | 0.74 | 4.54 | 5.05 | 6.10 | −0.64 |
October | 6 | 2.69 | 0.99 | 2.37 | 4.07 | 2.40 | −0.64 |
November | 8 | 2.33 | 0.92 | 2.79 | 4.58 | 2.94 | −0.49 |
December | 18 | 2.34 | 0.80 | 5.90 | 5.74 | 7.36 | −0.56 |

Clustering according to the indices calculated over the Mediterranean spatio-temporal network:

Boxplot for the computed indices over the Mediterranean and PACA regions: Boxplot for the different indices (Diameter, density, transitivity, average shortest path, small worldness, effective distance, strength correlation) obtained from

For the PACA region, the dendrogram in

Clustering according to the indices calculated over the PACA spatio-temporal network:

We presented a framework for estimating and characterizing spatial and spatio-temporal networks generated by trajectory data. The development of this framework was motivated by the study of networks resulting from the movement of air masses sampled over long time periods and large spatial scales. Thus, in the application, we investigated the tropospheric connectivities across the Mediterranean basin and the French region PACA, and their variations through years and months. Our approach could be applied to diverse phenomena from which trajectories can be observed. For instance, one could estimate networks generated by the movement of animals at the landscape scale based on animal trajectories observed with GPS devices [

In

In statistics, we are interested not only in point estimation but also in the assessment of estimation uncertainties. In this paper, however, we focused on connectivity estimation, even though quantifying the estimation variance would have been useful for investigating temporal variation in connectivities more rigorously. Formally, the connectivity measures that we defined are integrals. Hence, results on numerical integration (e.g., midpoint, trapezoidal or Monte Carlo integration) can be exploited to assess errors or variances of the connectivity estimates [
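As a simple illustration of how a Monte Carlo argument could attach an uncertainty to a connectivity estimate, the Python sketch below treats the estimate as the mean of pointwise connectivities evaluated on sampled trajectories and reports the usual Monte Carlo standard error. It assumes that the sampled trajectories are independent and identically distributed, which is a strong assumption for daily air-mass trajectories that are likely autocorrelated, so the standard error obtained this way should be read as optimistic. The function name `mc_estimate` is introduced here for illustration only.

```python
import math


def mc_estimate(values):
    """Monte Carlo mean and standard error of pointwise connectivities.

    values: one pointwise connectivity per sampled trajectory, e.g. 0/1
    contact indicators or contact durations (assumed independent draws).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two sampled trajectories are required")
    mean = sum(values) / n
    # Unbiased sample variance of the pointwise connectivities.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Standard error of the mean under the independence assumption.
    return mean, math.sqrt(var / n)


# Four toy contact indicators: three trajectories out of four made contact.
mean, se = mc_estimate([1, 0, 1, 1])
print(mean, se)
# prints 0.75 0.25
```

The standard error shrinks at the rate of one over the square root of the number of sampled trajectories, which gives a rough idea of how many trajectories are needed to compare connectivities between time periods.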

To estimate connectivity and its uncertainty more finely, one could also take into account, if relevant, the uncertainty about the trajectories themselves. For example, when observed trajectories are smoothed versions of actual trajectories (as is likely the case for air-mass trajectories calculated with HYSPLIT) or when the trajectories are partially observed and rather erratic, (i) a probabilistic model grounded on, for instance, a stochastic differential equation could be used to reconstruct probable trajectories and (ii) the connectivity would be estimated from these reconstructed trajectories. Step (ii) should incorporate the uncertainty about the trajectory reconstruction, including the uncertainty arising from a possible preliminary step in which the parameters of the above-mentioned probabilistic model are estimated.

Concerning the application treated in this article, we observed distinct seasonal (intra-annual) patterns in the temporal variation of the networks covering the Mediterranean coastline and PACA, whereas no evidence of inter-annual variation was observed. In the case of the Mediterranean coastline, the networks corresponding to the three clusters shown in

In the long-term context of our applied research projects connected to aerobiology, the construction and exploration of networks generated by air-mass movements are a way to unravel epidemiological dynamics (and the resulting genetic patterns) of microbial pathogens disseminated at long distance via air movements in the troposphere (see [

Finally, the networks estimated using our approach could be a basis for developing epidemiological models (explicitly handling the pathogen) incorporating long-distance dissemination conditional on recurrent air-mass movements. Such models could be exploited to set up surveillance strategies for early warning and epidemic anticipation in order to help reduce the impacts of airborne pathogens on human health, agricultural production and ecosystem functioning [

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

MC, DM, CEM, RS and SS contributed equally to this work. SS, RS and MC developed the theory; MC, DM, CEM and SS performed the applications. MC, DM and SS wrote the manuscript; RS and CEM reviewed it. All authors read and approved the final manuscript.

This research was funded by the SPREE project from the French National Research Agency (Grant No. ANR–17–

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The authors thank Loïc Houde for his technical assistance in the calculation of trajectories with HYSPLIT. This article has been released as a preprint on arXiv [

The Supplementary Material for this article can be found online at: