Rockfall Detection from Terrestrial Lidar Point Clouds: a Clustering Approach Using R

In this study we analyzed a series of terrestrial LiDAR point clouds acquired over a cliff in Puigcercos (Catalonia, Spain). The objective was to detect and extract individual rockfall events that occurred during a time span of six months and to investigate their spatial distribution. To this end local and global cluster algorithms were applied. First we used the nearest neighbor clutter removal (NNCR) method in combination with the expectation-maximization (EM) algorithm to separate feature points from clutter; then a density based algorithm (DBSCAN) allowed us to isolate the single cluster features which represented the rockfall events. Finally we estimated the Ripley's K-function to analyze the global spatial pattern of the identified rockfalls. The computations for the cluster analyses were carried out using R free software for statistical computing and graphics. The local cluster analysis allowed a proper identification and characterization of more than 600 rockfalls. The global spatial pattern analysis showed that these rockfalls were clustered and provided the range of distances at which these events tend to be aggregated.


Introduction
Natural hazards such as earthquakes, volcanoes, floods, or landslides pose a serious threat to peoples and to infrastructure.Their detection and spatiotemporal analysis with the goal of forecasting future events, constitutes a great challenge in term of socioeconomic benefits.During the last decade the use of new satellites and the development of new aerial and terrestrial sensors for landslide recognition, characterization, and monitoring clearly improved our understanding of landslide phenomena.In this domain, the application of light detection and ranging (LiDAR) techniques to landslide studies has grown exponentially [9,12].This sensor allows detecting and modeling subtle topographic features over a surface through the detection of the distance from the sensor to a given target.When mounted over a ground-based platform, the so-called terrestrial laser scanner (TLS) is able to acquire high resolution 3D information of the terrain (e.g., over 250 points per square meter) which results in a more or less homogeneously distributed point cloud.Innovative applications include landslide monitoring [17,22,28], automatic discontinuity extraction [11,16], vegetation filtering [32], etc.Recent trends include the development of efficient tools and methods able to extract features from point clouds [5,9,14,31] which, despite the great technological advances, is not an easy task.
In the present study we analyzed terrestrial LiDAR point clouds in order to automatically detect and extract rockfalls, which are a type of mass movement involving the detachment of blocks from a cliff.We selected a cliff in Puigcercos (Spain) as pilot study area.This cliff is affected by a high number of rockfalls per year [1] and so it provides an especially suitable site for the scope of the present research.Once individually identified, rockfalls could be analyzed to make inferences about their spatial distribution, which implies the dispersion or attraction between events.It is well known that landslides, as with many other geological events, are usually not randomly distributed but grouped in clusters both over space and over time [13,20,29].The analysis of their spatial pattern is key to understanding and modeling the influence of the factors that predispose to failures.Furthermore, the study of the patterns of precursory rockfall is important to predict slope failures in advance [25].When the scale of the rock failure is small enough compared to the extension of the analyzed area, geological events can be represented as point processesconvenient from a statistical and computational point of view [29].Accordingly, the spatial distribution of rockfalls can be analyzed using mathematical models for testing whether an irregular or random point pattern is observed [33], which includes algorithms of clustering.Generally speaking, two main classes of spatial clustering algorithms can be outlined: 1) local methods, to identify clusters in space and/or in time, (e.g., the geographical analysis machine, GAM [21]; Turnbull's cluster evaluation permutation procedure, CEPP [30]; spatial scan statistics [15]); 2) global indicators, to measure and test for the randomness of a point process (e.g., Moran's I [19], the fractal dimension [18], or Ripley's K-function [24]).The present study illustrates an example of application of the two above-mentioned kinds of cluster analysis aiming to: (1) recognize and extract rockfalls from LiDAR point clouds; and (2) test for spatial attraction among rockfall events.To this end, we firstly applied a distance based algorithm [6] able to distinguish between features and clutter in LiDAR point clouds; then the extracted feature points were labeled as belonging to different object, representing the rockfalls, by means of a density based algorithm [8].Finally we applied the Ripley's K-function [24] to assess the global spatial attraction (clustering) among the detected rockfalls.

Study area and data collection
The study area corresponds to the main scarp of a landslide occurred in 1881 [7] and located in Puigcercos (Catalonia, Spain) (Figure 1).Frequent and persistent rockfall activity characterizes the area: detailed information both on rockfall activity and on cliff strain www.josis.orgbefore a rockfall occurrence can be found in [1].Nevertheless, no insight into how smallscale rockfalls are distributed along the cliff has yet been provided.According to [25], this information is important for the prediction of slope failure.Data collection was achieved by means of Ilris 3D terrestrial LiDAR (Optech).This instrument allowed acquiring a dense (∼500 points/sq.meter) and accurate (0.7cm at 100m) dataset, at high acquisition rate (2.500 points/second).A well-established method to detect natural changes in a given cliff (i.e., rockfalls) is to compare several HRDEMs (high resolution digital elevation model) acquired at different times [1,17,26].Accordingly, we scanned the whole cliff during two fieldwork campaigns (Figure 2): the first data acquisition, referred hereinafter as the reference point cloud, was acquired on September 2011 and the second one, referred hereinafter as the target point cloud, was acquired on March 2012 (i.e., a time span of six months).The difference between the reference and the target point clouds led to a final dataset from which we extracted a considerable number of rockfalls (n > 600).Shorter time span could lead to an insufficient number of detectable events, while a longer period could lead to an overlapping of rockfalls from the same location.
The almost vertical geometry of the cliff allowed us to reduce the spatial dimensionality of the geological point process and to adopt a 2D approach.

Method
The overall analysis proceeded stepwise with the goal of: (1) extracting features information (i.e., rockfalls) from LiDAR point clouds; (2) investigating their spatial distribution.Different cluster algorithms were applied at each step: 1a) nearest neighbor clutter removal (NNCR) in combination with expectation-maximization (EM) to separate feature points from clutter [6]; 1b) a density based algorithm (DBSCAN) to separate the single clusters, representing the rockfall events, which arise from the detected feature points [8]; 2) Ripley's K-function [24] to investigate the global spatial pattern of the extracted rockfalls.
Generally speaking pattern recognition, and specifically cluster analysis, encompasses algorithms grouping objects showing similar properties into respective categories.Spatial clusters can be identified whenever the observed distance among groups of point locations in space is lower than the expected distance for a random distribution.This assumption can be accepted or rejected based on the results of statistic tests (e.g., Monte Carlo simulations).For geological events, where intensity is not constant and clusters are often not isolated, their detection is not an easy task.

Point cloud pretreatment and comparison
To prepare the dataset for the cluster analyses, we first carried out a series of pretreatments over the original LiDAR point clouds as follows: (a) alignment of the target point cloud to the reference point cloud; (b) selection and extraction of reference and target points falling inside the boundary relative to the study zone (i.e., the cliff); (c) creation of a triangular irregular network (TIN) based on the reference point cloud; (d) comparison of the two point clouds by calculating the differences along a vector normal to the cliff surface (i.e., TIN); (e) exploratory statistical analysis of the point cloud comparison population (extraction of the mean value, percentiles, etc.); (f) removal of the instrumental error (95%), corresponding to differences between pairs of points lower than 2.7cm (note that negative values identified with deformations and depositions were also removed).The resulting point dataset, referred as the "pre-filtered point cloud," was composed of points where the difference between target and reference point cloud was higher than 2.7cm.This dataset contains two types of events: points representing real changes along the rock face (i.e., rockfalls) and points characterized by gross instrumental errors (i.e., clutter).The recognition and classification of both populations will be discussed in the following sections.

Feature extraction (NNCR algorithm)
We applied a local cluster algorithm, the nearest neighbor clutter removal (NNCR) [6], to extract features (i.e., the rockfalls) from the clutter (i.e., the instrumental residual error).The algorithm calculates for each point i of a spatial pattern the 2D Euclidean distance to its Kth nearest neighbors (Kth-NN): intuitively points inside regions of higher density (i.e., feature points) have a smaller Kth-NN distance than points inside regions of lower density (i.e., clutter points).Considering the pre-filtered point cloud, it is reasonable to assume that the distance between two points falling inside a rockfall is lower than the distance between points falling outside.Therefore a distance-based spatial clustering algorithm can be applied to extract target features.Because of the vertical orientation of the cliff, we assumed a two dimensional space oriented along the X, Z conventional axis.Clutter and feature points are considered as two distinct populations distributed as a Poisson point process with feature points superimposed over the clutter and scattered along the study area.Under this assumption, for a set of n random points, the distribution of the distances D K to their Kth-NN becomes highly bimodal, displaying a strong separation between feature and clutter population.For each given Kth-NN, the maximum likelihood estimator of the intensity ( λ) equals the number of events (∼ K) falling inside a circle of radius d i around a randomly chosen event i of the point process, divided by the area of relevance: The expectation maximization (EM) algorithm can be applied to fit a mixture distribution (the feature and the clutter) to the nearest neighbor distances (D k ), which is used to classify each point as belong to the class feature or clutter.EM estimates the intensities of the feature and the clutter distribution (i.e., λ f and λ c respectively) and the probability of a point event to be assigned to the class feature.The degree of neighbors (Kth) has to be fixed in advance by the user and this choice can affect the result of the analysis.An elevated K-value may miss features composed of few points, while a low K-value may include clutter points in the class feature.When the phenomenon is well known, the expert user can chose the optimum K-value just by looking at the results.As suggested in [6], in the present study several increasing K-values were applied and the one based on an entropy type measure of separation (S) was retained.In more detail, plotting S against increasing values of K allowed detecting the change-point in the curve, indicating which K-value could be retained for the NNCR analysis.In our case this value was equal to 15 (i.e., 15th nearest neighbors) and the visual inspection of the results confirmed this choice.

Single rockfall recognition (DBSCAN)
The NNCR algorithm discussed in Section 3.2 allowed us to separate feature points from clutter.The second step of the analysis consisted of distinguishing the different cluster events (i.e., rockfalls) labeling each feature point according to the result of a classification.For this purpose we applied a density-based algorithm, founded on the density reachability and connectivity clustering technique (DBSCAN) [8].This method can discover clusters with arbitrary shape in a large spatial database, requiring only input parameters.Clusters are defined according to the local density of points in the pattern and based on the fact that each cluster is surrounded by an area of lower density (or even equal to zero) representing the noise.The parameters required by the model are: the minimum number of points (MinPts) within a maximum distance (eps) around each randomly chosen point (p) in the dataset (Figure 3a).The algorithm visits each point p and labels the points at eps-distance as "seeds" if they satisfy the MinPts condition.Seed points are said to be density-reachable by the "core" point (p) and they define a cluster around it.The following assumptions allow a density based cluster to be defined (Figure 3b): 1) a seed-point belongs to more than one cluster if it is density-reachable by more than one core-point; 2) two core-points are densityconnected each other if there is an intermediate seed-point which is density-reachable by both; 3) a chain of intermediate seed-points can connect contiguous clusters allowing clusters of arbitrary shape to be defined, whose size depends only on the parameters eps and MinPts.Points which do not belong to any cluster are defined as "noise."The parameters MinPts and eps strongly influence the numbers, shape, and size of the detected clusters.We point out the fact that MinPts depends on eps since a minimum number of observations is expected at a given distance around each point.We first fixed the eps-value to equal three times the spatial resolution of the point pattern resulting from the pre-filtering and NNCR operations, ∼3-4cm.This gives rise to an eps-distance of 10cm.Then we experimented with different plausible MinPts and we analyzed the number, the size, and the shape of the resulting clusters.On the one hand, higher values of MinPts lead www.josis.org to a high fragmentation since small features, containing a number of points lower than the MinPts, were classed as noise.On the other hand, lower values of MinPts lead to an excessive connectivity between clusters and to a high number of detected small features.Accordingly, a value of 5 MinPts was retained as compromise between the recognized clusters and their spatial connectivity.

Spatial pattern distribution of the rockfalls (Ripley's K-function)
The DBSCAN algorithm allowed feature points to be labeled as belonging to distinct rockfall events.Once detected, the spatial pattern of the different rockfalls was investigated.The main objective of this step of the analysis was to verify if these events show a clustered or random distribution, and if there exists a spatial attraction (i.e., cluster) at a specific range of distances among them.To this end we applied a global cluster method, namely Ripley's K-function [24], able to investigate and test for the spatial randomness of a spatial point process.To compute the function we assimilated the rockfalls, which actually are areal events, to point events.To indicate the event's location we extracted the centroid of each rockfall.
Computationally the K-function (K(r)) equals the expected number (E) of additional points (n) within a distance (r) from a randomly distributed event (u) divided by the spatial intensity (λ): Here the intensity λ of the point process X is defined as the average number of points per unit area, and b(u, r) represents a circle of radius r centered over a point u of X.Under complete spatial randomness (CSR) the theoretical K(r) is equal to πr 2 .The estimated K(r) can be plotted against the distance r and compared with the theoretical value.If the estimated function is higher than πr 2 , events are spatially clustered, whilst smaller values indicate repulsion among events.This assists in discovering at what range of distances data exhibit a non-random pattern distribution.To account for the natural non-uniform distribution of the rockfalls along the study area, we applied a generalization of the K(r) for a spatial inhomogeneous distribution (i.e., a non-stationary point process with nonconstant intensity).The inhomogeneous K-function (K inhom (r) ) assumes each point x i weighted by its local intensity λ(x i ), which we estimated using the leave-one-out kernel smoother model [3].As for the stationary case (i.e., under CSR), K inhom(r) equals πr 2 when events are randomly distributed, and the same considerations are valid.Moreover, we introduced an edge correction in the computation: essentially, if the area of a circle b(u, r) falls only partially inside the study area, only the overlapping surface is counted.
In the present study we adopted a transformation of the Ripleys K-function, namely the L-function [4] defined as: π It follows that the theoretical L(r) value minus r is equal to zero at every distance.Therefore, its comparison with the estimated curve is easier to read.To test for spatial randomness, 999 Monte Carlo simulations of the realization of an inhomogeneous random point process were performed.These provided minimum-maximum point-wise Monte Carlo envelopes, making it possible to reject the null hypothesis (i.e., data randomization) for each r-value with a statistically significant level.

Computational environment
The core of the analytical inspection (i.e., clutter removal and rockfall extraction, recognition and analysis), which actually represents the main contribution of this paper, was fully carried out using R software for statistical computing and graphics [23].R is a free software environment integrating facilities for data manipulation, calculation, and graphical display.The R base can be extended via packages available through the comprehensive R archive network (CRAN), which covers a wide range of modern statistics.More specifically, for feature extraction we applied the NNCR algorithm included in the package spatstat [2]; to label features belonging to the detected single rockfall events we applied the DBSCAN algorithm included in the package fpc [10]; for the spatial pattern analysis of rockfalls we computed the inhomogeneous K-function as implemented on the aforementioned spatstat package [2].We report below the R commands used: (1) to extract feature points (Section 3.2); (2) to generate the density based clusters (Section 3.3); and (3) to calculate the inhomogeneous K-function over the detected rockfalls (Section 3.4).

Rockfalls detection from LiDAR point clouds
The first step of the analysis concerned the alignment of the reference and the target point clouds into a common reference frame.Differences between the two datasets were calculated and 95% of the instrumental error was removed (minimum level of detection of 2.7cm), leading to a pre-filtered point cloud (Figure 4a).The original dataset, composed of 1,841,784 points was reduced up to 93,144 points (Figure 4b).The nearest neighbor clutter removal algorithm (NNCR) was applied to distinguish between the noise (i.e., clutter) and the feature points in the pre-filtered LiDAR point cloud.
Intuitively the density of points belonging to the class feature (i.e., rockfalls) is higher than the density of points falling outside (i.e., outliers).For each point in the pattern the expectation-maximization (EM) algorithm provided the probability for each point in the dataset to belong to a feature.If this value was greater than 50%, points were assigned to the class feature, otherwise they were classed as noise.The result of the NNCR algorithm is shown in Figure 4c.From the pre-filtered dataset containing 93,144 points, 67,544 were classed as feature and 25,600 as noise.It can be observed in this figure that the feature points designate well-shaped rockfall events, but they are not individually labeled; that is they are not assigned to a specific rockfall event.
A density based algorithm, namely DBSCAN, solves this problem.This method allowed labeling the more dense aggregations of feature points as belonging to a single rockfall event and removing the residual noise features (i.e., which contain fewer points than MinPts with respect to eps).As result, 643 single rockfalls were detected (Figure 5).With a local density of about 250-500pts/m 2 , 80% of the detected rockfalls consist of fewer than 70 feature points.Their frequency distribution (Figure 6) covers a range between 5 and 7444 points/rockfall, with most of the events (n = 92) composed of 5-10 feature points.The curve fitting the distribution is positively skewed with most of the values concentrated on the left side (between 0 and 500 points/rockfall).Only 19 detected rockfalls consist of more than 500 feature points.Amongst these, 8 events arise from 1000 up to 3000 feature points and two big events from 5711 and 7444 feature points respectively.

Spatial pattern of rockfalls
The last part of this study concerns the analysis of the global spatial distribution of the detected rockfalls by means of Ripley's K-function.Given that the scale of each detected rockfall was small compared to the extension of the analyzed area, we considered each event as a point process by extracting the 643 centroids relative to each areal event.Results show that rockfalls are not randomly distributed over the study area, but they tend to concentrate in clusters and this cluster behavior is demonstrated over a large number (n = 999) of simulation envelopes (Monte Carlo test).From the shape of the L(r) curve (Figure 7) we can notice that rockfalls are significantly closer than expected for a random distribution at a distance ranging from about 1 up to about 3.5m; dispersed above about 4.5m; and included between the upper and the lower simulated curves (i.e., envelopes) in between, indicating a random distribution in this range.A critical value can be noted around 2m.This corresponds to the more pronounced cluster behavior, as results from the selection of the L(r) function.
Although the K-function is a global cluster indicator, it provides useful information about the distance scale at which clusters take place.The detected critical distance-value, readable over the r-axes in the L(r) plot, could be retained for future local cluster analyses aiming to locate clusters in space.

Discussions and perspectives
Our method proved useful for the selected pilot study area as it allowed us to extract rockfalls from the LiDAR point clouds and to analyze their spatial pattern distribution.The NNCR algorithm allowed us to detect changes on the rock face with a high level of detail.While it is commonly suggested to consider a threshold of 10cm between noise and the minimum detectable change [17], the proposed technique led to a point cloud classification in two different classes (i.e., feature and noise) with a significantly greater level of detail: about four times more precise (2.7cm).Thanks to this improvement, the geometry, magnitude, and frequency of rockfalls can be quantified with a higher degree of accuracy than using classical techniques, such as DEM subtraction or point to surface comparison.Furthermore, points matching with small variations on surface topography were correctly classified as noise.Indeed, these points correspond to a mismatched alignment near cracks and not to real rockfall events.
The DBSCAN technique proved useful for classifying each feature point as belonging to a single rockfall event and so it made possible to extract and to map the single rockfalls.Despite the encouraging outcomes obtained in the present study, clusters resulting from this unsupervised technique strongly depend on the parameters MinPts (minimum number of points) and eps (reachability-distance).This method cannot be completely unsupervised in the sense that the user has to check the results and to modify these two parameters according to the knowledge of the phenomena under study.Moreover, a compromise between the detectable clusters, their size and connectivity, has to be taken into account.
We showed how rockfalls can be detected and extracted from terrestrial LiDAR point cloud by assuming their distribution in a two dimensional space.This assumption is acceptable for the selected pilot study area, which corresponds to the main scarp of an ancient www.josis.orglandslide with a vertical geometry.Nevertheless, for a generalization of the method to the 3D space, the proposed technique needs to be adapted and the parameters modified.
Although some authors have reported on how catastrophic landslides may be preceded by an increase in rockfall activity [22,25,27], little is known about the characteristics of precursory rockfalls, as magnitude, frequency, spatiotemporal distribution, etc.Interestingly, [25] analyzed during several months the spatiotemporal distribution of rockfalls before the occurrence of a large scale rockfall (> 10000m 3 ), showing how precursory failures tend to concentrate on the footprint where a future failure will take place.We made the preliminary observation that precursory rockfalls tend to concentrate along the cracks that delimit future failures [1].In the present study, the K-function provided a powerful global cluster indicator able to reveal at what range of distances small-scale rockfalls tend to aggregate.These distance-values can be retained for future analyses aiming to locate clusters in space, which is crucial to the investigation of precursory rockfalls.
Most of the studies dealing with rockfall prediction focus on a single failure, and the predictive capabilities of the proposed methods have not yet been thoroughly proved.This gap should be filled by providing more case studies in different environmental conditions and, most importantly, by using a robust statistical approach able to discriminate whether a certain rockfall distribution is random or clustered and which are the implications of this pattern.Future research will focus on the development of new geospatial and geostatistical techniques, both for the detection of areas characterized by a high concentration of rockfalls and for the characterization of these areas in space and in time.Current challenges include an exhaustive analysis of the spatiotemporal distribution of precursory rockfalls aiming to detect easily areas where future failures are highly likely.
Finally, further developments of the research will focus on the analysis of multidimensional point patterns, including the third spatial dimension and/or temporal variability.To this end, the custom implementation of the algorithms in the R environment is also suitable.

Conclusions
We presented a method able to extract features from terrestrial LiDAR point clouds focused on the identification of single rockfall events.More than 600 small scale rockfalls were recognized over a vertical cliff of about 2000m 2 of surface and occurring within two fieldwork campaigns spaced by 6 months.The spatial distribution of these events was analyzed, proving that detected rockfalls were clustered at a well-defined distance-range (between 1 and 3.5m).
Future research will consider different scans acquired at different periods to provide a better understanding of precursory rockfalls and their spatiotemporal distribution.

Figure 2 :
Figure 2: (a) Complete LiDAR dataset of the Puigcercos cliff composed by three scans.(b) Section of the cliff representing the study area.

Figure 3 :
Figure 3: (a) The minimum number of points (MinPts) at a maximum distance (eps) around the core-point (p) defines a cluster.(b) The core-points p, p 1 , p 2 , q are density-connected by the chain of intermediate seed-points: the resulting cluster is encircled in red.Point o is a noise.

Figure 4 :
Figure 4: Pre-filtered point cloud displayed in: (a) graduated colors, based on the differences between the reference and the target point clouds; and (b) in black.(c) Point features (red) and clutter (green) resulting from the NNCR algorithm (K=15).

Figure 6 :
Figure 6: Frequency distribution of the number of feature points belonging to each detected rockfall.

Figure 7 :
Figure 7: L(r) function (as defined in the text) for the 643 detected rockfalls.