Clustering GLM Data with DBSCAN

DBSCAN is a widely used, density-based clustering algorithm. First proposed in 1996 by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, it has seen broad use in the decades since. The basis of the algorithm is quite simple: it takes a distance value, epsilon, and a minimum number of points required to form a cluster. Starting from an arbitrary point, the algorithm checks whether other points lie within epsilon of it. If enough points fall within that range to satisfy the minimum-point requirement, all of those points are placed in a cluster together. The algorithm then works through the remaining points, either assigning each to a cluster or labeling it as noise.

On this site, DBSCAN is used to cluster GLM Level 2 data, such as groups and flashes. In contrast to the K-means algorithm, DBSCAN does not require the number of clusters to be specified before clustering begins. This is especially advantageous with lightning data, because the number of thunderstorms present in any one time frame varies widely.

epsilon = 1.20, minPoints = 3
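
As a concrete illustration, the short sketch below applies scikit-learn's DBSCAN implementation to a handful of made-up group centroids, using the epsilon and minPoints values listed above. The coordinates, variable names, and the use of plain Euclidean distance in degrees are illustrative assumptions rather than details taken from this site's processing code.

import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical GLM group centroids as (longitude, latitude) pairs in degrees.
# A real run would read these from GLM Level 2 files instead.
points = np.array([
    [-95.10, 32.40], [-95.12, 32.41], [-95.08, 32.39],   # tight cluster of groups
    [-90.55, 35.20], [-90.57, 35.18], [-90.54, 35.22],   # a second tight cluster
    [-99.90, 28.00],                                      # an isolated group
])

# eps corresponds to epsilon and min_samples to minPoints, matching the values above.
# Distances here are plain Euclidean distances in degrees, which is a simplification;
# a projected or haversine metric could be substituted.
labels = DBSCAN(eps=1.20, min_samples=3).fit_predict(points)

print(labels)   # [ 0  0  0  1  1  1 -1]  (-1 marks a point treated as noise)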

Each of the cells above represents a section of a longitude-by-latitude grid. Green cells contain a large amount of data, yellow cells contain some data, and red cells contain very little data.
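
For context, the snippet below sketches one way such a density grid could be computed: points are binned into fixed-size longitude/latitude cells, and each cell is labeled by how many points it holds. The one-degree cell size and the count thresholds are illustrative assumptions; this site does not specify the actual values behind the green, yellow, and red categories.

import numpy as np

def categorize_grid(lons, lats, cell_size=1.0, some=10, a_lot=100):
    # Count points per longitude/latitude cell and label each cell's density.
    # cell_size and the thresholds 'some' and 'a_lot' are assumed values.
    lon_edges = np.arange(np.floor(lons.min()), np.ceil(lons.max()) + cell_size, cell_size)
    lat_edges = np.arange(np.floor(lats.min()), np.ceil(lats.max()) + cell_size, cell_size)
    counts, _, _ = np.histogram2d(lons, lats, bins=[lon_edges, lat_edges])

    labels = np.full(counts.shape, "red", dtype=object)    # very little data
    labels[counts >= some] = "yellow"                       # some data
    labels[counts >= a_lot] = "green"                       # a lot of data
    return counts, labels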