Figure: Visualization of the Bloom Filter Concept (2023 P. Walther)

Bloom Filter as a Tool to Organize Complex and Big Geospatial Data

Project Description

Bloom filters are probabilistic data structures used to model sets of data points. Their advantage lies in efficiently checking whether an element has occurred before, for example in a data stream: a query may return a false positive, but never a false negative. So far, they have mainly been used for network routing tasks on relatively small data sets. In this project, we want to explore how they can be used in complex and big data environments, such as big geospatial data sets.
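To make the idea concrete, here is a minimal Bloom filter sketch that detects recurring elements in a stream. It is an illustrative example under stated assumptions, not the project's implementation: the class name `BloomFilter`, the double-hashing scheme over a single SHA-256 digest, and the parameters `m` (number of bits) and `k` (number of hash functions) are choices made here for clarity.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array of length m probed by k hash positions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m  # number of bits in the filter
        self.k = k  # number of hash positions per element
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive k indices
        # from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # "| 1" guarantees a nonzero step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Usage: flag elements of a stream that (probably) occurred before.
seen = BloomFilter(m=1024, k=3)
stream = ["a", "b", "a", "c", "b"]
repeats = []
for x in stream:
    if x in seen:
        repeats.append(x)
    else:
        seen.add(x)
print(repeats)
```

The filter never stores the elements themselves, only `m` bits, which is what makes the structure attractive for large data volumes; the price is a small, tunable false-positive probability.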

Project Goals

To evaluate and extend current Bloom filter usage, we pursue three goals:

The first project goal is to break down barriers that currently make it difficult to use Bloom filters for particularly large data sets. Their configuration is tightly constrained: for example, the number of hash functions must currently be an integer, the filter length should be a power of two, and queries usually all have the same error probability. Furthermore, Bloom filters are rarely used for anything beyond simple membership testing. Building on a large body of prior work, we aim to investigate these constraints systematically and thereby significantly expand the possible usage scenarios for Bloom filters.
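The first two constraints can be illustrated with the standard textbook approximation of the false-positive rate, p = (1 - e^(-kn/m))^k, for a filter of m bits holding n elements with k hash functions. Under this approximation the optimal number of hash functions is k* = (m/n) ln 2, which is in general not an integer, so practical filters round it. A power-of-two length m is common because the index reduction h mod m then becomes a cheap bitmask, which is one typical motivation. The sketch below is a hedged illustration; the function names and the sizes `m`, `n` are hypothetical.

```python
import math


def false_positive_rate(m: int, n: int, k: float) -> float:
    """Textbook approximation p = (1 - exp(-k * n / m)) ** k for m bits, n items, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued k that minimizes the approximation above: k* = (m / n) * ln 2."""
    return (m / n) * math.log(2)


def bucket(h: int, m: int) -> int:
    """Map a hash value h to a bit index in [0, m) for a filter of length m >= 1."""
    if m & (m - 1) == 0:
        # m is a power of two: the modulo reduces to a cheap bitmask.
        return h & (m - 1)
    return h % m


m, n = 10_000, 1_000  # illustrative sizes: 10 bits per stored item
k_star = optimal_k(m, n)  # about 6.93, i.e. not an integer
for k in (math.floor(k_star), k_star, math.ceil(k_star)):
    print(f"k = {k:.2f}: p = {false_positive_rate(m, n, k):.5f}")
```

Under this approximation, rounding k* to either neighboring integer yields a slightly higher false-positive rate than the real-valued optimum.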

The second goal is to implement a benchmarking environment in which the theoretical developments can be evaluated on actual hardware. This includes experiments on specialized hardware (e.g., FPGAs and GPUs) to illustrate possibilities for future development.

The third project goal concerns the use of Bloom filters as a data structure for complex geodata, in particular sparse 2D and 3D data from geoinformatics.
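One simple way a Bloom filter could represent sparse 2D or 3D geodata is to quantize coordinates to a regular grid and insert only the occupied cells or voxels. The sketch below assumes exactly that; the class `VoxelBloomFilter`, the `cell_size` parameter, and the hashing scheme are hypothetical illustrations and not the method developed in this project. Its memory footprint depends on `m`, not on the spatial extent of the grid, which is what makes the idea attractive for sparse data.

```python
import hashlib
import math


class VoxelBloomFilter:
    """Hypothetical sketch: a Bloom filter over occupied grid cells (2D) or voxels (3D)."""

    def __init__(self, m: int, k: int, cell_size: float) -> None:
        self.m, self.k, self.cell_size = m, k, cell_size
        self.bits = bytearray((m + 7) // 8)

    def _cell(self, point):
        # Quantize each coordinate to an integer cell index (works for 2D or 3D tuples).
        return tuple(math.floor(c / self.cell_size) for c in point)

    def _positions(self, cell):
        # Hash the cell key, e.g. "12,48,510", and derive k indices by double hashing.
        key = ",".join(map(str, cell)).encode("ascii")
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # nonzero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add_point(self, point) -> None:
        for pos in self._positions(self._cell(point)):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def may_be_occupied(self, point) -> bool:
        # False means the cell is certainly empty; True may be a false positive.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(self._cell(point)))


# Usage with a few illustrative 3D points (x, y, z) and 1-unit voxels.
points = [(12.3, 48.1, 510.0), (12.4, 48.1, 512.5), (300.0, -20.0, 5.0)]
bf = VoxelBloomFilter(m=4096, k=4, cell_size=1.0)
for p in points:
    bf.add_point(p)
print(bf.may_be_occupied((12.9, 48.5, 510.7)))  # same voxel as the first point, prints True
```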

Acknowledgement

This project is kindly supported by the DFG (German Research Foundation). Project number: 507196470

References

The publications from this project are listed below:

  1. Walther, P., Luo, X., & Werner, M. (2024). TraBiMap: Reducing Privacy Concerns in Trajectory Analysis with Randomized Data Representations. 2nd ACM SIGSPATIAL International Workshop on Geo-Privacy and Data Utility for Smart Societies. https://doi.org/10.1145/3681768.3698496
  2. Walther, P., Deuser, F., & Werner, M. (2024). Multi-Modal Contextualization of Trajectory Data for Advanced Analysis. Datenbank-Spektrum. https://doi.org/10.1007/s13222-024-00484-3
  3. Walther, P. (2024). Advancements of Randomized Data Structures for Geospatial Data. In H. V. J. Themis Palpanas (Ed.), Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference co-located with the EDBT/ICDT 2024 Joint Conference (CEUR Workshop Proceedings, Vol. 3651). https://ceur-ws.org/Vol-3651/PhDW-1.pdf

Contact

Paul Walther
paul.walther@tum.de
Professorship of Big Geospatial Data Management
Lise-Meitner-Str. 9
85521 Ottobrunn


© 2020 M. Werner