Prof: Projects

Benchmarking Data Science Pipelines

Liu L, Erdelt P.K. (2024) Benchmarking Machine Learning Pipelines in PostgreSQL with TPCx-AI. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking. TPCTC 2024. Lecture Notes in Computer Science, vol 15337. Springer, Cham
https://doi.org/10.1007/978-3-031-93858-0_8
Preprint Version
Repository: https://github.com/perdelt/TPCx-AI_in_DB
In times of DataOps and MLOps, do we need a new analytical database benchmark?
How to benchmark Data Science Pipelines?
- Example: TPCx-AI
- Example: Sanzu
How to integrate (most of) the Data Science Workflow consistently and efficiently into a GPU framework?
- Example: https://rapids.ai/
- Example: http://gpuopenanalytics.com/

Cloud-native Benchmarking of DBMS

Cluster Manager: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
Benchmark Tool: https://github.com/Beuth-Erdelt/DBMS-Benchmarker
Python Packages: https://pypi.org/user/perdelt/
Python Docs: https://readthedocs.org/profiles/perdelt/
Docker Images: https://hub.docker.com/u/bexhoma
Erdelt, P.K. (2024). A Cloud-Native Adoption of Classical DBMS Performance Benchmarks and Tools. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking. TPCTC 2023. Lecture Notes in Computer Science, vol 14247. Springer, Cham.
https://doi.org/10.1007/978-3-031-68031-1_9
Preprint Version
Presentation: http://dx.doi.org/10.13140/RG.2.2.24813.36324
Erdelt, P.K. and Jestel, J. (2022). DBMS-Benchmarker: Benchmark and Evaluate DBMS in Python. Journal of Open Source Software, 7(79), 4628, https://doi.org/10.21105/joss.04628
Talk at the TPCTC 2021 (VLDB workshop): Orchestrating DBMS Benchmarking in the Cloud with Kubernetes
Erdelt P.K. (2022) Orchestrating DBMS Benchmarking in the Cloud with Kubernetes. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2021. Lecture Notes in Computer Science, vol 13169. Springer, Cham.
https://doi.org/10.1007/978-3-030-94437-7_6
Preprint Version
Talk at the TPCTC 2020 (VLDB workshop): Repetition and Evaluation of Cloud-based DBMS Benchmarking
Erdelt P.K. (2021) A Framework for Supporting Repetition and Evaluation in the Process of Cloud-Based DBMS Performance Benchmarking. In: Nambiar R., Poess M. (eds) Performance Evaluation and Benchmarking. TPCTC 2020. Lecture Notes in Computer Science, vol 12752. Springer, Cham.
https://doi.org/10.1007/978-3-030-84924-5_6
Preprint Version

Further Activities

Erdelt, P.K. and Rabl, T.: Benchmarking Multi-Tenant Architectures in PostgreSQL
Repository
Baskan, Denis and Erdelt, Patrick, Neighborhood-Based Loss Functions for Explainability of Autoencoders (September 8, 2022). Available at SSRN: https://ssrn.com/abstract=4212995 or http://dx.doi.org/10.2139/ssrn.4212995
Bockhacker M, Erdelt P, Röhl H. Machine Learning for Clinical Pathway Decision Support in Suspected Bone Lesions on Radiographs. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, editors. 70. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), https://dx.doi.org/10.3205/25gmds095
Reviewer for The Journal of Open Source Software
Technical team for the Berliner Tag der Mathematik (Berlin Day of Mathematics) 2014, 2019, 2025

High-Performance and GPU-Based Mathematics

Numerical computing on compute clusters, e.g. with Dask
Numerical computing on a GPU, e.g. with CuPy
Numerical computing on a GPU, e.g. with JAX
Numerical computing in JIT-compiled Python, e.g. with Numba
Mathematics on a GPU, e.g. with nvmath-python

A GPU (Graphics Processing Unit) is responsible for rendering graphics in a PC. Dedicated programming frameworks (in particular CUDA and OpenCL) make it possible to run the compute-intensive parts of general-purpose programs on a GPU instead of, as usual, on a CPU (Central Processing Unit). GPUs have become established as the technical foundation in many mathematics-heavy application areas, in particular because they can carry out tensor computations on large amounts of data in a massively parallel and therefore fast manner.
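
As a rough illustration of offloading a compute-intensive step, the following minimal sketch runs the same matrix product either on a GPU through CuPy or on the CPU through NumPy. It assumes NumPy is installed and treats CuPy as an optional dependency. The helper names `select_array_module` and `gram_trace` are hypothetical and chosen for this example only. The sketch shows the general pattern, not a tuned implementation.

```python
import numpy as np


def select_array_module():
    """Return (module, name): CuPy if a CUDA GPU is usable, otherwise NumPy."""
    try:
        import cupy as cp  # optional dependency, assumed for illustration

        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp, "cupy"
    except Exception:
        # CuPy is missing or no usable CUDA device exists: stay on the CPU.
        pass
    return np, "numpy"


def gram_trace(xp, n=256, seed=0):
    """Compute trace(A @ A.T) with the array module xp and return a float."""
    rng = np.random.default_rng(seed)
    a_host = rng.standard_normal((n, n))  # data is generated on the host
    a = xp.asarray(a_host)                # copied to GPU memory if xp is CuPy
    g = a @ a.T                           # matrix product on the selected device
    return float(xp.trace(g))             # scalar result is returned to the host


xp, backend = select_array_module()
result = gram_trace(xp)       # GPU if available, otherwise CPU
reference = gram_trace(np)    # always computed on the CPU for comparison
print(backend, result)
```

Because CuPy mirrors a large part of the NumPy interface, the numerical code in `gram_trace` is identical for both devices. Only the transfer of the input array and the conversion of the result cross the boundary between host and GPU.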

Data Science

BHT Berlin Data Science + X

Complex business decisions are often supported by traceable data analyses. Professors from the departments "Mathematics" and "Computer Science and Media" work with students and industry partners on the forward-looking topic of "Data Science".

See also: Degree program

AWS Educate

https://aws.amazon.com/de/education/awseducate/

Contact person (Central Point of Contact) for the Amazon Web Services academic program

OCIDA

The research project OCIDA aims to reduce customer attrition (churn) through optimal strategies, based on the efficient analysis of individual customer data from e-commerce. To this end, methods from analytical marketing, mathematical modeling and optimization from revenue management and pricing, and suitable forecasting techniques are to be applied. Because of the large data volumes, the analysis is carried out as Big Data processing based on Hadoop technologies.