Personal details
Title | Multi-Tenant Data Quality Scoring and Validation at Scale |
Description | Goals: This project aims to establish a foundational data quality scoring mechanism for Baqend. By analyzing the tracking and CDN data schemas, the goal is to group attributes that likely share validation semantics. This grouping will then serve as a basis for identifying anomalies that may indicate data quality issues.

Baqend's data spans multiple tables with numerous attributes of varying data types, and each customer website has its own characteristics, so the task requires a nuanced approach. While the long-term vision is a continuous monitoring system that flags data quality issues instantly and ideally also pinpoints possible causes or solutions, this thesis focuses on laying the groundwork for such a system. This might involve focusing on a specific subset of the data, analyzing data from a particular timeframe, or implementing a batch-based analysis rather than a real-time one. In the long term, we seek to provide Baqend with a robust mechanism to ensure the integrity and quality of its collected data, paving the way for more advanced, real-time monitoring systems in the future.

Resources (Mandatory): Please check out the following resources before we meet to discuss a potential topic for your thesis:
|
Home institution | Department of Computing Science |
Associated institutions | |
Type of work | practical / application-focused |
Type of thesis | Bachelor's or Master's degree |
Author | Prof. Dr. Wolfram Wingerath |
Status | available |
Problem statement | |
Requirement | Depends on the topic |
Created | 20/08/23 |