Stud.IP Uni Oldenburg
University of Oldenburg
26.07.2021 16:36:17

Personal details

Title zhokli

Description: This topic is formulated in accordance with the data management phase of the SmartHelm project. As part of the project, we obtain different categories of data: order management data and navigation data (structured), weather data and geo-information data (semi-structured), and EEG sensor data (unstructured). The heterogeneous data is transferred through various protocols and interfaces such as REST APIs, general file transfer, and web API services. The transferred data arrives in various formats (.csv, JSON, xdf, etc.) and should be extracted from the sources, transformed with data processing tools into a uniform format, and finally loaded into the Data Warehouse (DWH). Based on a literature study, a state-of-the-art ETL approach should be chosen that is best suited to in-house data storage rather than cloud-based ETL tools, since all data is stored in an in-house database.
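
The extract-transform-load flow described above can be illustrated with a minimal sketch. It is not the project's implementation: the sample payloads, the field names (order_id, city, weight_kg, temp_c), the staging table, and the function names are all hypothetical, and SQLite from the Python standard library stands in for the in-house database (the listing names PostgreSQL as an example RDBMS).

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample payloads; real sources would be files or REST responses.
ORDERS_CSV = "order_id,city,weight_kg\n1,Oldenburg,2.5\n2,Bremen,0.8\n"
WEATHER_JSON = '[{"city": "Oldenburg", "temp_c": 18.4}, {"city": "Bremen", "temp_c": 17.1}]'


def extract_csv(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: parse a JSON array into a list of dicts."""
    return json.loads(text)


def transform(source, rows):
    """Transform: map heterogeneous rows to one uniform shape.

    Each record becomes (source, city, payload), where payload keeps the
    original fields as a JSON string.
    """
    return [(source, row["city"], json.dumps(row, sort_keys=True)) for row in rows]


def load(conn, records):
    """Load: append the uniform records to a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging (source TEXT, city TEXT, payload TEXT)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run both pipelines and return the number of staged rows."""
    load(conn, transform("orders", extract_csv(ORDERS_CSV)))
    load(conn, transform("weather", extract_json(WEATHER_JSON)))
    return conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

Calling run_etl(sqlite3.connect(":memory:")) stages the two order rows and the two weather rows in one table. Keeping the original fields as a JSON payload is only one possible uniform format; a real warehouse schema would more likely use typed columns per subject area.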

Data integration plays a key role before the data is stored in the main data storage (DWH), because the stored data must be usable for data analysis and data evaluation techniques. This thesis therefore offers scope to research the concepts of data warehousing and ETL methods in depth and, in addition, to implement in practice the methods best suited to our data requirements and goals.

Home institution Department of Computing Science
Type of work practical / application-focused
Type of thesis Master's degree
Author Harish Moturu
Status available
Problem statement

AIM: To identify and implement suitable ETL approaches and data warehousing techniques for storing structured, semi-structured and unstructured data in an in-house Data Warehouse, covering the following aspects:

  • Study the available data sources

  • Design the schema

  • Develop data source connectors

  • Deploy the ETL process

  • Integrate the data

  • Build a model

  • Prepare a data catalogue
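
Two of the aspects above, data source connectors and the data catalogue, can be sketched together. This is only an illustration under stated assumptions: the class names (Connector, CsvConnector, JsonConnector, CatalogueEntry), the function build_catalogue, and the sample field names are hypothetical and not taken from the SmartHelm project. Only the Python standard library is used.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    """One data catalogue record describing a source (hypothetical layout)."""
    name: str
    category: str  # "structured", "semi-structured" or "unstructured"
    fmt: str
    fields: list


class Connector(ABC):
    """Common interface so the ETL process can treat every source alike."""
    category = "unknown"
    fmt = "unknown"

    def __init__(self, name, text):
        self.name = name
        self.text = text  # raw content; a real connector would read a file or call an API

    @abstractmethod
    def fetch(self):
        """Return the source content as a list of dicts."""


class CsvConnector(Connector):
    category = "structured"
    fmt = "csv"

    def fetch(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class JsonConnector(Connector):
    category = "semi-structured"
    fmt = "json"

    def fetch(self):
        return json.loads(self.text)


def build_catalogue(connectors):
    """Describe each source by name, category, format and observed field names."""
    entries = []
    for connector in connectors:
        rows = connector.fetch()
        fields = sorted({key for row in rows for key in row})
        entries.append(
            CatalogueEntry(connector.name, connector.category, connector.fmt, fields)
        )
    return entries
```

For example, build_catalogue([CsvConnector("orders", "order_id,city\n1,Oldenburg\n")]) yields one entry whose fields are the CSV column names. Deriving the catalogue from the connectors keeps the documentation in step with the sources actually read.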

The main goal of the thesis is to build a scalable data storage system that can be applied to the data collected in the project from various sources.

Language: The thesis can be written in either German or English.

  • Python programming
  • Hands-on experience with various Python libraries for communicating with APIs, databases, etc.
  • SQL and an RDBMS such as PostgreSQL
Created 17/05/21

Study data

  • Very Large Business Applications
Degree programmes Not assigned to any degree programmes
Assigned courses No courses assigned
Contact person