Topic: Seamless Integration of HPC and Storage Systems with Open-WebSearch (OWS): Challenges, Solutions, and Performance Optimization

Personal details

Title: Seamless Integration of HPC and Storage Systems with Open-WebSearch (OWS): Challenges, Solutions, and Performance Optimization
Description

This thesis will explore the technical integration of a local HPC and storage system with the Open-WebSearch (OWS) infrastructure. The objective is to establish a seamless data exchange mechanism that enables local resources to participate in web crawling and indexing.

The research will include: 

Analyzing the current architecture and defining integration requirements. 

Developing specialized interfaces to enable efficient and secure data exchange.

Implementing a small-scale web crawling and indexing prototype on an HPC system.

Evaluating performance, scalability, and security challenges.

Optimizing the integration process and documenting best practices.
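As an illustration of the prototype step, the sketch below shows one possible way to split a crawl across a SLURM job array and build a per-shard inverted index. It is a minimal sketch under stated assumptions: page contents are assumed to be already fetched (no network access is made), the array is assumed to be submitted with indices 0..N-1 (for example `sbatch --array=0-3`), and the function names `shard_for_task` and `build_inverted_index` are hypothetical, not part of any OWS interface.

```python
import os
import re
from collections import defaultdict


def shard_for_task(urls, task_id, task_count):
    """Assign URLs round-robin so each array task gets a disjoint subset."""
    return [url for i, url in enumerate(urls) if i % task_count == task_id]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to the sorted list of URLs whose text contains it."""
    postings = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            postings[term].add(url)
    return {term: sorted(urls) for term, urls in postings.items()}


def main():
    # SLURM sets these variables for job-array tasks; the defaults let the
    # script also run outside SLURM as a single task.
    # Assumption: the array uses indices 0..N-1.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    # Hypothetical placeholder pages standing in for already-fetched crawl output.
    pages = {
        "https://example.org/a": "Open web search for Europe",
        "https://example.org/b": "HPC storage for web indexing",
        "https://example.org/c": "Open data and open indexes",
    }
    my_urls = shard_for_task(sorted(pages), task_id, task_count)
    index = build_inverted_index({url: pages[url] for url in my_urls})
    for term in sorted(index):
        print(term, index[term])


if __name__ == "__main__":
    main()
```

In such a design, each array task would write its partial index to shared storage, and a separate merge step (not shown) would combine the shards; round-robin assignment is chosen here only for simplicity.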

A successful outcome will demonstrate that a local data center can effectively connect to and contribute to the OWS infrastructure, paving the way for other data centers to adopt the same approach.

Home institution: Department of Computing Science
Associated institutions:
Type of work: practical / application-focused
Type of thesis: Bachelor's or Master's degree
Author: Sreedhar Kokkarachedu
Status: available
Problem statement

The Open-WebSearch (OWS) initiative, funded under the EU's Horizon Europe programme, aims to establish an open and independent search infrastructure that provides an alternative to commercial search engines. To support this initiative, data centers need to integrate their High-Performance Computing (HPC) and storage resources with the OWS network.

However, this integration presents several technical challenges, including: 

Connectivity Issues: Establishing seamless communication between local systems and OWS. 

Data Exchange Mechanisms: Developing efficient interfaces for transferring and processing large-scale web data. 

Performance and Scalability: Ensuring that HPC resources can handle the demands of large-scale web crawling and indexing.

Security and Reliability: Addressing potential risks associated with distributed data exchange. 
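For the data-exchange and reliability challenges, one common building block is a checksum manifest: the sender lists each file with its size and SHA-256 digest, and the receiver recomputes both before accepting the data. The sketch below assumes plain files in a directory that has been transferred or shared; the manifest layout and the function names `build_manifest` and `verify_manifest` are hypothetical, not an OWS format. It detects corruption or truncation only; authenticity would additionally require signatures or an authenticated transfer channel.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large crawl files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record relative path, size in bytes, and SHA-256 for every file under root."""
    root = Path(root)
    return {
        str(p.relative_to(root)): {"size": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose size or hash differs."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if (not p.is_file()
                or p.stat().st_size != expected["size"]
                or sha256_of(p) != expected["sha256"]):
            bad.append(rel)
    return bad


if __name__ == "__main__":
    # Demo with hypothetical file names in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "part-0000.warc").write_bytes(b"crawl record 1\n")
        (data_dir / "part-0001.warc").write_bytes(b"crawl record 2\n")
        manifest = build_manifest(data_dir)
        print(json.dumps(manifest, indent=2))  # what the sender would ship
        (data_dir / "part-0001.warc").write_bytes(b"corrupted\n")
        print(verify_manifest(data_dir, manifest))  # ['part-0001.warc']
```

Checking the cheap size comparison before rehashing keeps verification fast for truncated transfers, while the hash still catches same-size corruption.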

The primary challenge is not the availability of computing resources but the technical connection between local storage/HPC systems and the distributed OWS infrastructure. This thesis will focus on identifying these challenges and developing solutions to enable effective integration. 

Requirements

To successfully complete this research, the following requirements must be met: 

Technical Requirements 

Understanding of HPC and Storage Systems: Familiarity with cluster computing, parallel processing, and storage architectures. 

Knowledge of Distributed Systems: Understanding of how nodes communicate in a large-scale distributed infrastructure.

Programming Skills: Experience with Python or Bash, and with the SLURM workload manager, for implementation and automation.

Created: 03/04/25