Personal details
Title | Seamless Integration of HPC and Storage Systems with Open-WebSearch (OWS): Challenges, Solutions, and Performance Optimization |
Description | This thesis will explore the technical integration of a local HPC and storage system with the Open-WebSearch (OWS) infrastructure. The objective is to establish a seamless data exchange mechanism that enables local resources to participate in web crawling and indexing. The research will include: (1) analyzing the current architecture and defining integration requirements; (2) developing specialized interfaces for efficient and secure data exchange; (3) implementing a small-scale web crawling and indexing prototype on an HPC system; (4) evaluating performance, scalability, and security challenges; and (5) optimizing the integration process and documenting best practices. A successful outcome will demonstrate that a local data center can effectively connect to and contribute to the OWS infrastructure, paving the way for further adoption. |
Home institution | Department of Computing Science |
Associated institutions | |
Type of work | practical / application-focused |
Type of thesis | Bachelor's or Master's degree |
Author | Sreedhar Kokkarachedu |
Status | available |
Problem statement | The Open-WebSearch (OWS) initiative, as part of the Horizon 2020 project, aims to establish an open and independent search infrastructure, providing an alternative to commercial search engines. To support this initiative, data centers need to integrate their High-Performance Computing (HPC) and storage resources with the OWS network. However, this integration presents several technical challenges: (1) connectivity: establishing seamless communication between local systems and OWS; (2) data exchange mechanisms: developing efficient interfaces for transferring and processing large-scale web data; (3) performance and scalability: ensuring that HPC resources can handle the demands of large-scale web crawling and indexing; and (4) security and reliability: addressing potential risks associated with distributed data exchange. The primary challenge is not the availability of computing resources but the technical connection between local storage/HPC systems and the distributed OWS infrastructure. This thesis will focus on identifying these challenges and developing solutions to enable effective integration. |
Requirement | To complete this research successfully, the following technical requirements must be met: (1) understanding of HPC and storage systems: familiarity with cluster computing, parallel processing, and storage architectures; (2) knowledge of distributed systems: understanding of communication between multiple nodes in a large-scale infrastructure; and (3) programming skills: experience with Python or Bash, and with the Slurm workload manager, for implementation and automation. |
Created | 03/04/25 |