Theses

The student is required to conduct advanced research on a topic related to data science. The topic is chosen in consultation with the thesis supervisor, and the student presents a research plan in writing before research starts. The student must submit the completed research in a thesis format to an examination committee and make an oral presentation of the thesis. The student is expected to furnish evidence of competence in research and a sound understanding of data science associated with the research.

The thesis is presented to the university in partial fulfillment of the requirements for the degree of Master of Science in the program of Data Science and Analytics.

2025

Tushe, Ergi – Video Based Multiple Pedestrian Detection and Tracking for Autonomous Delivery Robots in Urban Spaces Using Pose Estimation and Depth Perception (Supervisor: Bilal Farooq)

The integration of Autonomous Delivery Robots (ADRs) into pedestrian-heavy urban spaces introduces unique challenges in terms of safe, efficient, and socially acceptable navigation. This research addresses a critical gap in ADR perception by developing a full pipeline for multi-pedestrian detection and tracking from a single video camera, combining YOLOv9-based detection, DeepSORT tracking, YOLO-Pose human pose estimation, and monocular depth perception. The system is designed for computational efficiency and real-world deployability, operating entirely on pre-trained models without any additional training or fine-tuning. Leveraging real-world MOT17 dataset sequences, this study demonstrates how integrating human pose estimation and depth cues enhances pedestrian trajectory prediction and identity maintenance, even under occlusions and in dense crowds. Results show measurable improvements, including up to a 10% increase in identity preservation (IDF1), a 7% improvement in multi-object tracking accuracy (MOTA), and consistently high detection precision exceeding 85%, even in challenging scenarios. Notably, the system identifies vulnerable pedestrian groups, supporting more socially aware and inclusive robot behaviour. By contributing a multi-stage perception framework capable of interpreting human intent and enhancing trust in ADR deployment, this research advances the field of human-robot interaction and provides actionable insights for scalable, real-world ADR integration into modern cities.
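For readers unfamiliar with the two tracking metrics named in this abstract, the sketch below shows their standard definitions as commonly used in multi-object tracking benchmarks. It is an illustration only, not code from the thesis: the function names and the example counts are assumptions, and a real evaluation would obtain the counts by matching tracker output against ground-truth annotations.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT.

    num_ground_truth is the total number of ground-truth boxes over all frames.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


def idf1(id_true_positives, id_false_positives, id_false_negatives):
    """Identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denominator = 2 * id_true_positives + id_false_positives + id_false_negatives
    if denominator == 0:
        return 0.0  # no detections and no ground truth: define the score as 0
    return 2 * id_true_positives / denominator


# Hypothetical counts, chosen only to exercise the formulas.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5,
                    num_ground_truth=100)
example_idf1 = idf1(id_true_positives=80, id_false_positives=10,
                    id_false_negatives=30)
```

MOTA penalizes missed pedestrians, spurious detections, and identity switches together, while IDF1 focuses on how consistently each pedestrian keeps the same identity over time.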

Yang, Shuqi – Markov Decision Process Models for the Multiperiod Newsvendor Problem with Service-Dependent Demand (Supervisor: Mucahit Cevik; Co-supervisor: Aliaa Alnaggar)

Demand being service-dependent means that a customer may exit the seller’s market after encountering an inventory stockout, which affects the demand in future periods. This research aims to investigate and analyze inventory control policies under these settings, empirically evaluating the performance of various solution approaches. We propose four different Markov decision process (MDP) models under partial and full observability, in constrained and unconstrained settings, for a finite and an infinite planning horizon. We use exact and approximate solution methods to solve the proposed models. Our study reveals insights into the structure of the inventory ordering policies. The analysis shows that constrained models produce more aggressive ordering strategies, leading to increased order quantities to mitigate stockout risks. Results from the constrained partially observable model demonstrate the intricate interactions between budget and partial observability. Lastly, the impact of the budget weakens as the state space of the partially observable model gets larger.
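To make the idea of service-dependent demand concrete, here is a deliberately small, fully observable, unconstrained, finite-horizon toy solved by backward induction. It is not one of the four models in the thesis. Every modelling choice is an assumption made for illustration: the state is the number of active customers, each active customer wants one unit per period with probability `buy_prob`, unsold units are discarded, and every customer who meets a stockout leaves for good.

```python
from math import comb


def binomial_pmf(n, p):
    """P(D = d) for d = 0..n when each of n customers buys with probability p."""
    return [comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(n + 1)]


def solve_newsvendor_mdp(max_customers, horizon, buy_prob, price, unit_cost):
    """Backward induction over the number of active customers (toy model).

    Returns (values, policy): values[n] is the optimal expected profit from the
    first period with n active customers, and policy[t][n] is the order
    quantity chosen in period t with n active customers.
    """
    values = [0.0] * (max_customers + 1)  # terminal value: nothing left to earn
    policy = []
    for _ in range(horizon):
        new_values = [0.0] * (max_customers + 1)
        orders = [0] * (max_customers + 1)
        for n in range(max_customers + 1):
            pmf = binomial_pmf(n, buy_prob)
            best_value, best_order = float("-inf"), 0
            for q in range(n + 1):  # ordering more than n units is never useful
                value = -unit_cost * q
                for d, prob in enumerate(pmf):
                    sold = min(d, q)
                    lost = d - sold  # unserved customers exit the market
                    value += prob * (price * sold + values[n - lost])
                if value > best_value:
                    best_value, best_order = value, q
            new_values[n] = best_value
            orders[n] = best_order
        values = new_values
        policy.append(orders)
    policy.reverse()  # computed last period first; reorder to t = 0..horizon-1
    return values, policy


values, policy = solve_newsvendor_mdp(
    max_customers=5, horizon=3, buy_prob=0.5, price=10.0, unit_cost=3.0
)
```

Because a stockout shrinks the future customer base, the order quantity in this toy model is never below the single-period (myopic) newsvendor quantity for the same state.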

Zaidi, Saniya – Towards Multimodal Emotion Recognition: Insights from ECG, GSR, and Eye Tracking Signals (Supervisor: Naimul Khan)

Tracking physiological and behavioral signals provides insights into human emotions. This study aimed to predict emotional states using deep learning on Electrocardiogram (ECG), Galvanic Skin Response (GSR), and Eye Tracking data. Experiments were conducted on the VREED and MAHNOB datasets, both using VR environments. Signals were combined (e.g., ECG with GSR), saved as images, and then analyzed using a ResNet-34 model. On VREED, the model achieved up to 70% accuracy, driven mainly by Eye Tracking, while on MAHNOB accuracy peaked at 42%, with GSR as the most impactful signal. All experiments were subject-independent to prevent data leakage. The findings emphasize the varying effectiveness of each signal in emotion recognition, guiding future research in this area.
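The subject-independent setup mentioned above can be sketched as a split by participant rather than by sample, so that no person contributes data to both the training and test sets. This is a minimal illustration, not the thesis's code: the function name, the 20% test fraction, and the subject labels are all assumptions.

```python
import random


def subject_independent_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that each subject lands entirely in one set.

    subject_ids[i] is the participant who produced sample i.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(subjects)
    num_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:num_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Hypothetical recording: 5 participants with 4 samples each.
sample_subjects = [f"P{p}" for p in range(5) for _ in range(4)]
train_idx, test_idx = subject_independent_split(sample_subjects)
```

A random split over individual samples would let the model see other recordings of a test participant during training, which is the leakage a subject-independent protocol avoids.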

2023

de Guzman, Patrick – Transformer Models for Automated Bug Triaging and Duplicate Bug Detection (Supervisor: Mucahit Cevik; Co-supervisor: Ayse Basar Bener)

In the software engineering field, developer teams must handle bug reports of varying sources and formats to maintain and optimize software applications as issues arise. A team’s workflow for handling bugs involves multiple stages to review, assess, assign, and resolve bugs. For teams maintaining large-scale applications, streamlining such processes is vital for efficient operations, as they are exposed to greater volumes and varieties of bug reports. This thesis focuses on bug triaging and duplicate bug detection as preliminary processing options for the automated bucketing and assignment of bugs. In the bug triaging task, Transformer-based models are found to outperform competing approaches in mean Rank-5, Rank-10, and Mean Reciprocal Rank across several open-source datasets for various software projects. In the duplicate bug detection task, similarity learning is employed, and Transformer-based Siamese models with domain adaptation are shown to improve similarity learning, with gains in mean Area under the Curve, Recall-rate @ k, and Mean Reciprocal Rank.
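The ranking metrics named in this abstract can be illustrated with a short sketch. It assumes the simplest setting, in which each query (a bug report) has exactly one correct answer (the right developer or the known duplicate) and the model returns a ranked list of candidates. The function names and toy data are assumptions, not taken from the thesis.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of the correct item; contributes 0 when it is absent."""
    total = 0.0
    for ranking, target in zip(rankings, targets):
        for rank, candidate in enumerate(ranking, start=1):
            if candidate == target:
                total += 1.0 / rank
                break
    return total / len(rankings)


def recall_rate_at_k(rankings, targets, k):
    """Fraction of queries whose correct item appears in the top k candidates."""
    hits = sum(1 for ranking, target in zip(rankings, targets)
               if target in ranking[:k])
    return hits / len(rankings)


# Toy example: three bug reports, each with a ranked list of candidates.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "z"]  # "z" never appears, so the third query scores 0

mrr = mean_reciprocal_rank(rankings, targets)
recall_at_2 = recall_rate_at_k(rankings, targets, k=2)
```

With a single correct answer per query, Rank-k accuracy and Recall-rate @ k coincide: both measure how often the correct item appears among the top k candidates.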

Helmeczi, Robert Kraig – Few-Shot Learning for Text Classification and Its Applications in Essay Scoring and Software Engineering (Supervisor: Mucahit Cevik)

Few-shot learning (the ability to train models with access to limited data) has become increasingly popular in the natural language processing (NLP) domain, as large language models such as GPT and T0 have been empirically shown to achieve high performance in numerous tasks with access to just a handful of labeled examples. Smaller language models such as BERT and its variants have also been shown to achieve strong performance with just a handful of labeled examples when combined with few-shot learning algorithms like pattern-exploiting training (PET) and SetFit. The focus of this thesis is to investigate the performance of alternative few-shot learning approaches with BERT-based models. Specifically, vanilla fine-tuning, PET, and SetFit are compared for numerous BERT-based checkpoints over an array of training set sizes. To facilitate this investigation, applications of few-shot learning are considered in automatic essay scoring (the task of automatically grading written assessments) as well as in software engineering. For each task, high-performance techniques and their associated model checkpoints are identified through detailed empirical analysis. Our results establish PET as a strong few-shot learning approach, and our analysis shows that with just a few hundred labeled examples it can achieve performance near that of fine-tuning on full-sized data sets.