Enhancing Harassment Detection Using Specialist-Driven Modular AI
Background
The rapid growth of digital platforms has brought a corresponding rise in technology-facilitated violence, making effective content moderation more critical than ever. Developing automated systems to detect and analyze user-generated harassment reports remains a significant challenge: prior monolithic artificial intelligence models struggled to accurately classify diverse harassment types, such as commenting, ogling, and groping, because each category manifests through distinct linguistic cues. The primary goal of this project was to systematically evaluate existing AI models for harassment detection and enhance their capabilities through the development of an Adaptive Specialist-driven Harassment Detection System (AD-ASH). By testing various models on the SafeCity dataset, we delegated specific harassment classification sub-tasks to the most effective models to optimize overall detection accuracy. The key goals included conducting a comparative evaluation across classic and modern architectures, developing a modular system, analyzing dataset quality, and establishing a framework for future contextual enhancements using knowledge graphs. The result is a robust template for moderation systems, with automated detection tailored to the complexity of real-world classification tasks.
Project
Gender-based violence (GBV) is a pervasive problem that has increasingly extended into digital spaces. In Canada alone, nearly one in five women experience technology-facilitated harassment every year. For many, the only way to stay safe is to block abusers, self-censor, or delete their accounts entirely. As the volume of online content keeps growing, automated systems that flag harmful posts are urgently needed, yet building tools that work reliably is a difficult technical problem. While foundational research established crowdsourced datasets such as SafeCity to train these systems, existing solutions primarily rely on monolithic models, such as single Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs). These singular architectures often miss type-specific cues: verbal harassment like commenting relies on direct quotes, while visual harassment like ogling involves spatial terminology. Effective moderation systems must therefore handle these diverse linguistic structures rather than relying on a single, generalized approach.
This study addressed this limitation by evaluating a wide spectrum of natural language processing tools, ranging from fine-tuned Small Language Models like BERT to advanced prompting and Retrieval-Augmented Generation (RAG) strategies for Large Language Models like Llama-3.1 and DeepSeek. Our findings revealed that different models excelled at different tasks: fine-tuned BERT performed best for "commenting," CNN-RNN outperformed on "ogling," and fine-tuned DeepSeek achieved the highest accuracy for "groping." Through this comparative analysis, the project designed an adaptive ensemble framework that dynamically assigns each classification task to the highest-performing specialist model. Ultimately, this work advanced the reliability of automated moderation, achieving improved multi-label detection accuracy and highlighting critical data curation needs for sensitive artificial intelligence applications. The primary goal of the project was to evaluate existing models and develop a specialized, modular artificial intelligence system that accurately classifies online harassment narratives. To achieve this, we focused on four specific objectives:
- Systematic Model Evaluation: We conducted a comprehensive comparative analysis of diverse techniques for classifying sexual harassment narratives. This involved evaluating classic architectures (CNNs and RNNs), modern transformer-based models (fine-tuned BERT), and advanced strategies for Large Language Models to identify the highest-performing algorithm for each specific harassment type.
- Modular System Architecture Development: We engineered the Adaptive Specialist-driven Harassment Detection System (AD-ASH), an extensible framework designed to overcome the limitations of monolithic models. This system dynamically routed each classification sub-task to its respective specialist model, successfully aggregating binary decisions into a highly accurate multi-label output.
- Dataset Quality Analysis and Validation: We performed a critical validation study of the SafeCity dataset through independent expert re-labeling. This analysis identified significant label mismatch rates (exceeding 21% for commenting and ogling) and directional biases, providing essential context for model performance variations and demonstrating the critical need for rigorous annotation protocols in sensitive AI research.
- Framework for Contextual Enhancements: We established a modular foundation capable of supporting future contextual modeling components. This architecture was designed to allow the integration of knowledge graphs to capture relational structures specific to sexual harassment, aiming to further enhance the contextual reasoning of Large Language Models in subsequent research phases.
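The routing-and-aggregation idea behind AD-ASH can be sketched as follows. This is a minimal illustration, not the project's actual code: the three predictor functions are toy keyword stand-ins for the fine-tuned BERT, CNN-RNN, and fine-tuned DeepSeek specialists named above, and the routing table simply maps each category to its best-performing specialist.

```python
# Sketch of the AD-ASH routing idea: each harassment category is handled by
# the specialist model that performed best for it, and the binary decisions
# are aggregated into one multi-label prediction. The predictors below are
# illustrative keyword heuristics, not the real fine-tuned models.
from typing import Callable, Dict, List


def bert_commenting(text: str) -> bool:
    # Toy stand-in for fine-tuned BERT: verbal harassment often carries quotes.
    return "said" in text.lower() or '"' in text


def cnn_rnn_ogling(text: str) -> bool:
    # Toy stand-in for the CNN-RNN specialist: visual-harassment vocabulary.
    return any(w in text.lower() for w in ("staring", "looking", "watched"))


def deepseek_groping(text: str) -> bool:
    # Toy stand-in for fine-tuned DeepSeek: physical-contact vocabulary.
    return any(w in text.lower() for w in ("grabbed", "touched", "groped"))


# Routing table: category -> best specialist, per the comparative evaluation.
SPECIALISTS: Dict[str, Callable[[str], bool]] = {
    "commenting": bert_commenting,
    "ogling": cnn_rnn_ogling,
    "groping": deepseek_groping,
}


def classify(text: str) -> List[str]:
    """Run every specialist and aggregate binary decisions into a multi-label output."""
    return [label for label, model in SPECIALISTS.items() if model(text)]


print(classify('He grabbed my arm and kept staring at me.'))
# -> ['ogling', 'groping']
```

Because the routing table is just a mapping, swapping in a stronger specialist for one category leaves the others untouched, which is the extensibility the modular architecture aims for.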
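The dataset validation step reduces, at its core, to measuring disagreement between the original SafeCity annotations and the independent expert re-labels. A minimal sketch of that per-category computation, using fabricated binary labels rather than SafeCity data:

```python
# Toy illustration of the dataset-quality check: compare original labels
# against independent expert re-labels and report the mismatch rate for one
# category. The label lists below are fabricated for illustration only.


def mismatch_rate(original, relabeled):
    """Fraction of items where the original annotation disagrees with the expert."""
    assert len(original) == len(relabeled), "label lists must be aligned"
    disagreements = sum(o != r for o, r in zip(original, relabeled))
    return disagreements / len(original)


# Hypothetical binary labels for one category across 10 reports.
orig_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
expert_labels = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
print(f"mismatch rate: {mismatch_rate(orig_labels, expert_labels):.0%}")
# -> mismatch rate: 30%
```

In the actual study this rate exceeded 21% for the commenting and ogling categories; comparing which direction the disagreements run (original positive vs. expert positive) is what surfaces the directional biases noted above.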
Research Team
- Karen Soldatić, PI, CERC Health Equity and Community Wellbeing, Toronto Metropolitan University, ON, Canada
- Enas AlTarawneh, Postdoctoral Fellow - Digital Health Equity and Accessibility, CERC Health Equity and Community Wellbeing, Toronto Metropolitan University, ON, Canada
- Kshitiz Pokhrel, Research Assistant, CERC Health Equity and Community Wellbeing, Toronto Metropolitan University, ON, Canada
- Deeksha Chandola, Research Assistant, CERC Health Equity and Community Wellbeing, Toronto Metropolitan University, ON, Canada
- Glaucia Melo, Assistant Professor, Toronto Metropolitan University, ON, Canada
Funding
- This research project is supported by the CERC Health Equity and Community Wellbeing.
Period
- September 2024 - November 2025
Publications
- E. Altarawneh, K. Pokhrel, D. Chandola, G. Melo and K. Soldatic, "Beyond Monolithic LLMs: Modular AI for Online Harassment Detection," 2025 IEEE International Conference on Collaborative Advances in Software and COmputiNg (CASCON), Toronto, ON, Canada, 2025, pp. 24-29, doi: 10.1109/CASCON66301.2025.00021.