MyHateDetect is a dual-stage classification platform for detecting and categorising hate speech in English and Malay texts. Built with a focus on code-switched social media texts (tweets), the system uses multilingual BERT (mBERT) to ensure high accuracy across diverse linguistic contexts.
The binary classification model (Stage 1) and the underlying dataset used in this project have been formally published:
- Research Paper: A bilingual Malay-English social media dataset for binary hate speech detection (Published in Data in Brief, ScienceDirect).
- Official Dataset: A Bilingual Malay-English Social Media Dataset for Binary Hate Speech Detection (Hosted on Mendeley Data).
- Stage 1: Binary classification (hate vs non-hate) determines whether a tweet contains hate speech.
- Stage 2: Multi-label hate type classification identifies the specific nature of the hate speech (Race, Religion, Gender, Sexual Orientation).
The system is fine-tuned on 10,000 bilingual tweets. mBERT was selected for deployment due to its performance in both stages.
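A rough sketch of how the dual-stage flow fits together is shown below. This is illustrative only: the actual pipeline lives in app/stage_predict.py, and the predictor callables and 0.5 threshold are assumptions.

```python
# Conceptual sketch of the dual-stage pipeline; the real implementation is in
# app/stage_predict.py. Stage 1 and Stage 2 predictors are passed in as callables.
from typing import Callable, Dict, List, Sequence

HATE_TYPES = ["Race", "Religion", "Gender", "Sexual Orientation"]

def classify_tweet(
    tweet: str,
    stage1: Callable[[str], bool],              # binary: hate vs non-hate
    stage2: Callable[[str], Sequence[float]],   # per-type probabilities
    threshold: float = 0.5,                     # illustrative cut-off
) -> Dict[str, object]:
    """Stage 1 acts as a filter; Stage 2 runs only on tweets flagged as hate."""
    if not stage1(tweet):
        return {"hate": False, "types": []}
    scores = stage2(tweet)
    types: List[str] = [t for t, s in zip(HATE_TYPES, scores) if s >= threshold]
    return {"hate": True, "types": types}

# Toy usage with stand-in predictors (the real ones are the fine-tuned mBERT models):
print(classify_tweet("example tweet", stage1=lambda t: True, stage2=lambda t: [0.9, 0.7, 0.1, 0.0]))
```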
- Stage 1: Shows high True Positive counts, effectively filtering toxic content.
- Stage 2: Reveals how categories such as Race and Religion often overlap in toxic discourse.

- Dual-Stage Pipeline: Optimised detection architecture. Stage 1 acts as a filter (Hate vs. Non-Hate), while Stage 2 performs deep-dive categorisation.
- Multilingual BERT (mBERT) Integration: Specifically fine-tuned for high accuracy in both English and Malay, outperforming standard monolingual models.
- Role-Based Access Control (RBAC): Secure access for Admins (system management and CSV uploads) and Policymakers (read-only visualisation and trend analysis).
- Zero-Config NLP: Automated NLTK resource setup on first launch.
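A minimal sketch of what the zero-config NLTK setup could look like is shown below. The actual logic lives in app/text_utils.py; the specific resource names here are assumptions.

```python
# Sketch of automatic NLTK setup: download required resources on first launch
# only if they are missing. Resource names are illustrative; see app/text_utils.py.
import nltk

def ensure_nltk_resources() -> None:
    resources = {
        "stopwords": "corpora/stopwords",
        "punkt": "tokenizers/punkt",
    }
    for name, path in resources.items():
        try:
            nltk.data.find(path)            # already downloaded?
        except LookupError:
            nltk.download(name, quiet=True)

ensure_nltk_resources()
```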
```
MyHateDetect/
├── app/
│   ├── templates/         # HTML for dashboard, visualisation, auth
│   ├── routes/            # Flask Blueprints
│   ├── static/            # Logo used in UI
│   ├── stage_predict.py   # Final prediction script (Stage 1 + Stage 2)
│   ├── text_utils.py      # Preprocessing & automatic NLTK setup
│   └── utils.py           # Progress bar and database helpers
├── experiment/
│   ├── stage1/            # Binary classification: training notebooks, model weights & performance visuals
│   └── stage2/            # Multi-label classification: training notebooks, model weights & visuals
├── sample_uploads         # Sample dataset files for tweet uploads and user registration
├── slangdict              # Dictionary for normalising slang and toxic terms
├── sql query/
│   └── myhatedetect.sql   # MySQL database dump
├── requirements.txt       # Dependencies for the web application
├── run.py                 # Entry point for the Flask app
└── README.md              # Project documentation and setup guide
```
1. Clone the Repository

```bash
git clone https://github.com/JunTan03/FYP-MyHateDetect.git
cd FYP-MyHateDetect
```

2. Install Dependencies

```bash
pip install -r requirements.txt
```
3. Model Weights

Due to GitHub's file size limitations (100 MB), the fine-tuned BERT models (model.safetensors) are not included in this repository.

To run the prediction pipeline locally:
- Download the model weights from the [Google Drive folder](https://drive.google.com/drive/folders/11DBAdZg2rDUveGkMJe-EDdC8tza94gKo?usp=drive_link).
- Place the files into the following directories:
  - `experiment/stage1/s1_mb_model/`
  - `experiment/stage2/s2_mb_model/`
- Ensure `config.json` and `tokenizer_config.json` are also present in the same folders.
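Once the weights are in place, they can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, not the project's actual loading code (which lives in app/stage_predict.py); the assumption that label index 1 means "hate" is illustrative.

```python
# Minimal loading sketch for the downloaded Stage 1 weights (Stage 2 loads the same way
# from experiment/stage2/s2_mb_model). See app/stage_predict.py for the real pipeline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

STAGE1_DIR = "experiment/stage1/s1_mb_model"

tokenizer = AutoTokenizer.from_pretrained(STAGE1_DIR)
model = AutoModelForSequenceClassification.from_pretrained(STAGE1_DIR)
model.eval()

inputs = tokenizer("contoh tweet untuk diuji", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("hate" if logits.argmax(dim=-1).item() == 1 else "non-hate")  # assumes label 1 = hate
```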
4. Database Setup
- Ensure MySQL is running.
- Create the database:

```bash
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS myhatedetect;"
```

- Populate the database using the provided SQL dump:

```bash
mysql -u root -p myhatedetect < "sql query/myhatedetect.sql"
```
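To confirm the import worked, the tables can be listed from Python. This is an optional sanity check, not part of the application; it assumes a MySQL driver such as PyMySQL is installed (the project may use a different driver) and the default root credentials shown above.

```python
# Optional sanity check: list the tables imported from myhatedetect.sql.
# Assumes PyMySQL (pip install pymysql); adjust credentials to your MySQL setup.
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="", database="myhatedetect")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW TABLES;")
        for (table_name,) in cur.fetchall():
            print(table_name)
finally:
    conn.close()
```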
5. Run the Application

```bash
python run.py
```

Visit: http://localhost:5000
| Role | Email | Password |
|---|---|---|
| Admin | jtan4148@gmail.com | 12345678 |
| Policymaker | (Create via Admin) | (Create via Admin) |
- Column must be `text` or `tweet`
- Duplicate `file_name` and `month` will be skipped
- All inputs are cleaned and language-detected
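For illustration, the sketch below shows a quick local check that an upload file meets the column rule above. The file name and the pandas-based check are hypothetical conveniences, not part of the application itself.

```python
# Hypothetical pre-upload check for a tweets CSV (assumes pandas is installed).
# The upload rule above requires a single text column named "text" or "tweet".
import pandas as pd

df = pd.read_csv("sample_uploads/tweets_jan.csv")  # hypothetical file name

text_col = next((c for c in ("text", "tweet") if c in df.columns), None)
if text_col is None:
    raise ValueError("CSV must contain a 'text' or 'tweet' column")

print(f"{len(df)} rows found in column '{text_col}'")
```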
This project is licensed under the MIT License - see the LICENSE file for details.