Another AI (Artificial Intelligence) project made by 5 AI (Actual Indonesians)

RealNath/Nasihuy

IF3170 Major Assignment 2 (Tugas Besar 2)

Implementations of Decision Tree Learning (DTL), Logistic Regression, and Support Vector Machine (SVM) for classifying students' academic status. The repository provides from-scratch versions alongside scikit-learn baselines, a full preprocessing pipeline (imputation, outlier handling, encoding, normalization, PCA, SMOTE), model save/load support, and Kaggle submission generation.

Repository Structure

Nasihuy-Tubes2-IF3170/
├── data/
│   ├── sample_submission.csv
│   ├── test.csv
│   └── train.csv
├── doc/
│   └── IF3170_Artificial_Intelligence___Tugas_Besar_2_Notebook_Nasihuy.ipynb
├── Spek/
│   ├── 1763033934646_Tugas-Besar-2-IF3170-Inteligensi-Artifisial-2025_2026 (1).pdf
│   ├── laporan.txt
│   └── spek.txt
├── src/
│   ├── data/               # DataLoader, EDA module
│   ├── model/              # From-scratch DTL/LogReg/SVM implementations + visualizer
│   ├── preprocessing/      # Cleaning, feature engineering, encoding, scaling/normalization, PCA, SMOTE
│   ├── storage/            # Model save/load (pkl/txt)
│   ├── test/               # Scripts comparing the models against sklearn
│   ├── main.py             # Interactive training flow
│   ├── submission.py       # Kaggle submission generator
│   └── perbandingan_model.txt
├── README.md
└── requirements.txt

Prerequisites

  • Python 3.10+ (tested on 3.11)
  • Install the dependencies:
    pip install -r requirements.txt

Dataset

  • The train/test sets are downloaded automatically from Google Drive (links are set in the code):
    train_url = https://drive.google.com/uc?id=1wzTvPSwjAK5PN0iCWEXy92_tim-5ggjs
    test_url = https://drive.google.com/uc?id=1ZoKNPeUAIIFIZHoKaY6_4R_fUqDue0HM

Quick Start

  1. Run the main pipeline:
    python src/main.py
    • Load data → brief EDA → train/val split → cleaning → preprocessing (imputation, IQR outlier handling, OHE + PowerTransformer/StandardScaler, PCA at 95% variance) → SMOTE → choose a model (LogReg/SVM/DTL) → evaluation → optionally save the model.
  2. Run the scratch vs sklearn comparison tests:
    python src/test/test_dtl.py
    python src/test/test_logres.py
    python src/test/test_svm.py
  3. Create a submission (requires a previously saved .pkl model):
    python src/submission.py
    • Choose a model file → fit the pipeline → transform the test set → write submission.csv.

Running the Main Pipeline

python src/main.py

Main steps:

  1. Load the train/test data.
  2. Brief EDA (info, missing values).
  3. Train/val split (80/20, stratified).
  4. Cleaning (check missing values, drop duplicates).
  5. Preprocessing: FeatureEngineering (Age binning), imputation, IQR outlier handling, OneHotEncoder + PowerTransformer/StandardScaler, PCA at 95% variance.
  6. SMOTE on the training set.
  7. Choose and train a model (LogReg/SVM/DTL), display the metrics, save the model.
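
The split and preprocessing steps above can be approximated with scikit-learn as in the sketch below. This is not the repository's own pipeline code: the column names, the toy data, and the helper names `make_toy_frame`, `build_preprocessor`, and `split_and_transform` are assumptions made for illustration, and Age binning, IQR outlier handling, PowerTransformer, and SMOTE are left out to keep it short. The point it illustrates is that the pipeline is fitted on the training split only and then reused on the validation split.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_frame(n=100, seed=42):
    """Hypothetical stand-in for train.csv (column names are made up)."""
    rng = np.random.default_rng(seed)
    age = rng.normal(22.0, 3.0, n)
    age[::10] = np.nan  # inject some missing values
    return pd.DataFrame({
        "age": age,
        "grade": rng.normal(12.0, 2.0, n),
        "gender": rng.choice(["F", "M"], n),
        "course": rng.choice(["A", "B", "C"], n),
        "Target": np.where(np.arange(n) % 4 == 0, "Dropout", "Graduate"),
    })


def build_preprocessor(numeric_cols, categorical_cols):
    """Impute, encode, scale, then keep enough PCA components for 95% variance."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    columns = ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0.0,  # always return a dense array for PCA
    )
    return Pipeline([
        ("columns", columns),
        ("pca", PCA(n_components=0.95, random_state=42)),
    ])


def split_and_transform(df, target_col, numeric_cols, categorical_cols):
    """Stratified 80/20 split; fit preprocessing on train, reuse it on val."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    pre = build_preprocessor(numeric_cols, categorical_cols)
    Z_train = pre.fit_transform(X_train)  # statistics come from train only
    Z_val = pre.transform(X_val)          # no refitting on validation data
    return Z_train, Z_val, y_train, y_val, pre
```

Fitting on the training split only keeps validation statistics out of the imputer, scaler, and PCA, so the validation metrics are less likely to be optimistic.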

Model Tests and Comparison

  • Decision Tree (scratch vs sklearn):
    python src/test/test_dtl.py
  • Logistic Regression (scratch vs sklearn):
    python src/test/test_logres.py
  • SVM (scratch vs sklearn, PEGASOS):
    python src/test/test_svm.py

Metrics printed: Accuracy, Balanced Accuracy, Precision/Recall/F1 (weighted), and a classification report. Some scripts also save comparison plots.
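
For reference, the listed metrics can be computed by hand as in the following NumPy sketch. The function name `classification_metrics` is an assumption for illustration; the test scripts may simply call scikit-learn's metric functions instead. Balanced accuracy is taken as the mean of per-class recall, and the weighted scores average the per-class values by class support.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Accuracy, balanced accuracy, and support-weighted precision/recall/F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)

    precision, recall, f1, support = [], [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)
        support.append(tp + fn)  # number of true samples of class c

    weights = np.asarray(support, dtype=float) / len(y_true)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "balanced_accuracy": float(np.mean(recall)),
        "precision_weighted": float(np.dot(weights, precision)),
        "recall_weighted": float(np.dot(weights, recall)),
        "f1_weighted": float(np.dot(weights, f1)),
    }
```

On an imbalanced dataset, balanced accuracy and plain accuracy can differ noticeably, which is why both are worth reporting.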

Creating a Submission

Make sure a saved model (.pkl) exists in src/storage/ or in the working directory.

python src/submission.py

Flow: load data → fit the preprocessing pipeline → choose a .pkl model file → transform the test set → write submission.csv (the file name is configurable) for upload to Kaggle.
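
The last two steps of that flow might look roughly like the sketch below, which uses only the standard library. It is not the code in src/submission.py: the helper names `load_model` and `write_submission` and the column names `id` and `Target` are hypothetical, so check sample_submission.csv for the header the competition actually expects.

```python
import csv
import pickle


def load_model(path):
    """Load a pickled model. Only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


def write_submission(ids, predictions, out, id_col="id", target_col="Target"):
    """Write one (id, prediction) row per test sample to an open text file.

    The default column names are assumptions, not taken from the repository.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out)
    writer.writerow([id_col, target_col])
    writer.writerows(zip(ids, predictions))
```

A typical call would open the output with `open("submission.csv", "w", newline="")` and pass the predictions produced by the loaded model on the transformed test set.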

Notes on the From-Scratch Models

  • DTL (src/model/dtl.py): CART with Gini/Entropy, supports numerical/categorical features, pre-pruning via max_depth/min_samples_split.
  • Logistic Regression (src/model/logres.py): binary and multiclass (OvR), gradient descent, optional L1/L2 regularization, tol-based early stopping.
  • SVM (src/model/svm.py): PEGASOS primal SGD, linear/RBF/polynomial kernels, OvA/OvO strategies.
  • All models inherit from BaseModel, so they can be saved/loaded via storage.
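
As an illustration of the PEGASOS update mentioned for the SVM, here is a minimal linear, binary sketch in NumPy. It is not the code in src/model/svm.py: the function names and default hyperparameters are assumptions, kernels and the OvA/OvO wrappers are omitted, and the bias is handled by appending a constant feature (so it is regularized together with the weights).

```python
import numpy as np


def pegasos_train(X, y, lam=0.01, n_iters=2000, seed=42):
    """Linear PEGASOS for labels in {-1, +1}; returns weights (bias last)."""
    rng = np.random.default_rng(seed)
    # Append a constant column so the last weight acts as a bias term.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(Xb.shape[0])      # one random training sample
        eta = 1.0 / (lam * t)              # step size decays as 1 / (lambda * t)
        margin = y[i] * (Xb[i] @ w)        # computed before w is updated
        w *= 1.0 - eta * lam               # shrink step from the L2 penalty
        if margin < 1.0:                   # hinge loss is active for this sample
            w += eta * y[i] * Xb[i]
    return w


def pegasos_predict(X, w):
    """Predict -1 or +1 from the sign of the decision function."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)
```

For more than two classes, one such binary model per class (OvA) or per class pair (OvO) would be trained and their decisions combined.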

PCA and Class Imbalance

  • PCA: n_components=0.95, random_state=42, applied after encoding/scaling. Check the cumulative variance in the notebook or when the pipeline is fitted.
  • SMOTE: applied to the training set after preprocessing to balance the classes (mainly Enrolled/Dropout).
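
The idea behind SMOTE can be sketched in a few lines of NumPy: each synthetic sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbors. The function name `smote_oversample` and its parameters are assumptions for illustration; the repository's preprocessing module may use a library implementation or different details.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=42):
    """Create n_new synthetic minority samples by neighbor interpolation.

    X_min holds only the minority-class rows (at least two of them).
    """
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(n, size=n_new)                      # sample to start from
    picked = neighbors[base, rng.integers(k, size=n_new)]   # one of its neighbors
    gap = rng.random((n_new, 1))                            # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

Oversampling belongs on the training split only; applying it to validation or test data would distort the evaluation.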

Reproducibility

  • Use random_state=42 (already set in the code).
  • Run the scripts on Python 3.10/3.11 and install the same requirements.
  • Save the run results (metrics/plots) as documentation for the report.

Task Distribution

No  Task                                               NIM (student ID)
1   Exploratory Data Analysis (EDA) & Data Cleaning    13523132, 13523150, 13523124
2   Data Transformation & Feature Selection            13523132, 13523150, 13523124
3   Dimensionality Reduction (PCA)                     13523132, 13523124
4   Decision Tree Learning (DTL) Implementation        13523155
5   Logistic Regression Implementation                 13523124, 13523155
6   Support Vector Machine (SVM) Implementation        13523136
7   Tree Branching Visualization for DTL [BONUS]       13523155, 13523136
8   Report                                             13523124, 13523132, 13523136, 13523150, 13523155
