Implementations of Decision Tree Learning (DTL), Logistic Regression, and Support Vector Machine (SVM) for classifying student study status. The repository provides from-scratch versions and scikit-learn baselines with a full preprocessing pipeline (imputation, outlier handling, encoding, normalization, PCA, SMOTE), plus model save/load support and Kaggle submission generation.
Nasihuy-Tubes2-IF3170/
├── data/
│ ├── sample_submission.csv
│ ├── test.csv
│ └── train.csv
├── doc/
│ └── IF3170_Artificial_Intelligence___Tugas_Besar_2_Notebook_Nasihuy.ipynb
├── Spek/
│ ├── 1763033934646_Tugas-Besar-2-IF3170-Inteligensi-Artifisial-2025_2026 (1).pdf
│ ├── laporan.txt
│ └── spek.txt
├── src/
│ ├── data/ # DataLoader, EDA module
│ ├── model/ # From-scratch DTL/LogReg/SVM implementations + visualizer
│ ├── preprocessing/ # Cleaning, feature engineering, encoding, scaling/normalization, PCA, SMOTE
│ ├── storage/ # Save/load model (pkl/txt)
│ ├── test/ # Scripts comparing the models against sklearn
│ ├── main.py # Interactive training flow
│ ├── submission.py # Kaggle submission generator
│ └── perbandingan_model.txt
├── README.md
└── requirements.txt
- Python 3.10+ (tested on 3.11)
- Install dependencies:
pip install -r requirements.txt
- Train/test data are downloaded automatically from Google Drive (links are in the code):
train_url = https://drive.google.com/uc?id=1wzTvPSwjAK5PN0iCWEXy92_tim-5ggjs
test_url = https://drive.google.com/uc?id=1ZoKNPeUAIIFIZHoKaY6_4R_fUqDue0HM
- Run the main pipeline:
python src/main.py
- Load data → brief EDA → train/val split → cleaning → preprocessing (imputation, IQR outlier handling, one-hot encoding + PowerTransformer/StandardScaler, PCA retaining 95% of variance) → SMOTE → choose a model (LogReg/SVM/DTL) → evaluation → optionally save the model.
- Compare the from-scratch models against sklearn:
python src/test/test_dtl.py
python src/test/test_logres.py
python src/test/test_svm.py
- Build a submission (requires a previously saved `.pkl` model):
python src/submission.py
- Select a model file → fit the pipeline → transform test → produce `submission.csv`.
python src/main.py
Main steps:
- Load the train/test data.
- Brief EDA (info, missing values).
- Split train/val (80/20, stratified).
- Cleaning (check missing values, drop duplicates).
- Preprocessing: FeatureEngineering (Age binning), imputation, IQR outlier handling, OneHotEncoder + PowerTransformer/StandardScaler, PCA retaining 95% of variance.
- SMOTE on the training set.
- Choose and train a model (LogReg/SVM/DTL), display the metrics, and save the model.
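The IQR outlier step above can be illustrated with a small standard-library sketch. It assumes outliers are clipped to the 1.5×IQR fences and that the fences are computed on the training split only; the repository may instead drop rows or use a different multiplier. The function names `iqr_bounds` and `clip_outliers` and the sample ages are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, bounds):
    """Clip every value into the [low, high] interval."""
    low, high = bounds
    return [min(max(v, low), high) for v in values]


# Hypothetical ages: the fences are fitted on train only, then reused on val,
# so the validation split does not influence the preprocessing.
train_ages = [18, 19, 20, 21, 22, 23, 24, 70]
bounds = iqr_bounds(train_ages)
clipped_train = clip_outliers(train_ages, bounds)
clipped_val = clip_outliers([17, 25, 90], bounds)

print(bounds)
print(clipped_train)
print(clipped_val)
```

Fitting the fences on the training split and reusing them for validation and test mirrors how the rest of the pipeline is fitted before the test set is transformed.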
- Decision Tree (scratch vs sklearn):
python src/test/test_dtl.py
- Logistic Regression (scratch vs sklearn):
python src/test/test_logres.py
- SVM (scratch vs sklearn, PEGASOS):
python src/test/test_svm.py
Printed metrics: Accuracy, Balanced Accuracy, Precision/Recall/F1 (weighted), and a classification report. Some scripts also save comparison plots.
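For reference, two of the metrics listed above can be written out in plain Python. This is a minimal sketch of the standard definitions (balanced accuracy as the mean per-class recall, weighted F1 as the support-weighted mean of per-class F1), not the code the test scripts use. The label "Graduate" and the toy predictions are assumptions for illustration.

```python
from collections import Counter


def _counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return tp, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    tp, _, fn = _counts(y_true, y_pred)
    recalls = [tp[c] / (tp[c] + fn[c]) for c in set(y_true)]
    return sum(recalls) / len(recalls)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    tp, fp, fn = _counts(y_true, y_pred)
    support = Counter(y_true)
    total = 0.0
    for c, n in support.items():
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c])
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy example: the majority class is predicted well, the minority classes less so.
y_true = ["Graduate"] * 4 + ["Dropout"] * 2 + ["Enrolled"] * 2
y_pred = ["Graduate"] * 4 + ["Dropout", "Graduate", "Enrolled", "Graduate"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, balanced_accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```

In this toy case plain accuracy is 0.75 while balanced accuracy is about 0.67, which shows why the imbalance-aware metric is reported alongside accuracy.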
Make sure a saved model (pkl) already exists in src/storage/ or the working directory.
python src/submission.py
Flow: load data → fit the preprocessing pipeline → select a `.pkl` model file → transform test → produce `submission.csv` (the file name can be chosen) for upload to Kaggle.
- DTL (`src/model/dtl.py`): CART with Gini/Entropy, supports numerical/categorical features, pre-pruning via `max_depth`/`min_samples_split`.
- Logistic Regression (`src/model/logres.py`): binary and multiclass OvR, gradient descent, optional L1/L2 regularization, early stopping based on `tol`.
- SVM (`src/model/svm.py`): PEGASOS primal SGD, linear/RBF/polynomial kernels, OvA/OvO strategies.
- All models inherit from `BaseModel`, so they can be saved/loaded via `storage`.
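To make the PEGASOS entry above concrete, here is a minimal sketch of the textbook primal update for a binary, linear, bias-free SVM. It is not the code in `src/model/svm.py`, which also covers kernels and OvA/OvO; the function names, hyperparameter values, and toy data are assumptions.

```python
import random


def pegasos_train(X, y, lam=0.01, n_iters=1000, seed=42):
    """Linear PEGASOS: SGD on the regularized hinge loss with step 1/(lam*t)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))          # pick one training example at random
        eta = 1.0 / (lam * t)              # decaying learning rate
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]   # shrink step from the L2 term
        if margin < 1:                     # hinge loss is active: move towards y*x
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def pegasos_predict(w, X):
    """Return +1/-1 according to the sign of the linear score."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]


# Toy, linearly separable data with labels in {+1, -1}.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

w = pegasos_train(X, y)
predictions = pegasos_predict(w, X)
print(w, predictions)
```

A multiclass OvA wrapper would train one such weight vector per class against the rest and predict the class with the highest score.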
- PCA: `n_components=0.95`, `random_state=42`, run after encoding/scaling. Check the cumulative variance in the notebook or when the pipeline is fitted.
- SMOTE: applied to the training set after preprocessing to balance the classes (especially Enrolled/Dropout).
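The idea behind SMOTE can be sketched in a few lines: each synthetic sample is an interpolation between a minority sample and one of its nearest minority neighbours. The version below is a simplified illustration using only the standard library, not the implementation the project calls; `smote_oversample`, `k`, and the toy points are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class in an already scaled 2-D feature space.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote_oversample(minority, n_new=6)
print(new_points)
```

Because the new points lie between existing minority samples, the oversampling is applied only to the training set; resampling the validation or test data would distort the evaluation.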
- Use `random_state=42` (already set in the code).
- Run the scripts on Python 3.10/3.11 with the same requirements installed.
- Save the run outputs (metrics/plots) for documentation in the report.
| No | Task | NIM |
|---|---|---|
| 1 | Exploratory Data Analysis (EDA) & Data Cleaning | 13523132, 13523150, 13523124 |
| 2 | Data Transformation & Feature Selection | 13523132, 13523150, 13523124 |
| 3 | Dimensionality Reduction (PCA) | 13523132, 13523124 |
| 4 | Decision Tree Learning (DTL) Implementation | 13523155 |
| 5 | Logistic Regression Implementation | 13523124, 13523155 |
| 6 | Support Vector Machine (SVM) Implementation | 13523136 |
| 7 | Tree Branch Visualization for DTL [BONUS] | 13523155, 13523136 |
| 8 | Report | 13523124, 13523132, 13523136, 13523150, 13523155 |