AI-Driven Cybersecurity: The Future of Digital Defense

Project Chapter 13

✅ Chapter 13: Build Your First ML Threat Detection Model

A complete beginner-friendly project to detect cyber threats using Machine Learning


📌 Introduction

You’ve learned how AI and ML power modern security systems. Now it’s time to build your first real, working ML model for threat detection.

This chapter gives you a project that is:

  • simple but powerful
  • completely beginner-friendly
  • built on real-world cybersecurity data
  • free of deep math
  • runnable on Google Colab
  • doable in 30–45 minutes

By the end, you’ll create an ML model that can:

  ✔ detect malicious network traffic
  ✔ classify normal vs. attack behaviour
  ✔ be used for SOC automation
  ✔ serve as your first AI cybersecurity portfolio project

Let’s begin.


🎯 1. What You Will Build

You will build a Network Intrusion Detection Model using ML to detect:

  • port scanning
  • brute force attacks
  • DDoS
  • botnets
  • malicious flows

We use one of two popular public datasets:

CIDDS-001 or UNSW-NB15

Both simulate real enterprise network traffic.

Your project will:

  1. Load the dataset
  2. Select features
  3. Train a classification model
  4. Predict malicious vs. benign traffic
  5. Evaluate the results

🧰 2. Tools Needed (Free)

Everything is 100% free and cloud-based.

Use:

  • Google Colab (recommended)
  • Python 3
  • Scikit-Learn
  • Pandas
  • NumPy
  • Matplotlib

No installation needed.


📊 3. Understanding the Dataset (Simple Explanation)

Each row = one network flow

Contains:

  • source IP
  • destination IP
  • port
  • protocol
  • duration
  • bytes sent
  • packets
  • flags
  • labels (normal or attack)

The model learns which combinations of these values tend to show up in attacks and which in normal traffic.
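
To make the idea of "one row = one flow" concrete, here is a small sketch of what such a table might look like in Pandas. The column names and values are made up for illustration; the real names in CIDDS-001 and UNSW-NB15 differ, so check the header of your own CSV.

```python
import pandas as pd

# Hypothetical flow records; column names and values are invented for illustration.
flows = pd.DataFrame(
    {
        "src_ip": ["192.168.1.10", "192.168.1.10", "10.0.0.5"],
        "dst_ip": ["192.168.1.20", "192.168.1.20", "192.168.1.20"],
        "dst_port": [443, 22, 22],
        "protocol": ["tcp", "tcp", "tcp"],
        "duration": [1.8, 0.02, 0.01],   # seconds
        "bytes": [5400, 120, 96],
        "packets": [12, 3, 2],
        "label": [0, 1, 1],              # 0 = normal, 1 = attack
    }
)

# Text columns (IPs, protocol) versus numeric columns a model can use directly.
numeric_cols = flows.select_dtypes(include="number").columns.tolist()

print(flows.shape)     # (rows, columns)
print(numeric_cols)
```

Only the numeric columns can be fed to the model as they are; text columns such as IP addresses need to be dropped or encoded first.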


✨ 4. ML Workflow Overview

Here’s the exact flow of your project:

Dataset
   ↓
Data Cleaning
   ↓
Feature Selection
   ↓
Train-Test Split
   ↓
Model Training (Random Forest)
   ↓
Evaluation
   ↓
Threat Predictions

This pipeline is the standard pattern for supervised ML projects in cybersecurity.


🧪 5. Build the Model (Code Included)

Copy + paste this into Google Colab.


STEP 1: Install Libraries

!pip install pandas numpy scikit-learn matplotlib

STEP 2: Import Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt

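STEP 3: Load the Dataset

If using UNSW-NB15:

from google.colab import files
uploaded = files.upload()           # opens a file picker; choose your CSV
df = pd.read_csv('UNSW-NB15.csv')   # must match the name of the uploaded file

If using CIDDS-001, upload the file the same way, then:

df = pd.read_csv('cidds.csv')       # must match the name of the uploaded file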

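STEP 4: Basic Cleaning

df = df.select_dtypes(include=[np.number])   # keep numeric columns only
df = df.dropna()                             # then drop rows with missing values

This keeps only the numeric columns, then removes any rows that still contain missing values. Filtering the columns first means a gap in a text column you are discarding anyway cannot cost you a row. The label column must be numeric (for example 0 = normal, 1 = attack) to survive this step; if yours is text, convert it to numbers first.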
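
If your label column is text, the numeric-only filter would remove it along with the other text columns. A minimal sketch of one way around that, assuming a label column named `label` whose benign value is the string "normal" (both are assumptions; check your dataset's documentation):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for a dataset whose label is text (values are made up).
df = pd.DataFrame(
    {
        "duration": [1.2, 0.1, np.nan, 3.4],
        "bytes": [500, 40, 60, 900],
        "protocol": ["tcp", "udp", "tcp", "tcp"],
        "label": ["normal", "attacker", "attacker", "normal"],
    }
)

# Map the text label to 0/1 first so it survives the numeric-only filter.
# Treating "normal" as the benign value is an assumption for this sketch.
df["label"] = (df["label"] != "normal").astype(int)

df = df.select_dtypes(include=[np.number])  # drops "protocol", keeps "label"
df = df.dropna()                            # drops the row with the missing duration

print(df)
```

Everything that is not "normal" becomes 1 here, which collapses different attack types into a single "attack" class. That matches the benign vs. malicious goal of this chapter.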


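STEP 5: Define Features & Labels

X = df.drop('label', axis=1)   # every column except the label
y = df['label']

If your dataset names the label column “attack” instead, drop that column from the features too, or the model will see the answer:

X = df.drop('attack', axis=1)
y = df['attack']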

STEP 6: Train-Test Split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
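
Attack traffic is often a small share of a dataset, so a purely random split can leave the test set with too few attacks. A hedged sketch of one common remedy, the `stratify` argument of `train_test_split`, shown here on synthetic labels rather than on the real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 900 benign (0) and 100 attack (1).
X = np.arange(1000).reshape(-1, 1)
y = np.array([0] * 900 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# With stratify=y, both splits keep the same attack share as the full data.
print("Attack share in train:", y_train.mean())
print("Attack share in test: ", y_test.mean())
```

To try this in the project, add `stratify=y` to the call in Step 6; the rest of the code stays the same.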

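STEP 7: Train Random Forest Model

Random Forest is a good first model: it handles numeric features without scaling and needs little tuning.

model = RandomForestClassifier(n_estimators=150, random_state=42, n_jobs=-1)   # fixed seed for repeatable results; n_jobs=-1 uses all CPU cores
model.fit(X_train, y_train)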

STEP 8: Model Predictions

y_pred = model.predict(X_test)

STEP 9: Evaluate Model

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

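Accuracy above 85% is a reasonable first result, but compare it with the share of the most common class: if 90% of the flows are benign, a model that always answers “benign” already scores 90%.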
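
One way to put an accuracy figure in context is a majority-class baseline. The sketch below uses scikit-learn's `DummyClassifier` on synthetic labels (90% benign, an assumed ratio chosen for illustration) to show how high accuracy can coexist with zero detected attacks.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels: 90% benign (0), 10% attack (1).
# The features are placeholders; a dummy model ignores them.
y_train = np.array([0] * 900 + [1] * 100)
y_test = np.array([0] * 90 + [1] * 10)
X_train = np.zeros((len(y_train), 1))
X_test = np.zeros((len(y_test), 1))

# Always predicts the most frequent class seen in training (here: benign).
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
y_base = baseline.predict(X_test)

baseline_acc = accuracy_score(y_test, y_base)
baseline_recall = recall_score(y_test, y_base)   # recall for the attack class

print("Baseline accuracy:     ", baseline_acc)
print("Baseline attack recall:", baseline_recall)
```

Your Random Forest is only useful to the extent that it beats this baseline, especially on attack recall.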


🧠 6. Understanding Your Model’s Output

Your output will look like:

Accuracy: 0.91
             Precision    Recall    F1-score
Benign         0.92        0.88       0.90
Malicious      0.90        0.93       0.91

What this means:

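  • Accuracy 91% → 91 of every 100 test flows were classified correctly
  • Precision → Of the flows flagged as a class, the share that truly belong to it
  • Recall → Of the flows that truly belong to a class, the share the model found
  • F1-score → Harmonic mean of precision and recall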
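
The metrics above all come from four counts: true positives, false positives, true negatives, and false negatives. A small worked sketch on ten made-up flows shows the arithmetic for the malicious class:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for ten flows: 1 = malicious, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary 0/1 labels, ravel() returns the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # flagged flows that were really malicious
recall = tp / (tp + fn)      # malicious flows that were actually flagged
f1 = 2 * precision * recall / (precision + recall)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("Precision:", precision)
print("Recall:   ", recall)
print("F1-score: ", f1)
```

Here one attack was missed (a false negative) and one benign flow was flagged by mistake (a false positive), so precision and recall both come out at 3 out of 4.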

Security analysts focus on recall, because missing attacks is dangerous.
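
If recall is the priority, one common lever is the decision threshold: instead of taking the model's default answer, flag a flow as malicious whenever its predicted attack probability passes a lower cut-off. The sketch below uses synthetic data from `make_classification`, and the 0.3 threshold is an arbitrary value chosen for illustration, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for flow data: roughly 90% benign (0), 10% attack (1).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Probability of the attack class (column 1) for each test flow.
attack_proba = model.predict_proba(X_test)[:, 1]

pred_default = (attack_proba >= 0.5).astype(int)
pred_lower = (attack_proba >= 0.3).astype(int)   # hypothetical, more sensitive cut-off

recall_default = recall_score(y_test, pred_default)
recall_lower = recall_score(y_test, pred_lower)

print("Recall at 0.5:", recall_default)
print("Recall at 0.3:", recall_lower)
print("Precision at 0.5:", precision_score(y_test, pred_default, zero_division=0))
print("Precision at 0.3:", precision_score(y_test, pred_lower, zero_division=0))
```

Lowering the threshold can only keep recall the same or raise it, but it usually costs precision, which means more false alarms for analysts to triage.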


📊 7. Visualizing Feature Importance

Add this code:

importances = model.feature_importances_
indices = np.argsort(importances)[-10:]

plt.figure(figsize=(10,5))
plt.title("Top 10 Important Features")
plt.barh(range(len(indices)), importances[indices], align='center')
plt.yticks(range(len(indices)), [X.columns[i] for i in indices])
plt.show()

This graph shows which features the model relied on most.

Common important features:

  • duration
  • bytes
  • src_port
  • dst_port
  • protocol
  • pkt_count

Your ML model becomes explainable, which is important for cybersecurity jobs.


🛡️ 8. Deploy the Model (Optional)

Your model can be deployed as:

  • a local script
  • a SIEM integration
  • a SOC alerting tool
  • a cloud function
  • a REST API

Example Python API (Flask):

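from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)
# Assumes the trained model was saved earlier with joblib.dump(model, "model.pkl")
model = joblib.load("model.pkl")

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()   # one flow as {"feature_name": value, ...}
    # Put the columns in the order used during training
    # (feature_names_in_ is set when the model was fitted on a DataFrame)
    row = pd.DataFrame([data])[list(model.feature_names_in_)]
    prediction = model.predict(row)
    return jsonify({"prediction": str(prediction[0])})

if __name__ == '__main__':
    app.run()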

You now have your own AI threat detector.


📁 9. Portfolio Ideas (For Students)

Add these to your cybersecurity portfolio:

  • GitHub repo of this project
  • Kaggle notebook
  • LinkedIn post documenting the process
  • PDF report including evaluation metrics
  • A video demo on YouTube or Instagram

A documented, working project like this gives employers concrete evidence of your skills.


🧩 10. Diagram: ML Threat Detection Pipeline

 +-------------------+
 |    Raw Traffic    |
 +---------+---------+
           |
    Feature Extractor
           |
 +---------+---------+
 |  ML Model (RF)    |
 +---------+---------+
           |
    Threat Prediction
 (Benign | Malicious)

🎓 11. Common ML Models Used in Threat Detection

Model            Best For
Random Forest    first models, structured data
XGBoost          high-accuracy models
SVM              small datasets
LSTM             sequence-based logs
Autoencoder      anomaly detection
CNN              malware byte analysis

Your first project uses Random Forest, which is perfect for beginners.


📌 Key Takeaways

  • This project teaches the basics of ML in cybersecurity.
  • You built your first AI threat classifier using real data.
  • The same approach applies to network intrusion detection, SOC automation, and security analysis.
  • You learned feature extraction, model training, evaluation, and visualization.
  • This can be added to your cybersecurity portfolio immediately.