AI-Driven Cybersecurity: The Future of Digital Defense

Project Chapter 13

✅ Chapter 13: Build Your First ML Threat Detection Model

A complete beginner-friendly project to detect cyber threats using Machine Learning


📌 Introduction

You've learned how AI and ML power modern security systems. Now it's time to build your first real, working ML model for threat detection.

This chapter gives you a project that:

  • is simple but powerful
  • is completely beginner-friendly
  • uses realistic, publicly available network-traffic data
  • does not require deep math
  • runs on Google Colab
  • takes 30–45 minutes to complete

By the end, you'll have an ML model that can:

  ✔ detect malicious network traffic
  ✔ classify normal vs. attack behaviour
  ✔ feed alerts into SOC automation

It also makes a solid first AI cybersecurity portfolio project.

Let's begin.


🎯 1. What You Will Build

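You will build a machine-learning network intrusion detection model that flags malicious traffic such as:

  • port scanning
  • brute-force attacks
  • DDoS
  • botnets
  • other malicious flows

Which attack types your model actually sees depends on the dataset you choose.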

You can use either of two popular public datasets:

CIDDS-001 or UNSW-NB15

Both contain labelled traffic from simulated enterprise networks, with normal activity mixed with attacks.

Your ML pipeline will:

  1. Load the dataset
  2. Select numeric features
  3. Train a classification model
  4. Predict malicious vs. benign traffic
  5. Evaluate its performance

🧰 2. Tools Needed (Free)

Everything is 100% free and cloud-based.

Use:

  • Google Colab (recommended)
  • Python 3
  • Scikit-Learn
  • Pandas
  • NumPy
  • Matplotlib

No local installation is needed: Colab runs in the browser and already includes these libraries.


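📊 3. Understanding the Dataset (Simple Explanation)

Each row represents one network flow: a summary of the packets exchanged between two endpoints during a single connection.

Typical columns include:

  • source IP
  • destination IP
  • source and destination ports
  • protocol
  • duration
  • bytes sent
  • packet count
  • TCP flags
  • a label (normal or attack)

Exact column names differ between datasets. The model learns which combinations of these values separate attacks from normal traffic.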
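To see what such a table looks like in pandas, here is a minimal sketch with three hand-made flows. The column names and values are hypothetical, not taken from either dataset; the point is that only some columns are numeric, and those are the ones the model in Section 5 uses directly.

```python
import numpy as np
import pandas as pd

# Three hand-made flow records (hypothetical values, for illustration only)
flows = pd.DataFrame({
    "src_ip":   ["192.168.1.10", "10.0.0.5", "192.168.1.22"],
    "dst_ip":   ["8.8.8.8", "192.168.1.1", "172.16.0.3"],
    "dst_port": [53, 22, 80],
    "protocol": ["udp", "tcp", "tcp"],
    "duration": [0.02, 12.5, 0.30],
    "bytes":    [120, 48200, 950],
    "packets":  [2, 310, 9],
    "label":    ["normal", "attack", "normal"],
})

# Numeric columns can go straight into a model; IP addresses,
# protocol names and the text label need separate handling.
numeric_cols = flows.select_dtypes(include=[np.number]).columns.tolist()
text_cols = [c for c in flows.columns if c not in numeric_cols]

print("numeric:", numeric_cols)
print("text:   ", text_cols)
```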


✨ 4. ML Workflow Overview

Here is the workflow your project follows:

Dataset
   ↓
Data Cleaning
   ↓
Feature Selection
   ↓
Train-Test Split
   ↓
Model Training (Random Forest)
   ↓
Evaluation
   ↓
Threat Predictions

The same basic pipeline underlies most supervised ML projects in cybersecurity.
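If you want to try the split, training and evaluation stages before downloading a dataset, the sketch below runs them on synthetic data from scikit-learn's make_classification. The generated rows only stand in for network flows, and the roughly 90/10 benign/malicious split is an assumption chosen to mimic imbalanced traffic; the calls mirror the ones used in Section 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a flow dataset (not real traffic): 2,000 rows,
# 10 numeric features, roughly 90% benign (0) and 10% malicious (1).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=42)

# Train-test split: hold back 25%, keeping the class ratio in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Model training
model = RandomForestClassifier(n_estimators=150, random_state=42)
model.fit(X_train, y_train)

# Threat predictions and evaluation on the held-back rows
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", round(acc, 3))
```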


🧪 5. Build the Model (Code Included)

Copy each step into its own Google Colab cell and run the cells in order.


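STEP 1: Install Libraries

Colab already ships with these libraries, so this step is only needed on other setups; running it in Colab is harmless.

!pip install pandas numpy scikit-learn matplotlib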

STEP 2: Import Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt

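STEP 3: Load the Dataset

Upload your CSV file to Colab, then load it. Use the exact filename of the file you uploaded, and run print(df.head()) to confirm the columns loaded correctly.

from google.colab import files
uploaded = files.upload()  # opens a file picker in Colab

# If using UNSW-NB15:
df = pd.read_csv('UNSW-NB15.csv')

# If using CIDDS-001 (instead of the line above):
# df = pd.read_csv('cidds.csv')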

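STEP 4: Basic Cleaning

df = df.dropna()  # drop rows with missing values

STEP 5: Define Features & Labels

Set the label aside first, then keep only numeric columns as features. Dropping text columns before this step would also throw away a text label such as "normal"/"attack".

y = df['label']
X = df.drop('label', axis=1).select_dtypes(include=[np.number])  # numeric features only

This removes text columns (IP addresses, protocol names, attack-category names) and keeps numeric features only. If the file has a row-ID column, drop it too, since a record number says nothing about the traffic.

If your dataset names the label column differently, for example "attack", use that name in both lines (print(df.columns) lists the available columns):

y = df['attack']
X = df.drop('attack', axis=1).select_dtypes(include=[np.number])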
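If your label column holds text (for example "normal" plus several attack names), scikit-learn would train a multi-class model on it as is. To keep the chapter's question, normal vs. attack, you can collapse it to a 0/1 target. A minimal sketch on a hypothetical four-row table, assuming the benign value is the string "normal"; check the real values with df['label'].unique() first.

```python
import pandas as pd

# Hypothetical example: a label column stored as text rather than 0/1
df = pd.DataFrame({
    "duration": [0.1, 5.2, 0.3, 9.8],
    "bytes":    [200, 90000, 450, 120000],
    "label":    ["normal", "attack", "normal", "attack"],
})

# Anything that is not "normal" becomes malicious (1); "normal" becomes 0.
# The benign value "normal" is an assumption: adjust it to your dataset.
y = (df["label"] != "normal").astype(int)
X = df.drop(columns=["label"])

print(y.tolist())
```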

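STEP 6: Train-Test Split

Hold back 25% of the rows for testing. stratify=y keeps the proportion of normal and attack rows the same in both parts.

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)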

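STEP 7: Train Random Forest Model

Random Forest is a good first choice: it needs no feature scaling and usually performs well with default settings.

model = RandomForestClassifier(n_estimators=150, random_state=42, n_jobs=-1)  # 150 trees, all CPU cores
model.fit(X_train, y_train)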

STEP 8: Model Predictions

y_pred = model.predict(X_test)

STEP 9: Evaluate Model

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

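Accuracy above 85% is a reasonable first result, but check the per-class numbers too: both datasets are imbalanced between normal and attack traffic, so a model can score high accuracy while still missing many attacks.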
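Why accuracy alone can mislead: the sketch below scores a hypothetical "model" that labels every flow as benign, on a made-up test set with 95 benign flows and 5 attacks.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced test set: 95 benign flows (0) and 5 attacks (1)
y_true = [0] * 95 + [1] * 5

# A useless "model" that labels every flow as benign
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, pos_label=1)  # share of attacks caught
print("Accuracy:", acc)
print("Attack recall:", rec)
```

It scores 95% accuracy while catching none of the attacks, which is why the per-class recall in the report matters more than the headline number.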


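🧠 6. Understanding Your Model's Output

Your output will look something like this (illustrative numbers; with a 0/1 label, scikit-learn prints the classes as 0 and 1 rather than Benign and Malicious):

Accuracy: 0.91

             Precision    Recall
Benign         0.92        0.88
Malicious      0.90        0.93

The full classification_report also includes F1-score and support (number of test rows per class) columns.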
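What this means:

  • Accuracy 91% → the model labelled 91% of test flows correctly
  • Precision → of the flows the model assigned to a class, the share that truly belong to it (for Malicious: how many alerts were real attacks)
  • Recall → of the flows that truly belong to a class, the share the model found (for Malicious: how many attacks were caught)
  • F1-score → a single number balancing precision and recall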
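F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small sketch, assuming the two numbers in each example row are precision and recall:

```python
def f1(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)

# Example rows read as (precision, recall) pairs
print("Benign F1:", round(f1(0.92, 0.88), 2))
print("Malicious F1:", round(f1(0.90, 0.93), 2))
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a model cannot hide poor recall behind high precision.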

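Security analysts often prioritise recall on the attack class, because a missed attack (false negative) is usually more costly than a false alarm, although too many false alarms lead to alert fatigue.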
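One common way to trade precision for recall is to lower the probability threshold at which a flow is flagged. The sketch below shows this on synthetic, imbalanced data (make_classification standing in for real flows); the 0.3 threshold is an arbitrary example, not a recommended value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not real traffic): ~10% "attacks"
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=150, random_state=0)
model.fit(X_train, y_train)

# Probability that each test flow is malicious (column for class 1)
p_attack = model.predict_proba(X_test)[:, 1]

# predict() effectively flags a flow when this probability exceeds 0.5.
# A lower cut-off flags more flows: recall can only stay equal or rise,
# usually at some cost in precision.
recalls = {}
for threshold in (0.5, 0.3):
    flagged = (p_attack >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, flagged)
    precision = precision_score(y_test, flagged, zero_division=0)
    print(f"threshold {threshold}: recall {recalls[threshold]:.3f}, "
          f"precision {precision:.3f}")
```

On real data, choose the threshold on a validation set rather than on the final test set.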


📊 7. Visualizing Feature Importance

Add this code:

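importances = model.feature_importances_  # impurity-based score per feature
indices = np.argsort(importances)[-10:]  # the 10 highest scores, smallest first

plt.figure(figsize=(10, 5))
plt.title("Top 10 Important Features")
plt.barh(range(len(indices)), importances[indices], align='center')  # one bar per feature
plt.yticks(range(len(indices)), [X.columns[i] for i in indices])  # label bars with column names
plt.xlabel("Importance (mean decrease in impurity)")
plt.tight_layout()
plt.show()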

The chart ranks the features the forest relied on most when separating normal from attack traffic.
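If you prefer numbers to a chart, pairing feature_importances_ with the column names gives the same ranking as text. A minimal sketch on synthetic data; the column names are made up to resemble flow features and carry no real meaning here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical flow-like column names
X_arr, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                               random_state=1)
X = pd.DataFrame(X_arr, columns=["duration", "bytes", "packets",
                                 "src_port", "dst_port", "flags"])

model = RandomForestClassifier(n_estimators=150, random_state=1).fit(X, y)

# Pair each importance score with its column name, highest first
ranking = pd.Series(model.feature_importances_, index=X.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking.head(10))
```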

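Features that often rank highly (exact column names differ between datasets):

  • duration
  • bytes
  • src_port
  • dst_port
  • protocol (only if it is stored as a number; text columns are dropped in Step 5)
  • pkt_count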

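Feature importance makes your model partly explainable: it shows which traffic properties drive its decisions overall, which matters in security jobs where alerts must be justified.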


🛡️ 8. Deploy the Model (Optional)

Your model can be deployed as:

  • a local script
  • a SIEM integration
  • a SOC alerting tool
  • a cloud function
  • a REST API

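Example Python API (Flask). In your notebook, first save the trained model with joblib.dump(model, "model.pkl") (after import joblib), then run this script next to the saved file:

from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)
model = joblib.load("model.pkl")  # created earlier with joblib.dump(model, "model.pkl")
FEATURES = list(model.feature_names_in_)  # training column order (model was fitted on a DataFrame)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()  # one flow as JSON: {"feature_name": value, ...}
    row = pd.DataFrame([data])[FEATURES]  # same columns, same order as in training
    prediction = model.predict(row)
    return jsonify({"prediction": str(prediction[0])})

if __name__ == "__main__":
    app.run()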

You now have your own AI threat detector.
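The Flask example assumes a saved model.pkl. The sketch below shows the save-and-reload round trip with joblib on a small synthetic model and checks that the reloaded model predicts exactly like the original. It writes to a temporary folder so it can run anywhere.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small model on synthetic data (stand-in for the model from Step 7)
X, y = make_classification(n_samples=500, n_features=8, random_state=7)
model = RandomForestClassifier(n_estimators=50, random_state=7).fit(X, y)

# Save to disk under the filename the Flask example expects
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(model, path)

# Load it back (what the API does at start-up) and compare predictions
loaded = joblib.load(path)
same = np.array_equal(model.predict(X), loaded.predict(X))
print("Predictions identical after reload:", same)
```

joblib files are pickles, so only load model files from sources you trust: loading a pickle can execute arbitrary code.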


📝 9. Portfolio Ideas (For Students)

Add these to your cybersecurity portfolio:

  • GitHub repo of this project
  • Kaggle notebook
  • LinkedIn post documenting the process
  • PDF report including evaluation metrics
  • A video demo on YouTube or Instagram

A documented, public project gives employers concrete evidence of your skills.


🧩 10. Diagram: ML Threat Detection Pipeline

 +-------------------+
 |    Raw Traffic    |
 +---------+---------+
           |
    Feature Extractor
           |
 +---------+---------+
 |  ML Model (RF)    |
 +---------+---------+
           |
    Threat Prediction
 (Benign | Malicious)
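The diagram's two boxes map naturally onto a scikit-learn Pipeline: a ColumnTransformer plays the feature extractor and a Random Forest the model. A minimal sketch on six hypothetical flows (made-up values and labels); unlike Step 5, it keeps the text protocol column by one-hot encoding it instead of dropping it.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical raw flow records with one text column (protocol)
flows = pd.DataFrame({
    "protocol": ["tcp", "udp", "tcp", "icmp", "tcp", "udp"],
    "duration": [0.3, 0.01, 15.0, 0.0, 9.5, 0.02],
    "bytes":    [900, 120, 52000, 84, 61000, 150],
    "packets":  [8, 2, 400, 1, 380, 2],
})
labels = [0, 0, 1, 0, 1, 0]  # 0 = benign, 1 = malicious (made-up)

# Feature extractor: one-hot encode protocol, pass numeric columns through
features = ColumnTransformer(
    [("proto", OneHotEncoder(handle_unknown="ignore"), ["protocol"])],
    remainder="passthrough")

# Feature extractor followed by the ML model, as in the diagram
pipeline = Pipeline([
    ("features", features),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipeline.fit(flows, labels)

# New raw traffic goes straight in; the pipeline extracts features itself
new_flows = pd.DataFrame({"protocol": ["tcp", "gre"], "duration": [12.0, 0.1],
                          "bytes": [58000, 300], "packets": [390, 3]})
predictions = pipeline.predict(new_flows)
print(predictions)
```

Because the encoder is created with handle_unknown="ignore", a protocol never seen in training ("gre" above) is encoded as all zeros instead of raising an error.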

🎓 11. Common ML Models Used in Threat Detection

Model           Best for
Random Forest   first models, structured data
XGBoost         high-accuracy models
SVM             small datasets
LSTM            sequence-based logs
Autoencoder     anomaly detection
CNN             malware byte analysis

Your first project uses Random Forest, which is a good fit for beginners.
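To compare models from the table on the same data, cross-validation gives each candidate a fair score. A sketch comparing Random Forest and an SVM on synthetic data, scored by recall to match the priority discussed in Section 6. XGBoost, LSTMs, autoencoders and CNNs need extra libraries (such as xgboost or a deep-learning framework) and are left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; swap in your own X and y from Step 5
X, y = make_classification(n_samples=600, n_features=10, random_state=3)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=150, random_state=3),
    # SVMs are sensitive to feature scale, so standardize first
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, clf in candidates.items():
    # Mean recall for class 1 over 5 cross-validation folds
    scores[name] = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
    print(f"{name}: mean recall = {scores[name]:.3f}")
```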


📌 Key Takeaways

  • This project teaches the basics of ML in cybersecurity.
  • You built your first AI threat classifier using a public benchmark dataset.
  • The same approach applies to network intrusion detection, SOC automation, and security analysis.
  • You learned feature selection, model training, evaluation, and visualization.
  • This project can be added to your cybersecurity portfolio immediately.