Building a Machine Learning Threat Detection System
Overview
In this project, I built a threat detection system that parses SSH authentication logs and flags anomalous login activity with an Isolation Forest model.
1. Data Preprocessing
Raw log example:
Dec 15 00:12:10 server sshd[12345]: Failed password for invalid user admin from 192.168.1.2 port 22
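To make the structure of that line explicit, here is a minimal sketch that splits it into named fields with Python's standard `re` module. The pattern `SYSLOG_LINE`, the helper `parse_line`, and the field names are assumptions for illustration, not part of the original pipeline, and the pattern only targets the common `sshd` layout shown above.

```python
import re

# Hypothetical pattern for the layout "Mon DD HH:MM:SS host proc[pid]: message".
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<process>[\w\-/.]+)\[(?P<pid>\d+)\]:\s"
    r"(?P<message>.*)$"
)
# Same IPv4 regex that the pandas preprocessing uses.
IP_PATTERN = re.compile(r"from (\d+\.\d+\.\d+\.\d+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if the layout differs."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    ip = IP_PATTERN.search(fields["message"])
    fields["ip"] = ip.group(1) if ip else None
    # Simplification: anything without the word "Failed" counts as "Success".
    fields["status"] = "Failed" if "Failed" in fields["message"] else "Success"
    return fields


sample = ("Dec 15 00:12:10 server sshd[12345]: Failed password for "
          "invalid user admin from 192.168.1.2 port 22")
parsed = parse_line(sample)
print(parsed)
```

Returning `None` for lines that do not match keeps unrelated log entries from raising errors, at the cost of silently skipping them.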
Python preprocessing:
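from pathlib import Path
import pandas as pd
# auth.log is plain syslog text, not CSV, so load each line into a "raw" column.
logs = pd.DataFrame({"raw": Path("auth.log").read_text().splitlines()})
logs["IP"] = logs["raw"].str.extract(r"from (\d+\.\d+\.\d+\.\d+)", expand=False)
logs["Status"] = logs["raw"].str.contains("Failed").map({True: "Failed", False: "Success"})
# Syslog omits the year and time zone; 2023 and UTC are assumptions. Result: epoch seconds.
stamp = pd.to_datetime("2023 " + logs["raw"].str[:15], format="%Y %b %d %H:%M:%S")
logs["Timestamp"] = (stamp - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(logs.head())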
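The IP and status extraction can be tried without an `auth.log` file. The sketch below, assuming pandas is installed, runs the same two string operations on an in-memory list. The first line is the example from this post; the second is a made-up successful login added only to show both labels.

```python
import pandas as pd

sample_lines = [
    # Example line from this post.
    "Dec 15 00:12:10 server sshd[12345]: Failed password for invalid user admin from 192.168.1.2 port 22",
    # Made-up line for illustration only.
    "Dec 15 00:13:02 server sshd[12346]: Accepted password for alice from 10.0.0.5 port 51234 ssh2",
]
sample = pd.DataFrame({"raw": sample_lines})

# expand=False returns a Series, which suits a single capture group.
sample["IP"] = sample["raw"].str.extract(r"from (\d+\.\d+\.\d+\.\d+)", expand=False)
# Any line without the word "Failed" is labeled "Success" (a simplification).
sample["Status"] = sample["raw"].str.contains("Failed").map(
    {True: "Failed", False: "Success"}
)

print(sample[["IP", "Status"]])
```

Note that the "Success" label is only as good as the rule behind it: unrelated lines, such as session-closed messages, would also be labeled "Success" unless they are filtered out first.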
2. Isolation Forest Model
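from sklearn.ensemble import IsolationForest
# Expects logs to hold "IP" and a numeric "Timestamp" (epoch seconds) column.
# Copy the slice so that encoding the IP does not trigger SettingWithCopyWarning.
features = logs[["IP", "Timestamp"]].copy()
# Integer-encode each distinct IP; the codes are arbitrary labels, not an ordering.
features["IP"] = features["IP"].astype("category").cat.codes
# contamination=0.01 flags roughly 1% of rows; random_state makes runs repeatable.
model = IsolationForest(contamination=0.01, random_state=42)
# fit_predict returns -1 for anomalies and 1 for normal rows.
logs["Anomaly"] = model.fit_predict(features)
print(logs[logs["Anomaly"] == -1])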
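To show how the `-1` and `1` labels behave, here is a small sketch on synthetic data, assuming NumPy and scikit-learn are installed. The two random columns merely stand in for the encoded `IP` and `Timestamp` features; the data, the injected outlier, and the seed are all invented for illustration and say nothing about the real logs.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 200 synthetic "normal" rows clustered around (0, 0).
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# One obvious outlier, appended as the last row.
outlier = np.array([[25.0, 25.0]])
X = np.vstack([normal, outlier])

model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower values = more anomalous

print("label of the injected outlier:", labels[-1])
print("rows flagged as anomalous:", int((labels == -1).sum()))
```

Because `contamination` fixes the share of rows flagged, the model reports some anomalies even in clean data, so flagged rows are best treated as candidates for review. Isolation Forest also splits on numeric thresholds, so integer codes from `cat.codes` impose an arbitrary order on IP addresses.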
Results
| Timestamp (Unix epoch, s) | IP | Status | Anomaly (-1 = anomalous) |
|---|---|---|---|
| 1702591930 | 192.168.1.2 | Failed | -1 |
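The timestamp in the table appears to be Unix epoch seconds. As a small sketch for reading such values, the standard library can convert it back to a calendar time. The UTC+2 offset below is an assumption, chosen only because it makes the value line up with the raw log example; the original does not state the server's time zone.

```python
from datetime import datetime, timedelta, timezone

epoch_seconds = 1702591930  # Timestamp value from the results table

utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(utc_time.isoformat())  # 2023-12-14T22:12:10+00:00

# Assumption: the server clock runs at UTC+2, which would match the
# "Dec 15 00:12:10" prefix of the raw log example.
assumed_server_tz = timezone(timedelta(hours=2))
local_time = utc_time.astimezone(assumed_server_tz)
print(local_time.strftime("%m-%d %H:%M:%S"))  # 12-15 00:12:10
```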