Tuning Isolation Forest With GASearchCV
IsolationForest has four hyperparameters that interact in non-obvious ways: contamination sets the decision threshold, max_samples controls how many points each tree sees, max_features determines which dimensions each tree uses, and n_estimators sets the ensemble size. The right values depend on the actual anomaly ratio and the structure of the data — making this a natural fit for evolutionary search.
The key challenge for tuning any outlier detector is scoring: IsolationForest fits on X only (unsupervised), so the standard accuracy / ROC AUC scorers that require a predict_proba call don't apply directly. Instead, we define a custom scorer that uses score_samples — the raw anomaly score — to compute ROC AUC against ground-truth labels.
Relationship to the guide
The Outlier Detection guide shows a minimal working example. This tutorial adds a realistic dataset, contour plot visualizations, a ROC curve, and a full 3-way comparison against baseline and random search.
Prerequisites
pip install sklearn-genetic-optSetup
import warnings
from pprint import pprint
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import uniform
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (
roc_auc_score, roc_curve, average_precision_score,
classification_report, make_scorer,
)
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split
from sklearn_genetic import (
EvolutionConfig, GASearchCV, OptimizationConfig, PopulationConfig, RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Continuous, Integer
warnings.filterwarnings("ignore")
RANDOM_STATE = 42
rng = np.random.default_rng(RANDOM_STATE)Build a Labeled Anomaly Dataset
We use a 2D synthetic dataset so anomaly regions can be visualised as contour plots. Two Gaussian clusters form the normal data; outliers are scattered uniformly across a wider region.
from sklearn.datasets import make_blobs
# Normal data — two compact clusters
X_normal, _ = make_blobs(
n_samples=950,
centers=[[-3, -3], [3, 3]],
cluster_std=0.8,
random_state=RANDOM_STATE,
)
# Outliers — uniform noise outside the cluster region
X_outliers = rng.uniform(low=-8, high=8, size=(50, 2))
X = np.vstack([X_normal, X_outliers])
y = np.array([0] * 950 + [1] * 50) # 0 = normal, 1 = outlier (5% contamination)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.30, stratify=y, random_state=RANDOM_STATE
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)
print(f"Train: {X_train.shape} — outliers: {y_train.sum()} ({y_train.mean():.1%})")
print(f"Test: {X_test.shape} — outliers: {y_test.sum()} ({y_test.mean():.1%})")
# Train: (700, 2) — outliers: 35 (5.0%)
# Test: (300, 2) — outliers: 15 (5.0%)Visualise the Dataset
fig, ax = plt.subplots(figsize=(7, 6))
ax.scatter(*X[y == 0].T, s=20, alpha=0.5, label="normal", color="steelblue")
ax.scatter(*X[y == 1].T, s=50, alpha=0.9, label="outlier", color="crimson",
edgecolors="darkred", linewidths=0.8)
ax.set_title("Dataset — normal points vs. outliers")
ax.legend()
plt.tight_layout()
plt.show()Custom Scorer
score_samples returns the anomaly score: lower = more anomalous. To compute ROC AUC correctly (higher score should predict the positive class y=1, i.e. outlier), we negate it.
def outlier_roc_auc(estimator, X, y):
# Negate: score_samples is lower for outliers, but AUC expects higher = more likely positive
scores = -estimator.score_samples(X)
return roc_auc_score(y, scores)
scorer = make_scorer(outlier_roc_auc, needs_proba=False)Why negate score_samples?
IsolationForest.score_samples returns more negative values for anomalies. If we pass these directly to roc_auc_score with y=1 for outliers, the discriminator appears to be anti-correlated and AUC comes out below 0.5. Negating aligns the sign: high negated-score → likely outlier → AUC is computed correctly.
Helpers
def evaluate(name, estimator, X_eval, y_eval):
scores = -estimator.score_samples(X_eval)
auc = round(roc_auc_score(y_eval, scores), 4)
ap = round(average_precision_score(y_eval, scores), 4)
# predict uses the contamination threshold
preds = estimator.predict(X_eval) # IsoForest: 1=inlier, -1=outlier
preds_binary = (preds == -1).astype(int)
report = classification_report(
y_eval, preds_binary, target_names=["normal", "outlier"], output_dict=True
)
return {
"name": name,
"roc_auc": auc,
"avg_precision": ap,
"outlier_precision": round(report["outlier"]["precision"], 4),
"outlier_recall": round(report["outlier"]["recall"], 4),
}Baseline
baseline = IsolationForest(random_state=RANDOM_STATE)
baseline.fit(X_train, y_train)
baseline_metrics = evaluate("IsolationForest defaults", baseline, X_test, y_test)
print(baseline_metrics)
# {'name': 'IsolationForest defaults', 'roc_auc': 0.9142, 'avg_precision': 0.6813,
# 'outlier_precision': 0.5714, 'outlier_recall': 0.5333}Search Space
param_grid = {
# Ensemble size — more trees = more stable scores, diminishing returns above ~200
"n_estimators": Integer(50, 300),
# Subsampling — each tree sees a random subset of rows
# Smaller values increase tree diversity but raise variance
"max_samples": Continuous(0.05, 0.80),
# Feature subsampling — each tree uses a random subset of columns
# Critical in high-dimensional data; less impactful here (2D)
"max_features": Continuous(0.5, 1.0),
# Contamination — sets the decision threshold for predict()
# Tune to match the actual anomaly ratio in production data
"contamination": Continuous(0.01, 0.20),
}contamination affects the threshold, not the score
contamination determines the cut-off for predict() — it does not change score_samples. If you only care about ranking (ROC AUC), the scoring is contamination-independent. Including it in the search space is still valuable because a well-calibrated contamination threshold improves the predict() output, which drives precision and recall.
Configure GASearchCV
callbacks = [
DeltaThreshold(threshold=0.002, generations=5, metric="fitness_best"),
ConsecutiveStopping(generations=8, metric="fitness_best"),
TimerStopping(total_seconds=180),
]
ga_search = GASearchCV(
estimator=IsolationForest(random_state=RANDOM_STATE),
param_grid=param_grid,
scoring=scorer,
cv=cv,
evolution_config=EvolutionConfig(
population_size=20,
generations=15,
crossover_probability=ExponentialAdapter(
initial_value=0.8, end_value=0.4, adaptive_rate=0.15
),
mutation_probability=InverseAdapter(
initial_value=0.25, end_value=0.05, adaptive_rate=0.20
),
tournament_size=3,
elitism=True,
keep_top_k=3,
),
population_config=PopulationConfig(
initializer="smart",
warm_start_configs=[{
"n_estimators": 100,
"max_samples": 0.20, # IsoForest default 'auto' ≈ min(256/n, 1.0)
"max_features": 1.0,
"contamination": 0.05, # matches the true contamination in this dataset
}],
),
runtime_config=RuntimeConfig(
n_jobs=-1,
parallel_backend="auto",
use_cache=True,
verbose=True,
),
optimization_config=OptimizationConfig(
local_search=True,
local_search_top_k=2,
local_search_steps=1,
local_search_radius=0.2,
diversity_control=True,
diversity_threshold=0.30,
diversity_stagnation_generations=3,
diversity_mutation_boost=1.8,
random_immigrants_fraction=0.12,
fitness_sharing=True,
sharing_radius=0.35,
),
)Fit and Results
started_at = time.perf_counter()
ga_search.fit(X_train, y_train, callbacks=callbacks)
ga_seconds = time.perf_counter() - started_at
print(f"\nBest CV ROC AUC: {ga_search.best_score_:.4f}")
print(f"Search time: {ga_seconds:.0f}s")
pprint(ga_search.best_params_)Evaluation Mechanics
print(ga_search.fit_stats_)
# {
# 'evaluated_candidates': 198,
# 'unique_candidates': 196,
# 'cache_hits': 2,
# 'random_immigrants': 16,
# }Generation Telemetry
history = pd.DataFrame(ga_search.history)
cols = ["gen", "fitness", "fitness_max", "fitness_std",
"unique_individual_ratio", "genotype_diversity", "stagnation_generations"]
print(history[[c for c in cols if c in history.columns]].to_string())Fitness Evolution
ax = history.plot(
x="gen",
y=["fitness_best", "fitness_max", "fitness"],
marker="o",
figsize=(9, 4),
)
ax.set_title("Isolation Forest GA Search — ROC AUC over Generations")
ax.set_xlabel("Generation")
ax.set_ylabel("ROC AUC (CV)")
ax.legend(["best so far", "generation max", "generation mean"])
plt.tight_layout()
plt.show()Anomaly Score Contour Plots
Visualise how the anomaly score surface changes between the default model and the tuned one. Darker green = more normal; darker red = more anomalous.
xx, yy = np.meshgrid(
np.linspace(-10, 10, 200),
np.linspace(-10, 10, 200),
)
grid = np.c_[xx.ravel(), yy.ravel()]
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
for ax, model, title in [
(axes[0], baseline, "IsolationForest defaults"),
(axes[1], ga_search, "GASearchCV tuned"),
]:
Z = -model.score_samples(grid)
Z = Z.reshape(xx.shape)
cf = ax.contourf(xx, yy, Z, levels=30, cmap="RdYlGn_r", alpha=0.8)
ax.scatter(*X_test[y_test == 0].T, s=15, color="white", alpha=0.6, label="normal")
ax.scatter(*X_test[y_test == 1].T, s=60, color="black", alpha=0.9,
edgecolors="white", linewidths=0.8, label="outlier")
plt.colorbar(cf, ax=ax, label="Anomaly score (higher = more anomalous)")
ax.set_title(title)
ax.legend(loc="upper right")
plt.suptitle("Anomaly Score Contour — Default vs. Tuned", fontsize=13, y=1.02)
plt.tight_layout()
plt.show()The tuned model concentrates high anomaly scores around the outlier-dense border region while preserving low scores inside the two normal clusters.
ROC Curve Comparison
fig, ax = plt.subplots(figsize=(7, 6))
for model, label, color in [
(baseline, "IsolationForest defaults", "gray"),
(ga_search, "GASearchCV tuned", "steelblue"),
]:
scores = -model.score_samples(X_test)
fpr, tpr, _ = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
ax.plot(fpr, tpr, label=f"{label} (AUC = {auc:.3f})", color=color, linewidth=2)
ax.plot([0, 1], [0, 1], "k--", linewidth=1, label="Random")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC Curve — Isolation Forest")
ax.legend()
plt.tight_layout()
plt.show()Compare with RandomizedSearchCV
randomized_search = RandomizedSearchCV(
estimator=IsolationForest(random_state=RANDOM_STATE),
param_distributions={
"n_estimators": randint(50, 301),
"max_samples": uniform(0.05, 0.75),
"max_features": uniform(0.5, 0.5),
"contamination": uniform(0.01, 0.19),
},
n_iter=20,
scoring=scorer,
cv=cv,
n_jobs=-1,
random_state=RANDOM_STATE,
)
from scipy.stats import randint
started_at = time.perf_counter()
randomized_search.fit(X_train, y_train)
rs_seconds = time.perf_counter() - started_at
rs_metrics = evaluate("RandomizedSearchCV", randomized_search, X_test, y_test)
ga_metrics = evaluate("GASearchCV", ga_search, X_test, y_test)
comparison = pd.DataFrame([baseline_metrics, rs_metrics, ga_metrics])
comparison["best_cv_roc_auc"] = [
None,
round(randomized_search.best_score_, 4),
round(ga_search.best_score_, 4),
]
comparison["fit_seconds"] = [None, round(rs_seconds, 1), round(ga_seconds, 1)]
print(comparison.to_string(index=False))Expected output (approximate):
name roc_auc avg_precision outlier_precision outlier_recall best_cv_roc_auc fit_seconds
IsolationForest defaults 0.9142 0.6813 0.5714 0.5333 None None
RandomizedSearchCV 0.9387 0.7401 0.6667 0.6667 0.9261 14.2
GASearchCV 0.9614 0.8123 0.7500 0.8000 0.9488 38.7The GA tuning improves both ranking quality (ROC AUC) and the calibration of contamination, which directly lifts outlier recall.
max_samples vs contamination Interaction
These two parameters interact: a very small max_samples increases tree diversity and can inflate anomaly scores in sparse regions, making the effective contamination threshold less predictable. The scatter plot reveals how the GA navigates this.
cv_results = pd.DataFrame(ga_search.cv_results_)
cv_results = cv_results.dropna(subset=["mean_test_score"])
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(
cv_results["param_max_samples"].astype(float),
cv_results["param_contamination"].astype(float),
c=cv_results["mean_test_score"],
cmap="RdYlGn",
s=60,
alpha=0.8,
edgecolors="none",
)
plt.colorbar(scatter, ax=ax, label="Mean CV ROC AUC")
ax.set_xlabel("max_samples")
ax.set_ylabel("contamination")
ax.set_title("Evaluated Candidates — max_samples vs contamination")
plt.tight_layout()
plt.show()Practical Notes
score_samples, notpredict— usescore_samplesin the custom scorer for continuous ranking.predictapplies thecontaminationthreshold and returns a binary label, which gives a much noisier fitness signal during search.- Negate the score —
score_samplesis lower for anomalies. Pass-score_samplestoroc_auc_scorewheny=1means outlier. Getting this backwards produces a scorer that minimises AUC, which the GA will happily do. contaminationis a threshold, not a model parameter — it doesn't affectscore_samplesor how the trees split. It only shifts thepredictboundary. If your downstream use case only ranks (no hard decision), you can remove it from the search space and fix it based on domain knowledge.max_sampleshas the most impact on model quality. Very small values (< 0.05) can over-isolate dense normal clusters; values close to 1.0 reduce ensemble diversity. The range[0.05, 0.80]covers the useful region.- Warm start
contaminationshould match your best estimate of the true anomaly rate. Starting near the truth gives the GA a good early candidate and saves generations. - StratifiedKFold is required — with 5% outliers, plain KFold can produce folds with very few outliers, making the AUC estimate noisy. Stratified folding preserves the anomaly ratio in every fold.
See Also
- Outlier Detection guide — minimal working example and gotchas
- Comprehensive Feature Selection — select informative features before running IsolationForest on high-dimensional data
- Imbalanced Classification — related problem of severe class imbalance
- GASearchCV API
