Advanced Optimizer Control
When the default settings lead to premature convergence or poor search quality, these controls give you finer-grained command over the evolutionary process. They are optional: the defaults are conservative and work well for most searches. Add them one at a time, and only when telemetry shows the optimizer needs them.
Prerequisites
- Completed Basic Usage
- Familiar with Understanding Cross-Validation
When to Use These Controls
Check pd.DataFrame(search.history) after a run. Add controls if you see:
| Symptom | Likely cause | Remedy |
|---|---|---|
| unique_individual_ratio drops to near 0 in a few generations | Population collapsed | Enable diversity_control |
| genotype_diversity is low while score still improves | Slow convergence, not stuck | Let it run or increase population_size |
| stagnation_generations grows for 5+ generations | Local optimum | Enable diversity_control, fitness sharing, or local search |
| Multiple high-scoring but similar candidates | Single dominant region | Enable fitness_sharing to keep multiple niches alive |
| Final solution is good but nearby configs might be better | Under-exploited region | Enable local_search |
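The telemetry-based rows of this table can be turned into a quick check. The sketch below is a hypothetical helper, not part of the library: `diagnose` and all of its thresholds are assumptions chosen for illustration, and it only covers the three symptoms that can be read directly from the history columns. It expects one dict per generation, for example the output of `pd.DataFrame(search.history).to_dict("records")`.

```python
def diagnose(rows, collapse_ratio=0.2, low_diversity=0.1, stagnation_limit=5, early_window=5):
    """Suggest remedies from telemetry rows (one dict per generation).

    All thresholds are illustrative assumptions, not library defaults.
    """
    suggestions = []
    last = rows[-1]

    # Population collapse: the unique ratio gets close to 0 within the first generations.
    if any(r["unique_individual_ratio"] <= collapse_ratio for r in rows[:early_window]):
        suggestions.append("enable diversity_control")

    # Local optimum: fitness_best has not improved for several generations.
    if last["stagnation_generations"] >= stagnation_limit:
        suggestions.append("enable diversity_control, fitness_sharing, or local_search")

    # Low diversity while the score still improves: slow convergence, not a stall.
    if last["genotype_diversity"] <= low_diversity and last["stagnation_generations"] == 0:
        suggestions.append("let it run or increase population_size")

    return suggestions


# Hand-written rows standing in for real search telemetry.
toy_rows = [
    {"unique_individual_ratio": 0.95, "genotype_diversity": 0.60, "stagnation_generations": 0},
    {"unique_individual_ratio": 0.40, "genotype_diversity": 0.30, "stagnation_generations": 0},
    {"unique_individual_ratio": 0.10, "genotype_diversity": 0.08, "stagnation_generations": 1},
    {"unique_individual_ratio": 0.08, "genotype_diversity": 0.06, "stagnation_generations": 2},
    {"unique_individual_ratio": 0.08, "genotype_diversity": 0.05, "stagnation_generations": 3},
    {"unique_individual_ratio": 0.08, "genotype_diversity": 0.05, "stagnation_generations": 4},
    {"unique_individual_ratio": 0.08, "genotype_diversity": 0.05, "stagnation_generations": 5},
]
suggestions = diagnose(toy_rows)
print(suggestions)
```

Treat the output as a prompt to look at the history more closely rather than as a verdict; sensible thresholds depend on population size and on how rugged the search space is.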
Configuration Objects
Advanced settings live in OptimizationConfig. All parameters are disabled (or set to low-impact defaults) unless explicitly changed.
from sklearn_genetic import OptimizationConfig
Full Example: Hyperparameter Search
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Integer

# Hold out a test set that the search never sees.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

# Search space for the random forest hyperparameters.
param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 20),
    "min_samples_split": Integer(2, 20),
    "min_samples_leaf": Integer(1, 10),
    "max_features": Categorical(["sqrt", "log2", None]),
    "criterion": Categorical(["gini", "entropy", "log_loss"]),
}

# Crossover probability moves from 0.8 toward 0.6 over the generations.
crossover_schedule = InverseAdapter(
    initial_value=0.8,
    end_value=0.6,
    adaptive_rate=0.05,
)
# Mutation probability moves from 0.1 toward 0.25 over the generations.
mutation_schedule = ExponentialAdapter(
    initial_value=0.1,
    end_value=0.25,
    adaptive_rate=0.08,
)

search = GASearchCV(
    # Parallelism is configured on the search (RuntimeConfig), so the estimator stays single-threaded.
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",
    evolution_config=EvolutionConfig(
        population_size=24,
        generations=18,
        crossover_probability=crossover_schedule,
        mutation_probability=mutation_schedule,
        tournament_size=3,
        elitism=True,
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        # Diversity control
        diversity_control=True,
        diversity_threshold=0.18,
        diversity_stagnation_generations=4,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.15,
        # Fitness sharing
        fitness_sharing=True,
        sharing_radius=0.25,
        sharing_alpha=1.0,
        # Local search
        local_search=True,
        local_search_top_k=2,
        local_search_steps=4,
        local_search_radius=0.08,
    ),
)
search.fit(X_train, y_train)

# Evaluate the best configuration on the held-out test set.
y_pred = search.predict(X_test)
y_proba = search.predict_proba(X_test)[:, 1]
print(search.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_proba))
Reading Optimizer Telemetry
After fitting, convert history to a DataFrame to diagnose the search:
history = pd.DataFrame(search.history)
columns = [
    "gen",
    "fitness_best",
    "fitness_max",
    "unique_individual_ratio",
    "genotype_diversity",
    "stagnation_generations",
    "mutation_probability",
    "diversity_control_triggered",
    "random_immigrants",
    "duplicate_replacements",
    "fitness_sharing_applied",
    "mean_niche_count",
    "max_niche_count",
    "local_refinements",
]
print(history[columns])
print(search.fit_stats_)
Key telemetry fields:
| Field | Meaning |
|---|---|
| unique_individual_ratio | Fraction of distinct individuals. Low → population collapsed |
| genotype_diversity | Average per-gene diversity. Low → structurally similar candidates |
| stagnation_generations | Generations since fitness_best last improved |
| diversity_control_triggered | Whether diversity control fired this generation |
| random_immigrants | Number of random candidates injected |
| duplicate_replacements | Duplicate offspring replaced before evaluation |
| fitness_sharing_applied | Whether niche-aware selection was active |
| mean_niche_count / max_niche_count | Crowding during selection |
| local_refinements | Neighbor candidates evaluated by local search (usually non-zero only in the final row) |
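To build intuition for the first two fields, here is a minimal sketch of how such metrics could be computed on a toy population. The function bodies are assumptions: the table only says "fraction of distinct individuals" and "average per-gene diversity", so the normalization used for the per-gene score below is one plausible choice, not necessarily the library's formula.

```python
def unique_individual_ratio(population):
    """Fraction of distinct individuals in the population."""
    return len({tuple(ind) for ind in population}) / len(population)


def genotype_diversity(population):
    """Average per-gene diversity, scaled so 0 = all identical and 1 = all distinct.

    Assumed per-gene score: (distinct values - 1) / (population size - 1).
    """
    n = len(population)
    if n < 2:
        return 0.0
    n_genes = len(population[0])
    per_gene = []
    for g in range(n_genes):
        distinct = len({ind[g] for ind in population})
        per_gene.append((distinct - 1) / (n - 1))
    return sum(per_gene) / n_genes


# Toy individuals: (n_estimators, max_features)
population = [(100, "sqrt"), (100, "sqrt"), (100, "log2"), (150, "sqrt")]
collapsed = [(100, "sqrt")] * 4

print(unique_individual_ratio(population), genotype_diversity(population))
print(unique_individual_ratio(collapsed), genotype_diversity(collapsed))
```

The two numbers answer different questions: the ratio drops as soon as exact duplicates appear, while the per-gene average can already be low when individuals differ in only one or two genes.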
Feature Selection Example
The same controls work with GAFeatureSelectionCV. In feature selection, local_search_radius controls the fraction of feature bits flipped when creating local neighbors.
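As a rough illustration of that bit-flip idea, the sketch below builds one neighbor of a feature mask. `local_neighbor` is a hypothetical helper written for this page; it assumes the number of flips is `radius * n_features` rounded to the nearest integer with a minimum of one, and it ignores constraints such as `max_features`.

```python
import random


def local_neighbor(mask, radius, rng):
    """Return a copy of `mask` with roughly `radius * len(mask)` bits flipped (at least one)."""
    n_flips = max(1, round(radius * len(mask)))
    neighbor = list(mask)
    # Sample distinct positions so every flip changes a different feature.
    for i in rng.sample(range(len(mask)), n_flips):
        neighbor[i] = 1 - neighbor[i]
    return neighbor


rng = random.Random(0)
mask = [1] * 15 + [0] * 15  # 30 features, as in the breast cancer dataset
neighbor = local_neighbor(mask, radius=0.1, rng=rng)

# Number of features whose selection state changed.
hamming = sum(a != b for a, b in zip(mask, neighbor))
print(hamming)
```

With 30 features, a radius of 0.1 changes three features per neighbor under these assumptions, which is why larger radii start to behave like an additional mutation phase.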
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn_genetic import (
    EvolutionConfig,
    GAFeatureSelectionCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

selector = GAFeatureSelectionCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    cv=cv,
    scoring="roc_auc",
    max_features=18,
    evolution_config=EvolutionConfig(
        population_size=30,
        generations=16,
        crossover_probability=0.8,
        mutation_probability=ExponentialAdapter(
            initial_value=0.1, end_value=0.25, adaptive_rate=0.08
        ),
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        diversity_control=True,
        diversity_threshold=0.2,
        diversity_stagnation_generations=4,
        random_immigrants_fraction=0.15,
        fitness_sharing=True,
        sharing_radius=0.2,
        local_search=True,
        local_search_top_k=2,
        local_search_steps=5,
        local_search_radius=0.1,
    ),
)
selector.fit(X_train, y_train)
print("Selected features:", selector.best_features_)
print("Test score:", selector.score(X_test, y_test))
Recommended Workflow
Start simple and add controls based on telemetry:
- Default run: use PopulationConfig(initializer="smart"), crossover_probability=0.8, mutation_probability=0.1. Inspect history.
- If diversity collapses early: enable OptimizationConfig(diversity_control=True).
- If one candidate family dominates: enable fitness_sharing=True.
- If the final region looks close but not fully refined: enable local_search=True.
- If stagnation persists: add adaptive schedules that slowly increase mutation over generations.
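For the last step, it helps to see what "slowly increase mutation" looks like in numbers. The sketch below assumes the schedule has the common exponential form `(initial_value - end_value) * exp(-adaptive_rate * generation) + end_value`; `exponential_schedule` is a stand-in written for this page, so check the Adaptive Schedules page for the exact formula the library uses.

```python
import math


def exponential_schedule(initial_value, end_value, adaptive_rate, generation):
    """Value at `generation`, moving from initial_value toward end_value (assumed form)."""
    return (initial_value - end_value) * math.exp(-adaptive_rate * generation) + end_value


# Same numbers as the mutation schedule in the examples on this page.
values = [exponential_schedule(0.1, 0.25, 0.08, gen) for gen in range(18)]
for gen in (0, 5, 10, 17):
    print(gen, round(values[gen], 3))
```

Under this assumption the mutation probability starts at 0.1 and approaches 0.25 without reaching it within 18 generations, so late generations explore more than early ones.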
Tuning Guidelines
| Parameter | Practical starting range | Notes |
|---|---|---|
| diversity_threshold | 0.1–0.3 | Higher triggers earlier |
| diversity_mutation_boost | 1.5–2.5 | Too high = next generation is random |
| random_immigrants_fraction | 0.05–0.2 | Larger helps rugged spaces; may slow convergence |
| sharing_radius | 0.15–0.35 | Smaller = only penalize very similar candidates |
| local_search_radius | 0.05–0.15 | Larger = behaves more like an extra mutation phase |
| local_search_steps | 1–5 | Each step = extra CV calls |
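Before committing to a setting, a back-of-the-envelope estimate can show what it costs. The sketch below is hypothetical and rests on three assumptions that this page does not confirm: the boost multiplies the current mutation probability (capped at 1), the immigrant count is the population size times the fraction rounded to the nearest integer, and local search evaluates one neighbor per step for each of the top-k candidates.

```python
def control_budget(population_size, mutation_probability, boost,
                   immigrants_fraction, top_k, steps, n_splits):
    """Rough effect of the diversity and local search knobs (illustrative assumptions only)."""
    extra_candidates = top_k * steps
    return {
        # Assumed: boosted probability is a simple product, capped at 1.0.
        "boosted_mutation": min(1.0, mutation_probability * boost),
        # Assumed: immigrants are a rounded share of the population.
        "immigrants": round(population_size * immigrants_fraction),
        # Assumed: one neighbor per step per seed, each scored on every CV fold.
        "extra_candidates": extra_candidates,
        "extra_fits": extra_candidates * n_splits,
    }


# Values taken from the hyperparameter search example on this page.
budget = control_budget(
    population_size=24, mutation_probability=0.1, boost=1.8,
    immigrants_fraction=0.15, top_k=2, steps=4, n_splits=4,
)
print(budget)
```

Even if the library counts things differently, the proportions hold: local search cost grows with the product of `local_search_top_k`, `local_search_steps`, and the number of folds.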
Practical Notes
- Fitness sharing only changes temporary selection pressure; it does not alter best_score_, cv_results_, or raw cross-validation scores.
- Local search evaluates extra candidates after the genetic search completes, so it can improve final quality but increases total runtime.
- Always keep a final holdout set separate from the CV folds for honest model assessment.
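The first note is easier to see with numbers. The sketch below uses the classic fitness sharing formulation (raw score divided by a niche count built from a triangular sharing function); whether the library uses exactly this formula is an assumption, and the one-dimensional `positions` are a hypothetical stand-in for normalized genotypes. It also assumes positive scores.

```python
def sharing(distance, radius, alpha):
    """Sharing function: 1 at distance 0, falling to 0 at `radius` and beyond."""
    if distance >= radius:
        return 0.0
    return 1.0 - (distance / radius) ** alpha


def shared_fitness(raw_scores, positions, radius=0.25, alpha=1.0):
    """Return (shared scores, niche counts) without modifying raw_scores."""
    niche_counts = [
        sum(sharing(abs(p - q), radius, alpha) for q in positions)  # includes self, so >= 1
        for p in positions
    ]
    shared = [score / niche for score, niche in zip(raw_scores, niche_counts)]
    return shared, niche_counts


raw = [0.98, 0.97, 0.97, 0.95]       # raw cross-validation scores
positions = [0.50, 0.52, 0.51, 0.90]  # three crowded candidates and one isolated

shared, niche_counts = shared_fitness(raw, positions)
print([round(s, 3) for s in shared])
print([round(n, 2) for n in niche_counts])
print(max(raw))  # the best raw score is unchanged
```

The isolated candidate gets the highest shared score, so selection favors keeping its niche alive, while the raw scores that feed `best_score_` stay exactly as cross-validation produced them.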
Next Steps
- Adaptive Schedules: schedule mutation and crossover rates over generations
- Troubleshooting: diagnose convergence problems using fit_stats_ and history
- Config Objects API: full parameter reference for OptimizationConfig
