Skip to content

Advanced Optimizer Control

When default settings produce premature convergence or poor search quality, these controls give you finer-grained command over the evolutionary process. They are optional — the defaults are conservative and work well for most searches. Add them one at a time when telemetry shows the optimizer needs them.

Prerequisites

When to Use These Controls

Check pd.DataFrame(search.history) after a run. Add controls if you see:

SymptomLikely causeRemedy
unique_individual_ratio drops to near 0 in a few generationsPopulation collapsedEnable diversity_control
genotype_diversity is low while score still improvesSlow convergence, not stuckLet it run or increase population_size
stagnation_generations grows for 5+ generationsLocal optimumEnable diversity_control, fitness sharing, or local search
Multiple high-scoring but similar candidatesSingle dominant regionEnable fitness_sharing to keep multiple niches alive
Final solution is good but nearby configs might be betterUnder-exploited regionEnable local_search

Configuration Objects

Advanced settings live in OptimizationConfig. All parameters are disabled (or set to low-impact defaults) unless explicitly changed.

python
from sklearn_genetic import OptimizationConfig
python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Integer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 20),
    "min_samples_split": Integer(2, 20),
    "min_samples_leaf": Integer(1, 10),
    "max_features": Categorical(["sqrt", "log2", None]),
    "criterion": Categorical(["gini", "entropy", "log_loss"]),
}

crossover_schedule = InverseAdapter(
    initial_value=0.8,
    end_value=0.6,
    adaptive_rate=0.05,
)
mutation_schedule = ExponentialAdapter(
    initial_value=0.1,
    end_value=0.25,
    adaptive_rate=0.08,
)

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",
    evolution_config=EvolutionConfig(
        population_size=24,
        generations=18,
        crossover_probability=crossover_schedule,
        mutation_probability=mutation_schedule,
        tournament_size=3,
        elitism=True,
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        # Diversity control
        diversity_control=True,
        diversity_threshold=0.18,
        diversity_stagnation_generations=4,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.15,
        # Fitness sharing
        fitness_sharing=True,
        sharing_radius=0.25,
        sharing_alpha=1.0,
        # Local search
        local_search=True,
        local_search_top_k=2,
        local_search_steps=4,
        local_search_radius=0.08,
    ),
)

search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_proba = search.predict_proba(X_test)[:, 1]

print(search.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_proba))

Reading Optimizer Telemetry

After fitting, convert history to a DataFrame to diagnose the search:

python
history = pd.DataFrame(search.history)

columns = [
    "gen",
    "fitness_best",
    "fitness_max",
    "unique_individual_ratio",
    "genotype_diversity",
    "stagnation_generations",
    "mutation_probability",
    "diversity_control_triggered",
    "random_immigrants",
    "duplicate_replacements",
    "fitness_sharing_applied",
    "mean_niche_count",
    "max_niche_count",
    "local_refinements",
]

print(history[columns])
print(search.fit_stats_)

Key telemetry fields:

FieldMeaning
unique_individual_ratioFraction of distinct individuals. Low → population collapsed
genotype_diversityAverage per-gene diversity. Low → structurally similar candidates
stagnation_generationsGenerations since fitness_best last improved
diversity_control_triggeredWhether diversity control fired this generation
random_immigrantsNumber of random candidates injected
duplicate_replacementsDuplicate offspring replaced before evaluation
fitness_sharing_appliedWhether niche-aware selection was active
mean_niche_count / max_niche_countCrowding during selection
local_refinementsNeighbor candidates evaluated by local search (usually non-zero only in the final row)

Feature Selection Example

The same controls work with GAFeatureSelectionCV. In feature selection, local_search_radius controls the fraction of feature bits flipped when creating local neighbors.

python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GAFeatureSelectionCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

selector = GAFeatureSelectionCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    cv=cv,
    scoring="roc_auc",
    max_features=18,
    evolution_config=EvolutionConfig(
        population_size=30,
        generations=16,
        crossover_probability=0.8,
        mutation_probability=ExponentialAdapter(
            initial_value=0.1, end_value=0.25, adaptive_rate=0.08
        ),
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        diversity_control=True,
        diversity_threshold=0.2,
        diversity_stagnation_generations=4,
        random_immigrants_fraction=0.15,
        fitness_sharing=True,
        sharing_radius=0.2,
        local_search=True,
        local_search_top_k=2,
        local_search_steps=5,
        local_search_radius=0.1,
    ),
)

selector.fit(X_train, y_train)

print("Selected features:", selector.best_features_)
print("Test score:", selector.score(X_test, y_test))

Start simple and add controls based on telemetry:

  1. Default run — use PopulationConfig(initializer="smart"), crossover_probability=0.8, mutation_probability=0.1. Inspect history.
  2. If diversity collapses early — enable OptimizationConfig(diversity_control=True).
  3. If one candidate family dominates — enable fitness_sharing=True.
  4. If the final region looks close but not fully refined — enable local_search=True.
  5. If stagnation persists — add adaptive schedules: slowly increase mutation over generations.

Tuning Guidelines

ParameterPractical starting rangeNotes
diversity_threshold0.1–0.3Higher triggers earlier
diversity_mutation_boost1.5–2.5Too high = next generation is random
random_immigrants_fraction0.05–0.2Larger helps rugged spaces; may slow convergence
sharing_radius0.15–0.35Smaller = only penalize very similar candidates
local_search_radius0.05–0.15Larger = behaves more like an extra mutation phase
local_search_steps1–5Each step = extra CV calls

Practical Notes

  • Fitness sharing only changes temporary selection pressure; it does not alter best_score_, cv_results_, or raw cross-validation scores.
  • Local search evaluates extra candidates after the genetic search completes, so it can improve final quality but increases total runtime.
  • Always keep a final holdout set separate from the CV folds for honest model assessment.

Next Steps

Released under the MIT License.