Advanced Optimizer Control

When default settings produce premature convergence or poor search quality, these controls give you finer-grained control over the evolutionary process. They are optional: the defaults are conservative and work well for most searches. Add them one at a time, and only when telemetry shows the optimizer needs them.

Prerequisites

When to Use These Controls

Check pd.DataFrame(search.history) after a run. Add controls if you see:

| Symptom | Likely cause | Remedy |
| --- | --- | --- |
| unique_individual_ratio drops to near 0 in a few generations | Population collapsed | Enable diversity_control |
| genotype_diversity is low while score still improves | Slow convergence, not stuck | Let it run or increase population_size |
| stagnation_generations grows for 5+ generations | Local optimum | Enable diversity_control, fitness sharing, or local search |
| Multiple high-scoring but similar candidates | Single dominant region | Enable fitness_sharing to keep multiple niches alive |
| Final solution is good but nearby configs might be better | Under-exploited region | Enable local_search |
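
The first and third rows of the table can be checked mechanically. The following is a minimal sketch, not part of the library: the `diagnose` helper and its thresholds (`collapse_ratio`, `early_generations`, `stagnation_limit`) are hypothetical and chosen only for illustration. It assumes the telemetry is available as a list of per-generation dicts with the field names used on this page.

```python
def diagnose(rows, collapse_ratio=0.1, early_generations=5, stagnation_limit=5):
    """Map per-generation telemetry rows to remedies from the symptom table.

    `rows` is a list of dicts, one per generation. All thresholds are
    illustrative assumptions, not library defaults.
    """
    remedies = []
    if not rows:
        return remedies

    # Early collapse: the unique ratio is near 0 within the first few generations.
    early = rows[:early_generations]
    if any(r["unique_individual_ratio"] <= collapse_ratio for r in early):
        remedies.append("enable diversity_control")

    # Stagnation: the best fitness has not improved for several generations.
    if max(r["stagnation_generations"] for r in rows) >= stagnation_limit:
        remedies.append("enable diversity_control, fitness_sharing, or local_search")

    return remedies


# Hand-written rows standing in for real telemetry.
rows = [
    {"gen": 0, "unique_individual_ratio": 0.95, "stagnation_generations": 0},
    {"gen": 1, "unique_individual_ratio": 0.40, "stagnation_generations": 0},
    {"gen": 2, "unique_individual_ratio": 0.08, "stagnation_generations": 1},
]
print(diagnose(rows))  # ['enable diversity_control']
```

With a real run, the rows could be obtained with pd.DataFrame(search.history).to_dict("records"), provided the history contains these columns.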

Configuration Objects

Advanced settings live in OptimizationConfig. Every control is either disabled or set to a low-impact default unless you change it explicitly.

python
from sklearn_genetic import OptimizationConfig
python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GASearchCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Integer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 20),
    "min_samples_split": Integer(2, 20),
    "min_samples_leaf": Integer(1, 10),
    "max_features": Categorical(["sqrt", "log2", None]),
    "criterion": Categorical(["gini", "entropy", "log_loss"]),
}

crossover_schedule = InverseAdapter(
    initial_value=0.8,
    end_value=0.6,
    adaptive_rate=0.05,
)
mutation_schedule = ExponentialAdapter(
    initial_value=0.1,
    end_value=0.25,
    adaptive_rate=0.08,
)

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    param_grid=param_grid,
    cv=cv,
    scoring="roc_auc",
    evolution_config=EvolutionConfig(
        population_size=24,
        generations=18,
        crossover_probability=crossover_schedule,
        mutation_probability=mutation_schedule,
        tournament_size=3,
        elitism=True,
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        # Diversity control
        diversity_control=True,
        diversity_threshold=0.18,
        diversity_stagnation_generations=4,
        diversity_mutation_boost=1.8,
        random_immigrants_fraction=0.15,
        # Fitness sharing
        fitness_sharing=True,
        sharing_radius=0.25,
        sharing_alpha=1.0,
        # Local search
        local_search=True,
        local_search_top_k=2,
        local_search_steps=4,
        local_search_radius=0.08,
    ),
)

search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_proba = search.predict_proba(X_test)[:, 1]

print(search.best_params_)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_proba))

Reading Optimizer Telemetry

After fitting, convert history to a DataFrame to diagnose the search:

python
history = pd.DataFrame(search.history)

columns = [
    "gen",
    "fitness_best",
    "fitness_max",
    "unique_individual_ratio",
    "genotype_diversity",
    "stagnation_generations",
    "mutation_probability",
    "diversity_control_triggered",
    "random_immigrants",
    "duplicate_replacements",
    "fitness_sharing_applied",
    "mean_niche_count",
    "max_niche_count",
    "local_refinements",
]

print(history[columns])
print(search.fit_stats_)

Key telemetry fields:

| Field | Meaning |
| --- | --- |
| unique_individual_ratio | Fraction of distinct individuals. Low → population collapsed |
| genotype_diversity | Average per-gene diversity. Low → structurally similar candidates |
| stagnation_generations | Generations since fitness_best last improved |
| diversity_control_triggered | Whether diversity control fired this generation |
| random_immigrants | Number of random candidates injected |
| duplicate_replacements | Duplicate offspring replaced before evaluation |
| fitness_sharing_applied | Whether niche-aware selection was active |
| mean_niche_count / max_niche_count | Crowding during selection |
| local_refinements | Neighbor candidates evaluated by local search (usually non-zero only in the final row) |
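
To build intuition for the first two fields, here is a rough sketch of how such metrics could be computed for a small population. The exact formulas the library uses are not stated on this page, so both functions below are assumptions: the unique ratio is taken as distinct individuals over population size, and per-gene diversity as the number of distinct values per gene, normalized so that an all-identical gene scores 0.

```python
def unique_individual_ratio(population):
    """Fraction of distinct individuals in the population."""
    return len(set(population)) / len(population)


def genotype_diversity(population):
    """Average per-gene diversity (assumed definition, see lead-in).

    For each gene: (distinct values - 1) / (population size - 1),
    so 0.0 means every individual shares the value and 1.0 means all differ.
    """
    n = len(population)
    if n < 2:
        return 0.0
    n_genes = len(population[0])
    per_gene = []
    for g in range(n_genes):
        distinct = len({individual[g] for individual in population})
        per_gene.append((distinct - 1) / (n - 1))
    return sum(per_gene) / n_genes


# Each individual: (n_estimators, max_depth, max_features)
population = [
    (100, 5, "sqrt"),
    (100, 5, "sqrt"),   # exact duplicate of the first individual
    (100, 8, "sqrt"),
    (150, 5, "log2"),
]

print(unique_individual_ratio(population))  # 0.75
print(round(genotype_diversity(population), 3))  # 0.333
```

In this toy population most individuals are distinct (ratio 0.75) yet each gene takes only two values, which is the "structurally similar candidates" situation the table describes.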

Feature Selection Example

The same controls work with GAFeatureSelectionCV. In feature selection, local_search_radius controls the fraction of feature bits flipped when creating local neighbors.

python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

from sklearn_genetic import (
    EvolutionConfig,
    GAFeatureSelectionCV,
    OptimizationConfig,
    PopulationConfig,
    RuntimeConfig,
)
from sklearn_genetic.schedules import ExponentialAdapter

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

selector = GAFeatureSelectionCV(
    estimator=RandomForestClassifier(random_state=42, n_jobs=1),
    cv=cv,
    scoring="roc_auc",
    max_features=18,
    evolution_config=EvolutionConfig(
        population_size=30,
        generations=16,
        crossover_probability=0.8,
        mutation_probability=ExponentialAdapter(
            initial_value=0.1, end_value=0.25, adaptive_rate=0.08
        ),
        keep_top_k=4,
    ),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", verbose=True),
    optimization_config=OptimizationConfig(
        diversity_control=True,
        diversity_threshold=0.2,
        diversity_stagnation_generations=4,
        random_immigrants_fraction=0.15,
        fitness_sharing=True,
        sharing_radius=0.2,
        local_search=True,
        local_search_top_k=2,
        local_search_steps=5,
        local_search_radius=0.1,
    ),
)

selector.fit(X_train, y_train)

print("Selected features:", selector.best_features_)
print("Test score:", selector.score(X_test, y_test))
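
The role of local_search_radius in feature selection, described before the example, can be illustrated with a small standalone sketch. This is not the library's implementation: `flip_neighbor` is a hypothetical helper, and it assumes the radius is converted to a bit count by rounding, with at least one bit flipped.

```python
import random


def flip_neighbor(mask, radius, rng):
    """Return a copy of `mask` with roughly `radius * len(mask)` bits flipped."""
    n_flips = max(1, round(radius * len(mask)))
    # Sample distinct positions so each chosen bit is flipped exactly once.
    positions = rng.sample(range(len(mask)), n_flips)
    neighbor = list(mask)
    for i in positions:
        neighbor[i] = 1 - neighbor[i]
    return neighbor


rng = random.Random(0)
mask = [1, 0] * 15  # 30 feature bits, matching the breast cancer dataset
neighbor = flip_neighbor(mask, radius=0.1, rng=rng)

changed = sum(a != b for a, b in zip(mask, neighbor))
print(changed)  # 3
```

With 30 features and a radius of 0.1, each neighbor differs from its parent in three features. A real implementation would also have to respect constraints such as max_features, which this sketch ignores.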

Start simple and add controls based on telemetry:

  1. Default run — use PopulationConfig(initializer="smart"), crossover_probability=0.8, mutation_probability=0.1. Inspect history.
  2. If diversity collapses early — enable OptimizationConfig(diversity_control=True).
  3. If one candidate family dominates — enable fitness_sharing=True.
  4. If the final region looks close but not fully refined — enable local_search=True.
  5. If stagnation persists — add adaptive schedules: slowly increase mutation over generations.
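
To see what "slowly increase mutation over generations" in step 5 looks like numerically, the sketch below evaluates an exponential schedule with the same values passed to ExponentialAdapter in the examples on this page. The formula is an assumption (the gap between initial and end value decays exponentially with the generation index); the library's adapter may differ in detail, so treat the numbers as indicative only.

```python
import math


def exponential_schedule(initial_value, end_value, adaptive_rate, generation):
    """Assumed schedule: the gap to `end_value` shrinks by exp(-rate * generation)."""
    gap = initial_value - end_value
    return gap * math.exp(-adaptive_rate * generation) + end_value


# Same settings as the mutation schedule in the examples: 0.1 -> 0.25, rate 0.08.
generations = [0, 5, 10, 16]
values = [exponential_schedule(0.1, 0.25, 0.08, g) for g in generations]

for g, v in zip(generations, values):
    print(f"generation {g:2d}: mutation_probability ~ {v:.3f}")
```

Under this assumption the mutation probability starts at 0.1 and rises toward 0.25 without reaching it, so later generations explore more aggressively than early ones.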

Tuning Guidelines

| Parameter | Practical starting range | Notes |
| --- | --- | --- |
| diversity_threshold | 0.1–0.3 | Higher triggers earlier |
| diversity_mutation_boost | 1.5–2.5 | Too high = next generation is random |
| random_immigrants_fraction | 0.05–0.2 | Larger helps rugged spaces; may slow convergence |
| sharing_radius | 0.15–0.35 | Smaller = only penalize very similar candidates |
| local_search_radius | 0.05–0.15 | Larger = behaves more like an extra mutation phase |
| local_search_steps | 1–5 | Each step = extra CV calls |
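
Because each local search step costs extra cross-validation calls, it helps to estimate the added budget before raising local_search_steps. The sketch below is a back-of-the-envelope calculation only. It assumes each of the top-k candidates evaluates one neighbor per step (`neighbors_per_step` is a hypothetical knob, not a library parameter) and that every evaluation fits the estimator once per CV fold.

```python
def local_search_extra_fits(top_k, steps, n_splits, neighbors_per_step=1):
    """Rough count of extra estimator fits caused by local search.

    Assumes: top_k seeds, `steps` rounds each, `neighbors_per_step`
    candidates per round, and one fit per CV fold per candidate.
    """
    return top_k * steps * neighbors_per_step * n_splits


# Settings from the hyperparameter search example: top_k=2, steps=4, 4-fold CV.
print(local_search_extra_fits(top_k=2, steps=4, n_splits=4))  # 32
```

If the actual number of neighbors per step is larger, the cost scales linearly with it.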

Practical Notes

  • Fitness sharing only changes temporary selection pressure; it does not alter best_score_, cv_results_, or raw cross-validation scores.
  • Local search evaluates extra candidates after the genetic search completes, so it can improve final quality but increases total runtime.
  • Always keep a final holdout set separate from the CV folds for honest model assessment.
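
The first note above says fitness sharing changes only selection pressure. The sketch below shows the classic textbook form of fitness sharing to make that concrete. It is an assumption that the library follows this form: the distance measure (mean absolute difference over genes normalized to [0, 1]) and both helper functions are illustrative only.

```python
def sharing(distance, radius, alpha):
    """Sharing kernel: 1 at distance 0, falling to 0 at `radius`."""
    if distance >= radius:
        return 0.0
    return 1.0 - (distance / radius) ** alpha


def shared_fitness(points, raw_scores, radius=0.25, alpha=1.0):
    """Divide each raw score by its niche count; raw scores are not modified."""
    def dist(p, q):
        return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

    # Niche count includes the individual itself, so it is always >= 1.
    niche_counts = [
        sum(sharing(dist(p, q), radius, alpha) for q in points) for p in points
    ]
    shared = [score / count for score, count in zip(raw_scores, niche_counts)]
    return shared, niche_counts


# Three near-identical candidates and one isolated candidate (normalized genes).
points = [(0.10, 0.10), (0.12, 0.10), (0.11, 0.12), (0.90, 0.80)]
raw = [0.95, 0.94, 0.94, 0.93]

shared, niche_counts = shared_fitness(points, raw)
best_for_selection = shared.index(max(shared))
print(best_for_selection)  # 3: the isolated candidate wins under sharing
print(raw)                 # raw CV scores are unchanged
```

The crowded candidates keep their raw scores but are less likely to be selected, which is how an isolated niche with a slightly lower score can survive.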

Next Steps
