
sklearn-genetic-opt

sklearn-genetic-opt adds evolutionary optimization tools to the scikit-learn workflow. It can tune hyperparameters with GASearchCV and select feature subsets with GAFeatureSelectionCV using algorithms powered by DEAP.
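
To make the evolutionary idea concrete, here is a minimal sketch of a generic genetic algorithm over a mixed integer, continuous, and categorical space, using only the Python standard library. It is not how sklearn-genetic-opt or DEAP implement their operators. The dictionary-based `SPACE`, the `toy_fitness` function (a stand-in for a cross-validation score), and the choice of tournament selection, uniform crossover, per-gene resampling mutation, and single-individual elitism are all illustrative assumptions.

```python
import random

# Hypothetical mixed search space of (kind, spec) pairs, loosely mirroring
# Integer, Continuous, and Categorical ranges. Not the library's classes.
SPACE = {
    "max_depth": ("int", (2, 14)),
    "ccp_alpha": ("float", (0.0, 0.03)),
    "max_features": ("cat", ["sqrt", "log2", None]),
}


def sample_gene(kind, spec, rng):
    """Draw one random value for a single hyperparameter."""
    if kind == "int":
        return rng.randint(*spec)  # inclusive lower and upper bounds
    if kind == "float":
        return rng.uniform(*spec)
    return rng.choice(spec)


def random_individual(rng):
    return {name: sample_gene(kind, spec, rng) for name, (kind, spec) in SPACE.items()}


def toy_fitness(ind):
    """Stand-in for a cross-validation score; higher is better."""
    bonus = 1.0 if ind["max_features"] == "sqrt" else 0.0
    return -abs(ind["max_depth"] - 8) - 100 * ind["ccp_alpha"] + bonus


def evolve(pop_size=20, generations=12, mutation_prob=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(population, key=toy_fitness, reverse=True)
        history.append(toy_fitness(scored[0]))
        next_gen = [scored[0]]  # elitism: carry the current best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: each parent is the best of 3 random individuals
            p1 = max(rng.sample(population, 3), key=toy_fitness)
            p2 = max(rng.sample(population, 3), key=toy_fitness)
            # Uniform crossover: each gene is copied from one of the two parents
            child = {k: rng.choice((p1[k], p2[k])) for k in SPACE}
            # Mutation: resample each gene independently with probability mutation_prob
            for name, (kind, spec) in SPACE.items():
                if rng.random() < mutation_prob:
                    child[name] = sample_gene(kind, spec, rng)
            next_gen.append(child)
        population = next_gen
    best = max(population, key=toy_fitness)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[-1])
```

Because the best individual is always copied into the next generation, the best fitness in `history` can never decrease from one generation to the next.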

Highlights

  • GASearchCV for hyperparameter search across classification, regression, and supported outlier-detection estimators.
  • GAFeatureSelectionCV for wrapper-based feature selection with cross-validation.
  • Search spaces for integer, continuous, and categorical parameters.
  • Grouped configuration objects: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig.
  • Smart initial populations with PopulationConfig(initializer="smart").
  • Adaptive mutation and crossover schedules.
  • Optional local search, diversity control, random immigrants, and fitness sharing.
  • Parallel candidate evaluation with n_jobs and parallel_backend.
  • Evaluation caching, optimizer telemetry through history, and fit-cost counters through fit_stats_.
  • Callbacks for early stopping, progress reporting, checkpoints, TensorBoard, and custom logic.
  • Plotting helpers plus MLflow 3 logging support.
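
Fitness sharing is a classic niching technique: individuals that crowd the same region of the search space split their fitness, which keeps the population from collapsing onto a single peak. The sketch below shows the textbook formulation, where each raw fitness is divided by a niche count built from the sharing function sh(d) = 1 - (d / sigma_share)^alpha for d < sigma_share and 0 otherwise. The Euclidean distance on numeric vectors, the names `sigma_share` and `alpha`, and `shared_fitness` itself are assumptions for illustration, not sklearn-genetic-opt's implementation or parameter names. Like the classic scheme, it assumes non-negative raw fitness values.

```python
import math


def shared_fitness(fitnesses, positions, sigma_share=1.0, alpha=1.0):
    """Divide each raw fitness by its niche count (classic fitness sharing).

    positions are numeric vectors; distance is Euclidean (an assumption).
    """
    shared = []
    for i, xi in enumerate(positions):
        niche_count = 0.0
        for xj in positions:
            d = math.dist(xi, xj)  # requires Python 3.8+
            if d < sigma_share:
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # The individual always counts itself (d = 0), so niche_count >= 1
        shared.append(fitnesses[i] / niche_count)
    return shared


if __name__ == "__main__":
    # Two crowded individuals near 0 and one isolated individual at 5
    print(shared_fitness([1.0, 1.0, 1.0], [(0.0,), (0.1,), (5.0,)]))
```

In this example the two neighbours near 0 each keep only about 1 / 1.9 of their fitness, while the isolated individual keeps all of it, so selection pressure shifts toward less explored regions.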

Installation

Install the development version from the master branch:

```bash
pip install git+https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git@master
```

Or install the released version:

```bash
pip install sklearn-genetic-opt
```
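
After either command, one way to confirm which version your interpreter actually sees is to read the installed distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). This sketch only reads package metadata and does not import `sklearn_genetic`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="sklearn-genetic-opt"):
    """Return the installed distribution's version string, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(v if v is not None else "sklearn-genetic-opt is not installed")
```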

Quick Start

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
from sklearn_genetic.space import Categorical, Continuous, Integer

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 14),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}

search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="roc_auc",
    evolution_config=EvolutionConfig(population_size=20, generations=12),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True),
)

search.fit(X_train, y_train)
print(search.best_params_)

# Score the refit best estimator on the held-out test split
y_score = search.predict_proba(X_test)[:, 1]
print("Test ROC AUC:", roc_auc_score(y_test, y_score))
```
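
The Quick Start turns on `use_cache=True`. The idea behind evaluation caching is that a genetic search often regenerates a candidate it has already scored, so the stored cross-validation result can be reused instead of refitting the estimator. The sketch below is a hypothetical, library-independent illustration of that idea: `CachedEvaluator` and `fake_cv_score` are made-up names, hyperparameter values are assumed to be hashable, and sklearn-genetic-opt's own cache may key, store, or count hits differently.

```python
from typing import Any, Callable, Dict, Tuple


class CachedEvaluator:
    """Memoize fitness scores by hyperparameter values (illustrative only)."""

    def __init__(self, score_fn: Callable[[Dict[str, Any]], float]):
        self.score_fn = score_fn
        self._cache: Dict[Tuple, float] = {}
        self.n_fits = 0  # how many times score_fn actually ran

    @staticmethod
    def _key(params: Dict[str, Any]) -> Tuple:
        # Sort by parameter name so dicts with the same contents share a key
        return tuple(sorted(params.items()))

    def __call__(self, params: Dict[str, Any]) -> float:
        key = self._key(params)
        if key not in self._cache:
            self.n_fits += 1
            self._cache[key] = self.score_fn(params)
        return self._cache[key]


def fake_cv_score(params: Dict[str, Any]) -> float:
    """Hypothetical stand-in for an expensive cross-validated fit."""
    return 1.0 / (1 + params["max_depth"])


if __name__ == "__main__":
    evaluate = CachedEvaluator(fake_cv_score)
    evaluate({"max_depth": 4, "max_features": None})
    evaluate({"max_features": None, "max_depth": 4})  # same candidate: cache hit
    print(evaluate.n_fits)  # prints 1
```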


Released under the MIT License.