sklearn-genetic-opt
sklearn-genetic-opt adds evolutionary optimization tools to the scikit-learn workflow. It can tune hyperparameters with GASearchCV and select feature subsets with GAFeatureSelectionCV using algorithms powered by DEAP.
Highlights
- GASearchCV for hyperparameter search across classification, regression, and supported outlier-detection estimators.
- GAFeatureSelectionCV for wrapper-based feature selection with cross-validation.
- Search spaces for integer, continuous, and categorical parameters.
- Grouped configuration objects: EvolutionConfig, PopulationConfig, RuntimeConfig, and OptimizationConfig.
- Smart initial populations with PopulationConfig(initializer="smart").
- Adaptive mutation and crossover schedules.
- Optional local search, diversity control, random immigrants, and fitness sharing.
- Parallel candidate evaluation with n_jobs and parallel_backend.
- Evaluation caching, optimizer telemetry through history, and fit-cost counters through fit_stats_.
- Callbacks for early stopping, progress reporting, checkpoints, TensorBoard, and custom logic.
- Plotting helpers plus MLflow 3 logging support.
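To make a few of the ideas above concrete (mixed search spaces, an adaptive mutation schedule, elitism, and evaluation caching), here is a minimal standard-library sketch of an evolutionary search loop. It is not the library's implementation, which is built on DEAP. The names SPACE, toy_fitness, and evolve are hypothetical, the toy fitness stands in for a cross-validated score, and the truncation selection and decay constants are assumptions chosen only for illustration.

```python
import random

# Hypothetical mixed search space: one integer, one continuous, and one
# categorical dimension, each stored as (kind, spec).
SPACE = {
    "depth": ("int", (2, 14)),
    "alpha": ("float", (0.0, 0.03)),
    "features": ("cat", ["sqrt", "log2", None]),
}


def sample(kind, spec, rng):
    """Draw one value from a single dimension of the space."""
    if kind == "int":
        return rng.randint(*spec)
    if kind == "float":
        return rng.uniform(*spec)
    return rng.choice(spec)


def random_candidate(rng):
    return {name: sample(kind, spec, rng) for name, (kind, spec) in SPACE.items()}


def toy_fitness(params):
    """Stand-in for a cross-validated score (assumption, not a real model)."""
    score = -abs(params["depth"] - 8) - 100 * abs(params["alpha"] - 0.01)
    return score + (1.0 if params["features"] == "sqrt" else 0.0)


def evolve(generations=15, population_size=20, seed=0):
    rng = random.Random(seed)
    cache = {}  # evaluation cache: identical candidates are scored only once

    def evaluate(candidate):
        key = tuple(sorted(candidate.items(), key=lambda kv: kv[0]))
        if key not in cache:
            cache[key] = toy_fitness(candidate)
        return cache[key]

    population = [random_candidate(rng) for _ in range(population_size)]
    best_per_generation = []
    for gen in range(generations):
        # Adaptive schedule: mutation probability decays from 0.5 toward 0.1.
        mutation_prob = 0.1 + 0.4 * (0.8 ** gen)
        ranked = sorted(population, key=evaluate, reverse=True)
        best_per_generation.append(evaluate(ranked[0]))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [ranked[0]]  # elitism: the best candidate survives unchanged
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each parameter is copied from one parent.
            child = {name: rng.choice((a[name], b[name])) for name in SPACE}
            for name, (kind, spec) in SPACE.items():
                if rng.random() < mutation_prob:
                    child[name] = sample(kind, spec, rng)  # mutation: resample
            children.append(child)
        population = children
    best = max(population, key=evaluate)
    return best, best_per_generation, len(cache)


best, history, n_evaluations = evolve()
print(best, history[-1], n_evaluations)
```

Because the best candidate is carried over unchanged each generation, the per-generation best score in this sketch never decreases, and the cache size gives a rough count of distinct fitness evaluations.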
Installation (dev)
Install from the master branch:
bash
pip install git+https://github.com/rodrigo-arenas/Sklearn-genetic-opt.git@master
Or install the released version:
bash
pip install sklearn-genetic-opt
Quick Start
python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

from sklearn_genetic import EvolutionConfig, GASearchCV, PopulationConfig, RuntimeConfig
from sklearn_genetic.space import Categorical, Continuous, Integer

# Hold out 25% of the data, stratified by class label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Search space: one entry per hyperparameter to tune.
param_grid = {
    "n_estimators": Integer(50, 250),
    "max_depth": Integer(2, 14),
    "min_samples_split": Integer(2, 12),
    "min_samples_leaf": Integer(1, 8),
    "max_features": Categorical(["sqrt", "log2", None]),
    "ccp_alpha": Continuous(0.0, 0.03),
}

# Score candidates by ROC AUC with 3-fold stratified cross-validation.
search = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="roc_auc",
    evolution_config=EvolutionConfig(population_size=20, generations=12),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True),
)

search.fit(X_train, y_train)
print(search.best_params_)