GAFeatureSelectionCV

Genetic algorithm wrapper-based feature selection with cross-validation.

GAFeatureSelectionCV selects a subset of columns from the input data that maximises the cross-validation score of the given estimator. After fitting, it exposes support_ (a boolean mask of selected features) and behaves as a fitted scikit-learn estimator for prediction.
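To make the wrapper idea concrete, the sketch below scores a single candidate feature subset by cross-validating an estimator on only the masked columns. It is an illustration under stated assumptions, not the library's internal code: `subset_cv_score` is a hypothetical helper, and treating an empty subset as the worst possible candidate is a choice made here for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def subset_cv_score(estimator, X, y, mask, cv=5, scoring=None):
    """Mean CV score of `estimator` trained only on the columns where `mask` is True."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # Assumption for this sketch: an empty subset cannot be fitted,
        # so it is ranked below every other candidate.
        return -np.inf
    return cross_val_score(estimator, X[:, mask], y, cv=cv, scoring=scoring).mean()


X, y = load_iris(return_X_y=True)
all_features = np.ones(X.shape[1], dtype=bool)
petal_only = np.array([False, False, True, True])

score_all = subset_cv_score(SVC(gamma="auto"), X, y, all_features, cv=3)
score_petal = subset_cv_score(SVC(gamma="auto"), X, y, petal_only, cv=3)
print("all features:", round(score_all, 4))
print("petal only:  ", round(score_petal, 4))
```

A genetic search repeats this kind of evaluation for many masks and keeps the ones that score best.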

Class Signature

```python
from sklearn_genetic import GAFeatureSelectionCV

GAFeatureSelectionCV(
    estimator,
    *,
    cv=5,
    scoring=None,
    refit=True,
    verbose=0,
    keep_top_k=1,
    elitism=True,
    error_score=np.nan,
    return_train_score=False,
    evolution_config=None,
    population_config=None,
    runtime_config=None,
    optimization_config=None,
)
```

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| estimator | estimator | required | A scikit-learn estimator with a fit method |
| cv | int or CV splitter | 5 | Cross-validation strategy |
| scoring | str | None | Metric to evaluate. None uses the estimator's default scorer |
| refit | bool | True | Refit the best estimator on the full training data after search |
| verbose | int | 0 | Verbosity level |
| keep_top_k | int | 1 | Number of hall-of-fame individuals to keep across generations |
| elitism | bool | True | Whether to carry over the best individual to the next generation |
| error_score | float or "raise" | np.nan | Score to assign when a candidate raises an exception |
| return_train_score | bool | False | Whether to include training scores in cv_results_ |
| evolution_config | EvolutionConfig | None | Population size, generations, crossover/mutation rates |
| population_config | PopulationConfig | None | Initialization strategy and diversity settings |
| runtime_config | RuntimeConfig | None | Parallelism, caching, verbosity |
| optimization_config | OptimizationConfig | None | Local search, fitness sharing |
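The roles of elitism and keep_top_k are easier to see in a toy genetic loop over boolean feature masks. The sketch below is a simplified illustration and not the library's implementation: `fitness` is a made-up stand-in for a cross-validation score, and `INFORMATIVE`, `tournament`, `evolve`, and the mutation rate are all hypothetical choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 14
# Hypothetical ground truth: only the first 4 columns carry signal.
INFORMATIVE = np.arange(N_FEATURES) < 4


def fitness(mask):
    """Stand-in for a CV score: +1 per informative column kept, -0.5 per noise column kept."""
    return float(np.sum(mask & INFORMATIVE) - 0.5 * np.sum(mask & ~INFORMATIVE))


def tournament(population, scores, size=3):
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.integers(0, len(population), size=size)
    return population[contenders[np.argmax(scores[contenders])]]


def evolve(population_size=30, generations=20, mutation_rate=0.05, elitism=True, keep_top_k=2):
    population = rng.random((population_size, N_FEATURES)) < 0.5
    hall_of_fame = {}  # mask bytes -> (score, mask) for the best distinct masks seen so far
    best_per_generation = []

    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]  # indices from best to worst
        best_per_generation.append(float(scores[order[0]]))

        # Hall of fame: merge this generation's top individuals, keep the k best distinct masks.
        for i in order[:keep_top_k]:
            hall_of_fame.setdefault(population[i].tobytes(), (float(scores[i]), population[i].copy()))
        top = sorted(hall_of_fame.values(), key=lambda pair: pair[0], reverse=True)[:keep_top_k]
        hall_of_fame = {mask.tobytes(): (score, mask) for score, mask in top}

        # Elitism: the current best individual enters the next generation unchanged.
        children = [population[order[0]].copy()] if elitism else []
        while len(children) < population_size:
            a = tournament(population, scores)
            b = tournament(population, scores)
            child = np.where(rng.random(N_FEATURES) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(N_FEATURES) < mutation_rate       # bit-flip mutation
            children.append(child)
        population = np.array(children)

    ranked = sorted(hall_of_fame.values(), key=lambda pair: pair[0], reverse=True)
    return ranked, best_per_generation


hall_of_fame, best_per_generation = evolve()
for score, mask in hall_of_fame:
    print(score, np.flatnonzero(mask))
```

With elitism enabled, the best score per generation can never decrease, because the best individual always survives. The hall of fame is separate from the population and only records the best masks seen at any point in the run.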

Attributes After fit

| Attribute | Description |
| --- | --- |
| support_ | Boolean mask: True for selected features |
| best_score_ | Mean CV score of the best feature subset |
| best_estimator_ | Estimator fitted with the selected features on the full training data |
| cv_results_ | Dict with per-candidate results |
| history | List of per-generation dicts |
| logbook | DEAP logbook |
| fit_stats_ | Evaluation counters |
| n_features_ | Number of selected features |
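As a quick illustration of how a boolean mask such as support_ can be read, the NumPy-only sketch below uses a hand-written mask in place of a fitted selector. The mask values, the feature names, and the data are all hypothetical.

```python
import numpy as np

# Hypothetical mask, standing in for selector.support_ after fitting on 6 columns.
support = np.array([True, False, True, True, False, False])
feature_names = np.array(
    ["sepal_len", "sepal_wid", "petal_len", "petal_wid", "noise_0", "noise_1"]
)

X = np.arange(12, dtype=float).reshape(2, 6)  # 2 samples, 6 features

selected_idx = np.flatnonzero(support)  # column indices of the kept features
n_selected = int(support.sum())         # count of True entries in the mask
X_reduced = X[:, support]               # keep only the selected columns

print("indices:", selected_idx)
print("names:  ", feature_names[support])
print("shape:  ", X_reduced.shape)
```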

Methods

| Method | Description |
| --- | --- |
| fit(X, y, callbacks=None) | Run the genetic feature selection |
| transform(X) | Return X with only the selected features |
| predict(X) | Predict using best_estimator_ on selected features |
| predict_proba(X) | Predict class probabilities |
| score(X, y) | Score using best_estimator_ on selected features |
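The way these methods relate to one another can be sketched with a small stand-in class that holds a fixed mask and delegates to an estimator fitted on the masked columns. `MaskedModel` is hypothetical and is not part of sklearn_genetic. It skips the search entirely and only mirrors the delegation described in the table.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.svm import SVC


class MaskedModel:
    """Hypothetical stand-in for a fitted selector: a fixed mask plus an estimator fit on it."""

    def __init__(self, estimator, support):
        self.estimator = estimator
        self.support_ = np.asarray(support, dtype=bool)

    def fit(self, X, y):
        # Fit a fresh copy of the estimator on the selected columns only.
        self.best_estimator_ = clone(self.estimator).fit(self.transform(X), y)
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.support_]

    def predict(self, X):
        return self.best_estimator_.predict(self.transform(X))

    def score(self, X, y):
        return self.best_estimator_.score(self.transform(X), y)


X, y = load_iris(return_X_y=True)
model = MaskedModel(SVC(gamma="auto"), [False, False, True, True]).fit(X, y)
preds = model.predict(X)
print(model.transform(X).shape, round(model.score(X, y), 4))
```

Callers pass the full-width X to every method. The mask is applied internally, so the reduced matrix never has to be built by hand.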

Example

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

from sklearn_genetic import EvolutionConfig, GAFeatureSelectionCV, PopulationConfig, RuntimeConfig

# Iris has 4 informative features; append 10 columns of uniform noise
# so the selector has something to discard.
X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)  # seeded so the noise columns are reproducible
noise = rng.uniform(0, 10, size=(X.shape[0], 10))
X = np.hstack((X, noise))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

selector = GAFeatureSelectionCV(
    estimator=SVC(gamma="auto"),
    cv=3,
    scoring="accuracy",
    evolution_config=EvolutionConfig(population_size=30, generations=20, keep_top_k=2, elitism=True),
    population_config=PopulationConfig(initializer="smart"),
    runtime_config=RuntimeConfig(n_jobs=-1),
)

# Run the genetic search on the training split only.
selector.fit(X_train, y_train)

print("Selected features:", selector.support_)
print("CV accuracy:", round(selector.best_score_, 4))
# score() applies the selected-feature mask before scoring the held-out data.
print("Test accuracy:", round(selector.score(X_test, y_test), 4))
```

Released under the MIT License.