Multi-Metric Search on Iris
This example shows how to run GASearchCV with multiple scorers and inspect per-metric results after fitting.
Setup
import warnings
from pprint import pprint
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn_genetic import (
EvolutionConfig, GASearchCV, OptimizationConfig, PopulationConfig, RuntimeConfig,
)
from sklearn_genetic.callbacks import ConsecutiveStopping, DeltaThreshold, TimerStopping
from sklearn_genetic.schedules import ExponentialAdapter, InverseAdapter
from sklearn_genetic.space import Categorical, Continuous, Integer
warnings.filterwarnings("ignore", category=UserWarning)
RANDOM_STATE = 42
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.30, stratify=y, random_state=RANDOM_STATE
)
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=RANDOM_STATE)Define Multiple Metrics
A multi-metric search receives a dictionary of scorers. The refit parameter decides which metric is used to choose best_params_ and refit best_estimator_.
scoring = {
"accuracy": "accuracy",
"balanced_accuracy": make_scorer(balanced_accuracy_score),
"f1_macro": make_scorer(f1_score, average="macro"),
}Setting refit="balanced_accuracy" selects the final model by class-balanced behavior.
Configure GASearchCV
model = Pipeline([
("scaler", StandardScaler()),
("logistic", LogisticRegression(solver="saga", max_iter=1200, random_state=RANDOM_STATE)),
])
param_grid = {
"logistic__C": Continuous(1e-3, 30.0, distribution="log-uniform"),
"logistic__l1_ratio": Continuous(0.0, 1.0),
"logistic__class_weight": Categorical([None, "balanced"]),
"logistic__max_iter": Integer(1000, 1500),
}
search = GASearchCV(
estimator=model,
param_grid=param_grid,
scoring=scoring,
refit="balanced_accuracy", # drives best_params_ and best_estimator_
cv=cv,
evolution_config=EvolutionConfig(
population_size=12,
generations=10,
crossover_probability=ExponentialAdapter(initial_value=0.8, end_value=0.4, adaptive_rate=0.15),
mutation_probability=InverseAdapter(initial_value=0.25, end_value=0.08, adaptive_rate=0.25),
tournament_size=3,
elitism=True,
keep_top_k=3,
),
population_config=PopulationConfig(
initializer="smart",
warm_start_configs=[{
"logistic__C": 1.0,
"logistic__l1_ratio": 0.0,
"logistic__class_weight": None,
"logistic__max_iter": 1200,
}],
),
runtime_config=RuntimeConfig(n_jobs=-1, parallel_backend="auto", use_cache=True, verbose=True),
optimization_config=OptimizationConfig(
local_search=True,
local_search_top_k=2,
local_search_steps=1,
local_search_radius=0.20,
diversity_control=True,
diversity_threshold=0.30,
diversity_stagnation_generations=3,
diversity_mutation_boost=1.8,
random_immigrants_fraction=0.10,
fitness_sharing=True,
sharing_radius=0.40,
),
)
callbacks = [
DeltaThreshold(threshold=0.001, generations=5, metric="fitness_best"),
ConsecutiveStopping(generations=7, metric="fitness_best"),
TimerStopping(total_seconds=90),
]
search.fit(X_train, y_train, callbacks=callbacks)Best Parameters and Test Metrics
Because refit="balanced_accuracy", best_params_ and best_estimator_ are selected by the CV rank of that metric.
print("Refit metric:", search.refit_metric)
print("Best balanced-accuracy CV score:", round(search.best_score_, 4))
pprint(search.best_params_)
predictions = search.predict(X_test)
test_metrics = {
"accuracy": accuracy_score(y_test, predictions),
"balanced_accuracy": balanced_accuracy_score(y_test, predictions),
"f1_macro": f1_score(y_test, predictions, average="macro"),
}
print(test_metrics)Explore Multi-Metric cv_results_
For multi-metric searches, cv_results_ contains one set of columns per metric.
results = pd.DataFrame(search.cv_results_)
metric_columns = [
"mean_test_accuracy", "rank_test_accuracy",
"mean_test_balanced_accuracy", "rank_test_balanced_accuracy",
"mean_test_f1_macro", "rank_test_f1_macro",
]
param_columns = [c for c in results.columns if c.startswith("param_")]
results[metric_columns + param_columns].sort_values("rank_test_balanced_accuracy").head()Comparing Metric Rankings
The same cv_results_ can point to different candidate rankings. This example shows the best row for each metric without rerunning the search.
best_rows = []
for metric_name in ["accuracy", "balanced_accuracy", "f1_macro"]:
row = results.sort_values(f"rank_test_{metric_name}").iloc[0]
best_rows.append({
"metric": metric_name,
"mean_test_score": row[f"mean_test_{metric_name}"],
"C": row["param_logistic__C"],
"l1_ratio": row["param_logistic__l1_ratio"],
"class_weight": row["param_logistic__class_weight"],
})
pd.DataFrame(best_rows)Optimizer Telemetry
With multi-metric scoring, the GA optimizes a single scalar fitness — the selected refit metric. Telemetry explains how the optimizer moved through the space while optimizing that metric.
print(search.fit_stats_)
history = pd.DataFrame(search.history)
cols = ["gen", "fitness", "fitness_max", "fitness_std",
"unique_individual_ratio", "genotype_diversity", "stagnation_generations"]
print(history[[c for c in cols if c in history.columns]].tail())Practical Notes
- Set
refitto the metric that should define the final model before fitting. best_score_,best_params_, andbest_estimator_follow therefitmetric, not every metric at once.- Use
cv_results_to inspect tradeoffs between metrics after fitting. - Use
fit_stats_andhistoryto understand optimizer cost, diversity, stagnation, and convergence.
See Also
- Multi-Metric Optimization Guide — full guide with scoring dict details
- GASearchCV API —
scoringandrefitparameter reference - Understanding Cross-Validation — reading the generation log
