.. _problems: Problems ======== All evaluation functions provided by the benchmark suite are required to be minimized. If the objective value of the original underlying problem is to be maximized, the evaluation function provided has its value multiplied by :math:`-1`. .. note:: Evaluation runtimes are measured on a **Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz** CPU, using random search. .. jupyter-execute:: :hide-code: import os import pandas as pd import numpy as np import matplotlib.patheffects as pe import matplotlib.pyplot as plt from scipy.stats import norm, gamma, chi, chi2, beta, expon is_doc = os.getcwd().endswith('doc') datadir = "./source/data/" if is_doc else "./data" def plot_hist(csv, field, dist=None, filter=None, scale=1): data = pd.read_csv(os.path.join(datadir, csv))[field] data_n = len(data) data_min = np.min(data) data_mean = np.mean(data) data_std = np.std(data) data_max = np.max(data) bins = 20 do_plot_density = dist is not None fig = plt.figure() ax = fig.add_subplot() # Label axes if do_plot_density: ax.set_ylabel("Density") else: ax.set_ylabel("Count") if 'time' in field: ax.set_xlabel("Time (s)") # Add histogram ax.hist(data, bins=bins, density=do_plot_density) # Overlay distribution if dist is not None: x = np.linspace(data_min, data_max, 200) if filter is None: data_fil = data else: data_fil = data[filter(data)] dist_f = dist(*dist.fit(data_fil * scale)) data_f_n = len(data_fil) y = dist_f.pdf(x * scale) ax.plot(x, y) txt = ax.text(0.01, 0.99, f"$\\mu: {data_mean:.2f}$\n$\sigma: {data_std:.2f}$\n$c: {data_n}$", horizontalalignment='left', verticalalignment='top', transform=ax.transAxes) txt.set_path_effects([pe.Stroke(linewidth=3, foreground='white'), pe.Normal()]) Windmill Wake Simulator ----------------------- :bibtex: ``floris2020`` :repository: `GitHub `__ :problem-key: ``windwake`` :parameters: --file The path to the windpark/windmill specification. We recommend using `example_input.json `__ from the FLORIS repository. (required) -n The number of windmills to be placed. (default: 3) -w The width of the area in which the windmills are to be placed. (default: 333.33 * ``-n``) -h The height of the area in which the windmills are to be placed. (default: 333.33 * ``-n``) --wind-seed The random seed used for generating the distribution and strength of the wind. (default: 0) --n-samples The number of random wind strength samples to evaluate. More is less noisy but takes more time. Passing the string ``None`` will use a fixed set of wind strengths (previous behaviour, fast, no noise) (default: 5) :dimensionality: :math:`2n`, all continuous (``cont``) :constraints: Windmills are not allowed to be located within a factor of two of each others' radius, this constraint has been incorporated into the objective function. Violations will result in an objective value of :math:`0.0`. :description: The layout of the windmills in a wind farm has noticeable impact on the amount of energy it produces. This benchmark problem employs the `FLORIS `__ wake simulator to analyse how much power production is lost by having windmills be located in each others wake. The objective is to maximize power production. :runtime: **At ``-n`` = 3:** .. jupyter-execute:: :hide-code: plot_hist("windwake_rs.csv.xz", 'iter_eval_time', dist=norm) **At ``-n`` = 5:** .. jupyter-execute:: :hide-code: plot_hist("windwake_rs_5.csv.xz", 'iter_eval_time', dist=norm) :fitness: **At ``-n`` = 3:** .. jupyter-execute:: :hide-code: plot_hist("windwake_rs.csv.xz", 'iter_fitness') **At ``-n`` = 5:** .. jupyter-execute:: :hide-code: plot_hist("windwake_rs_5.csv.xz", 'iter_fitness') Electrostatic Precipitator* --------------------------- :publications: (:cite:`daniels2018suite`) :bibtex: (``daniels2018suite``) :repository: `BitBucket `__ :problem-key: ``esp`` :parameters: None :dimensionality: :math:`49` - all categorical (``cat``) :runtime: .. jupyter-execute:: :hide-code: plot_hist("esp_rs.csv.xz", 'iter_eval_time', dist=norm) :fitness: .. jupyter-execute:: :hide-code: plot_hist("esp_rs.csv.xz", 'iter_fitness') :description: An Electrostatic Precipitator is a large gas filtering installation, whose efficiency and efficiacy is dependent on how well the intake gas is distributed. This installation has slots -- named baffles -- which can be of various types, each having a different impact on the distribution. This benchmark problem employs the OpenFOAM Computational Fluid Dynamics simulator, implemented as part of the `CFD Test Problem Suite `__ by Daniels et al. . The goal is to find a configuration that has the best resulting distribution. PitzDaily --------- :publications: :cite:`daniels2018suite` :bibtex: ``daniels2018suite`` :repository: `BitBucket `__ :problem-key: ``pitzdaily`` :parameters: None :dimensionality: :math:`10` - all continuous (``cont``) :runtime: .. jupyter-execute:: :hide-code: plot_hist("pitzdaily_rs.csv.xz", 'iter_eval_time', dist=norm) :fitness: .. jupyter-execute:: :hide-code: plot_hist("pitzdaily_rs.csv.xz", 'iter_fitness') :constraints: Points must lie in a polygon, constraint violations will result in an objective value of :math:`1.0`. :description: This is a pipe shape optimization problem, where a computational fluid dynamics simulator is used to calculate the pressure loss for a given pipe shape, which needs to be minimized. The variables denote the control points that determine the shape of the pipe. HPO / XGBoost ------------- :problem-key: ``hpo`` :parameters: --folder The folder containing the unpacked files of the `Steel Plates Faults `__ dataset. (required) --time-limit The time limit for a single evaluation of the objective function in seconds. A that requires more time than what time time limit allows will return an objective value of 0 (default: 8) **TODO:** Setting this parameter still needs to be implemented. .. important:: The default time limit is based on a **Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz**, adjust accordingly to hardware used. :dataset: Dataset provided by Semeion, Research Center of Sciences of Communication, Via Sersale 117, 00128, Rome, Italy. www.semeion.it :dimensionality: :math:`135` - :math:`117` categorical (``cat``), :math:`7` integer (``int``), :math:`11` continuous (``cont``), contains conditionals :runtime: .. jupyter-execute:: :hide-code: # plot_hist("hpo_rs.csv.xz", 'iter_eval_time', dist=gamma, filter=lambda x: x < 8.0) plot_hist("hpo_rs.csv.xz", 'iter_eval_time', dist=norm) :fitness: .. jupyter-execute:: :hide-code: plot_hist("hpo_rs.csv.xz", 'iter_fitness') :constraints: Time it limited to 8s (on our machine), violations result in an objective value of :math:`0.0`. :description: Machine Learning approaches often have a large amount of hyperparameters of varying types. This benchmark makes use of scikit-learn to build an XGBoost classifier with per-feature preprocessing. Evaluation of a solution is performed by k-fold cross validation, with the goal to maximize accuracy. Hospital Simulation ------------------- :publications: :cite:`hospital` :bibtex: ``hospital`` :repository: `Website `__ :problem-key: ``hospital`` :parameters: None :dimensionality: :math:`29` - all continuous (``cont``) :runtime: .. jupyter-execute:: :hide-code: plot_hist("hospital_rs.csv.xz", 'iter_eval_time', dist=norm) :fitness: .. jupyter-execute:: :hide-code: plot_hist("hospital_rs.csv.xz", 'iter_fitness') :constraints: Lower and upper bounds only. :description: This problem consists of tuning the parameters of a discrete event simulator for a hospital planning tool in the context of the COVID-19 pandemic. It is especially challenging due to the large amount of noise in the objective function. Rosenbrock ---------- :problem-key: rosenbrock :parameters: --n-int The number of dimensions that are required to be integer (expressed as :math:`i` in the dimensionality below) --n-cont The number of dimensions that are required to be continuous (expressed as :math:`c` in the dimensionality below) --logscale Whether to take the log of the rosenbrock function instead of scaling. :dimensionality: :math:`i + c`, :math:`i` integer (``int``), :math:`c` continuous (``cont``) :description: The rosenbrock function with a configurable amount of integer and continuous variables. Non-expensive problem included to test whether approaches work.