{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "```{index} single: application; regression\n", "```\n", "```{index} pandas dataframe\n", "```\n", "```{index} single: solver; HiGHS\n", "```\n", "\n", "# Extra material: Wine quality prediction with $L_1$ regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preamble: Install Pyomo and a solver\n", "\n", "The following cell sets and verifies a global SOLVER for the notebook. If run on Google Colab, the cell installs Pyomo and the HiGHS solver; if run elsewhere, it assumes Pyomo and HiGHS have already been installed. It then selects HiGHS as the solver via the appsi module and checks that the solver is available. The solver interface is stored in a global object `SOLVER` for later use." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5ssUqKOaPVaE", "outputId": "38c1005a-39f4-4307-e305-19a4c9819396" }, "outputs": [], "source": [ "import sys\n", " \n", "if 'google.colab' in sys.modules:\n", " %pip install pyomo >/dev/null 2>/dev/null\n", " %pip install highspy >/dev/null 2>/dev/null\n", " \n", "solver = 'appsi_highs'\n", " \n", "import pyomo.environ as pyo\n", "SOLVER = pyo.SolverFactory(solver)\n", "\n", "assert SOLVER.available(), f\"Solver {solver} is not available.\"" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Problem description\n", "\n", "Regression analysis aims to fit a predictive model to a dataset; a well-fitted model can then generate useful predictions for new data points. This notebook demonstrates how linear programming techniques combined with Least Absolute Deviation (LAD) regression can be used to build a linear model that predicts a wine's quality from its physicochemical attributes. 
The example uses a well-known data set from the machine learning community.\n", "\n", "In [this 2009 article](https://doi.org/10.1016/j.dss.2009.05.016) by Cortez et al., a comprehensive set of physical, chemical, and sensory quality metrics was gathered for an extensive range of red and white wines produced in Portugal. This dataset was subsequently contributed to the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv). \n", "\n", "The next code cell downloads the red wine data directly from this repository." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.880 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.760 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.280 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.99800 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.700 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1594 | 6.2 | 0.600 | 0.08 | 2.0 | 0.090 | 32.0 | 44.0 | 0.99490 | 3.45 | 0.58 | 10.5 | 5 |
| 1595 | 5.9 | 0.550 | 0.10 | 2.2 | 0.062 | 39.0 | 51.0 | 0.99512 | 3.52 | 0.76 | 11.2 | 6 |
| 1596 | 6.3 | 0.510 | 0.13 | 2.3 | 0.076 | 29.0 | 40.0 | 0.99574 | 3.42 | 0.75 | 11.0 | 6 |
| 1597 | 5.9 | 0.645 | 0.12 | 2.0 | 0.075 | 32.0 | 44.0 | 0.99547 | 3.57 | 0.71 | 10.2 | 5 |
| 1598 | 6.0 | 0.310 | 0.47 | 3.6 | 0.067 | 18.0 | 42.0 | 0.99549 | 3.39 | 0.66 | 11.0 | 6 |

1599 rows × 12 columns
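As a rough illustration of the file layout behind this preview, the sketch below parses a few records in semicolon-delimited form, which is how the UCI red wine file is stored. It is not the notebook's own loading code: the helper `read_wine_rows` and the `SAMPLE` string are hypothetical names introduced here, and the sample keeps only four columns of rows 0, 1, and 3 from the preview.

```python
import csv
import io

# Excerpt in the semicolon-delimited layout of the UCI file.
# Values are taken from rows 0, 1 and 3 of the preview; only four
# of the twelve columns are kept to keep the example short.
SAMPLE = '''"volatile acidity";"density";"alcohol";"quality"
0.7;0.9978;9.4;5
0.88;0.9968;9.8;5
0.28;0.998;9.8;6
'''


def read_wine_rows(handle):
    """Parse semicolon-delimited wine records into dicts of floats.

    `handle` is any text file object (hypothetical helper, not part
    of the original notebook).
    """
    reader = csv.DictReader(handle, delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]


rows = read_wine_rows(io.StringIO(SAMPLE))
print(len(rows), "rows; first alcohol value:", rows[0]["alcohol"])
```

In a notebook that already relies on pandas, a call such as `pd.read_csv(url, sep=";")` would likely do the same job in one line and return a DataFrame directly.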
| | volatile acidity | density | alcohol | quality |
|---|---|---|---|---|
| volatile acidity | 1.000000 | 0.022026 | -0.202288 | -0.390558 |
| density | 0.022026 | 1.000000 | -0.496180 | -0.174919 |
| alcohol | -0.202288 | -0.496180 | 1.000000 | 0.476166 |
| quality | -0.390558 | -0.174919 | 0.476166 | 1.000000 |
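The entries of this matrix look like Pearson correlation coefficients (it is symmetric with ones on the diagonal), although the cell that produced it is not shown here, so that is an assumption. Under that assumption, the following minimal sketch shows how one such coefficient is computed. The function `pearson` is a hypothetical helper, and the five sample values are rows 0, 1, 2, 3, and 1595 of the red wine data, so the result will not match the full-dataset value in the table.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Illustrative subset only: alcohol and quality for rows 0, 1, 2, 3 and 1595.
alcohol = [9.4, 9.8, 9.8, 9.8, 11.2]
quality = [5, 5, 5, 6, 6]

r = pearson(alcohol, quality)
print(f"sample correlation between alcohol and quality: {r:.3f}")
```

With pandas, `DataFrame.corr()` computes the same statistic for every pair of columns at once, using Pearson correlation by default.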