The Wine dataset contains the results of a chemical analysis of 3 classes of wine, grown in the same region of Italy but derived from 3 different cultivars. Some characteristics are listed below:
- Data size: 178 entries
- 3 classes
- Data distribution: 59, 71, and 48 entries for each class
- 13 features corresponding to the values from chemical analysis, no missing data:
  - Alcohol
  - Malic acid
  - Ash
  - Alcalinity of ash
  - Magnesium
  - Total phenols
  - Flavanoids
  - Nonflavanoid phenols
  - Proanthocyanins
  - Color intensity
  - Hue
  - OD280/OD315 of diluted wines
  - Proline

Here are two papers that use the Wine dataset:
- S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).
The data was used, along with many other datasets, to compare various classifiers. The classes are separable, though only RDA achieved 100% correct classification (RDA: 100%, QDA: 99.4%, LDA: 98.9%, 1NN: 96.1% on z-transformed data; all results obtained with the leave-one-out technique).
In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.
- S. Aeberhard, D. Coomans and O. de Vel, The Classification Performance of RDA, Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).
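The first report quotes a 1NN result on z-transformed data, evaluated with the leave-one-out technique. The sketch below shows one way such an evaluation could be set up. It is not the authors' code: the function names `z_transform` and `loo_1nn_accuracy` are hypothetical, Euclidean distance is assumed, the z-transform is computed once over all rows for simplicity, and the toy values are invented for illustration rather than taken from the Wine data.

```python
import math

def z_transform(rows):
    """Scale each feature (column) to zero mean and unit standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation; a constant column falls back to 1.0
    # so that the division below never hits zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, query in enumerate(rows):
        # Hold out entry i and find its nearest neighbour among the rest.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(query, rows[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)

# Toy values, invented for illustration only (NOT Wine measurements).
toy_rows = [[1.0, 200.0], [1.2, 180.0], [0.9, 220.0],
            [3.0, 900.0], [3.2, 950.0], [2.9, 880.0]]
toy_labels = [1, 1, 1, 2, 2, 2]

scaled = z_transform(toy_rows)
accuracy = loo_1nn_accuracy(scaled, toy_labels)
print(f"Leave-one-out 1NN accuracy: {accuracy:.1%}")
```

Without the z-transform, the second toy feature would dominate the distance because its values are hundreds of times larger, which is the usual reason for scaling before a nearest-neighbour classifier.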
We can display the data sizes among all classes, as follows:
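A minimal sketch of such a display, using only the Python standard library. The labels are rebuilt from the class sizes listed in the dataset characteristics (59, 71, and 48 entries). The label values 1, 2, 3 and the helper names `class_sizes` and `print_size_chart` are assumptions made for this example.

```python
from collections import Counter

# Class labels rebuilt from the documented class sizes; the label
# values 1, 2, 3 are an assumption of this sketch.
labels = [1] * 59 + [2] * 71 + [3] * 48

def class_sizes(labels):
    """Return {class: count}, ordered by class label."""
    return dict(sorted(Counter(labels).items()))

def print_size_chart(sizes, width=40):
    """Print one text bar per class; the largest class spans `width` characters."""
    largest = max(sizes.values())
    total = sum(sizes.values())
    for cls, n in sizes.items():
        bar = "#" * round(width * n / largest)
        print(f"class {cls}: {bar} {n} ({n / total:.1%})")

sizes = class_sizes(labels)
print_size_chart(sizes)
```

The chart makes the mild imbalance visible: the second class is the largest and the third is the smallest, but no class is rare enough to need resampling.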
We can display the feature distributions over different classes, as follows:
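One hedged way to inspect these distributions without a plotting library is to tabulate the minimum, median, and maximum of each feature per class, a subset of what a box plot shows. The function name `summarize_by_class` is hypothetical, and the toy values below are invented for illustration; they are not Wine measurements.

```python
from statistics import median

def summarize_by_class(rows, labels, feature_names):
    """Return {feature: {class: (min, median, max)}} for every feature and class."""
    summary = {}
    for k, name in enumerate(feature_names):
        per_class = {}
        for cls in sorted(set(labels)):
            # Values of feature k restricted to the entries of this class.
            values = [row[k] for row, y in zip(rows, labels) if y == cls]
            per_class[cls] = (min(values), median(values), max(values))
        summary[name] = per_class
    return summary

# Toy values, invented for illustration only (NOT Wine measurements).
toy_names = ["Malic acid", "Color intensity"]
toy_rows = [[1.7, 5.6], [1.8, 4.4], [2.4, 5.7],
            [0.9, 2.0], [1.1, 3.3], [1.6, 2.5]]
toy_labels = [1, 1, 1, 2, 2, 2]

summary = summarize_by_class(toy_rows, toy_labels, toy_names)
for name, per_class in summary.items():
    for cls, (lo, med, hi) in per_class.items():
        print(f"{name:>16} | class {cls}: min={lo} median={med} max={hi}")
```

To run this on the real data, one option (assuming scikit-learn is installed) is `sklearn.datasets.load_wine(return_X_y=True)`, which returns the 178 x 13 feature matrix and the class labels in a form this function accepts.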
We can plot the classes w.r.t. each of the features:
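A possible sketch of such a plot with matplotlib (assumed to be installed), drawing one panel per feature with the feature value on the horizontal axis and the class label on the vertical axis. The function names `class_feature_points` and `plot_classes_by_feature` are hypothetical, and the toy values are invented for illustration rather than taken from the Wine data.

```python
def class_feature_points(rows, labels, k):
    """Return (x, y) for feature k: x holds feature values, y holds class labels."""
    return [row[k] for row in rows], list(labels)

def plot_classes_by_feature(rows, labels, feature_names, ncols=4):
    """Draw one scatter panel per feature, with the class label on the vertical axis."""
    # Imported here so the helper above stays usable without matplotlib.
    import matplotlib.pyplot as plt

    n = len(feature_names)
    nrows = -(-n // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(3 * ncols, 2.5 * nrows), squeeze=False)
    classes = sorted(set(labels))
    for k, ax in enumerate(axes.flat):
        if k >= n:
            ax.axis("off")  # hide panels beyond the last feature
            continue
        x, y = class_feature_points(rows, labels, k)
        ax.scatter(x, y, s=10, c=y)  # colour each point by its class label
        ax.set_title(feature_names[k], fontsize=9)
        ax.set_yticks(classes)
        ax.set_ylabel("class")
    fig.tight_layout()
    plt.show()

# Toy values, invented for illustration only (NOT Wine measurements).
toy_names = ["Malic acid", "Color intensity"]
toy_rows = [[1.7, 5.6], [1.8, 4.4], [0.9, 2.0], [1.1, 3.3]]
toy_labels = [1, 1, 2, 2]

x0, y0 = class_feature_points(toy_rows, toy_labels, 0)
# plot_classes_by_feature(toy_rows, toy_labels, toy_names)  # uncomment to open the figure
```

A feature whose points occupy clearly different horizontal ranges for different classes is a good single-feature discriminator; heavy overlap suggests the feature helps only in combination with others.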