The Abalone dataset contains 4177 entries, each of which records the features of an abalone together with its age as the desired output. The characteristics of this dataset are listed below.
- Data size: 4177 entries.
- Features: 8 features of an abalone's physical measurements, with no missing data:
  - Sex
  - Length
  - Diameter
  - Height
  - Whole weight
  - Shucked weight
  - Viscera weight
  - Shell weight
- Classes: 28 classes, corresponding to abalone ages from 1 to 29 years.
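The structure listed above can be sketched in a few lines of Python. This is a minimal illustration, not the loader used to produce the figures on this page: it assumes each entry is stored as one comma-separated line with the 8 features first and the integer output value last, and the three sample rows are hypothetical placeholders rather than values copied from the real file.

```python
import csv
import io

# Feature names as listed above; the output column is assumed to come last.
FEATURES = ["Sex", "Length", "Diameter", "Height", "Whole weight",
            "Shucked weight", "Viscera weight", "Shell weight"]

# Hypothetical sample rows, for illustration only.
SAMPLE = """\
M,0.45,0.36,0.10,0.51,0.22,0.10,0.15,15
F,0.53,0.42,0.14,0.68,0.26,0.14,0.21,9
I,0.33,0.26,0.08,0.21,0.09,0.04,0.06,7
"""

def parse_abalone(text):
    """Return a list of (features, label) pairs; features is a dict keyed by FEATURES."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        sex, *numbers, label = row
        features = {"Sex": sex}
        # The remaining 7 features are numeric measurements.
        features.update(zip(FEATURES[1:], map(float, numbers)))
        entries.append((features, int(label)))
    return entries

entries = parse_abalone(SAMPLE)
print(len(entries), "entries,", len(entries[0][0]), "features each")
```

With the full data file in place of `SAMPLE`, `len(entries)` should report the 4177 entries mentioned above, provided the file really follows the assumed layout.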
We can display the number of entries in each class, as follows:
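As a rough sketch of how such class sizes might be computed, the snippet below counts entries per class with `collections.Counter` and prints a crude text bar chart. The `labels` list is a hypothetical handful of class labels, not data from the real dataset.

```python
from collections import Counter

# Hypothetical class labels (ages) for a few entries; illustration only.
labels = [15, 7, 9, 10, 7, 8, 20, 16, 9, 19, 9, 7]

# Map each class label to the number of entries carrying it.
class_sizes = Counter(labels)

# One line per class in ascending label order, with '#' marks as a text bar.
for label in sorted(class_sizes):
    count = class_sizes[label]
    print(f"class {label:2d}: {count:3d} {'#' * count}")
```

With 28 classes, a listing like this makes it easy to spot classes that contain only a few entries.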
We can display the feature distributions over different classes, as follows:
(To avoid cluttering the plot, only the distributions of the first 8 classes are shown.)
We can also plot the classes against each individual feature:
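A numeric counterpart to these plots is to group one feature by class and summarize its range. The sketch below is only an illustration under stated assumptions: the `samples` list holds hypothetical (class label, Length) pairs, and minimum, mean, and maximum stand in for the full distribution.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (class label, Length) pairs; illustration only.
samples = [(7, 0.35), (7, 0.33), (9, 0.53), (9, 0.47), (9, 0.50), (10, 0.44)]

# Collect the feature values that belong to each class.
by_class = defaultdict(list)
for label, value in samples:
    by_class[label].append(value)

# Summarize every class by (min, mean, max) of the feature.
summary = {label: (min(v), mean(v), max(v)) for label, v in by_class.items()}

# Limit the printout to the first 8 classes to keep it readable.
for label in sorted(summary)[:8]:
    lo, avg, hi = summary[label]
    print(f"class {label}: min={lo:.2f} mean={avg:.2f} max={hi:.2f}")
```

Repeating this for each numeric feature gives a compact view of how much the classes overlap along that feature.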
We can also project the dataset onto a 2D plane and inspect its distribution in a scatter plot, for example:
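One simple reading of such a projection is to keep two of the feature axes and drop the rest. The sketch below assumes that interpretation; `project_2d` is a hypothetical helper and the entries are placeholder values, so this is not necessarily how the original plot was produced.

```python
# Hypothetical entries: (numeric feature vector, class label); illustration only.
FEATURES = ["Length", "Diameter", "Height"]
entries = [
    ([0.45, 0.36, 0.10], 15),
    ([0.35, 0.26, 0.09], 7),
    ([0.33, 0.25, 0.08], 7),
]

def project_2d(entries, i, j):
    """Keep features i and j of every entry and group the 2D points by class."""
    points = {}
    for vector, label in entries:
        points.setdefault(label, []).append((vector[i], vector[j]))
    return points

# Project onto the (Length, Height) plane.
points = project_2d(entries, FEATURES.index("Length"), FEATURES.index("Height"))
for label in sorted(points):
    print(label, points[label])
```

Each class's point list can then be passed to a plotting library with its own marker or color.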
We can draw similar scatter plots after projecting the dataset onto a 3D space. Choosing 3 of the 8 features gives C(8, 3) = 56 subplots, as follows:
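The count of 56 can be checked by enumerating every way of choosing 3 of the 8 features. The snippet below is a small verification sketch using only the standard library; the feature names are the ones listed earlier on this page.

```python
from itertools import combinations
from math import comb

FEATURES = ["Sex", "Length", "Diameter", "Height", "Whole weight",
            "Shucked weight", "Viscera weight", "Shell weight"]

# Every unordered choice of 3 distinct features defines one 3D subplot.
triples = list(combinations(FEATURES, 3))

# The enumerated count agrees with the closed form C(8, 3) = 8! / (3! * 5!).
print(len(triples), comb(len(FEATURES), 3))
```

Both numbers come out as 56, which is why the full set of 3D projections is already too large to inspect comfortably.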
Since there are 28 classes, it is hard to tell from either the 2D or the 3D projections whether entries of the same class tend to lie close to one another.
Data Clustering and Pattern Recognition (資料分群與樣式辨認)