Can flower samples be assigned to their proper sub-family purely on the basis of quantitative observation?
Linear discriminant classification
high-quality, annotated dataset
technique and data are intertwined!
Instance:
n datapoints, each described by d-1 numerical dimensions \(\mathcal{D}_1, \dots, \mathcal{D}_{d-1}\)
an expert classification function over k categories
Solution:
a linear combination \(\mathcal{D}_1 \times \mathcal{D}_2 \times \dots \times \mathcal{D}_{d-1} \rightarrow \mathcal{D}_d\)
that respects the given classification.
Measure: agreement with the given classification.
n=150 samples manually assigned by Fisher.
d=5 dimensions: four measurements (in cm) and the classification
k=3 classes: Setosa, Versicolour and Virginica, 50 instances each, all available from scikit-learn
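The figures above can be checked directly against the copy of the dataset shipped with scikit-learn. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # the four measurements, in cm
y = iris.target    # the expert classification, encoded as 0, 1, 2

n = X.shape[0]               # number of samples
d = X.shape[1] + 1           # four measurements plus the classification
k = len(iris.target_names)   # number of classes
counts = np.bincount(y)      # instances per class

print("n =", n, "d =", d, "k =", k)
print("features:", iris.feature_names)
print("classes:", list(iris.target_names), "counts:", counts.tolist())
```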
A linear classifier corresponds to a line drawn on the data display, which creates two classification areas; more than one line is possible.
Whereas Setosa can be linearly separated, e.g., by petal_length < 2 in the third column, the other two classes can’t be perfectly separated.
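The Setosa rule can be tested as a one-line threshold on the third column. A minimal sketch, assuming the scikit-learn copy of the data, where column index 2 holds the petal length and class 0 is Setosa:

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]     # third column, in cm
is_setosa = iris.target == 0       # expert label: Setosa or not

# The candidate linear rule: a single threshold on one dimension.
rule = petal_length < 2.0

# Fraction of the 150 samples on which the rule agrees with the expert.
setosa_agreement = (rule == is_setosa).mean()
print("agreement of 'petal_length < 2' with the Setosa label:", setosa_agreement)
```

Applying the same kind of single threshold to tell Versicolour from Virginica leaves some samples on the wrong side, whichever cut-off is chosen.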
Q: Can we accept a linear combination that gives the correct answer only 19 times out of 20?
A: It depends on the application.
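To make the 19-out-of-20 question concrete, one can fit a linear discriminant classifier and compare its agreement with the expert labels against 0.95. This is a sketch only: it assumes scikit-learn's `LinearDiscriminantAnalysis` with default settings, and it measures agreement on the same samples used for fitting, which tends to be optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Fit a linear discriminant classifier on all four measurements.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Measure: fraction of samples where the classifier agrees with the expert.
agreement = lda.score(X, y)

threshold = 19 / 20
meets_threshold = agreement >= threshold
print(f"agreement = {agreement:.3f}, at least 19/20: {meets_threshold}")
```

Whether the resulting agreement is acceptable remains an application-level decision; the code only reports the number.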
Given two putative classifiers, which is better?
Proposed answer:
At the same level of precision (the fraction of cases for which the classifier agrees with the expert classification),
prefer the one that errs less on the clear-cut cases.
ignore the less informative dimensions
Take a 2D scatterplot and map it to a line: does it improve visual classification?
find a predictor where all dimensions are used, but some are given less weight.
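The last two ideas can be sketched with the same tool. Assuming scikit-learn's `LinearDiscriminantAnalysis`, the first part maps a 2D view (petal length and petal width, chosen here only as an example) onto a single line; the second part uses all four dimensions and prints the weight each one receives on the first discriminant direction. The weights are in raw cm units, so their magnitudes are only a rough indication of importance.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()
X, y = iris.data, iris.target

# Part 1: take a 2D scatterplot (petal length vs. petal width)
# and map each point to a position on a single line.
X2 = X[:, 2:4]
lda_2d = LinearDiscriminantAnalysis(n_components=1)
z = lda_2d.fit_transform(X2, y)    # one coordinate per sample

for label, name in enumerate(iris.target_names):
    zc = z[y == label, 0]
    print(f"{name}: 1D range [{zc.min():.2f}, {zc.max():.2f}]")

# Part 2: use all four dimensions; each gets its own weight.
lda_all = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
weights = lda_all.scalings_[:, 0]  # first discriminant direction

for name, w in zip(iris.feature_names, weights):
    print(f"{name}: weight {w:+.3f}")
```

Plotting `z` as a strip or histogram per class is one way to judge whether the 1D view makes the classes easier to tell apart than the original scatterplot.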
This section, together with the follow-up lab session, is self-contained.
If you want more background you may read the PDF excerpt from the advanced Zaki-Meira textbook, which is available for download.
Fisher did not practice Statistics per se as he didn’t try to estimate the distribution of tiny flowers in Canada, nor did he estimate measurement errors.
Rather, he asked whether classification could become somehow automatic, without the need to actually see the flower.