In this video, I performed external cluster validation on scikit-learn's iris dataset. The dataset contains 150 samples, 50 for each of three types of flower: setosa, versicolor and virginica. Each sample is described by 4 features: sepal length, sepal width, petal length and petal width. I used the actual classes as ground truths and performed K-Means clustering with 3 clusters on the 4 features. The cluster labels obtained from K-Means served as predictions. I then calculated the Rand index (RI) and the adjusted Rand index (ARI) with scikit-learn by comparing the ground truths with the predictions.
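The workflow described above can be sketched roughly as follows. This is a minimal sketch and not the notebook from the video: the choice of `StandardScaler` for preprocessing, the `n_init` and `random_state` values, and the column names `ground_truth` and `prediction` are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, rand_score
from sklearn.preprocessing import StandardScaler

# Load the iris dataset and put the 4 features into a dataframe.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Assumption: the preprocessing step is standardization (zero mean, unit variance).
scaled = StandardScaler().fit_transform(df)
scaled_df = pd.DataFrame(scaled, columns=iris.feature_names)

# K-Means with 3 clusters on the 4 scaled features.
# Assumption: n_init and random_state are fixed here only for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
predictions = kmeans.fit_predict(scaled_df)

# Add two new columns: the actual classes and the cluster labels.
# Hypothetical column names, chosen for this sketch.
scaled_df["ground_truth"] = iris.target
scaled_df["prediction"] = predictions

# Cross-tabulate classes against clusters to compare them.
comparison = pd.crosstab(scaled_df["ground_truth"], scaled_df["prediction"])
print(comparison)

# External validation scores.
ri = rand_score(scaled_df["ground_truth"], scaled_df["prediction"])
ari = adjusted_rand_score(scaled_df["ground_truth"], scaled_df["prediction"])
print(f"Rand index:          {ri:.3f}")
print(f"Adjusted Rand index: {ari:.3f}")
```

Note that the cluster numbers assigned by K-Means are arbitrary, so cluster 0 does not have to correspond to class 0. Both scores compare pairs of samples rather than label values, which is why they can be used directly.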
GitHub address: [ Link ]
For mathematical details:
Rand index (RI): [ Link ]
adjusted Rand index (ARI): [ Link ]
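As a quick reference, the standard definitions of the two scores can be written as below. The symbols are notation chosen for this summary, not taken from the linked pages: n is the number of samples, a is the number of sample pairs placed together in both the ground truth and the prediction, b is the number of pairs separated in both, n_ij is the number of samples in class i and cluster j, and r_i and c_j are the class and cluster sizes.

```latex
\[
\mathrm{RI} = \frac{a + b}{\binom{n}{2}}
\]

\[
\mathrm{ARI} =
\frac{\sum_{i,j} \binom{n_{ij}}{2}
      - \left[ \sum_i \binom{r_i}{2} \sum_j \binom{c_j}{2} \right] \Big/ \binom{n}{2}}
     {\frac{1}{2} \left[ \sum_i \binom{r_i}{2} + \sum_j \binom{c_j}{2} \right]
      - \left[ \sum_i \binom{r_i}{2} \sum_j \binom{c_j}{2} \right] \Big/ \binom{n}{2}}
\]
```

RI lies between 0 and 1. ARI subtracts the agreement expected by chance, so it is close to 0 for random labelings, can be negative, and equals 1 only for a perfect match.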
Description:
01:00 --- Import the required libraries
02:24 --- Load 'iris' dataset
03:39 --- Create a dataframe
05:12 --- Separate features from the dataframe
05:57 --- Perform preprocessing
06:25 --- Scaled dataframe
07:39 --- Perform K-Means clustering considering 3 clusters
08:47 --- Add two new columns to the scaled dataframe
10:59 --- Clustering comparison between ground truths and predictions
13:47 --- Calculate rand index (rand score)
16:30 --- Calculate adjusted rand index (adjusted rand score)
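To make the last two steps more concrete, here is a small sketch that counts agreeing pairs by hand and checks the result against `rand_score`. The toy labels and the helper `rand_index_by_pairs` are hypothetical and are not part of the video's notebook.

```python
from itertools import combinations

from sklearn.metrics import adjusted_rand_score, rand_score


def rand_index_by_pairs(truth, pred):
    """Fraction of sample pairs on which the two labelings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        # A pair agrees if it is together in both labelings or apart in both.
        if same_truth == same_pred:
            agree += 1
        total += 1
    return agree / total


# Hypothetical toy labels: 6 samples, 2 true classes, 3 predicted clusters.
truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 2, 2]

manual_ri = rand_index_by_pairs(truth, pred)
sklearn_ri = rand_score(truth, pred)
sklearn_ari = adjusted_rand_score(truth, pred)

# Renaming the clusters does not change either score.
renamed = [{0: 2, 1: 0, 2: 1}[p] for p in pred]
renamed_ri = rand_score(truth, renamed)
renamed_ari = adjusted_rand_score(truth, renamed)

print(manual_ri, sklearn_ri, sklearn_ari)
```

The pair-counting loop is quadratic in the number of samples, so it is only meant as an illustration; for real data the scikit-learn functions are the practical choice.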
#data_science #jupyter_notebook #python #external_cluster_validation #sklearn #rand_index #adjusted_rand_index