What is K Medoid Clustering: Why and How?

Being a data science enthusiast (which you are, since you’re reading this), you must have come across quite a number of K-prefixed phrases like k means, k modes, k nearest neighbors, and so on. While they might seem a little baffling at first glance, a little thorough reading shows that, similar as they look, each makes perfect sense on its own. In this lesson we’re gonna take a closer look at one of their siblings, K Medoid Clustering. So let’s get going.

Here’s a summary of what we’ll be doing here: first, why K Medoid; then a decent definition; and finally a worked example to help solidify your understanding. I’m gonna assume you’re familiar with the clustering problem and the K means algorithm; if not, here’s a little brush-up.

To explain why we need K medoid, or why the concept of a medoid over a mean, let’s seek an analogy. Here’s a list of the weights of 5 students in your class: 62, 64, 65, 62, 120. Their mean is 74.6, which is greater than 80% of the individual weights in the set, and what’s funnier is that the mean excluding the last item (120) is 63.25, which represents the set quite well. Values that lie far outside the general distribution of the data points are called outliers, and as seen above, the mean is badly affected by them. Now think of another measure, the median. Here’s how to find it: sort the data in ascending order, which gives 62, 62, 64, 65, 120, and point at the one in the middle (in the case of an even number of items, it’s the average of the two in the middle). That’s the median, which is 64 in this case. Note that the median represents the data almost accurately even in the presence of an outlier. Statistically, medians are less prone to outliers, which makes them a more reliable measure of center. Hopefully, this made sense of why K medoids are preferred over K means.
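If you’d like to see that for yourself, here’s a minimal Python check using those five weights (the numbers are the ones from the example above; everything else is just illustration):

```python
# A minimal check of how an outlier drags the mean but barely moves the median.
from statistics import mean, median

weights = [62, 64, 65, 62, 120]

print(mean(weights))       # 74.6  -> pulled up by the outlier (120)
print(median(weights))     # 64    -> still represents the group well
print(mean(weights[:-1]))  # 63.25 -> the mean once the outlier is dropped
```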

We are now ready to ingest a nice, intuitive definition of the problem at hand. Formally speaking, K Medoids is a clustering algorithm that partitions the data points into sets around medoids (the least dissimilar points of their clusters) and constantly attempts to lower the dissimilarity among the points in the same cluster. The key point here is that a medoid is essentially a data point from the input set, unlike in k means, where the mean is merely the average and need not be an actual data point.
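To make “the least dissimilar point” concrete, here’s a tiny sketch, entirely my own illustration rather than anything from the original, that picks the medoid of a set as the member with the smallest total dissimilarity to everyone else:

```python
def medoid(points, dist):
    """The medoid is an actual member of the set: the point whose total
    dissimilarity to all the other points is smallest."""
    return min(points, key=lambda m: sum(dist(m, p) for p in points))

weights = [62, 64, 65, 62, 120]
print(medoid(weights, dist=lambda a, b: abs(a - b)))  # 64 -> a real data point, unlike the mean 74.6
```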

The algorithm is quite intuitive. In words: we’re given two input parameters, the value of k, i.e. the number of clusters to be formed, and the dataset on which the clusters are to be formed. First, take k random data points as medoids. Then, for every data point except the ones taken as medoids, compute the dissimilarity with each of the k medoids using any distance metric (Euclidean, Manhattan, or anything else) and assign the current point to the cluster represented by the medoid it is least dissimilar to. Once every node (point) has been assigned to a cluster, calculate the total dissimilarity of the iteration, which is the sum of the dissimilarities between every node and the medoid of the cluster it resides in. That’s one iteration. This is repeated with different nodes as medoids until the total dissimilarity turns out to be greater than that of the previous iteration, i.e. the clustering stops improving. Below is the pseudocode view,

[Figure: pseudocode for the K Medoids algorithm]
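In case the figure doesn’t load for you, here’s a rough Python sketch of the same procedure. To be clear, this is my own illustration of the steps described above, not the original pseudocode: the function names, the choice of Manhattan distance, and the random re-selection of medoids are assumptions made just so the sketch runs end to end.

```python
import random


def manhattan(p, q):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def assign_and_score(points, medoids):
    """Assign every point to its nearest medoid and return (clusters, total dissimilarity)."""
    clusters = {m: [] for m in medoids}
    total = 0
    for p in points:
        nearest = min(medoids, key=lambda m: manhattan(p, m))
        clusters[nearest].append(p)
        total += manhattan(p, nearest)
    return clusters, total


def k_medoids(points, k, seed=None):
    """Pick k random points as medoids, score the clustering, then keep trying
    new random medoid sets; stop as soon as a new set no longer improves the
    total dissimilarity (the stopping rule described in the text above)."""
    rng = random.Random(seed)
    clusters, best = assign_and_score(points, rng.sample(points, k))
    while True:
        new_clusters, cost = assign_and_score(points, rng.sample(points, k))
        if cost >= best:  # swapping cost is not negative -> keep the previous clustering
            return clusters, best
        clusters, best = new_clusters, cost


# Usage sketch with made-up points (the real dataset is in the table below):
# clusters, total = k_medoids([(155, 67), (161, 68), (152, 76), (178, 82)], k=3)
```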

Now, who’s up for a worked example? Come on, it’s gonna be fun and enlightening(!).

We’re gonna work out an example here. The statement goes something like this: given a dataset containing the height and weight of n = 10 people, arrange the dataset into 3 clusters. Here’s the tabular form,

[Figure: the dataset in tabular form, listing the height and weight of the 10 people]

This dataset is created on the fly, for the sole purpose of illustration, and any real-life inconsistency should be ignored. Now, note that we’re expecting the dataset to be clustered into 3 disjoint sets. Let’s take the 3 medoids randomly; for convenience I’m gonna select the first three nodes, so we have the medoids = { m1(155, 67), m2(161, 68), m3(152, 76) }. Now, if you’re doing it by hand, like me, here’s the trick: chalk out 3 more columns on the right (since k = 3) with headers d1, d2 and d3, where dj represents the distance of the ith node from the jth medoid. Calculation time! I’m gonna use the Manhattan distance since it’s plain math, devoid of the square-and-root hassle. If you’re not familiar with it, the Manhattan distance between (a, b) and (c, d) is |a-c| + |b-d|, i.e. the sum of the absolute coordinate differences. Once the distances are calculated, here’s how the table should appear,

[Figure: the same table with the d1, d2 and d3 distance columns filled in]

A quick recap on the calculations, for node 4 (178, 82): to calculate d1, we take medoid m1(155, 67), and hence d1 = |155-178| + |67-82| = 38, and the rest follows. Here are a few things a clever eye would note in the second table (there’s also a small code check of node 4’s row right after this list),

  1. It excludes the first 3 nodes since they are the medoids
  2. I’ve marked the least distance with apple green because that’s the cluster we’re gonna push the corresponding node into
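And here’s the promised check of node 4’s row. The node’s coordinates (178, 82) and the three medoids come straight from the example; the helper name manhattan is just my own:

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

medoids = [(155, 67), (161, 68), (152, 76)]   # m1, m2, m3 picked above
node_4 = (178, 82)

distances = [manhattan(node_4, m) for m in medoids]
print(distances)                              # [38, 31, 32] -> the d1, d2, d3 cells of node 4's row
print(distances.index(min(distances)) + 1)    # 2 -> node 4 goes into cluster 2
```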

The clusters would initially consist of only the corresponding medoids, and once the calculations wrap up, they’re gonna look like this:

Cluster 1: { N1, N5, N7, N10 }

Cluster 2: { N2, N4, N6, N8, N9 }

Cluster 3: { N3 }

Ni denotes the ith node. These clusters are the result of the first iteration. Now, to calculate the overall dissimilarity measure for this iteration, add up all the least distances: D1 = 31 + 3 + 11 + 15 + 24 + 3 + 5 = 92. This metric serves as the score we track across iterations to find the best clustering possible.
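If you want a quick sanity check on that arithmetic, the seven least distances read off the table sum up like so (purely illustrative; the numbers are the ones marked above):

```python
# Least distance of each non-medoid node (N4 through N10) to its chosen medoid,
# read off the highlighted cells of the table above.
least_distances = [31, 3, 11, 15, 24, 3, 5]

D1 = sum(least_distances)
print(D1)   # 92 -> overall dissimilarity of the first iteration
```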

In the following iterations, which I’m not going to show here, the same calculations are repeated with different, randomly selected medoids, and this goes on until, for the ith iteration, D_i - D_(i-1) > 0, i.e. the swapping cost is positive and the new medoids are no better than the old ones.

So that’s it. Hope this builds a clear, approachable basis for K Medoid Clustering. You may code this out yourself; it shouldn’t take long. And as always, SPREAD LOVE.
