k-medoids clustering - MATLAB kmedoids


k-medoids clustering


Syntax

idx = kmedoids(X,k)

idx = kmedoids(X,k,Name,Value)

[idx,C] = kmedoids(___)

[idx,C,sumd] = kmedoids(___)

[idx,C,sumd,D] = kmedoids(___)

[idx,C,sumd,D,midx] = kmedoids(___)

[idx,C,sumd,D,midx,info] = kmedoids(___)

Description


idx = kmedoids(X,k) performs k-medoids clustering to partition the observations of the n-by-p matrix X into k clusters, and returns an n-by-1 vector idx containing cluster indices of each observation. Rows of X correspond to points and columns correspond to variables. By default, kmedoids uses the squared Euclidean distance metric and the k-means++ algorithm for choosing initial cluster medoid positions.

idx = kmedoids(X,k,Name,Value) uses additional options specified by one or more Name,Value pair arguments.

[idx,C] = kmedoids(___) returns the k cluster medoid locations in the k-by-p matrix C.

[idx,C,sumd] = kmedoids(___) returns the within-cluster sums of point-to-medoid distances in the k-by-1 vector sumd.

[idx,C,sumd,D] = kmedoids(___) returns distances from each point to every medoid in the n-by-k matrix D.

[idx,C,sumd,D,midx] = kmedoids(___) returns the indices midx such that C = X(midx,:). midx is a k-by-1 vector.

[idx,C,sumd,D,midx,info] = kmedoids(___) returns a structure info with information about the options used by the algorithm when executed.

Examples


Group Data into Two Clusters


Randomly generate data.

rng('default'); % For reproducibility
X = [randn(100,2)*0.75+ones(100,2); randn(100,2)*0.55-ones(100,2)];
figure;
plot(X(:,1),X(:,2),'.');
title('Randomly Generated Data');

(Figure: Randomly Generated Data)

Group data into two clusters using kmedoids. Use the cityblock distance metric.

opts = statset('Display','iter');
[idx,C,sumd,d,midx,info] = kmedoids(X,2,'Distance','cityblock','Options',opts);
   rep     iter          sum
     1        1      209.856
     1        2      209.856
Best total sum of distances = 209.856

info is a struct that contains information about how the algorithm was executed. For example, the bestReplicate field indicates the replicate that was used to produce the final solution. In this example, replicate number 1 was used because the default number of replicates is 1 for the default algorithm, which is pam in this case.

info
info = struct with fields:
        algorithm: 'pam'
            start: 'plus'
         distance: 'cityblock'
       iterations: 2
    bestReplicate: 1

Plot the clusters and the cluster medoids.

figure;
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',7)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',7)
plot(C(:,1),C(:,2),'co', ...
     'MarkerSize',7,'LineWidth',1.5)
legend('Cluster 1','Cluster 2','Medoids', ...
       'Location','NW');
title('Cluster Assignments and Medoids');
hold off

(Figure: Cluster Assignments and Medoids)

Cluster Categorical Data Using k-Medoids

This example uses the "Mushroom" data set [3][4][5][6] from the UCI machine learning archive [7], described in https://archive.ics.uci.edu/dataset/73/mushroom. The data set includes 22 predictors for 8,124 observations of various mushrooms. The predictors are categorical data types. For example, cap shape is categorized with features of 'b' for bell-shaped cap and 'c' for conical. Mushroom color is also categorized with features of 'n' for brown, and 'p' for pink. The data set also includes a classification for each mushroom of either edible or poisonous.

Since the features of the mushroom data set are categorical, it is not possible to define the mean of several data points, and therefore the widely-used k-means clustering algorithm cannot be meaningfully applied to this data set. k-medoids is a related algorithm that partitions data into k distinct clusters, by finding medoids that minimize the sum of dissimilarities between points in the data and their nearest medoid.

The medoid of a set is a member of that set whose average dissimilarity with the other members of the set is the smallest. Similarity can be defined for many types of data that do not allow a mean to be calculated, allowing k-medoids to be used for a broader range of problems than k-means.
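To make this definition concrete, the following minimal sketch (not part of the example that follows; the variable names and the cityblock metric are illustrative) finds the medoid of a small numeric data set by brute force, using pdist2 to compute all pairwise dissimilarities and picking the row whose total dissimilarity to the others is smallest.

% Sketch: find the medoid of a toy data set by brute force.
Y = [1 2; 2 1; 8 9; 2 2];          % illustrative data, one observation per row
Dall = pdist2(Y,Y,'cityblock');    % all pairwise dissimilarities
[~,medoidIdx] = min(sum(Dall,2));  % row with the smallest total dissimilarity
medoid = Y(medoidIdx,:)            % the medoid is an actual member of Y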

Using k-medoids, this example clusters the mushrooms into two groups, based on the predictors provided. It then explores the relationship between those clusters and the classifications of the mushrooms as either edible or poisonous.

This example assumes that you have downloaded the "Mushroom" data set [3][4][5][6] from the UCI database (https://archive.ics.uci.edu/dataset/73/mushroom) and saved the text files agaricus-lepiota.data and agaricus-lepiota.names in your current directory. There is no column header line in the data, so readtable uses the default variable names.

clear all
data = readtable('agaricus-lepiota.data','ReadVariableNames',false);

Display the first 5 mushrooms with their first few features.

data(1:5,1:10)
ans =
    Var1    Var2    Var3    Var4    Var5    Var6    Var7    Var8    Var9    Var10
    ____    ____    ____    ____    ____    ____    ____    ____    ____    _____
    'p'     'x'     's'     'n'     't'     'p'     'f'     'c'     'n'     'k'
    'e'     'x'     's'     'y'     't'     'a'     'f'     'c'     'b'     'k'
    'e'     'b'     's'     'w'     't'     'l'     'f'     'c'     'b'     'n'
    'p'     'x'     'y'     'w'     't'     'p'     'f'     'c'     'n'     'n'
    'e'     'x'     's'     'g'     'f'     'n'     'f'     'w'     'b'     'k'

Extract the first column, which contains the labels for the edible and poisonous groups. Then delete the column from the table.

labels = data(:,1);
labels = categorical(labels{:,:});
data(:,1) = [];

Store the names of predictors (features), which are described in agaricus-lepiota.names.

VarNames = {'cap_shape' 'cap_surface' 'cap_color' 'bruises' 'odor' ...
    'gill_attachment' 'gill_spacing' 'gill_size' 'gill_color' ...
    'stalk_shape' 'stalk_root' 'stalk_surface_above_ring' ...
    'stalk_surface_below_ring' 'stalk_color_above_ring' ...
    'stalk_color_below_ring' 'veil_type' 'veil_color' 'ring_number' ...
    'ring_type' 'spore_print_color' 'population' 'habitat'};

Set the variable names.

data.Properties.VariableNames = VarNames;

There are a total of 2480 missing values denoted as '?'.

sum(char(data{:,:}) == '?')
ans = 2480

Based on the inspection of the data set and its description, the missing values belong only to the 11th variable (stalk_root). Remove the column from the table.

data(:,11) = [];

kmedoids only accepts numeric data. You need to cast the categories you have into numeric type. The distance function you will use to define the dissimilarity of the data will be based on the double representation of the categorical data.

cats = categorical(data{:,:});
data = double(cats);

kmedoids can use any distance metric supported by pdist2 to cluster. For this example you will cluster the data using the Hamming distance because this is an appropriate distance metric for categorical data as illustrated below. The Hamming distance between two vectors is the percentage of the vector components that differ. For instance, consider these two vectors.

v1 = [1 0 2 1];

v2 = [1 1 2 1];

They are equal in the 1st, 3rd, and 4th coordinates. Since 1 of the 4 coordinates differs, the Hamming distance between these two vectors is 0.25.
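As a quick check (not shown on the original page), the same value follows directly from the definition, because the Hamming distance is the mean of the element-wise mismatches:

v1 = [1 0 2 1];
v2 = [1 1 2 1];
hammingDist = mean(v1 ~= v2)   % returns 0.2500, matching pdist2(v1,v2,'hamming')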

You can use the function pdist2 to measure the Hamming distance between the first and second rows of data, the numerical representation of the categorical mushroom data. The value 0.2857 means that 6 of the 21 features of the mushroom differ.

pdist2(data(1,:),data(2,:),'hamming')
ans = 0.2857

In this example, you're clustering the mushroom data into two clusters based on features to see if the clustering corresponds to edibility. The kmedoids function is guaranteed to converge to a local minimum of the clustering criterion; however, this may not be a global minimum for the problem. It is a good idea to cluster the problem a few times using the 'replicates' parameter. When 'replicates' is set to a value, n, greater than 1, the k-medoids algorithm is run n times, and the best result is returned.

To run kmedoids to cluster the data into 2 clusters, based on the Hamming distance, and to return the best result of 3 replicates, you run the following.

rng('default'); % For reproducibility
[IDX, C, SUMD, D, MIDX, INFO] = kmedoids(data,2,'distance','hamming','replicates',3);

Let's assume that mushrooms in the predicted group 1 are poisonous and group 2 are all edible. To determine the performance of the clustering results, calculate how many mushrooms in group 1 are indeed poisonous and group 2 are edible based on the known labels. In other words, calculate the number of false positives, false negatives, as well as true positives and true negatives.

Construct a confusion matrix (or matching matrix), where the diagonal elements represent the number of true positives and true negatives, respectively. The off-diagonal elements represent false negatives and false positives, respectively. For convenience, use the confusionmat function, which calculates a confusion matrix given known labels and predicted labels. Get the predicted label information from the IDX variable. IDX contains values of 1 and 2 for each data point, representing poisonous and edible groups, respectively.

predLabels = labels; % Initialize a vector for predicted labels.
predLabels(IDX==1) = categorical({'p'}); % Assign group 1 to be poisonous.
predLabels(IDX==2) = categorical({'e'}); % Assign group 2 to be edible.
confMatrix = confusionmat(labels,predLabels)
confMatrix =
        4176          32
         816        3100

Out of 4208 edible mushrooms, 4176 were correctly predicted to be in group 2 (edible group), and 32 were incorrectly predicted to be in group 1 (poisonous group). Similarly, out of 3916 poisonous mushrooms, 3100 were correctly predicted to be in group 1 (poisonous group), and 816 were incorrectly predicted to be in group 2 (edible group).

Given this confusion matrix, calculate the accuracy, which is the proportion of true results (both true positives and true negatives) against the overall data, and the precision, which is the proportion of the true positives against all the positive results (true positives and false positives).

accuracy = (confMatrix(1,1)+confMatrix(2,2))/(sum(sum(confMatrix)))
accuracy = 0.8956
precision = confMatrix(1,1) / (confMatrix(1,1)+confMatrix(2,1))
precision = 0.8365

The results indicated that applying the k-medoids algorithm to the categorical features of mushrooms resulted in clusters that were associated with edibility.

Input Arguments


X — Data
numeric matrix

Data, specified as a numeric matrix. The rows of X correspond to observations, and the columns correspond to variables.

k — Number of medoids
positive integer

Number of medoids in the data, specified as a positive integer.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Distance','euclidean','Replicates',3,'Options',statset('UseParallel',1) specifies Euclidean distance, three replicate medoids at different starting values, and to use parallel computing.
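Written out as a call, that example corresponds to something like the following sketch; the data matrix X and the choice of three clusters are placeholders, not values prescribed by the documentation.

% Sketch only: X is any numeric data matrix; 3 clusters is an illustrative choice.
opts = statset('UseParallel',true);
[idx,C] = kmedoids(X,3,'Distance','euclidean','Replicates',3,'Options',opts);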

Algorithm — Algorithm to find medoids
'pam' | 'small' | 'clara' | 'large'

Algorithm to find medoids, specified as the comma-separated pair consisting of 'Algorithm' and 'pam', 'small', 'clara', or 'large'. The default algorithm depends on the number of rows of X.

  • If the number of rows of X is less than 3000, 'pam' is the default algorithm.

  • If the number of rows is between 3000 and 10000, 'small' is the default algorithm.

  • For all other cases, 'large' is the default algorithm.

You can override the default choice by explicitly stating the algorithm. This table summarizes the available algorithms.

Algorithm        Description
'pam'

Partitioning Around Medoids (PAM) is the classical algorithm for solving the k-medoids problem described in [1]. After applying the initialization function to select initial medoid positions, the program performs the swap-step of the PAM algorithm, that is, it searches over all possible swaps between medoids and non-medoids to see if the sum of medoid to cluster member distances goes down. You can specify which initialization function to use via the 'Start' name-value pair argument.

The algorithm proceeds as follows.

  1. Build-step: Each of k clusters is associated with a potential medoid. This assignment is performed using a technique specified by the 'Start' name-value pair argument.

  2. Swap-step: Within each cluster, each point is tested as a potential medoid by checking if the sum of within-cluster distances gets smaller using that point as the medoid. If so, the point is defined as a new medoid. Every point is then assigned to the cluster with the closest medoid.

The algorithm iterates the build- and swap-steps until the medoids do not change, or other termination criteria are met.

The algorithm can produce better solutions than the other algorithms in some situations, but it can be prohibitively long running.

'small'

Use an algorithm similar to the k-means algorithm to find k medoids. This option employs a variant of Lloyd's iterations based on [2].

The algorithm proceeds as follows.

  1. For each point in each cluster, calculate the sum of distances from the point to every other point in the cluster. Choose the point that minimizes the sum as the new medoid.

  2. Update the cluster membership for each data point to reflect the new medoid.

The algorithm repeats these steps until no further updates occur or other termination criteria are met. The algorithm has an optional PAM-like online update phase (specified by the 'OnlinePhase' name-value pair argument) that improves cluster quality. It tends to return higher quality solutions than the clara or large algorithms, but it may not be the best choice for very large data.

'clara'

Clustering LARge Applications (CLARA) [1] repeatedly performs the PAM algorithm on random subsets of the data. It aims to overcome scaling challenges posed by the PAM algorithm through sampling.

The algorithm proceeds as follows.

  1. Select a subset of the data and apply the PAM algorithm to the subset.

  2. Assign points of the full data set to clusters by picking the closest medoid.

The algorithm repeats these steps until the medoids do not change, or other termination criteria are met.

For the best performance, it is recommended that you perform multiple replicates. By default, the program performs five replicates. Each replicate samples s rows from X (specified by 'NumSamples' name-value pair argument) to perform clustering on. By default, 40+2*k samples are selected.

'large'

This is similar to the small scale algorithm and repeatedly performs searches using a k-means like update. However, the algorithm examines only a random sample of cluster members during each iteration. The user-adjustable parameter, 'PercentNeighbors', controls the number of neighbors to examine. If there is no improvement after the neighbors are examined, the algorithm terminates the local search. The algorithm performs a total of r replicates (specified by 'Replicates' name-value pair argument) and returns the best clustering result. The algorithm has an optional PAM-like online phase (specified by the 'OnlinePhase' name-value pair argument) that improves cluster quality.

Example: 'Algorithm','pam'
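If you want a specific algorithm regardless of the size of X, set the pair explicitly. The following sketch (with illustrative synthetic data) requests 'pam' and 'clara' in turn and compares the total within-cluster distance; it is an example of usage, not a benchmark.

% Sketch: request specific algorithms instead of the size-based default.
rng('default')
X = [randn(100,2)+2; randn(100,2)-2];               % illustrative data
[~,~,sumdPam]   = kmedoids(X,2,'Algorithm','pam');  % classical PAM
[~,~,sumdClara] = kmedoids(X,2,'Algorithm','clara');% sampling-based CLARA
[sum(sumdPam) sum(sumdClara)]                       % total within-cluster distances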

OnlinePhase — Flag to perform PAM-like online update phase
'on' (default) | 'off'

A flag to perform a PAM-like online update phase, specified as a comma-separated pair consisting of 'OnlinePhase' and 'on' or 'off'.

If it is on, then kmedoids performs a PAM-like update to the medoids after the Lloyd iterations in the small and large algorithms. During this online update phase, the algorithm chooses a small subset of data points in each cluster that are the furthest from and nearest to the medoid. For each chosen point, it reassigns the clustering of the entire data set and checks if this creates a smaller sum of distances than the best known.

In other words, the swap considerations are limited to the points near the medoids and far from the medoids. The near points are considered in order to refine the clustering. The far points are considered in order to escape local minima. Turning on this feature tends to improve the quality of solutions generated by both algorithms. Total run time tends to increase as well, but the increase typically is less than one iteration of PAM.

Example: 'OnlinePhase','off'
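Because the flag only affects the small and large algorithms, one way to see its effect is to run the same clustering with the phase on and off and compare the total within-cluster distance, as in this sketch (the data are illustrative):

% Sketch: compare OnlinePhase settings with the 'small' algorithm.
rng('default')
X = [randn(2000,2)+2; randn(2000,2)-2];   % illustrative data
[~,~,sumdOn]  = kmedoids(X,2,'Algorithm','small','OnlinePhase','on');
[~,~,sumdOff] = kmedoids(X,2,'Algorithm','small','OnlinePhase','off');
[sum(sumdOn) sum(sumdOff)]                % 'on' tends to give an equal or smaller sum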

Distance — Distance metric
'sqEuclidean' (default) | 'euclidean' | character vector | string scalar | function handle | ...

Distance metric, specified as the name of a distance metric described in the following table, or a function handle. kmedoids minimizes the sum of medoid to cluster member distances.

Value        Description

'sqEuclidean'

Squared Euclidean distance (default)
'euclidean'

Euclidean distance

'seuclidean'

Standardized Euclidean distance. Each coordinate difference between observations is scaled by dividing by the corresponding element of the standard deviation, S = std(X,'omitnan').

'cityblock'

City block distance

'minkowski'

Minkowski distance. The exponent is 2.

'chebychev'

Chebychev distance (maximum coordinate difference)

'mahalanobis'

Mahalanobis distance using the sample covariance of X, C = cov(X,'omitrows')

'cosine'

One minus the cosine of the included angle between points (treated as vectors)

'correlation'

One minus the sample correlation between points (treated as sequences of values)

'spearman'

One minus the sample Spearman's rank correlation between observations (treated as sequences of values)

'hamming'

Hamming distance, which is the percentage of coordinates that differ

'jaccard'

One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ

@distfun

Custom distance function handle. A distance function has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...
where
  • ZI is a 1-by-n vector containing a single observation.

  • ZJ is an m2-by-n matrix containing multiple observations. distfun must accept a matrix ZJ with an arbitrary number of observations.

  • D2 is an m2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).

If your data is not sparse, you can generally compute distance more quickly by using a built-in distance instead of a function handle.

For the definition of each distance metric, see Distance Metrics.

Example: 'Distance','hamming'
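A custom @distfun must follow the ZI/ZJ convention described in the table above. As a hedged sketch, the anonymous function below reproduces the built-in Hamming distance (in practice you would only supply a handle for a metric that is not built in); the data matrix is illustrative.

% Sketch of a custom distance function with the required signature:
% ZI is 1-by-n (one observation), ZJ is m2-by-n, and the result is m2-by-1.
hammingFun = @(ZI,ZJ) mean(ZJ ~= ZI, 2);

rng('default')
X = randi(3,50,4);                          % illustrative integer-coded data
idx = kmedoids(X,2,'Distance',hammingFun);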

Options — Options to control iterative algorithm to minimize fitting criteria
[] (default) | structure array returned by statset

Options to control the iterative algorithm to minimize fitting criteria, specified as the comma-separated pair consisting of 'Options' and a structure array returned by statset. Supported fields of the structure array specify options for controlling the iterative algorithm. This table summarizes the supported fields.

Field            Description

Display          Level of display output. Choices are 'off' (default) and 'iter'.

MaxIter          Maximum number of iterations allowed. The default is 100.

UseParallel      If true, compute in parallel. If Parallel Computing Toolbox™ is not available, then computation occurs in serial mode. The default is false, meaning serial computation.

UseSubstreams    Set to true to compute in a reproducible fashion. The default is false. To compute reproducibly, you must also set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'.

Streams          A RandStream object or cell array of such objects. For details about these options and parallel computing in Statistics and Machine Learning Toolbox™, see Speed Up Statistical Computations or enter help parallelstats at the command line.

Example: 'Options',statset('Display','off')
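For example, reproducible parallel execution requires setting UseSubstreams together with a suitable Streams object, roughly as in this sketch (it assumes Parallel Computing Toolbox is available; X and the number of clusters are placeholders):

% Sketch: reproducible parallel execution.
stream = RandStream('mlfg6331_64');          % a stream type that supports substreams
opts = statset('UseParallel',true, ...
               'UseSubstreams',true, ...
               'Streams',stream, ...
               'MaxIter',200, ...
               'Display','iter');
idx = kmedoids(X,3,'Replicates',5,'Options',opts);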

Replicates — Number of times to repeat clustering using new initial cluster medoid positions
positive integer

Number of times to repeat clustering using new initial cluster medoid positions, specified as a positive integer. The default value depends on the choice of algorithm. For pam and small, the default is 1. For clara, the default is 5. For large, the default is 3.

Example: 'Replicates',4

NumSamples — Number of samples to take from data when executing clara algorithm
40+2*k (default) | positive integer

Number of samples to take from the data when executing the clara algorithm, specified as a positive integer. The default number of samples is calculated as 40+2*k.

Example: 'NumSamples',160

PercentNeighbors — Percent of data set to examine using large algorithm
0.001 (default) | scalar value between 0 and 1

Percent of the data set to examine using the large algorithm, specified as a positive number.

The program examines percentneighbors*size(X,1) number of neighbors for the medoids. If there is no improvement in the within-cluster sum of distances, then the algorithm terminates.

The value of this parameter is between 0 and 1, where a value closer to 1 tends to give higher quality solutions, but the algorithm takes longer to run, and a value closer to 0 tends to give lower quality solutions, but finishes faster.

Example: 'PercentNeighbors',0.01

Start — Method for choosing initial cluster medoid positions
'plus' (default) | 'sample' | 'cluster' | matrix

Method for choosing initial cluster medoid positions, specified as the comma-separated pair consisting of 'Start' and 'plus', 'sample', 'cluster', or a matrix. This table summarizes the available methods.

Method              Description

'plus' (default)    Select k observations from X according to the k-means++ algorithm for cluster center initialization.

'sample'            Select k observations from X at random.

'cluster'           Perform a preliminary clustering phase on a random subsample (10%) of X. This preliminary phase is itself initialized using 'sample', that is, the observations are selected at random.

matrix              A custom k-by-p matrix of starting locations. In this case, you can pass in [] for the k input argument, and kmedoids infers k from the first dimension of the matrix. You can also supply a 3-D array, implying a value for 'Replicates' from the array's third dimension.

Example: 'Start','sample'

Data Types: char | string | single | double
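For example, to supply your own starting medoids as a matrix, you can pass [] for k and let kmedoids infer it from the matrix, as in this sketch (the data and the choice of starting rows are illustrative):

% Sketch: explicit starting medoids; k is inferred from the number of rows of C0.
rng('default')
X = [randn(100,2)+2; randn(100,2)-2];    % illustrative data
C0 = X([1 150],:);                       % two rows of X used as initial medoids
idx = kmedoids(X,[],'Start',C0);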

Output Arguments


idx — Medoid indices
numeric column vector

Medoid indices, returned as a numeric column vector. idx has as many rows as X, and each row indicates the medoid assignment of the corresponding observation.

C — Cluster medoid locations
numeric matrix

Cluster medoid locations, returned as a numeric matrix. C is a k-by-p matrix, where row j is the medoid of cluster j.

sumd — Within-cluster sums of point-to-medoid distances
numeric column vector

Within-cluster sums of point-to-medoid distances, returned as a numeric column vector. sumd is a k-by-1 vector, where element j is the sum of point-to-medoid distances within cluster j.

D — Distances from each point to every medoid
numeric matrix

Distances from each point to every medoid, returned as a numeric matrix. D is an n-by-k matrix, where element (j,m) is the distance from observation j to medoid m.
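The outputs are consistent with one another; for example, sumd can be recovered from D and idx, as this sketch illustrates (it assumes outputs named idx, sumd, and D as in the syntax descriptions above):

% Sketch: recompute the within-cluster sums from D and idx.
k = size(D,2);
sumdCheck = zeros(k,1);
for j = 1:k
    sumdCheck(j) = sum(D(idx==j,j));   % distances of cluster j's members to medoid j
end
max(abs(sumdCheck - sumd))             % effectively zero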

midx — Index to X
column vector

Index to X, returned as a column vector of indices. midx is a k-by-1 vector, and the indices satisfy C = X(midx,:).

info — Algorithm information
struct

Algorithm information, returned as a struct. info contains the options used by the function when executed, such as the k-medoids clustering algorithm (algorithm), the method used to choose initial cluster medoid positions (start), the distance metric (distance), the number of iterations taken in the best replicate (iterations), and the replicate number of the returned results (bestReplicate).

More About


k-medoids Clustering

k-medoids clustering is a partitioning method commonly used in domains that require robustness to outlier data, arbitrary distance metrics, or data for which the mean or median does not have a clear definition.

It is similar to k-means, and the goal of both methods is to divide a set of measurements or observations into k subsets or clusters so that the subsets minimize the sum of distances between a measurement and a center of the measurement's cluster. In the k-means algorithm, the center of the subset is the mean of measurements in the subset, often called a centroid. In the k-medoids algorithm, the center of the subset is a member of the subset, called a medoid.

The k-medoids algorithm returns medoids, which are actual data points in the data set. This allows you to use the algorithm in situations where the mean of the data does not exist within the data set. This is the main difference between k-medoids and k-means, where the centroids returned by k-means may not be within the data set. Hence k-medoids is useful for clustering categorical data where a mean is impossible to define or interpret.
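A quick way to see this difference is to check the returned centers against the data, as in this sketch (the data and k = 2 are illustrative):

% Sketch: k-medoids centers are rows of X; k-means centroids generally are not.
rng('default')
X = [randn(100,2)+2; randn(100,2)-2];   % illustrative data
[~,Cmed]  = kmedoids(X,2);
[~,Cmean] = kmeans(X,2);
ismember(Cmed,X,'rows')    % [1;1] -- every medoid is an observation in X
ismember(Cmean,X,'rows')   % almost always [0;0] for continuous data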

The function kmedoids provides several iterative algorithms that minimize the sum of distances from each object to its cluster medoid, over all clusters. One of the algorithms is called partitioning around medoids (PAM) [1], which proceeds in two steps.

  1. Build-step: Each of k clusters is associated with a potential medoid. This assignment is performed using a technique specified by the 'Start' name-value pair argument.

  2. Swap-step: Within each cluster, each point is tested as a potential medoid by checking if the sum of within-cluster distances gets smaller using that point as the medoid. If so, the point is defined as a new medoid. Every point is then assigned to the cluster with the closest medoid.

The algorithm iterates the build- and swap-steps until the medoids do not change, or other termination criteria are met.
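Stated as a formula (a sketch based on the description above, not notation taken from the product documentation), the criterion that these steps minimize is the total distance from each observation to the medoid of its assigned cluster,

\min_{m_1,\dots,m_k \in X} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} d(x_i, m_j),

where C_j is the set of observations assigned to medoid m_j, d is the selected distance metric, and, unlike k-means, each m_j must itself be an observation in X.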

You can control the details of the minimization using several optional input parameters to kmedoids, including ones for the initial values of the cluster medoids, and for the maximum number of iterations. By default, kmedoids uses the k-means++ algorithm for cluster medoid initialization and the squared Euclidean metric to determine distances.

References

[1] Kaufman, L., and Rousseeuw, P. J. (2009). Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, New Jersey: John Wiley & Sons, Inc.

[2] Park, H.-S., and Jun, C.-H. (2009). A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336-3341.

[3] Schlimmer, J. S. (1987). Concept Acquisition Through Representational Adjustment (Technical Report 87-19). Doctoral dissertation, Department of Information and Computer Science, University of California, Irvine.

[4] Iba, W., Wogulis, J., and Langley, P. (1988). Trading off Simplicity and Coverage in Incremental Concept Learning. In Proceedings of the 5th International Conference on Machine Learning, 73-79. Ann Arbor, Michigan: Morgan Kaufmann.

[5] Duch, W., Adamczak, R., and Grabczewski, K. (1996). Extraction of logical rules from training data using backpropagation networks. Proc. of the 1st Online Workshop on Soft Computing, 19-30, pp. 25-30.

[6] Duch, W., Adamczak, R., Grabczewski, K., Ishikawa, M., and Ueda, H. (1997). Extraction of crisp logical rules using constrained backpropagation networks - comparison of two new approaches. Proc. of the European Symposium on Artificial Neural Networks (ESANN'97), Bruge, Belgium, 16-18.

[7] Bache, K., and Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Extended Capabilities

Automatic Parallel Support
Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

Version History

Introduced in R2014b

See Also

clusterdata | kmeans | linkage | silhouette | pdist | evalclusters


FAQs

What is the weakness of k-medoids?

The main disadvantage of K-Medoid algorithms (either PAM, CLARA or CLARANS) is that they are not suitable for clustering non-spherical (arbitrary shaped) groups of objects.

What is the function of kmedoids in MATLAB?

The function kmedoids provides several iterative algorithms that minimize the sum of distances from each object to its cluster medoid, over all clusters. One of the algorithms is called partitioning around medoids (PAM) [1] which proceeds in two steps.

When to use K-means vs kmedoids?

K-Means: Generally faster and more efficient, making it suitable for very large datasets. K-Medoids: Slower due to the need to evaluate all possible swaps, better for smaller datasets or when robustness is crucial.

What is the complexity of Kmedoids?

The complexity of k-medoids is O(N²KT), where N is the number of samples, T is the number of iterations, and K is the number of clusters. This makes it more suitable for smaller datasets in comparison to k-means, which is O(NKT).

What are the two main problems of the K-means clustering algorithm?

There are two challenges that need to be handled wisely in order to get the most out of the k-means clustering algorithm: defining the number of clusters, and determining the initial centroids.

Which clustering algorithm is most sensitive to outliers?

The K-means clustering algorithm is sensitive to outliers, because a mean is easily influenced by extreme values. K-medoids clustering is a variant of K-means that is more robust to noise and outliers.

What is the objective function of Kmedoids?

The objective function corresponds to the sum of the dissimilarities of all objects to their nearest medoid. The SWAP step attempts to improve the quality of the clustering by exchanging selected objects (medoids) and non-selected objects.

What is the difference between a centroid and a medoid?

Unlike a centroid, which is a point that represents the mean position of all the points in a cluster, a medoid is an actual data point. Purpose: The medoid is chosen to be the point that is most centrally located within a cluster in terms of dissimilarity to other points.

How does k-means++ work?

K-means++ is a smart centroid initialization method for the k-means algorithm. The goal is to spread out the initial centroids by assigning the first centroid randomly and then selecting the rest of the centroids based on the maximum squared distance. The idea is to push the centroids as far as possible from one another.

Why use K-Medoids?

Because k-medoids minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances, it is more robust to noise and outliers than k-means.

Which is more robust, K-means or K-Medoids?

K-medoids is more robust than k-means because k-medoids finds k representative objects that minimize the sum of dissimilarities of the data objects, whereas k-means uses the sum of squared Euclidean distances. The dissimilarity-based criterion reduces the influence of noise and outliers.

What is the difference between K mode and K-Medoids?

In k-medoids, the cluster representative (discrete median) is a vector in the cluster that minimizes the sum of distances from all other vectors to the cluster representative. In k-modes, the representative is the mode of the cluster, calculated independently for every attribute.

How would you choose the number of clusters when designing a k-medoid clustering algorithm?

Choose k random points from the data and assign them to k clusters. These are the initial medoids. For all the remaining data points, calculate the distance from each medoid and assign each point to the cluster with the nearest medoid.

What is the PAM algorithm for k-medoids?

The PAM algorithm searches for k representative objects in a data set (k medoids) and then assigns each object to the closest medoid in order to create clusters. Its aim is to minimize the sum of dissimilarities between the objects in a cluster and the center of the same cluster (medoid).

What is the time complexity of the k-means algorithm?

The k-means algorithm is known to have a time complexity of O(n²), where n is the input data size. This quadratic complexity debars the algorithm from being effectively used in large applications.

What are the main weaknesses of K-means clustering?

K-means clustering has a few drawbacks that can affect its applicability and performance, such as sensitivity to the choice of k, initial centroids, shape and size of the clusters, and scale and distribution of the data.

What are the disadvantages of K prototype?

A disadvantage of k-means is that it can be used only for continuous attributes. To handle categorical variables, the k-modes algorithm is usually used. Conversely, k-modes cannot handle continuous variables.

Which of the following are weaknesses of the K-means approach?

K-means clustering has weaknesses such as the complexity of determining the ideal value of K, susceptibility to the curse of dimensionality in high-dimensional datasets, and sensitivity to outliers.

What are the weaknesses of clustering?

Clustering has the disadvantages of (1) reliance on the user to specify the number of clusters in advance, and (2) lack of interpretability regarding the cluster descriptors.
