Help for package mlr3cluster

Title:

Cluster Extension for 'mlr3'

Version:

0.4.0

Description:

Extends the 'mlr3' package with cluster analysis.

License:

LGPL-3

URL:

https://mlr3cluster.mlr-org.com, https://github.com/mlr-org/mlr3cluster

BugReports:

https://github.com/mlr-org/mlr3cluster/issues

Depends:

mlr3 (≥ 1.5.0), R (≥ 3.4.0)

Imports:

backports (≥ 1.5.0), checkmate (≥ 2.0.0), cluster, data.table (≥ 1.15.0), mlr3misc (≥ 0.21.0), paradox (≥ 1.0.1), R6 (≥ 2.4.1), stats

Suggests:

apcluster, clue, clusterCrit, ClusterR (≥ 1.3.1), clustMixType (≥ 0.4.0), dbscan (≥ 1.2.1), e1071, flexclust, flexmix, fpc, genieclust, kernlab, kohonen (≥ 3.0.0), LPCM, mclust, mirai (≥ 2.4.1), mlbench, movMF, mvtnorm, protoclust, RWeka, skmeans, stdbscan (≥ 0.2.0), stream (≥ 2.0.0), tclust (≥ 2.0-3), testthat (≥ 3.3.0), withr

Config/roxygen2/markdown:

TRUE

Config/roxygen2/r6:

TRUE

Config/roxygen2/version:

8.0.0

Config/testthat/edition:

Encoding:

UTF-8

Collate:

'LearnerClust.R' 'zzz.R' 'LearnerClustAffinityPropagation.R' 'LearnerClustAgnes.R' 'LearnerClustBICO.R' 'LearnerClustBIRCH.R' 'LearnerClustCLARA.R' 'LearnerClustCMeans.R' 'LearnerClustCobweb.R' 'LearnerClustDBSCAN.R' 'LearnerClustDBSCANfpc.R' 'LearnerClustDiana.R' 'LearnerClustEM.R' 'LearnerClustFanny.R' 'LearnerClustFarthestFirst.R' 'LearnerClustFeatureless.R' 'LearnerClustFlexmix.R' 'LearnerClustGenie.R' 'LearnerClustHDBSCAN.R' 'LearnerClustHclust.R' 'LearnerClustKCCA.R' 'LearnerClustKKMeans.R' 'LearnerClustKMeans.R' 'LearnerClustKProto.R' 'LearnerClustMclust.R' 'LearnerClustMeanShift.R' 'LearnerClustMiniBatchKMeans.R' 'LearnerClustMovMF.R' 'LearnerClustOPTICS.R' 'LearnerClustPAM.R' 'LearnerClustProtoclust.R' 'LearnerClustSKMeans.R' 'LearnerClustSOM.R' 'LearnerClustSTDBSCAN.R' 'LearnerClustSimpleKMeans.R' 'LearnerClustSpectral.R' 'LearnerClustTclust.R' 'LearnerClustXMeans.R' 'MeasureClust.R' 'measures.R' 'cluster_stats.R' 'MeasureClustSimple.R' 'PredictionClust.R' 'PredictionDataClust.R' 'TaskClust.R' 'TaskClust_ruspini.R' 'TaskClust_usarrest.R' 'as_prediction_clust.R' 'as_task_clust.R' 'bibentries.R' 'helper.R'

NeedsCompilation:

Packaged:

2026-06-11 11:19:00 UTC; mmuecke

Author:

Maximilian Mücke

[aut, cre], Damir Pulatov [aut], Michel Lang

[aut], Marc Becker

[ctb]

Maintainer:

Maximilian Mücke <muecke.maximilian@gmail.com>

Repository:

CRAN

Date/Publication:

2026-06-11 11:50:15 UTC

mlr3cluster: Cluster Extension for 'mlr3'

Description

Extends the 'mlr3' package with cluster analysis.

Author(s)

Maintainer: Maximilian Mücke muecke.maximilian@gmail.com (ORCID)

Authors:

Maximilian Mücke muecke.maximilian@gmail.com (ORCID)
Damir Pulatov damirpolat@protonmail.com
Michel Lang michellang@gmail.com (ORCID)

Other contributors:

Marc Becker marcbecker@posteo.de (ORCID) [contributor]

Cluster Learner

Description

This Learner specializes mlr3::Learner for cluster problems:

task_type is set to "clust".
Creates mlr3::Predictions of class PredictionClust.
Possible values for predict_types are:
- "partition": Integer indicating the cluster membership.
- "prob": Probability for belonging to each cluster.
Additional learner properties include:
- "exclusive": The method natively assigns each observation to exactly one cluster.
- "overlapping": The method natively assigns observations to multiple clusters.
- "fuzzy": The method natively produces soft cluster memberships, e.g. fuzzy or probabilistic model-based methods. The hard partition is derived from the memberships.
- "complete": Every observation is assigned to a cluster.
- "partial": Observations may be left unassigned, e.g. as noise points.
- "partitional": The method divides the data into non-nested clusters.
- "hierarchical": The method produces a nested hierarchy of clusters.
- "density": The method finds clusters as dense regions in the feature space.
These properties describe the nature of the underlying method, not its interface capabilities: whether a learner can return soft memberships is encoded by the "prob" predict type, which "exclusive" learners may also support via derived scores.

Predefined learners can be found in the mlr3misc::Dictionary mlr3::mlr_learners.

Super class

mlr3::Learner -> LearnerClust

Public fields

assignments: (NULL | vector())
Cluster assignments from learned model.
save_assignments: (logical(1))
Should assignments for 'train' data be saved in the learner? Default is TRUE.

Methods

Inherited methods

`LearnerClust$new()`

Creates a new instance of this R6 class.

Usage

LearnerClust$new(
  id,
  param_set = ps(),
  predict_types = "partition",
  feature_types = character(),
  properties = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_
)

Arguments

id

(character(1))
Identifier for the new instance.

param_set

(paradox::ParamSet)
Set of hyperparameters.

predict_types

(character())
Supported predict types. Must be a subset of mlr_reflections$learner_predict_types.

feature_types

(character())
Feature types the learner operates on. Must be a subset of mlr_reflections$task_feature_types.

properties

(character())
Set of properties of the mlr3::Learner. Must be a subset of mlr_reflections$learner_properties. The following properties are currently standardized and understood by learners in mlr3:

"missings": The learner can handle missing values in the data.
"weights": The learner supports observation weights.
"offset": The learner can incorporate offset values to adjust predictions.
"importance": The learner supports extraction of importance scores, i.e. comes with an ⁠$importance()⁠ extractor function (see section on optional extractors in mlr3::Learner).
"selected_features": The learner supports extraction of the set of selected features, i.e. comes with a ⁠$selected_features()⁠ extractor function (see section on optional extractors in mlr3::Learner).
"oob_error": The learner supports extraction of estimated out of bag error, i.e. comes with a oob_error() extractor function (see section on optional extractors in mlr3::Learner).
"validation": The learner can use a validation task during training.
"internal_tuning": The learner is able to internally optimize hyperparameters (those are also tagged with "internal_tuning").
"marshal": To save learners with this property, you need to call ⁠$marshal()⁠ first. If a learner is in a marshaled state, you call first need to call ⁠$unmarshal()⁠ to use its model, e.g. for prediction.
"hotstart_forward": The learner supports to hotstart a model forward.
"hotstart_backward": The learner supports hotstarting a model backward.
"featureless": The learner does not use features.

packages

(character())
Set of required packages. A warning is signaled by the constructor if at least one of the packages is not installed, but loaded (not attached) later on-demand via requireNamespace().

label

(character(1))
Label for the new instance.

man

(character(1))
String in the format ⁠[pkg]::[topic]⁠ pointing to a manual page for this object. The referenced help package can be opened via method ⁠$help()⁠.

`LearnerClust$reset()`

Reset assignments field before calling parent's reset().

Usage

LearnerClust$reset()

`LearnerClust$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(mlr3)
library(mlr3cluster)
ids = mlr_learners$keys("^clust")
ids

# get a specific learner from mlr_learners:
learner = lrn("clust.kmeans")
print(learner)

Cluster Measure

Description

This measure specializes mlr3::Measure for cluster analysis:

task_type is set to "clust".
Possible values for predict_type are "partition" and "prob".

Predefined measures can be found in the mlr3misc::Dictionary mlr3::mlr_measures.

Super class

mlr3::Measure -> MeasureClust

Methods

Public methods

MeasureClust$new()

Inherited methods

`MeasureClust$new()`

Creates a new instance of this R6 class.

Usage

MeasureClust$new(
  id,
  range,
  minimize = NA,
  aggregator = NULL,
  properties = character(),
  predict_type = "partition",
  task_properties = character(),
  packages = character(),
  label = NA_character_,
  man = NA_character_
)

Arguments

id

(character(1))
Identifier for the new instance.

range

(numeric(2))
Feasible range for this measure as c(lower_bound, upper_bound). Both bounds may be infinite.

minimize

(logical(1))
Set to TRUE if good predictions correspond to small values, and to FALSE if good predictions correspond to large values. If set to NA (default), tuning this measure is not possible.

aggregator

(⁠function()⁠ | NULL)
Function to aggregate over multiple iterations. The role of this function depends on the value of field "average":

"macro": A numeric vector of scores (one per iteration) is passed. The aggregate function defaults to mean() in this case.
"micro": The aggregator function is not used. Instead, predictions from multiple iterations are first combined and then scored in one go.
"custom": A ResampleResult is passed to the aggregate function.

properties

(character())
Properties of the measure. Must be a subset of mlr_reflections$measure_properties. Supported by mlr3:

"requires_task" (requires the complete mlr3::Task),
"requires_learner" (requires the trained mlr3::Learner),
"requires_model" (requires the trained mlr3::Learner, including the fitted model),
"requires_train_set" (requires the training indices from the mlr3::Resampling),
"na_score" (the measure is expected to occasionally return NA or NaN),
"weights" (support weighted scoring using sample weights from task, column role weights_measure),
"primary_iters" (the measure explicitly handles resamplings that only use a subset of their iterations for the point estimate), and
"requires_no_prediction" (No prediction is required; This usually means that the measure extracts some information from the learner state.).

predict_type

(character(1))
Required predict type of the mlr3::Learner. Possible values are stored in mlr_reflections$learner_predict_types.

task_properties

(character())
Required task properties, see mlr3::Task.

packages

label

(character(1))
Label for the new instance.

man

(character(1))
String in the format ⁠[pkg]::[topic]⁠ pointing to a manual page for this object. The referenced help package can be opened via method ⁠$help()⁠.

Prediction Object for Cluster Analysis

Description

This object wraps the predictions returned by a learner of class LearnerClust, i.e. the predicted partition and cluster probability.

Super class

mlr3::Prediction -> PredictionClust

Active bindings

partition: (integer())
Access the stored partition.
prob: (matrix() | NULL)
Access to the stored probabilities.

Methods

Public methods

PredictionClust$new()
PredictionClust$clone()

Inherited methods

`PredictionClust$new()`

Creates a new instance of this R6 class.

Usage

PredictionClust$new(
  task = NULL,
  row_ids = task$row_ids,
  partition = NULL,
  prob = NULL,
  check = TRUE
)

Arguments

task: (TaskClust | NULL)
Task, used to extract defaults for row_ids.
row_ids: (integer())
Row ids of the predicted observations, i.e. the row ids of the test set.
partition: (integer() | NULL)
Vector of cluster partitions.
prob: (matrix() | NULL)
Numeric matrix of cluster membership probabilities with one column for each cluster and one row for each observation. Columns must be named with cluster numbers, row names are automatically removed. If prob is provided, but partition is not, the cluster memberships are calculated from the probabilities using max.col() with ties.method set to "first".
check: (logical(1))
If TRUE, performs some argument checks and predict type conversions.

`PredictionClust$clone()`

The objects of this class are cloneable with this method.

Usage

PredictionClust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(mlr3)
library(mlr3cluster)
task = tsk("usarrests")
learner = lrn("clust.kmeans")
p = learner$train(task)$predict(task)
p$predict_types
head(as.data.table(p))

Cluster Task

Description

This task specializes mlr3::Task for cluster problems. As an unsupervised task, this task has no target column. The task_type is set to "clust".

Predefined tasks are stored in the dictionary mlr3::mlr_tasks.

Super classes

mlr3::Task -> mlr3::TaskUnsupervised -> TaskClust

Methods

Public methods

TaskClust$new()
TaskClust$clone()

Inherited methods

`TaskClust$new()`

Creates a new instance of this R6 class.

Usage

TaskClust$new(id, backend, label = NA_character_)

Arguments

id: (character(1))
Identifier for the new instance.
backend: (mlr3::DataBackend)
Either a mlr3::DataBackend, or any object which is convertible to a mlr3::DataBackend with as_data_backend(). E.g., a data.frame() will be converted to a mlr3::DataBackendDataTable.
label: (character(1))
Label for the new instance.

`TaskClust$clone()`

The objects of this class are cloneable with this method.

Usage

TaskClust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(mlr3)
library(mlr3cluster)
task = TaskClust$new("usarrests", backend = USArrests)
task$task_type

# possible properties:
mlr_reflections$task_properties$clust

Convert to a Cluster Prediction

Description

Convert object to a PredictionClust.

Usage

as_prediction_clust(x, ...)

## S3 method for class 'PredictionClust'
as_prediction_clust(x, ...)

## S3 method for class 'data.frame'
as_prediction_clust(x, ...)

Arguments

x

(any)
Object to convert.

...

(any)
Additional arguments.

Value

PredictionClust.

Examples


# create a prediction object
task = tsk("usarrests")
learner = lrn("clust.cmeans", predict_type = "prob")
learner$train(task)
p = learner$predict(task)

# convert to a data.table
tab = as.data.table(p)

# convert back to a Prediction
as_prediction_clust(tab)

# split data.table into a 3 data.tables based on UrbanPop
f = cut(task$data(rows = tab$row_ids)$UrbanPop, 3)
tabs = split(tab, f)

# convert back to list of predictions
preds = lapply(tabs, as_prediction_clust)

# calculate performance in each group
sapply(preds, function(p) p$score(task = task))

Convert to a Cluster Task

Description

Convert object to a TaskClust. This is a S3 generic, specialized for at least the following objects:

TaskClust: ensure the identity.
data.frame() and mlr3::DataBackend: provides an alternative to calling constructor of TaskClust.

Usage

as_task_clust(x, ...)

## S3 method for class 'TaskClust'
as_task_clust(x, clone = FALSE, ...)

## S3 method for class 'data.frame'
as_task_clust(x, id = deparse1(substitute(x)), ...)

## S3 method for class 'DataBackend'
as_task_clust(x, id = deparse1(substitute(x)), ...)

## S3 method for class 'formula'
as_task_clust(x, data, id = deparse1(substitute(data)), ...)

Arguments

x

(any)
Object to convert.

...

(any)
Additional arguments.

clone

(logical(1))
If TRUE, ensures that the returned object is not the same as the input x.

id

(character(1))
Id for the new task. Defaults to the (deparsed and substituted) name of the data argument.

data

(data.frame())
Data frame containing all columns specified in formula x.

Value

TaskClust.

Examples

as_task_clust(datasets::USArrests)

Mini Batch K-Means Clustering Learner

Description

Mini-batch k-means clustering. Calls ClusterR::MiniBatchKmeans() from package ClusterR.

The clusters parameter is set to 2 by default since ClusterR::MiniBatchKmeans() doesn't have a default value for the number of clusters. The predict method uses ClusterR::predict_MBatchKMeans() to compute the cluster memberships for new data. The learner supports both partitional and fuzzy clustering.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.MBatchKMeans")
lrn("clust.MBatchKMeans")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, ClusterR

Parameters

Id	Type	Default	Levels	Range
clusters	integer	-		`[1, \infty)`
batch_size	integer	10		`[1, \infty)`
num_init	integer	1		`[1, \infty)`
max_iters	integer	100		`[1, \infty)`
init_fraction	numeric	1		`[0, 1]`
initializer	character	kmeans++	optimal_init, quantile_init, kmeans++, random	-
early_stop_iter	integer	10		`[1, \infty)`
verbose	logical	FALSE	TRUE, FALSE	-
CENTROIDS	untyped	NULL		-
tol	numeric	1e-04		`[0, \infty)`
tol_optimal_init	numeric	0.3		`[0, \infty)`
seed	integer	1		`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustMiniBatchKMeans

Methods

Public methods

LearnerClustMiniBatchKMeans$new()
LearnerClustMiniBatchKMeans$clone()

Inherited methods

`LearnerClustMiniBatchKMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustMiniBatchKMeans$new()

`LearnerClustMiniBatchKMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustMiniBatchKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Sculley, David (2010). “Web-scale k-means clustering.” In Proceedings of the 19th international conference on World wide web, 1177–1178.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.MBatchKMeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

K-Means Clustering Learner (Weka)

Description

K-means clustering (Weka). Calls RWeka::SimpleKMeans() from package RWeka.

The predict method uses RWeka::predict.Weka_clusterer() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.SimpleKMeans")
lrn("clust.SimpleKMeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Levels	Range
A	untyped	"weka.core.EuclideanDistance"		-
C	logical	FALSE	TRUE, FALSE	-
fast	logical	FALSE	TRUE, FALSE	-
I	integer	100		`[1, \infty)`
init	integer	0		`[0, 3]`
M	logical	FALSE	TRUE, FALSE	-
max_candidates	integer	100		`[1, \infty)`
min_density	integer	2		`[1, \infty)`
N	integer	2		`[1, \infty)`
num_slots	integer	1		`[1, \infty)`
O	logical	FALSE	TRUE, FALSE	-
periodic_pruning	integer	10000		`[1, \infty)`
S	integer	10		`[0, \infty)`
t2	numeric	-1		`(-\infty, \infty)`
t1	numeric	-1.5		`(-\infty, \infty)`
V	logical	FALSE	TRUE, FALSE	-
output_debug_info	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustSimpleKMeans

Methods

Public methods

LearnerClustSimpleKMeans$new()
LearnerClustSimpleKMeans$clone()

Inherited methods

`LearnerClustSimpleKMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSimpleKMeans$new()

`LearnerClustSimpleKMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSimpleKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Witten, H I, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” Acm Sigmod Record, 31(1), 76–77.

Forgy, W E (1965). “Cluster analysis of multivariate data: efficiency versus interpretability of classifications.” Biometrics, 21, 768–769.

Lloyd, P S (1982). “Least squares quantization in PCM.” IEEE Transactions on Information Theory, 28(2), 129–137.

MacQueen, James (1967). “Some methods for classification and analysis of multivariate observations.” In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, 281–297.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.SimpleKMeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Agglomerative Nesting Clustering Learner

Description

Agglomerative hierarchical clustering. Calls cluster::agnes() from package cluster.

The predict method uses stats::cutree() which cuts the tree resulting from hierarchical clustering into specified number of groups (see parameter k). The default number for k is 2.

Initial parameter values

keep.diss:
- Actual default: n < 100, where n is the number of observations.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the dissimilarity matrix in the model to save memory.
keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.agnes")
lrn("clust.agnes")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster

Parameters

Id	Type	Default	Levels	Range
metric	character	euclidean	euclidean, manhattan	-
stand	logical	FALSE	TRUE, FALSE	-
method	character	average	average, single, complete, ward, weighted, flexible, gaverage	-
keep.diss	logical	-	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
trace.lev	integer	0		`[0, \infty)`
k	integer	-		`[1, \infty)`
par.method	untyped	-		-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustAgnes

Methods

Public methods

LearnerClustAgnes$new()
LearnerClustAgnes$clone()

Inherited methods

`LearnerClustAgnes$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustAgnes$new()

`LearnerClustAgnes$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustAgnes$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Kaufman, Leonard, Rousseeuw, J P (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.agnes")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Affinity Propagation Clustering Learner

Description

Affinity Propagation clustering. Calls apcluster::apcluster() from package apcluster.

Note that apcluster::apcluster() doesn't have a default for the similarity function. The predict method computes the closest cluster exemplar to find the cluster memberships for new data. The code is taken from StackOverflow answer by the apcluster package maintainer.

Initial parameter values

includeSim:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the n x n similarity matrix in the model.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.ap")
lrn("clust.ap")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, apcluster

Parameters

Id	Type	Default	Levels	Range
s	untyped	-		-
p	untyped	NA_real_		-
q	numeric	NA		`[0, 1]`
maxits	integer	1000		`[1, \infty)`
convits	integer	100		`[1, \infty)`
lam	numeric	0.9		`[0.5, 1]`
includeSim	logical	TRUE	TRUE, FALSE	-
details	logical	FALSE	TRUE, FALSE	-
nonoise	logical	FALSE	TRUE, FALSE	-
seed	integer	NA		`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustAP

Methods

Public methods

LearnerClustAP$new()
LearnerClustAP$clone()

Inherited methods

`LearnerClustAP$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustAP$new()

`LearnerClustAP$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustAP$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Bodenhofer, Ulrich, Kothmeier, Andreas, Hochreiter, Sepp (2011). “APCluster: an R package for affinity propagation clustering.” Bioinformatics, 27(17), 2463–2464.

Frey, J B, Dueck, Delbert (2007). “Clustering by passing messages between data points.” science, 315(5814), 972–976.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.ap")
print(learner)

BICO Clustering Learner

Description

BICO (fast computation of k-means coresets in a data stream) clustering. Calls stream::DSC_BICO() from package stream.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.bico")
lrn("clust.bico")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, stream

Parameters

Id	Type	Default	Range
k	integer	5	`[1, \infty)`
space	integer	10	`[1, \infty)`
p	integer	10	`[1, \infty)`
iterations	integer	10	`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustBICO

Methods

Public methods

LearnerClustBICO$new()
LearnerClustBICO$clone()

Inherited methods

`LearnerClustBICO$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustBICO$new()

`LearnerClustBICO$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustBICO$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Fichtenberger, Hendrik, Gille, Marc, Schmidt, Melanie, Schwiegelshohn, Chris, Sohler, Christian (2013). “BICO: BIRCH Meets Coresets for k-Means Clustering.” In Algorithms–ESA 2013: 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings 21, 481–492. Springer.

Hahsler M, Bolaños M, Forrest J (2017). “Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R.” Journal of Statistical Software, 76(14), 1–50. doi:10.18637/jss.v076.i14.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.bico")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

BIRCH Clustering Learner

Description

BIRCH (balanced iterative reducing clustering using hierarchies) clustering. Calls stream::DSC_BIRCH() from package stream.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.birch")
lrn("clust.birch")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, stream

Parameters

Id	Type	Default	Range
threshold	numeric	-	`[0, \infty)`
branching	integer	-	`[1, \infty)`
maxLeaf	integer	-	`[1, \infty)`
maxMem	integer	0	`[0, \infty)`
outlierThreshold	numeric	0.25	`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustBIRCH

Methods

Public methods

LearnerClustBIRCH$new()
LearnerClustBIRCH$clone()

Inherited methods

`LearnerClustBIRCH$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustBIRCH$new()

`LearnerClustBIRCH$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustBIRCH$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Zhang, Tian, Ramakrishnan, Raghu, Livny, Miron (1996). “BIRCH: An Efficient Data Clustering Method for Very Large Databases.” ACM sigmod record, 25(2), 103–114.

Zhang, Tian, Ramakrishnan, Raghu, Livny, Miron (1997). “BIRCH: A new data clustering algorithm and its applications.” Data Mining and Knowledge Discovery, 1, 141–182.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.birch")
print(learner)

CLARA Clustering Learner

Description

Clustering Large Applications (CLARA) clustering. Calls cluster::clara() from package cluster.

CLARA extends the PAM algorithm to handle larger datasets by working on sub-datasets of fixed size. The k parameter is set to 2 by default since cluster::clara() doesn't have a default value for the number of clusters. The predict method uses clue::cl_predict() to compute the cluster memberships for new data.

Initial parameter values

keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.clara")
lrn("clust.clara")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster, clue

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
metric	character	euclidean	euclidean, manhattan, jaccard	-
stand	logical	FALSE	TRUE, FALSE	-
samples	integer	5		`[1, \infty)`
sampsize	integer	-		`[1, \infty)`
trace	integer	0		`[0, \infty)`
medoids.x	logical	TRUE	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
rngR	logical	FALSE	TRUE, FALSE	-
pamLike	logical	FALSE	TRUE, FALSE	-
correct.d	logical	TRUE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustCLARA

Methods

Public methods

LearnerClustCLARA$new()
LearnerClustCLARA$clone()

Inherited methods

`LearnerClustCLARA$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustCLARA$new()

`LearnerClustCLARA$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustCLARA$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Kaufman, Leonard, Rousseeuw, J P (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Schubert, Erich, Rousseeuw, J P (2019). “Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms.” In Similarity Search and Applications: 12th International Conference, SISAP 2019, Newark, NJ, USA, October 2–4, 2019, Proceedings 12, 171–187. Springer.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.clara")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Fuzzy C-Means Clustering Learner

Description

Fuzzy c-means clustering. Calls e1071::cmeans() from package e1071.

The centers parameter is set to 2 by default since e1071::cmeans() doesn't have a default value for the number of clusters. The predict method uses clue::cl_predict() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.cmeans")
lrn("clust.cmeans")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, e1071, clue

Parameters

Id	Type	Default	Levels	Range
centers	untyped	-		-
iter.max	integer	100		`[1, \infty)`
verbose	logical	FALSE	TRUE, FALSE	-
dist	character	euclidean	euclidean, manhattan	-
method	character	cmeans	cmeans, ufcl	-
m	numeric	2		`[1, \infty)`
rate.par	numeric	-		`[0, 1]`
weights	untyped	1L		-
control	untyped	-		-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustCMeans

Methods

Public methods

LearnerClustCMeans$new()
LearnerClustCMeans$clone()

Inherited methods

`LearnerClustCMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustCMeans$new()

`LearnerClustCMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustCMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Dimitriadou, Evgenia, Hornik, Kurt, Leisch, Friedrich, Meyer, David, Weingessel, Andreas (2008). “Misc functions of the Department of Statistics (e1071), TU Wien.” R package, 1, 5–24.

Bezdek, C J (2013). Pattern recognition with fuzzy objective function algorithms. Springer Science & Business Media.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.cmeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Cobweb Clustering Learner

Description

Cobweb clustering. Calls RWeka::Cobweb() from package RWeka.

The predict method uses RWeka::predict.Weka_clusterer() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.cobweb")
lrn("clust.cobweb")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Range
A	numeric	1	`[0, \infty)`
C	numeric	0.002	`[0, \infty)`
S	integer	42	`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustCobweb

Methods

Public methods

LearnerClustCobweb$new()
LearnerClustCobweb$clone()

Inherited methods

`LearnerClustCobweb$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustCobweb$new()

`LearnerClustCobweb$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustCobweb$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Witten, H I, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” Acm Sigmod Record, 31(1), 76–77.

Fisher, H D (1987). “Knowledge acquisition via incremental conceptual clustering.” Machine learning, 2, 139–172.

Gennari, H J, Langley, Pat, Fisher, Doug (1989). “Models of incremental concept formation.” Artificial intelligence, 40(1-3), 11–61.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.cobweb")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

DBSCAN Clustering Learner

Description

DBSCAN (density-based spatial clustering of applications with noise) clustering. Calls dbscan::dbscan() from package dbscan.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.dbscan")
lrn("clust.dbscan")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, dbscan

Parameters

Id	Type	Default	Levels	Range
eps	numeric	-		`[0, \infty)`
minPts	integer	5		`[0, \infty)`
weights	untyped	-		-
borderPoints	logical	TRUE	TRUE, FALSE	-
search	character	kdtree	kdtree, linear, dist	-
bucketSize	integer	10		`[1, \infty)`
splitRule	character	SUGGEST	STD, MIDPT, FAIR, SL_MIDPT, SL_FAIR, SUGGEST	-
approx	numeric	0		`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustDBSCAN

Methods

Public methods

LearnerClustDBSCAN$new()
LearnerClustDBSCAN$clone()

Inherited methods

`LearnerClustDBSCAN$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustDBSCAN$new()

`LearnerClustDBSCAN$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustDBSCAN$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Hahsler M, Piekenbrock M, Doran D (2019). “dbscan: Fast Density-Based Clustering with R.” Journal of Statistical Software, 91(1), 1–30. doi:10.18637/jss.v091.i01.

Ester, Martin, Kriegel, Hans-Peter, Sander, Jörg, Xu, Xiaowei, others (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise.” In kdd, volume 96 number 34, 226–231.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.dbscan")
print(learner)

DBSCAN Clustering Learner (fpc)

Description

DBSCAN (density-based spatial clustering of applications with noise) clustering. Calls fpc::dbscan() from package fpc.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.dbscan_fpc")
lrn("clust.dbscan_fpc")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, fpc

Parameters

Id	Type	Default	Levels	Range
eps	numeric	-		`[0, \infty)`
MinPts	integer	5		`[0, \infty)`
scale	logical	FALSE	TRUE, FALSE	-
method	character	hybrid	hybrid, raw, dist	-
seeds	logical	TRUE	TRUE, FALSE	-
showplot	untyped	FALSE		-
countmode	untyped	NULL		-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustDBSCANfpc

Methods

Public methods

LearnerClustDBSCANfpc$new()
LearnerClustDBSCANfpc$clone()

Inherited methods

`LearnerClustDBSCANfpc$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustDBSCANfpc$new()

`LearnerClustDBSCANfpc$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustDBSCANfpc$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Examples


# Define the Learner and set parameter values
learner = lrn("clust.dbscan_fpc")
print(learner)

Divisive Analysis Clustering Learner

Description

Divisive hierarchical clustering. Calls cluster::diana() from package cluster.

The predict method uses stats::cutree() which cuts the tree resulting from hierarchical clustering into specified number of groups (see parameter k). The default value for k is 2.

Initial parameter values

keep.diss:
- Actual default: n < 100, where n is the number of observations.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the dissimilarity matrix in the model to save memory.
keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.diana")
lrn("clust.diana")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster

Parameters

Id	Type	Default	Levels	Range
metric	character	euclidean	euclidean, manhattan	-
stand	logical	FALSE	TRUE, FALSE	-
stop.at.k	untyped	FALSE		-
keep.diss	logical	-	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
trace.lev	integer	0		`[0, \infty)`
k	integer	-		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustDiana

Methods

Public methods

LearnerClustDiana$new()
LearnerClustDiana$clone()

Inherited methods

`LearnerClustDiana$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustDiana$new()

`LearnerClustDiana$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustDiana$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Kaufman, Leonard, Rousseeuw, J P (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.diana")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Expectation-Maximization Clustering Learner

Description

Expectation-Maximization clustering. Calls the EM Weka clusterer from package RWeka.

The predict method uses RWeka::predict.Weka_clusterer() to compute the cluster memberships for new data. The learner supports both partitional and fuzzy clustering.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.em")
lrn("clust.em")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Levels	Range
I	integer	100		`[1, \infty)`
ll_cv	numeric	1e-06		`[1e-06, \infty)`
ll_iter	numeric	1e-06		`[1e-06, \infty)`
M	numeric	1e-06		`[1e-06, \infty)`
max	integer	-1		`[-1, \infty)`
N	integer	-1		`[-1, \infty)`
num_slots	integer	1		`[1, \infty)`
S	integer	100		`[0, \infty)`
X	integer	10		`[1, \infty)`
K	integer	10		`[1, \infty)`
V	logical	FALSE	TRUE, FALSE	-
output_debug_info	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustEM

Methods

Public methods

LearnerClustEM$new()
LearnerClustEM$clone()

Inherited methods

`LearnerClustEM$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustEM$new()

`LearnerClustEM$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustEM$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Witten, H I, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” Acm Sigmod Record, 31(1), 76–77.

Dempster, P A, Laird, M N, Rubin, B D (1977). “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the royal statistical society: series B (methodological), 39(1), 1–22.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.em")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Fuzzy Analysis Clustering Learner

Description

Fuzzy Analysis (FANNY) clustering. Calls cluster::fanny() from package cluster.

The k parameter is set to 2 by default since cluster::fanny() doesn't have a default value for the number of clusters. The predict method copies cluster assignments and memberships generated for train data. The predict does not work for new data.

Initial parameter values

keep.diss:
- Actual default: n < 100, where n is the number of observations.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the dissimilarity matrix in the model to save memory.
keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.fanny")
lrn("clust.fanny")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
memb.exp	numeric	2		`[1, \infty)`
metric	character	euclidean	euclidean, manhattan, SqEuclidean	-
stand	logical	FALSE	TRUE, FALSE	-
iniMem.p	untyped	NULL		-
keep.diss	logical	-	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
maxit	integer	500		`[0, \infty)`
tol	numeric	1e-15		`[0, \infty)`
trace.lev	integer	0		`[0, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustFanny

Methods

Public methods

LearnerClustFanny$new()
LearnerClustFanny$clone()

Inherited methods

`LearnerClustFanny$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustFanny$new()

`LearnerClustFanny$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustFanny$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Kaufman, Leonard, Rousseeuw, J P (2009). Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.fanny")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Featureless Clustering Learner

Description

Featureless clustering. Randomly (but evenly) assigns observations to num_clusters partitions (default: 1 partition).

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.featureless")
lrn("clust.featureless")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster

Parameters

Id	Type	Default	Range
num_clusters	integer	-	`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustFeatureless

Methods

Public methods

LearnerClustFeatureless$new()
LearnerClustFeatureless$clone()

Inherited methods

`LearnerClustFeatureless$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustFeatureless$new()

`LearnerClustFeatureless$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustFeatureless$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.featureless")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Farthest First Clustering Learner

Description

Farthest First clustering. Calls RWeka::FarthestFirst() from package RWeka.

The predict method uses RWeka::predict.Weka_clusterer() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.ff")
lrn("clust.ff")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Levels	Range
N	integer	2		`[1, \infty)`
S	integer	1		`[1, \infty)`
output_debug_info	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustFarthestFirst

Methods

Public methods

LearnerClustFarthestFirst$new()
LearnerClustFarthestFirst$clone()

Inherited methods

`LearnerClustFarthestFirst$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustFarthestFirst$new()

`LearnerClustFarthestFirst$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustFarthestFirst$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Witten, H I, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” Acm Sigmod Record, 31(1), 76–77.

Hochbaum, S D, Shmoys, B D (1985). “A best possible heuristic for the k-center problem.” Mathematics of operations research, 10(2), 180–184.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.ff")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Finite Mixture Model Clustering Learner

Description

Finite mixture model clustering via the EM algorithm. Calls flexmix::flexmix() from package flexmix.

The component model is selected through the model parameter, exposing the multivariate normal, univariate normal, multivariate binary, and multivariate Poisson drivers shipped with flexmix. The predict method calls flexmix::clusters() for cluster assignments and flexmix::posterior() for component probabilities on new data.

Note that EM can prune components whose prior falls below minprior during fitting, so the final number of components may be smaller than k.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.flexmix")
lrn("clust.flexmix")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, flexmix

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
model	character	FLXMCmvnorm	FLXMCmvnorm, FLXMCnorm1, FLXMCmvbinary, FLXMCmvpois	-
diagonal	logical	TRUE	TRUE, FALSE	-
truncated	logical	FALSE	TRUE, FALSE	-
cluster	untyped	-		-
iter.max	integer	200		`[1, \infty)`
minprior	numeric	0.05		`[0, 1]`
tolerance	numeric	1e-06		`[0, \infty)`
verbose	integer	0		`[0, \infty)`
classify	character	auto	auto, weighted, CEM, SEM, hard, random	-
nrep	integer	1		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustFlexmix

Methods

Public methods

LearnerClustFlexmix$new()
LearnerClustFlexmix$clone()

Inherited methods

`LearnerClustFlexmix$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustFlexmix$new()

`LearnerClustFlexmix$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustFlexmix$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Leisch, Friedrich (2004). “FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R.” Journal of Statistical Software, 11(8), 1–18. doi:10.18637/jss.v011.i08.

Grün, Bettina, Leisch, Friedrich (2008). “FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters.” Journal of Statistical Software, 28(4), 1–35. doi:10.18637/jss.v028.i04.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.flexmix")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Genie Hierarchical Clustering Learner

Description

Genie hierarchical clustering, a fast and robust outlier-resistant algorithm based on the Gini inequality measure applied to cluster sizes during the linkage process. Calls genieclust::gclust() from package genieclust.

There is no predict method for genieclust::gclust(), so the method returns cluster labels for the training data obtained via stats::cutree() at the requested k.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.genie")
lrn("clust.genie")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, genieclust

Parameters

Id	Type	Default	Levels	Range
gini_threshold	numeric	0.3		`[0, 1]`
M	integer	0		`[0, \infty)`
distance	character	euclidean	euclidean, l2, manhattan, cityblock, l1, cosine	-
verbose	logical	FALSE	TRUE, FALSE	-
k	integer	-		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustGenie

Methods

Public methods

LearnerClustGenie$new()
LearnerClustGenie$clone()

Inherited methods

`LearnerClustGenie$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustGenie$new()

`LearnerClustGenie$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustGenie$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Gagolewski, Marek, Bartoszuk, Maciej, Cena, Anna (2016). “Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm.” Information Sciences, 363, 8–23. doi:10.1016/j.ins.2016.05.003.

Gagolewski, Marek (2021). “genieclust: Fast and robust hierarchical clustering.” SoftwareX, 15, 100722. doi:10.1016/j.softx.2021.100722.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.genie")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Hierarchical Clustering Learner

Description

Agglomerative hierarchical clustering. Calls stats::hclust() from package stats.

Distance calculation is done by stats::dist().

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.hclust")
lrn("clust.hclust")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, 'stats'

Parameters

Id	Type	Default	Levels	Range
method	character	complete	ward.D, ward.D2, single, complete, average, mcquitty, median, centroid	-
members	untyped	NULL		-
distmethod	character	euclidean	euclidean, maximum, manhattan, canberra, binary, minkowski	-
diag	logical	FALSE	TRUE, FALSE	-
upper	logical	FALSE	TRUE, FALSE	-
p	numeric	2		`(-\infty, \infty)`
k	integer	-		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustHclust

Methods

Public methods

LearnerClustHclust$new()
LearnerClustHclust$clone()

Inherited methods

`LearnerClustHclust$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustHclust$new()

`LearnerClustHclust$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustHclust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Becker, A R, Chambers, M J, Wilks, R A (1988). The New S Language. Wadsworth & Brooks/Cole.

Everitt, S B (1974). Cluster Analysis. Heinemann Educational Books.

Hartigan, A J (1975). Clustering Algorithms. John Wiley & Sons.

Sneath, HA P, Sokal, R R (1973). Numerical Taxonomy. Freeman.

Anderberg, R M (1973). Cluster Analysis for Applications. Academic Press.

Gordon, David A (1999). Classification, 2 edition. Chapman and Hall / CRC.

Murtagh, Fionn (1985). “Multidimensional Clustering Algorithms.” In COMPSTAT Lectures 4. Physica-Verlag.

McQuitty, L L (1966). “Similarity Analysis by Reciprocal Pairs for Discrete and Continuous Data.” Educational and Psychological Measurement, 26(4), 825–831. doi:10.1177/001316446602600402.

Legendre, Pierre, Legendre, Louis (2012). Numerical Ecology, 3 edition. Elsevier Science BV.

Murtagh, Fionn, Legendre, Pierre (2014). “Ward's Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward's Criterion?” Journal of Classification, 31, 274–295. doi:10.1007/s00357-014-9161-z.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.hclust")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

HDBSCAN Clustering Learner

Description

HDBSCAN (hierarchical DBSCAN) clustering. Calls dbscan::hdbscan() from package dbscan.

The minPts parameter is set to 5 by default since dbscan::hdbscan() doesn't have a default value for the minimum size of clusters.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.hdbscan")
lrn("clust.hdbscan")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, dbscan

Parameters

Id	Type	Default	Levels	Range
minPts	integer	-		`[0, \infty)`
cluster_selection_epsilon	numeric	0		`(-\infty, \infty)`
gen_hdbscan_tree	logical	FALSE	TRUE, FALSE	-
gen_simplified_tree	logical	FALSE	TRUE, FALSE	-
verbose	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustHDBSCAN

Methods

Public methods

LearnerClustHDBSCAN$new()
LearnerClustHDBSCAN$clone()

Inherited methods

`LearnerClustHDBSCAN$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustHDBSCAN$new()

`LearnerClustHDBSCAN$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustHDBSCAN$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Hahsler M, Piekenbrock M, Doran D (2019). “dbscan: Fast Density-Based Clustering with R.” Journal of Statistical Software, 91(1), 1–30. doi:10.18637/jss.v091.i01.

Campello, JGB R, Moulavi, Davoud, Sander, Jörg (2013). “Density-based clustering based on hierarchical density estimates.” In Pacific-Asia conference on knowledge discovery and data mining, 160–172. Springer.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.hdbscan")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

K-Centroids Cluster Analysis Learner

Description

K-Centroids Cluster Analysis - a unified framework for partitional clustering with selectable distance / centroid families: standard k-means, k-medians, spherical k-means ("angle"), Jaccard, and extended Jaccard. Calls flexclust::kcca() from package flexclust.

The k parameter is set to 2 by default since flexclust::kcca() has no default value for the number of clusters. Predictions dispatch to flexclust's S4 predict method via methods::getMethod("predict", "kccasimple") rather than calling predict() directly, since both flexclust and kernlab define an S4 class named "kcca" and the resulting class-cache collision can break S4 dispatch when both packages are loaded.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.kcca")
lrn("clust.kcca")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, flexclust

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
family	character	kmeans	kmeans, kmedians, angle, jaccard, ejaccard	-
weights	untyped	-		-
group	untyped	-		-
simple	logical	FALSE	TRUE, FALSE	-
save.data	logical	FALSE	TRUE, FALSE	-
iter.max	integer	200		`[1, \infty)`
tolerance	numeric	1e-06		`[0, \infty)`
verbose	integer	0		`[0, \infty)`
classify	character	auto	auto, weighted, hard	-
initcent	untyped	-		-
gamma	numeric	1		`[0, \infty)`
ntry	integer	5		`[1, \infty)`
min.size	integer	2		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustKCCA

Methods

Public methods

LearnerClustKCCA$new()
LearnerClustKCCA$clone()

Inherited methods

`LearnerClustKCCA$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustKCCA$new()

`LearnerClustKCCA$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustKCCA$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Leisch, Friedrich (2006). “A Toolbox for K-Centroids Cluster Analysis.” Computational Statistics & Data Analysis, 51(2), 526–544. doi:10.1016/j.csda.2005.10.006.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.kcca")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Kernel K-Means Clustering Learner

Description

Kernel k-means clustering. Calls kernlab::kkmeans() from package kernlab.

The centers parameter is set to 2 by default since kernlab::kkmeans() doesn't have a default value for the number of clusters. Kernel parameters have to be passed directly and not by using the kpar list in kernlab::kkmeans(). The predict method finds the nearest center in kernel distance to assign clusters for new data points.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.kkmeans")
lrn("clust.kkmeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, kernlab

Parameters

Id	Type	Default	Levels	Range
centers	untyped	-		-
kernel	character	rbfdot	rbfdot, polydot, vanilladot, tanhdot, laplacedot, besseldot, anovadot, splinedot	-
sigma	numeric	-		`[0, \infty)`
degree	integer	3		`[1, \infty)`
scale	numeric	1		`[0, \infty)`
offset	numeric	1		`(-\infty, \infty)`
order	integer	1		`(-\infty, \infty)`
alg	character	kkmeans	kkmeans, kerninghan	-
p	numeric	1		`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustKKMeans

Methods

Public methods

LearnerClustKKMeans$new()
LearnerClustKKMeans$clone()

Inherited methods

`LearnerClustKKMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustKKMeans$new()

`LearnerClustKKMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustKKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Karatzoglou, Alexandros, Smola, Alexandros, Hornik, Kurt, Zeileis, Achim (2004). “kernlab-an S4 package for kernel methods in R.” Journal of statistical software, 11, 1–20.

Dhillon, S I, Guan, Yuqiang, Kulis, Brian (2004). A unified view of kernel k-means, spectral clustering and graph cuts. Citeseer.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.kkmeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

K-Means Clustering Learner

Description

K-means clustering. Calls stats::kmeans() from package stats.

The centers parameter is set to 2 by default since stats::kmeans() doesn't have a default value for the number of clusters. The predict method uses clue::cl_predict() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.kmeans")
lrn("clust.kmeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, 'stats', clue

Parameters

Id	Type	Default	Levels	Range
centers	untyped	-		-
iter.max	integer	10		`[1, \infty)`
algorithm	character	Hartigan-Wong	Hartigan-Wong, Lloyd, Forgy, MacQueen	-
nstart	integer	1		`[1, \infty)`
trace	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustKMeans

Methods

Public methods

LearnerClustKMeans$new()
LearnerClustKMeans$clone()

Inherited methods

`LearnerClustKMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustKMeans$new()

`LearnerClustKMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Forgy, W E (1965). “Cluster analysis of multivariate data: efficiency versus interpretability of classifications.” Biometrics, 21, 768–769.

Hartigan, A J, Wong, A M (1979). “Algorithm AS 136: A K-means clustering algorithm.” Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1), 100–108. doi:10.2307/2346830.

Lloyd, P S (1982). “Least squares quantization in PCM.” IEEE Transactions on Information Theory, 28(2), 129–137.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.kmeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

K-Prototypes Clustering Learner

Description

K-prototypes clustering for mixed-type data. Calls clustMixType::kproto() from package clustMixType.

The k parameter is set to 2 by default since clustMixType::kproto() doesn't have a default value for the number of clusters.

Initial parameter values

keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.
verbose:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Suppress verbose output during training.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.kproto")
lrn("clust.kproto")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”, “factor”, “ordered”
Required Packages: mlr3, mlr3cluster, clustMixType

Parameters

Id	Type	Default	Levels	Range
k	untyped	-		-
lambda	untyped	NULL		-
type	character	huang	huang, gower	-
iter.max	integer	100		`[1, \infty)`
nstart	integer	1		`[1, \infty)`
na.rm	character	yes	yes, no, imp.internal, imp.onestep	-
keep.data	logical	TRUE	TRUE, FALSE	-
verbose	logical	TRUE	TRUE, FALSE	-
init	character	NULL	nbh.dens, sel.cen, nstart.m	-
p_nstart.m	numeric	0.9		`[0, 1]`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustKProto

Methods

Public methods

LearnerClustKProto$new()
LearnerClustKProto$clone()

Inherited methods

`LearnerClustKProto$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustKProto$new()

`LearnerClustKProto$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustKProto$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Huang, Zhexue (1998). “Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values.” Data Mining and Knowledge Discovery, 2(3), 283–304.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.kproto")
print(learner)

# Define a mixed-type Task (kproto requires at least one factor variable)
data = data.frame(
  x1 = c(1, 2, 10, 11, 1, 2, 10, 11),
  x2 = factor(c("a", "a", "b", "b", "a", "a", "b", "b"))
)
task = as_task_clust(data)

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Gaussian Mixture Model Clustering Learner

Description

Gaussian mixture model-based clustering. Calls mclust::Mclust() from package mclust.

The predict method uses mclust::predict.Mclust() to compute the cluster memberships for new data.

Initial parameter values

verbose:
- Actual default: interactive().
- Adjusted default: FALSE.
- Reason for change: Suppress progress output during training.
warn:
- Actual default: mclust.options("warn"), which is FALSE by default.
- Adjusted default: FALSE.
- Reason for change: Suppress warnings during training independently of the mclust global options.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.mclust")
lrn("clust.mclust")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, mclust

Parameters

Id	Type	Default	Levels
G	untyped	1:9
modelNames	untyped	-
prior	untyped	-
control	untyped	-
initialization	untyped	-
warn	logical	FALSE	TRUE, FALSE
x	untyped	-
verbose	logical	FALSE	TRUE, FALSE

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustMclust

Methods

Public methods

LearnerClustMclust$new()
LearnerClustMclust$clone()

Inherited methods

`LearnerClustMclust$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustMclust$new()

`LearnerClustMclust$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustMclust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Scrucca, Luca, Fop, Michael, Murphy, Brendan T, Raftery, E A (2016). “mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.” The R journal, 8(1), 289.

Fraley, Chris, Raftery, E A (2002). “Model-based clustering, discriminant analysis, and density estimation.” Journal of the American statistical Association, 97(458), 611–631.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.mclust")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Mean Shift Clustering Learner

Description

Mean shift clustering. Calls LPCM::ms() from package LPCM.

There is no predict method for LPCM::ms(), so the method returns cluster labels for the training data.

Initial parameter values

plot:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Suppress plotting during training.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.meanshift")
lrn("clust.meanshift")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, LPCM

Parameters

Id	Type	Default	Levels	Range
h	untyped	-		-
subset	untyped	-		-
thr	numeric	0.01		`(-\infty, \infty)`
scaled	integer	1		`[0, \infty)`
iter	integer	200		`[1, \infty)`
plot	logical	TRUE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustMeanShift

Methods

Public methods

LearnerClustMeanShift$new()
LearnerClustMeanShift$clone()

Inherited methods

`LearnerClustMeanShift$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustMeanShift$new()

`LearnerClustMeanShift$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustMeanShift$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Cheng, Yizong (1995). “Mean shift, mode seeking, and clustering.” IEEE transactions on pattern analysis and machine intelligence, 17(8), 790–799.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.meanshift")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

von Mises-Fisher Mixture Clustering Learner

Description

Fits a mixture of von Mises-Fisher distributions via EM, the directional-data analogue of a Gaussian mixture for points on the unit hypersphere. Calls movMF::movMF() from package movMF.

The k parameter is set to 2 by default since movMF::movMF() has no default value for the number of mixture components. Rows of x are standardised to unit length internally by movMF::movMF(). Predictions use the predict() method from movMF; prob returns the soft memberships.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.movMF")
lrn("clust.movMF")

Meta Information

Task type: “clust”
Predict Types: “partition”, “prob”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, movMF

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
E	character	softmax	softmax, hardmax, stochmax	-
kappa	untyped	-		-
start	untyped	"p"		-
nruns	integer	1		`[1, \infty)`
maxiter	integer	100		`[1, \infty)`
reltol	numeric	-		`[0, \infty)`
minalpha	numeric	0		`[0, \infty)`
converge	logical	TRUE	TRUE, FALSE	-
verbose	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustMovMF

Methods

Public methods

LearnerClustMovMF$new()
LearnerClustMovMF$clone()

Inherited methods

`LearnerClustMovMF$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustMovMF$new()

`LearnerClustMovMF$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustMovMF$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Banerjee, Arindam, Dhillon, S I, Ghosh, Joydeep, Sra, Suvrit (2005). “Clustering on the Unit Hypersphere using von Mises-Fisher Distributions.” Journal of Machine Learning Research, 6(46), 1345–1382.

Hornik, Kurt, Grün, Bettina (2014). “movMF: An R Package for Fitting Mixtures of von Mises-Fisher Distributions.” Journal of Statistical Software, 58(10), 1–31. doi:10.18637/jss.v058.i10.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.movMF")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

OPTICS Clustering Learner

Description

OPTICS (ordering points to identify the clustering structure) clustering. Calls dbscan::optics() from package dbscan.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.optics")
lrn("clust.optics")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, dbscan

Parameters

Id	Type	Default	Levels	Range
eps	numeric	NULL		`[0, \infty)`
minPts	integer	5		`[0, \infty)`
search	character	kdtree	kdtree, linear, dist	-
bucketSize	integer	10		`[1, \infty)`
splitRule	character	SUGGEST	STD, MIDPT, FAIR, SL_MIDPT, SL_FAIR, SUGGEST	-
approx	numeric	0		`(-\infty, \infty)`
eps_cl	numeric	-		`[0, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustOPTICS

Methods

Public methods

LearnerClustOPTICS$new()
LearnerClustOPTICS$clone()

Inherited methods

`LearnerClustOPTICS$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustOPTICS$new()

`LearnerClustOPTICS$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustOPTICS$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Hahsler M, Piekenbrock M, Doran D (2019). “dbscan: Fast Density-Based Clustering with R.” Journal of Statistical Software, 91(1), 1–30. doi:10.18637/jss.v091.i01.

Ankerst, Mihael, Breunig, M M, Kriegel, Hans-Peter, Sander, Jörg (1999). “OPTICS: Ordering points to identify the clustering structure.” ACM Sigmod record, 28(2), 49–60.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.optics")
print(learner)

Partitioning Around Medoids Clustering Learner

Description

Partitioning Around Medoids (PAM) clustering. Calls cluster::pam() from package cluster.

The k parameter is set to 2 by default since cluster::pam() doesn't have a default value for the number of clusters. The predict method uses clue::cl_predict() to compute the cluster memberships for new data.

Initial parameter values

keep.diss:
- Actual default: n < 100, where n is the number of observations.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the dissimilarity matrix in the model to save memory.
keep.data:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.pam")
lrn("clust.pam")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, cluster, clue

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
metric	character	euclidean	euclidean, manhattan	-
medoids	untyped	NULL		-
nstart	integer	1		`[1, \infty)`
stand	logical	FALSE	TRUE, FALSE	-
do.swap	logical	TRUE	TRUE, FALSE	-
keep.diss	logical	-	TRUE, FALSE	-
keep.data	logical	TRUE	TRUE, FALSE	-
pamonce	untyped	FALSE		-
variant	character	original	original, o_1, o_2, f_3, f_4, f_5, faster	-
trace.lev	integer	0		`[0, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustPAM

Methods

Public methods

LearnerClustPAM$new()
LearnerClustPAM$clone()

Inherited methods

`LearnerClustPAM$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustPAM$new()

`LearnerClustPAM$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustPAM$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Reynolds, P A, Richards, Graeme, de la Iglesia, Beatriz, Rayward-Smith, J V (2006). “Clustering rules: a comparison of partitioning and hierarchical clustering algorithms.” Journal of Mathematical Modelling and Algorithms, 5, 475–504.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.pam")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Prototype Hierarchical Clustering Learner

Description

Hierarchical clustering using minimax linkage with prototypes. Calls protoclust::protoclust() from package protoclust.

There is no predict method for protoclust::protoclust(), so the method returns cluster labels for the training data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.protoclust")
lrn("clust.protoclust")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, protoclust

Parameters

Id	Type	Default	Levels	Range
method	character	euclidean	euclidean, maximum, manhattan, canberra, binary, minkowski	-
diag	logical	FALSE	TRUE, FALSE	-
upper	logical	FALSE	TRUE, FALSE	-
p	numeric	2		`(-\infty, \infty)`
verb	logical	FALSE	TRUE, FALSE	-
k	integer	-		`[1, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustProtoclust

Methods

Public methods

LearnerClustProtoclust$new()
LearnerClustProtoclust$clone()

Inherited methods

`LearnerClustProtoclust$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustProtoclust$new()

`LearnerClustProtoclust$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustProtoclust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Bien, Jacob, Tibshirani, Robert (2011). “Hierarchical Clustering with Prototypes via Minimax Linkage.” Journal of the American Statistical Association, 106(495), 1075–1084.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.protoclust")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Spherical K-Means Clustering Learner

Description

Spherical k-means clustering for data on the unit hypersphere. Calls skmeans::skmeans() from package skmeans.

The k parameter is set to 2 by default since skmeans::skmeans() doesn't have a default value for the number of clusters. Observations are partitioned by maximising cosine similarity to cluster prototypes. Predictions on new data assign each observation to the prototype with the highest cosine similarity. Rows with zero norm are not allowed by skmeans::skmeans().

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.skmeans")
lrn("clust.skmeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, skmeans

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
method	character	-	genetic, pclust, CLUTO, gmeans, kmndirs, LIH, LIHC	-
m	numeric	1		`[1, \infty)`
weights	untyped	1		-
maxiter	integer	-		`[1, \infty)`
nruns	integer	-		`[1, \infty)`
popsize	integer	-		`[1, \infty)`
mutations	numeric	-		`[0, 1]`
reltol	numeric	-		`[0, \infty)`
verbose	logical	-	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustSKMeans

Methods

Public methods

LearnerClustSKMeans$new()
LearnerClustSKMeans$clone()

Inherited methods

`LearnerClustSKMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSKMeans$new()

`LearnerClustSKMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSKMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Dhillon, S I, Modha, S D (2001). “Concept decompositions for large sparse text data using clustering.” Machine Learning, 42(1), 143–175. doi:10.1023/A:1007612920971.

Hornik, Kurt, Feinerer, Ingo, Kober, Martin, Buchta, Christian (2012). “Spherical k-Means Clustering.” Journal of Statistical Software, 50(10), 1–22. doi:10.18637/jss.v050.i10.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.skmeans")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Self-Organizing Maps Clustering Learner

Description

Self-organizing map (Kohonen network) clustering. Calls kohonen::som() from package kohonen.

Each map unit corresponds to a cluster, so the number of clusters is xdim * ydim. Grid dimensions, topology, and neighbourhood function are exposed directly as parameters and forwarded to kohonen::somgrid(). The predict method uses kohonen::predict.kohonen() to assign new data to the closest unit.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.som")
lrn("clust.som")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, kohonen

Parameters

Id	Type	Default	Levels	Range
xdim	integer	8		`[1, \infty)`
ydim	integer	6		`[1, \infty)`
topo	character	rectangular	rectangular, hexagonal	-
neighbourhood.fct	character	bubble	bubble, gaussian	-
toroidal	logical	FALSE	TRUE, FALSE	-
rlen	integer	100		`[1, \infty)`
alpha	untyped	c(0.05, 0.01)		-
radius	untyped	-		-
user.weights	untyped	1		-
maxNA.fraction	numeric	0		`[0, 1]`
keep.data	logical	TRUE	TRUE, FALSE	-
dist.fcts	untyped	NULL		-
mode	character	online	online, batch, pbatch	-
cores	integer	-1		`(-\infty, \infty)`
init	untyped	-		-
normalizeDataLayers	logical	TRUE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustSOM

Methods

Public methods

LearnerClustSOM$new()
LearnerClustSOM$clone()

Inherited methods

`LearnerClustSOM$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSOM$new()

`LearnerClustSOM$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSOM$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Kohonen, Teuvo (1990). “The self-organizing map.” Proceedings of the IEEE, 78(9), 1464–1480. doi:10.1109/5.58325.

Wehrens, Ron, Kruisselbrink, Johannes (2018). “Flexible self-organizing maps in kohonen 3.0.” Journal of Statistical Software, 87(7), 1–18. doi:10.18637/jss.v087.i07.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.som")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

Spectral Clustering Learner

Description

Spectral clustering. Calls kernlab::specc() from package kernlab.

The centers parameter is set to 2 by default since kernlab::specc() doesn't have a default value for the number of clusters. Kernel parameters have to be passed directly and not by using the kpar list in kernlab::specc().

There is no predict method for kernlab::specc(), so the method returns cluster labels for the training data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.specc")
lrn("clust.specc")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, kernlab

Parameters

Id	Type	Default	Levels	Range
centers	integer	-		`[1, \infty)`
kernel	character	rbfdot	rbfdot, polydot, vanilladot, tanhdot, laplacedot, besseldot, anovadot, splinedot	-
sigma	numeric	-		`[0, \infty)`
degree	integer	3		`[1, \infty)`
scale	numeric	1		`[0, \infty)`
offset	numeric	1		`(-\infty, \infty)`
order	integer	1		`(-\infty, \infty)`
nystrom.red	logical	FALSE	TRUE, FALSE	-
nystrom.sample	integer	-		`[1, \infty)`
iterations	integer	200		`[1, \infty)`
mod.sample	numeric	0.75		`[0, 1]`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustSpectral

Methods

Public methods

LearnerClustSpectral$new()
LearnerClustSpectral$clone()

Inherited methods

`LearnerClustSpectral$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSpectral$new()

`LearnerClustSpectral$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSpectral$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Karatzoglou, Alexandros, Smola, Alexandros, Hornik, Kurt, Zeileis, Achim (2004). “kernlab-an S4 package for kernel methods in R.” Journal of statistical software, 11, 1–20.

Ng, Y A, Jordan, I M, Weiss, Yair (2001). “On Spectral Clustering: Analysis and an Algorithm.” In Advances in Neural Information Processing Systems, volume 14.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.specc")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

ST-DBSCAN Clustering Learner

Description

ST-DBSCAN (spatio-temporal density-based spatial clustering of applications with noise) clustering. Calls stdbscan::st_dbscan() from package stdbscan.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.stdbscan")
lrn("clust.stdbscan")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, stdbscan

Parameters

Id	Type	Default	Levels	Range
eps_spatial	numeric	-		`[0, \infty)`
eps_temporal	numeric	-		`[0, \infty)`
min_pts	integer	-		`[1, \infty)`
borderPoints	logical	TRUE	TRUE, FALSE	-
search	character	kdtree	kdtree, linear, dist	-
bucketSize	integer	10		`[1, \infty)`
splitRule	character	SUGGEST	STD, MIDPT, FAIR, SL_MIDPT, SL_FAIR, SUGGEST	-
approx	numeric	0		`(-\infty, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustSTDBSCAN

Methods

Public methods

LearnerClustSTDBSCAN$new()
LearnerClustSTDBSCAN$clone()

Inherited methods

`LearnerClustSTDBSCAN$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustSTDBSCAN$new()

`LearnerClustSTDBSCAN$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustSTDBSCAN$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Birant, Derya, Kut, Alp (2007). “ST-DBSCAN: An algorithm for clustering spatial-temporal data.” Data & Knowledge Engineering, 60(1), 208–221. doi:10.1016/j.datak.2006.01.013.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.stdbscan")
print(learner)

Robust Trimmed Clustering Learner

Description

Robust trimmed clustering. Each cluster is modeled by a multivariate Gaussian; the most outlying alpha fraction of observations is trimmed and labeled with cluster 0 in the returned partition. Calls tclust::tclust() from package tclust.

The k parameter is set to 2 by default since tclust::tclust() doesn't have a default value for the number of clusters. There is no predict method for tclust::tclust(), so the method returns cluster labels for the training data.

Initial parameter values

store_x:
- Actual default: TRUE.
- Adjusted default: FALSE.
- Reason for change: Avoid storing the training data in the model to save memory.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.tclust")
lrn("clust.tclust")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, tclust

Parameters

Id	Type	Default	Levels	Range
k	integer	-		`[1, \infty)`
alpha	numeric	0.05		`[0, 0.5]`
nstart	integer	500		`[1, \infty)`
niter1	integer	3		`[1, \infty)`
niter2	integer	20		`[1, \infty)`
nkeep	integer	5		`[1, \infty)`
iter.max	integer	-		`[1, \infty)`
equal.weights	logical	FALSE	TRUE, FALSE	-
restr	character	eigen	eigen, deter	-
restr.fact	numeric	12		`[1, \infty)`
cshape	numeric	1e+10		`[1, \infty)`
opt	character	HARD	HARD, MIXT	-
center	logical	FALSE	TRUE, FALSE	-
scale	logical	FALSE	TRUE, FALSE	-
store_x	logical	TRUE	TRUE, FALSE	-
parallel	logical	FALSE	TRUE, FALSE	-
n.cores	integer	-1		`(-\infty, \infty)`
zero_tol	numeric	1e-16		`[0, \infty)`
drop.empty.clust	logical	TRUE	TRUE, FALSE	-
trace	integer	0		`[0, \infty)`

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustTclust

Methods

Public methods

LearnerClustTclust$new()
LearnerClustTclust$clone()

Inherited methods

`LearnerClustTclust$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustTclust$new()

`LearnerClustTclust$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustTclust$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

García-Escudero, A L, Gordaliza, Alfonso, Matrán, Carlos, Mayo-Iscar, Agustín (2008). “A general trimming approach to robust cluster analysis.” The Annals of Statistics, 36(3), 1324–1345. doi:10.1214/07-AOS515.

Fritz, Heinrich, García-Escudero, A L, Mayo-Iscar, Agustín (2012). “tclust: An R package for a trimming approach to cluster analysis.” Journal of Statistical Software, 47(12), 1–26. doi:10.18637/jss.v047.i12.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.tclust")
print(learner)

# Define a Task
task = tsk("usarrests")

# Train the learner on the task
learner$train(task)

# Print the model
print(learner$model)

# Make predictions for the task
prediction = learner$predict(task)

# Score the predictions
prediction$score(task = task)

X-Means Clustering Learner

Description

X-means clustering. Calls RWeka::XMeans() from package RWeka.

The predict method uses RWeka::predict.Weka_clusterer() to compute the cluster memberships for new data.

Dictionary

This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():

mlr_learners$get("clust.xmeans")
lrn("clust.xmeans")

Meta Information

Task type: “clust”
Predict Types: “partition”
Feature Types: “logical”, “integer”, “numeric”
Required Packages: mlr3, mlr3cluster, RWeka

Parameters

Id	Type	Default	Levels	Range
B	numeric	1		`[0, \infty)`
C	numeric	0		`[0, \infty)`
D	untyped	"weka.core.EuclideanDistance"		-
H	integer	4		`[1, \infty)`
I	integer	1		`[1, \infty)`
J	integer	1000		`[1, \infty)`
K	untyped	""		-
L	integer	2		`[1, \infty)`
M	integer	1000		`[1, \infty)`
S	integer	10		`[1, \infty)`
U	integer	0		`[0, \infty)`
use_kdtree	logical	FALSE	TRUE, FALSE	-
N	untyped	-		-
O	untyped	-		-
Y	untyped	-		-
output_debug_info	logical	FALSE	TRUE, FALSE	-

Super classes

mlr3::Learner -> LearnerClust -> LearnerClustXMeans

Methods

Public methods

LearnerClustXMeans$new()
LearnerClustXMeans$clone()

Inherited methods

`LearnerClustXMeans$new()`

Creates a new instance of this R6 class.

Usage

LearnerClustXMeans$new()

`LearnerClustXMeans$clone()`

The objects of this class are cloneable with this method.

Usage

LearnerClustXMeans$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Witten, H I, Frank, Eibe (2002). “Data mining: practical machine learning tools and techniques with Java implementations.” Acm Sigmod Record, 31(1), 76–77.

Pelleg, Dan, Moore, W A, others (2000). “X-means: Extending k-means with efficient estimation of the number of clusters.” In Icml, volume 1, 727–734.

Examples


# Define the Learner and set parameter values
learner = lrn("clust.xmeans")
print(learner)

Average Between-Cluster Distance

Description

The mean of all pairwise distances between observations belonging to different clusters. Higher values indicate greater separation between clusters. This measure is scale-dependent and is most useful for comparing clusterings of the same dataset.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.avg_between")
msr("clust.avg_between")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

Average Within-Cluster Distance

Description

The weighted mean of average pairwise distances within each cluster, where weights are the cluster sizes. Lower values indicate more compact clusters. This measure is scale-dependent and is most useful for comparing clusterings of the same dataset.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.avg_within")
msr("clust.avg_within")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: TRUE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

Calinski Harabasz Pseudo F-Statistic

Description

The Calinski-Harabasz index (also known as the Variance Ratio Criterion) is the ratio of between-cluster variance to within-cluster variance, adjusted for the number of clusters and observations. It is defined as CH = \frac{\mathrm{tr}(B) / (k - 1)}{\mathrm{tr}(W) / (n - k)} where B is the between-cluster scatter matrix, W is the within-cluster scatter matrix, k is the number of clusters, and n is the number of observations. Higher values indicate better-defined clusters.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.ch")
msr("clust.ch")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster

References

Caliński, Tadeusz, Harabasz, Jerzy (1974). “A dendrite method for cluster analysis.” Communications in Statistics, 3(1), 1–27. doi:10.1080/03610927408827101.

Davies-Bouldin Index

Description

The Davies-Bouldin index measures the average similarity between each cluster and the cluster most similar to it, where similarity is the ratio of within-cluster scatter to between-cluster separation. It is defined as DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} where s_i is the average distance of observations in cluster i to its centroid and d_{ij} is the Euclidean distance between centroids i and j. Lower values indicate better clustering.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.davies_bouldin")
msr("clust.davies_bouldin")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: TRUE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster

References

Davies, L D, Bouldin, W D (1979). “A cluster separation measure.” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224–227. doi:10.1109/TPAMI.1979.4766909.

Dunn Index

Description

The Dunn index is the ratio of the smallest inter-cluster distance to the largest intra-cluster diameter, defined as D = \min_{i \neq j} \delta(C_i, C_j) / \max_k \Delta(C_k) where \delta(C_i, C_j) is the minimum distance between clusters i and j, and \Delta(C_k) is the maximum distance between any two observations in cluster k. Higher values indicate compact, well-separated clusters.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.dunn")
msr("clust.dunn")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

References

Dunn, C J (1974). “Well-separated clusters and optimal fuzzy partitions.” Journal of Cybernetics, 4(1), 95–104. doi:10.1080/01969727408546059.

Dunn2 Index

Description

An alternative formulation of the Dunn index that uses average distances instead of extremes. It is defined as the ratio of the minimum average between-cluster distance to the maximum average within-cluster distance: D_2 = \min_{i \neq j} \bar{d}(C_i, C_j) / \max_k \bar{d}(C_k). This variant is more robust to outliers than the standard Dunn index. Higher values indicate better separation.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.dunn2")
msr("clust.dunn2")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

References

Dunn, C J (1974). “Well-separated clusters and optimal fuzzy partitions.” Journal of Cybernetics, 4(1), 95–104. doi:10.1080/01969727408546059.

Entropy

Description

The Shannon entropy of the cluster size distribution, defined as H = -\sum_{k=1}^{K} p_k \log(p_k) where p_k = n_k / n is the proportion of observations in cluster k. Lower values indicate more uneven cluster sizes (with 0 for a single cluster), while higher values indicate more uniform sizes. This measure does not evaluate cluster quality directly but characterizes the balance of the partition.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.entropy")
msr("clust.entropy")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: NA
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster

Pearson Gamma

Description

The Pearson correlation between pairwise distances and a binary indicator of whether two observations belong to different clusters. All within-cluster distances are paired with indicator 0, and all between-cluster distances with indicator 1. Values close to 1 indicate that between-cluster distances tend to be larger than within-cluster distances, suggesting well-separated clusters.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.pearsongamma")
msr("clust.pearsongamma")

Meta Information

Task type: “clust”
Range: [-1, 1]
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

Rousseeuw's Silhouette Quality Index

Description

The Silhouette Width measures how well each observation fits within its assigned cluster compared to neighboring clusters. For each observation, the silhouette value is defined as s(i) = (b(i) - a(i)) / \max(a(i), b(i)) where a(i) is the average distance to all other observations in the same cluster and b(i) is the minimum average distance to observations in any other cluster. The score returned is the mean silhouette width across all observations. Values close to 1 indicate well-clustered observations, values near 0 indicate observations on cluster boundaries, and negative values indicate possible misclassification.

The score function calls cluster::silhouette() from package cluster.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.silhouette")
msr("clust.silhouette")

Meta Information

Task type: “clust”
Range: [-1, 1]
Minimize: FALSE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

References

Rousseeuw, J P (1987). “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics, 20, 53–65. doi:10.1016/0377-0427(87)90125-7.

Within/Between Ratio

Description

The ratio of the average within-cluster distance to the average between-cluster distance. The average within-cluster distance is the weighted mean of all pairwise distances within each cluster, and the average between-cluster distance is the mean of all pairwise distances between observations in different clusters. Lower values indicate compact clusters that are well separated from each other.

Details

If the task contains factor or ordered features, Gower distances (cluster::daisy()) are used instead of Euclidean distances.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.wb_ratio")
msr("clust.wb_ratio")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: TRUE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster, cluster

Within Sum of Squares

Description

The total within-cluster sum of squares measures the compactness of the clustering by summing the squared Euclidean distances of each observation to its cluster centroid across all clusters: WSS = \sum_{k=1}^{K} \sum_{i \in C_k} \| x_i - \mu_k \|^2. Lower values indicate tighter clusters.

Dictionary

This mlr3::Measure can be instantiated via the dictionary mlr3::mlr_measures or with the associated sugar function mlr3::msr():

mlr_measures$get("clust.wss")
msr("clust.wss")

Meta Information

Task type: “clust”
Range: [0, \infty)
Minimize: TRUE
Average: macro
Required Prediction: “partition”
Required Packages: mlr3, mlr3cluster

Ruspini Cluster Task

Description

A cluster task for the cluster::ruspini data set.

Format

R6::R6Class inheriting from TaskClust.

Dictionary

This mlr3::Task can be instantiated via the dictionary mlr3::mlr_tasks or with the associated sugar function mlr3::tsk():

mlr_tasks$get("ruspini")
tsk("ruspini")

Meta Information

Task type: “clust”
Dimensions: 75x2
Properties: -
Has Missings: FALSE
Target: -
Features: “x”, “y”

References

Ruspini EH (1970). “Numerical methods for fuzzy clustering.” Information Sciences, 2(3), 319-350. doi:10.1016/S0020-0255(70)80056-1.

US Arrests Cluster Task

Description

A cluster task for the datasets::USArrests data set. Rownames are stored as variable "states" with column role "name".

Format

R6::R6Class inheriting from TaskClust.

Dictionary

This mlr3::Task can be instantiated via the dictionary mlr3::mlr_tasks or with the associated sugar function mlr3::tsk():

mlr_tasks$get("usarrests")
tsk("usarrests")

Meta Information

Task type: “clust”
Dimensions: 50x4
Properties: -
Has Missings: FALSE
Target: -
Features: “Assault”, “Murder”, “Rape”, “UrbanPop”

References

Berry, Brian J (1979). “Interactive Data Analysis: A Practical Primer.” Journal of the Royal Statistical Society: Series C (Applied Statistics), 28, 181.

Package {mlr3cluster}

mlr3cluster: Cluster Extension for 'mlr3'

Description

Author(s)

See Also

Cluster Learner

Description

Super class

Public fields

Methods

Public methods

LearnerClust$new()

Usage

Arguments

LearnerClust$reset()

Usage

LearnerClust$clone()

Usage

Arguments

Examples

Cluster Measure

Description

Super class

Methods

Public methods

MeasureClust$new()

Usage

Arguments

See Also

Prediction Object for Cluster Analysis

Description

Super class

Active bindings

Methods

Public methods

PredictionClust$new()

Usage

Arguments

PredictionClust$clone()

Usage

Arguments

Examples

Cluster Task

Description

Super classes

Methods

Public methods

TaskClust$new()

Usage

Arguments

TaskClust$clone()

Usage

Arguments

See Also

Examples

Convert to a Cluster Prediction

Description

Usage

Arguments

Value

Examples

Convert to a Cluster Task

Description

Usage

Arguments

Value

Examples

Mini Batch K-Means Clustering Learner

Description

Dictionary

Meta Information

Parameters

Super classes

Methods

Public methods

LearnerClustMiniBatchKMeans$new()

Usage

LearnerClustMiniBatchKMeans$clone()

Usage

Arguments

`LearnerClust$new()`

`LearnerClust$reset()`

`LearnerClust$clone()`

`MeasureClust$new()`

`PredictionClust$new()`

`PredictionClust$clone()`

`TaskClust$new()`

`TaskClust$clone()`

`LearnerClustMiniBatchKMeans$new()`

`LearnerClustMiniBatchKMeans$clone()`

`LearnerClustSimpleKMeans$new()`

`LearnerClustSimpleKMeans$clone()`

`LearnerClustAgnes$new()`

`LearnerClustAgnes$clone()`

`LearnerClustAP$new()`

`LearnerClustAP$clone()`

`LearnerClustBICO$new()`

`LearnerClustBICO$clone()`