.\" Text automatically generated by txt2man
.TH mlpack_dbscan 1 "11 January 2024" "mlpack-4.3.0" "User Commands"
.SH NAME
\fBmlpack_dbscan \fP- DBSCAN clustering
.SH SYNOPSIS
.nf
.fam C
\fBmlpack_dbscan\fP \fB-i\fP \fIunknown\fP [\fB-e\fP \fIdouble\fP] [\fB-m\fP \fIint\fP] [\fB-N\fP \fIbool\fP] [\fB-s\fP \fIstring\fP] [\fB-S\fP \fIbool\fP] [\fB-t\fP \fIstring\fP] [\fB-V\fP \fIbool\fP] [\fB-a\fP \fIunknown\fP] [\fB-C\fP \fIunknown\fP] [\fB-h\fP \fB-v\fP]
.fam T
.fi
.SH DESCRIPTION
This program implements the DBSCAN clustering algorithm using accelerated tree-based range search. The type of tree may be selected, or brute-force range search may be used instead.
.PP
The input dataset to be clustered may be specified with the '\fB--input_file\fP (\fB-i\fP)' parameter; the radius of each range search may be specified with the '\fB--epsilon\fP (\fB-e\fP)' parameter, and the minimum number of points in a cluster may be specified with the '\fB--min_size\fP (\fB-m\fP)' parameter.
.PP
The '\fB--assignments_file\fP (\fB-a\fP)' and '\fB--centroids_file\fP (\fB-C\fP)' output parameters may be used to save the output of the clustering. The '\fB--assignments_file\fP (\fB-a\fP)' output contains the cluster assignment of each point, and the '\fB--centroids_file\fP (\fB-C\fP)' output contains the centroid of each cluster.
.PP
The range search may be controlled with the '\fB--tree_type\fP (\fB-t\fP)', '\fB--single_mode\fP (\fB-S\fP)', and '\fB--naive\fP (\fB-N\fP)' parameters. The '\fB--tree_type\fP (\fB-t\fP)' parameter selects the type of tree used for range search and accepts one of the following values: 'kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'. The '\fB--single_mode\fP (\fB-S\fP)' parameter will force single-tree search (as opposed to the default dual-tree search), and '\fB--naive\fP (\fB-N\fP)' will force brute-force range search.
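.PP
The search and output parameters described above can be combined on one command line. The following is a minimal sketch rather than an authoritative recipe: the dataset contents, the output file names 'assignments.csv' and 'centroids.csv', and the parameter values are assumptions chosen for illustration; only the option names are taken from this page.
.PP
.nf
.fam C
```shell
# Hypothetical 2-D dataset, one point per row: two tight groups of three points.
cat > input.csv <<EOF
0.0,0.0
0.1,0.0
0.0,0.1
5.0,5.0
5.1,5.0
5.0,5.1
EOF

# Options shared by every run; the values are illustrative only.
common="--input_file input.csv --epsilon 0.5 --min_size 3"

# Run only if the binary is on PATH, so the sketch is safe to paste anywhere.
if command -v mlpack_dbscan >/dev/null 2>&1; then
  # Default dual-tree search on a ball tree, saving both outputs
  # (the output file names are assumptions).
  mlpack_dbscan $common --tree_type ball --assignments_file assignments.csv --centroids_file centroids.csv

  # Single-tree search on an R*-tree.
  mlpack_dbscan $common --tree_type r-star --single_mode

  # Brute-force range search instead of a tree-based one.
  mlpack_dbscan $common --naive
fi
```
.fam T
.fi
.PP
The three invocations differ only in how the range search is carried out; the same \fB--epsilon\fP and \fB--min_size\fP values are passed to each run.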
.PP
An example usage to run DBSCAN on the dataset in 'input.csv' with a radius of 0.5 and a minimum cluster size of 5 is given below:
.PP
.nf
.fam C
$ \fBmlpack_dbscan\fP \fB--input_file\fP input.csv \fB--epsilon\fP 0.5 \fB--min_size\fP 5
.fam T
.fi
.SH REQUIRED INPUT OPTIONS
.TP
.B
\fB--input_file\fP (\fB-i\fP) [\fIunknown\fP]
Input dataset to cluster.
.SH OPTIONAL INPUT OPTIONS
.TP
.B
\fB--epsilon\fP (\fB-e\fP) [\fIdouble\fP]
Radius of each range search. Default value 1.
.TP
.B
\fB--help\fP (\fB-h\fP) [\fIbool\fP]
Default help info.
.TP
.B
\fB--info\fP [\fIstring\fP]
Print help on a specific option. Default value ''.
.TP
.B
\fB--min_size\fP (\fB-m\fP) [\fIint\fP]
Minimum number of points for a cluster. Default value 5.
.TP
.B
\fB--naive\fP (\fB-N\fP) [\fIbool\fP]
If set, brute-force range search (not tree-based) will be used.
.TP
.B
\fB--selection_type\fP (\fB-s\fP) [\fIstring\fP]
If using point selection policy, the type of selection to use ('ordered', 'random'). Default value 'ordered'.
.TP
.B
\fB--single_mode\fP (\fB-S\fP) [\fIbool\fP]
If set, single-tree range search (not dual-tree) will be used.
.TP
.B
\fB--tree_type\fP (\fB-t\fP) [\fIstring\fP]
If using single-tree or dual-tree search, the type of tree to use ('kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'). Default value 'kd'.
.TP
.B
\fB--verbose\fP (\fB-v\fP) [\fIbool\fP]
Display informational messages and the full list of parameters and timers at the end of execution.
.TP
.B
\fB--version\fP (\fB-V\fP) [\fIbool\fP]
Display the version of mlpack.
.SH OPTIONAL OUTPUT OPTIONS
.TP
.B
\fB--assignments_file\fP (\fB-a\fP) [\fIunknown\fP]
Output matrix for assignments of each point.
.TP
.B
\fB--centroids_file\fP (\fB-C\fP) [\fIunknown\fP]
Matrix to save output centroids to.
.SH ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.