DCPY.SPECTRALCLUST(n_clusters, random_state, n_init, columns)
Spectral clustering computes a low-dimensional embedding of the affinity matrix between data points and then applies K-means clustering in that reduced space.
Parameters
- n_clusters – Number of clusters to find, integer (default 8).
 - random_state – Seed for the random numbers used by the K-means initialization and the eigenvector decomposition, integer (default 0).
 - n_init – Number of times K-means is run with different centroid seeds; the best result is kept, integer (default 10).
 - columns – Dataset columns or custom calculations.
 
Example: DCPY.SPECTRALCLUST(8, 0, 10, sum([Gross Sales]), sum([No of customers])) used as a calculation for the Color field of the Scatterplot visualization.
Input data
- Numeric variables are automatically scaled to zero mean and unit variance.
 - Character variables are transformed to numeric values using one-hot encoding.
 - Dates are treated as character variables, so they are also one-hot encoded.
 - The size of the input data is not limited, but character or date variables with many categories rapidly increase the dimensionality.
 - Rows that contain missing values in any of their columns are dropped.
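
The preparation steps listed above can be sketched in plain Python. This is only an illustration of the described behavior, not the function's actual implementation: the helper name `preprocess`, the use of `None` for missing values, the population standard deviation, and the sample column names are all assumptions made for the example.

```python
import math

def preprocess(rows, numeric_cols, categorical_cols):
    """Drop incomplete rows, standardize numerics, one-hot encode the rest.

    Returns (features, kept), where kept[i] is the original position of
    the row that produced features[i].
    """
    cols = numeric_cols + categorical_cols
    # Keep only rows with no missing value (None) in any used column.
    kept = [i for i, r in enumerate(rows)
            if all(r.get(c) is not None for c in cols)]
    data = [rows[i] for i in kept]
    if not data:
        return [], kept

    features = [[] for _ in data]
    for c in numeric_cols:
        values = [float(r[c]) for r in data]
        mean = sum(values) / len(values)
        # Population standard deviation (an assumption of this sketch).
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        for f, v in zip(features, values):
            # A constant column has std == 0; map it to zeros.
            f.append((v - mean) / std if std else 0.0)
    for c in categorical_cols:
        # One output column per distinct category (dates are handled as text).
        categories = sorted({str(r[c]) for r in data})
        for f, r in zip(features, data):
            f.extend(1.0 if str(r[c]) == cat else 0.0 for cat in categories)
    return features, kept

# Hypothetical sample data; the third row has a missing value.
rows = [
    {"sales": 10.0, "customers": 1, "region": "North", "day": "2024-01-01"},
    {"sales": 20.0, "customers": 3, "region": "South", "day": "2024-01-02"},
    {"sales": None, "customers": 2, "region": "North", "day": "2024-01-01"},
    {"sales": 30.0, "customers": 5, "region": "North", "day": "2024-01-02"},
]
features, kept = preprocess(rows, ["sales", "customers"], ["region", "day"])
print(kept)              # positions of the rows that were kept
print(len(features[0]))  # 2 numeric + 2 region + 2 day columns
```

Each distinct category adds one column, which is why character or date variables with many categories inflate the dimensionality so quickly.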
 
Result
- Column of integer values starting at 0, where each number identifies the cluster assigned to the record (row) by the algorithm.
 - Rows that were dropped from the input data because they contain missing values receive a missing value instead of a cluster number.
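
A minimal sketch of how such a result column lines up with the input rows, assuming `None` stands for a missing value. The helper `align_labels` and the sample labels are hypothetical and only illustrate the described output shape.

```python
def align_labels(n_rows, kept_indices, labels):
    """Put each cluster label at its original row position.

    Rows that were dropped before clustering keep None.
    """
    result = [None] * n_rows
    for idx, label in zip(kept_indices, labels):
        result[idx] = label
    return result

# Hypothetical case: 5 input rows, rows 1 and 4 were dropped.
aligned = align_labels(5, [0, 2, 3], [0, 1, 0])
print(aligned)  # [0, None, 1, 0, None]
```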
 
Key usage points
- Often outperforms traditional clustering methods such as K-means.
 - Very useful when the structure of individual clusters is highly non-convex, or when a measure of the center and spread of a cluster does not describe the complete cluster well, for example nested circles on the 2D plane.
 - Works well when the estimated number of clusters is relatively low.
 - Avoid using it with too many clusters.
 - Requires an estimate of the number of clusters in advance.
 - Clustering quality is lower when the dataset contains structures at different scales of size and density.
 - High time complexity and memory usage; best suited to small and medium-sized datasets.
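
The nested-circles case can be illustrated with a small, self-contained sketch of the general technique for two clusters. It is not the implementation behind DCPY.SPECTRALCLUST: the neighborhood-based affinity (`eps`), the power iteration used in place of a full eigenvector decomposition, and all helper names are assumptions chosen to keep the example short. Both rings share the same center, so a center-and-spread description cannot tell them apart, while the one-dimensional spectral embedding can.

```python
import math

def ring(radius, n):
    """n evenly spaced points on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def spectral_two_clusters(points, eps, iterations=500):
    n = len(points)
    # Affinity matrix: 1 for distinct points closer than eps, else 0.
    w = [[1.0 if i != j and math.dist(points[i], points[j]) < eps else 0.0
          for j in range(n)] for i in range(n)]
    # Assumes every point has at least one neighbor (degree > 0).
    deg = [sum(row) for row in w]
    root = [math.sqrt(d) for d in deg]
    # Normalized affinity S = D^-1/2 W D^-1/2, shifted to M = (I + S) / 2
    # so that all eigenvalues lie between 0 and 1.
    m = [[0.5 * ((1.0 if i == j else 0.0) + w[i][j] / (root[i] * root[j]))
          for j in range(n)] for i in range(n)]
    # The top eigenvector of S is proportional to sqrt(degree).
    norm = math.sqrt(sum(deg))
    v1 = [r / norm for r in root]
    # Power iteration with v1 projected out approximates the second
    # eigenvector, which carries the cluster structure.
    x = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        dot = sum(a * b for a, b in zip(x, v1))
        x = [a - dot * b for a, b in zip(x, v1)]
        x = [sum(mij * xj for mij, xj in zip(row, x)) for row in m]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    # One-dimensional embedding of every point.
    embedding = [a / r for a, r in zip(x, root)]
    # K-means with two centers on the embedding (assumes it is not constant).
    lo, hi = min(embedding), max(embedding)
    for _ in range(10):
        labels = [0 if abs(e - lo) <= abs(e - hi) else 1 for e in embedding]
        group0 = [e for e, l in zip(embedding, labels) if l == 0]
        group1 = [e for e, l in zip(embedding, labels) if l == 1]
        lo, hi = sum(group0) / len(group0), sum(group1) / len(group1)
    return labels

# Two nested circles: 20 points of radius 1 inside 20 points of radius 3.
points = ring(1.0, 20) + ring(3.0, 20)
labels = spectral_two_clusters(points, eps=1.0)
print(labels[:20])  # inner ring
print(labels[20:])  # outer ring
```

The sketch builds and multiplies a full n-by-n matrix, which also shows where the high time and memory cost on larger datasets comes from.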
 
For the full list of algorithms, see Data science built-in algorithms.