Clustering

HAC BIC

class clustering.hac_bic.HAC_BIC(cep, table, alpha=1.0, sr=False)

BIC Hierarchical Agglomerative Clustering (HAC) with Gaussian models.

The algorithm is a hierarchical agglomerative clustering. Initially, each segment forms its own cluster. Each cluster is modeled by a Gaussian with a full covariance matrix (see gauss.GaussFull). The $\Delta BIC$ measure is used both to select the pair of clusters to merge and to stop the merging process: at each iteration the two closest clusters $i$ and $j$ are merged, and merging stops once $\Delta BIC_{i,j} > 0$ for the closest pair.

$\Delta BIC_{i,j} = PBIC_{i+j} - PBIC_{i} - PBIC_{j} - P$

$PBIC_{x} = \frac{n_x}{2} \log|\Sigma_x|$

$cst = \frac{1}{2} \alpha \left(d + \frac{d(d+1)}{2}\right)$

$P = cst \times \log(n_i+n_j)$

where $|\Sigma_i|$, $|\Sigma_j|$ and $|\Sigma_{i+j}|$ are the determinants of the covariance matrices of the Gaussians associated with clusters $i$, $j$ and $i+j$. $\alpha$ is a tunable parameter that scales the penalty. The penalty $P$ depends on $d$, the dimension of the features, and on $n_i$ and $n_j$, the total lengths of cluster $i$ and cluster $j$ respectively.
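The formulas above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function names `partial_bic` and `delta_bic` are hypothetical, clusters are represented directly as frame matrices (rows are frames, columns are the $d$ features, with $d \geq 2$), and the penalty is assumed to take the conventional BIC form in which the constant multiplies $\log(n_i+n_j)$.

```python
import numpy as np


def partial_bic(x):
    """PBIC_x = n_x / 2 * log|Sigma_x| for a frame matrix x of shape (n_x, d)."""
    n = x.shape[0]
    # Maximum-likelihood (biased) full covariance of the frames.
    cov = np.cov(x, rowvar=False, bias=True)
    # slogdet avoids overflow/underflow that log(det(cov)) could hit.
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * n * logdet


def delta_bic(xi, xj, alpha=1.0):
    """Delta BIC between two clusters given as frame matrices (hypothetical helper)."""
    d = xi.shape[1]
    cst = 0.5 * alpha * (d + 0.5 * d * (d + 1))
    # Assumption: the penalty is cst * log(n_i + n_j).
    penalty = cst * np.log(xi.shape[0] + xj.shape[0])
    merged = np.vstack([xi, xj])
    return partial_bic(merged) - partial_bic(xi) - partial_bic(xj) - penalty


rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 3))
same_b = rng.normal(0.0, 1.0, size=(200, 3))
far = rng.normal(10.0, 1.0, size=(200, 3))
print(delta_bic(same_a, same_b))  # negative: the pair is a merge candidate
print(delta_bic(same_a, far))     # positive: the pair should stay separate
```

In this sketch a larger `alpha` increases the penalty, which makes $\Delta BIC$ more negative and therefore favors more merges.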

_dist(mi, mj)

Compute the BIC distance d(i,j).

Parameters:	mi – a GaussFull object
	mj – a GaussFull object
Returns:	float

_init_distance()

Compute the distance matrix.

_init_train()

Train the initial models.

_merge_model(mi, mj)

Merge two GaussFull objects.

Parameters:	mi – a GaussFull object
	mj – a GaussFull object
Returns:	a GaussFull object

_update_dist(i)

Update row and column i of the distance matrix.

Parameters:	i – int

perform(to_the_end=False, min_spk=1)

Perform the HAC algorithm.

Returns:	a Diar object and a dictionary mapping the old cluster_list to the new labels
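The merge loop described for this class can be sketched generically as follows. This is an illustration under stated assumptions, not the code behind `perform`: the function `hac`, its `dist` callable and its `threshold` argument are hypothetical, clusters are plain frame matrices, and all pairwise distances are recomputed at every iteration instead of maintaining a distance matrix. The `to_the_end` and `min_spk` options are not modeled.

```python
import numpy as np


def hac(clusters, dist, threshold=0.0):
    """Greedy agglomerative clustering sketch (hypothetical helper).

    clusters: list of arrays of shape (n_frames, d), one per initial cluster.
    dist: callable taking two arrays and returning a float (lower is closer).
    Returns the final clusters and a dict mapping each initial cluster
    index to the index of the final cluster that contains it.
    """
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    members = [[k] for k in range(len(clusters))]
    while len(clusters) > 1:
        # Find the closest pair (i, j) with i < j.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                value = dist(clusters[i], clusters[j])
                if best is None or value < best[0]:
                    best = (value, i, j)
        value, i, j = best
        # Stop as soon as even the closest pair is above the threshold.
        if value > threshold:
            break
        # Merge j into i, then drop j.
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        members[i].extend(members[j])
        del clusters[j], members[j]
    labels = {old: new for new, olds in enumerate(members) for old in olds}
    return clusters, labels
```

Passing a $\Delta BIC$ function as `dist` with `threshold=0.0` reproduces the stopping rule stated above. Updating only the row and column of the merged cluster, as `_update_dist` suggests, avoids the repeated full recomputation done here.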

HAC CLR

class clustering.hac_clr.HAC_CLR(features_server, diar, ubm, ce=False, ntop=5)

CLR Hierarchical Agglomerative Clustering (HAC) with GMMs trained by MAP.

Tools

clustering.hac_utils.argmax(distances, nb)

Get the indexes and the value of the maximum of a distance matrix, restricted to indexes between 0 and nb.

Parameters:	distances – a numpy.ndarray
	nb – int
Returns:	the row and column indexes, and the value

clustering.hac_utils.argmin(distances, nb)

Get the indexes and the value of the minimum of a distance matrix, restricted to indexes between 0 and nb.

Parameters:	distances – a numpy.ndarray
	nb – int
Returns:	the row and column indexes, and the value
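As a rough illustration of this kind of lookup, the sketch below finds the smallest entry among the first `nb` rows and columns of a distance matrix. The function name `argmin_pair` is hypothetical, and restricting the search to the strict upper triangle (so that a cluster is never paired with itself) is an assumption of the sketch, not documented behavior of `argmin`.

```python
import numpy as np


def argmin_pair(distances, nb):
    """Return (row, column, value) of the smallest entry with row < column < nb."""
    block = np.asarray(distances, dtype=float)[:nb, :nb]
    # Keep only the strict upper triangle; everything else becomes +inf.
    upper = np.triu(np.ones(block.shape, dtype=bool), k=1)
    masked = np.where(upper, block, np.inf)
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    return int(i), int(j), float(masked[i, j])
```

The maximum can be found the same way by masking with `-np.inf` and using `np.argmax`.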

clustering.hac_utils.bic_square_root(ni, nj, alpha, dim)

Compute a BIC square root distance described in [Stafylakis2010].

[Stafylakis2010]

T. Stafylakis, V. Katsouros, and G. Carayannis. The segmental Bayesian information criterion and its applications to speaker diarization. IEEE Journal of Selected Topics in Signal Processing, 4(5):857–866, 2010.

Parameters:	ni – covariance matrix of speaker i
	nj – covariance matrix of speaker j
	alpha – a threshold
	dim – the dimension of the features
Returns:	a float

clustering.hac_utils.idmap_remove(idmap, index)

Remove the data at position index.

Parameters:	index – the index to remove

clustering.hac_utils.roll(mat, j)

Delete row j and column j of the matrix.

Parameters:	mat – numpy.ndarray
	j – int
Returns:	numpy.ndarray
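One way to obtain a matrix without row j and column j is shown below with `numpy.delete`. The helper name `drop_row_col` is hypothetical, and the library's `roll` may achieve the same result differently (for instance by working in place).

```python
import numpy as np


def drop_row_col(mat, j):
    """Return a copy of mat with row j and column j removed."""
    without_row = np.delete(mat, j, axis=0)
    return np.delete(without_row, j, axis=1)


mat = np.arange(9).reshape(3, 3)
print(drop_row_col(mat, 1))
```

`numpy.delete` always returns a new array, so the input matrix is left untouched.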

clustering.hac_utils.scores_remove(scores, index_model=None, index_seg=None)

Remove the data at position index_model and/or index_seg.

Parameters:	index_model – the index in the model set to remove
	index_seg – the index in the segment set to remove

clustering.hac_utils.stat_server_merge(stat_server, i, j, wi=1.0, wj=1.0)

Merge the i-th and j-th stat0 and stat1 into the i-th entry, then remove the j-th entry.

Parameters:	i – index of the destination entry
	j – index of the removed entry

clustering.hac_utils.stat_server_remove(stat_server, index)

Remove the data at position index.

Parameters:	index – the index to remove
