Welcome to the SIDEKIT for diarization documentation!¶
SIDEKIT for diarization (s4d for short) is an open-source extension package of SIDEKIT for speaker diarization.
The aim of s4d is to provide an educational and efficient toolkit
for speaker diarization that covers the whole processing chain,
from the audio data to the analysis of system performance.
Authors: Sylvain Meignier & Anthony Larcher
Version: 0.1.0 of 2015/11/15
Implementation¶
s4d is released as open source to allow wider use of the code, which we hope will benefit the community.
The core package is built on a limited number of classes
to make the code easier to read and reuse.
s4d has been tested under Python 2.7 and Python 3.4 for both Linux and MacOS.
Citation¶
When using s4d for research, please cite:
Authors, Title of the paper to come, in, issue, year, pages...
What for¶
s4d aims at providing the whole chain of tools required to perform speaker diarization.
s4d extends SIDEKIT and the main tools available include:
- Acoustic feature extraction (from SIDEKIT)
  - Linear-Frequency Cepstral Coefficients (LFCC)
  - Mel-Frequency Cepstral Coefficients (MFCC)
  - RASTA filtering
  - Energy-based Voice Activity Detection (VAD)
  - Normalization (CMS, CMVN, Short-Term Gaussianization)
- Modeling and classification
  - from SIDEKIT
    - Gaussian Mixture Models (GMM)
    - i-vectors
    - Probabilistic Linear Discriminant Analysis (PLDA)
    - Joint Factor Analysis (JFA)
    - Support Vector Machine (SVM)
  - from s4d
    - Mono-Gaussian model with full covariance matrix
    - BIC (Bayesian Information Criterion) segmentation
    - BIC Hierarchical Agglomerative Clustering (HAC) with Gaussian models
    - Cross-Likelihood Ratio HAC with MAP-GMM models (CLR-HAC)
    - i-vector-based clustering: HAC, graph-based clustering, and Integer Linear Programming (ILP) clustering
- Presentation of the results
  - DER (Diarization Error Rate) scoring for single-show and cross-show diarization
  - Segmentation output viewer
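To give a concrete idea of what DER scoring measures, here is a minimal sketch in plain Python. It is not the s4d API: the function name `frame_der` and the frame-level label representation are assumptions made for illustration. The sketch assumes one label per frame (`None` for non-speech), no overlapping speech, and no forgiveness collar. It finds the best one-to-one mapping between hypothesis and reference speakers by brute force, which is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Simplified frame-level Diarization Error Rate.

    Both arguments are equal-length sequences of speaker labels,
    with None marking non-speech frames. Hypothetical helper,
    not part of s4d or SIDEKIT.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    ref_speech = sum(1 for r in reference if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    # Speech in the reference that the system labeled as non-speech.
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    # Non-speech in the reference that the system labeled as speech.
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)

    # Count frames shared by each (reference speaker, hypothesis speaker) pair.
    overlap = {}
    for r, h in pairs:
        if r is not None and h is not None:
            overlap[(r, h)] = overlap.get((r, h), 0) + 1

    ref_labels = list({r for r in reference if r is not None})
    hyp_labels = list({h for h in hypothesis if h is not None})
    # Pad the shorter side with None so every one-to-one mapping is enumerated.
    size = max(len(ref_labels), len(hyp_labels))
    ref_labels += [None] * (size - len(ref_labels))
    hyp_labels += [None] * (size - len(hyp_labels))

    # Brute-force search for the mapping that maximizes correctly labeled frames.
    best_match = 0
    for perm in permutations(ref_labels):
        matched = sum(overlap.get((r, h), 0) for r, h in zip(perm, hyp_labels))
        best_match = max(best_match, matched)

    # Frames where both sides say "speech" but the mapped speakers differ.
    confusion = sum(overlap.values()) - best_match
    return (missed + false_alarm + confusion) / ref_speech


reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, None]
print(frame_der(reference, hypothesis))  # 0.5
```

In this example, 6 reference frames contain speech; the hypothesis has 1 missed frame, 1 false-alarm frame, and 1 confused frame under the best mapping (A to x, B to y), giving 3/6 = 0.5. Evaluation tools typically work on time-stamped segments rather than frames and use a more efficient assignment algorithm for the speaker mapping.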