Diarization for ASR
===================

This script performs a BIC diarization (usually for ASR decoding).

The proposed diarization system was inspired by the system :raw-latex:`\cite{Barras06}`, which won the RT'04 fall evaluation and the ESTER 1 evaluation. It was developed during the ESTER 2 evaluation campaign for the transcription task, with the goal of minimizing word error rate.

Automatic transcription requires accurate segment boundaries. Segment boundaries have to be set within non-informative zones such as filler words.

Speaker diarization needs to produce homogeneous speech segments; however, purity and coverage of the speaker clusters are the main objectives here. Errors such as having two distinct clusters (i.e., detected speakers) corresponding to the same real speaker, or conversely, merging segments of two real speakers into only one cluster, are penalized more heavily by the NIST time-based diarization metric than misplaced boundaries :raw-latex:`\cite{NIST2004a}`.

The system is composed of an acoustic BIC segmentation followed by a BIC hierarchical clustering. Viterbi decoding is then performed to adjust the segment boundaries.

Music and jingle regions are not removed, but a speech activity detection (SAD) diarization can be loaded beforehand to segment and cluster the show. Optionally, long segments are cut to be shorter than 20 seconds.

.. code:: python

    %matplotlib inline

    __license__ = "LGPL"
    __author__ = "Sylvain Meignier"
    __copyright__ = "Copyright 2015-2016 Sylvain Meignier"
    __maintainer__ = "Sylvain Meignier"
    __email__ = "sidekit@univ-lemans.fr"
    __status__ = "Production"
    __docformat__ = 'reStructuredText'

    import argparse
    import copy
    import logging
    import os  # imported explicitly rather than relying on the star import below

    import matplotlib
    import numpy as np  # imported explicitly rather than relying on the star import below
    from matplotlib import pyplot as plot

    from s4d.utils import *
    from s4d.diar import Diar
    from s4d import viterbi, segmentation
    from sidekit import features_server
    from s4d.clustering import hac_bic
    from sidekit.sidekit_io import init_logging
    from s4d.gui.dendrogram import plot_dendrogram

BIC diarization
===============

Arguments, variables and logger
-------------------------------

Set the log level.

.. code:: python

    loglevel = logging.INFO

Set the input audio or MFCC file and the speech activity detection (SAD) file. Set ``input_sad`` to ``None`` to run without a SAD file.

.. code:: python

    data_dir = 'data'
    show = '20041219_1300_1314_RTM_ELDA'
    input_show = os.path.join(data_dir, 'audio', show + '.sph')
    #input_sad = None
    input_sad = os.path.join(data_dir, 'sns', show + '.sns.seg')

Size of the left and right windows (step 2).

.. code:: python

    win_size = 250

Thresholds for:

* linear segmentation (step 3)
* BIC HAC (step 4)
* Viterbi (step 5)

.. code:: python

    thr_l = 2
    thr_h = 3
    thr_vit = -250

If ``save_all`` is ``True``, every intermediate diarization is saved.

.. code:: python

    save_all = True

Set the logger options: log information to the console and to a file, and set the level.

.. code:: python

    init_logging(level=loglevel)

Check whether the input is an audio file or an MFCC file in spro4 format.

.. code:: python

    input_fn = path_show_ext(input_show)
    ffile = 'spro4'
    if input_fn[2] in ['.sph', '.wav']:
        ffile = 'audio'
    logging.info('type of input: %s', ffile)

.. parsed-literal::

    2015-12-09 13:41:52,909 - root - INFO - type of input: audio

Prepare the output directory.

.. code:: python

    wdir = os.path.join('out', show)
    if not os.path.exists(wdir):
        os.makedirs(wdir)

Step 0: MFCC
------------

Get a feature server instance.

.. code:: python

    logging.info('Make MFCC')
    fs = features_server.FeaturesServer(input_dir=input_fn[0],
                                        input_file_extension=input_fn[2],
                                        from_file=ffile,
                                        config='diar_16k')

.. parsed-literal::

    2015-12-09 13:41:52,916 - root - INFO - Make MFCC

Load the MFCC, or compute them from the audio.

.. code:: python

    logging.info('Load show: %s', show)
    tcep, vad = fs.load(show)
    cep = tcep[0]

.. parsed-literal::

    2015-12-09 13:41:52,920 - root - INFO - Load show: 20041219_1300_1314_RTM_ELDA
    2015-12-09 13:41:52,921 - root - INFO - read audio
    2015-12-09 13:41:53,069 - root - INFO - size of signal: 53.790894 len 14100960 type size 4
    2015-12-09 13:41:53,666 - root - INFO - process part : 0.000000 881.310000 881.310000
    2015-12-09 13:41:54,989 - root - INFO - no vad
    2015-12-09 13:41:55,314 - root - INFO - keep log_e
    2015-12-09 13:41:55,322 - root - INFO - !! size of signal cep: 8.740822 len 88129 type size 104
    2015-12-09 13:41:55,327 - root - INFO - Smooth the labels and fuse the channels if more than one
    2015-12-09 13:41:55,732 - root - INFO - no norm

Save the MFCC in spro4 format.

.. code:: python

    mfcc_filename = os.path.join(wdir, show + '.pmfcc')
    fs.save(show, mfcc_filename, 'spro4')

.. parsed-literal::

    2015-12-09 13:41:55,737 - root - INFO - save spro4 format: out/20041219_1300_1314_RTM_ELDA/20041219_1300_1314_RTM_ELDA.pmfcc

Step 1
------

The initial diarization is either loaded from a speech activity detection (SAD) diarization, or created as a single segment spanning from the first MFCC feature to the last.

.. code:: python

    logging.info('Check MFCC')
    if input_sad is not None:
        init_diar = Diar.read_seg(input_sad)
        init_diar.pack(50)
    else:
        # `seg` is not defined at this point; the function is looked up in the
        # imported `segmentation` module instead
        init_diar = segmentation.init_seg(cep, show)
    if save_all:
        init_filename = os.path.join(wdir, show + '.i.seg')
        Diar.write_seg(init_filename, init_diar)

.. parsed-literal::

    2015-12-09 13:41:55,983 - root - INFO - Check MFCC

Step 2: Gaussian divergence segmentation
----------------------------------------

First segmentation: split each segment of ``init_diar`` using the Gaussian divergence method.

.. code:: python

    logging.info('Make segmentation:')
    # process every segment
    seg_diar = Diar()
    for seg in init_diar:
        l = seg.length()
        logging.debug('start: %s end: %s len: %s', seg['start'], seg['stop'], l)
        if l > 2 * win_size:
            # long enough to hold a left and a right window: look for change points
            cep_seg = seg.seg_features(cep)
            tmp = segmentation.div_gauss(cep_seg, show=show, win=win_size, shift=seg['start'])
            seg_diar.append_diar(tmp)
        else:
            seg_diar.append_seg(seg)
    # give every segment its own label
    for i, seg in enumerate(seg_diar):
        seg['label'] = 'S' + str(i)
    if save_all:
        seg_filename = os.path.join(wdir, show + '.s.seg')
        Diar.write_seg(seg_filename, seg_diar)

.. parsed-literal::

    2015-12-09 13:41:55,998 - root - INFO - Make segmentation:

Step 3: linear BIC segmentation
-------------------------------

This pass over the signal merges consecutive segments of the same speaker, from the start to the end of the recording. The measure is the :math:`\Delta BIC`, based on the Bayesian Information Criterion and computed with full-covariance Gaussians (see class ``gauss.GaussFull``).

.. code:: python

    bicl_diar = segmentation.bic_linear(cep, seg_diar, thr_l, sr=False)
    if save_all:
        bicl_filename = os.path.join(wdir, show + '.l.seg')
        Diar.write_seg(bicl_filename, bicl_diar)

.. parsed-literal::

    2015-12-09 13:41:56,263 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,263 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,265 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,274 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,280 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,281 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,282 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,284 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,286 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,287 - root - WARNING - there is a hole between segment
    2015-12-09 13:41:56,290 - root - WARNING - there is a hole between segment

.. parsed-literal::

    147

Step 4: BIC HAC
---------------

Perform a BIC hierarchical agglomerative clustering (HAC).

.. code:: python

    logging.info('Make clustering alpha: %f', thr_h)
    bic = hac_bic.HAC_BIC(fs, bicl_diar, alpha=thr_h, sr=False)
    bich_diar = bic.perform(to_the_end=True)
    if save_all:
        bichac_filename = os.path.join(wdir, show + '.h.seg')
        Diar.write_seg(bichac_filename, bich_diar)
    link, data = plot_dendrogram(bic.merge, 0, size=(25,6), log=True)

.. parsed-literal::

    2015-12-09 13:41:56,298 - root - INFO - Make clustering alpha: 3.000000
    /Users/meignier/pyenv3/lib/python3.5/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      if self._edgecolors == str('face'):

.. image:: diarization_31_1.png

Step 5: re-segmentation
-----------------------

Viterbi decoding:

* An HMM is trained: one GMM per speaker, each with 8 components and diagonal covariance matrices. Only the penalty between states is fixed.
* The emissions are computed: the likelihood of each feature.
* A Viterbi decoding is performed.

.. code:: python

    logging.info('Make viterbi penalties: %f', thr_vit)
    hmm = viterbi.Viterbi(cep, bich_diar, exit_penalties=[thr_vit])
    hmm.train()
    hmm.emission()
    vit_diar = hmm.decode(init_diar)
    if save_all:
        vit_filename = os.path.join(wdir, show + '.d.seg')
        Diar.write_seg(vit_filename, vit_diar)

.. parsed-literal::

    2015-12-09 13:41:56,734 - root - INFO - Make viterbi penalties: -250.000000

Step 6
------

Move the segment boundaries into low-energy areas. This step is commented out in this example.

.. code:: python

    # adj_table = seg.adjust(cep, vit_table)
    # if save_all:
    #     Table.write_seg(adj_filename, adj_table)
    # output_seg(args.output, adj_table)

Compute the diarization error rate
----------------------------------

.. code:: python

    from s4d.scoring import DER
    from s4d.gui.viewer import PlotDiar
    from s4d.gui.viewer_utils import diar_diff

    ref = Diar.read_seg(os.path.join(data_dir, 'ref', show + '.seg'))
    uem = Diar.read_uem(os.path.join(data_dir, 'ref', show + '.uem'))
    der = DER(vit_diar, ref, uem, collar=25, no_overlap=False)
    der.confusion()
    der.assignment()
    res = der.error()
    print(res.rate_header())
    print(res.time())
    print(res.rate())

    diff_diar = diar_diff(vit_diar, ref)
    p = PlotDiar(diff_diar, size=(25, 6))
    p.draw()

.. parsed-literal::

    ||       sns       fa     miss |  speaker       fa     miss     conf ||
    ||  1762.62s   12.49s    3.89s |  827.56s   12.49s    3.89s  254.23s ||
    ||     0.93%    0.71%    0.22% |   32.70%    1.51%    0.47%   30.72% ||
    uem from ref append collar

.. parsed-literal::

    /Users/meignier/pyenv3/lib/python3.5/site-packages/matplotlib/figure.py:1653: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
      warnings.warn("This figure includes Axes that are not "

.. image:: diarization_37_2.png

Speaker diarization
===================

MFCC for speaker clustering
---------------------------

- Get 12 MFCCs plus their deltas, and normalize them.

.. code:: python

    from sidekit.frontend.features import compute_delta
    from s4d.utils import cep_sliding_norm

    #print('baseline cep:', cep.shape)
    # drop the first coefficient and keep the remaining 12
    cep12 = cep[:,1:]
    #print('12 MFCC:', cep12.shape)
    delta = compute_delta(cep12)
    #print('12 delta:', delta.shape)
    cep24 = np.column_stack((cep12, delta))
    #print('12MFCC + 12 delta:', cep24.shape)
    cep_sliding_norm(cep24, win=301, center=True, reduce=True)
    # hack: put the new cep24 in the feature server
    fs.cep[0] = cep24
    #print(fs.cep.shape)

CLR clustering
--------------

- Load the UBM.

.. code:: python

    from sidekit.mixture import Mixture

    ubm = Mixture()
    ubm_fn = os.path.join(data_dir, 'model', 'ubm128.gmm')
    print(ubm_fn)
    ubm.read_pickle(ubm_fn)

.. parsed-literal::

    data/model/ubm128.gmm

- Perform the CLR HAC clustering:

  - initialize the HAC
  - compute the models
  - compute the distances
  - do the clustering

.. code:: python

    from s4d.clustering.hac_clr import HAC_CLR

    thr_clr = -1.4
    clr_hac = HAC_CLR(fs, vit_diar, ubm)
    clr_hac.initial_models()
    clr_hac.initial_distances()

.. parsed-literal::

    Warning, some arguments are not named, computation might not be parallelized
    No Parallel processing with this module

.. code:: python

    dist = copy.deepcopy(clr_hac.dist)
    n = len(vit_diar.unique('label'))
    #print(dist)
    # overwrite the diagonal with the smallest distance before plotting
    np.fill_diagonal(dist, np.min(dist))
    # plot the distance matrix as a colour map
    plot.figure(figsize=(10,10))
    plot.pcolor(dist.T, cmap='RdBu')
    plot.colorbar()
    plot.show()

.. parsed-literal::

    /Users/meignier/pyenv3/lib/python3.5/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      if self._edgecolors == str('face'):

.. image:: diarization_44_1.png

.. code:: python

    clr_diar = clr_hac.perform(thr_clr, to_the_end=True)
    if save_all:
        clr_filename = os.path.join(wdir, show + '.clr.seg')
        Diar.write_seg(clr_filename, clr_diar)
    link, data = plot_dendrogram(clr_hac.merge, thr_clr, size=(25,6), log=True)

.. parsed-literal::

    /Users/meignier/pyenv3/lib/python3.5/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
      if self._edgecolors == str('face'):

.. image:: diarization_45_1.png

- Compute the DER.

.. code:: python

    der = DER(clr_diar, ref, uem, collar=25, no_overlap=False)
    der.confusion()
    der.assignment()
    res = der.error()
    print(show)
    print(res.rate_header())
    print(res.time())
    print(res.rate())

    diff_diar = diar_diff(clr_diar, ref)
    p = PlotDiar(diff_diar, size=(25, 6))
    p.draw()

.. parsed-literal::

    20041219_1300_1314_RTM_ELDA
    ||       sns       fa     miss |  speaker       fa     miss     conf ||
    ||  1762.62s   12.49s    3.89s |  827.56s   12.49s    3.89s   25.94s ||
    ||     0.93%    0.71%    0.22% |    5.11%    1.51%    0.47%    3.13% ||
    uem from ref append collar

.. parsed-literal::

    /Users/meignier/pyenv3/lib/python3.5/site-packages/matplotlib/figure.py:1653: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
      warnings.warn("This figure includes Axes that are not "

.. image:: diarization_47_2.png

I-vector PLDA clustering
------------------------

.. code:: python

    #matplotlib.use('TKAgg')

Graph and HAC
-------------

Licence
-------

This file is part of S4D. S4D is a Python package for speaker diarization based on SIDEKIT.

S4D home page: http://www-lium.univ-lemans.fr/s4d/

SIDEKIT home page: http://www-lium.univ-lemans.fr/sidekit/

S4D is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

S4D is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with SIDEKIT. If not, see http://www.gnu.org/licenses/.

Copyright 2014-2015 Sylvain Meignier
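
As a complement to the linear BIC segmentation and BIC HAC steps of this tutorial, the :math:`\Delta BIC` measure can be illustrated with a minimal sketch that uses only the standard library. It assumes the common formulation in which each of two segments is modelled by one full-covariance Gaussian, the pooled frames by a third, and a negative value favours merging. The function names (``log_det_cov``, ``delta_bic``, ``draw``), the sign convention, and the way ``alpha`` scales the penalty are assumptions made for illustration; they are not taken from the S4D source, and the actual ``segmentation.bic_linear`` and ``HAC_BIC`` implementations may differ.

```python
import math
import random


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood full covariance of `frames`.

    `frames` is a list of equal-length feature vectors (lists of floats).
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
            for j in range(dim)] for i in range(dim)]
    # Cholesky factorisation cov = L L^T, so log|cov| = 2 * sum(log L[i][i])
    low = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(dim))


def delta_bic(seg_a, seg_b, alpha=1.0):
    """Delta BIC between two segments; negative values favour merging
    (assumed sign convention)."""
    n_a, n_b = len(seg_a), len(seg_b)
    n, dim = n_a + n_b, len(seg_a[0])
    # log-likelihood lost by using one Gaussian instead of two
    gain = 0.5 * (n * log_det_cov(seg_a + seg_b)
                  - n_a * log_det_cov(seg_a)
                  - n_b * log_det_cov(seg_b))
    # model-complexity penalty: dim means plus dim*(dim+1)/2 covariance terms
    penalty = 0.5 * alpha * (dim + dim * (dim + 1) / 2) * math.log(n)
    return gain - penalty


rng = random.Random(0)


def draw(n, mean, std, dim=3):
    """Hypothetical helper: n synthetic frames from an isotropic Gaussian."""
    return [[rng.gauss(mean, std) for _ in range(dim)] for _ in range(n)]


speaker_a1 = draw(300, 0.0, 1.0)
speaker_a2 = draw(300, 0.0, 1.0)   # same source as speaker_a1
speaker_b = draw(300, 3.0, 2.0)    # clearly different source

same = delta_bic(speaker_a1, speaker_a2)
diff = delta_bic(speaker_a1, speaker_b)
print('same source: %.1f, different sources: %.1f' % (same, diff))
```

Under this convention, raising ``alpha`` increases the penalty and therefore makes merges more likely, which is presumably the role played by the thresholds ``thr_l`` and ``thr_h`` in the tutorial.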