from s4d.diar import Diar, Segment

Class Diar

Diar is a class describing an audio/video diarization file. The diarization file is the most important file in the S4D toolkit: all programs are driven by a diarization file, and most of them generate one (the exception is trainers, which generate models).

To get an instance of Diar:

diar = Diar()

Storage

Diar stores a list of segments (Diar.segments) together with the list of the segments' attribute names (Diar.attr_names).

A segment is a list composed of n attributes. The attribute at position i is named by Diar.attr_names[i]. Attributes can be added or removed. The basic segment is composed of:

  • field 0: the name of the show,
  • field 1: the label of the segment, e.g. the name of the speaker,
  • field 2: the label type, one of ['speaker', 'head'] (the available types are stored in the list Diar.type_labels),
  • field 3: the start, as a feature index (a time in centiseconds),
  • field 4: the stop, as a feature index (a time in centiseconds).
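
The positional layout of a segment can be illustrated without S4D. The sketch below is a plain-Python stand-in, not the toolkit's implementation: attr_names and segment are ordinary lists, and get_attr is a hypothetical helper that looks up a value by attribute name.

```python
# Plain-Python stand-in for the segment layout (not S4D code).
attr_names = ['show', 'label', 'label_type', 'start', 'stop']
segment = ['foo', 'name', 'speaker', 0, 100]


def get_attr(seg, names, name):
    """Return the value stored at the position that `names` assigns to `name`."""
    return seg[names.index(name)]


print(get_attr(segment, attr_names, 'label'))   # name
print(get_attr(segment, attr_names, 'stop'))    # 100
```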

A segment is a portion of a show (audio or video file) annotated with a label. It is defined as a kind of slice running from start through stop-1. The unit is the feature frame. A diarization can draw data from several shows, which is very useful in a batch-mode context (training of models, computing log-likelihood ratios, cross-show diarization, etc.).
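
To make the slice convention concrete, here is a small sketch. It assumes 100 feature frames per second, which follows from start and stop being times in centiseconds; covered_frames and duration_seconds are hypothetical helpers, not part of S4D.

```python
FRAMES_PER_SECOND = 100  # assumption: one frame per centisecond


def covered_frames(start, stop):
    """Frames covered by a segment: start, start + 1, ..., stop - 1."""
    return range(start, stop)


def duration_seconds(start, stop):
    """Length of the segment in seconds."""
    return (stop - start) / FRAMES_PER_SECOND


frames = covered_frames(100, 200)
print(frames[0], frames[-1], len(frames))   # 100 199 100
print(duration_seconds(100, 200))           # 1.0
```

Because the last frame is excluded, a segment from 0 to 100 and a segment from 100 to 200 are adjacent and share no frame.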

print(diar)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop']
]

Add segments

There are 4 methods to add segments to a Diar:

  • Diar.append takes the named arguments available in Diar.attr_names,
  • Diar.insert takes the named arguments available in Diar.attr_names and inserts a segment at a given position,
  • Diar.append_seg takes a Segment instance,
  • Diar.append_diar copies the list of segments given as argument.

The example below shows how to append 6 segments to diar:

diar.append(show='foo', label='name', start=0, stop=100)
diar.append(show='foo', label='name', start=100, stop=200)
diar.append(show='foo', label='name', start=300, stop=400)
diar.append(show='foo', label='name', start=350, stop=450)
diar.append(show='foo', label='name', start=310, stop=320)
diar.append(show='foo', label='name', start=470, stop=500)


print(diar)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop']
  row 0: ['foo', 'name', 'speaker', 0, 100]
  row 1: ['foo', 'name', 'speaker', 100, 200]
  row 2: ['foo', 'name', 'speaker', 300, 400]
  row 3: ['foo', 'name', 'speaker', 350, 450]
  row 4: ['foo', 'name', 'speaker', 310, 320]
  row 5: ['foo', 'name', 'speaker', 470, 500]
]

Get and set segment

import copy

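diar[0]                         # read the first segment
seg = copy.deepcopy(diar[0])    # work on a copy so that row 0 is left untouched
seg['label'] = 'name2'
seg['stop'] = 200
diar[1] = seg                   # overwrite the second segment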

print(diar)

# replace every 'label' value found in ['name'] with 'name1'
diar.rename('label', ['name'], 'name1')

print(diar)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop']
  row 0: ['foo', 'name', 'speaker', 0, 100]
  row 1: ['foo', 'name2', 'speaker', 0, 200]
  row 2: ['foo', 'name', 'speaker', 300, 400]
  row 3: ['foo', 'name', 'speaker', 350, 450]
  row 4: ['foo', 'name', 'speaker', 310, 320]
  row 5: ['foo', 'name', 'speaker', 470, 500]
]
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop']
  row 0: ['foo', 'name1', 'speaker', 0, 100]
  row 1: ['foo', 'name2', 'speaker', 0, 200]
  row 2: ['foo', 'name1', 'speaker', 300, 400]
  row 3: ['foo', 'name1', 'speaker', 350, 450]
  row 4: ['foo', 'name1', 'speaker', 310, 320]
  row 5: ['foo', 'name1', 'speaker', 470, 500]
]
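
The deepcopy call in the example above is worth a note. Assuming that indexing a Diar returns the stored segment itself rather than a copy (an assumption, not checked against the S4D source), editing that segment in place would also change row 0. The sketch below shows the same effect with ordinary Python lists standing in for segments.

```python
import copy

rows = [['foo', 'name', 'speaker', 0, 100],
        ['foo', 'name', 'speaker', 100, 200]]

alias = rows[0]                 # same object as rows[0]
clone = copy.deepcopy(rows[0])  # independent copy

clone[1] = 'name2'              # does not touch rows[0]
print(rows[0][1])               # name

alias[1] = 'changed'            # modifies rows[0] as well
print(rows[0][1])               # changed
```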

Attributes

Attributes can be added or deleted. To add an attribute named gender to each segment and initialize its value to unk, use Diar.add_attribut.

diar.add_attribut('gender', 'unk')
print(diar)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, 100, 'unk']
  row 1: ['foo', 'name2', 'speaker', 0, 200, 'unk']
  row 2: ['foo', 'name1', 'speaker', 300, 400, 'unk']
  row 3: ['foo', 'name1', 'speaker', 350, 450, 'unk']
  row 4: ['foo', 'name1', 'speaker', 310, 320, 'unk']
  row 5: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
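
What adding an attribute amounts to can be sketched on plain lists: the name is appended to the attribute list and the default value to every row. The add_attribute function below is a hypothetical stand-in written for illustration, not S4D's own method.

```python
def add_attribute(attr_names, rows, name, default=None):
    """Append `name` to the attribute list and `default` to every row, in place."""
    if name in attr_names:
        raise ValueError('attribute already exists: ' + name)
    attr_names.append(name)
    for row in rows:
        row.append(default)


attr_names = ['show', 'label', 'label_type', 'start', 'stop']
rows = [['foo', 'name1', 'speaker', 0, 100],
        ['foo', 'name2', 'speaker', 0, 200]]

add_attribute(attr_names, rows, 'gender', 'unk')
print(attr_names)
print(rows[0])
```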

Extend or shorten segments

  • Merge consecutive segments with the same label:
import copy
diar_pack = copy.deepcopy(diar)
diar_pack.pack()
print(diar_pack)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, 100, 'unk']
  row 1: ['foo', 'name2', 'speaker', 0, 200, 'unk']
  row 2: ['foo', 'name1', 'speaker', 300, 320, 'unk']
  row 3: ['foo', 'name1', 'speaker', 350, 450, 'unk']
  row 4: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
  • Remove small gaps (< epsilon) between consecutive segments and merge them:
diar_pack.pack(epsilon=20)
print(diar_pack)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, 100, 'unk']
  row 1: ['foo', 'name2', 'speaker', 0, 200, 'unk']
  row 2: ['foo', 'name1', 'speaker', 300, 320, 'unk']
  row 3: ['foo', 'name1', 'speaker', 350, 500, 'unk']
]
  • Add epsilon to the start and stop of each segment:
diar_pad = copy.deepcopy(diar)
print(diar_pad)
diar_pad.pad(epsilon=20)
print(diar_pad)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, 100, 'unk']
  row 1: ['foo', 'name2', 'speaker', 0, 200, 'unk']
  row 2: ['foo', 'name1', 'speaker', 300, 400, 'unk']
  row 3: ['foo', 'name1', 'speaker', 350, 450, 'unk']
  row 4: ['foo', 'name1', 'speaker', 310, 320, 'unk']
  row 5: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, -10, 'unk']
  row 1: ['foo', 'name2', 'speaker', -10, 220, 'unk']
  row 2: ['foo', 'name1', 'speaker', 280, 300, 'unk']
  row 3: ['foo', 'name1', 'speaker', 300, 340, 'unk']
  row 4: ['foo', 'name1', 'speaker', 340, 460, 'unk']
  row 5: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
  • Apply a collar to each segment (the call below uses pad, so the output is the same as in the previous example):
diar_col = copy.deepcopy(diar)
diar_col.pad(epsilon=20)
print(diar_col)
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, -10, 'unk']
  row 1: ['foo', 'name2', 'speaker', -10, 220, 'unk']
  row 2: ['foo', 'name1', 'speaker', 280, 300, 'unk']
  row 3: ['foo', 'name1', 'speaker', 300, 340, 'unk']
  row 4: ['foo', 'name1', 'speaker', 340, 460, 'unk']
  row 5: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
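
The two pack outputs shown earlier in this section can be reproduced with a short merge rule. The sketch below is inferred from the printed rows, not taken from the S4D source: rows are sorted by start, and a row is merged into the previous one when both carry the same label and the gap between them is at most epsilon. The previous row then takes the new stop, which is what yields the 300 to 320 row. pack_rows and its (label, start, stop) tuple layout are hypothetical, and a single show is assumed.

```python
def pack_rows(rows, epsilon=0):
    """Merge consecutive (label, start, stop) rows; returns a new list."""
    packed = []
    # sorted() is stable, so rows with equal start keep their original order
    for label, start, stop in sorted(rows, key=lambda row: row[1]):
        if packed and packed[-1][0] == label and packed[-1][2] + epsilon >= start:
            # same label and close enough: the previous row takes the new stop
            packed[-1] = (label, packed[-1][1], stop)
        else:
            packed.append((label, start, stop))
    return packed


rows = [('name1', 0, 100), ('name2', 0, 200), ('name1', 300, 400),
        ('name1', 350, 450), ('name1', 310, 320), ('name1', 470, 500)]

packed = pack_rows(rows)
print(packed)
print(pack_rows(packed, epsilon=20))
```

Only the immediately preceding row is compared, so two rows with the same label separated by a row with another label are not merged.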

Read and write

seg_diar = Diar.read_seg('data/ref/20041219_1300_1314_RTM_ELDA.seg')    # LIUM format
mdtm_diar = Diar.read_mdtm('data/ref/20041219_1300_1314_RTM_ELDA.mdtm') # MDTM format
rttm_diar = Diar.read_rttm('data/ref/20041219_1300_1314_RTM_ELDA.rttm') # RTTM format
uem_diar = Diar.read_uem('data/ref/20041219_1300_1314_RTM_ELDA.uem')    # UEM format
Diar.write_seg('data/out/20041223_1300_1318_RTM_ELDA.out.seg', seg_diar)
Diar.write_seg('data/out/20041223_1300_1318_RTM_ELDA.mdtm.seg', mdtm_diar)
Diar.write_seg('data/out/20041223_1300_1318_RTM_ELDA.rttm.seg', rttm_diar)
Diar.write_seg('data/out/20041223_1300_1318_RTM_ELDA.uem.seg', uem_diar)
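
As an illustration of what a reader has to do, the sketch below parses one RTTM SPEAKER line into the five basic fields. It assumes the usual RTTM column order (type, file, channel, onset and duration in seconds, then the speaker name in the eighth column) and a conversion to centiseconds by multiplying by 100. parse_rttm_line is a hypothetical helper and the sample line is made up for the example, so S4D's own reader may differ in its details.

```python
def parse_rttm_line(line):
    """Turn one RTTM SPEAKER line into [show, label, label_type, start, stop]."""
    fields = line.split()
    if fields[0] != 'SPEAKER':
        raise ValueError('not a SPEAKER line')
    show, name = fields[1], fields[7]
    onset, duration = float(fields[3]), float(fields[4])
    start = int(round(onset * 100))               # seconds -> centiseconds
    stop = int(round((onset + duration) * 100))
    return [show, name, 'speaker', start, stop]


# Made-up sample line, for illustration only.
print(parse_rttm_line('SPEAKER foo 1 3.00 1.00 <NA> <NA> name1 <NA>'))
# ['foo', 'name1', 'speaker', 300, 400]
```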

Class Segment

Segment implements the following class methods: intersection, union, diff and gap.

print(diar)
seg0 = diar[0]
seg1 = diar[1]

print('intersection 0 and 1: ', Segment.intersection(seg0, seg1))
print('intersection 2 and 3: ', Segment.intersection(diar[2], diar[3]))

print('diff 0 and 1: ', Segment.diff(seg0, seg1))
print('diff 2 and 3: ', Segment.diff(diar[2], diar[3]))

print('union 0 and 1: ', Segment.union(seg0, seg1))
print('union 2 and 3: ', Segment.union(diar[2], diar[3]))

print('gap 0 and 1: ', Segment.gap(seg0, seg1))
print('gap 2 and 3: ', Segment.gap(diar[2], diar[3]))
[
  attribut definition  : ['show', 'label', 'label_type', 'start', 'stop', 'gender']
  row 0: ['foo', 'name1', 'speaker', 0, 100, 'unk']
  row 1: ['foo', 'name2', 'speaker', 0, 200, 'unk']
  row 2: ['foo', 'name1', 'speaker', 300, 400, 'unk']
  row 3: ['foo', 'name1', 'speaker', 350, 450, 'unk']
  row 4: ['foo', 'name1', 'speaker', 310, 320, 'unk']
  row 5: ['foo', 'name1', 'speaker', 470, 500, 'unk']
]
intersection 0 and 1:  ['foo', 'name1 / name2', 'speaker', 0, 100, 'unk']
intersection 2 and 3:  ['foo', 'name1 / name1', 'speaker', 350, 400, 'unk']
diff 0 and 1:  ([['foo', 'name1', 'speaker', 100, 200, 'unk']], [2])
diff 2 and 3:  ([['foo', 'name1', 'speaker', 300, 350, 'unk'], ['foo', 'name1', 'speaker', 400, 450, 'unk']], [1, 2])
union 0 and 1:  ['foo', 'name1', 'speaker', 0, 200, 'unk']
union 2 and 3:  ['foo', 'name1', 'speaker', 300, 450, 'unk']
gap 0 and 1:  ['foo', 'name1', 'speaker', 100, 0, 'unk']
gap 2 and 3:  ['foo', 'name1', 'speaker', 400, 350, 'unk']
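
The start and stop values printed above for intersection, union and gap follow simple interval arithmetic. The sketch below reproduces them on bare (start, stop) pairs. The three functions are hypothetical stand-ins that ignore labels and the other attributes, and they are inferred from the output rather than from the S4D source.

```python
def intersection(a, b):
    """Overlap of two (start, stop) pairs."""
    return (max(a[0], b[0]), min(a[1], b[1]))


def union(a, b):
    """Smallest span covering both pairs."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def gap(a, b):
    """Span from the stop of `a` to the start of `b`."""
    return (a[1], b[0])


seg0, seg1, seg2, seg3 = (0, 100), (0, 200), (300, 400), (350, 450)
print(intersection(seg0, seg1), intersection(seg2, seg3))  # (0, 100) (350, 400)
print(union(seg0, seg1), union(seg2, seg3))                # (0, 200) (300, 450)
print(gap(seg0, seg1), gap(seg2, seg3))                    # (100, 0) (400, 350)
```

A gap whose start is greater than its stop, as in both cases here, means that the two segments overlap and there is no real gap between them.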