yunitate segmentation outside audio duration #153

@gobbios

yunitate.sh seems to produce rttm files with segments that go beyond (or even are completely outside) the duration of the source wave file.

The audio I'm using is this:
vagrant ssh -c "sox --i '/vagrant/data/0513.wav'"

Input File : '/vagrant/data/0513.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:10:04.12 = 26641575 samples = 45308.8 CDDA sectors
File Size : 53.3M
Bit Rate : 706k
Sample Encoding: 16-bit Signed Integer PCM

That amounts to a duration of 604.12 seconds.

After running vagrant ssh -c "yunitate.sh data/", I get the following rttm (only last few lines shown):

SPEAKER 0513.rttm 1 601.4 0.1 CHI
SPEAKER 0513.rttm 1 601.5 1.2 FEM
SPEAKER 0513.rttm 1 602.7 2.1 CHI

where the last segment starts inside the source wave file's duration, but goes beyond the end (602.7 + 2.1 = 604.8, which is more than 604.12).

When running vagrant ssh -c "yunitate.sh data/ english" things become even stranger:

SPEAKER 0513.rttm 1 601.6 0.6 FEM
SPEAKER 0513.rttm 1 603.3 0.1 CHI
SPEAKER 0513.rttm 1 603.6 0.1 CHI
SPEAKER 0513.rttm 1 603.9 0.3 CHI
SPEAKER 0513.rttm 1 604.2 0.1 FEM

Here the last segment starts after the end of the original source (604.2 is later than 604.12).
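To make the check reproducible, here is a minimal Python sketch that flags segments which are not fully inside the audio. It assumes the six-field layout of the lines quoted above (`SPEAKER <file> <channel> <onset> <duration> <label>`); a full RTTM file may carry extra fields, in which case the indices would need adjusting. The function name `find_out_of_range` is my own and not part of the toolkit.

```python
# Assumed field layout, taken from the lines quoted in this issue:
# SPEAKER <file> <channel> <onset_s> <duration_s> <label>
def find_out_of_range(rttm_lines, audio_duration):
    """Return (onset, end, label, kind) for segments not fully inside the audio."""
    problems = []
    for line in rttm_lines:
        fields = line.split()
        if len(fields) < 6 or fields[0] != "SPEAKER":
            continue  # skip blank or unrelated lines
        onset = float(fields[3])
        end = onset + float(fields[4])
        if onset >= audio_duration:
            kind = "starts after end of audio"
        elif end > audio_duration:
            kind = "runs past end of audio"
        else:
            continue
        problems.append((onset, round(end, 3), fields[5], kind))
    return problems


# Duration from the sox output: samples / sample rate (about 604.12 s).
audio_duration = 26641575 / 44100

default_tail = [
    "SPEAKER 0513.rttm 1 601.4 0.1 CHI",
    "SPEAKER 0513.rttm 1 601.5 1.2 FEM",
    "SPEAKER 0513.rttm 1 602.7 2.1 CHI",
]
english_tail = [
    "SPEAKER 0513.rttm 1 601.6 0.6 FEM",
    "SPEAKER 0513.rttm 1 603.3 0.1 CHI",
    "SPEAKER 0513.rttm 1 603.6 0.1 CHI",
    "SPEAKER 0513.rttm 1 603.9 0.3 CHI",
    "SPEAKER 0513.rttm 1 604.2 0.1 FEM",
]

for name, tail in (("default", default_tail), ("english", english_tail)):
    for problem in find_out_of_range(tail, audio_duration):
        print(name, problem)
```

Against the exact sample count, the english output has two offending segments: the last one starts after the end, and the one before it (603.9 + 0.3 = 604.2) also runs slightly past the end.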

This becomes problematic when using the latter file for vagrant ssh -c "~/launcher/WCE_from_SAD_outputs.sh /vagrant/data/ yunitator_english". Here, the tool finishes without an error message, but doesn't produce the word count output. The wav_tmp folder is still present and contains this empty (corrupt?) wav file:

Input File : '/vagrant/data/wav_tmp/yunitator_english_0513_00604200-00000100.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Sample Encoding: 16-bit Signed Integer PCM

And finally, if I use this file in the analyze.sh pipeline, I get the following message:

(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: MED_2s_100ms_htk.conf
(MSG) [2] in cComponentManager : successfully registered 96 component types.
(MSG) [2] in cComponentManager : successfully finished createInstances
(19 component instances were finalised, 1 data memories were finalised)
(MSG) [2] in cComponentManager : starting single thread processing loop
(MSG) [2] in cComponentManager : Processing finished! System ran for 60436 ticks.
sox WARN trim: End position is after expected end of audio.
sox WARN trim: Last 1 position(s) not reached.
/home/vagrant/utils/analyze.sh: line 40: /vagrant/data//detailed_outputs/WCE_yunitator_english_0513.rttm: No such file or directory
paste: /vagrant/data//wce.temp: No such file or directory

vcm_0513.rttm and yunitator_english_0513.rttm are present in detailed_outputs, but the corresponding WCE file (WCE_yunitator_english_0513.rttm, going by the error message above) is missing.

One hackish solution might be to append a second or two of silence to the end of the source wave file, I suppose. I haven't tried that yet.
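For reference, the padding workaround could look roughly like the sketch below. It is untested against the actual pipeline, uses only Python's standard `wave` module, and assumes signed PCM input such as the 16-bit file above (for which zero bytes are silence). The function name `append_silence` and the output file name are hypothetical.

```python
import wave


def append_silence(src_path, dst_path, seconds=2.0):
    """Copy a PCM wav file and append `seconds` of digital silence.

    Assumes signed PCM samples (e.g. 16-bit), where zero bytes mean silence.
    Returns the number of frames that were appended.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    n_pad = int(round(seconds * params.framerate))
    silence = b"\x00" * (n_pad * params.nchannels * params.sampwidth)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed on close
        dst.writeframes(frames + silence)
    return n_pad


# Hypothetical usage, writing a padded copy next to the original:
# append_silence("/vagrant/data/0513.wav", "/vagrant/data/0513_padded.wav", seconds=2.0)
```

This only hides the symptom: the rttm onsets would still need to be compared against the original duration before they are used for anything else.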
