Online Speaker Diarization with Interactive Learning
Speaker diarization is the task of partitioning an audio stream into homogeneous segments according to speaker identity. In lay terms, it answers the question “who spoke when”. Some state-of-the-art speaker diarization systems require very large datasets to train their clustering modules, and such datasets are not always readily available. In these settings the system can instead learn continually, i.e., through online learning. Online learning is a setting in which data becomes available sequentially and is used to update the best predictor for future data, or the estimated reward associated with the data features. Sometimes the only way the online learning agent can learn from experience is through feedback in the form of rewards. This online learning problem is particularly important in sequential decision-making, where at each step the agent chooses the action expected to maximize the cumulative reward over time. Doing so requires balancing the exploration of new actions against the exploitation of the rewards expected from previously tried actions.
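The exploration-exploitation trade-off described above can be illustrated with a minimal epsilon-greedy agent. This is only a sketch under stated assumptions, not the method used in this project: the class name `EpsilonGreedyAgent`, the function `run`, the value of `epsilon`, and the simulated Bernoulli reward probabilities are all hypothetical stand-ins for whatever feedback signal a real system would receive.

```python
import random


class EpsilonGreedyAgent:
    """Hypothetical agent: explores with probability epsilon, else exploits."""

    def __init__(self, n_actions, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_actions    # times each action was chosen
        self.values = [0.0] * n_actions  # running mean reward per action
        self.rng = random.Random(seed)

    def select_action(self):
        # Exploration: pick a uniformly random action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploitation: pick the action with the highest estimated reward
        # (ties resolve to the lowest index).
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean: the estimate is refined one sample at a time,
        # so no past data needs to be stored.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(n_steps=2000, reward_probs=(0.2, 0.5, 0.8), seed=0):
    """Simulate sequential feedback; reward_probs are made-up values."""
    env_rng = random.Random(seed + 1)
    agent = EpsilonGreedyAgent(len(reward_probs), epsilon=0.1, seed=seed)
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.select_action()
        # Assumed environment: reward 1 with the action's probability, else 0.
        reward = 1.0 if env_rng.random() < reward_probs[action] else 0.0
        agent.update(action, reward)
        total_reward += reward
    return agent, total_reward


if __name__ == "__main__":
    agent, total_reward = run()
    print("pull counts:", agent.counts)
    print("estimated rewards:", [round(v, 3) for v in agent.values])
    print("cumulative reward:", total_reward)
```

A larger `epsilon` spends more steps trying actions whose value is still uncertain, while a smaller one commits earlier to the action that currently looks best. The right setting depends on how costly a wrong action is in the application.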