Entry Date:
May 31, 2016

SIDR: Deep Learning-Based Real-Time Speaker Identification


Consider each of our individual voices as a flashlight to illuminate how we project ourselves in society and how much sonic space we give ourselves or others. Thus, turn-taking computation through speaker recognition systems has been used as a tool to understand social situations or work meetings. We present SIDR, a deep learning-based, real-time speaker recognition system designed to be used in real-world settings. The system is resilient to noise, changes in room acoustics, different languages, and overlapping dialogues. While existing systems require the use of several microphones for each speaker or the need to couple video and sound recordings for accurate recognition of a speaker, SIDR only requires a medium-quality microphone or computer-embedded microphone.