Research lines
- Speaker Recognition/Identification/Tracking/Diarization
In some applications, a speaker recognition stage is needed to grant access or to customize services. This is the case for banking services through mobile phones, or for the activation and customization of ambient intelligence environments for each particular user. Other applications require extracting speaker identity information to build indexes that allow searching for content related to specific speakers. This is the case for search tools that produce enriched transcriptions of broadcast news or meetings, including speaker turn segmentation and identification (tracking) or clustering (diarization). Though much effort has been made in recent years, further research is still needed to cope with robustness issues and to attain the performance improvements that would make this technology useful in realistic conditions. We are strongly involved in the speaker and language recognition research community: we have participated in all the NIST evaluations since 2007 and have presented our own contributions and developments at the most relevant conferences and workshops organized by ISCA and IEEE.
- Information Retrieval from multimedia (speech tracks)
This line investigates improvements in speech recognition and information retrieval, driven by two concrete applications: retrieval from broadcast news and general multimedia (Ehiztari and Ehiztari2 projects), and retrieval from meeting recordings (Idazkari project).
- Utilities for Speech Processing
All efforts in this direction are included in the Sautrela Project.
- Speech transcription
Though we are not specifically focused on developing new algorithms for automatic speech recognition (ASR), we try to keep up to date with the latest advances in the field, either by applying freely available tools (e.g. HTK, Sphinx or SRILM) or by programming our own implementations, which we also make freely available through Sautrela. Currently we apply Hidden Markov Models as acoustic models and smoothed n-grams as language models. Part of our current effort is devoted to improving our ASR back end to produce more accurate indexes for our spoken document retrieval system, which deals with speech in Basque, Spanish or English.
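As an illustration of what a smoothed n-gram language model does, here is a minimal sketch of a bigram model with add-k smoothing. The text above does not say which smoothing method is used, so add-k is an assumption chosen only for brevity; the function names and the toy sentences are hypothetical and are not taken from Sautrela.

```python
from collections import Counter


def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        # Pad each sentence with start and end markers.
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, word, history_counts, bigram_counts, vocab, k=1.0):
    """Return P(word | prev) with add-k smoothing (assumed method)."""
    numerator = bigram_counts[(prev, word)] + k
    denominator = history_counts[prev] + k * len(vocab)
    return numerator / denominator


# Toy usage: unseen bigrams still receive a non-zero probability.
toy_corpus = [["kaixo", "mundua"], ["kaixo", "zu"]]
h_counts, b_counts, vocabulary = train_bigram(toy_corpus)
seen = bigram_prob("kaixo", "mundua", h_counts, b_counts, vocabulary)
unseen = bigram_prob("kaixo", "kaixo", h_counts, b_counts, vocabulary)
```

Smoothing matters here because a bigram that never occurred in the training text would otherwise get probability zero and rule out any hypothesis containing it. Production systems usually rely on more refined discounting schemes than add-k.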
- Data visualization in experimental sciences
This was a special, temporary collaboration with people from the Applied Physics group in our department. It was mainly supported by the PhD work of Ibon Bustinduy. (Line closed)