Speech-to-text publications | Vocapia

ALADAN at IWSLT25 Low-resource Arabic Dialectal Speech Translation Task [IWSLT 2025]

ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task [IWSLT 2024]

Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation [InterSpeech 2023]

Multilingual models with language embeddings for low-resource speech recognition [SIGUL 2023]

Comparing Self-Supervised Pre-Training and Semi-Supervised Training for Speech Recognition in Languages with Weak Language Models [InterSpeech 2023]

Modeling the effect of military oxygen masks on speech characteristics [InterSpeech 2021]

Vocapia-LIMSI System for 2020 Shared Task on Code-switched Spoken Language Identification [First Workshop on Speech Technologies for Code-switching in Multilingual Communities, 2020]

Challenges in Audio Processing of Terrorist-Related Data [25th International Conference on MultiMedia Modeling, 2019]

Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s [InterSpeech 2018]

Connected speech in Romanian: Exploring sound change through an ASR system [LINCOM Studies in Theoretical Linguistics, 2018]

Design of a Knowledge-Based Agent as a Social Companion [HCist 2017]

Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level [InterSpeech 2017]

An Investigation into Language Model Data Augmentation for Low-Resourced STT and KWS [ICASSP 2017]

Effective Keyword Search for Low-Resourced Conversational Speech [ICASSP 2017]

KRISTINA: A Knowledge-Based Virtual Conversation Agent [LNCS, vol. 10349, 2017]

Lithuanian Broadcast Speech Transcription using Semi-supervised Acoustic Model Training [Procedia Computer Science, 2016]

Language Recognition for Dialects and Closely Related Languages [Odyssey 2016]

Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions [InterSpeech 2016]

Marginal contrast among Romanian vowels: evidence from ASR and functional load [InterSpeech 2016]

A Divide-and-Conquer Approach for Language Identification based on Recurrent Neural Networks [InterSpeech 2016]

Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities [MARMI 2016]

Investigating Techniques for Low Resource Conversational Speech Recognition [ICASSP 2016]

On Improving Speech Recognition and Keyword Spotting With Automatically Generated Morphological Units [LTC 2015]

Improving Data Selection for Low-Resource STT and KWS [ASRU 2015]

Active Learning Based Data Selection for Limited Resource STT and KWS [InterSpeech 2015]

Lexical Speaker Identification in TV Shows [MTAP 2015]

Comparing Decoding Strategies for Subword-based Keyword Spotting in Low-Resourced Languages [InterSpeech 2014]

Combination of Cepstral and Phonetically Discriminative Features for Speaker Verification [IEEE Signal Processing Letters 2014]

Developing STT and KWS systems using limited language resources [InterSpeech 2014]

Exploring Pronunciation Variants for Romanian Speech-to-Text Transcription [SLTU 2014]

Person instance graphs for mono-, cross- and multi-modal person recognition in multimedia data: application to speaker identification in TV broadcast [IJMIR 2014]

Person Instance Graphs for Named Speaker Identification in TV Broadcast [Odyssey 2014]

Score Normalization and System Combination for Improved Keyword Spotting [ASRU 2013]

Spontaneous speech and opinion detection: mining call-centre transcripts [LRE 2013]

QCompere @ REPERE 2013 [SLAM 2013]

The Vocapia Research ASR systems for Evalita 2011 [LNCS, 2013]

Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data [InterSpeech 2013]

Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both? [InterSpeech 2013]

Lattice MLLR based M-Vector System for Speaker Verification [ICASSP 2013]

Fusion of Speech, Faces and Text for Person Identification in TV Broadcast [ECCV 2012]

LIMSI/Vocapia Speaker Verification System for NIST SRE 2012 [NIST SRE 2012]

Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast [InterSpeech 2012]

Transcription of Russian Conversational Speech [SLTU 2012]

Incorporating MLP Features in the Unsupervised Training Process [SLTU 2012]

The Vocapia Research ASR Systems for Evalita 2011 [AISV/Evalita 2012]

Speech Recognition for Machine Translation in Quaero [IWSLT 2011]

Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization [InterSpeech 2011]

A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation [NEM Summit, 2011]

Development of a Speech-to-Text Transcription System for Finnish [SLTU 2010]

Introducing topic segmentation and segmented-based browsing tools into a content based video retrieval system [ACM Multimedia 2010]

Modeling Northern and Southern Varieties of Dutch for STT [InterSpeech 2009]

The Joint LIMSI and Vocapia Research* Systems for NBEST 2008 [Nbest 2008]

*Formerly Vecsys Research.

More publications:

A list of publications of Vocapia staff from previous positions.