Finding Speech Datasets for ASR and/or Speaker ID

I thought I'd make a list of all the available speech datasets that I know about.

  • TIMIT Not free. 630 speakers, 10 files per speaker, each utterance ~3 seconds. 16 bit 16kHz.

  • YOHO Not free. Speaker Verification corpus. 8kHz. Combination lock phrases (e.g. 36-24-36).

  • ANDOSL Not free. 108 speakers, 200 files per speaker, each utterance ~3 seconds. 16 bit 20kHz.

  • RM2 Not free. Resource Management. 16 bit 16kHz.

  • ISOLET Free. Very small; each utterance consists of a single spoken letter. 16 bit 16kHz.

  • TED-LIUM About 207 hours of transcribed speech from TED talks. 1242 speakers. 16 bit 16kHz.

  • LibriSpeech 1000 hours of read English speech. 1166? speakers. 16 bit 16kHz.

  • WSJ Not free. Wall Street Journal Continuous Speech dataset. 16kHz.

  • CMU AN4 A really small, old database.

  • Voxforge An open speech database of questionable quality.
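
For the corpora where speaker and file counts are listed, a rough total duration and uncompressed size can be worked out from those figures. The sketch below is only a back-of-the-envelope estimate: it takes the "~3 seconds" utterance length at face value, assumes mono PCM audio (the list does not say), and the function names are made up for illustration.

```python
def corpus_hours(speakers, files_per_speaker, seconds_per_file):
    """Rough total duration of a corpus in hours."""
    return speakers * files_per_speaker * seconds_per_file / 3600


def pcm_gigabytes(hours, sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM size in GB (10**9 bytes). Mono is an assumption."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return hours * 3600 * bytes_per_second / 1e9


# Figures taken from the list above; 3 seconds is the approximate length.
timit_hours = corpus_hours(630, 10, 3)
andosl_hours = corpus_hours(108, 200, 3)

timit_gb = pcm_gigabytes(timit_hours, 16_000, 16)
andosl_gb = pcm_gigabytes(andosl_hours, 20_000, 16)

print(f"TIMIT:  ~{timit_hours:.2f} h, ~{timit_gb:.2f} GB")
print(f"ANDOSL: ~{andosl_hours:.2f} h, ~{andosl_gb:.2f} GB")
```

On these assumptions TIMIT comes out at roughly 5 hours and ANDOSL at roughly 18 hours, both tiny next to TED-LIUM or LibriSpeech.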
