I thought I'd make a list of all the available speech datasets that I know about.
-
TIMIT Not free. 630 speakers, 10 files per speaker, each utterance about 3 seconds long. 16 bit 16kHz.
-
YOHO Not free. Speaker Verification corpus. 8kHz. Combination lock phrases (e.g. 36-24-36).
-
ANDOSL Not free. 108 speakers, 200 files per speaker, each utterance about 3 seconds long. 16 bit 20kHz.
-
RM2 Not free. Resource Management. 16 bit 16kHz.
-
ISOLET Free. Very small; each utterance consists of a single spoken letter. 16 bit 16kHz.
-
TED-LIUM About 207 hours of transcribed speech from TED talks. 1242 speakers. 16 bit 16kHz.
-
LibriSpeech 1000 hours of read English speech. 1166? speakers. 16 bit 16kHz.
-
WSJ Not free. Wall Street Journal Continuous Speech dataset. 16kHz.
-
CMU AN4 A really small, old database.
-
Voxforge An open speech database of questionable quality.
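
Since most entries above quote a bit depth and sample rate, it can be handy to check those numbers on a file you have actually downloaded. Below is a minimal sketch using Python's standard wave module. It assumes the audio is in plain RIFF WAV format; some corpora ship in other containers and would need converting first. The helper names describe_wav and write_silence are made up for this example, and write_silence exists only so the sketch has a file to read.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s) for a RIFF WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        bits = wav.getsampwidth() * 8          # sample width is reported in bytes
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return sample_rate, bits, channels, duration


def write_silence(path, seconds=3, sample_rate=16000):
    """Write a mono 16 bit WAV of silence (a stand-in for a real utterance)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                    # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * seconds))


# Hypothetical usage: create a dummy file, then inspect it.
example_path = os.path.join(tempfile.mkdtemp(), "example.wav")
write_silence(example_path)
print(describe_wav(example_path))
```

To inspect a real corpus file, pass its path to describe_wav in place of the dummy file.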