I wrote about triplet loss in the previous post. I wanted to build a speaker embedding similar to the face embedding used in FaceNet. It turns out that the guys at Baidu beat me to it: https://arxiv.org/pdf/1705.02304.pdf. They have datasets with 50k speakers! And …