Speaker recognition is the process of automatically recognizing who is speaking from the speaker-specific information contained in speech waves. It can be used to verify the identities claimed by people accessing systems; that is, it enables voice-based access control for various services.
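Speaker verification (also called speaker authentication) contrasts with speaker identification: verification checks whether a voice matches one claimed identity, whereas identification determines which of the enrolled speakers is talking.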
2-stage speaker recognition: Enroll and verify
Before we can verify a speaker, the user needs to enroll their voice by saying:
“Hello” 3 times
random strings 2 times
Store the averaged embedding vector in a database for use in later speaker recognition.
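The enrollment step above can be sketched roughly as follows. This is a minimal illustration, not the actual system: the embeddings are toy 3-dimensional vectors standing in for the output of a speaker encoder, the `enroll` helper and the in-memory `voiceprints` dictionary are hypothetical names, and L2-normalizing each embedding before averaging is an assumption rather than something the notes specify.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enroll(utterance_embeddings):
    """Average the per-utterance embeddings into one voiceprint.

    Each embedding is normalized first (an assumption), the results are
    averaged element-wise, and the mean is normalized again.
    """
    if not utterance_embeddings:
        raise ValueError("at least one enrollment utterance is required")
    normalized = [l2_normalize(e) for e in utterance_embeddings]
    dim = len(normalized[0])
    count = len(normalized)
    mean = [sum(e[d] for e in normalized) / count for d in range(dim)]
    return l2_normalize(mean)


# Toy stand-ins for the five enrollment utterances:
# three "Hello" recordings and two random-string recordings.
enrollment_embeddings = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.70, 0.30, 0.10],
    [0.75, 0.20, 0.15],
]

# Hypothetical in-memory "database" keyed by user id.
voiceprints = {"user_001": enroll(enrollment_embeddings)}
```

In a real deployment the dictionary would be replaced by persistent storage, and the embeddings would come from the trained encoder rather than being written by hand.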
Generalized End-to-End (GE2E) loss for training the model that creates the embedding vector:
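As a rough illustration of how the softmax variant of the GE2E loss is computed, the sketch below scores every utterance embedding against every speaker centroid with a scaled cosine similarity, leaving the utterance out of its own speaker's centroid. It is a plain-Python sketch under stated assumptions, not a training implementation: the function names are hypothetical, the scale `w` and offset `b` are fixed here at 10 and -5 (in training they are learnable parameters), and the batch is a toy example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """Sum of per-utterance GE2E softmax losses.

    batch[j][i] is the embedding of utterance i from speaker j.
    Every speaker must contribute at least two utterances.
    """
    centroids = [mean_vector(utterances) for utterances in batch]
    total = 0.0
    for j, utterances in enumerate(batch):
        for i, embedding in enumerate(utterances):
            scores = []
            for k, centroid in enumerate(centroids):
                if k == j:
                    # Exclude the utterance from its own speaker's centroid.
                    centroid = mean_vector(utterances[:i] + utterances[i + 1:])
                scores.append(w * cosine(embedding, centroid) + b)
            # Numerically stable log-sum-exp over all speakers.
            top = max(scores)
            log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
            total += log_sum_exp - scores[j]
    return total


# Toy batch: 2 speakers x 2 utterances, embeddings clustered per speaker.
separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
# Same embeddings with speaker labels shuffled, so clusters are mixed.
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]

loss_separated = ge2e_softmax_loss(separated)
loss_mixed = ge2e_softmax_loss(mixed)
```

The loss is small when each embedding sits close to its own speaker's centroid and far from the others, which is why `loss_separated` comes out lower than `loss_mixed`.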
Verification decision by thresholding cosine similarity:
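A minimal sketch of this decision rule is shown below. The `verify` function and its default threshold of 0.7 are illustrative assumptions; in practice the threshold is tuned on held-out data to balance false accepts against false rejects.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def verify(test_embedding, enrolled_voiceprint, threshold=0.7):
    """Accept the claimed identity if the similarity reaches the threshold.

    Returns (accepted, score). The default threshold is a placeholder.
    """
    score = cosine_similarity(test_embedding, enrolled_voiceprint)
    return score >= threshold, score


# Toy vectors standing in for a stored voiceprint and two login attempts.
enrolled = [0.8, 0.2, 0.1]
same_speaker_attempt = [0.75, 0.25, 0.05]
other_speaker_attempt = [0.1, 0.2, 0.9]

accepted_same, score_same = verify(same_speaker_attempt, enrolled)
accepted_other, score_other = verify(other_speaker_attempt, enrolled)
```

Raising the threshold makes the system stricter (fewer impostors accepted, more genuine users rejected), and lowering it does the opposite.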