VoxCeleb run a competition every year as well as providing datasets.
But it all comes down to feature extraction and classification.
Some dude wrote a little about it in this.
It's a bit confusing where to put the classification system. KWS would seem the logical place, as that is what initiates everything.
But KWS is supposed to be a light, always-on listener, so maybe it should pass the keyword audio on to a separate profile classification stage.
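
To make that split a bit more concrete, here is a rough sketch of the two-stage idea: the cheap KWS runs on every chunk, and the heavier profile classifier only runs on chunks that triggered the keyword. Everything named here (`run_pipeline`, `toy_kws`, `toy_profile`, the `hey_device` keyword, the thresholds) is made up for illustration; the two toy functions are stand-ins for real models, not a suggestion of how to actually detect keywords or speakers.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Audio = Sequence[float]


def run_pipeline(
    chunks: List[Audio],
    detect_keyword: Callable[[Audio], Optional[str]],
    classify_profile: Callable[[Audio], str],
) -> List[Tuple[str, str]]:
    """Run the light KWS on every chunk; run the classifier only on hits."""
    results = []
    for chunk in chunks:
        keyword = detect_keyword(chunk)    # always-on stage, must stay cheap
        if keyword is None:
            continue                       # no keyword, nothing else runs
        profile = classify_profile(chunk)  # heavier stage, only after a hit
        results.append((keyword, profile))
    return results


def toy_kws(chunk: Audio) -> Optional[str]:
    """Hypothetical stand-in for a KWS model: fires on loud chunks."""
    energy = sum(s * s for s in chunk) / len(chunk)
    return "hey_device" if energy > 0.1 else None


def toy_profile(chunk: Audio) -> str:
    """Hypothetical stand-in for a profile model: zero-crossing rate."""
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
    return "profile_a" if crossings / len(chunk) < 0.1 else "profile_b"


chunks = [[0.0] * 100, [1.0, -1.0] * 50, [1.0] * 100]
hits = run_pipeline(chunks, toy_kws, toy_profile)
```

The point of passing the two stages in as functions is that the KWS never needs to know the classifier exists, so it can stay as small as it needs to be.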
I can't remember where I saw it or the URL, but there was a dataset with gender and age labels; I think it was from one of the emotion dataset providers.
But gender classification, if there are just 2 people of opposite sex, should work really well.
TensorFlow and MobileNet have some examples and seem quite easy to implement; if you convert the audio to MFCC spectrograms then you could likely use the code verbatim.
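
For anyone wondering what "convert audio to MFCC" actually involves, below is a minimal dependency-free sketch of the usual steps: frame the audio, window it, take a power spectrum, apply a mel filterbank, take the log, then a DCT. The parameter values (16 kHz, 256-sample frames, 20 filters, 13 coefficients) are just assumptions picked for the demo, and the naive DFT is far too slow for real use; in practice an audio library would do all of this for you. The output is a frames-by-coefficients grid, which is the 2D array you would treat as an image.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum; a real FFT would replace this."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for b in range(left, centre):      # rising edge of the triangle
            row[b] = (b - left) / (centre - left)
        for b in range(centre, right):     # falling edge of the triangle
            row[b] = (right - b) / (right - centre)
        bank.append(row)
    return bank


def mfcc(samples, sample_rate=16000, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]  # Hamming window
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        # Small epsilon avoids log(0) on silent frames or empty filters.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in bank]
        # DCT-II of the log-mel energies; keep the first n_coeffs.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_mel))
                  for c in range(n_coeffs)]
        frames.append(coeffs)
    return frames


# Demo input: 1024 samples of a 440 Hz tone at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1024)]
features = mfcc(tone)
```

An image model will usually want a fixed input size, so the grid would probably need padding or cropping to a set number of frames before it goes anywhere near the classifier.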