guy spokesman chairman men's men him he's his boy boyfriend boyfriends boys brother brothers dad dads dude father fathers fiance gentleman gentlemen god grandfather grandpa grandson groom he himself husband husbands king male man mr nephew nephews priest prince son sons uncle uncles waiter widower widowers
heroine spokeswoman chairwoman women's actress women she's her aunt aunts bride daughter daughters female fiancee girl girlfriend girlfriends girls goddess granddaughter grandma grandmother herself ladies lady mom moms mother mothers mrs ms niece nieces priestess princess queens she sister sisters waitress widow widows wife wives woman
94.939% unknown (1234 sentences)
3.085% both (21 sentences)
1.227% female (10 sentences)
0.749% male (7 sentences)
90.967% unknown (11981 sentences)
5.862% male (637 sentences)
2.457% female (270 sentences)
0.714% both (65 sentences)
Now, statistical analysis would be necessary to see how relevant these raw figures really are.
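As one possible starting point, a Pearson chi-square test can check whether the male/female split differs between the two result blocks above more than chance would suggest. This is only a sketch: it takes the male-only and female-only sentence counts at face value, ignores the "both" and "unknown" rows, and the function name `chi_square_2x2` is made up for illustration. With so few gendered sentences in the first block, the approximation is rough.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: first and second result block; columns: male-only, female-only sentences.
observed = [[7, 10], [637, 270]]

# Critical value of the chi-square distribution for 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_1_DF = 3.841

stat = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}")
print("difference significant at 5%:", stat > CRITICAL_5_PERCENT_1_DF)
```

An exact test (e.g. Fisher's) would be a safer choice when counts are this small.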
This technique is naive, but it can be extended to detect other things, such as tense language, tone of text or sentiment (e.g. "awesome", "good", "stupendous" vs "horrible", "tasteless", "bland").
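The word-list technique behind the figures above can be sketched roughly as follows. This is not the original script: the sentence splitter and tokenizer are deliberately crude, the word lists are abbreviated, and the function names (`split_sentences`, `label_sentence`, `summarize`, `report`) are assumptions made for illustration. The same functions work for sentiment simply by swapping in different word lists.

```python
import re
from collections import Counter

# Abbreviated lists for illustration; the full lists are at the top of these notes.
MALE = {"he", "him", "his", "himself", "man", "men", "boy", "father", "king"}
FEMALE = {"she", "her", "herself", "woman", "women", "girl", "mother", "princess"}

# Sentiment lists taken from the examples in the text.
POSITIVE = {"awesome", "good", "stupendous"}
NEGATIVE = {"horrible", "tasteless", "bland"}

def split_sentences(text):
    """Very rough splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def label_sentence(sentence, list_a, list_b, name_a, name_b):
    """Label a sentence by which of the two word lists its words appear in."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    hit_a, hit_b = bool(tokens & list_a), bool(tokens & list_b)
    if hit_a and hit_b:
        return "both"
    if hit_a:
        return name_a
    if hit_b:
        return name_b
    return "unknown"

def summarize(text, list_a, list_b, name_a, name_b):
    """Count how many sentences fall under each label."""
    return Counter(
        label_sentence(s, list_a, list_b, name_a, name_b)
        for s in split_sentences(text)
    )

def report(counts):
    """Print percentages in the same layout as the raw figures above."""
    total = sum(counts.values())
    for label, n in counts.most_common():
        print(f"{100 * n / total:.3f}% {label} ({n} sentences)")

sample = "He lost his keys. She found them. The king thanked her. It rained all day."
report(summarize(sample, MALE, FEMALE, "male", "female"))

review = "The soup was bland. The dessert was awesome."
report(summarize(review, POSITIVE, NEGATIVE, "positive", "negative"))
```

Keeping the word lists as parameters means that a new category pair only needs two new sets, not new code.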
However, when we look at word senses, we need more elaborate, context-based language models (e.g. n-grams, co-occurrences) to cope with ambiguity.
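To give a feel for what context adds, here is a minimal sketch that counts bigrams (2-grams) and looks at the word immediately before an ambiguous list word. The example sentence and the function names (`tokenize`, `bigram_counts`, `left_neighbours`) are invented for illustration; a real model would need far more data and a less crude tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def bigram_counts(tokens):
    """Count adjacent word pairs (2-grams)."""
    return Counter(zip(tokens, tokens[1:]))

def left_neighbours(tokens, target):
    """Count the words that immediately precede `target`."""
    return Counter(prev for prev, word in zip(tokens, tokens[1:]) if word == target)

text = "Oh my god, the lady saw Lady Gaga. The god of thunder thanked the lady."
tokens = tokenize(text)

# "god" after "my" is probably an exclamation, not a reference to a male deity.
print(left_neighbours(tokens, "god"))

# "lady gaga" as a pair points to a name, not a generic female noun.
print(bigram_counts(tokens)[("lady", "gaga")])
```

A word-list lookup treats every occurrence of "god" or "lady" the same way; neighbouring words are the simplest extra signal for telling the senses apart.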
https://repl.it/@msoutopico/text-analysis
https://capps.capstan.be/Lab/ling/TM/
https://capps.capstan.be/Lab/ling/corpus_cog.txt
https://capps.capstan.be/Lab/ling/corpus_qq.txt