Workshop in Biostatistics

M112 Alway Building, Medical Center

(next to the Dean's courtyard)

DATE: February 16, 2017
TIME: 1:30 - 2:50 pm
TITLE: The geometry of gender stereotype in word embeddings

James Zou
Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and of Electrical Engineering, Stanford


The blind application of machine learning runs the risk of amplifying biases present in data. We face this danger with word embeddings, a popular framework for representing text data as vectors that has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on news articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use can amplify these biases. Geometrically, we show that gender bias is captured by a low-dimensional subspace in the embedding. We then provide a methodology for modifying an embedding to remove gender stereotypes while preserving gender-appropriate relations such as the connection between female and sister. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks. The resulting debiased embeddings can be used in applications without amplifying gender bias.
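
As a rough illustration of the geometric idea in the abstract, the Python sketch below estimates a one-dimensional "gender direction" from a few word pairs and removes that component from a gender-neutral word. It is a minimal sketch under stated assumptions, not the speaker's implementation: the three-dimensional vectors are invented toy values, the function names gender_direction and neutralize are hypothetical, and the choice of a single direction taken from the top singular vector of centered pair differences is an assumption made for this example.

```python
import numpy as np

# Toy 3-D "embedding" with invented values (assumption: real embeddings
# have hundreds of dimensions and are learned from text).
emb = {
    "he": np.array([1.0, 0.2, 0.1]),
    "she": np.array([-1.0, 0.2, 0.1]),
    "man": np.array([0.9, 0.5, 0.0]),
    "woman": np.array([-0.9, 0.5, 0.0]),
    "programmer": np.array([0.4, 0.1, 0.9]),
}


def gender_direction(emb, pairs):
    """Estimate a unit-length gender direction from word pairs.

    Each pair is centered at its midpoint, and the top right singular
    vector of the stacked centered vectors is returned (hypothetical helper).
    """
    rows = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2.0
        rows.append(emb[a] - center)
        rows.append(emb[b] - center)
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    return vt[0]


def neutralize(vec, direction):
    """Remove the component of vec that lies along direction."""
    unit = direction / np.linalg.norm(direction)
    return vec - np.dot(vec, unit) * unit


g = gender_direction(emb, [("he", "she"), ("man", "woman")])
debiased = neutralize(emb["programmer"], g)
print("component along g before:", abs(np.dot(emb["programmer"], g)))
print("component along g after: ", abs(np.dot(debiased, g)))
```

In this toy setup the word pairs differ only along the first coordinate, so the projection step zeroes that coordinate of "programmer" and leaves the other two unchanged. Words whose gender component is appropriate, such as "sister", would be left out of this neutralizing step.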

Suggested readings:

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.

Removing gender bias from algorithms.