21 November, 2013

How representative is our sample?

We're currently writing up the key results of the online survey for publication in a peer-reviewed journal, but something has come up along the way that seems worthy of a small update here: the question of how well we've managed to sample LGBTQ-identified folks working in STEM.

Because there isn't any central listing of queer folks working in the sciences, we used a "snowball sampling" approach, via social media, asking survey participants to forward links to the survey site to their friends. This is a good way to find study participants from populations that aren't readily visible, but that do have strong social ties. However, it also means that the sample of participants we end up with may be limited by the breadth of the social networks we're able to tap.

One way to evaluate how well this worked out is to ask whether survey participants are distributed across the U.S. the way we'd expect them to be if we'd drawn them at random from the population. That is, is there a significant positive correlation between population in a given geographic region and the number of survey participants from that region? It turns out that there is, or very nearly. The correlation between the number of participants in each of the U.S. Census Bureau regions we asked participants to choose as indicators of their current location (see the post on the demographics and identities of participants for a map) and the Census Bureau's estimate of total population in those regions in 2012 is 0.66, with p = 0.054, just short of the conventional 0.05 threshold for significance. That's not overwhelmingly strong, but it's encouraging.
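For anyone who wants to try this kind of check on their own data, here is a minimal Python sketch of the idea. The post doesn't say which test produced the p-value, so this stands in a one-sided permutation test (shuffle the participant counts across regions and see how often the correlation comes out at least as large). The function names and the numbers at the bottom are made up for illustration; they are not the survey data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_perm=10000, seed=0):
    """One-sided p-value: share of random shuffles of ys whose correlation
    with xs is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Placeholder values only (NOT the survey data): population in millions
# and participant counts for five imaginary regions.
region_population = [10, 20, 30, 40, 50]
region_participants = [12, 25, 31, 38, 60]

r = pearson_r(region_population, region_participants)
p = permutation_p_value(region_population, region_participants)
print(f"r = {r:.2f}, permutation p = {p:.3f}")
```

With only a handful of regions, a single outlying region can move both numbers a lot, which is one reason to read a borderline p-value cautiously.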

And maybe that's not the best way to evaluate our sample anyway, since LGBTQ-identified folks are almost certainly not evenly distributed among the Census regions. So as a follow-up, I used estimates of the percentage of people in every U.S. state identifying as LGBT from a 2012 survey by Gallup and the Williams Institute, together with the Census estimates of total population, to calculate how many LGBTQ-identified people there probably are in each state, and from that in each region. The correlation between our sample size in each region and this new estimate of LGBTQ population is 0.78, with p = 0.013. That's really pretty good!
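The arithmetic behind that estimate is simple enough to sketch in a few lines of Python. This assumes the state-level estimates were summed within each region to get regional totals, which the post implies but doesn't spell out. The function name, field names, and example states are all hypothetical, and the numbers are placeholders rather than Gallup or Census figures.

```python
from collections import defaultdict


def estimated_lgbt_population_by_region(states):
    """Multiply each state's population by its LGBT-identifying percentage,
    then add the results up within each region."""
    totals = defaultdict(float)
    for state in states:
        estimate = state["population"] * state["pct_lgbt"] / 100.0
        totals[state["region"]] += estimate
    return dict(totals)


# Placeholder records only (NOT real Gallup or Census figures).
example_states = [
    {"name": "State A", "region": "Region 1", "population": 1_000_000, "pct_lgbt": 3.0},
    {"name": "State B", "region": "Region 1", "population": 2_000_000, "pct_lgbt": 4.0},
    {"name": "State C", "region": "Region 2", "population": 500_000, "pct_lgbt": 2.0},
]

by_region = estimated_lgbt_population_by_region(example_states)
for region, estimate in sorted(by_region.items()):
    print(f"{region}: about {estimate:,.0f} people")
```

The regional totals from a calculation like this would then take the place of total population in the correlation with participant counts.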