Hi! I am Kelly McConville, a survey statistician and professor. Welcome to my course on analyzing survey data.
Now I am wondering if you have ever found yourself in the following situation: You have a question you want to answer. You found a great dataset to answer that question and then there’s this column in the dataset that represents survey weights. And you ask yourself: What are those? Can I ignore those?
Well, let’s pretend we have found ourselves in this situation. We want to estimate the average household income in the US. We find that the Bureau of Labor Statistics provides a public use dataset. This dataset includes the variable FINCBTAX, given in the second column here, which is the amount of household income before taxes in 2016. But the first column in the dataset is a survey weight variable, FINLWT21. How should these weights impact our analyses?
First, we should ask: what are survey weights? Survey weights arise when data are collected under a complex sampling design. The weights tell us the number of individuals in the population that each sampled individual represents.
Returning to the BLS sample, the first weight equals 25,985, which means that the first sampled household in the dataset represents 25,985 households in the population. The second represents 6,581 households.
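To make that reading of the weights concrete, here is a minimal R sketch. It uses only the two weight values quoted above; the vector name `weights_toy` is made up for illustration and is not part of the BLS file.

```r
# Toy example: these are the two weights quoted above for the first two
# sampled households. Nothing else about the real dataset is assumed.
weights_toy <- c(25985, 6581)

# Each weight counts the population households one sampled household
# represents, so the sum is the number represented by both together.
households_represented <- sum(weights_toy)
households_represented  # 32566
```

Read this way, the first household stands in for roughly four times as many households as the second.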
Now that we know what survey weights are, the question remains: How will they impact our analyses?
Let’s consider a common goal for survey data: to estimate a population quantity. Suppose this picture reflects all households in the US where each green box is an individual household.
And we want to estimate the average household income. Then y_i is the income for the ith household, U represents all US households, and capital N is the total number of households. Then, the fancy notation can be read to say that mu, the average household income, equals the sum of all the incomes, divided by the total number of households.
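Written out with the symbols just defined, the population mean is:

$$
\mu = \frac{1}{N} \sum_{i \in U} y_i
$$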
Of course, we can only calculate mu if we have income data for every household.
But we don’t. Instead, BLS takes a sample of households, represented by the blue squares, using a complex sampling design. We will call that sample s. They only collect income data for the n households in the sample.
Now to estimate mu, we can calculate the sample average, which is called y-bar. y-bar is the average income for the households in BLS's sample.
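In the same notation, with s the sample and n the number of sampled households, the sample average is:

$$
\bar{y} = \frac{1}{n} \sum_{i \in s} y_i
$$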
To calculate the sample mean for the BLS survey, we must insert the income variable, FINCBTAX, from the Consumer Expenditure dataset, denoted by ce, into the mean() function. Remember we can call a variable using the syntax dataset$variable_name. The average household income for the Consumer Expenditure sample is $62,480. Is this a good estimate of the average income of ALL US households?
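On the real data, the call described here would be `mean(ce$FINCBTAX)`. The sketch below runs the same call on a tiny stand-in data frame so that it is self-contained. The name `ce_toy` and the two income values are made up for illustration; only the two weights come from the BLS sample discussed earlier.

```r
# Hypothetical stand-in for the ce dataset (NOT the real data).
ce_toy <- data.frame(
  FINLWT21 = c(25985, 6581),   # weights quoted earlier in the transcript
  FINCBTAX = c(40000, 90000)   # made-up incomes, for illustration only
)

# Unweighted sample mean: every sampled household counts equally.
sample_mean <- mean(ce_toy$FINCBTAX)
sample_mean  # 65000
```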
Probably not. The problem is that the sample mean assumes all households in the sample represent the same number of households in the population. But when we looked at the survey weights, we learned that just isn't true!
And remember, for the sampled households, we have both the income data and the survey weights. To properly estimate the mean income, we need to use both when constructing our estimator.
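One common way to use both pieces of information is a weighted mean, where each income is multiplied by its survey weight. The sketch below is an assumption about what such an estimator could look like, not necessarily the one developed later in the course. As before, `ce_toy` and its income values are hypothetical; only the two weights come from the BLS sample.

```r
# Hypothetical stand-in for the ce dataset (NOT the real data).
ce_toy <- data.frame(
  FINLWT21 = c(25985, 6581),   # weights quoted earlier in the transcript
  FINCBTAX = c(40000, 90000)   # made-up incomes, for illustration only
)

# Unweighted mean, for comparison.
unweighted <- mean(ce_toy$FINCBTAX)

# Weighted mean: sum of weight * income, divided by the sum of the weights.
weighted <- sum(ce_toy$FINLWT21 * ce_toy$FINCBTAX) / sum(ce_toy$FINLWT21)

# Base R's weighted.mean() computes the same quantity.
weighted_check <- weighted.mean(ce_toy$FINCBTAX, w = ce_toy$FINLWT21)

c(unweighted = unweighted, weighted = weighted)
```

In this toy data the lower-income household carries the larger weight, so the weighted mean falls below the unweighted one. That is the kind of shift the weights can produce.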
But how do I incorporate the sampling design into my estimates, my data visualizations, my models? Well, that's exactly what we will learn to do in this course.
But first, let's practice exploring the weights themselves.
R Tutorial: What are survey weights?