Description
The goal of this assignment is to help you understand the logic underlying the estimation of RSE (Random Sampling Error) based on simulated computation (estimation) using height data.
See the Excel file.
Sheet 1 of the Excel file contains height data for 1620 people.
These 1620 heights are organized as 54 sets of 30. This means that the sample size (n) is 30 and you have 54 samples.
Therefore, in this assignment we make the following assumptions:
- The height data of all 1620 people (54 samples, each containing 30 people's heights) are the population. (I know this is actually a set of samples, but we pretend that it is a population: N = 1620.)
- The 30 heights within each set are one sample. Therefore the sample size is 30 (n = 30) and there are 54 samples.
Based on these assumptions, please compute:
- Population mean (the mean of the 1620 people's heights)
- Sample mean (the mean of 30 people). Please choose one specific sample from the 54 samples, and compute the sample mean based on the 30 heights in that particular set.
- Population standard deviation, treating the 1620 people as the population
- Sample standard deviation (the population standard deviation estimated from your own sample of 30, so you need to compute the SD of the 30 heights in the sample that you chose)
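The four quantities above can be sketched in Python as a cross-check on the Excel results. This is only an illustration under stated assumptions: the heights are simulated (the real values are in Sheet 1), and the names `sets`, `population`, and `my_sample` are hypothetical. In Python's `statistics` module, `pstdev` divides by N and `stdev` divides by n-1.

```python
import random
import statistics

# ASSUMPTION: simulated stand-in for Sheet 1 (54 sets of 30 heights, in cm).
# These numbers are made up; replace them with the real Excel data.
random.seed(0)
sets = [[random.gauss(170, 8) for _ in range(30)] for _ in range(54)]

# Treat all 1620 values as the population (N = 1620).
population = [h for s in sets for h in s]

# Pick one of the 54 sets as "your own" sample (n = 30); set 0 is arbitrary.
my_sample = sets[0]

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(my_sample)

# Population SD: denominator is N.
pop_sd = statistics.pstdev(population)
# Sample SD: denominator is n - 1.
sample_sd = statistics.stdev(my_sample)

print(f"population mean = {pop_mean:.2f}, population SD = {pop_sd:.2f}")
print(f"sample mean     = {sample_mean:.2f}, sample SD     = {sample_sd:.2f}")
```

Changing the index in `sets[0]` selects a different sample, which changes the sample mean and sample SD but not the population values.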
- Create a sampling distribution of the mean based on these 54 samples, and compare its shape (characteristics) with the population distribution of height that I provided (sheet 2, grouped frequency polygon) by following these steps:
Step 1: Compute the mean of the 30 heights in each of the 54 samples, so that you have 54 sample means, one for each sample.
Step 2: Create a grouped frequency distribution table based on the 54 computed means. This is the grouped frequency table for the sampling distribution of the mean.
Compare the shape of the two frequency distributions: the population of heights (the one I provided) and the sampling distribution of the mean (the 54 means you computed). For your reference, I am providing the grouped frequency polygon representing the population distribution (the third sheet of the Excel file). Then answer the following questions:
- a) What is the relation between the population mean and the mean of the 54 means? Are they the same or different?
- b) Which of the two distributions (the population distribution of the 1620 heights vs. the sampling distribution of the 54 means) is narrower, that is, more tightly clustered around the population mean?
- c) To what extent do the observations in (a) and (b) above lend support to the Central Limit Theorem? This requires that you read about the CLT and understand it.
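The steps for building the sampling distribution of the mean can be sketched as follows. This is a rough illustration, not the required Excel workflow: the heights are simulated, and the bin width of 1 cm is an arbitrary assumption you would adjust to your own data. The script prints a grouped frequency table of the 54 means, followed by the two means and the two standard deviations, so that you can compare them yourself.

```python
import math
import random
import statistics
from collections import Counter

# ASSUMPTION: simulated stand-in for Sheet 1 (54 sets of 30 heights, in cm).
random.seed(0)
sets = [[random.gauss(170, 8) for _ in range(30)] for _ in range(54)]
population = [h for s in sets for h in s]

# Step 1: one mean per set, giving 54 sample means.
sample_means = [statistics.fmean(s) for s in sets]

# Step 2: grouped frequency table of the 54 means.
bin_width = 1.0  # ASSUMPTION: hypothetical class width in cm
counts = Counter(math.floor(m / bin_width) * bin_width for m in sample_means)
for lower in sorted(counts):
    print(f"{lower:6.1f} to {lower + bin_width:6.1f}: {counts[lower]}")

# Centre and spread of both distributions, for comparison.
pop_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(sample_means)
pop_sd = statistics.pstdev(population)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean = {pop_mean:.3f}, mean of 54 means = {mean_of_means:.3f}")
print(f"population SD   = {pop_sd:.3f}, SD of 54 means   = {sd_of_means:.3f}")
```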
6) Estimate the Random Sampling Error in the following four different ways, based on your understanding of the definition of RSE we just covered in class:
- 6-1: RSE as the difference between your own sample mean and the mean of the sampling distribution of the mean (the average of the 54 sample means)
- 6-2: RSE as the Standard Error of the Mean, computed as the standard deviation of the sampling distribution of the mean, that is, the SD of the 54 sample means. For this, use the sampling distribution of the mean that you created above (you need to use the population standard deviation function in Excel; see below).
- 6-3: RSE as the Standard Error of the Mean, approximated by the population standard deviation (based on the 1620 data points) divided by the square root of n (n = 30)
- 6-4: RSE as the Standard Error of the Mean, approximated by the sample standard deviation (based on your own sample of n = 30, with n-1 in the denominator; see the sample standard deviation function in Excel below) divided by the square root of n (n = 30)
- (Bonus: 5 points on top of 30) I assume that 6-3 and 6-4 will be different, even though they are supposed to be similar according to the lecture. Speculate on the reason why they are different. Based on what you have learned about the four approaches to estimating RSE, they should be the same, but here they are not. Why? Hint: consider the nature of the sample (30 people's heights). You can include any questions or comments about this process. Your points are not based entirely on whether your answer is correct; they are based mainly on the evidence of THINKING you put here.
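The four ways of estimating RSE in item 6 can be sketched side by side as below. Again, this is a minimal sketch under assumptions: the data are simulated rather than taken from the Excel file, the chosen sample (`sets[0]`) is arbitrary, and the variable names are hypothetical.

```python
import random
import statistics

# ASSUMPTION: simulated stand-in for Sheet 1 (54 sets of 30 heights, in cm).
random.seed(0)
sets = [[random.gauss(170, 8) for _ in range(30)] for _ in range(54)]
population = [h for s in sets for h in s]
my_sample = sets[0]          # arbitrary choice of "your own" sample
n = len(my_sample)           # n = 30
sample_means = [statistics.fmean(s) for s in sets]

# Way 1: your sample mean minus the mean of the 54 sample means.
rse_diff = statistics.fmean(my_sample) - statistics.fmean(sample_means)

# Way 2: SD of the 54 sample means (population formula, denominator = 54).
rse_sd_of_means = statistics.pstdev(sample_means)

# Way 3: population SD (denominator N = 1620) divided by sqrt(n).
rse_from_pop_sd = statistics.pstdev(population) / n ** 0.5

# Way 4: sample SD (denominator n - 1 = 29) divided by sqrt(n).
rse_from_sample_sd = statistics.stdev(my_sample) / n ** 0.5

print(f"difference from mean of means : {rse_diff:.3f}")
print(f"SD of the 54 means            : {rse_sd_of_means:.3f}")
print(f"population SD / sqrt(n)       : {rse_from_pop_sd:.3f}")
print(f"sample SD / sqrt(n)           : {rse_from_sample_sd:.3f}")
```

Note that the first value is a signed difference for one particular sample, while the other three are standard errors and are never negative. Rerunning with a different choice of `my_sample` shows which of the four values depend on the sample you picked.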
In computing means and SDs, use Excel's computational functions:
For mean (average) see: https://www.youtube.com/watch?v=5_OHS-18RbU
For standard deviation see: https://www.youtube.com/watch?v=uZWQXQG37Zs
There are STDEV.P (population, where the denominator is n) and STDEV.S (sample, where the denominator is n-1). Be careful to use the appropriate one. You should