Unit 11: Categorical vs Continuous Relationships
Unit 11: Assignment #1 (due before 11:59 pm Central on MON JUL 22):
- To begin this Unit, you’re going to learn how to evaluate the difference between expected and observed categorical (discrete) frequencies.
- First, read the first two paragraphs of Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit.” While reading these two paragraphs, make sure you understand the following:
- When we talk about categorical relationships, we are talking about relationships between discrete measurements.
- Remember, as you learned back in Unit 2, that discrete measurements cannot be subdivided into parts. For example, the total number of children in a class is a discrete measurement because there can be a total of 12 students or 14 students, but there can’t be a total of 12.563 children.
- If you’re still unclear about discrete measurements, be sure to review Unit 2.
- Second, because in the current Unit 11, you’ll be learning to use chi-square tests, learn the following:
- The word “chi” is the English representation of the Greek letter that looks like a fancy lower case x.
- In spoken English, the word “chi” is pronounced “khi” (like “hi” with a “k” sound first).
- Next, you’ll learn how to use a chi-square test to assess what’s known as “goodness of fit.”
- First, read the remaining paragraphs in Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit.” While reading this chapter section, make sure you understand the following:
- A chi-square goodness-of-fit test allows us to test whether the frequencies of discrete data that we observed differ from the frequencies of discrete data that we expected under the null hypothesis.
- The key points to remember are that we’re comparing what we observed with what we expected, and, for a goodness-of-fit test, what we expected is based on the null hypothesis.
- If, as in the candy bag example Poldrack gives, we expected an even split of three types of candy, then that even split is our null hypothesis.
- Second, to cement your understanding of using a chi-square goodness-of-fit test, read an excerpt from StatisticsSolution’s (no date) article, “Chi-Square Goodness of Fit Test.”
- Remember from Unit 6 that observed frequencies (and probabilities) are often called empirical frequencies (and probabilities), because we have empirically observed them.
- Therefore, a chi-square goodness-of-fit test determines how well empirical (or OBSERVED) distributions fit theoretical (or EXPECTED) distributions.
- When calculating a chi-square goodness-of-fit test:
- the null hypothesis predicts that the OBSERVED frequencies will not differ from the EXPECTED frequencies, and
- the alternative hypothesis predicts that the OBSERVED frequencies will differ from the EXPECTED frequencies.
- Third, note that both Poldrack’s (2020) Chapter 12, “Chi-Square Test for Goodness of Fit” and StatisticsSolution’s (no date) article, “Chi-Square Goodness of Fit Test” tell us the following:
- To begin calculating a chi-square goodness-of-fit test, we need to first calculate the observed frequencies and the expected frequencies.
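- To complete calculating a chi-square goodness-of-fit test ourselves, we also need to calculate the difference between each observed and expected frequency, square each difference, divide each squared difference by its expected frequency, and then sum the results.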
- However, for this Unit, we will use online calculators, which means we only need to calculate the observed frequencies and the expected frequencies.
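If you're curious about what the online calculators do behind the scenes, the calculation described above can be sketched in a few lines of Python. This is only an illustration: the candy counts are made-up values (not taken from Poldrack's chapter), and the function name chi_square_statistic is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical bag of 90 candies of three types (illustration values only).
observed = [40, 30, 20]
total = sum(observed)

# Under the null hypothesis of an even split, every type is expected equally often.
expected = [total / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))
```

The larger the statistic, the further the observed frequencies are from the frequencies expected under the null hypothesis.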
- Now, you’re going to get some experience conducting a chi-square goodness-of-fit test.
- First, imagine the following scenario (which was created by Professor Richard Landers of the University of Minnesota):
- You run a small business with four employees: Albert, Camilla, Diamond, and Kumar. Because you need three employees at work at any given time, only one employee at a time has the day off.
- Of course, everyone wants Saturdays off. One of your employees has confronted you and said that you favor some employees over others in giving them Saturdays off.
- To investigate this, you pull up a long list of which employees have had Saturdays off each week, for the past two years, and you calculate a chi-square goodness-of-fit test to investigate the employee’s concern.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Saturday Off Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Saturday Off Data Set 001, Saturday Off Data Set 002, Saturday Off Data Set 003, Saturday Off Data Set 004, Saturday Off Data Set 005, Saturday Off Data Set 006, Saturday Off Data Set 007, Saturday Off Data Set 008, Saturday Off Data Set 009, Saturday Off Data Set 010, Saturday Off Data Set 011, Saturday Off Data Set 012, Saturday Off Data Set 013, Saturday Off Data Set 014, Saturday Off Data Set 015
- Third, import your unique Saturday Off Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Remember to follow Andrews’ (2020) Import Data how-to article.
- Save the new spreadsheet in which you have imported your unique Saturday Off Data Set, naming the file, YourLastName_PSY-210_Unit11_SaturdayOff_Data
- Fourth, using your newly created Saturday Off Data Set spreadsheet, create a Frequency Distribution Table for Discrete Data.
- Fifth, create three additional columns in your Saturday Off Data Set Frequency Distribution Table, so that it now looks something like this or it now looks something like this (again, your frequencies will differ from these example screenshots because of the unique data set you were assigned).
- The first additional column you’ll create is another list of your Categories.
- The second additional column you’ll create is your Absolute Frequencies, only now you’ll call that column Observed Frequency, because your absolute frequencies are the frequencies you observed in this data set.
- The third additional column you’ll create is your Null Expected Frequency.
- To calculate each category’s Null Expected Frequency, write a formula in your chosen data management platform that divides your Observed Frequency Total (e.g., 400) by the total number of categories, which for the Saturday Off Data Set is 4.
- The total number of categories in this data set is 4 because there are four employees: Albert, Camilla, Diamond, and Kumar.
- Each Null Expected Frequency is the total number of data values (e.g., 400) divided by the total number of categories (e.g., 4) because the null hypothesis predicts an even split.
- Be sure to calculate a Total of your Null Expected Frequencies and ensure that total equals 400 (because you were given 400 data values).
- Sixth, take a screenshot of your final Saturday Off Frequency Distribution Table (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_SaturdayOff_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
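For comparison with your spreadsheet, here is a rough Python sketch of the same table-building steps: count the Observed Frequency of each category, then divide the total by the number of categories to get each Null Expected Frequency. The eight data values below are invented for illustration (your data set has 400), and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical raw data: one employee name per Saturday (illustration only).
saturdays_off = [
    "Albert", "Camilla", "Diamond", "Kumar",
    "Albert", "Albert", "Kumar", "Camilla",
]

observed = Counter(saturdays_off)                  # Observed Frequency per category
categories = sorted(observed)                      # category labels, alphabetized
observed_total = sum(observed.values())            # total number of data values
null_expected = observed_total / len(categories)   # even split under the null

print(f"{'Category':<10}{'Observed':>10}{'Null Expected':>15}")
for name in categories:
    print(f"{name:<10}{observed[name]:>10}{null_expected:>15.3f}")
print(f"{'Total':<10}{observed_total:>10}{null_expected * len(categories):>15.3f}")
```

One caution about this sketch: a category that never appears in the data would be missing from the count, so the number of categories would come out too small. In that case, list the categories explicitly instead of deriving them from the data.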
- Last step!
- First, choose ONE of these online chi-square goodness-of-fit calculators:
- Second, using the values in your Saturday Off Frequency Distribution Table, fill in the online chi-square goodness-of-fit calculator with the following:
- your Categories (if required),
- your Observed Frequencies (required), and
- your Null Expected Frequencies (required).
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom if provided), and name the screenshot YourLastName_PSY-210_Unit11_SaturdayOff_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- From the chi-square statistic and p-value you calculated on your Saturday Off Data Set, can you reject the null hypothesis that the observed frequencies did not differ from the expected frequencies (an even split)?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of an even split).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of an even split).
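To see how a chi-square statistic turns into a p-value and a decision, here is a minimal Python sketch that uses only the standard library. The observed frequencies are invented for illustration, the function name chi_square_p_value is hypothetical, and the degrees of freedom for a goodness-of-fit test are taken to be the number of categories minus 1. The online calculators may use a different numerical method, so treat this as a cross-check rather than a replacement.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(statistic), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Hypothetical Saturday Off frequencies for four employees (illustration only).
observed = [115, 95, 100, 90]
expected = [sum(observed) / len(observed)] * len(observed)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi_square_p_value(statistic, df)

print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject the null hypothesis" if p_value < 0.05 else "Cannot reject the null hypothesis")
```

With these made-up frequencies the p-value is well above .050, so the null hypothesis of an even split cannot be rejected.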
- To get more experience calculating chi-square goodness-of-fit tests:
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You run a successful store at which you’re always eager to introduce new products.
- Therefore, you recently offered samples of three new products to every customer who entered your store.
- You then asked your customers to choose which product they preferred. You recorded these preferences for Product A, Product B, and Product C.
- To examine whether any of the products are more likely to be chosen, you will conduct a chi-square goodness-of-fit test.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Product ABC Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Product ABC Data Set 001, Product ABC Data Set 002, Product ABC Data Set 003, Product ABC Data Set 004, Product ABC Data Set 005, Product ABC Data Set 006, Product ABC Data Set 007, Product ABC Data Set 008, Product ABC Data Set 009, Product ABC Data Set 010, Product ABC Data Set 011, Product ABC Data Set 012, Product ABC Data Set 013, Product ABC Data Set 014, Product ABC Data Set 015
- Third, import your unique Product ABC Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Product ABC Data Set, naming the file, YourLastName_PSY-210_Unit11_ProductABC_Data
- Fourth, using your newly created Product ABC Data Set spreadsheet, create a Frequency Distribution Table for Discrete Data.
- Initially, your Product ABC Data Set Frequency Distribution Table should look something like this or something like this — although your frequencies will differ from these example screenshots because of the unique data set you were assigned.
- Fifth, create three additional columns in your Product ABC Data Set Frequency Distribution Table so that it now looks something like this or it now looks something like this (again, your frequencies will differ from these example screenshots because of the unique data set you were assigned).
- As before, the first additional column you’ll create is another list of your Categories.
- As before, the second additional column you’ll create is your Absolute Frequencies, now called Observed Frequency, because your absolute frequencies are the frequencies you observed in this data set.
- As before, the third additional column you’ll create is your Null Expected Frequency.
- To calculate each category’s Null Expected Frequency, you’ll again write a formula that divides your Observed Frequency Total (e.g., 300) by the total number of categories; however, the total number of categories for the Product ABC Data Set is 3.
- The total number of categories in this data set is 3 because there are three products: Product A, Product B, and Product C.
- As before, each Null Expected Frequency is the total number of data values (e.g., 300) divided by the total number of categories (e.g., 3) because the null hypothesis predicts an even split.
- Be sure to calculate a Total of your Null Expected Frequencies and ensure that total equals 300 (because you were given 300 data values).
- Sixth, take a screenshot of your final Product ABC Frequency Distribution Table (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_ProductABC_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Calculate the chi-square goodness-of-fit statistic for your Product ABC observed versus expected frequencies using ONE of the (above listed) online chi-square goodness-of-fit calculators.
- First, you must use a different online calculator than you used before (for your Saturday Off data).
- Second, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom, if provided), and name the screenshot YourLastName_PSY-210_Unit11_ProductABC_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Third, from the chi-square statistic and p-value you calculated on your Product ABC Data Set, can you reject the null hypothesis that the observed frequencies did not differ from the expected frequencies (an even split)?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of an even split).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of an even split).
- Go to Unit 11: Assignment #1 Discussion Board and create a new post in which you do the following:
- First, in the first sentence of your Discussion Board post, state your unique data set number (e.g., “My unique data set number is 001”).
- Second, embed the screenshot of your final Saturday Off Frequency Distribution Table (the screenshot you named YourLastName_PSY-210_Unit11_SaturdayOff_Frequency.xxx).
- Third, embed the screenshot of the Saturday Off chi-square goodness-of-fit calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_SaturdayOff_Chi-Square.xxx).
- Fourth, report the chi-square statistic and p-value (remember, we always round our numbers to three decimal places!).
- Can you reject the null hypothesis of an even split?
- Fifth, embed the screenshot of your final Product ABC Frequency Distribution Table (the screenshot you named YourLastName_PSY-210_Unit11_ProductABC_Frequency.xxx).
- Sixth, embed the screenshot of the Product ABC chi-square goodness-of-fit calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_ProductABC_Chi-Square.xxx).
- Seventh, report the chi-square statistic and p-value (remember, we always round our numbers to three decimal places!).
- Can you reject the null hypothesis of an even split?
- Eighth, name three specific instances in your past, present, or future that you think it would have been or it will be useful to conduct a chi-square goodness-of-fit test.
Unit 11: Assignment #2 (due before 11:59 pm Central on MON JUL 22):
- In this assignment, you’ll learn how to use chi-square to conduct what’s known as a “test of independence.”
- First, read an excerpt from Frost’s (no date) article, “How the Chi-Square Test of Independence Works.” While reading this excerpt, make sure you understand the following:
- “A chi-square test of independence determines whether a relationship exists between two discrete (categorical) variables.”
- “If the two discrete variables are dependent, then the frequencies of one variable will depend upon the frequencies of the other variable.”
- “If the two variables are independent, then the frequencies of one variable do not depend on the frequencies of the other variable.”
- Second, read Poldrack’s (2020) Chapter 12, “Contingency Tables and the Chi-Square Test of Independence.” While reading this chapter section, make sure you understand the following:
- A chi-square test of independence allows us to test whether two discrete measures are related to, or contingent on, one another.
- The null hypothesis of a chi-square test of independence predicts that the two measures will be independent.
- The standard way to prepare data for a chi-square test of independence is by creating a Contingency Frequency Table, which presents the frequency of observations that fall into each possible combination (that is, each contingency).
- To compute the degrees of freedom of a chi-square test of independence we use the formula df = (the number of Rows in our Contingency Frequency Table minus 1) * (the number of Columns in our Contingency Frequency Table minus 1).
- Now, you’ll learn how to make a Contingency Frequency Table.
- First, complete Andrews’ (2020) tutorial “Using Excel’s [Google Sheets’, and Numbers’] COUNTIFS Function to Make a Contingency Frequency Table for Discrete Data.”
- Although you aren’t required to take a screenshot of the Contingency Frequency Table you create while working through this tutorial, it’s definitely in your best interest to make sure you work through the entire tutorial.
- You’ll need to know how to make a Contingency Frequency Table to complete the rest of this assignment.
- Next, you’ll get some experience calculating a chi-square test of independence, which as you know from reading Poldrack’s chapter, requires making a Contingency Frequency Table.
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You own three clothing stores at three locations: your East Store, your South Store, and your West Store.
- At each of your three stores’ locations, you sell three price ranges of clothes: Budget Items, Mid-Range Items, and High Fashion Items.
- You’d like to know whether sales of these differently priced clothes depend on the location of the store; therefore, you conduct a chi-square test of independence.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Clothing Sales Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Clothing Sales Data Set 001, Clothing Sales Data Set 002, Clothing Sales Data Set 003, Clothing Sales Data Set 004, Clothing Sales Data Set 005, Clothing Sales Data Set 006, Clothing Sales Data Set 007, Clothing Sales Data Set 008, Clothing Sales Data Set 009, Clothing Sales Data Set 010, Clothing Sales Data Set 011, Clothing Sales Data Set 012, Clothing Sales Data Set 013, Clothing Sales Data Set 014, Clothing Sales Data Set 015
- Third, import your unique Clothing Sales Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Clothing Sales Data Set, naming the file, YourLastName_PSY-210_Unit11_ClothingSales_Data
- Fourth, using your newly created Clothing Sales Data Set spreadsheet and based on what you learned in Andrews’ (2020) how-to article, create a Contingency Frequency Table for your Clothing Sales Data Set.
- Your Clothing Sales Contingency Frequency Table should look something like this — although your frequencies will differ from this example screenshot because of the unique data set you were assigned.
- Fifth, take a screenshot of your Clothing Sales Contingency Frequency Table (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_ClothingSales_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
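As a point of comparison with the COUNTIFS approach, the sketch below builds a Contingency Frequency Table in Python by counting each combination of store and price range. The eight sales records are invented for illustration, and putting stores in the rows and price ranges in the columns is an assumption; your own table may be laid out the other way around.

```python
from collections import Counter

stores = ["East", "South", "West"]
price_ranges = ["Budget", "Mid-Range", "High Fashion"]

# Hypothetical records: one (store, price range) pair per sale (illustration only).
sales = [
    ("East", "Budget"), ("East", "High Fashion"), ("South", "Budget"),
    ("West", "Mid-Range"), ("South", "Budget"), ("West", "High Fashion"),
    ("East", "Budget"), ("West", "Mid-Range"),
]

# Count every (store, price range) combination, much as COUNTIFS does with two criteria.
pair_counts = Counter(sales)

# Rows are stores, columns are price ranges; combinations never seen count as 0.
table = [[pair_counts[(s, p)] for p in price_ranges] for s in stores]

print(f"{'':<8}" + "".join(f"{p:>14}" for p in price_ranges))
for s, row in zip(stores, table):
    print(f"{s:<8}" + "".join(f"{n:>14}" for n in row))
```

Each cell holds the number of observations that fall into one combination of the two variables, which is exactly what the online calculators ask you to type in.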
- Now, calculate your chi-square test of independence.
- First, choose ONE of these online chi-square test of independence calculators:
- Second, using the Absolute Frequencies in your Clothing Sales Contingency Frequency Table, fill in the online chi-square test of independence calculator:
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom if provided), and name the screenshot YourLastName_PSY-210_Unit11_ClothingSales_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- From the chi-square statistic and p-value you calculated on your Clothing Sales Data Set, can you reject the null hypothesis that the two variables (store location and clothing price range) are independent?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of independence).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of independence).
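For readers who want to check an online calculator's output, here is a rough Python sketch of a chi-square test of independence on a 3 x 3 table. The frequencies are invented for illustration. The expected frequency of each cell is computed as row total times column total divided by the grand total, which is the standard formula under independence, and the degrees of freedom follow the (Rows minus 1) * (Columns minus 1) formula given earlier. The p-value shortcut used here is valid only when the degrees of freedom are even, as they are for a 3 x 3 table.

```python
import math

# Hypothetical 3 x 3 Contingency Frequency Table (illustration values only).
# Rows: East, South, West stores. Columns: Budget, Mid-Range, High Fashion.
observed = [
    [30, 20, 10],
    [20, 20, 20],
    [10, 20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper-tail probability; this shortcut holds only for even df.
if df % 2 != 0:
    raise ValueError("this sketch only handles even degrees of freedom")
half = statistic / 2
term, series = 1.0, 1.0
for k in range(1, df // 2):
    term *= half / k
    series += term
p_value = math.exp(-half) * series

print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.3f}")
print("Reject the null hypothesis" if p_value < 0.05 else "Cannot reject the null hypothesis")
```

In this made-up table, budget items sell best in the East Store and high fashion items sell best in the West Store, so the p-value falls below .050 and the null hypothesis of independence is rejected.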
- Again, you’re going to get some experience calculating a chi-square test of independence.
- First, imagine the following scenario (which was also created by Professor Richard Landers of the University of Minnesota):
- You are the CEO of a large company. To reduce employee turnover (which means employees leaving your corporation), you implemented a new company-wide training program two years ago.
- However, you’re not sure if the training is equally effective in reducing employee turnover among employees who work in your service department, sales department, and warehouse.
- Therefore, you retrieved a list of all current and former employees who received the training. Your list also includes whether each current or former employee works or used to work in the service department, the sales department, or the warehouse.
- What you want to know is whether being a current versus a former employee is contingent on working in the service department, sales department, or the warehouse.
- In other words, you want to know whether employee turnover depends on the department in which the employee works (or used to work). Therefore, you conduct a chi-square test of independence.
- Second, remembering your unique data set number, which you were given in Unit 3, download your unique Employee Turnover Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
- Employee Turnover Data Set 001, Employee Turnover Data Set 002, Employee Turnover Data Set 003, Employee Turnover Data Set 004, Employee Turnover Data Set 005, Employee Turnover Data Set 006, Employee Turnover Data Set 007, Employee Turnover Data Set 008, Employee Turnover Data Set 009, Employee Turnover Data Set 010, Employee Turnover Data Set 011, Employee Turnover Data Set 012, Employee Turnover Data Set 013, Employee Turnover Data Set 014, Employee Turnover Data Set 015.
- Third, import your unique Employee Turnover Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your unique Employee Turnover Data Set, naming the file, YourLastName_PSY-210_Unit11_EmployeeTurnover_Data
- Fourth, using your newly created Employee Turnover Data Set spreadsheet and based on what you learned in Andrews’ (2020) how-to article, create a Contingency Frequency Table for your Employee Turnover Data Set.
- Your Employee Turnover Contingency Frequency Table should look something like this — although your frequencies will differ from this example screenshot because of the unique data set you were assigned.
- Fifth, take a screenshot of your Employee Turnover Contingency Frequency Table and save the screenshot as YourLastName_PSY-210_Unit11_EmployeeTurnover_Frequency.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Calculate the chi-square test of independence statistic for your Employee Turnover Data Set using ONE of the (above listed) online chi-square test of independence calculators.
- First, you must use a different online calculator than you used before (for your Clothing Sales Data Set).
- Second, using the Absolute Frequencies in your Employee Turnover Contingency Frequency Table, fill in the online chi-square test of independence calculator.
- Third, after clicking the “calculate” button, take a screenshot of the chi-square statistic and p-value (and degrees of freedom, if provided), and name the screenshot YourLastName_PSY-210_Unit11_EmployeeTurnover_Chi-Square.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Fourth, from the chi-square statistic and p-value you calculated on your Employee Turnover Data Set, can you reject the null hypothesis that the two variables (employee turnover and the department in which the employee works or used to work) are independent?
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis (of independence).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis (of independence).
- Go to the Unit 11: Assignment #2 Discussion Board and do the following:
- First, in the first sentence of your Discussion Board post, state your unique data set number (e.g., “My unique data set number is 001”).
- Second, embed the screenshot of your final Clothing Sales Contingency Frequency Table (the screenshot you named YourLastName_PSY-210_Unit11_ClothingSales_Frequency.xxx).
- Third, embed the screenshot of the Clothing Sales chi-square test of independence calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_ClothingSales_Chi-Square.xxx).
- Fourth, report the chi-square statistic and p-value (remember, we always round our numbers to three decimal places!).
- Can you reject the null hypothesis of independence?
- Fifth, embed the screenshot of your final Employee Turnover Contingency Frequency Table (the screenshot you named YourLastName_PSY-210_Unit11_EmployeeTurnover_Frequency.xxx).
- Sixth, embed the screenshot of the Employee Turnover chi-square test of independence calculation you made with the online calculator (the screenshot you named YourLastName_PSY-210_Unit11_EmployeeTurnover_Chi-Square.xxx).
- Seventh, report the chi-square statistic and p-value (remember, we always round our numbers to three decimal places!).
- Can you reject the null hypothesis of independence?
- Eighth, name three specific instances in your past, present, or future that you think it would have been or it will be useful to conduct a chi-square test of independence.
Unit 11: Assignment #3 (due before 11:59 pm Central on TUE JUL 23):
- In the second half of this Unit, you’ll learn how to evaluate relationships between continuous variables.
- First, remember, as you learned back in Unit 2, that continuous measurements can fall anywhere in an infinite range of values. For example, your height, the length of your foot, and the amount of sleep you got last night are all continuous measurements.
- Second, refresh your memory about correlations by re-reading the excerpt from Investopedia’s (no date) article, “Correlation Coefficient.” While reading this excerpt, make sure you understand the following:
- “The correlation coefficient is a statistical measure of the strength of the relationship between two continuous variables.”
- “The values of a Pearson correlation coefficient range between -1.000 and 1.000.”
- “A correlation of -1.000 shows a perfect negative correlation, while a correlation of 1.000 shows a perfect positive correlation. A correlation of 0.000 shows no linear relationship between the two variables.”
- “The strength of a relationship is indicated by the magnitude of the correlation coefficient.”
- Third, read Poldrack’s (2020) Chapter 13 “Modeling Continuous Relationships.” While reading this excerpt, make sure you understand the following:
- “One way to quantify the relationship between two continuous variables is by calculating their covariance.”
- The variance measures how much one variable’s values deviate from that variable’s mean; the covariance measures how the two variables’ deviations from their respective means relate to each other.
- Although we don’t usually use the covariance to describe relationships between two variables, we do use correlation coefficients.
- After calculating a correlation coefficient, we can test the null hypothesis, which predicts that the correlation coefficient is 0.000.
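To make the link between covariance and correlation concrete, here is a small Python sketch with invented paired values. It uses the sample versions of the covariance and standard deviation (dividing by n minus 1). The variable names are hypothetical.

```python
import math

# Hypothetical paired measurements (illustration values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: summed products of paired deviations, divided by n - 1.
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations (also dividing by n - 1).
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Pearson r is the covariance rescaled by both standard deviations.
r = covariance / (sd_x * sd_y)

print(round(covariance, 3), round(r, 3))
```

Because the covariance depends on the units of both variables, it is hard to interpret on its own; dividing by the two standard deviations puts the result on the fixed scale from -1.000 to 1.000.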
- Now, you’ll learn how to compute a correlation coefficient.
- First, search the Internet for a tutorial or how-to guide to teach you how to calculate a Pearson correlation coefficient using your chosen data management platform.
- The how-to guide you find can be in any format (e.g., video, written text, figures, or the like — or a combination of formats).
- However, the how-to guide you find must be from the Internet and not from other sources (e.g., textbooks or friends).
- Remember that it’s important to learn to use Google to find out how to do something you don’t know how to do (and that most data scientists frequently use Google to learn, or remind themselves, how to do things).
- Be sure to write down the URL of the tutorial or how-to guide you find and use.
- Second, download your classmates’ Height-Foot Length Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Third, import your classmates’ Height-Foot Length Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Remember to follow Andrews’ (2020) Import Data how-to article.
- Save the new spreadsheet in which you have imported your classmates’ Height-Foot Length Data Set, naming the file, YourLastName_PSY-210_Unit11_HeightFootLength_Data
- Fourth, using your chosen data management platform and your classmates’ Height-Foot Length Data Set, calculate the following:
- the mean of your classmates’ Height (in feet)
- the standard deviation of your classmates’ Height (in feet)
- the N, meaning the sample size, which is the number of students in your class who reported their Height (in feet)
- the mean of your classmates’ Foot Length (in inches)
- the standard deviation of your classmates’ Foot Length (in inches)
- the N, meaning the sample size, which is the number of students in your class who reported their Foot Length (in inches)
- the Pearson correlation coefficient between your classmates’ Height (in feet) and their Foot Length (in inches)
- Fifth, take a screenshot of your classmates’ Height-Foot Length Data Set means, standard deviations, Ns, and Pearson correlation coefficient (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_HeightFootLength_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
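If you'd like to double-check your spreadsheet results, the same statistics can be sketched in Python. Everything about the data below is an assumption: the column names Height and FootLength, the values, and the blank cell are invented for illustration, and your actual file may be organized differently. The sketch uses the sample standard deviation, and it computes the correlation only from rows in which both values were reported.

```python
import csv
import io
import statistics

# Hypothetical CSV contents; the real file's column names and values may differ.
csv_text = """Height,FootLength
5.5,9.5
5.8,10.0
6.0,10.5
5.2,
5.6,9.8
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))


def column(name):
    """Return the non-blank values of one column as floats."""
    return [float(row[name]) for row in rows if row[name].strip()]


heights = column("Height")
foot_lengths = column("FootLength")

for label, values in (("Height", heights), ("Foot Length", foot_lengths)):
    print(label, len(values), round(statistics.mean(values), 3),
          round(statistics.stdev(values), 3))

# Pearson r uses only the rows in which both values were reported.
pairs = [
    (float(row["Height"]), float(row["FootLength"]))
    for row in rows
    if row["Height"].strip() and row["FootLength"].strip()
]
xs, ys = zip(*pairs)
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
r = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (
    (len(pairs) - 1) * statistics.stdev(xs) * statistics.stdev(ys)
)
print("Pearson r:", round(r, 3))
```

Note that the N can differ between the two variables when a classmate reported one measurement but not the other, which is why each N is counted separately.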
- To get more practice computing correlation coefficients:
- First, download your classmates’ Height-Sleep Data Set.
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Second, import your classmates’ Height-Sleep Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported your classmates’ Height-Sleep Data Set, naming the file, YourLastName_PSY-210_Unit11_HeightSleep_Data
- Third, using your classmates’ Height-Sleep Data Set, calculate the following:
- the mean of your classmates’ Height (in feet); you can use this calculation to check the mean you previously calculated for this variable
- the standard deviation of your classmates’ Height (in feet); you can use this calculation to check the standard deviation you previously calculated for this variable
- the N, meaning the sample size, which is the number of students in your class who reported their Height (in feet); you can use this calculation to check the N you previously calculated for this variable.
- the mean of your classmates’ previous night of Sleep (in hours)
- the standard deviation of your classmates’ previous night of Sleep (in hours)
- the N, meaning the sample size, which is the number of students in your class who reported their previous night of Sleep (in hours)
- the Pearson correlation coefficient between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- Fourth, take a screenshot of your classmates’ Height-Sleep Data Set means, standard deviations, Ns, and Pearson correlation coefficient (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_HeightSleep_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- To get even more practice calculating correlation coefficients:
- First, download this Temp-Coffee-Juice Data Set, which comprises the Daily Maximum Temperature (in Fahrenheit), the daily Coffee Sales (in US dollars), and the daily Juice Sales (in US dollars) at another university’s student-run cafe (and made available by Penn State Online Statistics).
- If you’re using the browser Chrome or the browser Firefox, click on the link for your data set. When prompted, save the file to your PSY-210_SEMESTERYEAR_Unit11 folder.
- If you’re using the browser Safari, right-click on the link for your data set and select “Download Linked File.”
- Second, import the Temp-Coffee-Juice Data Set .csv file into a blank spreadsheet in your chosen data management platform.
- Save the new spreadsheet in which you have imported the Temp-Coffee-Juice Data Set, naming the file, YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Data
- Third, using your chosen data management platform and the Temp-Coffee-Juice Data Set, calculate the following:
- the mean of the Daily Maximum Temperature (in Fahrenheit)
- the standard deviation of the Daily Maximum Temperature (in Fahrenheit)
- the N, meaning the sample size, which is the number of days for which the Daily Maximum Temperature (in Fahrenheit) was recorded
- the mean of the daily Coffee Sales (in US dollars)
- the standard deviation of the daily Coffee Sales (in US dollars)
- the N, meaning the sample size, which is the number of days for which the daily Coffee Sales (in US dollars) were recorded
- the mean of the daily Juice Sales (in US dollars)
- the standard deviation of the daily Juice Sales (in US dollars)
- the N, meaning the sample size, which is the number of days for which the daily Juice Sales (in US dollars) were recorded
- the Pearson correlation coefficient between the Daily Maximum Temperature (in Fahrenheit) and the daily Coffee Sales (in US dollars)
- the Pearson correlation coefficient between the Daily Maximum Temperature (in Fahrenheit) and the daily Juice Sales (in US dollars)
- Fourth, take a screenshot of the Temp-Coffee-Juice Data Set means, standard deviations, Ns, and Pearson correlation coefficients (not your entire screen) and save the screenshot as YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Stats.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
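- Optional: if you want to double-check the numbers your data management platform gives you, the calculations listed above can be sketched in Python. This is only a minimal sketch, not a required method. The function names and the toy values are hypothetical (they are NOT your classmates’ data), and the sketch assumes the sample standard deviation (dividing by N - 1), which is what spreadsheet functions such as STDEV compute. If your platform uses the population formula instead, the standard deviations will differ slightly, but the Pearson correlation coefficient will not.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by N - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    mx, my = mean(xs), mean(ys)
    sum_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sum_xx = sum((x - mx) ** 2 for x in xs)
    sum_yy = sum((y - my) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical toy values for illustration only, NOT the class data set.
height_ft = [5.2, 5.5, 5.8, 6.0, 6.1]
foot_in = [9.0, 9.5, 10.2, 10.8, 11.0]

# Round to three decimal places, as the course requires.
print("Height mean:", round(mean(height_ft), 3))
print("Height SD:", round(sample_sd(height_ft), 3))
print("Height N:", len(height_ft))
print("Foot Length mean:", round(mean(foot_in), 3))
print("Foot Length SD:", round(sample_sd(foot_in), 3))
print("Foot Length N:", len(foot_in))
print("Pearson r:", round(pearson_r(height_ft, foot_in), 3))
```

- To use the sketch with a real data set, replace the two toy lists with the two columns from your .csv file; each list must contain one value per student (or per day), in the same order.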
- Now, it’s time to learn how to test the null hypothesis associated with each of the correlation coefficients you calculated.
- First, make sure you have calculated, using your chosen data management platform, FOUR correlation coefficients:
- the correlation between your classmates’ Height (in feet) and their Foot Length (in inches)
- the correlation between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- the correlation between the Daily Maximum Temperature (in Fahrenheit) and the daily Coffee Sales (in US dollars)
- the correlation between the Daily Maximum Temperature (in Fahrenheit) and the daily Juice Sales (in US dollars)
- Second, remember from Poldrack’s (2020) chapter that for all correlation coefficients, the null hypothesis predicts that the correlation coefficient is 0.000.
- Third, using ONE of the below calculators, find the p-value of EACH of the four correlation coefficients you have calculated:
- Fourth, when using the above calculators:
- The N or Sample Size is based on the number of students (classmates) who reported their height, foot length, and sleep OR the number of days for which the temperature, the coffee sales, and the juice sales were recorded.
- If you have the choice, choose a two-sided test, also referred to as two-tailed probability (because we did not have a directional alternative hypothesis).
- Record the p-value for EACH correlation.
- Fifth, for EACH p-value you recorded, decide whether you can reject the null hypothesis that the correlation is 0.000.
- Remember from Unit 8, if the p-value is low enough (e.g., p < .050), we can reject the null hypothesis of a 0.000 correlation (that is, the hypothesis that the variables are not related).
- If the p-value is not low enough (e.g., p ≥ .050), we cannot reject the null hypothesis of a 0.000 correlation.
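- Optional: if you are curious where such a p-value comes from, the sketch below shows one standard way to compute it. It assumes the usual t test for a Pearson correlation: the statistic t = r × √((N − 2) / (1 − r²)) is compared with a t distribution that has N − 2 degrees of freedom, and the two-tailed p-value is the area in both tails. Whether a particular online calculator works exactly this way is an assumption. The function names are hypothetical, and the numerical integration (Simpson’s rule) is simply a convenient way to avoid extra libraries.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def correlation_p_value(r, n, steps=20000):
    """Two-tailed p-value for the null hypothesis that the correlation is 0.

    r is the Pearson correlation coefficient, n is the sample size (N),
    and steps is the (even) number of intervals used by Simpson's rule.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and t.
    h = t / steps
    area = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # The area between -t and +t is 2 * area; the two tails hold the rest.
    return max(0.0, 1 - 2 * area)


# Hypothetical example values, NOT results from the course data sets.
r, n = 0.632, 10
p = correlation_p_value(r, n)
print("p =", round(p, 3))
print("Reject the null hypothesis" if p < 0.050 else "Cannot reject the null hypothesis")
```

- Notice that the p-value depends on both r and N: the same correlation coefficient yields a smaller p-value when it is based on a larger sample.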
- Go to the Unit 11: Assignment #3 Discussion Board and do the following:
- First, embed the screenshot of your classmates’ Height-Foot Length Data Set means, standard deviations, Ns, and Pearson correlation coefficient (YourLastName_PSY-210_Unit11_HeightFootLength_Stats.xxx).
- Report the Pearson correlation coefficient between your classmates’ Height and Foot Length.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Second, embed the screenshot of your classmates’ Height-Sleep Data Set means, standard deviations, Ns, and Pearson correlation coefficient (YourLastName_PSY-210_Unit11_HeightSleep_Stats.xxx).
- Report the Pearson correlation coefficient between your classmates’ Height and Sleep.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Third, embed the screenshot of the Temp-Coffee-Juice Data Set means, standard deviations, Ns, and Pearson correlation coefficients (YourLastName_PSY-210_Unit11_Temp-Coffee-Juice_Stats.xxx).
- Report the Pearson correlation coefficient between the Daily Maximum Temperature and the daily Coffee Sales.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Report the Pearson correlation coefficient between the Daily Maximum Temperature and the daily Juice Sales.
- Report the Pearson correlation coefficient’s p-value.
- State whether you can reject the null hypothesis of a zero correlation.
- Remember, to adhere to good scientific principles, always round numbers to three decimal places!
Unit 11: Assignment #4 (due before 11:59 pm Central on TUE JUL 23):
- In this assignment, you’re going to learn how to make Scatter Plots.
- First, to learn what a Scatter Plot is, read an excerpt from Khan Academy’s (No Date) article, “Scatter Plots.” While reading this excerpt, make sure you understand the following:
- In a Scatter Plot, each pair of values in the data set gets plotted as one point whose x-coordinate represents one variable’s value and whose y-coordinate represents the other variable’s value.
- For example, in the example Scatter Plot in Khan Academy’s article, each dot represents one of the 23 students’ quiz scores and that same student’s shoe size.
- Second, to learn more about Scatter Plots, read an excerpt from Math Is Fun’s (2017) article, “Scatter Plots.” While reading this excerpt, make sure you understand the following:
- Scatter Plots are also called X-Y Plots because they display the relationship between two sets of data, which are plotted using Cartesian (x,y) coordinates.
- For example, in the first example Scatter Plot in Math Is Fun’s article, each dot represents one of the 11 students’ heights and that same student’s weight.
- As another example, in the second example Scatter Plot in Math Is Fun’s article, each dot represents one of the 12 days, showing the temperature recorded that day and the ice cream sales recorded that same day.
- Now, search the Internet for a tutorial or how-to guide to teach you how to make Scatter Plots (often called X-Y Plots) using your chosen data management platform.
- The how-to guide you find can be in any format (e.g., video, written text, figures, or the like — or a combination of formats).
- However, the how-to guide you find must be from the Internet and not from other sources (e.g., textbooks or friends).
- Next, using your chosen data management platform, create a Scatter Plot for each of FOUR data pairs for which you calculated Pearson correlation coefficients in Unit 11: Assignment #3:
- ONE: the correlation between your classmates’ Height (in feet) and their Foot Length (in inches)
- TWO: the correlation between your classmates’ Height (in feet) and their previous night of Sleep (in hours)
- THREE: the correlation between the Daily Maximum Temperature (in Fahrenheit) and daily Coffee Sales (in US dollars)
- FOUR: the correlation between the Daily Maximum Temperature (in Fahrenheit) and daily Juice Sales (in US dollars)
- Remember that a “good graph” includes these four major components:
- a Graph Title
- Axis Labels
- Graph Units
- Graph Data
- For the two Scatter Plots that present your classmates’ Height (in feet):
- Use the y-axis to represent your classmates’ Height (in feet), and adjust the y-axis to a minimum of 4.000 feet and a maximum of 7.000 feet.
- You can refresh your memory on how to change the y-axis by reading this handout.
- For one Scatter Plot, use the x-axis to represent your classmates’ Foot Length (in inches), and adjust the x-axis to a minimum of 7.000 inches and a maximum of 13.000 inches.
- For the other Scatter Plot, use the x-axis to represent your classmates’ Sleep (in hours), and adjust the x-axis to a minimum of 0.000 hours and a maximum of 16.000 hours.
- For the two Scatter Plots that present Daily Maximum Temperature (in Fahrenheit):
- Use the y-axis to represent the Daily Maximum Temperature (in Fahrenheit), and adjust the y-axis to a minimum of 0.000 (degrees) Fahrenheit and a maximum of 90.000 (degrees) Fahrenheit.
- For one Scatter Plot, use the x-axis to represent daily Coffee Sales (in US dollars), and adjust the x-axis to a minimum of 0.000 dollars and a maximum of 140.000 dollars.
- For the other Scatter Plot, use the x-axis to represent Juice Sales (in US dollars), and adjust the x-axis to a minimum of 0.000 dollars and a maximum of 45.000 dollars.
- Save each of the four Scatter Plots you created as a screenshot named YourLastName_PSY-210_Unit11_YYY_ScatterPlot.xxx (where YYY is the data set and xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
- Return to the excerpt from Math Is Fun’s (2017) article, “Scatter Plots,” and study the last page that presents seven idealized Scatter Plots.
- First, notice that the data points in Scatter Plots that represent positive correlations tilt up left-to-right.
- In contrast, the data points in Scatter Plots that represent negative correlations tilt down left-to-right.
- Second, notice that the data points in Scatter Plots that represent high correlations are much more tightly clustered around an imaginary diagonal line.
- In contrast, the data points in Scatter Plots that represent low correlations are much more scattered.
- Go to the Unit 11: Assignment #4 Discussion Board and do the following:
- First, embed each of the four Scatter Plots that you created.
- Second, for each Scatter Plot, identify whether it is a positive or negative correlation and whether it is a high, moderate, low, or no correlation.
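- Optional: to see how a Scatter Plot turns each pair of values into one point, and what setting an axis minimum and maximum does, here is a minimal text-mode sketch in Python. It is not a substitute for the Scatter Plots you make in your chosen data management platform. The function name `ascii_scatter` and the toy values are hypothetical (they are NOT your classmates’ data); the axis ranges are the ones given above for Height (4.000 to 7.000 feet) and Foot Length (7.000 to 13.000 inches).

```python
def ascii_scatter(xs, ys, x_min, x_max, y_min, y_max, width=40, height=12):
    """Return a text-mode scatter plot as a list of rows (top row first)."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # points outside the chosen axis range are not drawn
        # Scale each value to a column (x) or row (y) of the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


# Hypothetical toy values for illustration only, NOT the class data set.
foot_in = [9.0, 9.5, 10.2, 10.8, 11.0]    # x-axis: Foot Length (in inches)
height_ft = [5.2, 5.5, 5.8, 6.0, 6.1]     # y-axis: Height (in feet)

print("Height (feet) by Foot Length (inches), toy data")
for line in ascii_scatter(foot_in, height_ft, 7.0, 13.0, 4.0, 7.0):
    print("|" + line)
print("+" + "-" * 40)
```

- In this toy example the points tilt up from left to right, which is the pattern a positive correlation produces.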
Unit 11: Assignment #5 (due before 11:59 pm Central on WED JUL 24):
- Meet online with your NEW Chat Group (which you formed during Unit 8) for a one-hour text-based Group Chat at a time/date that your Chat Group previously arranged.
- BEFORE YOU MEET WITH YOUR CHAT GROUP, each member of the Chat Group must do ALL of the following:
- First, to sharpen your ability to interpret Scatter Plots, read Math Boot Camp’s (2017) article, “Reading Scatter Plots.” While reading this article, make sure you understand the following:
- The shape in a Scatter Plot can be either linear or curvilinear.
- Scatter Plots with a linear shape have points that seem to fall along a line (hence, the term linear).
- In a positive linear pattern, the imaginary line slopes UP from left-to-right.
- In a negative linear pattern, the imaginary line slopes DOWN from left-to-right.
- The strength of a correlation is shown by how tightly the points cluster together.
- Second, to better understand what correlations can and cannot tell us:
- Third, while watching these videos, to make sure you understand why correlation cannot be used to prove causation:
- Write down at least four examples of correlation not proving causation from the videos you watched.
- One example you can write down is the correlation between the amount of ice cream purchased (during each month of the year) and the number of drowning deaths (during each month of the year) not proving that ice cream causes drowning.
- Write down at least two examples of a correlation that might be caused by another variable.
- One example you can write down is the correlation between ice cream sales per month and drowning deaths per month; the correlation might be caused by another variable, the season of the year. In such cases, the other variable is called a confounding variable.
- Write down at least two examples of a correlation that might be due to coincidence.
- One example you can write down is the correlation between pool drownings per year and Nicolas Cage films per year. That correlation is most likely due simply to coincidence.
- Fourth, other than completing the above reading and video-watching assignments and writing down your examples, DO NOT begin working on any of the steps listed below until your Chat Group begins their one-hour Group Chat.
- During your one-hour Group Chat:
- First, begin your chat with each Chat Group member indicating ONE of the nine “How Are You Feeling at the START of Today’s Group Chat?” images. More than one Chat Group member can indicate the same image if that’s how they are feeling, and please refer to each image by its number.
- Second, play “Guess the Correlation Coefficient Based on the Scatter Plot.”
- Instructions are included on the first page of the game.
- Every member of your Chat Group MUST make guesses about every set of four Scatter Plots before you, as a group, scroll to see the answers.
- As for who gets to make their four guesses first:
- Sum your birthdate (e.g., if you were born on March 18, 1999, your birthdate sum is 03 [March] + 18 [18th] + 99 [1999] = 120):
- For three-student Chat Groups:
- Rotate in the order of highest birthdate sum, lowest birthdate sum, neither highest nor lowest birthdate sum (i.e., Trial One: highest, lowest, neither; Trial Two: lowest, neither, highest; Trial Three: neither, highest, lowest; and so forth).
- For two-student Chat Groups:
- Rotate in the order of highest birthdate sum, lowest birthdate sum (i.e., Trial One: highest, lowest; Trial Two: lowest, highest; Trial Three: highest, lowest; and so forth).
- Third, as a group rather than individually, look through the homepage of Tyler Vigen’s “Spurious Correlation” website.
- Spurious means “apparent but not actually valid.”
- The spurious correlations on Tyler Vigen’s homepage are like the correlations you learned about in AsapScience’s (2017) YouTube video, “This ≠ That.”
- Identify your group’s SIX favorite “spurious correlations” from Tyler Vigen’s homepage. Save a screenshot of each of these six graphs.
- Fourth, and again as a group rather than individually, search the Internet for THREE more spurious correlations based on, but not found on, Tyler Vigen’s “Spurious Correlation” website, by following these instructions.
- Search on Google using the two search terms – in quotation marks – “Spurious Correlation” and “Tyler Vigen.”
- Examine each webpage that is identified by the search.
- If the webpage contains a graph based on, but not found on, Tyler Vigen’s “Spurious Correlation” website, take a screenshot of that graph (not the entire webpage, just the graph).
- Repeat until you have found THREE more spurious correlations based on, but not found on, Tyler Vigen’s “Spurious Correlation” website.
- AT THE END OF YOUR ONE-HOUR GROUP CHAT do the following:
- Each Chat Group member needs to indicate ONE of the nine “How Are You Feeling at the END of Today’s Group Chat?” images. More than one Chat Group member can indicate the same image if that’s how they are feeling, and please refer to each image by its number.
- NOTE: The “How Are You Feeling at the END of Today’s Group Chat” grid of images differs from the “How Are You Feeling at the START of Today’s Group Chat” grid of images.
- Nominate one member of your Chat Group (who participated in the Chat) to make a post on the Unit 11: Assignment #5 Discussion Board that summarizes your Group Chat in at least 200 words. This Chat Group member should not post their 200-word summary of your Group Chat until they have completed their Course Journal for the current Unit.
- At the end of the 200-word summary, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
- Nominate a second member of your Chat Group (who participated in the Group Chat using the browser Chrome on their laptop, rather than on their mobile device) to save the Chat transcript, as described in the Course How To (under the topic, “How To Save and Attach a Chat Transcript”).
- This Chat Group member needs to make a post on the Unit 11: Assignment #5 Discussion Board and attach the Chat transcript, saved as a PDF, to that Discussion Board post. This Chat Group member should not post the transcript of your Group Chat until they have completed their Course Journal for the current Unit.
- In their Discussion Board post, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
- Remember to attach the Chat transcript by clicking on the word “Attach.” (Do not click on the sidebar menu “Files.”)
- Nominate a third member of your Chat Group (who also participated in the Chat) to make another post on the Unit 11: Assignment #5 Discussion Board that states the name of your Chat Group, the names of the Chat Group members who participated in the Chat, the date of your Chat, and the start and stop time of your Group Chat. This Chat Group member should not post the names, date, and times of your Group Chat until they have completed their Course Journal for the current Unit.
- In their Discussion Board post, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
- This Chat Group member also needs to embed the six screenshots of your Chat Group’s six favorite spurious correlations (not their entire screen) from Tyler Vigen’s “Spurious Correlation” website and the three additional spurious correlations that your Chat Group found through an Internet search.
- If only two students participated in the Group Chat, then one of those two students needs to do two of the above three tasks.
- Before ending the Group Chat, arrange the date and time for the Group Chat you will need to hold during the next Unit (Unit 12: Assignment #5).
Congratulations, you have finished Unit 11! Onward to Unit 12! |
|