Unit 05

Unit 5: Modeling Data
Unit 5: Assignment #1 (due before 11:59 pm Central on MON JUL 1):

  1. To become familiar with the three measures of central tendency and how they are calculated, read Khan Academy’s (n.d.) article, “Mean, Median, and Mode.”
  2. Learn why it’s important to use Google to learn something you don’t know how to do by reading this handout about data scientists who frequently Google to learn how to do things.
  3. To learn how to calculate three measures of central tendency — the mean, the median, and the mode — in your chosen data management platform, search the Internet for tutorials or how-to guides that are specific to your chosen data management platform.
    1. Some how-to guides may cover the mean, the median, and the mode all in one tutorial. Other how-to guides may cover only the mean in one tutorial, the mode in another tutorial, and so forth. That’s ok. But you do need to find a how-to guide for calculating
      • the mean,
      • the median, and
      • the mode.
    2. The how-to guides you find can be in any format (e.g., video, written text, figures, or the like, or a combination of formats). However, all the how-to guides you find must be from the Internet and not from other sources (e.g., textbooks or friends).
  4. After you’ve found a how-to guide (or guides) for calculating the mean, the median, and the mode for your chosen data management platform, calculate the mean, the median, and the mode for your assigned Age Data Set.
    1. First, open your Age Data spreadsheet, YourLastName_PSY-210_Unit03_AgeFrequency
    2. Second, following the how-to guide you’ve found, calculate the mean, the median, and the mode for your assigned Age Data Set.
    3. Third, write down the mean, the median, and the mode that you calculated for your assigned Age Data Set. When you need to use decimals (which you will need for reporting the mean and the median), remember to use three decimal places (e.g., 32.394).
  5. Next, calculate the mean, the median, and the mode for your assigned Height Data Set using your chosen data management platform.
    1. First, open your Height Data spreadsheet, YourLastName_PSY-210_Unit03_HeightFrequency
    2. Second, following the how-to guide you’ve found, calculate the mean, the median, and the mode for your assigned Height Data Set.
    3. Third, write down the mean, the median, and the mode that you calculated for your assigned Height Data Set. When you need to use decimals (which you will need for reporting the mean and the median), remember to use three decimal places (e.g., 65.231).
  6. From the Course How To, learn “How To Embed a URL into a Discussion Board Post.”
  7. Go to the Unit 5: Assignment #1 Discussion Board and make a new Discussion Board post in which you do the following:
    1. First, in the first sentence of your Discussion Board post, state your unique data set number (e.g., “My unique data set number is 001”).
    2. Second, define in your own words and in no more than one sentence, each of these three measures of central tendency:
      • the mean
      • the median
      • the mode
    3. Third, embed the three links to the three how-to guides you found and used to calculate the mean, the median, and the mode in your chosen data management platform.
      • Be sure to embed one link for EACH of the three measures of central tendency (one for the mean, one for the median, and one for the mode).
      • If you used the same how-to guide for learning more than one measure of central tendency, you should embed the same link more than once (you should have three links total).
      • Remember to embed your links using the procedures you learned from the Course How To.
    4. Fourth, report the mean, the median, and the mode that you calculated with your chosen data management platform for your assigned Age Data Set, using statements such as
      • The mean of my assigned Age Data set = xx.xxx years
      • The median of my assigned Age Data set = xx.xxx years
      • The mode of my assigned Age Data set = xx years
    5. Fifth, report the mean, the median, and the mode that you calculated with your chosen data management platform for your assigned Height Data Set, using statements such as
      • The mean of my assigned Height Data set = xx.xxx inches
      • The median of my assigned Height Data set = xx.xxx inches
      • The mode of my assigned Height Data set = xx inches
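
The arithmetic behind steps 4 and 5 can also be sanity-checked outside a spreadsheet. The following is a minimal sketch, assuming Python's standard statistics module and a made-up list of ages (not any assigned Age Data Set). It only illustrates the calculations and the three-decimal reporting format; it does not replace the how-to guides you need to find for your chosen data management platform.

```python
import statistics

# Hypothetical ages in years (made up for illustration; not an assigned data set).
ages = [19, 21, 21, 22, 24, 30, 35]

mean_age = statistics.mean(ages)      # sum of the values divided by how many there are
median_age = statistics.median(ages)  # middle value once the data are sorted
modes = statistics.multimode(ages)    # most frequent value(s); a list, because ties are possible

# Report the mean and the median with three decimal places.
print(f"The mean of the hypothetical Age data = {mean_age:.3f} years")
print(f"The median of the hypothetical Age data = {median_age:.3f} years")
print(f"The mode(s) of the hypothetical Age data = {modes} years")
```

A data set can have more than one mode, which is why this sketch uses multimode rather than assuming a single most frequent value.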

Unit 5: Assignment #2 (due before 11:59 pm Central on MON JUL 1):

  1. Using the skills you learned in Unit 5: Assignment #1, and using your chosen data management platform, calculate the mean, the median, and the mode for Course A and for Course B in the Final Grade Data Set.
    1. First, download the .csv file, Final Grade Data Set, which includes hypothetical final course grades (as percentages) from two hypothetical courses, Course A and Course B.
      • If you are using the browser Chrome or the browser Firefox, click on the link for your data set, below. When prompted, save the file to your PSY-210_Summer2021_Unit05 folder.
      • If you are using the browser Safari, right-click on the link for your data set, below, and select “Download Linked File.”
    2. Second, import the Final Grade Data Set .csv file into a blank spreadsheet in your chosen data management platform.
      • Remember to follow Andrews’ (2020) Import Data how-to article.
      • Then, save the new spreadsheet in which you have imported the Final Grade Data Set, naming the file, YourLastName_PSY-210_Unit05_FinalGradeData
    3. Third, using your chosen data management platform, calculate the mean, the median, and the mode for Course A.
      • Write down the mean, the median, and the mode for Course A.
      • When you need to use decimals (which you will need for reporting the mean and the median), remember to use three decimal places (e.g., 70.755%).
      • If your data management platform converts percentages (e.g., 70.755%) to proportions (e.g., .70755), you will need to search the Internet for a how-to guide to learn how to use your data management platform to convert proportions (e.g., .70755) back to percentages (e.g., 70.755%).
    4. Fourth, using your chosen data management platform, calculate the mean, the median, and the mode for Course B.
      • Write down the mean, the median, and the mode for Course B.
      • When you need to use decimals (which you will need for reporting the mean and the median), remember to use three decimal places (e.g., 70.755%).
      • If your data management platform converts percentages (e.g., 70.755%) to proportions (e.g., .70755), you will need to search the Internet for a how-to guide to learn how to use your data management platform to convert proportions (e.g., .70755) back to percentages (e.g., 70.755%).
  2. To become familiar with what variability is and how it is measured, read Open Statistics Education’s (n.d.) “Measures of Variability.”
  3. To learn how to calculate three measures of variability (the range, the variance, and the standard deviation), search the Internet for tutorials or how-to guides that are specific to your chosen data management platform.
    1. Some how-to guides may cover the range, the variance, and the standard deviation all in one tutorial. Other how-to guides may cover only the range in one tutorial, the variance in another tutorial, and so forth. Again, that’s ok. But you need to find a how-to guide for calculating
      • the range,
      • the variance, and
      • the standard deviation.
    2. The how-to guides you find can be in any format (video, written text, figures, or the like — or a combination of formats). However, all the how-to guides you find must be from the Internet and not from other sources (e.g., textbooks or friends).
  4. After you’ve found a how-to guide (or guides) for calculating the range, the variance, and the standard deviation using your chosen data management platform:
    1. Use your chosen data management platform to calculate the range, the variance, and the standard deviation for Course A.
      • Write down the range, the variance, and the standard deviation for Course A.
      • When you need to use decimals, remember to use three decimal places.
      • The range and the standard deviation are reported in their original units, which will be percentages for these data (e.g., 6.932%); however, the variance is not reported in its original units because the variance is calculated in squared units.
    2. Use your chosen data management platform to calculate the range, the variance, and the standard deviation for Course B.
      • Write down the range, the variance, and the standard deviation for Course B.
      • When you need to use decimals, remember to use three decimal places.
      • The range and the standard deviation are reported in their original units, which will be percentages for these data (e.g., 6.932%); however, the variance is not reported in its original units because the variance is calculated in squared units.
  5. Go to the Unit 5: Assignment #2 Discussion Board and make a new Discussion Board post in which you do the following:
    1. First, define in your own words and in no more than one sentence, each of these three measures of variability:
      • the range
      • the variance
      • the standard deviation
    2. Second, embed the three links to the three how-to guides you found and used to calculate the range, the variance, and the standard deviation in your chosen data management platform.
      • Be sure to embed one link for EACH of the three measures of variability (one for the range, one for the variance, and one for the standard deviation).
      • If you used the same how-to guide for learning more than one measure of variability, you should embed the same link more than once (you should have three links total).
      • Remember to embed your links using the procedures you learned from the Course How To.
    3. Third, report the mean, the median, and the mode that you calculated for Course A, using statements such as
      • The mean final grade in Course A = xx.xxx%
      • The median final grade in Course A = xx.xxx%
      • The mode final grade in Course A = xx%
    4. Fourth, report the mean, the median, and the mode that you calculated for Course B, using statements such as
      • The mean final grade in Course B = xx.xxx%
      • The median final grade in Course B = xx.xxx%
      • The mode final grade in Course B = xx%
    5. Fifth, report the range, the variance, and the standard deviation that you calculated for Course A, using statements such as
      • The range of final grades in Course A = xx%
      • The variance of final grades in Course A = xx.xxx (note that variances are not reported in the original units, e.g., % in this case, because they are in squared units)
      • The standard deviation of final grades in Course A = xx.xxx%
    6. Sixth, report the range, the variance, and the standard deviation that you calculated for Course B, using statements such as
      • The range of final grades in Course B = xx%
      • The variance of final grades in Course B = xx.xxx (note that variances are not reported in the original units, e.g., % in this case, because they are in squared units)
      • The standard deviation of final grades in Course B = xx.xxx%
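
The three measures of variability, and the proportion-to-percentage conversion mentioned in step 1, can be sketched as follows. This is an illustration only, assuming Python's standard statistics module and made-up grades (not the Final Grade Data Set). The assignment does not say whether to use the sample formulas (dividing by n - 1) or the population formulas (dividing by n), so the sketch shows both; which one applies is an assumption to check against your how-to guide.

```python
import statistics

# Hypothetical final grades as percentages (made up; not the Final Grade Data Set).
grades = [62.0, 70.0, 75.0, 81.0, 92.0]

# Range: largest value minus smallest value, in the original units (%).
grade_range = max(grades) - min(grades)

# Sample versions divide by n - 1; population versions divide by n.
# Which version to report is an assumption here, not something the assignment states.
sample_variance = statistics.variance(grades)
sample_sd = statistics.stdev(grades)
population_variance = statistics.pvariance(grades)
population_sd = statistics.pstdev(grades)

print(f"Range = {grade_range:.3f}%")
print(f"Sample variance = {sample_variance:.3f} (squared units, so no % sign)")
print(f"Sample standard deviation = {sample_sd:.3f}%")
print(f"Population variance = {population_variance:.3f} (squared units, so no % sign)")
print(f"Population standard deviation = {population_sd:.3f}%")

# Converting a proportion back to a percentage is a multiplication by 100.
proportion = 0.70755
percentage = proportion * 100
print(f"{proportion} as a percentage = {percentage:.3f}%")
```

The standard deviation is the square root of the variance, which is why it returns to the original units while the variance stays in squared units.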

Unit 5: Assignment #3 (due before 11:59 pm Central on TUE JUL 2):

  1. To reinforce the skills you’ve learned previously, you will now create a Frequency Distribution Table and a Histogram for the final grades from Course A using the Final Grade Data Set.
    1. First, open the spreadsheet you saved as YourLastName_PSY-210_Unit05_FinalGradeData.
    2. Second, create a Frequency Distribution Table for final grades in Course A.
      • Use ranges of 5%, starting with the range 61% to 65% and ending with the range 96% to 100% (e.g., 61% to 65%, 66% to 70%, 71% to 75%, and so forth). Therefore, your first range will be 61% to 65%, and your last range will be 96% to 100%.
      • Your Course A Frequency Distribution Table must include the following:
        • a column for the Minimum range value
        • a column for the Maximum range value
        • a column for Absolute Frequency
        • a total for Absolute Frequency
        • a column for Relative Frequency
        • a total for Relative Frequency
    3. Third, create a Histogram for the final grades in Course A.
      • Again, use ranges of 5% for your Histogram bins, starting with the range 61% to 65% and ending with the range 96% to 100% (e.g., 61% to 65%, 66% to 70%, 71% to 75%, and so forth). Therefore, your first Histogram bin will be 61% to 65%, and your last Histogram bin will be 96% to 100%.
      • Your Histogram must include the four major components of a graph:
        • a Graph Title
        • Axis Labels
        • Graph Units
        • Graph Data
    4. Fourth, after creating your Course A Frequency Distribution Table and Histogram, be sure to save (again) your spreadsheet, which should already be named, YourLastName_PSY-210_Unit05_FinalGradeData
    5. Fifth, take a partial screenshot (not a screenshot of your entire screen) of your Course A Histogram (not your entire screen and not your Frequency Distribution Table, only your Histogram) and save the screenshot with the filename YourLastName_PSY-210_Unit05_CourseA_Histogram_Screenshot.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
  2. Next, create a Frequency Distribution Table and a Histogram for the final grades from Course B using the Final Grade Data Set.
    1. First, open the spreadsheet you saved as YourLastName_PSY-210_Unit05_FinalGradeData.
    2. Second, create a Frequency Distribution Table for the final grades in Course B.
      • Use ranges of 5%, starting with the range 61% to 65% and ending with the range 96% to 100% (e.g., 61% to 65%, 66% to 70%, 71% to 75%, and so forth). Therefore, your first range will be 61% to 65%, and your last range will be 96% to 100%.
      • Your Course B Frequency Distribution Table must include the following:
        • a column for the Minimum range value
        • a column for the Maximum range value
        • a column for Absolute Frequency
        • a total for Absolute Frequency
        • a column for Relative Frequency
        • a total for Relative Frequency
    3. Third, create a Histogram for the final grades in Course B.
      • Again, use ranges of 5% for your Histogram bins, starting with the range 61% to 65% and ending with the range 96% to 100% (e.g., 61% to 65%, 66% to 70%, 71% to 75%, and so forth). Therefore, your first Histogram bin will be 61% to 65%, and your last Histogram bin will be 96% to 100%.
      • Your Histogram must include the four major components of a graph:
        • a Graph Title
        • Axis Labels
        • Graph Units
        • Graph Data
    4. Fourth, to make sure that your Course A Histogram and your Course B Histogram are plotted on the same scale (and, therefore, do not misrepresent the data by distorting the scale), learn how to adjust the y-axis by reading this handout.
    5. Fifth, after creating your Course B Frequency Distribution Table and Histogram, be sure to save (again) your spreadsheet, which should already be named, YourLastName_PSY-210_Unit05_FinalGradeData
    6. Sixth, take a partial screenshot (not a screenshot of your entire screen) of your Course B Histogram (not your entire screen and not your Frequency Distribution Table, only your Histogram) and save the screenshot with the filename YourLastName_PSY-210_Unit05_CourseB_Histogram_Screenshot.xxx (where xxx is the file type, for example, .jpg, .png, .jpeg, and the like).
  3. To learn about the shape of distributions, read Poldrack’s (2020) Chapter 3 “Summarizing Data: Idealized Representations of Distributions.”
  4. After creating your two histograms and reading the Chapter from Poldrack (2020), compare your two histograms to one another, taking notes on the following:
    1. Describe the shape of each distribution: Is it clustered and peaked? Or is it short and elongated?
    2. Describe the variability of each distribution: Are the grades for one course more variable than the other?
    3. Describe what the variability of the final grades means for students in that course. If one course is more variable than the other, what does that mean for students?
  5. Now, imagine you have the option to enroll in either Course A or Course B. Based on what you know about the variability of the final grades in each course, decide which course you would rather enroll in, and why.
  6. Go to the Unit 5: Assignment #3 Discussion Board and make a post in which you:
    1. Embed your two screenshots:
      • one screenshot of the Histogram for Course A (YourLastName_PSY-210_Unit05_CourseA_Histogram_Screenshot.xxx)
      • one screenshot of the Histogram for Course B (YourLastName_PSY-210_Unit05_CourseB_Histogram_Screenshot.xxx)
    2. State which course you would rather be a member of (Course A or Course B).
    3. In two to three sentences, explain why you would rather be a member of Course A or Course B by referencing both the shape of the distribution and at least one measure of variability (range, standard deviation, variance).
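
The Frequency Distribution Table described in steps 1 and 2 (ranges of 5%, from 61% to 65% through 96% to 100%, with absolute and relative frequencies and their totals) can be sketched as follows. This is a minimal illustration in plain Python with made-up grades. It assumes whole-number grades; a grade such as 65.5 would fall between two ranges and would need a rounding rule that the assignment does not specify.

```python
# Hypothetical whole-number final grades as percentages (made up for illustration).
grades = [63, 68, 72, 74, 77, 79, 80, 83, 88, 97]

# Ranges of 5 percentage points: (61, 65), (66, 70), ..., (96, 100).
bins = [(low, low + 4) for low in range(61, 97, 5)]

# Absolute frequency: how many grades fall inside each range (both ends included).
absolute = [sum(low <= g <= high for g in grades) for low, high in bins]

# Relative frequency: each absolute frequency divided by the number of grades.
relative = [count / len(grades) for count in absolute]

print(f"{'Min':>5} {'Max':>5} {'Absolute':>9} {'Relative':>9}")
for (low, high), count, share in zip(bins, absolute, relative):
    print(f"{low:>5} {high:>5} {count:>9} {share:>9.3f}")
print(f"{'Total':>11} {sum(absolute):>9} {sum(relative):>9.3f}")
```

The two totals act as a built-in check: the absolute frequencies should sum to the number of grades, and the relative frequencies should sum to 1.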

Unit 5: Assignment #4 (due before 11:59 pm Central on TUE JUL 2):

  1. Read Andrews and Gernsbacher’s (2020) lecture transcript, “What Does It Mean to Flatten the Curve?” While reading this lecture transcript, make sure you understand the following:
    1. the difference between flattened (spread out) versus peaked (clustered) distributions;
    2. how normal curves can be fit onto histograms; and
    3. how the same number of points and the same number of cases can have two different distributions.
  2. Teach what it means to flatten the curve to three different people (friends, family members, roommates, and the like).
    1. You can teach each person via email, phone, text, Facebook, Zoom, in person, or any other communication medium.
    2. But you must teach what it means to flatten the curve to three different people at three different times.
    3. When you are teaching what it means to flatten the curve to three different people, make sure you explain clearly the following:
      • the difference between flattened (spread out) versus peaked (clustered) distributions;
      • how normal curves can be fit onto histograms; and
      • how the same number of points and the same number of cases can have two different distributions.
    4. To make sure that each of the three people understands what it means to flatten the curve, test each person on their understanding.
  3. Go to the Unit 5: Assignment #4 Discussion Board and make a new Discussion Board post of at least 200 words in which you
    1. identify the medium (text message, email, Zoom, Facebook, phone call, in-person, etc.) you used to teach each of the three persons what it means to flatten the curve;
    2. state each of the three persons’ initials (e.g., MG) and their approximate age; and
    3. report the test(s) you used to test each person’s understanding of what it means to flatten the curve.
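
The idea in step 1 that the same number of cases can follow two different distributions can be illustrated with two normal curves. This is a hedged sketch, assuming Python's standard statistics.NormalDist and entirely made-up numbers (the total number of cases, the peak day, and both standard deviations are hypothetical and are not taken from the lecture transcript).

```python
from statistics import NormalDist

total_cases = 1000  # hypothetical; the same total for both curves
peak_day = 50       # hypothetical day on which both curves peak

peaked = NormalDist(mu=peak_day, sigma=5)      # clustered: small standard deviation
flattened = NormalDist(mu=peak_day, sigma=15)  # spread out: large standard deviation

# Height of each curve at its peak, scaled to the total number of cases.
peaked_height = total_cases * peaked.pdf(peak_day)
flattened_height = total_cases * flattened.pdf(peak_day)


def share_within(dist, k=4):
    """Share of all cases that fall within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


print(f"Peak height, clustered curve: {peaked_height:.3f} cases per day")
print(f"Peak height, flattened curve: {flattened_height:.3f} cases per day")
print(f"Share of cases covered, clustered: {share_within(peaked):.5f}")
print(f"Share of cases covered, flattened: {share_within(flattened):.5f}")
```

Both curves cover the same total number of cases; tripling the standard deviation only lowers the peak to one third of its height and stretches the cases over more days.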

Unit 5: Assignment #5 (due before 11:59 pm Central on WED JUL 3):

  1. Meet online with your small Chat Group for a one-hour text-based Group Chat at the time and date that your Chat Group previously arranged. BEFORE MEETING ONLINE:
    1. First, to learn why it’s important to double-check and swap-check your statistical analyses:
    2. Second, to learn what a model is in statistical thinking:
    3. Third, to cement your understanding of what a model is in statistical thinking, read Poldrack’s (2020) Chapter 5, “Fitting Models to Data.”
      • After reading Poldrack’s chapter, you should know that measures of central tendency (such as the mean, the median, and the mode) and measures of variability (such as the range, the variance, and the standard deviation) are statistical models: They represent a phenomenon. In this case, the phenomenon the models represent is the data set that the measures of central tendency and the measures of variability describe.
    4. Fourth, to understand how, in some cases, the mean (the average) is a poorer model of a set of data than is the range, read Harvard Business Review’s (2002) article, “The Flaw of Averages.”
    5. Fifth, to complete your preparation prior to your Group Chat, make sure that each Chat Group member still has access to the combined spreadsheet they created during Unit 3’s Group Chat, which should have been saved (in your chosen data management platform) with the filename, YourLastName_PSY-210_Unit03_Combined_FiveData.
      • Other than reading all the above and ensuring that each Chat Group member has access to the Combined_FiveData spreadsheet they created during Unit 3’s Group Chat, DO NOT begin working on any of the steps listed below until your Chat Group begins their one-hour Group Chat.
  2. DURING your one-hour Group Chat:
    1. First, each Chat Group member needs to work independently, using their chosen data management platform and their individual copy of the Combined_FiveData spreadsheet, to calculate the mean, the median, and the mode of your Chat Group’s combined data for Age (in years).
      • After each Chat Group member has finished computing these values, share, as a group, the values each Chat Group member observed.
      • If any Chat Group members observed different values, work, as a group, to resolve the discrepancy.
    2. Second, each Chat Group member needs to work independently, using their chosen data management platform and their individual copy of the Combined_FiveData spreadsheet, to calculate the mean, the median, and the mode of your Chat Group’s combined data for Height (in inches).
      • After each Chat Group member has finished computing these values, share, as a group, the values each Chat Group member observed.
      • If any Chat Group members observed different values, work, as a group, to resolve the discrepancy.
    3. Third, each Chat Group member needs to work independently, using their chosen data management platform and their individual copy of the Combined_FiveData spreadsheet, to calculate the range, the variance, and the standard deviation of your Chat Group’s combined data for Age (in years).
      • After each Chat Group member has finished computing these values, share, as a group, the values each Chat Group member observed.
      • If any Chat Group members observed different values, work, as a group, to resolve the discrepancy.
    4. Fourth, each Chat Group member needs to work independently, using their chosen data management platform and their individual copy of the Combined_FiveData spreadsheet, to calculate the range, the variance, and the standard deviation of your Chat Group’s combined data for Height (in inches).
      • After each Chat Group member has finished computing these values, share, as a group, the values each Chat Group member observed.
      • If any Chat Group members observed different values, work, as a group, to resolve the discrepancy.
    5. Fifth, discuss, as a group, the value of data-checking.
      • What value did you personally experience?
      • What value do you see for psychological scientists more generally?
    6. Sixth, discuss, as a group, the meaning of a model in statistical thinking.
      • Does the concept of a model in statistical thinking make sense to you?
      • Which model of your combined Age data does your Chat Group think is the most representative of your Chat Group’s data?
        • Is it the mean, the mode, the median, the range, the variance, or the standard deviation?
        • Why do you think that model is the most representative of your Age data?
      • Which model of the combined Height data does your Chat Group think is the most representative of your Chat Group’s data?
        • Is it the mean, the mode, the median, the range, the variance, or the standard deviation?
        • Why do you think that model is the most representative of your Chat Group’s Height data?
  3. AT THE END of your one-hour Group Chat:
    1. Nominate one member of your Chat Group (who participated in the Chat) to make a post on the Unit 5: Assignment #5 Discussion Board that summarizes your Group Chat in at least 200 words. This Chat Group member should not post their 200-word summary of your Group Chat until they have completed their Course Journal for the current Unit.
      • At the end of the 200-word summary, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
      • This Chat Group member also needs to report the following:
        • The mean Age in our Chat Group’s combined data = xx.xxx years
        • The median Age in our Chat Group’s combined data = xx.xxx years
        • The mode Age in our Chat Group’s combined data = xx years
    2. Nominate one member of your Chat Group (who participated in the Group Chat using the browser Chrome on their laptop, rather than on their mobile device) to save the Chat transcript, as described in the Course How To (under the topic, “How To Save and Attach a Chat Transcript”).
      • This member of the Chat Group needs to make a post on the Unit 5: Assignment #5 Discussion Board and attach the Chat transcript, saved as a PDF, to that Discussion Board post. This Chat Group member should not post the transcript of your Group Chat until they have completed their Course Journal for the current Unit.
        • In their Discussion Board post, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
      • Remember to attach the Chat transcript by clicking on the word “Attach.” (Do not click on the sidebar menu “Files.”)
      • This Chat Group member also needs to report the following:
        • The mean Height in our Chat Group’s combined data = xx.xxx inches
        • The median Height in our Chat Group’s combined data = xx.xxx inches
        • The mode Height in our Chat Group’s combined data = xx inches
    3. Nominate a third member of your Chat Group (who also participated in the Chat) to make another post on the Unit 5: Assignment #5 Discussion Board that states the name of your Chat Group, the names of the Chat Group members who participated in the Chat, the date of your Chat, and the start and stop time of your Group Chat. This Chat Group member should not post the names, date, and times of your Group Chat until they have completed their Course Journal for the current Unit.
      • In their Discussion Board post, this member needs to write this sentence, filling in the blanks: “I have completed my Course Journal for the current Unit. It contains ___ words and the two things I learned during this Unit about which I journaled are ___ and ___.”
      • This Chat Group member also needs to report the following:
        • The range of Age in our Chat Group’s combined data = xx.xxx years
        • The variance of Age in our Chat Group’s combined data = xx.xxx
        • The standard deviation of Age in our Chat Group’s combined data = xx.xxx years
        • The range of Height in our Chat Group’s combined data = xx.xxx inches
        • The variance of Height in our Chat Group’s combined data = xx.xxx
        • The standard deviation of Height in our Chat Group’s combined data = xx.xxx inches
    4. If only two students participated in the Chat, then one of those two students needs to do two of the above three tasks.
    5. Before ending the Group Chat, arrange the date and time for the Group Chat you will need to hold during the next Unit (Unit 6: Assignment #6).
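
The data-checking steps in the Group Chat (each member calculates independently, then the group compares values and resolves any discrepancy) can be sketched as follows. This is an illustration only: the member names, the reported values, and the helper find_discrepancies are all hypothetical.

```python
# Hypothetical values reported independently by three Chat Group members
# for the combined Age data (made up for illustration).
reported = {
    "Member 1": {"mean": 24.571, "median": 22.0, "mode": 21},
    "Member 2": {"mean": 24.571, "median": 22.0, "mode": 21},
    "Member 3": {"mean": 24.143, "median": 22.0, "mode": 21},
}


def find_discrepancies(reported, places=3):
    """Return each statistic on which members' values differ after rounding."""
    discrepancies = {}
    statistic_names = next(iter(reported.values())).keys()
    for name in statistic_names:
        values = {member: round(vals[name], places) for member, vals in reported.items()}
        if len(set(values.values())) > 1:
            discrepancies[name] = values
    return discrepancies


discrepancies = find_discrepancies(reported)
for name, values in discrepancies.items():
    print(f"Members disagree on the {name}: {values}")
if not discrepancies:
    print("All members observed the same values.")
```

Rounding to three decimal places before comparing matches the reporting format used throughout this Unit, so small differences in how platforms display values are not flagged as discrepancies.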

Congratulations, you have finished Unit 5! Onward to Unit 6!