Data Mining and Analysis

1.     Choose a health topic, disease, or trend that you want to learn more about. Note that this topic and its data will be used for other assignments throughout the semester.

2.      Use one of the websites covered in Chapter 1, or any other website you like, to find your data. Data is everywhere, so if you find it, use it.

a.       CDC

b.      FDA/USDA

c.       Census

d.      Baltimore City Health Department

e.       Maryland Department of Health

f.        Any website you find interesting

3.      Enter your data in Excel

a.       You need a minimum sample of 50 to complete this assignment

b.      Example: if you choose cancer, you will need at least 50 cases of cancer

4.      Create a data dictionary on the second sheet of your Excel file

a.       A data dictionary defines the variables in your data set

5.      You must include the variables below in your data set. Feel free to add any other variables that may be useful to you.

a.       Gender

i.      Female

ii.      Male

iii.      Other

b.      Ethnicity

c.       Age

d.      Employment

e.       Education

f.        State

g.      County

6.      For each variable, report the following:

a.       Number (count)

b.      Percent

c.       Rate (if applicable)
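
The count, percent, and rate in step 6 are meant to be computed in Excel, but the arithmetic can be sketched in Python as a cross-check. This is only an illustration under stated assumptions: the Gender values and the population figure are made up, the function names `count_and_percent` and `rate_per` are hypothetical, and the rate is expressed per 100,000 people as one common convention.

```python
from collections import Counter


def count_and_percent(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (n, round(100 * n / total, 1))
        for category, n in counts.items()
    }


def rate_per(cases, population, per=100_000):
    """Return the number of cases per `per` people (per 100,000 by default)."""
    return cases * per / population


# Made-up sample of 50 records, for illustration only.
gender = ["Female"] * 28 + ["Male"] * 20 + ["Other"] * 2

summary = count_and_percent(gender)
for category, (n, pct) in summary.items():
    print(f"{category}: n={n}, percent={pct}%")

# Hypothetical figures: 50 cases in a population of 250,000 people.
print(rate_per(50, 250_000))
```

The percent for each category is its count divided by the total number of records, times 100. A rate needs a population denominator, which is why it applies only to variables where such a figure is available.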

Assignment-Data Mining and Analysis

Use the data you downloaded and organized in Part 1 to complete the following analyses in Microsoft Excel. The professor has provided instructional videos and tutorials to help you complete this assignment. Your rubric is attached.

For this assignment, you are only conducting data analysis. You will interpret the results in the next assignment.

Directions: Use one Excel file to complete all analyses, and complete each analysis on a different sheet. Note: Excel files show their sheets as tabs at the bottom of the window. Rename each sheet with the title of the analysis it contains (for example, change Sheet 1 to Independent T-Test).

Conduct the following statistical analyses in Excel

  1. Independent T-Test
  2. T-test
  3. Pearson’s Correlation Test
  4. ANOVA
  5. Chi-Square Test
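
To show what several of these tests compute, here is a minimal Python sketch of the test statistics only (no p-values). It is not a substitute for the Excel analysis the assignment requires. Assumptions: all numbers are made up, the function names are hypothetical, the independent t-test uses the pooled (equal-variance) form, and the ANOVA is one-way. The unspecified "T-test" in item 2 is left out because the list does not say which variant is meant.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def chi_square(observed):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    return stat


# Made-up numbers, for illustration only.
t_stat = independent_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
chi2 = chi_square([[10, 20], [20, 10]])
print(t_stat, r, f_stat, chi2)
```

Each function returns only the statistic. Judging significance also requires the degrees of freedom and a p-value, which Excel reports in its test output.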

Assignment-Data Mining and Analysis Rubric

Each criterion below lists three performance levels, from highest to lowest.

Organization & Formatting

a.       Information is very well organized by using appropriate variables and labels, as well as the appropriate advanced formatting, including shading, alignment tools, borders, special fonts, and appropriate column/row height and width.

b.      Information is mostly organized by using appropriate variables and labels and the appropriate formatting.

c.       Some information is organized, using standard formatting tools. Some labels or other important formatting tools are missing.

Computations & Formulas

a.       All required data is entered correctly. 100% use of correct applicable formulas as required.

b.      Most required data is entered correctly. 100% use of correct applicable formulas as required.

c.       Several errors. Some required data may be missing. Incorrect use of applicable formulas as required.

Directions

a.       Followed all directions accurately and fully completed the assignment.

b.      Followed most of the directions accurately and fully completed the assignment.

c.       Did not follow any of the directions, or followed some directions but missed many of the details required.

Data Entry

a.       Successfully entered all data correctly.

b.      Entered almost all data correctly, with minor to no corrections needed.

c.       Data entered, but major corrections needed. Several errors noted. Data missing.

