Course Syllabus

MATH 3080 Foundations of Data Science

  • Division: Natural Science and Math
  • Department: Mathematics
  • Credit/Time Requirement: Credit: 3; Lecture: 3; Lab: 0
  • Prerequisites: Math 1210 and (either Math 2040 or Math 3040) with a C or better in each course
  • Semesters Offered: Spring
  • Semester Approved: Fall 2020
  • Five-Year Review Semester: Fall 2025
  • End Semester: Spring 2026
  • Optimum Class Size: 20
  • Maximum Class Size: 25

Course Description

Students will get an introduction to Python programming, data analysis tools, and the statistics necessary to acquire, clean, analyze, explore, and visualize real-life data sets. Using statistics, students will learn to make data-driven inferences and decisions, and to communicate those results effectively.

Justification

Data collection and analysis are ubiquitous and are fast becoming prerequisites to economic success for businesses. This course provides a subset of the tools necessary to leverage data for prediction. This course will support the bachelor’s degree in software engineering by providing relevant mathematics coursework.

Student Learning Outcomes

  1. Students will acquire data through web scraping and data APIs.
  2. Students will clean and reshape messy datasets.
  3. Students will learn to use statistical software to deploy statistical methods including generalized linear regression, cluster analysis, and classification.
  4. Students will apply dimensionality reduction and perform basic analysis of network data.
  5. Students will evaluate outcomes, make decisions based on data, and effectively communicate those results.
  6. Students will understand and be able to apply the theoretical foundations underlying the methods applied throughout the course.

Course Content

This course will include an introduction to data analysis tools in Python; descriptive statistics; data structures with NumPy and pandas; introductory hypothesis testing and statistical inference; web scraping and data acquisition via APIs; generalized linear regression; classification methods, including logistic regression, k-nearest neighbors, decision trees, support vector machines, and neural networks; data visualization; clustering methods; dimensionality reduction, including principal component analysis; network analysis; rating, ranking, and elections; cleaning and reformatting messy datasets using regular expressions or dedicated tools such as OpenRefine; natural language processing; and the ethics of big data. This course supports a learning environment where perspectives are recognized, respected, and seen as a source of strength.