Second Semester
DATA WRANGLING
This course will teach students how to retrieve data from disparate sources, combine it into a unified format, and prepare it for effective analysis. This aspect of data science is often estimated to be upwards of 80% of the effort in a typical analytics process. Students will learn how to read data from a variety of common storage formats, evaluate its quality, and apply various techniques for data cleansing. Students will also learn how to select appropriate features for analysis, transform them into more usable formats, and engineer new features that serve as more powerful predictors. This class will also teach students how to split the data set into training and validation data for more effective analytical modeling.
BAYESIAN STATISTICS FOR DATA SCIENCE
This course introduces Bayesian statistical methods that enable data analysts and scientists to combine information from similar experiments, account for complex spatial, temporal, and other relationships, and also incorporate prior information or expert knowledge into a statistical analysis. This course explains the theory behind Bayesian methods and their practical applications, such as social network analysis, predicting crime risk, or predicting credit fraud. The course emphasizes data analysis through the use of modern analytic programming languages.
TIME SERIES ANALYSIS
This course introduces the theory and application of statistical methods for the analysis of data that have been observed over time. Students will learn techniques for working with time series data and how to account for the correlation that may exist between measurements that are separated by time. The class will concentrate on both univariate and multivariate time series analysis, with a balance between theory and applications. Students will complete a time series analysis project using a real-world scenario and data set.
SOFTWARE METHODS
Students learn the tools, methodology, and etiquette of software development, focusing on developing data science applications, tools, and analytical workflows in collaborative environments. Data scientists are at the nexus of software engineering, science, and business. In order to thrive in this world, they must work collaboratively across these fields and skill sets, while ensuring that work is accessible and digestible to everyone involved. Moreover, they must ensure their work is production-worthy and extensible. This course teaches the technical and conceptual elements needed to become a productive, helpful, and professional data scientist.
MACHINE LEARNING
This course introduces the theory and application of machine learning for uncovering patterns and relationships contained in large data sets. Machine learning algorithms offer a complementary set of analytical techniques to statistical methods. Students will be exposed to the theory underlying supervised and unsupervised learning methods. Practical application of these techniques will be introduced using R. Additionally, students will learn proper techniques for developing, training, and cross-validating predictive models, study the tradeoff between bias and variance, and explore the practical usage of these techniques in business and scientific environments.
ADVANCED MACHINE LEARNING
This course builds on the fundamentals introduced in ANLT 222 Machine Learning by examining more machine learning algorithms and neural network topologies and studying their respective applications. The course includes an overview of the TensorFlow framework, Decision Tree methods, and an introduction to Natural Language Processing (NLP).
WEEKLY HOT TOPICS
This course consists of a set of weekly presentations and discussions around key analytic issues and current case studies. These hot topics will be presented by a combination of guest speakers (industry luminaries in the area of analytics) and University of the Pacific faculty members, including the MS analytics program director. Many of these topics will be drawn from contemporary real-world analytics stories that reinforce specific elements of the academic content being taught, and so cannot be predicted in advance.