The Evans School and the eScience Institute offer two degree options for doctoral students in the Public Policy and Management (PPM) program who are interested in credentialed data science training.

The Data Science Option (DSO) leads to a Doctor of Philosophy (Public Policy and Management: Data Science). The Advanced Data Science Option (ADSO) leads to a Doctor of Philosophy (Public Policy and Management: Advanced Data Science).

To be eligible for the DSO/ADSO, a doctoral student must be a full-time Ph.D. student in the Public Policy and Management program at the Evans School, be in good academic standing, and have the approval of their faculty advisor and the faculty Ph.D. program director. Applicants to the ADSO must also show proof of prior training in computer science.

Data Science Option

  • Knowledge of theory and applications of machine learning, predictive analytics, and other sophisticated statistical techniques. 
  • Knowledge of a range of tools and processes used for managing and analyzing large and messy data. 
  • Knowledge of visualization methods and tools. 
  • Knowledge of ethical issues related to data science.
  • Opportunities to design and conduct data science dissertation projects that address important policy and management research questions.

Advanced Data Science Option

  • Knowledge of statistical theory, including frequentist and Bayesian techniques.
  • Knowledge of theory and applications of machine learning, predictive analytics, and other sophisticated statistical techniques.
  • Knowledge of a range of tools and processes used for managing and analyzing large and messy data.
  • Knowledge of visualization methods and tools.
  • Knowledge of ethical issues related to data science.
  • Opportunities to design and conduct data science dissertation projects that address important policy and management research questions.
  • Capacity to develop innovative data science techniques in the field of program evaluation and policy analysis.

Students admitted to the DSO/ADSO must still meet all the standard requirements of the PPM Ph.D. degree.

In addition, the DSO requires at least two quarters of CHEM E 599F eScience Community Seminar (1 credit per quarter) and satisfactory completion of 8 additional credits of coursework chosen from the classes listed in two of the following three core data science areas:

  • Software development for data science 
    • CSE 583 Software Development for Data Scientists (4 credits)
    • CHEM E 546 Software Engineering for Molecular Data Scientists (3 credits)
  • Statistics and machine learning
    • CSE 546 Machine Learning (4 credits)
    • CSE 416/STAT 416 Introduction to Machine Learning (4 credits)
    • STAT 527 Nonparametric Regression and Classification (3 credits)
    • STAT 509 Introduction to Mathematical Statistics (4 credits)
    • STAT 512/513 Statistical Inference (4 credits)
  • Data management and data visualization  
    • CSE 412 Introduction to Data Visualization (4 credits)
    • CSE 414 Introduction to Database Systems (4 credits) 
    • HCDE 411/511 Information for Visualization (5 credits)
    • INFO 474 Interactive Information Visualization (5 credits)  

The ADSO requires at least four quarters of CHEM E 599F eScience Community Seminar (1 credit per quarter) and satisfactory completion of three classes drawn from the following four areas:

  • Data Management: CSE 544 Principles of DBMS (4 credits) 
  • Machine Learning: CSE 546 Machine Learning (4 credits) or STAT 535 Statistical Learning: Modeling, Prediction, and Computing (3 credits). 
  • Data Visualization: CSE 512 Data Visualization (4 credits)
  • Statistics: STAT 509 Introduction to Mathematical Statistics: Econometrics I (5 credits) or STAT 512-513 Statistical Inference (4 credits each). 

CHEM E 599F eScience Community Seminar: The eScience Community Seminar is open to all. It offers an informal setting for presentations and discussions of research relevant to data science researchers across campus. The seminar meets in the new Data Science Studio, located in the Physics and Astronomy Building. Several times each quarter, the seminar is replaced by the UW Data Science Seminar, which features external speakers from other research institutions and from industry. In addition, at least one session per quarter is devoted to discussing ethical issues in big data and data science. Students periodically present their research, as do other data science researchers on campus.

CSE 412 Introduction to Data Visualization: Introduction to data visualization design and use for both data exploration and explanation. Methods for creating effective visualizations using principles from graphic design, psychology, and statistics. Topics include data models, visual encoding methods, data preparation, exploratory analysis, uncertainty, cartography, interaction techniques, visual perception, and evaluation methods.

CSE 414 Introduction to Database Systems: Introduces database management systems and writing applications that use such systems; data models, query languages, transactions, database tuning, data warehousing, parallelism. Intended for non-majors. 

CSE 416/STAT 416 Introduction to Machine Learning: Provides a practical introduction to machine learning. Modules include regression, classification, clustering, retrieval, recommender systems, and deep learning, with a focus on an intuitive understanding grounded in real-world applications. Intelligent applications are designed and used to make predictions on large, complex datasets. Intended for non-majors. Prerequisites: CSE 143 or CSE 160; and STAT 311 or STAT 390. Offered jointly with STAT 416.

CSE 512 Data Visualization (4 credits): Techniques and algorithms for creating effective visual displays of information based on principles from graphic design, perceptual psychology, cognitive science and statistics. Topics include data and image models, visual encoding methods, graphical perception, color, animation, interaction techniques, graph layout, and automated design. Methods of presenting complex information to enhance comprehension and analysis. Incorporating visualization techniques into human-computer interfaces.  

CSE 544 Principles of DBMS: This course covers the principles of data management, including how to use a data management system effectively (locally or in a public cloud), how to build applications on top of such a system, and how to build the internals of such a system. Detailed topics include the relational data model, relational algebra, and SQL; query execution and optimization; transaction processing and recovery; views; data integration, ETL, OLAP, and warehousing; big data management and analytics; parallel databases (shared-nothing architectures, parallel query processing, fault tolerance, skew); modern systems (main-memory databases, key-value stores, NoSQL, column-oriented databases); non-relational data models (key-value, trees, graphs, arrays, streams); and general principles of installing and tuning a database system and of using a data management system in a public cloud.

CSE 546 Machine Learning (4 credits) or STAT 535 Statistical Learning: Modeling, Prediction, and Computing (3 credits): Practical methods for identifying valid, novel, useful, and understandable patterns in data. Basic statistics. Induction of predictive models from data: classification, regression, probability estimation. Discovery of clusters and association rules. Methods for designing systems that learn from data and improve with experience. Supervised learning and predictive modeling: decision trees, rule induction, nearest neighbors, Bayesian methods, neural networks, support vector machines, and model ensembles. Unsupervised learning and clustering. Emphasis will be on the ability to use and understand various machine learning methods, rather than an in-depth study of theoretical considerations.  

CSE 583 Software Development for Data Scientists: Provides students outside of CSE with a practical knowledge of software development that is sufficient to do graduate work in their discipline. Modules include Python basics, software version control, software design, and using Python for machine learning and visualization.  

HCDE 411/511 Information for Visualization: This course covers the factors that contribute to successful visualizations and focuses on how to present information clearly and effectively. Specific topics include the design and presentation of digital information; the use of graphics, animation, sound, visualization software, and hypermedia in presenting information to the user; vision and perception; methods of presenting complex information to enhance comprehension and analysis; and the incorporation of visualization techniques into human-computer interfaces.

INFO 474 Interactive Information Visualization: Techniques and theory for visualizing, analyzing, and supporting interaction with structured data like numbers, text, and relations. Provides practical experience designing and building interactive visualizations for the web. Exposes students to cognitive science, statistics, and perceptual psychology. An empirical approach will be used to design and evaluate visualizations. 

STAT 509 Introduction to Mathematical Statistics: Econometrics I (5 credits): Examines methods, tools, and theory of mathematical statistics. Covers probability densities, transformations, moment generating functions, and conditional expectation; Bayesian analysis with conjugate priors; hypothesis tests and the Neyman-Pearson lemma; likelihood ratio tests, confidence intervals, and maximum likelihood estimation; and the central limit theorem, Slutsky's theorems, and the delta method.

STAT 512-513 Statistical Inference: STAT 512-513 covers much of the same material as STAT 509, but with a more rigorous treatment and greater depth on some of the theoretical concepts. It does not cover Bayesian statistical inference (the Bayesian analysis with conjugate priors included in STAT 509).

STAT 527 Nonparametric Regression and Classification: Covers techniques for smoothing and classification including spline models, kernel methods, generalized additive models, and the averaging of multiple models. Describes measures of predictive performance, along with methods for balancing bias and variance. Prerequisite: either STAT 502 and STAT 504 or BIOST 514 and BIOST 515.