Syllabus
Course
NANO 181/281 Data Science in Materials Science (4)
Description
To provide a comprehensive introduction to the application of data science to materials science.
Prerequisites
Consent of Instructor
Textbook, Required Materials
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition [Free]
- Python Data Science Handbook [Free]
Class/Laboratory Schedule
Two 80-minute lectures per week and three computational laboratory sessions
Course Topics
- Introduction to Data Science in Materials Science
- Python for Data Science
- Linear Methods for Regression
  - Ordinary least squares
  - Subset selection
  - Regularization and shrinkage
  - Derived input directions
- Extending Linear Methods
  - Transformations of inputs
  - Piecewise polynomials
  - Basis expansion
- Linear Methods for Classification
  - Discriminant analysis
  - Logistic regression
- Unsupervised Learning
  - Principal component analysis
  - K-means
  - Hierarchical and density-based clustering
- Kernel Regression
  - k-nearest neighbors
  - Kernel density estimation and classification
- Trees
  - Decision trees
  - Ensembles of trees
- Neural Networks
Course Objectives
- To provide students with a foundation in data science techniques, with practical examples drawn from materials science applications.
- To provide hands-on experience in the use of the Python programming language for data science.
- To inculcate best practices in developing and interpreting machine learning models for materials property predictions.
Methods of Evaluation
- Jupyter notebook reports of three hands-on laboratory sessions
Performance Criteria
Objective 1: (basic engineering knowledge and applications)
1.1 Understand the fundamentals of data science methods.
1.2 Understand the successes and limitations of each method, and the tradeoff between accuracy and cost.
1.3 Understand how to derive material and nanoscale properties from first-principles calculations.
Objective 2: (methods and problem solving)
2.1 Choose the most appropriate data science method for a particular application.
2.2 Manage and analyze large materials datasets effectively.
2.3 Apply best practices in constructing and evaluating machine learning models.