NANOx81 - Data Science in Materials Science
Welcome to the UCSD course NANO 181/281 (“x81”) - Data Science in Materials Science. NANOx81 is a co-scheduled course for both undergraduate and graduate students. The aim is to provide a comprehensive introduction to the application of data science to materials science. The full syllabus is available here.
Approximate Course Schedule
Week | Description |
---|---|
1 | Course Admin |
2 | Introduction to Data Science, Python and Data Wrangling |
3 | Data Science in Materials Science |
4 | Lab 1 (Introduction to Python for Data Science and Data Wrangling) |
5 | Linear Methods |
6 | Unsupervised Learning & Kernel Methods |
7 | Lab 2 (Linear methods and clustering for materials science) |
8 | Trees and Neural Networks |
9 & 10 | Lab 3 (Kaggle competition) |
Lecture materials
Slides
- Course Admin
- Python for Data Science
- Data Science in Materials Science
- Linear Methods
- Improving and extending linear models
- Linear Classification
- Unsupervised Learning
- Kernel Regression
- Generalized Additive Models and Trees
- Neural Networks
Jupyter Notebooks
In-lecture demos will be conducted using Jupyter notebooks, available here.
Course textbooks
The course is intended to be self-contained and all textbooks are optional. However, the following are useful to have around:
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Buy from Amazon or get the free online version.
- Python Data Science Handbook. Buy from Amazon or get the free online version.
Labs
There are three lab sessions. Instructions for the first two labs are available via the menu on the left. The final lab is an open problem, to be determined at a later date, and will be run as a Kaggle competition.
Programming language
All lectures and labs will be conducted in Python 3.9+.
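To confirm that an environment meets this requirement, a quick check along the following lines may help. This is a minimal sketch: the helper name `meets_minimum` is made up for illustration and is not part of the course materials.

```python
import sys

# Minimum version stated for this course.
MIN_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper written for this example.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


current = f"{sys.version_info.major}.{sys.version_info.minor}"
if meets_minimum():
    print(f"Python {current} satisfies the 3.9+ requirement.")
else:
    print(f"Python {current} is older than 3.9; consider upgrading.")
```

The same cell can be run in a Colab notebook or in a local interpreter.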
Most students, especially those who are new to Python, can simply use the Google Colab cloud service to run all lecture notebooks and do all labs. The advantage of using Google Colab is that you do not need to bother with installing Python and the necessary libraries on your local machine.
The main disadvantage of Google Colab is that you have to work in the cloud, and the compute resources provided are often slower than your own laptop or a high-performance computing system of your choosing.
For serious work, you can follow the instructions provided to install Python and the necessary libraries for this course.
Using Google Colab
- Go to Google Colab. Sign in with your Google account (preferably your UCSD one).
- If you are working on a lab or creating a new notebook for your own work, exit the textbox and select `File->New Notebook` from the menu.
- If you want to work with the lecture examples, select `File->Open Notebook` from the menu.
- Select the `GitHub` tab.
- Enter `materialsvirtuallab` into the GitHub organization field and click the magnifying glass.
- Under `Repository`, select the `materialsvirtuallab/nano281` repository.
- Click on any of the notebooks to open them.
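As an alternative to the menu steps above, Colab can also open a GitHub-hosted notebook from a direct URL of the form `https://colab.research.google.com/github/<org>/<repo>/blob/<branch>/<path>`. The sketch below builds such a link. The helper `colab_url`, the branch name, and the notebook path are placeholders assumed for illustration; check the repository for the actual branch and file names.

```python
from urllib.parse import quote

COLAB_GITHUB_BASE = "https://colab.research.google.com/github"


def colab_url(org, repo, notebook_path, branch="master"):
    """Build a Colab link for a notebook hosted on GitHub.

    `colab_url` is a hypothetical helper; `branch` defaults to "master"
    only as an assumption and may differ for the actual repository.
    """
    # Percent-encode spaces and other special characters, keeping "/" intact.
    path = quote(notebook_path)
    return f"{COLAB_GITHUB_BASE}/{org}/{repo}/blob/{branch}/{path}"


# "lectures/example.ipynb" is a placeholder path, not a real file name.
print(colab_url("materialsvirtuallab", "nano281", "lectures/example.ipynb"))
```

Pasting the printed link into a browser should open the notebook in Colab, provided the branch and path exist.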