NANOx81 - Data Science in Materials Science

Welcome to the UCSD Course NANO 181/281 (“x81”) - Data Science in Materials Science. NANOx81 is a co-scheduled course for undergraduate and graduate students. The aim is to provide a comprehensive introduction to the application of data science to materials science. The full syllabus is available here.

Approximate Course Schedule

Week      Description
1         Course Admin
2         Introduction to Data Science, Python and Data Wrangling
3         Data Science in Materials Science
4         Lab 1 (Introduction to Python for Data Science and Data Wrangling)
5         Linear Methods
6         Unsupervised Learning & Kernel Methods
7         Lab 2 (Linear methods and clustering for materials science)
8         Trees and Neural Networks
9 & 10    Lab 3 (Kaggle competition)

Lecture materials

Slides

  1. Course Admin YouTube Video
  2. Python for Data Science YouTube Video
  3. Data Science in Materials Science YouTube Video
  4. Linear Methods YouTube Video
  5. Improving and extending linear models YouTube Video
  6. Linear Classification YouTube Video
  7. Unsupervised Learning YouTube Video
  8. Kernel Regression YouTube Video
  9. Generalized Additive Models and Trees YouTube Video
  10. Neural Networks YouTube Video

Jupyter Notebooks

In-lecture demos will be conducted using Jupyter notebooks, available here.

Course textbooks

The course is intended to be self-contained and all textbooks are optional. However, the following are useful to have around:

  1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Buy from Amazon or get the free online version.
  2. Python Data Science Handbook. Buy from Amazon or get the free online version.

Labs

There are three lab sessions. The instructions for the first two labs are available via the menu on the left. The final lab will be an open problem that will be determined at a later date and will be held via a Kaggle competition.

Programming language

All lectures and labs will be conducted in Python 3.9+.
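If you are unsure which Python version your environment provides, a quick check along the following lines may help. This is a minimal sketch: the helper name `python_is_supported` is hypothetical and not part of the course materials, and the only requirement it encodes is the 3.9 minimum stated above.

```python
import sys

# Minimum version stated for the course (Python 3.9+).
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum.

    Hypothetical helper for illustration only.
    """
    return tuple(version_info[:2]) >= minimum


# Report the interpreter that is running this code.
current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print(f"Python {current} meets the course requirement.")
else:
    print(f"Python {current} is older than 3.9; please upgrade.")
```

The same check works in a Google Colab cell or in a local interpreter.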

For most students, especially those who are new to Python, you can simply use the Google Colab cloud service to run all lecture notebooks and do all labs. The advantage of using Google Colab is that you do not need to bother with installing Python and the necessary libraries on your local machine.

The main disadvantage of Google Colab is that you have to work in the cloud, and the compute resources provided are often slower than your laptop or a high-performance computing system of your choosing.

For serious work, you can follow the instructions provided to install Python and the necessary libraries for this course.
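If a notebook needs to behave differently in Google Colab and on a local installation, one commonly used idiom is to check whether the `google.colab` module has been loaded. The sketch below assumes that Colab loads this module when a session starts, which is typical but not guaranteed; the function name `running_in_colab` is hypothetical.

```python
import sys


def running_in_colab(modules=None):
    """Return True if the google.colab module appears to be loaded.

    Assumption: Colab sessions load `google.colab` at startup, so its
    presence in sys.modules is used as a heuristic. Hypothetical helper.
    """
    modules = sys.modules if modules is None else modules
    return "google.colab" in modules


if running_in_colab():
    print("Running in Google Colab.")
else:
    print("Running locally or in another environment.")
```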

Using Google Colab

  1. Go to Google Colab. Sign in with your Google account (preferably your UCSD one).
  2. If you are working on a lab or creating a new notebook for your own work, close the dialog box that appears and select File->New Notebook from the menu.
  3. If you want to work with the lecture examples, select File->Open Notebook from the menu.
  4. Select the GitHub tab.
  5. Enter materialsvirtuallab into the GitHub organization field and click the magnifying glass.
  6. Under Repository, select the materialsvirtuallab/nano281 repository.
  7. Click on any of the notebooks to open them.
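As an alternative to the menu steps above, Colab can also open a GitHub-hosted notebook directly from a URL of the form `https://colab.research.google.com/github/<organization>/<repository>/blob/<branch>/<path>`. The sketch below builds such a URL. The function name `colab_url`, the default branch `main`, and the notebook path are placeholders for illustration; check the repository for the actual branch and file names.

```python
from urllib.parse import quote

COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url(org, repo, notebook_path, branch="main"):
    """Build a Colab URL for a notebook hosted on GitHub.

    `branch` and `notebook_path` are placeholders; use the values
    from the actual repository. Hypothetical helper.
    """
    # Percent-encode spaces and other special characters, keeping "/".
    path = quote(notebook_path.lstrip("/"))
    return f"{COLAB_GITHUB_PREFIX}/{org}/{repo}/blob/{branch}/{path}"


# The notebook path below is a placeholder, not a real file name.
print(colab_url("materialsvirtuallab", "nano281", "path/to/notebook.ipynb"))
```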



Copyright © 2019-2023 Shyue Ping Ong, Materials Virtual Lab