2021 Summer boot camp

July 12-15, 2021

Overview

The CAES Summer Boot Camp in Data Science is a virtual crash course, held July 12-15, that helps researchers and students rapidly adopt data science tools in their research. The boot camp invites data science experts from within and outside CAES institutions and industry to give tutorials and presentations ranging from data science basics to applications. Topics include the data mining process, scientific visualization, machine learning, and industrial applications. The boot camp also hosts a research discussion panel where participants can share experiences and establish collaborations.

The boot camp is open to students, faculty, and researchers interested in using data science tools in their research. No prior knowledge of the tools to be presented is needed.

Event Materials

Agenda

Videos and PowerPoint slides will be posted after the event.

Registration Instructions

Go here to register. Closer to the workshop dates, organizers will send meeting links for online participation.

To give all participants the best learning environment, by registering you agree to attend the sessions you sign up for. If you cannot attend, please email researchcomputing@boisestate.edu so we can open the seat for someone else.

Agenda

The following workshops will occur remotely on the listed dates. Please register only for the workshops you plan to attend, and register for as much or as little of the program as your schedule allows.

 

Date | Time (MST) | Instructor(s) | Institution(s) | Tutorial
July 12 | 9am-noon | Randall Reese | Idaho National Laboratory | RShiny and PyFlask: Building Interactive Web Apps for Data
July 12 | 1pm-5pm | Leslie Kerby | Idaho State University | NumPy, Pandas, and Scikit-Learn: Prediction with Decision Trees
July 13 | 9am-noon | Steve Cutchin | Boise State University | Visualization with ParaView
July 13 | 1pm-2pm | Roba Binyahib | University of Oregon | TBD
July 13 | 2pm-5pm | Eric Brugger | Lawrence Livermore National Laboratory | TBD
July 14 | 9am-10:15am | Benjamin Afflerbach | University of Wisconsin-Madison | An Introduction to Machine Learning for Materials Science: A Basic Workflow for Predicting Materials Properties
July 14 | 10:15am-noon | Ryan Jacobs | University of Wisconsin-Madison | The Materials Simulation Toolkit for Machine Learning (MAST-ML): Automating Development and Evaluation of Machine Learning Models for Materials Property Prediction
July 14 | 1pm-2pm | Mahmood Mamivand | Boise State University | The Informatics Skunkworks: A Program for Undergraduate Research at the Interface of Data Science and Materials Science and Engineering
July 14 | 2pm-5pm | Paul Bodily | Idaho State University | Crossing Darwin and Computer Science: The Staying Power of Evolutionary Algorithms
July 15 | 9am-noon | Sara Ewing, Matty Jones, Conrad Kennington | Kount | A Day in the Life of a Kount Data Scientist
July 15 | 1pm-3pm | Local Industry Expert(s) | TBD | Industry Applications of Artificial Intelligence / Machine Learning
July 15 | 3pm-5pm | Various | Various | Data Science Research Discussion Panel: Tools Applications, Networking, and Collaboration

Workshop Descriptions and Materials

RShiny and PyFlask: Building Interactive Web Apps for Data. Building data dashboards accessible via the Internet is an excellent way to allow users to interface with data using tools they are already familiar with. This tutorial will teach participants how to build their own web-based data dashboards using RShiny (in R) and Flask (in Python). We will begin with building a simple example of each tool, then move to advanced applications of these packages.

NumPy, Pandas, and Scikit-Learn: Prediction with Decision Trees. NumPy and pandas are the foundation of the Python data science stack: most Python machine learning libraries, including scikit-learn, build on their objects and data structures. Come learn the basics of NumPy and pandas, and learn how to build and train decision trees for classification.
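For a flavor of what this session covers, here is a minimal sketch of training a decision tree classifier on data held in pandas objects. It is not taken from the tutorial materials: the choice of scikit-learn's built-in iris dataset, the tree depth, and the 75/25 split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset (an illustrative choice, not the tutorial's data).
# as_frame=True returns the features as a pandas DataFrame and labels as a Series.
iris = load_iris(as_frame=True)
X: pd.DataFrame = iris.data
y: pd.Series = iris.target

# pandas objects convert directly to NumPy arrays.
assert isinstance(X.to_numpy(), np.ndarray)

# Hold out a quarter of the rows to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A shallow tree (max_depth=3 is an arbitrary choice) keeps the model easy to inspect.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```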

An Introduction to Machine Learning for Materials Science: A Basic Workflow for Predicting Materials Properties. This tutorial will introduce core concepts of machine learning through the lens of a basic workflow to predict material bandgaps from material compositions. As we progress through this workflow we will highlight key steps, challenges that can come up with materials data, and potential solutions to these challenges. The core workflow we will introduce includes Data Cleaning, Feature Generation, Feature Engineering, Establishing Model Assessment, Training a Default Model, Hyperparameter Optimization, and Making Predictions. By the end of the tutorial I hope that you will have a better understanding of these core concepts, and how they can all fit together. If you want to preview the materials ahead of time you can find them on Nanohub here: https://nanohub.org/tools/intromllab
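The workflow steps named above can be sketched roughly as follows. This is not the Nanohub tutorial's code: the data are synthetic stand-ins (not real bandgaps or compositions), and the column names, random forest model, and hyperparameter grid are all hypothetical choices made to keep the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT real materials data): two composition-like
# fractions and a made-up target property with a little noise.
n = 200
df = pd.DataFrame({"frac_a": rng.uniform(0, 1, n), "frac_b": rng.uniform(0, 1, n)})
df["target"] = 2.0 * df["frac_a"] - df["frac_b"] + rng.normal(0, 0.05, n)
df.loc[::25, "target"] = np.nan  # simulate missing measurements

# Data cleaning: drop rows with no measured target.
clean = df.dropna(subset=["target"]).copy()

# Feature generation: derive an extra (hypothetical) interaction feature.
clean["a_times_b"] = clean["frac_a"] * clean["frac_b"]
features = ["frac_a", "frac_b", "a_times_b"]

# Establishing model assessment: hold out a test set the model never trains on.
X_train, X_test, y_train, y_test = train_test_split(
    clean[features], clean["target"], test_size=0.2, random_state=0
)

# Training a default model: library defaults, no tuning.
default_model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
default_r2 = default_model.score(X_test, y_test)

# Hyperparameter optimization: cross-validated grid search on the training set.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=5,
)
search.fit(X_train, y_train)

# Making predictions with the best model found by the search.
predictions = search.predict(X_test)
print(f"default R^2: {default_r2:.2f}, best params: {search.best_params_}")
```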

The Materials Simulation Toolkit for Machine Learning (MAST-ML): Automating Development and Evaluation of Machine Learning Models for Materials Property Prediction. This tutorial introduces the Materials Simulation Toolkit for Machine Learning (MAST-ML), a Python package designed to broaden and accelerate the use of machine learning and data science methods for materials property prediction. Through hands-on activities, we will use MAST-ML to (1) import materials datasets from online databases and clean and examine our input data, (2) conduct feature engineering analysis, including generation, preprocessing, and selection of features, (3) construct, evaluate, and compare the performance of different model types and data splitting techniques, and (4) conduct a preliminary assessment of model error analysis and uncertainty quantification (UQ). MAST-ML Tool GitHub page: https://github.com/uw-cmg/MAST-ML

The Informatics Skunkworks: A Program for Undergraduate Research at the Interface of Data Science and Materials Science and Engineering. In this presentation, I will go over the new infrastructure and ecosystem that we are developing for the engagement and training of undergraduate students (UGs) in research at the interface of data science and materials science and engineering, with a focus on the use of applied machine learning (ML) in materials informatics. I will describe the resources that we have developed to lower barriers to starting research projects, including (a) a curriculum to train UGs in relevant data science and materials informatics, (b) software tools that augment existing ML packages to be UG accessible, and (c) authentic and appropriate-level research problems.

Crossing Darwin and Computer Science: The Staying Power of Evolutionary Algorithms. Beyond revolutionizing our views on life and the world in which we live, Darwin's theory of evolution has been the basis and ongoing inspiration for an entire branch of machine learning. Evolutionary algorithms are frequently used to tackle some of Computer Science's most nefarious challenges (the notorious NP-complete problems) in applications as varied as mirrors designed to funnel sunlight to a solar collector, antennae designed to pick up radio signals in space, walking methods for computer figures, and optimal design of aerodynamic bodies in complex flowfields. In this tutorial, Dr. Bodily will lay out the theory behind genetic algorithms, illustrate several applied examples of genetic algorithms from his research and other real-world applications, and involve participants in an interactive, live-coding demo to implement a genetic algorithm that can be repurposed for a variety of applications. Come prepared for a fun, engaging, rewarding learning experience!
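As a taste of the technique, below is a minimal genetic algorithm sketch in plain Python. It is not the code from the live demo: the toy objective (maximize the number of 1s in a bit string, often called OneMax), the function names, and every parameter value are assumptions chosen for illustration.

```python
import random


def fitness(bits):
    """Toy objective (an assumption): count the 1s in the bit string."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Selection: pick k random individuals and keep the fittest."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def evolve(n_bits=20, pop_size=30, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best forward unchanged so the best
        # fitness never decreases from one generation to the next.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng), tournament(population, rng), rng)
            children.append(mutate(child, rng, rate))
        population = children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

To repurpose the sketch for another problem, swap in a different `fitness` function and, if the candidate solutions are not bit strings, matching `crossover` and `mutate` operators.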

Data Science Research Discussion Panel: Tools Applications, Networking, and Collaboration. Join us to hear our presenters share their experience with the data science tools they use in their research, their plans for future projects and grants, and how they recommend students continue growing their skills in these areas.

Presenter Bios

Randall Reese, Ph.D. Randall Reese is currently a data scientist at Idaho National Laboratory. He holds bachelor's and master's degrees in mathematics and a Ph.D. in statistics, with an emphasis in computational statistics. Formerly from Missoula, Montana, he now resides in Idaho Falls, Idaho.

Leslie Kerby, Ph.D., M.B.A. Leslie Kerby is the director of Computational Engineering And Data Science (CEADS). Research interests are interdisciplinary and include computational science, data science, and nuclear science and engineering.

Steve Cutchin, Ph.D. Steve Cutchin is the director of Research Computing at Boise State and a faculty member in the Computer Science Department. His research interests include scientific data visualization, immersive environments, and serious games.

Benjamin Afflerbach. Benjamin Afflerbach is a graduate student in the Department of Materials Science and Engineering at the University of Wisconsin-Madison. His work has focused on machine learning predictions of metallic glass forming ability.

Ryan Jacobs, Ph.D. Ryan Jacobs is a Research Scientist with the Department of Materials Science and Engineering, University of Wisconsin-Madison. His work focuses on using atomistic modeling and machine learning to understand the structure and properties of materials at the atomic scale, with a particular focus on the discovery and engineering of novel material compounds.

Mahmood Mamivand, Ph.D. Mahmood Mamivand is an assistant professor in the Department of Mechanical and Biomedical Engineering at Boise State. Dr. Mamivand's research lies at the intersection of Computational Materials Science and Materials Informatics, with a particular focus on microstructure-mediated materials design.

Paul Bodily, Ph.D. Paul Bodily is an assistant professor in the Computer Science Department and head of the Computational Creativity and Intelligence Lab (CCIL) at Idaho State University. His research addresses the question of whether computers, beyond possessing artificial intelligence, can exhibit autonomous creativity. His primary research interest is lyrical music composition and the challenge of invoking long-term structure in sequence generation.

Sara Ewing, Matty Jones, Ph.D., and Conrad Kennington are data scientists at Kount. They will present A Day in the Life of a Kount Data Scientist, using Jupyter notebooks on Google Colab to share representative workflows from their team.

Organization Committee:

Lan Li (BSU)
Leslie Kerby (ISU)
Eric Jankowski (BSU)
Steven Cutchin (BSU)
Mahmood Mamivand (BSU)
Mendi Edgar (BSU)
Lawrence Spear (BSU)
Hillary K. Fishler (CAES)