About ML101

Course description

Goal

This course covers the fundamentals of machine learning, including supervised and unsupervised learning algorithms, regression, classification, and clustering. The course may also cover topics such as model evaluation, feature selection, and regularization.

In supervised learning, students learn about linear and logistic regression, as well as other algorithms such as Naive Bayes, decision trees, random forests, and k-nearest neighbors (kNN). They learn how to train models on a labeled dataset and use them to make predictions on new data.

In unsupervised learning, students learn about clustering algorithms such as k-means, as well as association rule mining with the Apriori algorithm. They learn how to extract meaningful structure from unlabeled data.

The course may also cover advanced topics such as natural language processing. Students learn how to implement and apply these algorithms in R.

Throughout the course, students work on practical projects and assignments to apply the concepts they have learned. By the end of the course, students should have a solid understanding of the basics of machine learning and be able to apply these concepts to real-world problems.

Syllabus


  • Week 1: Introduction to Data Science and R

  • Week 2: About ML & Modelling

  • Week 3: Public Holiday


PART I: Classification

  • Week 4: Decision Tree

  • Week 5: Random Forest

  • Week 6: Naive Bayes

  • Week 7: kNN

  • Week 8: QZ #1


PART II: Regression

  • Week 9: Linear regression

  • Week 10: Non-linear regression


PART III: Unsupervised Learning

  • Week 11: Clustering

  • Week 12: Apriori


PART IV: Model Improvement

  • Week 13: Performance Evaluation

  • Week 14: Wrap-up

  • Week 15: QZ #2

  • Week 16: Project Presentation

Weekly Design

Week | Date | Pre-class | Class | PBL | Note
1 | 09/03/2024 | - | Course intro | - | Ars Electronica participation (recorded lecture)
2 | 09/10/2024 | Install R & RStudio; About ML & Modelling | Practice | - | -
3 | 09/17/2024 | - | - | - | Thanksgiving (Chuseok) holiday
4 | 09/24/2024 | Classification: Decision Tree | Practice | Problem description | -
5 | 10/01/2024 | Classification: Random Forest | Practice | Data introduction | -
6 | 10/08/2024 | Classification: Naive Bayes | Practice | Team arrangement | -
7 | 10/15/2024 | Classification: kNN | Practice | Team meeting #1 | -
8 | 10/22/2024 | Regression: Linear regression | Practice | Team meeting #2 | -
9 | 10/29/2024 | - | QZ #1 | Team meeting #3 | -
10 | 11/05/2024 | Regression: Non-linear regression | Practice | Team meeting #4 | -
11 | 11/12/2024 | Unsupervised learning: Clustering | Practice | Team meeting #5 | -
12 | 11/19/2024 | Unsupervised learning: Apriori | Practice | Team meeting #6 | -
13 | 11/26/2024 | Model improvement | Practice | Team meeting #7 | -
14 | 12/03/2024 | Text mining & other skills | Practice | Team consulting #1 | -
15 | 12/10/2024 | - | QZ #2 | Team consulting #2 | -
16 | 12/17/2024 | - | Project report | Project presentation | -


Course management


  • Lecturer: Changjun Lee (Associate Professor, SKKU School of Convergence)

    • changjunlee@skku.edu
  • TA: Haeyoon Lee (Ph.D. student, SKKU Interaction Science)

    • haileysunny@naver.com
  • Time:

    • (1h): Flipped learning content

    • (2h): Tue 09:00 ~ 10:50

  • Location: Active Learning Classroom, Room 70527, 중앙학술정보관 (Central Library)


The course consists of three components: Pre-class, Class, and the PBL project

  • Pre-class

    • Students are required to watch the lecturer's recorded lecture (or other assigned videos) before the offline class (or online Zoom session) and study the material on their own

    • The videos cover the concepts behind the ML algorithms

    • Students may occasionally be required to submit discussion posts so that their level of understanding can be checked

  • Class

    • The lecturer summarizes the pre-class lecture and explains it in more detail

      • The lecturer asks students about the pre-class content to check whether they studied it

      • It is OK to answer incorrectly, but being unable to answer at all will be reflected in your pre-class discussion score.

    • Students practice with more advanced code

    • Quizzes are held in class to check students' level of understanding

  • PBL project

    • Students form teams that meet the following conditions:

      • 4 to 5 members per team

      • Background diversity: a team should not consist entirely of students from the same major

      • Exception: allowed if the team can give sufficient and convincing reasons

    • Datasets will be provided. Each team chooses the data it wants to explore based on its interests

    • Teams can request a Zoom meeting with the lecturer if needed


Final outputs (an example; not limited to the following)

  • Preparing (or collecting) data

  • Exploring the data (descriptive statistics)

  • Setting hypotheses (or research questions)

  • Modeling

  • Evaluating (scoring) the models

  • Drawing implications from your findings


Textbooks for the course

  • R4DS: R for Data Science (written by Hadley Wickham and Garrett Grolemund)
    • An excellent resource for learning data science with R, covering data manipulation, visualization, and modeling. The book is available free online.
  • RC2E: R Cookbook, 2nd Edition (written by JD Long and Paul Teetor)
    • A comprehensive resource for data scientists, statisticians, and programmers who want to explore R's capabilities for data analysis and visualization.
  • RGC: R Graphics Cookbook (written by Winston Chang)
    • A practical guide with more than 150 recipes for generating high-quality graphs quickly, without having to comb through all the details of R's graphing systems.
  • MDR: Statistical Inference via Data Science (ModernDive) (written by Chester Ismay and Albert Y. Kim)
    • A comprehensive textbook that takes an accessible, hands-on approach to the fundamental concepts of statistical inference and data analysis in R.
  • ISR: Introductory Statistics with R (written by Peter Dalgaard)
    • A great resource for learning basic statistics with a focus on R programming. The book covers a wide range of statistical concepts, starting with descriptive statistics.


Grade

  • Attendance & Participation (20 %)

  • QZs (40 %)

  • Project (40 %)


Communication

  • Notices & Questions

    • Please join the KakaoTalk open chat room:

      • https://open.kakao.com/o/gli0lhDg

      • When you join, please set your display name to your name exactly as it appears on the attendance sheet.

  • Personal consultation (scholarships, recommendation letters, etc.)