MS in Data Science: self-paced, 10 months, under $10k

Discussion in 'IT and Computer-Related Degrees' started by Seylan, Jun 5, 2020.

  1. Dustin

    Dustin Well-Known Member

    Got my final grade for the third course (the final project was graded by hand), another A. My 4.0 remains intact.
     
    SteveFoerster and manuel like this.
  2. Dustin

    Dustin Well-Known Member

    Completed DTSC-575 Principles of Python Programming, which is an extension of the first course (520) and includes more focused programming assignments. I dual-enrolled in this course and DTSC-660 (the databases one). 575 is 40% exams and 60% coding assignments.

    Of the 23 or so coding assignments, I only struggled with one, because the instructions were confusing and I passed only 1 of its 3 unit tests (so I got a 3.33/10 on that assignment, which knocked my grade down a couple of points). Even so, thanks to my exam scores I'll finish with a 93+, so my 4.0 remains intact. Some of the assignments definitely took some thinking, in particular one involving modulus and another involving the quadratic formula.
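For anyone curious what those two topics look like in Python, here's a minimal sketch. The function names `solve_quadratic` and `is_even` are hypothetical (the actual assignment specs weren't posted); it just shows the standard quadratic formula and the `%` (modulus) operator.

```python
import math


def solve_quadratic(a: float, b: float, c: float) -> tuple[float, ...]:
    """Return the real roots of a*x**2 + b*x + c = 0, smallest first."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    disc = b * b - 4 * a * c  # the discriminant decides how many real roots exist
    if disc < 0:
        return ()  # no real roots
    if disc == 0:
        return (-b / (2 * a),)  # one repeated root
    root = math.sqrt(disc)
    x1 = (-b - root) / (2 * a)
    x2 = (-b + root) / (2 * a)
    return tuple(sorted((x1, x2)))


def is_even(n: int) -> bool:
    # Modulus: the remainder after dividing by 2 is 0 for even numbers
    return n % 2 == 0


print(solve_quadratic(1, -3, 2))  # (1.0, 2.0)
print(is_even(7))  # False
```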

    I've got about a month to finish this database course (I've got 3 modules left plus some assignments), but luckily I'm going to have a week of vacation coming up shortly where I'll be able to focus fully on that class.
     
    SteveFoerster likes this.
  3. Dustin

    Dustin Well-Known Member

    This database course has been kicking my butt. I finally got on vacation and finished the Module 2 quiz with an 85 (you need above an 80 to unlock each quiz). In Assignment 1, you create an ER diagram and a linear database schema for a fictional database. In Assignment 2, you answer some questions about another database and create a second linear schema.

    I started in on the 8 hours of video for Module 3, which is all SQL. I know basic SQL, so it's mostly been a matter of refreshing myself on the exact syntax, and that's gone quicker. I've got 90 minutes of video left, and then I need to do the Module 3 quiz. Unlike Modules 1 and 2, which were mostly the professor reading from some papers, Module 3 is more visual, with the professor sharing his screen while he works through the SQL, and I've been really happy to see that.

    Module 4 is 3 hours of video on intermediate/advanced SQL, which I'll do tomorrow before tackling the Module 4 quiz. At the end of Module 4 is Assignment 3, a rather complex SQL statement just to confirm we've understood all of the processes.

    Module 5 has an unknown amount of video, but I'm hoping it's not too long. It covers functions, triggers and procedures. At the end of Module 5 is Assignment 4, which is supposed to be the longest one in the program.
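A rough idea of what a trigger does, sketched with Python's built-in `sqlite3` module. This is a stand-in under stated assumptions: the course's database system isn't named in the thread, SQLite supports triggers but has no stored procedures, and the `accounts`/`audit_log` tables and the `to_cents` function are made up for illustration. SQLite also has no `CREATE FUNCTION`, so the scalar function is registered from Python with `create_function`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Trigger: every time a balance changes, copy the old and new values into audit_log
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 75.0 WHERE id = 1")  # fires the trigger
conn.commit()

log_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(log_rows)  # [(1, 100.0, 75.0)]

# Hypothetical scalar function registered from Python, then callable inside SQL
conn.create_function("to_cents", 1, lambda amount: int(round(amount * 100)))
print(conn.execute("SELECT to_cents(balance) FROM accounts").fetchone())  # (7500,)
```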

    Module 6 is about an hour of video on Git, and an associated quiz.

    My goal is to get all the lecture material seen and the tests done by the end of this week so that I can have a week to finish up the assignments.
     
    SteveFoerster likes this.
  4. Dustin

    Dustin Well-Known Member

    Phew. Finished Module 3 (Basic SQL) and its quiz/exam, Module 4 (Intermediate SQL) and its quiz/exam, and Module 5 (Git/GitHub) and its quiz/exam. Currently I have an 85 in the course. I've submitted Assignment 3, which was a simple SQL outer join worth 3% of the grade. I've completed Assignments 1 and 2 but not submitted them yet (they're worth a combined 21%). I'm waiting for a Thursday group call with the professor to find out if there are any tips I need to know.
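For reference, a minimal outer join looks something like this, run through Python's built-in `sqlite3` so it's self-contained. The `students`/`enrollments` tables are invented; the assignment's actual schema wasn't posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollments (student_id INTEGER, course TEXT);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO enrollments VALUES (1, 'DTSC-660'), (3, 'DTSC-670');
""")

# LEFT OUTER JOIN keeps every student, even those with no matching enrollment;
# the unmatched columns come back as NULL (None in Python).
rows = conn.execute("""
SELECT s.name, e.course
FROM students AS s
LEFT OUTER JOIN enrollments AS e ON e.student_id = s.id
ORDER BY s.id
""").fetchall()
print(rows)  # [('Ana', 'DTSC-660'), ('Ben', None), ('Cy', 'DTSC-670')]
```

An inner join on the same tables would silently drop Ben; the outer join is what keeps "missing" rows visible.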

    Otherwise, I just need to complete Assignment 4 worth 20% and cross my fingers that I pass this course.
     
    SteveFoerster likes this.
  5. Dustin

    Dustin Well-Known Member

    Okay! This course (DTSC-660) is done. I just need to clean up my assignments 1, 2 and 4 and submit them tonight. Looks like I'll get an actual weekend to myself on this vacation, then I turn to my Quantic marketing project which is due August 25.

    It turns out there's a course resource document that I just learned existed today (thanks Discord!) so I'll be going through the test cases included to make sure my SQL is correct before I hand in all the assignments.
     
    SteveFoerster likes this.
  6. Dustin

    Dustin Well-Known Member

    My assignments got marked. I got in the 70s for Assignments 1 and 2 (which I was expecting) and a perfect score on Assignment 4. That leaves me with an 88. I can possibly make it to a 93 (A+/4.0) if I can ace the repeatable tests, but some of those are just super challenging (long fill-in-the-blanks where getting one part wrong gives you a zero for the whole question), so we'll see. I've got 9 days.

    After that, there's a 2 week break before the next term begins which I'll be using to prepare for the foundations of machine learning course, DTSC-670.
     
    chrisjm18 and SteveFoerster like this.
  7. Dustin

    Dustin Well-Known Member

    I started DTSC-670 Fundamentals of Machine Learning yesterday. I'm really excited to get into it.

    Module 1: The Machine Learning Landscape
    Module 2: End-to-End Machine Learning
    Module 3: Classification
    Module 4: Training Models
    Module 5: Support Vector Machines
    Module 6: Unsupervised Learning Techniques
     
    asianphd likes this.
  8. asianphd

    asianphd Active Member

    I'm always annoyed by imbalanced datasets when working on classification :D
     
    Dustin likes this.
  9. Dustin

    Dustin Well-Known Member

    Module 1 includes 2 assignments. Assignment 1 is worth 4%, Assignment 2 is worth 20%. I've completed the Module 1 quiz (5% of the grade), which focuses on the basics of machine learning. Mostly conceptual things that show up in the first 2 chapters of the book on topics like supervised/semi-supervised/unsupervised/reinforcement/batch/online/instance-based/model-based learning.

    Assignment 1 (4% of the grade) is a simple regression model to predict whether a boy likes pies based on a small dataset of previous pie data. I completed that one and sent it off. I'm very nervous about how much of this course is based on these assignments where you only get one shot at it. Sometimes a missing hashtag (code comment) or apostrophe can break your whole notebook. But now that I no longer have a 4.0 (I'm at a 3.8 right now), I feel less pressure.

    Assignment 2 (20% of the grade) is quite a bit more involved. We're taking a raw dataset used in an actual paper (https://linkinghub.elsevier.com/retrieve/pii/S0048969720323792) and doing the data-wrangling to get it ready for analysis. I'll likely chip away at this one tomorrow and Friday and then spend most of Saturday and Sunday getting it done.

    Assignments are worth 85% of this course's grade. Quizzes are worth 15%.
     
  10. Dustin

    Dustin Well-Known Member

    Assignment 1 is complete, I got 100%. Now on to Assignment 2, which involves data-cleaning of a dataset that's 800,000 rows, to extract the ~4100 rows needed and perform some basic transformations on them. This one is a doozy.
     
    SteveFoerster likes this.
  11. Dustin

    Dustin Well-Known Member

    Assignment 2 complete. The hardest part of this assignment was doing various merges and joins to get the data from across different spreadsheets into the format that we needed. I spent all day yesterday on one part of the assignment and got nowhere, and then when I woke up today I had an idea for a different approach. That particular technique (joining like you would in a SQL database) wasn't the answer, but a similar approach (using pandas' merge function) turned out to be.
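A minimal sketch of the pandas approach, with two tiny made-up tables standing in for the real spreadsheets (the paper's actual columns aren't reproduced here). `pd.merge` takes the same `how` options as SQL joins (`inner`, `left`, `right`, `outer`), and `indicator=True` adds a `_merge` column showing where each row came from, which is handy when rows silently drop out.

```python
import pandas as pd

# Two small, made-up tables that share a "site_id" key column
sites = pd.DataFrame({"site_id": [1, 2, 3], "region": ["North", "South", "East"]})
readings = pd.DataFrame({"site_id": [1, 1, 3, 4], "value": [0.5, 0.7, 1.2, 9.9]})

# Inner merge: keep only rows whose site_id appears in both tables (like SQL INNER JOIN)
merged = pd.merge(readings, sites, on="site_id", how="inner")
print(merged)

# Left merge with indicator=True: keep every reading; _merge flags rows with no match
left = pd.merge(readings, sites, on="site_id", how="left", indicator=True)
print(left[left["_merge"] == "left_only"])  # site_id 4 has no region
```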

    Now onto Module 2. This module comes right from the textbook and walks through an end-to-end machine learning project to predict California housing prices from a dataset. One thing that surprised me: the textbook notes that the 20,000 rows and 10 features in the dataset (200,000 items) are fairly small by machine learning standards. That's how I learned that the 200 rows and 15 columns of economic development data I had (3,000 items) were going to be woefully inadequate for my capstone project.
     
    SteveFoerster likes this.
  12. Dustin

    Dustin Well-Known Member

    Module 2 videos are complete. I got my grades back for Assignments 1 and 2. Perfect on assignment 1, lost a couple points on assignment 2 so I'm sitting at a 95 overall.

    Assignment 3 involves graphing a linear regression in 3D, both to build our plotting skills and because later in the course we'll be plotting the results of different machine learning algorithms to help us understand how they work. We're given a few images of plots based on some data with noise added; then we run a regression and plot the line of best fit, with the goal of "recovering" the original model parameters that were used to generate the plots.
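The "recover the original parameters" idea can be sketched with NumPy's least-squares solver. Assumptions: the true coefficients, noise level and sample size below are invented, and the plotting step is left out. When there are two inputs (x, y) and one output z, the fitted surface in 3D is a plane.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Invented "true" model: z = 2.0 + 3.0*x - 1.5*y, plus Gaussian noise
true_params = np.array([2.0, 3.0, -1.5])
n = 500
x = rng.uniform(-5, 5, n)
y = rng.uniform(-5, 5, n)
z = true_params[0] + true_params[1] * x + true_params[2] * y + rng.normal(0, 1.0, n)

# Design matrix: a column of ones for the intercept, then each input
X = np.column_stack([np.ones(n), x, y])

# Ordinary least squares: find beta minimising ||X @ beta - z||^2
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
print(beta.round(2))  # should land close to [2.0, 3.0, -1.5]
```

More noise or fewer points pushes the recovered values further from the originals, which is the point of the exercise.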

    Finished it today, though I spent a ridiculous amount of time refreshing myself on matplotlib, the most popular Python graphing library.

    Assignment 4 involves creating a custom transformer. Because data science is heavily focused on cleaning data so that it can be piped into a model, Assignment 4 has us take a dataset and create a pipeline where data is:

    1. Imputed (so that missing values in a column are replaced by the mean of that column)
    2. Put into a transformer which will add and delete columns based on some given criteria
    3. Put through a scaler, which centers the data (subtracting the mean so that the mean is 0) and scales it (so that the standard deviation is 1). This matters because many machine learning algorithms perform poorly when features are on very different scales.
    Assignment 4 looks like it will be some work. I think I've done enough Python for now, though.
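Here's a minimal scikit-learn sketch of that three-step pipeline. The `RatioAdder` transformer and its column choices are hypothetical, since the assignment's actual criteria for adding and deleting columns weren't posted; `SimpleImputer`, `Pipeline` and `StandardScaler` are real scikit-learn classes. Inheriting from `BaseEstimator` and `TransformerMixin` gives the custom class `get_params`/`set_params` and a free `fit_transform`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends col_a / col_b and drops col_drop."""

    def __init__(self, col_a=0, col_b=1, col_drop=2):
        self.col_a = col_a
        self.col_b = col_b
        self.col_drop = col_drop

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        ratio = X[:, self.col_a] / X[:, self.col_b]
        kept = np.delete(X, self.col_drop, axis=1)
        return np.column_stack([kept, ratio])


pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # step 1: fill NaNs with the column mean
    ("custom", RatioAdder()),                    # step 2: add/remove columns
    ("scale", StandardScaler()),                 # step 3: mean 0, std 1 per column
])

X = np.array([
    [1.0, 2.0, 10.0],
    [np.nan, 4.0, 20.0],
    [3.0, 6.0, np.nan],
    [4.0, 8.0, 40.0],
])
X_out = pipeline.fit_transform(X)
print(X_out.mean(axis=0).round(6), X_out.std(axis=0).round(6))  # each column: mean ≈ 0, std ≈ 1
```

One design note: the custom step does its column math on already-imputed data, so a NaN never reaches the division.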
     
    Graves, asianphd and SteveFoerster like this.
  13. Dustin

    Dustin Well-Known Member

    I've completed Assignment 4 (the custom transformer) but I must be doing something wrong because my values are off for several of the columns. I decided to move to Assignment 5 before I revisit Assignment 4. Assignment 4 is only worth 10% so I'm not too worried, but because the code is reused frequently I don't want to just leave it unknown.

    Assignment 5 is worth 4%. It's a short (<650 words) essay about the design principles of scikit-learn, the Python machine learning library. I went ahead and submitted that.
     
  14. Dustin

    Dustin Well-Known Member

    Started on Module 3, which is on classification. We will be training a binary classifier to recognize digits from the MNIST database and also reviewing the other fundamental concepts of evaluating machine learning models:
    • Performance Measures
    • Confusion Matrices
    • Accuracy and Error Rate
    • Precision and Recall
    • ROC Curves
    • Error Analysis
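A hand-rolled sketch of the first few measures, assuming binary labels with 1 as the positive class. The toy data is deliberately imbalanced (95 negatives, 5 positives), which shows why accuracy alone can mislead: a model that never predicts the positive class still scores 95%.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Imbalanced toy labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
# A useless "classifier" that always predicts the majority class
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
error_rate = 1 - accuracy
# Precision is undefined when nothing is predicted positive; report 0.0 in that case
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(accuracy, recall)  # 0.95 0.0
```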
     
    Graves likes this.
  15. Dustin

    Dustin Well-Known Member

    Finished Module 3 and submitted Assignment 6. The assignment was a real bear. We hand-coded our own accuracy, recall, precision, etc. functions and plotted an ROC curve so that we could understand how they compare to the ones provided by scikit-learn.
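For a sense of what hand-coding an ROC curve involves, here's a minimal pure-Python sketch (not the assignment's code): sweep each distinct score as a decision threshold, record the false positive rate and true positive rate at each one, then take the area under the resulting curve with the trapezoidal rule. The four labels and scores are a toy example; scikit-learn's `roc_curve` and `roc_auc_score` are the library versions to compare against.

```python
def roc_points(y_true, scores):
    """Sweep every distinct score as a threshold; return (FPR, TPR) pairs."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```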
     
  16. Dustin

    Dustin Well-Known Member

    Been a while since my last update. I got mighty stuck on the end of Assignment 7 which was at the end of Module 4.

    Assignment 7 has us use a polynomial transformer and linear regressor, basically to allow us to use a linear regression model to fit non-linear data and predict from it. Unfortunately it was dummy data so I didn't get as much out of that as I would have liked because the predictions themselves had no value outside of us practicing the exercise. Assignment 8 was similar (almost identical) but it had us adjust the hyperparameters of the polynomial regressor to see how that improved the fit of the model.
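The trick in Assignment 7 is that polynomial regression is still linear regression: you expand x into [1, x, x², ...] and fit ordinary least squares on those columns. This NumPy sketch uses made-up quadratic data; in scikit-learn the same two steps would be `PolynomialFeatures` followed by `LinearRegression`. The loop at the end treats the degree as a hyperparameter. Training MSE won't increase as the degree grows, which is why a held-out validation set is needed to spot overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up non-linear data: y = 0.5*x^2 + x + 2 + noise
x = rng.uniform(-3, 3, 100)
y = 0.5 * x**2 + x + 2 + rng.normal(0, 0.5, 100)


def poly_features(x, degree):
    """Expand a 1-D input into columns [1, x, x^2, ..., x^degree]."""
    return np.column_stack([x**d for d in range(degree + 1)])


# The model is linear in its coefficients, so ordinary least squares applies
X_poly = poly_features(x, degree=2)
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(coef.round(2))  # should be close to [2.0, 1.0, 0.5]

# Degree as a hyperparameter: compare training MSE across a few degrees
for degree in (1, 2, 10):
    Xd = poly_features(x, degree)
    c, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    mse = np.mean((Xd @ c - y) ** 2)
    print(degree, round(float(mse), 3))
```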

    Assignment 9 was pretty basic, we had to hand-calculate the Mean Squared Error (MSE) of a prediction model.
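The MSE calculation is just the average of the squared differences between actual and predicted values. A minimal sketch, with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/n) * sum((y_i - yhat_i)^2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4: (1 + 0 + 4) / 3
print(mean_squared_error([3, 5, 7], [2, 5, 9]))  # 1.6666666666666667
```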

    Module 5 is on Support Vector Machines (SVM) which can be used for classification, regression and outlier detection. We had a quiz but no assignment. Module 6 is on unsupervised learning techniques. I need to finish that tonight. The course ends 11:59 PM ET on Sunday, so I'm cutting it close. I currently have a 94 in the course.
     
  17. Dustin

    Dustin Well-Known Member

    Starting 680 Applied Machine Learning today. Looking at the syllabus, it includes decision trees, random forests, dimensionality reduction, k-nearest neighbors, and neural networks.
     
  18. Dustin

    Dustin Well-Known Member

    Assignment 1 complete but not submitted yet. We built a simple decision tree and visualized it. These are cool tools because they're "white box", where you can see how the model made its decisions and boil that down to something a senior leader can actually read. Pretty cool.
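The "white box" point is easy to see with scikit-learn's `export_text`, which prints a fitted tree as plain if/else rules. This sketch uses the Iris dataset that ships with scikit-learn; the assignment's own dataset isn't named, and `max_depth=2` is an arbitrary choice to keep the printout short. `sklearn.tree.plot_tree` gives the graphical version.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the rules short enough for a non-technical reader
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Each line is a threshold test on one feature; leaves show the predicted class
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```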
     
  19. addision

    addision Member

    Could you tell me how many courses you've completed so far? When do you expect to finish?
     
