MS in Data Science: self-paced, 10 months, under $10k

Discussion in 'IT and Computer-Related Degrees' started by Seylan, Jun 5, 2020.

  1. Dustin

    Dustin Well-Known Member

    Started 691!

    Tonight is a live session where we get questions answered about the general process of the course. We'll be assigned a mentor by Friday, and we need to have a proposal to give them within a week. Mine is done, but I'm a tad concerned about whether I'll be able to build a good model based on the time series data I have, because I don't know how to implement time series algorithms in Python...we'll see.
     
    Maniac Craniac likes this.
  2. Johann

    Johann Well-Known Member

    Maniac Craniac and Dustin like this.
  3. danders

    danders New Member

  4. Dustin

    Dustin Well-Known Member

    I haven't been assigned a mentor yet, so no feedback on my proposal. That is frustrating since it's only a 7-week course. They've pushed the timeline for mentor assignment to Tuesday.

    There are approximately 60 students and at least 5 mentors. The mentor's role is to provide feedback on the proposal, which effectively forms a contract. If your proposal is approved, it means you've proposed enough work for a final project, and if you complete everything it describes, you'll pass the project. If your project changes, you need to keep your mentor in the loop to make sure that it remains sufficiently rigorous.
     
    JoshD and Maniac Craniac like this.
  5. Dustin

    Dustin Well-Known Member

    Ah, I spoke too soon. I was just assigned my mentor and shared my proposal.
     
    JoshD, JBjunior, Johann and 1 other person like this.
  6. Dustin

    Dustin Well-Known Member

    Got feedback on my proposal. Minor revisions requested:
    • I referenced PCA without explaining why I was using it
    • I talked about scaling and normalizing data, but not all of the algorithms I'm going to try require that, so I need to make that clearer
    • Because this is an imbalanced classification problem, I need to add in a technique like SMOTE (https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/)
    • I needed to add more detail to my deployment plan (not just that I'm using Flask, but how the user will interact with it, etc.)
    Made the required updates and I should get approval in the next 2 days or so. Once the course is complete, I'll probably put my full proposal on my blog. Basically I'll be using economic development incentive data to classify grants as likely to succeed or fail and to understand what factors (from the data available) influence that success or failure. I've got data from 2 states lined up already for a proof of concept and if I build a decently predictive model I'll see if I can convince additional states to provide their data.
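    For readers unfamiliar with the SMOTE item in the list above: SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. Below is a minimal pure-Python sketch of one common variant of that idea, for numeric features only. The function name `smote_sample` and its parameters are illustrative, not Dustin's code or a library API; in practice the imbalanced-learn package provides a `SMOTE` class with a `fit_resample` method.

```python
import math
import random


def smote_sample(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic new points built SMOTE-style from `minority`.

    minority: list of equal-length numeric tuples (minority-class rows only).
    Each new point lies on the segment between a randomly chosen minority row
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base`, excluding the row itself
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up numbers: 3 minority rows, topped up to match 9 majority rows.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
majority_count = 9
extra = smote_sample(minority, n_synthetic=majority_count - len(minority), k=2, seed=0)
print(len(minority) + len(extra))  # 9
```

    One caution worth keeping in mind: resample only the training split. Applying SMOTE before the train/test split lets synthetic rows derived from test-set neighbours leak into training and inflates the evaluation scores.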
     
    Vicki, Graves, Johann and 2 others like this.
  7. Dustin

    Dustin Well-Known Member

    Proposal approved!
     
    cklapka, asianphd, Graves and 4 others like this.
  8. Dustin

    Dustin Well-Known Member

    Been working hard on my project. I've got my data cleaning done, implemented the SMOTE technique to remedy class imbalance, and written a function implementing the Haversine formula to calculate the distance between two lat/long points in miles. That last one supports a feature measuring the business's distance in miles from the state capital, based on my hypothesis that the closer a business is to the capital (the center of power), the greater its likelihood of success.
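    The Haversine formula mentioned above gives the great-circle distance between two latitude/longitude points on a sphere. A minimal sketch follows. It assumes a mean Earth radius of 3,958.8 miles (the spherical model ignores the Earth's slight flattening, so expect small errors), and the name `haversine_miles` is hypothetical rather than Dustin's actual function.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; the formula treats Earth as a sphere


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against a tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator:
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 2))  # 69.09
```

    For a distance-to-capital feature, you would call this once per row, passing the business's coordinates as the first point and the capital's coordinates as the second.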

    After running a grid search, I've determined that the Random Forest algorithm performs best, so that's the one I'll use. Still struggling with feature selection algorithms. The one implemented in sklearn hangs on my dataset, so I might need to let it run overnight with a timer to see how long it takes. I'm also trying to implement a pipeline, which will be important: if I want to make web-based predictions, the data needs to be scaled and one-hot encoded automatically on new inputs, not just on the dataset I started with.
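    The pipeline idea above can be sketched with scikit-learn's `Pipeline` and `ColumnTransformer`: the scaler and one-hot encoder are fitted once on the training data and then applied automatically to every new row passed to `predict`. The column names and the six training rows below are made up purely so the sketch runs; they are not the Iowa or Connecticut data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, for illustration only.
NUMERIC = ["award_amount", "jobs_promised", "miles_to_capital"]
CATEGORICAL = ["program_name"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), NUMERIC),
    # handle_unknown="ignore" encodes a category never seen in training as all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Made-up training rows; 1 = grant succeeded, 0 = grant failed.
train = pd.DataFrame({
    "award_amount": [50_000, 250_000, 75_000, 400_000, 30_000, 120_000],
    "jobs_promised": [5, 40, 8, 60, 2, 15],
    "miles_to_capital": [12.0, 150.5, 30.2, 8.9, 210.0, 45.3],
    "program_name": ["Program A", "Program B", "Program A",
                     "Program B", "Program C", "Program A"],
    "succeeded": [1, 0, 1, 1, 0, 1],
})
model.fit(train[NUMERIC + CATEGORICAL], train["succeeded"])

# A new row goes through exactly the same scaling and encoding as the training data.
new_row = pd.DataFrame([{
    "award_amount": 90_000, "jobs_promised": 10,
    "miles_to_capital": 25.0, "program_name": "Program D",  # unseen category
}])
print(model.predict(new_row))
```

    Because the fitted `model` object carries its own preprocessing, it can be saved once (for example with `joblib.dump`) and loaded by a web app, so the app never re-implements the scaling. Tree ensembles such as Random Forest are unaffected by feature scaling, so the `StandardScaler` step only matters when comparing against scale-sensitive models like logistic regression or k-NN. Passing the whole pipeline to `GridSearchCV` (with parameter names such as `clf__n_estimators`) keeps the preprocessing inside each cross-validation fold.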

    I'm debating whether to keep both datasets (Iowa and Connecticut). The CT dataset actually has several features that Iowa's doesn't that I think could be useful, like whether the grant was given to a Visible Minority/Veteran/Woman business and what industry the company is in, factors that might turn out to be important to the grant's success.

    Getting there!
     
  9. Johann

    Johann Well-Known Member

    You GOT this, Dustin! :)
     
    Dustin likes this.
  10. Dustin

    Dustin Well-Known Member

    After another week of work, I now have a simple Flask app set up. I can feed it data and get predictions back, but it doesn't include all of the data points yet (only the numerical ones). Next I'll add those additional data points and then return to feature selection, since I'm still struggling with it. I've got lots of functions written now, so I might start from scratch (keeping only my functions, not the code that uses them) and build a cleaner version of what I have to see how that affects things.
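    A Flask prediction endpoint along the lines described above might look like the sketch below. It is an illustration under stated assumptions, not Dustin's app: the `/predict` route, the field names, and `StubModel` (a stand-in so the example runs without a trained model) are all hypothetical. In a real app you would load the fitted model instead and, for a scikit-learn pipeline, pass it a one-row `pandas.DataFrame` rather than a list of dicts.

```python
from flask import Flask, jsonify, request

# Hypothetical numeric fields the model expects.
NUMERIC = ["award_amount", "jobs_promised", "miles_to_capital"]


class StubModel:
    """Stand-in for a trained model: predicts success (1) for awards under $100k."""

    def predict(self, rows):
        return [1 if row["award_amount"] < 100_000 else 0 for row in rows]


app = Flask(__name__)
model = StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    try:
        # Coerce every expected field to float; reject missing or non-numeric input.
        row = {name: float(payload[name]) for name in NUMERIC}
    except (KeyError, TypeError, ValueError):
        return jsonify(error=f"expected numeric fields: {', '.join(NUMERIC)}"), 400
    prediction = model.predict([row])[0]
    return jsonify(prediction=int(prediction))
```

    Flask's test client exercises the route without starting a server, e.g. `app.test_client().post("/predict", json={...})`; for interactive use, serve the module with Flask's development server (`flask run`).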
     
    Johann and Maniac Craniac like this.
  11. Johann

    Johann Well-Known Member

    Good call! I'd hire you ANYTIME, Dustin!! :)
     
    Maniac Craniac likes this.
  12. Johann

    Johann Well-Known Member

    @Dustin I have to compliment you on your knack for explaining technical processes. Several times now, you've made them real, live, understandable things - so real I can see the wheels going round - things pretty well jump from the pages. I don't think you've ever expressed an interest in being a career tech-teacher or tech-writer, but if you wanted such a career - I think you'd be fabulous at it.

    All props to you - big time! :) (You're a darn good book reviewer, too!)
     
    Dustin, JoshD and Maniac Craniac like this.
  13. Dustin

    Dustin Well-Known Member

    With a week to go, I'm just about done with my project. I did end up starting from scratch (keeping the functions) so that I'd have a cleaner data-cleaning process. I now have a working model and pipeline, and my prediction website is set up for the numerical features. I still need to add the one categorical feature (the program name, e.g. Iowa High Quality Jobs Program) to the pipeline and prediction site, and then record my 30-minute project walkthrough. Home stretch!
     
    Flelmo likes this.
  14. Dustin

    Dustin Well-Known Member

    The official end of the term was yesterday at midnight ET. I submitted my capstone project at about 6pm! Hopefully I pass.

    We were required to submit three things:

    1) A file containing all the code and data we used for the project
    2) A max 30 minute video walkthrough of our project
    3) A project writeup which is a more detailed version of our project proposal

    Interestingly, because my proposal was so detailed with what-ifs, it ended up being longer than my final project submission. You can see an example of my proposal on this page (it describes the Form 990 project I had discussed elsewhere; I decided not to move forward with it, but I had already written a full proposal and done several days' worth of data cleaning): http://dustinkmacdonald.com/dtsc-691-example-project-proposal-predicting-nonprofit-collapse-using-irs-data/

    In the end, I included almost everything I had proposed, except that while I was able to build categorical data into my model, I wasn't able to get it working with Flask (for example, to let you choose a county when making a prediction). I'm satisfied, but not super proud of the project at this point. I definitely bit off more than I could chew given the part-time hours I had for this project, but it was a great exercise and really helped me feel more prepared.
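    One way to wire a categorical feature such as county into a web form is to build the dropdown from the categories the model was trained on and to validate the submitted value against that same list before predicting. The sketch below is a hypothetical, framework-agnostic version in plain Python (a Flask view could call both helpers); the function names and the county list are placeholders. With a fitted scikit-learn `OneHotEncoder`, the learned categories are exposed through its `categories_` attribute.

```python
from html import escape


def category_select(name, options):
    """Render an HTML <select> whose options are exactly the known categories."""
    items = "".join(
        f'<option value="{escape(opt)}">{escape(opt)}</option>' for opt in options
    )
    return f'<select name="{escape(name)}">{items}</select>'


def parse_form(form, numeric_fields, categorical_fields):
    """Turn submitted form values (all strings) into one model-ready row.

    numeric_fields: list of field names to convert with float().
    categorical_fields: dict mapping field name -> list of allowed values.
    Raises ValueError if a category is missing or was not seen in training;
    a missing numeric field raises KeyError.
    """
    row = {name: float(form[name]) for name in numeric_fields}
    for name, allowed in categorical_fields.items():
        value = form.get(name, "")
        if value not in allowed:
            raise ValueError(f"{name!r} must be one of {allowed}")
        row[name] = value
    return row


# Placeholder county list; in practice, read it from the fitted encoder.
COUNTIES = ["Polk", "Linn", "Scott"]
print(category_select("county", COUNTIES))
print(parse_form({"award_amount": "50000", "county": "Linn"},
                 ["award_amount"], {"county": COUNTIES}))
```

    Building the dropdown from the training categories means users can only submit values the model knows, which sidesteps the unseen-category problem at the form level rather than inside the model.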

    Assuming that I passed, and will be receiving my diploma soon, the next task will be to work on my portfolio so that my skills don't degrade.
     
  15. Dustin

    Dustin Well-Known Member

    My mentor reviewed my project yesterday. He provided very detailed feedback, stepping through my Python and identifying what's good and specific things to improve. I'm really impressed! No grade yet. Fingers crossed. Some students have been told now that they have passed, so I'm just waiting to get mine.
     
  16. Johann

    Johann Well-Known Member

    ...which was coiled and sleeping, on the floor. :)

    I think you've GOT this one, Dustin! "Hello, Eastern? I'd like one Presidential Diploma Frame, please. Yes, a red one... "

    Waiting is SO hard.... :)
     
    Maniac Craniac and Dustin like this.
  17. Dustin

    Dustin Well-Known Member

    I passed!
     
  18. Johann

    Johann Well-Known Member

    YAY DUSTIN .... again! :)
     
  19. cklapka

    cklapka Member

    Congrats Dustin!
    Hopefully my journey will end the same way; I'm going into my second term in May.
    Thanks again for all your hard work keeping us informed about this program and its expectations.
     
  20. cklapka

    cklapka Member

    Dustin, did you take 690 and 691 together, or in different terms? I'm curious what your thoughts are on how one should approach those two classes.
     
