Possible timeline to improve my data science skills in 1-2 years.

1 posts, 0 answers
  1. Sai
    1 posts
    Member since:
    Sep 2014

    Posted 17 Sep 2014

    There are some excellent resources here. But I thought a plan might be
    more helpful, so I am adding one more answer to this list.
    My goal is a plan that gets you to the level of an average industry practitioner.
    Skills you need: the ability to take Excel/CSV data sets, pre-process and visualize them, build a model, and visualize the results.
    Recommended steps:
    1. Download one data set from Kaggle/UCI or anywhere from the
    Internet. I am deliberately not giving a link as I want you to search
    through multiple sets. Create a deck of slides describing the business
    problem, ROI, current practices, their weaknesses, etc.
    Milestone 1: Creating a business context for a problem is a crucial
    step in becoming a practitioner. Congrats, you have done that! This
    should take about a week if you put in 20 hours a week.
    2. Look at the attributes given. Brainstorm whether you can create
    more attributes from them. If transactions are given, you can create
    the average number of transactions per day, the average value of
    transactions, etc. Think and create as many new attributes as you can.
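
    To make the idea of derived attributes concrete, here is a minimal sketch in plain Python (not the R/Deducer tooling this post recommends). The transaction records, the field layout and the function name are invented for illustration, and "per day" is interpreted here as "per day on which the customer had at least one transaction", which is only one possible reading.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, transaction_date, amount).
transactions = [
    ("c1", date(2014, 9, 1), 20.0),
    ("c1", date(2014, 9, 1), 30.0),
    ("c1", date(2014, 9, 3), 10.0),
    ("c2", date(2014, 9, 2), 100.0),
]


def derive_attributes(rows):
    """Collapse raw transactions into one row of derived attributes per customer."""
    amounts = defaultdict(list)
    active_days = defaultdict(set)
    for customer, day, amount in rows:
        amounts[customer].append(amount)
        active_days[customer].add(day)

    features = {}
    for customer, values in amounts.items():
        n_days = len(active_days[customer])
        features[customer] = {
            "n_transactions": len(values),
            # Assumption: averaged over active days, not calendar days.
            "avg_transactions_per_active_day": len(values) / n_days,
            "avg_transaction_value": sum(values) / len(values),
        }
    return features


features = derive_attributes(transactions)
for customer, attrs in sorted(features.items()):
    print(customer, attrs)
```

    Each customer ends up as one row with several new columns, which is the shape the later steps expect.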

    2. Download R and Deducer (my preference). Both are open source.

    3. From the resources provided by others, learn the techniques and
    intuition behind standard data pre-processing (I mean the ways in which
    you fill missing values, bin numeric variables, merge categorical
    variables, scale data, reduce dimensionality, etc.).
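
    For intuition, four of these pre-processing steps can be sketched in a few lines of plain Python. This is an illustration only, not what Deducer does internally: the function names, the sample values and the specific choices (mean imputation, equal-width bins, min-max scaling, a count threshold for rare categories) are assumptions picked for the example.

```python
from collections import Counter
from statistics import mean


def fill_missing(values):
    """Replace None with the mean of the observed values (mean imputation)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bin_equal_width(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def merge_rare_categories(categories, min_count, other="other"):
    """Merge categories seen fewer than min_count times into a single label."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns from a small data set.
ages = [20, None, 40, 60]
cities = ["A", "A", "B", "C"]

filled = fill_missing(ages)
bins = bin_equal_width(filled, 2)
merged = merge_rare_categories(cities, min_count=2)
scaled = min_max_scale(filled)
print(filled, bins, merged, scaled)
```

    Dimensionality reduction is left out because it needs linear algebra that is better learned from the resources others have linked.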

    4. Use Excel/Deducer to create the new attributes and pre-process the data.
    Milestone 2: Creating one big structured table where independent
    attributes are columns and records are rows is a huge step in solving
    the problem. You should be able to do this with 4 weeks of work. Don't
    forget to add a few slides to your deck on data pre-processing.
    5. Learn descriptive statistics and the histogram, box plot, scatter plot and bar chart. Learn to plot these in Deducer/ggplot.
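
    As a rough guide to what a box plot encodes, the sketch below computes the numbers behind it using Python's standard `statistics` module (again, plain Python rather than Deducer/ggplot). The sample data and the function name are made up, and the 1.5 x IQR rule for flagging outliers is the common convention, assumed here rather than taken from this post.

```python
from statistics import mean, quantiles, stdev


def box_plot_summary(values):
    """Return the descriptive statistics a box plot is drawn from."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
        # Points beyond the fences are usually drawn as individual dots.
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical numeric column with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50])
for name, value in summary.items():
    print(name, value)
```

    Comparing the mean with the median in the output is a quick way to see how much a single extreme value can pull the average.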

    6. Do detailed descriptive statistics and visualizations on the data.
    There are excellent resources on this all over the net. I created a few
    videos myself (http://beyond.insofe.edu.in/cate…)
    Milestone 3: Visualization is considered the most important interfacing
    step, and you are done with it. Add these to your slide deck. Allocate
    two weeks for this.
    6. Learn linear and logistic regression and clustering from any of the resources given in these threads.

    7. Apply them to your data sets and run all the diagnostics. Deducer makes it easy to do this.
    Milestone 4: Congrats! You built your predictive models. I think you need 3 weeks for this step.
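
    To show what "fit a model and check the diagnostics" means at its simplest, here is a sketch of one-variable linear regression by ordinary least squares in plain Python, with residuals and R-squared as the diagnostics. The data points and function names are invented for the example; a tool such as Deducer reports many more diagnostics than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def diagnostics(xs, ys, slope, intercept):
    """Return the residuals and R-squared of the fitted line."""
    mean_y = sum(ys) / len(ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return residuals, 1 - ss_res / ss_tot


# Hypothetical data: one independent attribute and one target.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
residuals, r2 = diagnostics(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f}")
```

    Residuals that show a pattern when plotted against x are a sign that a straight line is the wrong model, which is why the plotting skills from the earlier milestone matter here too.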
    8. Brainstorm and think about how you can simplify and present these
    results. The goal is to present to a non-data scientist. Use your
    visualization skills again. Add these slides to your deck.
    Milestone 5: Take a week or two for this.
    You have created a slide deck, some code and a knowledge base. More
    importantly, you solved a problem end-to-end. Voilà, in approximately 12
    weeks you are where 90% of data scientists are.
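
    The 12-week figure can be checked by adding up the milestone estimates given above. The short Python sketch below does that; the only assumption is that the 20 hours a week mentioned for the first milestone applies to every milestone.

```python
# (min_weeks, max_weeks) per milestone, as estimated in this post.
milestone_weeks = {
    "Milestone 1: business context": (1, 1),
    "Milestone 2: structured table": (4, 4),
    "Milestone 3: visualization": (2, 2),
    "Milestone 4: predictive models": (3, 3),
    "Milestone 5: presentation": (1, 2),
}

# Assumption: the 20 hours/week stated for Milestone 1 holds throughout.
HOURS_PER_WEEK = 20

min_weeks = sum(low for low, _ in milestone_weeks.values())
max_weeks = sum(high for _, high in milestone_weeks.values())
min_hours = min_weeks * HOURS_PER_WEEK
max_hours = max_weeks * HOURS_PER_WEEK

print(f"{min_weeks}-{max_weeks} weeks, {min_hours}-{max_hours} hours")
```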
    Now, to get to a higher level:
    Add more algorithms (decision trees, neural nets, etc.). Learn more
    domains and problems. Study techniques for working with unstructured
    data. There are wonderful courses in the thread. Take them slowly.
    Hope this helps.