Possible timeline to improve my data science skills in 1-2 years.

1. Sai
1 posts
Member since:
Sep 2014

Posted 17 Sep 2014

There are some excellent resources here. But, I thought, the more
this list.
My goal is to create a plan that gets you to the level of an average industry practitioner.
Skills you need: the ability to take Excel/CSV data sets, pre-process and visualize them, build a model, and visualize the results.
Recommended steps:
Internet. I am deliberately not giving a link as I want you to search
through multiple sets. Create a deck of slides describing the business
problem, ROI, current practices, their weaknesses, etc.
Milestone 1: Creating a business context for a problem is a crucial
step in becoming a practitioner. Congrats, you have done that! This
should take about a week, provided you put in 20 hours a week.
2. Look at the attributes given. Brainstorm whether you can create
more attributes from them. If transactions are given, you can create
the average number of transactions per day, the average value of
transactions, etc. Think and create as many new attributes as you can.
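
For readers who would rather script step 2 than do it in Excel, here is a minimal sketch in Python (a tool this post does not name, so treat it as a swap-in). The `transactions` list, its column layout, and the `derive_features` helper are hypothetical, and "per day" is assumed to mean per day on which the customer had at least one transaction.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, transaction date, amount).
transactions = [
    ("C1", date(2014, 9, 1), 20.0),
    ("C1", date(2014, 9, 1), 30.0),
    ("C1", date(2014, 9, 3), 10.0),
    ("C2", date(2014, 9, 2), 100.0),
]


def derive_features(rows):
    """Build two new attributes per customer from raw transactions."""
    amounts = defaultdict(list)     # customer -> list of amounts
    active_days = defaultdict(set)  # customer -> distinct transaction dates
    for customer, day, amount in rows:
        amounts[customer].append(amount)
        active_days[customer].add(day)

    features = {}
    for customer, values in amounts.items():
        features[customer] = {
            # Assumption: average over days with at least one transaction.
            "avg_txn_per_day": len(values) / len(active_days[customer]),
            "avg_txn_value": sum(values) / len(values),
        }
    return features


features = derive_features(transactions)
for customer, attrs in sorted(features.items()):
    print(customer, attrs)
```

Each new attribute becomes one more column in the table you build for Milestone 2; averaging over calendar days instead of active days would be an equally reasonable choice.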

3. From the resources provided by others, learn the techniques and
intuition behind standard data pre-processing (I mean the ways in which
you fill missing values, bin numeric variables, merge categorical
variables, scale data, reduce dimensionality, etc.).
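
To make the pre-processing terms in step 3 concrete, here is a rough Python sketch of four of them: mean imputation, binning, merging rare categories, and min-max scaling. The sample data, the bin edges, the `min_count` threshold, and all function names are assumptions chosen for illustration; other fill or scaling strategies are just as valid.

```python
from statistics import mean

# Hypothetical columns; None marks a missing value.
ages = [25, None, 40, 35, None, 60]
cities = ["Pune", "Delhi", "Pune", "Agra", "Pune", "Delhi"]


def fill_missing_with_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bin_numeric(values, edges, labels):
    """Label each value by the first upper edge it does not exceed.

    labels must have one more entry than edges; values above the last
    edge get the last label.
    """
    binned = []
    for v in values:
        for edge, label in zip(edges, labels):
            if v <= edge:
                binned.append(label)
                break
        else:
            binned.append(labels[-1])
    return binned


def merge_rare_categories(values, min_count, other="Other"):
    """Merge categories seen fewer than min_count times into one level."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [v if counts[v] >= min_count else other for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages_filled = fill_missing_with_mean(ages)
age_bands = bin_numeric(ages_filled, edges=[30, 50],
                        labels=["young", "middle", "senior"])
cities_merged = merge_rare_categories(cities, min_count=2)
ages_scaled = min_max_scale(ages_filled)
print(ages_filled, age_bands, cities_merged, ages_scaled, sep="\n")
```

Dimensionality reduction is left out of the sketch because it needs more machinery than a few lines of plain Python.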

4. Use Excel/Deducer to create the new attributes and pre-process the data.
Milestone 2: Creating one big structured table, where independent
attributes are columns and records are rows, is a huge step in solving
the problem. You should be able to do this with 4 weeks of work. Don’t
forget to add a few slides on data pre-processing to your deck.
5. Learn descriptive statistics and the histogram, box plot, scatter plot and bar chart. Learn to plot these in Deducer/ggplot.

6. Do detailed descriptive statistics and visualizations on the data.
There are excellent resources on this all over the net. I created a few
videos myself (http://beyond.insofe.edu.in/cate…)
Milestone 3: Visualization is considered the most important interfacing
step, and you are done with it. Add these to your slide deck. Allocate
two weeks for this.
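
As a companion to the descriptive statistics steps above, here is a small Python sketch that computes the numbers behind a box plot and a histogram without any plotting library. The `values` list and both helper names are made up for illustration; the post itself suggests Deducer/ggplot for the actual charts.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical integer-valued attribute.
values = [12, 15, 15, 18, 21, 22, 22, 22, 30, 45]


def describe(data):
    """Summary statistics, including the quartiles a box plot is drawn from."""
    q1, _, q3 = quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),  # sample standard deviation
        "min": min(data),
        "q1": q1,
        "q3": q3,
        "max": max(data),
    }


def histogram_counts(data, bin_width):
    """Count integer values per bin, keyed by each bin's left edge.

    Bins with no values are simply absent from the result.
    """
    start = (min(data) // bin_width) * bin_width
    counts = {}
    for v in data:
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return counts


summary = describe(values)
counts = histogram_counts(values, bin_width=10)
print(summary)
for left in sorted(counts):
    print(f"{left}-{left + 9}: {'#' * counts[left]}")
```

Reading the summary next to the text histogram is a quick way to spot skew or outliers before building the real plots for your deck.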
6. Learn linear regression, logistic regression and clustering from any of the resources given in these threads.

7. Apply them to your data sets and run all the diagnostics. Deducer makes it easy to do this.
Milestone 4: Congrats! You have built your predictive models. I think you need 3 weeks for this step.
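
To show what "build a model and run diagnostics" can look like at its simplest, here is a hedged Python sketch of one-variable linear regression by ordinary least squares, with residuals and R-squared as the diagnostics. The data points and function names are invented for illustration; Deducer reports far more diagnostics than this, and logistic regression and clustering are not covered here.

```python
# Hypothetical data: one predictor (xs) and one response (ys).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def diagnostics(x, y, slope, intercept):
    """Return the residuals and R-squared of the fitted line."""
    mean_y = sum(y) / len(y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return residuals, 1 - ss_res / ss_tot


slope, intercept = fit_line(xs, ys)
residuals, r2 = diagnostics(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.4f}")
```

Residuals that show a pattern when plotted against the predictor are a hint that a straight line is the wrong model, which is the kind of check the diagnostics step is for.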
8. Brainstorm how you can simplify and present these
results. The goal is to present to a non-data scientist. Use your