Telerik Forums RSS
http://www.telerik.com/forums/permalink/WhoJkVIWY0iCZQ4JFQexOg
Possible timeline to improve my data science skills in 1-2 years

There are some excellent resources here. But I thought a plan might be the more helpful approach, so I am adding one more answer to this list.
My goal is to create a plan that gets you to the level of an average industry practitioner.
Skills you need: the ability to take Excel/CSV data sets, pre-process and visualize them, build a model, and visualize the results.
Recommended steps:<br />
1. Download one data set from Kaggle/UCI or anywhere else on the Internet. I am deliberately not giving a link, as I want you to search through multiple sets. Create a deck of slides describing the business problem, ROI, current practices, their weaknesses etc.
Milestone 1: Creating a business context for a problem is a crucial step in becoming a practitioner. Congrats, you have done that! You should spend a week on this, provided you put in 20 hours a week.
2. Look at the attributes given. Brainstorm whether you can create more attributes from them. If transactions are given, you can create the average number of transactions per day, the average value of transactions etc. Think and create as many new attributes as you can.
2. Download R and Deducer (my preference). Both are open source.
3. From the resources provided by others, learn the techniques and intuition behind standard data pre-processing (I mean the ways in which you fill missing values, bin numeric variables, merge categorical variables, scale data, reduce dimensionality etc.).
4. Use Excel/Deducer to create the new attributes and pre-process the data.
Milestone 2: Creating one big structured table where independent attributes are columns and records are rows is a huge step in solving the problem. You should be able to do this with 4 weeks of work. Don’t forget to add a few slides to your deck on data pre-processing.
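To make the attribute-creation and pre-processing steps above concrete, here is a minimal sketch. It is written in plain Python with only the standard library, not in R/Deducer as the plan suggests, purely so it can be run anywhere. The transaction records, the column names and the choices of mean imputation and min-max scaling are all assumptions made for illustration; your data set will need its own decisions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical transaction records: (customer_id, transaction_date, amount).
# An amount of None stands for a missing value.
transactions = [
    ("c1", date(2024, 1, 1), 20.0),
    ("c1", date(2024, 1, 1), 30.0),
    ("c1", date(2024, 1, 2), None),
    ("c2", date(2024, 1, 1), 100.0),
    ("c2", date(2024, 1, 3), 50.0),
]

def fill_missing(rows):
    """Replace missing amounts with the mean of the observed amounts."""
    observed = [amount for _, _, amount in rows if amount is not None]
    fallback = mean(observed)
    return [(cid, day, fallback if amount is None else amount)
            for cid, day, amount in rows]

def build_table(rows):
    """Build one row per customer, with derived attributes as columns."""
    by_customer = defaultdict(list)
    for cid, day, amount in rows:
        by_customer[cid].append((day, amount))
    table = {}
    for cid, items in by_customer.items():
        # Assumption: "per day" means per day on which the customer transacted.
        active_days = {day for day, _ in items}
        table[cid] = {
            "avg_txn_per_day": len(items) / len(active_days),
            "avg_txn_value": mean(amount for _, amount in items),
        }
    return table

def min_max_scale(table, column):
    """Rescale one column to the range [0, 1], modifying the table in place."""
    values = [row[column] for row in table.values()]
    low, high = min(values), max(values)
    for row in table.values():
        row[column] = 0.0 if high == low else (row[column] - low) / (high - low)

clean = fill_missing(transactions)
table = build_table(clean)
min_max_scale(table, "avg_txn_value")
for cid, row in sorted(table.items()):
    print(cid, row)
```

The result is the "one big structured table" of Milestone 2 in miniature: records (customers) as rows and derived attributes as columns.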
5. Learn descriptive statistics, histograms, box plots, scatter plots and bar charts. Learn to plot these in Deducer/ggplot.
6. Do detailed descriptive statistics and visualizations on the data. There are excellent resources on this all over the net. I created a few videos myself (http://beyond.insofe.edu.in/cate…)
Milestone 3: Visualization is considered the most important interfacing step, and you are done with it. Add these to your slide deck. Allocate two weeks for this.
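As a rough illustration of what the descriptive statistics behind a box plot and a histogram look like, here is a small sketch. It uses plain Python and its `statistics` module instead of Deducer/ggplot, and the numbers are made up; the 1.5 * IQR outlier rule is the common box-plot convention, assumed here and not something prescribed by this plan.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Hypothetical numeric attribute, e.g. transaction values for one customer.
values = [2, 4, 4, 5, 7, 9, 11, 12, 30]

def summarize(data):
    """Descriptive statistics, including the quartiles a box plot draws."""
    q1, _, q3 = quantiles(data, n=4)
    return {
        "min": min(data), "q1": q1, "median": median(data), "q3": q3,
        "max": max(data), "mean": mean(data), "stdev": stdev(data),
    }

def find_outliers(data):
    """Values more than 1.5 * IQR outside the quartiles (box-plot rule)."""
    q1, _, q3 = quantiles(data, n=4)
    spread = q3 - q1
    return [v for v in data if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread]

def histogram(data, width):
    """Count the values in each bin; keys are the bin start values."""
    counts = Counter((v // width) * width for v in data)
    return dict(sorted(counts.items()))

summary = summarize(values)
outliers = find_outliers(values)
bins = histogram(values, 10)

print(summary)
print("outliers:", outliers)
# A text-only histogram: one '#' per value in the bin (empty bins are skipped).
for start, count in bins.items():
    print(f"{start:>3}-{start + 9:<3} {'#' * count}")
```

Once the numbers make sense in this form, drawing the same summaries as real charts is mostly a matter of learning the plotting tool.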
6. Learn linear regression, logistic regression and clustering from any of the resources given in these threads.
7. Apply them to your data sets and do all the diagnostics. Deducer makes it easy to do this.
Milestone 4: Congrats! You built your predictive models. I think you need 3 weeks for this step.
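For intuition about what a tool like Deducer computes when it fits a model, here is a minimal sketch of simple linear regression by ordinary least squares, with R squared as one basic diagnostic. It covers only the one-variable linear case (not logistic regression or clustering), it is in plain Python and not R, and the data points are invented for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: one independent attribute and one target.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} r_squared={r2:.4f}")

# Residuals are the starting point for most diagnostics: look for patterns.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print("residuals:", [round(r, 3) for r in residuals])
```

A real analysis would also check the residuals for patterns and validate on held-out data, which is what "do all the diagnostics" refers to.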
8. Brainstorm how you can simplify and present these results. The goal is to present to a non-data scientist. Use your visualization skills again. Add these slides to your deck.
Milestone 5: Take a week or two for this.<br />
You have created a slide deck, some code and a knowledge base. More importantly, you solved a problem end-to-end. Voilà, in approximately 12 weeks you are where 90% of data scientists are.
Now, to get to a higher level:
Add more algorithms (decision trees, neural nets etc.). Learn more domains and problems. Study techniques for working with unstructured data. There are wonderful courses in the thread. Take them slowly.
Hope this helps.<br />