Which ML algorithm to use when?

December 7, 2018, Archit Vora

This question usually comes up when starting a new project, or when we want to compare algorithms and tell them apart (contrasting things sometimes helps to understand them better).

Although there is no definitive answer, I am summarizing points from a few posts here. [0]

[Image: machine learning cheat sheet]

Factors to consider

Accuracy
- Most people focus on accuracy, but in practice it is not the only factor
Training time
- Naive Bayes and logistic regression are much faster to train than boosting or neural nets
Linearity
- LR and SVM are suitable when classes are linearly separable
  - Of course, SVM gets around this via the kernel trick, but its decision boundary is still typically not as complex as that of neural nets
- Despite the risk of non-linearity in the data, linear algorithms tend to work well in practice and are often used as a starting point
Number of parameters
- The number of parameters affects both training time and accuracy
- More parameters help in learning a complex function, but more data is then required to prevent over-fitting
Number of features
- When there are too few data points for the number of features (text, NLP), SVM works well

Notes

Try linear/logistic regression or SVM first when most of your variables are numeric.
SVM
- SVM suits cases where the number of data points is small for the given number of features.
- SVM is a linear classifier only. It just uses the kernel trick to project linearly inseparable data into a higher-dimensional space.
- SVM training is a convex optimization problem, unlike neural net training. Hence it tends to be a bit faster.
What is the difference between LR and SVM?
- LR has a linear decision boundary, while SVM with a non-linear kernel can have a non-linear decision boundary in the original feature space.
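As a rough illustration of the linear vs non-linear point, here is a minimal sketch in plain Python. It uses a perceptron as a stand-in for any linear classifier (it is not an SVM), and it adds one explicit extra feature instead of using a real kernel, which would do a similar projection implicitly. The function names and the XOR-style toy data are assumptions made for this example.

```python
def train_perceptron(points, labels, epochs=100):
    """Train a perceptron on labels in {-1, +1}; returns (weights, bias)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w, b


def accuracy(w, b, points, labels):
    """Fraction of points whose predicted sign matches the label."""
    correct = 0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        prediction = 1 if score > 0 else -1
        correct += prediction == y
    return correct / len(points)


# XOR-style toy data: no straight line separates the two classes.
xor_points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
xor_labels = [-1, 1, 1, -1]


def add_product_feature(point):
    """Hypothetical feature map: lift (x1, x2) to (x1, x2, x1 * x2)."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


# Linear model on the raw 2-D features.
raw_w, raw_b = train_perceptron(xor_points, xor_labels)
raw_accuracy = accuracy(raw_w, raw_b, xor_points, xor_labels)

# Same linear model after projecting the data into 3-D.
mapped_points = [add_product_feature(p) for p in xor_points]
mapped_w, mapped_b = train_perceptron(mapped_points, xor_labels)
mapped_accuracy = accuracy(mapped_w, mapped_b, mapped_points, xor_labels)

print("raw features:   ", raw_accuracy)
print("mapped features:", mapped_accuracy)
```

On the raw features the linear model cannot classify all four points, while in the lifted space the same model separates them. The model stays linear; only the space it works in changes.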
Reinforcement learning
- Analyses and optimizes the behavior of an agent via feedback from the environment
- The agent tries out different actions to discover which ones maximize reward
- Trial and error and delayed reward distinguish reinforcement learning from other ML algorithms
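The trial-and-error and delayed-reward ideas can be sketched with tabular Q-learning on a tiny corridor. This is only an illustration under assumptions: the five-state corridor, the `step` function and all hyperparameter values are made up for this example and do not come from the referenced posts.

```python
import random

N_STATES = 5      # states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: reward 1 only when the goal is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore at random, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Delayed reward: earlier states only learn about the goal
            # through the discounted value of the states that follow them.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The agent is never told that "right" is correct. It only sees a reward at the very end of an episode, and the value of that reward propagates backwards to earlier states over many episodes.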

References

[0] : https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/#prettyPhoto

[1] : https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice

On Clustering

February 1, 2018 (updated November 14, 2018), Archit Vora

K-means is probably the most popular clustering algorithm and the one most often taught in academia. However, it has many limitations; some of them are listed here:

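- You need to specify the value of k in advance
- It will partition data even when the data has no cluster structure
- It is sensitive to feature scale
- Even on perfect data sets, it can get stuck in a local minimum
- Means are continuous, so centroids may not be meaningful for discrete or categorical data
- Hidden assumption: SSE (sum of squared errors) is worth minimizing
- K-means serves more as quantization than as cluster discovery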
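The local-minimum limitation is easy to reproduce. Below is a minimal sketch of Lloyd's algorithm (the standard k-means iteration) on one-dimensional data, written in plain Python; the function name, the toy values and both sets of starting centroids are assumptions chosen for this example.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: (v - centroids[i]) ** 2)
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    # SSE: squared distance of every value to its nearest centroid.
    sse = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, sse


# Three obvious, well-separated groups.
data = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

# Bad start: two centroids inside the first group, one for everything else.
bad_centroids, bad_sse = kmeans_1d(data, [0.0, 1.0, 15.0])

# Good start: one centroid near each group.
good_centroids, good_sse = kmeans_1d(data, [0.5, 10.5, 20.5])

print(bad_centroids, bad_sse)    # [0.0, 1.0, 15.5] 101.0
print(good_centroids, good_sse)  # [0.5, 10.5, 20.5] 1.5
```

Both runs have converged in the sense that no centroid moves any more, yet the bad start ends with a far larger SSE. This is why implementations usually run several random restarts and keep the best result.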

In hierarchical clustering you don't need to specify the value of k: the algorithm builds a tree, either top-down or bottom-up, and you can cut it at any level to get the clustering for that level. Such a tree is called a dendrogram.
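A bottom-up version can be sketched in a few lines of plain Python. This is a simplified illustration that assumes 1-D data and single linkage (distance between the closest members of two clusters); the function name and the toy values are made up for the example, and it stores the clustering at every level instead of drawing a dendrogram.

```python
def agglomerative_1d(values):
    """Bottom-up single-linkage clustering; returns {number of clusters: clustering}."""
    clusters = [[v] for v in sorted(values)]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge that pair and record the new level of the tree.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels[len(clusters)] = [sorted(c) for c in clusters]
    return levels


levels = agglomerative_1d([0, 1, 10, 11, 20, 21])

# "Cut the tree" at whichever level is needed, without re-running anything.
three_clusters = sorted(levels[3])
two_clusters = sorted(levels[2])
print(three_clusters)
print(two_clusters)
```

The tree is built once, and every value of k from 1 to the number of points is then available by lookup.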

scikit-learn also supports a variety of clustering algorithms, including DBSCAN, and describes which one suits which situation: http://scikit-learn.org/stable/modules/clustering.html

References:

https://stats.stackexchange.com/questions/133656/how-to-understand-the-drawbacks-of-k-means

OpenAI Gym Environment

October 19, 2017 (updated May 7, 2020), Archit Vora

OpenAI provides a framework (Gym) for creating environments and training agents on them. In this post I am pasting a simple notebook as a quick reference for how to use these environments and which functions are available on the environment object.
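For orientation, the interaction loop looks roughly like the sketch below. To keep it runnable without any installation, it uses a small stand-in class (`CorridorEnv`, a hypothetical name) that mimics the classic Gym interface, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. It is not Gym itself, and newer Gym and Gymnasium releases changed these signatures, so check the version you have.

```python
import random


class CorridorEnv:
    """Hypothetical stand-in that mimics the classic Gym environment interface."""

    def __init__(self, length=5):
        self.length = length
        self.n_actions = 2  # 0 = left, 1 = right
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode with the given policy; return (total reward, steps taken)."""
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


env = CorridorEnv()

# A fixed policy that always moves right reaches the goal in four steps.
right_return, right_steps = run_episode(env, lambda obs: 1)

# A random policy, which is what a typical first smoke test looks like.
rng = random.Random(0)
random_return, random_steps = run_episode(env, lambda obs: rng.randrange(env.n_actions))

print(right_return, right_steps)
print(random_return, random_steps)
```

The shape of the loop (reset, choose an action, step, stop when the episode is done) is the part that carries over to real environments.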

I used an environment from Denny Britz's GitHub repository.

References:

https://github.com/dennybritz/reinforcement-learning

Learning Reinforcement Learning (with Code, Exercises and Solutions)

https://gym.openai.com/docs/

My Code : https://gist.github.com/arcarchit/2b3363e2615df7ef5c8d4941d4dfa9e8

