Harshit Kumar

Core machine learning concepts from regression and classification to overfitting, regularization, hyperparameter tuning, and ensemble methods.

15 posts

Deep Learning
Loss vs Accuracy

The distinction between loss (cross-entropy) and accuracy in neural network training: why they can diverge and what each metric tells you.

Dec 07, 2018 · 2 min read
Data Science
Loss functions

A survey of common loss functions (MSE, cross-entropy, hinge loss), with background on entropy, KL divergence, and the MLE connection.

Aug 24, 2018 · 4 min read
Data Science
Methods of Hyperparameter optimization

Comparing hyperparameter optimization strategies such as grid search, random search, and Bayesian optimization, with scikit-learn examples.

Aug 03, 2018 · 2 min read
Mathematics
A visual introduction to eigenvectors and eigenvalues

A geometric, visual explanation of eigenvectors and eigenvalues through linear transformations such as scaling, rotation, and shearing.

May 11, 2018 · 3 min read
Data Science
Scaling vs Normalization

The difference between feature scaling (min-max) and normalization (standardization), and when to apply each in machine learning pipelines.

Mar 23, 2018 · 5 min read
Data Science
Ensembling is the key

An overview of ensemble learning methods: bagging, random forest, boosting, and stacking, and why combining models often outperforms any single algorithm.

Mar 16, 2018 · 2 min read
Data Science
Gradient boosted trees: Better than random forest?

Comparing gradient boosted trees and random forests: how they differ in training strategy and tuning requirements, and when to prefer each.

Feb 23, 2018 · 1 min read
Data Science
Data Mining: Knowledge discovery in databases

An overview of the KDD (Knowledge Discovery in Databases) process and how data mining, machine learning, and data science relate to each other.

Feb 09, 2018 · 1 min read
Data Science
The Curse of Dimensionality

Why increasing the number of features degrades kNN performance: the curse of dimensionality explained intuitively and mathematically.

Jan 26, 2018 · 3 min read
Data Science
Regularization

How regularization techniques, L1 (Lasso) and L2 (Ridge), add penalty terms to the loss function to combat overfitting in linear models.

Jan 12, 2018 · 4 min read
Data Science
Simplicity doesn't imply accuracy

Examining Occam's razor in machine learning: why simpler models aren't always more accurate and how complexity relates to overfitting.

Dec 22, 2017 · 2 min read
Data Science
Overfitting and Underfitting

Explaining overfitting and underfitting in machine learning, and how the bias-variance tradeoff helps build better-generalizing models.

Oct 13, 2017 · 2 min read
Data Science
Email spam filtering: Text analysis in R

Building and evaluating an email spam filter using text analytics and machine learning in R.

Aug 25, 2017 · 63 min read
Data Science
Moneyball: Why no prediction can be made for baseball champion

Using logistic regression in R to explore why ML cannot reliably predict the baseball World Series champion.

Aug 04, 2017 · 27 min read
Data Science
Moneyball: How linear regression changed baseball

How the Oakland A's used linear regression in R to identify undervalued players and compete despite a limited budget.

Jul 28, 2017 · 17 min read