Yingfan Blog

「STOP EXISTING AND START LIVING」

Machine Learning: Ensemble Learning and Random Forest

A thorough introduction to Ensemble learning

A group of predictors is called an ensemble; thus, this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method. Voting classifier Hard voting class...

Machine Learning: Decision Tree

A thorough introduction to Decision Tree

Decision Trees can perform both classification and regression tasks, and even multioutput tasks. Decision Trees don’t require feature scaling or centering. Classification Training and Visual...

Machine Learning: Support Vector Machine

A thorough introduction to SVM

A Support Vector Machine (SVM) is a powerful ML model capable of performing linear or nonlinear classification, regression, and even outlier detection. Its most significant characteristic is that ...

SQL For Interview (Strings)

A collection of string operations

This post covers the common string functions in SQL. Functions LOWER(), UPPER() Capitalize the first letter: update table set field = concat(upper(left(field, 1)),...

Machine Learning: Model Training Algorithms

A guide to different training algorithms

Linear Model. Least squares method: define how to measure how well the model fits the data (let us use RMSE or MSE); find the derivatives of MSE with respect to each parameter θ and ...

Machine Learning: Classification Problem

A guide to classification algorithms and metrics

Performance Measures. Accuracy. Def: the fraction of correctly predicted classes out of all predictions. Accuracy is not a good measure when a dataset is skewed - some classes are much more fre...

Machine Learning: Deal with imbalanced data

A guide to handling imbalanced data in Python

An imbalanced dataset is a common problem in classification, where the classes of the y column have a disproportionate ratio of samples. This can happen in cases like spam filtering, fraud ...

Machine Learning: Types and Challenges

An overall introduction to machine learning

“Machine Learning is the science and art of programming computers so they can learn from data.” Types of Machine Learning Machine Learning systems can be classified according to the amount an...

SQL For Interview (Common Key Points)

Study notes on Liuge's blog posts

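Join: Hive supports the common SQL join statements, but only equijoins. INNER JOIN. LEFT JOIN: the right-side columns are NULL when the right table has no record matching the left. RIGHT JOIN. FULL JOIN: all records from both tables that satisfy the WHERE condition. GROUP BY: commonly used aggregate functions include c...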

Data Analysis Methods

Study notes on Liuge's blog posts

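Study notes based on Liuge’s articles. Methodologies commonly used in data analysis include: analysis of variance, descriptive analysis, correlation analysis, parameter estimation, survivorship bias, Simpson's paradox, the RFM analysis model, the AARRR model, the SWOT matrix (uncovered), the MECE analysis model (uncovered), the funnel analysis model (uncovered), drill-down analysis (dimension split...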