Statistics

Data science articles related to statistics, written by our data scientists.

Introduction to Feature Engineering

Introduction: In a modeling process, there are 3 core concepts that will always exist: data, features, and the type of model with its corresponding parameters. Between the data and the model, features are a measurable representation of the data, in the format the model uses to process it. Thus, methods to create features …

Introduction to Healthcare Data Science

Overview: Healthcare analytics is the collection and analysis of data in the healthcare field to study determinants of disease in human populations and to identify and mitigate risk by predicting outcomes. This post introduces some common epidemiological study designs and gives an overview of the modern healthcare data analytics process. Types of Epidemiologic …

Basic time-related machine learning models

Introduction: For data that contain time-related information, time features can be created to add potentially useful information to the models. Since using time series in machine learning is a broad topic, this article only aims to introduce basic ways to create time features for those models. Type of data that is expected for …
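As a minimal sketch of what basic time features can look like, assuming each record carries a Python `datetime` timestamp, the hypothetical helper `time_features` below derives calendar features plus a sine/cosine encoding of the hour. The specific features are an illustrative choice, not necessarily the ones the article uses.

```python
import math
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive basic calendar features from one timestamp (illustrative feature set)."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),          # Monday = 0, Sunday = 6
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
        # Cyclical encoding: hour 23 and hour 0 end up close together
        "hour_sin": math.sin(2 * math.pi * ts.hour / 24),
        "hour_cos": math.cos(2 * math.pi * ts.hour / 24),
    }


print(time_features(datetime(2021, 3, 6, 14, 30)))
```

The cyclical encoding is one common design choice: a raw `hour` value treats 23 and 0 as far apart, while the (sin, cos) pair places them next to each other on a circle.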

k-Nearest Neighbors algorithms

In this blog post, I am going to introduce one of the most intuitive algorithms in the field of Supervised Learning[1], the k-Nearest Neighbors algorithm (kNN). The original k-Nearest Neighbors algorithm: The kNN algorithm is very intuitive. Indeed, under the assumption that items close together in the dataset are typically similar, kNN infers the output …
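To illustrate the idea that nearby items are similar, here is a minimal kNN classification sketch. It assumes Euclidean distance and a plain majority vote among the k nearest training points; the function name `knn_predict` and the toy data are hypothetical, and the article's exact variant may differ.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.2, 1.4), k=3))
```

Note that there is no training step: the whole dataset is kept and all the work happens at prediction time, which is why kNN is often called a lazy learner.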

Hypothesis Testing for One-Sample Mean

I. A Brief Overview: Consider the example of a courtroom trial: car company C is accused of not manufacturing environmentally friendly vehicles. Based on a survey from the previous year, the average CO2 emission per car across different manufacturers is 120.4 grams per kilometer. But for a random batch of 100 cars produced at C's …
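The test statistic for a one-sample mean can be sketched as below, assuming a t-test with the null hypothesis that the population mean equals the survey value of 120.4 g/km. The function name and the five emission readings are hypothetical, chosen only to show the arithmetic; they are not data from the article.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


# Hypothetical CO2 readings in g/km, for illustration only
readings = [118, 122, 125, 121, 124]
print(one_sample_t(readings, 120.4))
```

The resulting statistic would then be compared against a critical value of the t distribution with n - 1 degrees of freedom, for example in a one-sided test of whether C's cars emit more than the survey average.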

Binomial Theorem

Can you expand ? I guess you would find that quite easy to do. You can easily find that . How about the expansion of ? It is no longer easy, is it? However, if we use the Binomial Theorem, this expansion becomes an easy problem. The Binomial Theorem is a very intriguing topic …
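For reference, the Binomial Theorem states that $(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$. As a small sketch, with the generic symbols a, b and n chosen here for illustration, the coefficients can be computed directly with `math.comb` and checked against expanding by repeated multiplication:

```python
from math import comb


def binomial_coefficients(n):
    """Coefficients of (a + b)**n, ordered by increasing power of b."""
    return [comb(n, k) for k in range(n + 1)]


def expand_by_multiplication(n):
    """Expand (a + b)**n by multiplying by (a + b) n times (coefficients only)."""
    coeffs = [1]
    for _ in range(n):
        # Multiplying by (a + b) adds each coefficient to its neighbour (Pascal's rule)
        coeffs = [x + y for x, y in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


print(binomial_coefficients(5))   # [1, 5, 10, 10, 5, 1]
```

So $(a+b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5$, read straight from the coefficient list without multiplying anything out.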

Monte Carlo Simulation

On a nice day 2 years ago, when I was working in the financial field, my boss sent our team an email asking us to propose some machine learning techniques to predict stock prices. So, after accepting the assignment from my manager, our team began to research and apply some approaches …

Bayesian estimator of the Bernoulli parameter

In this post, I will explain how to calculate a Bayesian estimator. The example is very simple: estimate the parameter θ of a Bernoulli distribution. A random variable X which has the Bernoulli distribution is defined as $P(X = 1) = \theta$ and $P(X = 0) = 1 - \theta$, with $0 \le \theta \le 1$. In this case, we can write $X \sim \text{Bernoulli}(\theta)$. In reality, the simplest way …
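One standard way to compute such an estimator, sketched here under the assumption of a conjugate Beta(α, β) prior on θ and squared-error loss (so the Bayes estimator is the posterior mean), is shown below. The prior choice and the function name are assumptions for illustration; the article may use a different prior or loss.

```python
def bayes_bernoulli_estimate(observations, alpha=1.0, beta=1.0):
    """Posterior mean of theta under an assumed Beta(alpha, beta) prior.

    observations: iterable of 0/1 outcomes of a Bernoulli(theta) variable.
    """
    obs = list(observations)
    successes = sum(obs)
    n = len(obs)
    # Beta prior is conjugate: posterior is Beta(alpha + successes, beta + n - successes)
    post_alpha = alpha + successes
    post_beta = beta + n - successes
    # Posterior mean = Bayes estimator under squared-error loss
    return post_alpha / (post_alpha + post_beta)


data = [1, 0, 1, 1, 0, 1, 1, 1]          # hypothetical observations
print(bayes_bernoulli_estimate(data))     # uniform prior: (1 + 6) / (2 + 8)
print(sum(data) / len(data))              # sample proportion, for comparison
```

With the uniform Beta(1, 1) prior, the estimate (s + 1) / (n + 2) is pulled slightly toward 0.5 compared with the sample proportion s / n, and the difference shrinks as n grows.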

N-gram language models – Part 2

Background: In part 1 of my project, I built a unigram language model: it estimates the probability of each word in a text simply based on the fraction of times the word appears in that text. The text used to train the unigram model is the book "A Game of Thrones" by George R. R. Martin (called train). …
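The counting rule described above can be sketched in a few lines. This assumes the text is already split into word tokens; the toy sentence stands in for the training book, and `train_unigram` is a hypothetical name, not the project's code.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(word) as count(word) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


tokens = "the king in the north the king".split()
model = train_unigram(tokens)
print(model["the"])   # P("the") = 3/7
```

A word never seen in training gets no entry at all, which is exactly the problem that any evaluation on a different text has to deal with.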

N-gram language models – Part 1

Background: Language modeling, that is, predicting the probability of a word in a sentence, is a fundamental task in natural language processing. It is used in many NLP applications such as autocomplete, spelling correction, and text generation. Currently, language models based on neural networks, especially transformers, are the state of the art: they predict very accurately a …