Monte Carlo Simulation

Two years ago, when I was working in the financial field, my boss sent our team an email asking us to propose some machine learning techniques to predict stock prices.

After accepting the assignment, our team began researching and applying approaches for the prediction task. When we talk about machine learning, we usually think of supervised and unsupervised learning. However, one of the algorithms we applied is often forgotten despite being highly effective: Monte Carlo simulation.

What is Monte Carlo simulation?

The Monte Carlo method is a technique that uses random numbers and probability to solve complex problems. Monte Carlo simulation, or probability simulation, is a technique used to understand the impact of risk and uncertainty in financial sectors, project management, cost estimation, and other forecasting machine learning models. [1]
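To build intuition before the stock example, here is a minimal, self-contained sketch (not from the original article) of the core idea: estimate a quantity by random sampling. The classic toy example approximates π by sampling points in the unit square:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Draw n uniform points in the unit square; the fraction falling inside
# the quarter circle of radius 1 approximates pi/4.
x = rng.random(n)
y = rng.random(n)
pi_estimate = 4 * np.mean(x**2 + y**2 <= 1.0)
print(pi_estimate)  # close to 3.1416 for large n
```

The more samples we draw, the closer the estimate gets to the true value; the stock-price simulation below relies on the same principle.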

Now let’s jump into the Python implementation to see how it applies.

Python Implementation

In this task, we use the DXG stock dataset from 2017/01/01 to 2018/08/24, and we would like to know the stock price after 10 days, 1 month, and 3 months, respectively.

Monte Carlo Simulation

We will simulate the stock's returns, and each next price is calculated from the previous one by

P(t) = P(t-1) * (1 + return_simulate(t))

Calculate the mean and standard deviation of the stock returns

miu = np.mean(stock_returns, axis=0)
dev = np.std(stock_returns)
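For completeness, here is one way `stock_returns` might be computed from a price series before taking `miu` and `dev`. The prices below are made up for illustration; the article uses the DXG dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices standing in for the DXG data.
prices = pd.Series([100.0, 101.5, 100.8, 102.3, 103.0])
stock_returns = prices.pct_change().dropna()  # simple daily returns P(t)/P(t-1) - 1
miu = np.mean(stock_returns, axis=0)
dev = np.std(stock_returns)
```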

Simulation process


simulation_df = pd.DataFrame()
last_price = init_price
for x in range(mc_rep):
    # Start each simulated path from the last observed price and apply
    # a normally distributed daily return at every step.
    price_series = [last_price * (1 + np.random.normal(miu, dev))]
    for _ in range(train_days - 1):
        price_series.append(price_series[-1] * (1 + np.random.normal(miu, dev)))
    simulation_df[x] = price_series

Visualizing the Monte Carlo Simulation

fig = plt.figure()
fig.suptitle('Monte Carlo Simulation')
plt.plot(simulation_df)
plt.axhline(y = last_price, color = 'r', linestyle = '-')
plt.xlabel('Day')
plt.ylabel('Price')
plt.show()

[Figure: simulated price paths over the forecast horizon]

Now, let’s check against the actual stock price after 10 days, 1 month, and 3 months.

plt.hist(simulation_df.iloc[9, :], bins=15, label='histogram')
plt.axvline(x=test_simulate.iloc[9], color='r', linestyle='-', label='Price at 10th day')
plt.legend()
plt.title('Histogram of simulated prices and the actual price on the 10th day')
plt.show()

[Figure: histogram of simulated 10th-day prices, with the actual price marked]

We can see that the most frequently occurring simulated price is quite close to the actual price on the 10th day.

The longer the forecast period, the worse the results gradually become.
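One way to quantify this widening uncertainty is to compare the spread of the simulated prices at different horizons. The sketch below uses made-up parameters (`miu`, `dev`, `init_price` are assumptions, not the DXG estimates) and computes the empirical 5%–95% band on day 10 and day 66:

```python
import numpy as np

rng = np.random.default_rng(1)
miu, dev, init_price = 0.001, 0.02, 100.0  # assumed daily mean return and volatility
mc_rep, horizon = 1000, 66                 # ~3 months of trading days
# Each column is one simulated path: cumulative product of (1 + daily return).
shocks = 1 + rng.normal(miu, dev, size=(horizon, mc_rep))
paths = init_price * np.cumprod(shocks, axis=0)
band_10d = np.percentile(paths[9], [5, 95])   # day 10
band_66d = np.percentile(paths[65], [5, 95])  # day 66
# The band is much wider at the longer horizon.
```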

Simulation for the next month

[Figure: Monte Carlo simulation over 1 month]

After 3 months

[Figure: Monte Carlo simulation over 3 months]

Conclusion

Monte Carlo simulation is used a lot in finance. Although it has some weaknesses, hopefully this article has given you a new perspective on applying simulation to forecasting.

Reference

[1] Pratik Shukla and Roberto Iriondo, “Monte Carlo Simulation: An In-depth Tutorial with Python”, Medium, https://medium.com/towards-artificial-intelligence/monte-carlo-simulation-an-in-depth-tutorial-with-python-bcf6eb7856c8

Please also check Gaussian Samples, N-gram language models, and Bayesian Statistics for more statistics knowledge.


Hiring Data Scientist / Engineer

We are looking for Data Scientists and Engineers.
Please check our Career Page.

Data Science Project

Please check our experiences with Data Science Projects.

Vietnam AI / Data Science Lab

Vietnam AI Lab

Please also visit Vietnam AI Lab

Hiring: Data Scientist (Algorithm Theory)


Job Title: Data Scientist (Algorithm Theory)
Location: Ho Chi Minh
Contact: recruitment @ mti-tech.vn
Employment: Fulltime
Level: Middle/Senior
Report to: Line Manager

If you want to join exciting and challenging projects, MTI Tech could be the next destination for your career.

MTI Technology specializes in creating smart mobile content and services that transform and transcend customers’ lives. We design and develop our products using agile methods, bringing the best deliverable results to the table in the shortest amount of time. MTI stands for an attitude: seeking a balance of excellence, pragmatism, and convenience for customers. From an original team of 20 people, we have grown to more than 100 bright talents and continue to grow. Looking for a place to grow your talents and be awesome? This is the place!

The Job

We are looking for Data Scientists who would like to participate in projects that apply various existing data to AI and, moreover, combine it with other data to create new value.

Currently, we are looking for candidates with experience in Algorithms and Natural Language Processing (NLP), but any other field of AI will be considered too.

Example of data

  • Data of medical examination results, medical questionnaire results, or images of them from health checks or general medical examinations.
  • Data of athletes’ training results and vitals.
  • Pregnancy activity data of pregnant women.
  • Life data such as weather and navigation.
  • Text data such as magazines.

We have many project styles: outsourcing/offshoring, research, etc.

Example of application

  • By combining and analyzing healthcare data (such as medical examination and questionnaire results) with labor data (such as mental health checks and overtime records), we can detect future health risks at an earlier stage.

Programming Language

  • Python, R, MATLAB, SPSS
  • Java, JavaScript, Golang, Haskell, and Erlang/Elixir are a plus.

Currently, development is mainly in Python. It is good to understand object-oriented programming in Java, etc. It is also good if you have parallel-processing experience in a server-side language (Golang, etc.).

In addition, engineers who can use functional languages (Haskell, Erlang/Elixir) are highly valued. Such people tend to be interested in various programming languages, have mathematical curiosity, and often study by themselves. Although we do not have many opportunities to use these languages in actual development, we welcome such engineers as well.

Operational Environment

  • AWS, Google Cloud Platform, Microsoft AZURE, Redmine, GitHub etc.

Who we are looking for

Recently, in web service development, engineers who have experience using APIs and libraries related to prominent AI offerings (open source, Google, IBM, etc.) from the standpoint of a “user” are often classified as “AI Engineers”. What MTI Group seeks is not such a technician; we are looking for an experienced person who has deeply learned data science itself. On the other hand, we will still consider someone who has learned data science but does not yet have much practical experience.

  • Have experience in research and study related to Algorithm Theory, such as discrete mathematics and search algorithms. Top-priority skill.
  • Have experience in research and study related to mathematics.
  • Have great ambition and the ability to study the most leading-edge research by yourself and apply it to your own development.
  • Have the technical skills and creativity to build new technologies from scratch by yourself when they are necessary but do not yet exist.
  • Adapt yourself to our team working culture, such as discussing and sharing together. Personality is valued; excellent people have a variety of personalities. However, being able to work only on your own becomes a problem.
  • Have experience in research and study related to statistical mathematics, such as regression analysis, SVM, or information theory.
  • Have taken part in research or business related to AI, Machine Learning, Natural Language Processing (NLP), neural networks, and so on.
  • Have experience in research and study related to engineering and science, econometrics, behavioral psychology, medical statistics, and so on.
  • Have working experience in statistical analysis or as a data scientist.
  • In AI development, trial and error repeats many times to solve problems with unclear specifications or no fixed answer. For this reason, we are looking for individuals with the following qualities:
    • Be agile in the trial-and-error cycle (quickly turn your thoughts into code).
    • Be concerned with even small issues and solve all problems efficiently through logical thought.
    • Be curious about knowledge. The person who is greatly interested in and curious about knowledge surely grows the most.
  • Have deep experience in research.
  • English skill: be able to use your English reading skills to gain information related to AI.

More on MTI – what is it like to work in MTI?

At MTI Technology, our goal is to empower every individual to learn, discover, be able to communicate openly and honestly to create the best services based on effective teamwork.

Bayesian estimator of the Bernoulli parameter

In this post, I will explain how to calculate a Bayesian estimator. The example taken here is very simple: estimate the parameter θ of a Bernoulli distribution.

A random variable X which has the Bernoulli distribution is defined as

$$P(X = 1) = \theta \quad \textrm{and} \quad P(X = 0) = 1 - \theta$$

with

$$\theta \in [0, 1]$$

In this case, we can write

$$p(x\ |\ \theta) = \theta^{x}\ (1-\theta)^{1-x}, \quad x \in \{0, 1\}$$

In reality, the simplest way to estimate θ is to sample X, count how many times the event occurs, and then estimate the probability of the event occurring. This is exactly what the frequentists do.

In this post, I will show how Bayesian statisticians estimate θ. Although this doesn't have a meaningful application by itself, it helps to understand how Bayesian statistics works. Let's start.

The posterior distribution of θ

Denote Y as the observation of the event. Given the parameter θ, if we sample the event n times, then the probability that the event occurs k times is (this is the likelihood of the observations)

$$p(y\ |\ \theta) = \theta^{k}\ (1-\theta)^{n-k}$$

In Bayesian statistics, we would like to calculate the posterior distribution

$$p(\theta\ |\ y)$$

By using the Bayesian formula, we have

$$p(\theta\ |\ y) = \frac{p(y\ |\ \theta) \ p(\theta)}{p(y)}=\frac{\theta^k\ (1-\theta)^{n-k}\ p(\theta)}{p(y)}$$

With the prior distribution of θ taken as the Uniform distribution, p(θ) = 1, and it is easy to prove that

$$p(y)=\frac{\Gamma(k+1)\ \Gamma(n-k+1)}{\Gamma(n+2)}$$

where Γ is the Gamma function. Hence, the posterior distribution is

$$p(\theta\ |\ y_1, \ldots, y_{n}) = \frac{\Gamma(n+2)}{\Gamma(k+1)\ \Gamma(n-k+1)}\theta^{k}(1-\theta)^{n-k}$$

Fortunately, this is the density function of the Beta distribution: $Beta(a=k+1, b=n-k+1)$
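As a quick numerical sanity check (with arbitrary values of n and k, not taken from the article's data), the closed form for p(y) above can be compared against a direct midpoint-rule integration of θ^k (1−θ)^(n−k) over [0, 1]:

```python
import math

n, k = 10, 3  # arbitrary sample values for the check
closed_form = math.gamma(k + 1) * math.gamma(n - k + 1) / math.gamma(n + 2)
# Midpoint-rule approximation of the integral of theta^k (1-theta)^(n-k).
m = 100_000
numeric = sum(((i + 0.5) / m) ** k * (1 - (i + 0.5) / m) ** (n - k) for i in range(m)) / m
# numeric and closed_form agree to many decimal places
```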

We use the following properties to evaluate the posterior mean and variance of θ.

If $X \sim Beta(a,b)$, then   $$E(X) = \frac{a}{a+b} \quad \textrm{and} \quad Var(X) = \frac{ab}{(a+b+1)(a+b)^2}$$
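These two formulas can themselves be checked by Monte Carlo sampling, e.g. with NumPy's Beta sampler (the parameters a, b below are chosen arbitrarily for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 5, 3  # arbitrary Beta parameters for the check
samples = rng.beta(a, b, size=200_000)
mean_formula = a / (a + b)                            # 0.625
var_formula = (a * b) / ((a + b + 1) * (a + b) ** 2)
# The sample mean and variance match the formulas closely.
```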

Simulation

In summary, the Bayesian estimator of θ is given by the Beta distribution with the mean and variance above. Here is the Python code for simulating data and estimating θ:

import numpy as np
from scipy.stats import beta

def bayes_estimator_bernoulli(data, a_prior=1, b_prior=1, alpha=0.05):
    '''Input:
    data: a numpy array with binary values, which has the distribution B(1, theta)
    a_prior, b_prior: parameters of the prior distribution Beta(a_prior, b_prior)
    alpha: significance level of the posterior confidence interval for the parameter
    Model:
    estimates the parameter theta of a Bernoulli distribution;
    the default prior for theta is Beta(1,1) = Uniform[0,1]
    Output:
    a, b: the two parameters of the posterior distribution Beta(a, b)
    pos_mean: posterior estimate of the mean of theta
    pos_var: posterior estimate of the variance of theta'''
    n = len(data)
    k = sum(data)
    # Conjugate update: Beta(a_prior, b_prior) prior + Bernoulli likelihood
    a = a_prior + k
    b = b_prior + n - k
    pos_mean = 1.*a/(a+b)
    pos_var = 1.*(a*b)/((a+b+1)*(a+b)**2)
    ## Posterior Confidence Interval
    theta_inf, theta_sup = beta.interval(1-alpha, a, b)
    print('Prior distribution: Beta(%3d, %3d)' %(a_prior,b_prior))
    print('Number of trials: %d, number of successes: %d' %(n,k))
    print('Posterior distribution: Beta(%3d,%3d)' %(a,b))
    print('Posterior mean: %5.4f' %pos_mean)
    print('Posterior variance: %5.8f' %pos_var)
    print('Posterior std: %5.8f' %(np.sqrt(pos_var)))
    print('Posterior Confidence Interval (%2.2f): [%5.4f, %5.4f]' %(1-alpha, theta_inf, theta_sup))
    return a, b, pos_mean, pos_var

# Example
n = 129 # sample size
data = np.random.binomial(size=n, n=1, p=0.6)
a, b, pos_mean, pos_var = bayes_estimator_bernoulli(data)

And the result is

Prior distribution: Beta(  1,   1)
Number of trials: 129, number of successes: 76
Posterior distribution: Beta( 77, 54)
Posterior mean: 0.5878
Posterior variance: 0.00183556
Posterior std: 0.04284341
Posterior Confidence Interval (0.95): [0.5027, 0.6703]

In the simulation, we generated 129 data points from the Bernoulli distribution with θ = 0.6, and the Bayesian estimate of θ is the posterior mean, which is 0.5878.

This is a very simple example of Bayesian estimation. In reality, it is usually tricky to determine a closed-form solution of the posterior distribution from a given prior distribution. In that case, the Monte Carlo technique is one of the solutions for approximating the posterior distribution.
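To illustrate that last point, here is a sketch (not part of the original post) of a Monte Carlo approximation of the posterior mean for the same Bernoulli example: draw θ from the Uniform prior and weight each draw by the likelihood, avoiding the closed-form Beta result entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 129, 76  # trials and successes, as in the example above
theta = rng.random(200_000)  # draws from the Uniform(0,1) prior
# Likelihood weights, computed in log-space for numerical stability.
log_w = k * np.log(theta) + (n - k) * np.log(1 - theta)
w = np.exp(log_w - log_w.max())
posterior_mean_mc = np.sum(w * theta) / np.sum(w)
# posterior_mean_mc is close to the analytic posterior mean 77/131 = 0.5878
```

This is self-normalized importance sampling with the prior as the proposal; it works here because the Uniform prior covers the whole parameter range, but more sophisticated Monte Carlo methods (e.g. MCMC) are needed in higher dimensions.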
Please also check Gaussian Samples and N-gram language models for more statistics knowledge.
