Radar Image Compressing by Huffman Algorithm

In AI Lab, one of our projects is “Weather Forecasting”, in which we process the data from weather radars, and perform prediction using nowcasting techniques, i.e.  forecasting in a short-time period like 60 minutes.

To develop forecasting model and do evaluation, we need to run many test-cases. Each test-case corresponds to one rainfall event, whose duration can be several hours. In the rainy season, one rainfall event cans last for several days.

Given one event and one interested area (80km2 of Tokyo city for example). In our actual work, a list of arrays of rainfall intensity needs to be stored in machine. Each array corresponds to one minute of observation by radar. If the event lasts for 5 days, then the number of arrays is

60 minutes x 24 hours x 5 days = 7200 arrays

Therefore, storing a such amount of data costs a lot of memory.

One of solution is to modify the way feeding data to the evaluation modul. However, for the scope of this blog, let assume that we would like to keep that amount of data in computer. How do we do to compress data, and how much of data can be compressed?

The aim of this post is to introduce the Huffman algorithm, one of the basic methods for data compressing. By learning about the Huffman algorithm, we can easily understand the mechanism behind some advanced compressing methods.

Let us start with a question: how does a computer store data?

Data in computer language

Computer uses bits of {0,1} to represent numbers. One character of 0 or 1 is called a bit. Any string of bits is called a code.

Each number is encoded by a unique string of bits, or code. The encoder should satisfy the “prefix” condition. This condition states that no code is the prefix of any other codes. This makes it is always possible to decode a bit-string back to a number.

Let us consider an example. Let assume that we have a data d as a list of numbers, d = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3].

To store this list d, the computer needs to encode the three numbers: 1, 2, 3. The two possible encoders are:

Code/Number 1 2 3
Code 1 ‘0’ ‘01’ ‘11’
Code 2 ‘0’ ‘10’ ‘11’

But the Code 1 violates the ‘prefix’ condition, as ‘0’ is prefix of ’01’, while the Code 2 does not. Therefore, the Code 2 can be used for encoding.

So, how many of bits needed to represent N different numbers? In theoretically, we need codes of k-bits length to encode 2^k numbers.

For example, to encode 2 numbers, we need codes of 1-bit length, which are ‘0’ and ‘1’.

To encode 4 = 2^2 numbers, we need codes of 2-bits length, which are: ’00’, ’01’, ’10’, ’11’

To encode 256 = 2^8 numbers, we need codes of 8-bits length, which are: … [we skip it :)]

Consider one radar image of dimension 401×401 including float numbers that take value from 0 to 200 mm/h. If we are interested in one decimal number, this data can be multipled by 10 and saved as interger. At that time, the array contains 401×401 intergers, whose values are from 0 to 2000.

Following the above logic, to encode 2000 different numbers, we need 11-bits codes, because 2^11 = 2048. If so, the total number of bits to encode this data is: 401 x 401 x 11 bits = 1,768,811 bits.

Compressing data aims to reduce the total number of bits that are used to encode it, in other words, to find an encoder that has a minimum average of code lengths.

Expected code length

Expected or average code length (ECL) of a data d is defined as:

Huffman Algorithmwhere x is values of data, f(x) is its frequency, and l(x) is the length of code used to encode the value x. This quantity tells us how many of bits on average is used to encode a number in the data x and by a Code C.

Minimum expected code length

It can be proved mathematically that, the ECL of a code is bounded below by the entropy of given data. What is entropy of a data? It’s a quantity to measure information, that was initially introduced by Shannon (a Mathematician and father of Information Theory). This quantity tells us how much uncertainty the data contains.

In mathematical point of view, the entropy of the data is defined as

Huffman Algorithm

And by the source coding theorem , the following inequality is always true

Huffman Algorithm

which is equivalent to

Huffman Algorithm

The ECL achieves the minimum value if l(x) = -log2 [f(x)], meaning that, the length of the code of value x depends on the frequency of that value in the data. For example, in our radar images, the most frequent number is 0 (0.1mm/h), then we should use a short string of bits to encode the value 0. On the contrary, the number 2000 (0.1mm/h) has a very low frequency, then a longer bit-string must be used. In that way, the average of bits used to encode one value will be reduced to the entropy value, which is the lower bound of the compression. If we use a code which yields an ECL less than the entropy, then the compression makes loss of information.

In summary, the compressing could be minimized if the length of codes follows the formula: l(x) = -log2 [f(x)].

The Huffman Encoding

The Huffman algorithm is one of methods to encode data into bit-strings whose ECL approximates to the lower bound of compression. This is a very simple algorithm that can be produced by a several code lines in Python. For detailed of Huffman algorithm, you can refers to the link:


Encode and Decode a radar images

As described above, a radar image contains 401×401 data points, each point is an interger taking value from 0 to 2000.

By performing the Huffman encoder, we obtained the following codes (Figure 1).

Huffman Algorithm
Figure 1. A part of Huffman codes for 2000 integer numbers

By using these codes, the total number of bits of one specific image is 416,924 bits, which is much smaller than the codes using 11-bits length.

In actual work, we can convert an image data to bits by concatenate value by value, then write everything to a binary file. As computer groups every 8 bits to a byte, then the binary file consume 416,924/8 bits = 52KB.

To decode a binary file back to the image array, we load the binary file and trasform it to bit. As the Huffman code verifies the ‘prefix’ condition, by searching and maping, the original data can always be reconstructed without loosing any information.

If we write image array in the numpy format, then the storage can be up to 1.3MB/array. Now with Huffman compression, the storage can be reduced to 52KB.

If we use ‘ZIP’ application to compress the numpy file, the storage of the zip file is only 48KB, meaning that the ‘ZIP’ is outperform our Huffman encoding. That is because ZIP is using another encoding algorithm that allows to reach closer to the lower bound of compression.

Hiring Data Scientist / Engineer

We are looking for Data Scientist and Engineer.
Please check our Career Page.

Weather Data Science Project

Please check about Weather Data Science Project

Vietnam AI / Data Science Lab

Vietnam AI Lab

Please also visit Vietnam AI Lab

Bayesian estimator of the Bernoulli parameter

In this post, I will explain how to calculate a Bayesian estimator. The taken example is very simple: estimate the parameter θ of a Bernoulli distribution.

A random variable X which has the Bernoulli distribution is defined as

Bayesian statistics


Bayesian statistics 

In this case, we can write

Bayesian statistics.

In reality, the simplest way to eatimate θ is to sample X, count how many time the event occurs, then estimate the probability of occuring of event. This is exactly what the frequestists do.

In this post, I will show how do the Bayesian statisticians estimate θ. Although this doesn’t have a meaningful application, but it helps to understand how do the Bayesian statistics work. Let’s start.

The posterior distribution of θ

Denote Y as the observation of the event. Given the parameter θ, if we sample the event n time, then the probability that the event occurs k time is (this is called the probability density function of Bernoulli )

Bayesian statistics

In Bayesian statistics, we would like to calculate

Bayesian statistics

By using the Bayesian formula, we have

$$p(\theta\ |\ y) = \frac{p(y\ |\ \theta) \ p(\theta)}{p(y)}=\frac{\theta^k\ (1-\theta)^{n-k}\ p(\theta)}{p(y)}$$

With the prior distribution of theta as an Uniform distribution, p(θ) = 1, and it is easy to prove that

$$p(y)=\frac{\Gamma(k+1)\ \Gamma(n-k+1)}{\Gamma(n+2)}$$

where Γ is the Gamma distribution. Hence, the posterior distribution is

$$p(\theta\ |\ y_1, \ldots, y_{n}) = \frac{\Gamma(n+2)}{\Gamma(k+1)\ \Gamma(n-k+1)}\theta^{k}(1-\theta)^{n-k}$$

Fortunately, this is the density function of the Beta distribution: $Beta(a=k+1, b=n-k+1)$

We use the following properties for evaluating the posterior mean and variance of theta.

If $X \sim Beta(a,b)$, then   $$E(X) = \frac{a}{a+b} \quad \textrm{and} \quad Var(X) = \frac{ab}{(a+b+1)(a+b)^2}$$


In summary, the Bayesian estimator of theta is the Beta distribution with the  mean and variance as above. Here is the Python codes for simulating data and estimating theta

def bayes_estimator_bernoulli(data, a_prior=1, b_prior=1, alpha=0.05):
    data is a numpy array with binary value, which has the distribution B(1,theta)    a_prior, b_prior: parameters of prior distribution Beta(a_prior, b_prior)    alpha: significant level of the posterior confidence interval for parameter    Model:
    for estimating the parameter theta of a Bernoulli distribution    the prior distribution for theta is Beta(1,1)=Uniform[0,1]    Output: 
    a,b: two parameters of the posterior distribution Beta(a,b)
    pos_mean: posterior estimation for the mean of theta
    pos_var: posterior estimation for the var of theta'''
    n = len(data)
    k = sum(data)
    a = k+1
    b = n-k+1
    pos_mean = 1.*a/(a+b)
    pos_var = 1.*(a*b)/((a+b+1)*(a+b)**2)
    ## Posterior Confidence Interval
    theta_inf, theta_sup = beta.interval(1-alpha,a,b)
    print('Prior distribution: Beta(%3d, %3d)' %(a_prior,b_prior))
    print('Number of trials: %d, number of successes: %d' %(n,k))
    print('Posterior distribution: Beta(%3d,%3d)' %(a,b))
    print('Posterior mean: %5.4f' %pos_mean)
    print('Posterior variance: %5.8f' %pos_var)
    print('Posterior std: %5.8f' %(np.sqrt(pos_var)))
    print('Posterior Confidence Interval (%2.2f): [%5.4f, %5.4f]' %(1-alpha, theta_inf, theta_sup))
    return a, b, pos_mean, pos_var

# Example
n = 129 # sample size
data = np.random.binomial(size=n, n=1, p=0.6)
a, b, pos_mean, pos_var = bayes_estimator_bernoulli(data)

And the result is

Prior distribution: Beta(  1,   1)
Number of trials: 129, number of successes: 76
Posterior distribution: Beta( 77, 54)
Posterior mean: 0.5878
Posterior variance: 0.00183556
Posterior std: 0.04284341
Posterior Confidence Interval (0.95): [0.5027, 0.6703]
In the simulation, we simulated 129 data from the Bernoulli distribution with θ=0.6. And the Bayesian estimation of θ is the posterior mean which is 0.5878.
This is a very simple example of Bayesian estimation. In reality, it is usually tricky to determinate a closed form solution of the posterior distribution from the given prior distribution. In that case, Monte Carlo technique is one of solutions to approximate the posterior distribution.
Please also check Gaussian Samples and N-gram language models for statistics knowledges.

Hiring Data Scientist / Engineer

We are looking for Data Scientist and Engineer.
Please check our Career Page.

Data Science Project

Please check about experiences for Data Science Project

Vietnam AI / Data Science Lab

Vietnam AI Lab

Please also visit Vietnam AI Lab