Combinatorial Optimization: From Supervised Learning to Reinforcement Learning – Part 1

Recently, I was asked to solve an interesting problem: sorting an array.


Input: An array A of n unique integers. The length n of A ranges from 2 to 10.


Output: An array B containing the elements of A sorted in ascending order. B must have the same length as A.

Examples: A = [3,0] -> B = [0,3], A = [1,3,2] -> B = [1,2,3], A = [5,9,1,3,7] -> B = [1,3,5,7,9]
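The specification above can be written down as a small checker. This matters later: unlike a sorting algorithm, a learned model can return something that is not a permutation of A at all, so it helps to have the constraints in executable form. A minimal sketch; the function name `is_valid_solution` is hypothetical.

```python
def is_valid_solution(a, b):
    """Check the problem's constraints: A holds 2..10 unique integers, and
    B contains exactly A's elements in ascending order."""
    valid_input = 2 <= len(a) <= 10 and len(set(a)) == len(a)
    same_elements = sorted(a) == sorted(b)            # B is a permutation of A
    ascending = all(x < y for x, y in zip(b, b[1:]))  # strictly increasing
    return valid_input and same_elements and ascending

examples = [([3, 0], [0, 3]), ([1, 3, 2], [1, 2, 3]), ([5, 9, 1, 3, 7], [1, 3, 5, 7, 9])]
print(all(is_valid_solution(a, b) for a, b in examples))  # True
print(is_valid_solution([1, 3, 2], [1, 2, 2]))            # False: 3 was replaced by 2
```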

Array sorting is not a new problem. There are many sorting algorithms, such as Straight Insertion, Shell Sort, Bubble Sort, Quick Sort, Selection Sort, and Heap Sort. The problem becomes much more interesting, however, if we treat it as a Combinatorial Optimization problem, because then various Machine Learning approaches can be applied.
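To make the combinatorial framing concrete: for an array of length n, the solution space is its n! orderings, and one natural objective to minimize is the number of inversions (pairs that are out of order), which is zero only for the ascending order. The sketch below searches that space exhaustively. It illustrates the framing rather than a practical sorter, and the inversion-count objective is my choice of cost function, not something fixed by the problem statement.

```python
from itertools import permutations

def inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j]; zero iff seq is ascending."""
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq))
                 if seq[i] > seq[j])

def sort_by_search(a):
    """Pick the permutation of `a` with the fewest inversions.
    Only feasible for small n: there are n! candidates (3,628,800 for n = 10)."""
    return list(min(permutations(a), key=inversions))

print(sort_by_search([5, 9, 1, 3, 7]))  # [1, 3, 5, 7, 9]
```

The ML approaches in this series can be seen as learning to produce the minimizing ordering directly, instead of enumerating candidates.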

Combinatorial Optimization

“Combinatorial Optimization is a category of problems which requires optimizing a function over a combination of discrete objects and the solutions are constrained. Examples include finding the shortest paths in a graph, maximizing value in the Knapsack problem, and finding boolean settings that satisfy a set of constraints. Many of these problems are NP-Hard, which means that no polynomial-time solution can be developed for them. Instead, we can only produce approximations in polynomial time that are guaranteed to be some factor worse than the true optimal solution.”

Source: Recent Advances in Neural Program Synthesis

Traditional solvers often rely on handcrafted heuristics to make decisions. In recent years, many Machine Learning (ML) techniques have been applied to combinatorial optimization problems, ranging from supervised learning to modern reinforcement learning.

Using the array-sorting problem above, we will see how it can be solved with different ML techniques.


In this series, we will start with some supervised learning techniques, then apply neuro-evolution, and finally use some modern reinforcement learning (RL) techniques.

Part 1: Supervised learning: Gradient Boosting, Fully Connected Neural Networks, Seq2Seq.

Part 2: Deep Neuro-Evolution: NEAT, Evolution Strategies, Genetic Algorithms.

Part 3: Reinforcement Learning: Deep Q-Network, Actor-Critic, PPO with a Pointer Network and an attention-based model.

Code for Part 1:

(Note: enable a GPU runtime in Colab to reduce running time.)

Supervised learning

Supervised machine learning algorithms are designed to learn by example, so to use supervised learning we first need labeled data.

First, we will generate three datasets of different sizes: 1,000, 5,000, and 50,000 samples. Then we will train several models on these datasets and compare their sorting abilities after training.

1. Generate training data

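How do we generate the data? One possible approach is to treat each element of the input list as a feature and each element of the sorted list as a label, which puts the data into a tabular form. Because the arrays have different lengths (2 to 10), shorter arrays are left-padded with -1 so that every row has 10 columns: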

     In1   In2   In3   In4   In5   In6   In7   In8   In9  In10
0     -1    -1    -1    -1    -1     7     0     1     2     3
1      1     7     6     3     4     5     0     2     8     9
2     -1    -1     6     2     7     4     1     0     3     8

    Out1  Out2  Out3  Out4  Out5  Out6  Out7  Out8  Out9 Out10
0     -1    -1    -1    -1    -1     0     1     2     3     7
1      0     1     2     3     4     5     6     7     8     9
2     -1    -1     0     1     2     3     4     6     7     8
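A minimal sketch of how data in this layout could be generated with NumPy. The dataset sizes come from the text and the -1 left-padding is read off the sample rows; the value range 0..9 is an assumption inferred from those rows (the problem statement only says "integers"), and the function name `generate_data` and the seeds are hypothetical.

```python
import numpy as np

MAX_LEN = 10  # arrays have 2..10 elements
PAD = -1      # padding value for shorter arrays, as in the sample rows
VALUES = 10   # assumption: elements are drawn without replacement from 0..9

def generate_data(n_samples, seed=0):
    """Return (X, y): X holds left-padded unsorted arrays, y the sorted ones."""
    rng = np.random.default_rng(seed)
    X = np.full((n_samples, MAX_LEN), PAD, dtype=int)
    y = np.full((n_samples, MAX_LEN), PAD, dtype=int)
    for row in range(n_samples):
        n = rng.integers(2, MAX_LEN + 1)               # length in [2, 10]
        a = rng.choice(VALUES, size=n, replace=False)  # n unique integers
        X[row, MAX_LEN - n:] = a                       # left-pad with -1
        y[row, MAX_LEN - n:] = np.sort(a)
    return X, y

datasets = {size: generate_data(size, seed=size) for size in (1000, 5000, 50000)}
X_small, y_small = datasets[1000]
print(X_small.shape, y_small.shape)  # (1000, 10) (1000, 10)
```

Left-padding keeps the labels consistent: since every real value is at least 0, sorting a whole padded row gives exactly the padded sorted array.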

Then we can train any multi-output regression or multi-label classification model on this data.
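Regression models output real numbers, so comparing sorting ability requires mapping predictions back to integers. Below is a minimal sketch of one possible evaluation, assuming predictions are rounded to the nearest integer and scored both per whole array (exact match) and per position. These two metrics and the function name `sorting_scores` are my choices for illustration, not necessarily the ones used in the code that follows.

```python
import numpy as np

def sorting_scores(y_true, y_pred):
    """Round predictions to the nearest integer and return
    (exact_match_rate, element_accuracy):
    - exact_match_rate: fraction of rows where every position is correct
    - element_accuracy: fraction of individual positions that are correct"""
    y_hat = np.rint(np.asarray(y_pred)).astype(int)
    correct = y_hat == np.asarray(y_true)
    return float(correct.all(axis=1).mean()), float(correct.mean())

y_true = np.array([[-1, 0, 3], [1, 2, 3]])
y_pred = np.array([[-0.9, 0.2, 2.6], [1.1, 2.4, 3.7]])
print(sorting_scores(y_true, y_pred))  # (0.5, 0.8333333333333334)
```

Exact match is the stricter measure: one wrong position means the array was not sorted correctly.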

2. Multi-label regression

For this tabular dataset, I will use two common techniques: gradient boosting (with the XGBoost library) and a simple fully connected neural network (FCNN).

from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor