Practice Design for Try/Fail Fast


AI, ML, and DL are currently hot keywords in software development. More and more successful products are built on AI technologies, such as Google Translate, Amazon Alexa, … AI makes machines smarter. Still, the road from an idea to a successful product has many challenges if you want to build a great solution. I have spent some time working on AI projects and with start-ups building solutions based on algorithms and ML, where I aimed to propose and implement solutions that help the development team work smoothly. Today I would like to describe the development process, architecture, CI/CD, and programming practices for quickly implementing multiple AI approaches with an Agile software development methodology.

Sections:
– Architecture
– Continuous Integration and Continuous Deployment
– Batch Processing, Parallel Processing
– Data Driven Development and Test Driven Development (to be continued)

Architecture

An AI project includes multiple services in two domains: AI/ML/DL and engineering. These services are developed independently, then integrated and verified automatically. Typically, the ML services are quite different from the engineering services: they solve challenging problems tied to technologies such as machine learning, deep learning, big data, distributed computing … A microservices architecture is a natural first choice in this case. It helps split the business problem into specific services, each of which can be solved with the domain knowledge of the Data Science team or the Engineering team. Microservices also bring further advantages for Agile development. In an AI project, the focus is more on “How do we solve the business problem with AI technology?”.

Microservices may not be the best choice, but they help with quick development and delivery under an Agile methodology.

Continuous Integration and Continuous Deployment

When a project includes multiple teams and multiple services, the main challenges lie in integration and deployment. CI/CD is common practice in software development, but I got more specific requirements from the Data Science (DS) team. The big question from DS is: “We have several solutions to this problem. Could you propose a way to evaluate and integrate them quickly?”

For an Engineering team, the CI/CD pipeline is fairly generic. With an AI solution, you will meet some additional challenges:
– How to run on distributed computing? We chose batch jobs.
– How to save money on long-running jobs? We chose AWS spot instances.
– How to parallelize jobs to improve performance? We run jobs in parallel and also parallelize within the code structure (Python).
– How to control data versions and model versions? We chose Data Version Control and AWS S3 to version training/evaluation data and models.
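
The last two points can be pictured with a minimal Python sketch. It is not the project's actual code: the function names, the placeholder metric, and the 12-character hash tag are all assumptions made for illustration. It runs one evaluation job per approach concurrently and tags the data with a content hash, so every result can be traced back to the exact data version it was computed on.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def data_version(payload: bytes) -> str:
    """Return a short content hash used as a version tag (hypothetical scheme)."""
    return hashlib.sha256(payload).hexdigest()[:12]


def evaluate(approach_name: str, samples: list[int]) -> dict:
    """Hypothetical evaluation job that scores one approach on the samples."""
    # Placeholder metric; a real job would load a model and compute a real score.
    score = sum(samples) / len(samples)
    return {"approach": approach_name, "score": score}


def run_parallel(approaches: list[str], samples: list[int], workers: int = 4) -> list[dict]:
    """Run one evaluation job per approach concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda name: evaluate(name, samples), approaches))


if __name__ == "__main__":
    samples = [1, 2, 3, 4]
    version = data_version(bytes(samples))
    for result in run_parallel(["baseline", "candidate"], samples):
        print(version, result)
```

Threads keep the sketch short. For CPU-bound evaluation, a process pool or separate batch jobs would be the more usual choice.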

All the solutions applied in my project aimed to solve the challenges of AI technology, and that is what makes the work interesting. A good abstraction in the code structure helps to integrate and deliver multiple approaches quickly.
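
One way to picture such an abstraction is a common interface plus a registry, so that a new approach plugs into the same evaluation code without changes elsewhere. The sketch below is only an illustration under assumed names (`Approach`, `REGISTRY`, `register`, and the two toy models are hypothetical), not the structure used in the project.

```python
from abc import ABC, abstractmethod


class Approach(ABC):
    """Common interface that every candidate solution implements (hypothetical)."""

    @abstractmethod
    def train(self, data: list[tuple[float, float]]) -> None: ...

    @abstractmethod
    def predict(self, x: float) -> float: ...


# Maps a short name to an approach class so the pipeline can select by name.
REGISTRY: dict[str, type[Approach]] = {}


def register(name: str):
    """Class decorator that adds an approach to the registry under a name."""
    def wrap(cls):
        REGISTRY[name] = cls
        return cls
    return wrap


@register("mean")
class MeanBaseline(Approach):
    """Predicts the mean of the training targets, ignoring the input."""

    def train(self, data):
        self.mean = sum(y for _, y in data) / len(data)

    def predict(self, x):
        return self.mean


@register("last")
class LastValue(Approach):
    """Predicts the most recent training target."""

    def train(self, data):
        self.last = data[-1][1]

    def predict(self, x):
        return self.last


def mean_abs_error(name, train, test):
    """Train the named approach and return its mean absolute error on test."""
    model = REGISTRY[name]()
    model.train(train)
    return sum(abs(model.predict(x) - y) for x, y in test) / len(test)
```

With this shape, comparing approaches becomes a loop over `REGISTRY`, and adding a new one only requires a new decorated class.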

This pipeline can be implemented with any CI/CD framework, such as GitLab CI, Jenkins, AWS CodeBuild … The framework should, however, support custom distributed and parallel jobs, because the jobs in the pipeline need specific resources and those resources should scale automatically. For example, training jobs need more GPUs, while system evaluation needs more CPUs to run in parallel. Scalable resources are the most important factor in saving cost.
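
As a rough sketch of what "jobs with specific resources" could look like in code, each job can declare what it needs, and the pipeline can route and shard work accordingly. Everything here is assumed for illustration: the `JobSpec` fields, the queue names, and the GPU/CPU counts are hypothetical and not tied to any particular CI/CD framework or cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Resource request for one pipeline job (hypothetical fields)."""
    name: str
    stage: str
    gpus: int = 0
    cpus: int = 1
    spot: bool = True  # assumption: the job tolerates interruptible capacity


# Illustrative numbers only: training asks for a GPU, evaluation for many CPUs.
PIPELINE = [
    JobSpec("train-model", stage="training", gpus=1, cpus=4),
    JobSpec("evaluate-system", stage="evaluation", cpus=16),
]


def pick_queue(job: JobSpec) -> str:
    """Route a job to a GPU or CPU queue based on what it requests."""
    return "gpu-queue" if job.gpus > 0 else "cpu-queue"


def split_shards(total_items: int, cpus: int) -> list[range]:
    """Split total_items into at most `cpus` contiguous, near-equal shards."""
    if total_items <= 0 or cpus <= 0:
        return []
    shards = min(cpus, total_items)
    base, extra = divmod(total_items, shards)
    out, start = [], 0
    for i in range(shards):
        # The first `extra` shards take one additional item each.
        size = base + (1 if i < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out
```

Keeping the resource request next to the job definition lets the same pipeline description scale up or down without touching the job code.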

A CI/CD pipeline that includes training and system evaluation makes it possible to try fast and get results fast. New implementations can be integrated quickly and trusted, and their quality stays under control.