Practice Design for Try/Fail Fast


At the moment, AI, ML and DL are hot keywords in software development. There are many successful projects built on AI technologies, such as Google Translate, Amazon Alexa, … AI makes machines smarter. However, the road from an idea to a successful, great solution has many challenges. I have spent some time working on AI projects and with start-ups to build solutions based on algorithms and ML, where I aimed to propose and implement practices that help the development team work smoothly. In this post, I describe the development process, architecture, CI/CD, and programming practices for quickly implementing multiple AI approaches with an Agile software development methodology.

Sections:
– Architecture
– Continuous Integration and Continuous Deployment
– Batch Processing, Parallel Processing
– Data-Driven Development and Test-Driven Development (to be continued)

Architecture

An AI project includes multiple services across two kinds of domain, AI/ML/DL and engineering, which are developed independently and then integrated and verified automatically. ML services in particular differ from engineering services: they have to solve problems tied to technologies such as machine learning, deep learning, big data, distributed computing, … In this case, a microservices architecture is the first choice, because it separates business problems into specific services that can each be solved with the domain knowledge of the Data Science team or the Engineering team. Microservices also bring further advantages for Agile development; more information here. In an AI project, the focus is on “How do we solve the business problem with AI technology?”.

Microservices may not be the best choice for every project, but they help the team develop and deliver quickly with an Agile methodology.
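As a minimal sketch of that split, assuming a Python stack and only the standard library, the service below wraps a model behind a small HTTP contract: the Data Science team owns `predict`, and other services depend only on the `POST /predict` JSON interface. The endpoint path, the payload shape and the toy `predict` function are hypothetical illustrations, not a service from the original project.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(features):
    """Hypothetical stand-in for a model owned by the Data Science team."""
    # A trivial "model": the sum of the input features.
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    """Exposes the model over HTTP; callers only depend on this JSON contract."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"prediction": predict(payload.get("features", []))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for this example


def start_service(port=0):
    """Start the service on a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    import urllib.request

    server = start_service()
    url = f"http://127.0.0.1:{server.server_address[1]}/predict"
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": [1, 2, 3]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(json.load(response))  # {'prediction': 6}
    server.shutdown()
```

Because the contract is just JSON over HTTP, the DS team can swap the model behind `predict` without the Engineering team changing their callers.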

Continuous Integration and Continuous Deployment

When a project includes multiple teams and multiple services, the challenges appear at integration and deployment. CI/CD is widely used in software development, but I received more specific requests from the Data Science (DS) team. The big question from DS was: “We have several solutions to this problem. Could you help us propose a way to evaluate and integrate them quickly?”

For the Engineering team, a CI/CD pipeline is quite generic. With an AI solution, you will meet some challenges related to:
– How to run on distributed computing? We chose batch jobs.
– How to save money on long-running jobs? We chose AWS Spot Instances.
– How to improve performance with parallelism? We run jobs in parallel and also design the code structure for parallelism (in Python).
– How to control data versions and model versions? We chose Data Version Control (DVC) and AWS S3 to version training/evaluation data and models.
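The parallel part of the list can be sketched inside a single evaluation job with Python's `concurrent.futures`: the evaluation set is split into batches, each batch is scored in a separate worker process, and the partial counts are combined. The scorer `count_correct` and the other helper names are hypothetical; parallelism across jobs would come from the batch system itself, not from this code.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence, Tuple


def make_batches(items: Sequence, batch_size: int) -> List[list]:
    """Split items into consecutive batches of batch_size (the last may be shorter)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def count_correct(batch: List[Tuple[int, int]]) -> int:
    """Hypothetical scorer: number of (prediction, label) pairs that match."""
    return sum(1 for prediction, label in batch if prediction == label)


def parallel_accuracy(pairs: Sequence[Tuple[int, int]], batch_size: int = 1000,
                      max_workers: int = 4, executor_cls=ProcessPoolExecutor) -> float:
    """Score batches in parallel workers, then combine the partial counts."""
    batches = make_batches(pairs, batch_size)
    if not batches:
        return 0.0
    with executor_cls(max_workers=max_workers) as pool:
        # map() keeps batch order; each worker returns a partial count.
        correct = sum(pool.map(count_correct, batches))
    return correct / len(pairs)
```

`executor_cls` is injectable so the same function can run with `ThreadPoolExecutor` in quick tests. With the default `ProcessPoolExecutor`, call it from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.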

All of these solutions were applied in my project to address the specific challenges of AI technology, and they were interesting to work on. A good abstraction in the code structure helps the team integrate and deliver multiple approaches quickly.
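One way to read “a good abstraction” is a shared interface that every approach implements, so the pipeline can train and evaluate all registered approaches the same way. The sketch below assumes a simple one-dimensional regression task; `Approach`, `register`, the two toy approaches and `evaluate_all` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Tuple, Type

REGISTRY: Dict[str, Type["Approach"]] = {}


def register(name: str):
    """Class decorator that makes an approach visible to the pipeline."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


class Approach(ABC):
    """Interface every approach from the DS team implements."""

    @abstractmethod
    def train(self, xs: Sequence[float], ys: Sequence[float]) -> None:
        """Fit the approach on training data."""

    @abstractmethod
    def predict(self, x: float) -> float:
        """Predict a target value for one input."""


@register("mean")
class MeanBaseline(Approach):
    def train(self, xs, ys):
        self.mean = sum(ys) / len(ys)

    def predict(self, x):
        return self.mean


@register("linear")
class LeastSquares(Approach):
    def train(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx

    def predict(self, x):
        return self.slope * x + self.intercept


def mean_squared_error(model: Approach, xs, ys) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluate_all(train: Tuple[list, list], test: Tuple[list, list]) -> Dict[str, float]:
    """Train every registered approach and return MSE per approach, best first."""
    results = {}
    for name, cls in REGISTRY.items():
        model = cls()
        model.train(*train)
        results[name] = mean_squared_error(model, *test)
    return dict(sorted(results.items(), key=lambda kv: kv[1]))
```

Adding a new approach then means adding one decorated class; the pipeline code that trains and compares approaches does not change.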

This pipeline can be implemented with any CI/CD framework, such as GitLab CI, Jenkins, AWS CodeBuild, … However, the chosen framework should support custom distributed and parallel jobs, because jobs in the pipeline need specific resources and those resources should auto-scale. For example, training jobs need more GPUs, while system evaluation needs more CPUs to run in parallel, so scalable resources are the most important factor in saving cost.
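To make the resource point concrete, here is a sketch under the assumption that each pipeline stage declares a resource profile, which the CI/CD runner or batch scheduler then maps to machines. The profile values and stage names are hypothetical, not recommendations, and the mapping onto a real scheduler is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical resource request for one pipeline job."""
    vcpus: int
    memory_mib: int
    gpus: int = 0
    spot: bool = True  # restartable long-running jobs can use cheaper spot capacity


# Hypothetical profiles: training needs GPUs, evaluation needs many CPUs.
PROFILES: Dict[str, ResourceProfile] = {
    "train": ResourceProfile(vcpus=8, memory_mib=32_768, gpus=1),
    "evaluate": ResourceProfile(vcpus=16, memory_mib=32_768),
    "unit_test": ResourceProfile(vcpus=2, memory_mib=4_096, spot=False),
}


def plan_jobs(stages: List[str], eval_shards: int = 4) -> List[Tuple[str, ResourceProfile]]:
    """Expand pipeline stages into job requests; evaluation fans out into parallel shards."""
    jobs = []
    for stage in stages:
        profile = PROFILES[stage]
        if stage == "evaluate":
            jobs.extend((f"evaluate-{i}", profile) for i in range(eval_shards))
        else:
            jobs.append((stage, profile))
    return jobs


def total_request(jobs: List[Tuple[str, ResourceProfile]]) -> Dict[str, int]:
    """Sum what the scheduler must provision if all jobs run at the same time."""
    return {
        "vcpus": sum(profile.vcpus for _, profile in jobs),
        "gpus": sum(profile.gpus for _, profile in jobs),
    }
```

Keeping the profiles as data rather than hard-coding machines in each job makes it easier to let the scheduler scale GPU and CPU capacity independently.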

A CI/CD pipeline that includes training and system evaluation lets the team try fast and get results fast: new implementations can be integrated quickly and with confidence, and the team keeps the ability to control quality.
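A quality gate is one way the pipeline can control quality: after the evaluation stage, compare the candidate's metrics against a stored baseline and fail the job on regression, since CI systems such as GitLab CI and Jenkins mark a step as failed when its command exits with a non-zero code. The metric names, the JSON file layout and the tolerance are assumptions for illustration.

```python
import json
from typing import List, Mapping


def quality_gate(candidate: Mapping[str, float], baseline: Mapping[str, float],
                 tolerance: float = 0.01) -> List[str]:
    """Return a message for every baseline metric the candidate misses or regresses on.

    Assumes higher is better for every metric (e.g. accuracy, F1).
    """
    failures = []
    for name, base in baseline.items():
        if name not in candidate:
            failures.append(f"{name}: missing")
        elif candidate[name] < base - tolerance:
            failures.append(f"{name}: {candidate[name]:.4f} vs baseline {base:.4f}")
    return failures


def gate_from_files(candidate_path: str, baseline_path: str, tolerance: float = 0.01) -> int:
    """Load metrics from two JSON files and return a process exit code (0 = pass)."""
    with open(candidate_path) as f:
        candidate = json.load(f)
    with open(baseline_path) as f:
        baseline = json.load(f)
    failures = quality_gate(candidate, baseline, tolerance)
    for message in failures:
        print("REGRESSION", message)
    return 1 if failures else 0
```

In a pipeline step this could be invoked as `sys.exit(gate_from_files("metrics/candidate.json", "metrics/baseline.json"))`; both paths are hypothetical.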
