
Building an AI system for scale - Case study

Ori Shachar

If you're a C-level exec at a young startup and you are worried that your AI system is not built for scale — this story is for you.


In the thrilling early days of startup life, building your team feels a bit like playing Tetris with your finances; you're just trying to fit everything in before the screen fills up. You're racing against the clock to the next funding round, hoping to prove your product is the next big thing. Naturally, you make decisions that seem right—well, right enough for the moment.


But fast-forward a year or two, past the champagne toasts of your initial success, and you start to notice the cracks in the foundation. Maybe your software architecture is more "monolithic maze" than "sleek skyscraper." Perhaps there’s a critical chunk of code that only one person understands—and suddenly, they're more precious than that intern who actually knows how to refill the printer. And let's not even start on the data science being held together by the digital equivalent of duct tape.


Take, for instance, one of our favorite customers: a rising-star startup that provides its customers with a digital platform for monitoring their assets. Early on, they began noticing that their data processing and algorithms couldn't handle the wide variety of conditions and data noise affecting the monitoring applications of their various customers. Their system, designed to deliver daily insights, was teetering, propped up by hopes, dreams, and an alarming amount of manual intervention.


If this sounds familiar, it's because proper AI product development is usually not a siloed project but an orchestrated company effort, involving executive management, product, DevOps, and a skilled AI team working on the right problems. Maintaining harmony among these diverse functions while staying attuned to the minutiae is especially challenging in the fast-paced, often chaotic symphony of startup life, and just as much so in more compartmentalized organizations.


Often, even experienced executives in the organization have difficulty pinpointing the root of the problem, caught as they are in the inertia of things. Fortunately, this customer recognized the need for external expertise and reached out for assistance. Now, after six months of dedicated effort, they are not just back on track but thriving: they have launched a straightforward, scalable product that meets the needs of their growing customer base and positions them prominently at the forefront of their market.

How did we achieve this? While there were numerous steps involved, let me highlight the four most crucial ones:


1. Aligning prioritized business goals with the reality of what can be achieved with the data

Now, this may sound trivial, but it's actually far from it.

A common challenge in startups focused on specific industry verticals is that key leaders such as the CEO, COO, and sometimes even the CTO, may not have a background in AI. This often results in a gap between management's expectations and what the AI team can realistically achieve, as their wishes may not align perfectly with AI capabilities.

In AI product development, it's crucial to align what the data can realistically achieve in terms of AI performance with the business goals that need to be met. Often, what an experienced AI engineer finds extremely challenging may seem straightforward to a non-AI executive. Similarly, an organization might fail to recognize the potential value of its data to its customers because it has defined its product goals in ways that are unachievable by the AI team. 

Consider a typical scenario where management demands that an AI product make no classification errors; achieving 100% accuracy is essentially impossible for systems based on statistical models. However, allowing for a classification value of 'uncertain' could make the system not fully automatic, but more reliable, by avoiding critical errors.
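To make that concrete, here is a minimal sketch of such an 'uncertain' fallback, assuming a scikit-learn-style classifier that exposes predict_proba(); the function name and the 0.85 confidence threshold are illustrative, not our customer's actual values.

```python
import numpy as np

def classify_with_abstention(model, X, threshold=0.85):
    """Label samples, falling back to 'uncertain' when the model's
    top-class probability is below `threshold`.

    Assumes a scikit-learn-style classifier exposing predict_proba()
    and classes_. The 0.85 threshold is illustrative and should be
    tuned against the business cost of a wrong call vs. a deferred one.
    """
    proba = model.predict_proba(X)          # shape: (n_samples, n_classes)
    top_idx = np.argmax(proba, axis=1)      # most likely class per sample
    confidence = np.max(proba, axis=1)      # its probability

    labels = np.asarray(model.classes_)[top_idx].astype(object)
    labels[confidence < threshold] = "uncertain"
    return labels
```

Samples routed to 'uncertain' can then go to a human reviewer or a slower, more careful process, which keeps the automated path reliable without promising impossible perfection.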

Such a gap between management and the AI team must be properly surfaced within the organization. This calls for strong communication skills, as these encounters bring together people with very different ways of thinking: the data-driven analytical engineer, the visionary product manager, and the business-driven CEO. Bridging this gap demands not only a high level of trust but also the strategic use of empathy to reconcile the differing viewpoints in these discussions.


The solution - Drawing on our dual expertise as AI managers and company executives, we built a strong trust foundation with all company stakeholders. Trust involves not only attentive listening but also the ability to support arguments with compelling examples and to clearly articulate the consequences of inaction alongside the benefits of making the right choices. This approach enabled us to distill management's requests into specific, data-backed proof-of-concepts. Over time, we refined these into a concise list of AI features that were both practical and of substantial value to the company.



2. Data pipelines for scaling

Scaling AI products isn't just about rapidly increasing the number of GPUs for your applications but about maintaining the reliability of your AI systems across diverse customer conditions. 


When I led the traffic sign recognition project for BMW in 2006, we planned data collection and validation trips in over 60 countries to check the performance of the system. This was driven by BMW's unwavering commitment to ensuring that their prestige vehicles delivered elite performance no matter where in the world they were sold. And it was very much needed: you wouldn't believe the creativity some countries' departments of transportation have.


This principle applies as much to autonomous vehicles as it does to healthcare systems adapting to different patient demographics or retail strategies varying by region. Effective scaling requires a continuous flow of data to develop and validate models for all these scenarios before deployment. Additionally, as your customer base expands, the process of collecting this data for ongoing development should be streamlined; if not managed efficiently, response times to reliability issues may become slow, or resource consumption may become excessively high.


An important aspect of AI team building is having the AI team take ownership of data collection and its use in development and validation, including defining how data should flow to the development team.


How We Solved It:

Our approach was simple and straightforward:

  1. Refactoring the Data Preprocessing Pipeline: We restructured our data preprocessing pipeline into modular components. This adjustment not only improved maintainability but also enhanced the flexibility of our processing tasks.

  2. Integrating a Central Database: We introduced a centralized database as a critical endpoint for all stages of data processing. This database serves a dual purpose: it is accessible to both the customer-facing application and our data science development team, ensuring that both can operate efficiently and in sync.

  3. Establishing Back-Office Validation Pipelines: We implemented specialized validation pipelines for different data science modules. These are managed by the data science team members and are designed for iterative high-throughput validation and training data generation. This setup supports the continuous improvement of key AI modules within our data science framework.
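To give a flavor of what this looks like in code, below is a minimal sketch of the first two ideas: modular preprocessing stages composed into one pipeline whose every result lands in a central store. The stage names, the SQLite backend, and the table layout are illustrative assumptions, not the customer's actual stack.

```python
import json
import sqlite3
from typing import Callable, Iterable

# Each stage is a named, composable function over a record (a dict).
Stage = Callable[[dict], dict]

def drop_incomplete(record: dict) -> dict:
    """Illustrative stage: discard fields with missing values."""
    return {k: v for k, v in record.items() if v is not None}

def normalize_units(record: dict) -> dict:
    """Illustrative stage: convert raw sensor readings to SI units."""
    if "temp_f" in record:
        record["temp_c"] = (record.pop("temp_f") - 32) * 5 / 9
    return record

def run_pipeline(records: Iterable[dict], stages: list[Stage], db_path: str) -> None:
    """Run records through the stages and persist every result to the
    central database that both the customer-facing application and the
    data science team read from."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    for record in records:
        for stage in stages:
            record = stage(record)
        conn.execute("INSERT INTO processed (payload) VALUES (?)", (json.dumps(record),))
    conn.commit()
    conn.close()

# Example: the same pipeline definition serves production and back-office runs.
run_pipeline(
    records=[{"asset_id": 7, "temp_f": 98.6, "vibration": None}],
    stages=[drop_incomplete, normalize_units],
    db_path="central_store.db",
)
```

Because the back-office validation pipelines in point 3 read from the same store, the data science team can iterate against exactly the data production saw.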




3. Finding the right AI expert 

AI and algorithms are very large fields, and every so often you find yourself in need of a specialist with the right kind of toolbox for your problem. The specific nature of the noise in our customer's data called for a noise modeling module, one that could address a wide variety of noise sources in a data-driven manner and improve over time. We had a very clear job description in mind, but finding the right person for the right budget was a challenge. Working together with the company's HR, we iterated on the job requirements until we had a steady stream of incoming CVs. After that, finding the right person was a matter of canvassing the stack of CVs and relying on our experience to spot the hidden gems. After several hours we had the perfect candidate in hand. A few phone calls and an interview later, he turned out to be exactly the person we were looking for. He is now part of the company, not only helping to overcome the severe problems we were facing, but also opening new approaches for predicting customers' future status and expanding the company's offering to its customers.



4. AI ownership over deployment to production. 

This is how one of my customers described the AI development process at his previous company: "We had a team of researchers… well… they don't really know how to code for production, so we passed their models to other guys who implemented them." Sounds great, right? Everyone is doing what they are best at…


This is a common practice with serious weaknesses; in a nutshell, it creates a situation where no one is responsible for the performance of the AI model in production. In AI products this is especially problematic, due to the model's sensitivity to variations in data distribution that are introduced during the integration process.


Many startup founders are not familiar with AI's sensitivity to data distribution.


Let me break it down a bit. Data distribution is basically how your data is spread out and varies. An AI model is trained to work well with a specific set of data (like pictures of dogs). But if it suddenly starts seeing different data (say, pictures of cats or images taken from a different angle), its performance can take a hit.


What often happens in many companies is that the AI that’s built doesn’t quite match up with the real-world data it encounters outside of the development environment. This mismatch can mess up the model’s effectiveness and lead to a lot of time wasted trying to pinpoint errors—time that could’ve been spent making things better.
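One practical way to catch such a mismatch early, sketched below, is to compare incoming production features against a reference sample from training with a two-sample statistical test. The feature names and the alert threshold here are assumptions for illustration, and SciPy's Kolmogorov-Smirnov test is only one of several reasonable choices.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_sample: np.ndarray,
                         prod_sample: np.ndarray,
                         feature_names: list[str],
                         p_threshold: float = 0.01) -> list[str]:
    """Flag features whose production distribution differs significantly
    from the training distribution (two-sample Kolmogorov-Smirnov test).

    Both arrays are (n_samples, n_features); a low p-value means the two
    samples are unlikely to come from the same distribution.
    """
    drifted = []
    for i, name in enumerate(feature_names):
        _, p_value = ks_2samp(train_sample[:, i], prod_sample[:, i])
        if p_value < p_threshold:
            drifted.append(name)
    return drifted

# Illustrative check with synthetic data: the second feature has shifted.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))
prod = np.column_stack([
    rng.normal(loc=0.0, scale=1.0, size=5000),   # stable feature
    rng.normal(loc=0.8, scale=1.0, size=5000),   # shifted feature
])
print(detect_feature_drift(train, prod, ["sensor_a", "sensor_b"]))  # expected: ['sensor_b']
```

Run on a schedule against fresh production data, a check like this turns "the model quietly degraded" into an alert the team can act on.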


To tackle this, we made it a rule that the AI team must take full responsibility for the models they develop, from design to deployment. This means they have to test and deploy their own work. We've upped our game in software development standards, set up solid CI/CD processes, and established dedicated testing environments for each AI project. This way, everyone knows exactly what they need to handle, leading to fewer surprises and more accountability.
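To give a flavor of what "the AI team tests and deploys its own work" means day to day, here is a minimal sketch of a deployment gate the team itself could own: a pytest-style check that loads the candidate model, evaluates it on a frozen hold-out set, and fails the CI run if accuracy drops below an agreed floor. The file paths, the pickle format, and the 0.92 floor are illustrative assumptions.

```python
import pickle
import numpy as np

ACCURACY_FLOOR = 0.92                   # agreed with product; illustrative value
MODEL_PATH = "candidate_model.pkl"      # hypothetical artifact produced by training
HOLDOUT_PATH = "holdout.npz"            # hypothetical frozen evaluation set

def test_candidate_model_meets_accuracy_floor():
    """CI gate owned by the AI team: block deployment if the candidate
    model underperforms on the frozen hold-out set."""
    with open(MODEL_PATH, "rb") as f:
        model = pickle.load(f)

    data = np.load(HOLDOUT_PATH)
    predictions = model.predict(data["X"])
    accuracy = float(np.mean(predictions == data["y"]))

    assert accuracy >= ACCURACY_FLOOR, (
        f"Candidate model accuracy {accuracy:.3f} is below the "
        f"{ACCURACY_FLOOR} floor; deployment blocked."
    )
```

Wired into the CI/CD pipeline, a gate like this makes production-grade performance the AI team's own test to pass rather than a surprise discovered downstream.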


In conclusion, the journey from an initial AI concept to a robust, production-ready solution is fraught with challenges that demand not only technical acumen but also strategic foresight and operational expertise. At the heart of our approach is a commitment to bridging the gap between ambitious AI goals and practical execution. By fostering a culture of ownership among our AI teams, aligning business objectives with technical realities, and continuously refining our processes, we ensure that our clients not only overcome the immediate hurdles but are also well-prepared for future demands.
