While learning about lean thinking, I got a bit concerned by the similarities I see between the German engineering tradition on the one hand, and applied machine learning (ML) on the other hand.
German Engineering
German technology has an incredibly high reputation all over the world. With the automotive industry as probably the brightest example, “Made in Germany” is almost synonymous with “meeting the strictest quality standards”—and rightfully so. Even Germans are not perfect, though:
A […] German weakness has been the tendency to substitute the voice of the product engineer for the voice of the customer in making trade-offs between product refinement and variety on the one hand and cost as reflected in product price on the other. While quality may be free, variety and refinement almost always entail costs, particularly when products are designed without much attention to manufacturability. Good hearing is therefore needed to ensure that product designs contain what customers want rather than what designers enjoy making.
Womack, James P., & Jones, Daniel T. (2003). Lean Thinking: Banish Waste and Create Wealth in Your Corporation.
A Cardinal Sin of Applied ML
While the German tech industry might have more problems than just the one described above, many of its weaknesses are definitely balanced by a number of strengths. But rather than looking at these strengths now, let us shift our focus to ML, and particularly to how ML is used in industry. If we spend some time looking at how ML is applied to engineering products in the high-tech industry, we will notice a few things.

Let's consider a concrete example. Imagine we are dealing with the ranking model used by the recommender system of an e-commerce platform, i.e. the software component responsible for picking the most relevant recommendations to show to our customers on the website. During the product design stage, it might be pretty easy to completely disregard cost considerations, operational requirements, and customer expectations. For example, deciding on the degree of personalization that the ranking component should exhibit when recommending products to different customers, or choosing between a simple logistic model and a deep neural network, is a design step where engineers (or “applied scientists”) usually feel detached enough from its implications in terms of budget and operations not to actually bother about them.

When facing the choice between a linear and a non-linear model, the question of whether the latter will have a lower error on the training data is usually considered the most relevant one. By contrast, the question of whether a linear approximation will ever cause a measurable difference in customer experience—and in particular, whether such a difference justifies any additional engineering and operating costs—is usually perceived as less relevant or less pressing, and probably too hard to answer. Here, the reasoning typically runs along the following lines: “Let's go for the most accurate solution—if it takes a faster CPU or more RAM, then we'll look for better hardware.”
Or: “If maintaining the system gets too complex, then we'll try to hire more DevOps engineers.” To some extent, this attitude might even be reasonable, given that product design in the ML area is often tightly coupled with research, hence it needs some flexibility w.r.t. the application constraints in order to properly explore the most promising solutions. Moreover, the subtle—and often very indirect—way in which algorithmic choices made at design time end up affecting customer experience in a measurable way hardly makes a compelling case for worrying about customer impact as early as the design stage.

However, every design choice generates costs downstream in the development pipeline. If a modeling choice brings adequate value to customers, then its costs are fully justified. Otherwise, it is nothing but waste. Hence, the problem with ML design is not the costs it generates as such, but rather its intrinsic tendency to detach itself from value stream analysis. To sum up, applied ML scientists often—and sometimes quite happily—indulge in “the tendency to substitute the voice of the product engineer for the voice of the customer”.
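To make the linear-vs-non-linear question concrete, here is a minimal, self-contained sketch (the task, the data, and all numbers are synthetic and hypothetical): a plain logistic ranker versus the same model with explicit pairwise feature interactions, standing in for a more expressive and more expensive model. On a task that is mostly linear, the held-out accuracy gap (the quantity that would actually drive customer experience) can be surprisingly small:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "relevance" task: labels depend mostly linearly on the
# features, with only a mild non-linear interaction term.
n, d = 4000, 8
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
logits = X @ w_true + 0.3 * (X[:, 0] * X[:, 1])
y = (logits + rng.normal(scale=0.5, size=n) > 0).astype(float)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain batch gradient descent on the logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(X, y, w):
    return float((((X @ w) > 0) == (y > 0.5)).mean())

def add_interactions(X):
    """Append all pairwise products: a cheap stand-in for a non-linear model."""
    pairs = [X[:, i] * X[:, j]
             for i in range(X.shape[1]) for j in range(i + 1, X.shape[1])]
    return np.hstack([X, np.stack(pairs, axis=1)])

# Train/test split.
X_tr, X_te, y_tr, y_te = X[:3000], X[3000:], y[:3000], y[3000:]

w_lin = fit_logistic(X_tr, y_tr)
w_non = fit_logistic(add_interactions(X_tr), y_tr)

acc_lin = accuracy(X_te, y_te, w_lin)
acc_non = accuracy(add_interactions(X_te), y_te, w_non)
print(f"linear: {acc_lin:.3f}  with interactions: {acc_non:.3f}  "
      f"gap: {acc_non - acc_lin:+.3f}")
```

Whether a gap of a point or two of accuracy justifies the extra serving and maintenance cost is exactly the kind of value-stream question that tends to go unasked.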
Compensation Patterns
Is there a problem at all? Judging by the impact that ML has made in industry over the last few years—shifting the focus of preexisting business models more and more towards data, virtually everywhere—we would hardly blame ML developers for building technologies which are a bit too expensive, heavy on operations, and not always rooted in unquestionable customer needs. After all, many—if not most—ML developers still have a background in academic research, where customer-centricity and operational excellence are not relevant evaluation criteria. But for sure, business would never forgive ML its academic sins if they were not offset by a number of strengths. And how are ML folks managing to make those sins go almost unnoticed? If we look back at the situation in one or two decades, maybe we'll summarize it as follows:
• Because skill levels were so high on the plant floor it was possible to fix each problem as it arose rather than fix the system which created the problems in the first place. The finished product handed to the customer was usually of superlative quality, even if also of high cost.
• Because the skill level of product development engineers was so high, they could reengineer designs coming from upstream rather than talk to upstream specialists about the problems their designs were creating. Again, the end product reaching the customer was superlative in achieving the promised performance, but at high cost.
• Because of the technical depth of a firm’s functions, it was often possible to add performance features to products which offset their inherently high development and production costs.
Womack, James P., & Jones, Daniel T. (2003). Lean Thinking: Banish Waste and Create Wealth in Your Corporation.
These words were actually used to explain how German manufacturing has traditionally been able to compensate for its inefficiencies. I’m genuinely impressed by how smoothly the diagnosis above can be recast from the German manufacturing domain to the applied ML scenario.
Compensating by DevOps
The first type of compensation is something we see when an ML system goes live and we observe problems which we did not anticipate early enough. For example, in the recommender system case discussed above, once the ranking model starts serving live customer traffic, we might realize that the amount of data transfer required for the ranker to consume all relevant feature vectors is causing unbearably high latency. We then resort to all the tricks of the trade to overcome this issue, such as enabling data compression/decompression before/after the transfer, moving to a different hardware configuration to optimize network performance and (de)compression speed, or increasing the volume of data cached in local memory—if a local cache is available. Here, the bottom line is: if we can count on great operational skills, there's virtually no runtime glitch that we can't overcome.
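The compression trick can be sketched in a few lines with Python's standard zlib. All the numbers here are made up, and we assume reasonably sparse feature vectors (as produced e.g. by one-hot or bag-of-words encoders), which is what makes generic compression pay off:

```python
import zlib

import numpy as np

# Hypothetical payload: 1,000 feature vectors with 256 float32 features each.
# We simulate sparsity by zeroing out ~90% of the entries.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256)).astype(np.float32)
features[rng.random(features.shape) < 0.9] = 0.0

raw = features.tobytes()
compressed = zlib.compress(raw, level=6)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} B  compressed: {len(compressed):,} B  ratio: {ratio:.1f}x")

# The receiving side pays a decompression cost in exchange for less time
# on the wire; the round trip is lossless.
restored = np.frombuffer(zlib.decompress(compressed),
                         dtype=np.float32).reshape(features.shape)
assert np.array_equal(restored, features)
```

Whether the CPU time spent on (de)compression beats the saved network time is, of course, a question one would have to measure on the actual deployment.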
Compensating by Reengineering
The second way to compensate for inefficiency occurs when the ML product hasn't reached our end consumers yet, but the research engineers have handed it over to the product engineers in order to “put it to production”. We now move from a software prototype to a business product which must meet all applicable requirements in terms of design, reliability, and performance. In the ranking system example we imagined above, something like the following might happen. While the prototype was lightheartedly filling a local memory cache with as much data as possible (e.g. with all consumed feature vectors) in order to minimize latency, at large scale the original cache is no longer sufficient, and we can't cache all the feature vectors that the ranker needs to consume. But we are so skilled at reengineering our model that we quickly think of a suitable dimensionality reduction technique, which squeezes the size of our feature vectors to a minimum without significantly hurting the original ranking accuracy. By switching to lower dimensionality (which might involve quite a bit of refactoring and reconfiguration throughout the ML pipeline), all the needed feature vectors fit into the local memory cache again. Bottom line: when engineering skills are high, no design flaw prevents us from redesigning the ML pipeline to meet any relevant constraints.
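Such a dimensionality reduction step can be sketched with plain PCA via NumPy's SVD. The catalogue size, the dimensions, and the near-low-rank structure of the data below are all hypothetical; real embeddings often concentrate most of their variance in a few directions, which is what makes this trick work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical catalogue: 10,000 items with 256-dim feature vectors that
# actually lie close to a 32-dim subspace (common with learned embeddings).
n_items, d, k = 10_000, 256, 32
basis = rng.normal(size=(k, d))
features = rng.normal(size=(n_items, k)) @ basis \
    + 0.01 * rng.normal(size=(n_items, d))

# PCA via SVD: keep the top-k principal directions.
mean = features.mean(axis=0)
centered = features - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:k]                 # (k, d) projection matrix
reduced = centered @ components.T   # (n_items, k) compact vectors for the cache

# Reconstruction error hints at how much ranking signal we might lose.
reconstructed = reduced @ components + mean
rel_err = np.linalg.norm(features - reconstructed) / np.linalg.norm(features)
print(f"cache footprint: {features.nbytes:,} B -> {reduced.nbytes:,} B, "
      f"relative error {rel_err:.4f}")
```

Here the cache footprint shrinks by a factor of d/k (8x in this sketch) at a tiny reconstruction error; whether the same holds on real feature vectors has to be validated against the actual ranking metrics.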
Compensating by Gimmicks
Finally, a third compensation strategy is to let production inefficiency go almost unnoticed by surrounding it with a number of gimmicks. Just think of the many, highly beneficial side-effects of putting state-of-the-art ML techniques into production. Prestigious scientific conferences—such as NeurIPS, KDD, or RecSys for the recommender system domain—regularly host talks and publish papers contributed by ML researchers from industry, and most tech players on the market strenuously compete to get their contributions accepted for publication. If a product using ML has a high development cost, then publishing papers about the underlying technology at world-leading scientific conferences definitely makes that cost more easily acceptable—due to the return in terms of reputation, employer branding, and implicit marketing. Other gimmicks include winning public competitions, letting tech demos go viral on the Web, and so on. The most notable example in this direction was probably provided by IBM, when Deep Blue was awarded the Fredkin Prize for defeating the world chess champion Garry Kasparov—although Deep Blue was still rooted in old-fashioned AI rather than in modern ML.
Forgiving ML?
Compensation efforts can give rise to genuine excellence. They indeed play a crucial role in making both German Technik and ML technology so great and successful. There's nothing wrong with compensation per se. What makes me uncomfortable, however, is the temptation to absolve ML engineering from its academic sins just because ML engineers are so damn good at compensating for them. Business does not adhere to any logic of forgiveness. Although we know that being German is such a good thing in engineering, that's not a good reason to forgive ML for being so German.
