Model Building

The Modeler's Mandate

Modeling is an important part of data science. On a day-to-day basis, it may not occupy the plurality of a data scientist's working hours, but it's how you bring home the bacon. A data scientist who can't model well is just a data engineer with a superiority complex. But I'm not here to bash data engineers - they're an important part of the AI worldsphere and ecosystem, and know a lot more about how to build data cubes and parallelepipeds and things. The plumbers of our AI house. A lot of nobility in plunging out our dirty data clogs.

What I'm here to remind you of is that, whatever your key evaluation metric is, when it comes to predictive models, maximization isn't everything - it's the only thing. In order to form a more perfect model, you need to constantly update your organization to reflect the latest, greatest, most maximized, most up-to-date model at all costs.

Sometimes this is uglier and harder than it seems. Imagine there's a model that predicts some kind of binary outcome of interest. This legacy model has 82% precision, 90% recall, and its tenure has won support (perhaps somewhat begrudgingly) from stakeholders and IT. It's well-understood, battle-tested, and generally unobjectionable to those involved.

Let's say that, as part of an annual update examining this and other legacy models, you're tasked with prototyping an updated version. Good news: your model just hit 83% precision and 91% recall - both increases over the existing model's performance. At this crucial juncture, what you need to internalize is that you have a professional obligation to put this model into production as soon as possible. This is a great success, and that other modeler should understand that you have defeated their work, and they should defer to you on all related matters henceforth. Tear down their pipelines. Shutter their cloud instances. You're the best, and you need to act like it.

See, some people want to dilly dally, claiming that a performance increase of this magnitude doesn't mandate such an update; what this really means is that these people are untrustworthy, as they're unwilling to accept the results. Like a child who lost his shooter in a game of marbles, thrashing around with soiled undergarments, unwilling to create and drive value in service of efficiency and the almighty correctness of the science.

Perhaps some wet traditionalists insist that there are "significant logistical challenges" to implementation. Perhaps some naysayer stakeholders claim that they don't "understand" or "want" the new model - well, that's not their concern, because you're the one who knows the science and knows what's best. You need to be on the right side of the correctness maximization line. The evidence is obvious! It's right there in your model's improved performance!

Organizations thrive when their decisions can be more accurate and most efficient, which they can be with the best, most efficiently and correctly accurate model. If you're not implementing the best possible model immediately, you're doing your organization a disservice.