- Why your company’s data strategy is probably wrong, if you even have one
In 2017, I worked on a data science team at a prominent ticketing company, about a year before its IPO. Our core focus was one thing only: accurate data to provide to the SEC so the company could go public.
We had 15+ years of data that had been migrated again and again across different cold storage systems, platforms and languages. On top of that, we had recently acquired another ticketing company whose data wasn’t integrated with ours at all. I couldn’t even query it; it lived in completely different infrastructure.
Because of this disparate data, no one could tell you how many tickets were actually being sold, let alone produce any figure that had to be aggregated across the 2 companies to give a holistic number. All of this had to be provided for regulatory purposes. Our overworked, jaded data engineers didn’t even try to integrate.
In hindsight, I ask: why were the data scientists the ones leading the data strategy for the IPO? Sure, we could forecast future ticket sales and give insight into the strike price, but realistically, this entire operation needed a half dozen data analysts pushing out automated, real-time reporting and about a dozen data engineers to clean and scale the 15+ years of messy data. Most critically, we needed a leader who knew the working data architecture of a business and could connect these areas. We didn’t have any of these things, and IPO’d anyway.
Fast forward to 2022: this company is facing a massive shareholder lawsuit because the data reported at IPO was false.
The irony is that those of us working on the ground during this effort saw all of this, and we knew there were problems. Those of us who understood the issues with the working data were not the people in a position to make executive decisions. And the executives didn’t know the on-the-ground issues with the data strategy and its execution. It seems obvious now, but scenarios like this play out every day on a smaller scale.
Nearly every company in the world right now has a blind spot: how heavily its individual data teams rely on one another.
Data scientists need data engineers. Data engineers need business intelligence analysts. Data analysts need integrations and solutions engineers on tooling. These teams heavily depend on each other. It should be obvious that if an org needs more data engineers, the answer isn’t to hire another data scientist, yet you’d be surprised how often that happens because these areas don’t have an overarching strategy for how to support a business.
If you’ve worked in a data role in any industry, you’ve been part of conversations on hiring ratios and the lack of [insert data resource here] as a blocker to getting something done. The important piece to remember is that these teams aren’t independent of one another. I can tell my data science (DS) manager all day that I need another data engineering (DE) resource, but because that’s a different team entirely, blockers occur. This type of blocker shouldn’t exist, yet it is embedded in the working organizational model of data teams everywhere.
An effective data org needs to be designed and run by someone who has worked in data, period.
This one should be obvious, yet 4 of my last 5 managers had never worked in a data role in their entire careers. I’ve been managed by salespeople, marketing leads and former consultants who had no experience supporting the needs of a DS within the broader data org at the working level. As management climbs higher, data knowledge only dwindles further, fostering an uninformed 20,000-foot view.
Unless you have expert recruiters, you’re probably recruiting for the wrong thing
Data scientists in 2022 are not the ones designing advanced, production-level ranking algorithms and sophisticated AI. That work is done by machine learning engineers, a completely different discipline. Analysts own reporting and dashboarding, not forecasting and predictive modeling. The disciplines within the complex universe of data practitioners have become so mixed up that it’s questionable whether a company even knows what it is hiring for. Data architects are so hard to source that at a prior role it took us over 2 years to find even one person who fit (and this was San Francisco in 2015). Understanding who does what is the first step in ensuring the right roles are being prioritized. It’s OK to have a small DS team if what you actually need are data engineers.
Most data science roles in 2022 are actually data analytics
It’s surprisingly rare that an organization needs the support of true data science. A/B tests are probably the most common example of regular data science work that can’t be supported by another role. More often than not, decisions can be made with standard KPI tracking and reporting on core metrics, a function owned by analysts.
The worst spotlight on the data science role was Harvard Business Review’s 2012 piece calling data scientist the “sexiest job of the 21st century”. This created a frenzy at every company, from seed stage to profitable, to chase data scientists and to treat the size of the DS team as a point of pride. Sure, it advanced the discipline and brought more people to the field, but it also muddied the waters of what the role actually does in practice. It also opened the door to the data scientist becoming the jack of all trades for all things data.
Unless you’re a gargantuan megacorporation, if your company’s DS team is bigger than 50 people, it’s a huge red flag that your team is inefficient, resources are being wasted and you’re draining your company’s salary budget for diminishing returns.
The design of your data org is just as important as the individuals who make it up.
To hammer home the above point, not all businesses need many advanced data roles. Who you hire should be decided by a true understanding of how you plan to use data, when it is valuable, and what is needed to answer the questions that get you to your goals.
Having been a DS for over a decade, I’ve built pipelines, dashboards, A/B tests, multivariate experiments, shared spreadsheets, cron jobs and a slew of other data deliverables. Only a couple of those things are actually data science, but they were what the business needed. That is how you end up with your senior DS building a dashboard in Tableau: you didn’t know what your data needs actually were. At least 50% of my role as a data scientist could have been supported by a junior-level data analyst.
This is the first in a content series on data strategy for business, and why it is not something to skimp on.
- Date of publication: Fri, 01/14/2022 - 14:23