The coronavirus pandemic and its economic fallout have caused massive damage around the world. For data scientists, though, these are boom times. They are modeling infections to predict the course of the virus and helping companies retool for the new environment by accelerating the digitization of their operations.
This brings challenges as well as opportunities, says Lynn Wu, an associate professor of operations, information and decisions at the University of Pennsylvania’s Wharton School. Many companies haven’t built the proper infrastructure of data warehouses, cloud computing and the like to get the most out of their information. And while data science is great for making evolutionary improvements, most fundamental breakthroughs still depend on human insight and intuition.
The increasingly virtual environment can help break down barriers by enabling firms to recruit from a wider talent pool and by amplifying the voices of employees who often get marginalized in face-to-face meetings, says Wu. At the same time, faster digitization under COVID-19 also risks widening the digital divide between organizations that have the resources and expertise to exploit their information and those that don’t.
Wu discussed the future of data’s role in a recent Zoom interview with Ana Kreacic, a partner at Oliver Wyman and chief operating officer of the Oliver Wyman Forum.
You’ve talked about an innovation paradox: Companies are spending more on research and data analytics but producing fewer big breakthroughs. Why is that?
Data can capture our human interactions, such as our online behavior. When you aggregate that data, artificial intelligence is really good at figuring out what our current opinions are, but it doesn’t tell us what we really need or want. Data analytics is also great at finding new uses for existing technologies, or at innovating by combining existing technologies in new ways. But it is poor at finding a radically new technology. That requires human ingenuity, intuition, and experience. Steve Jobs never made a consumer report when he invented the iPhone.
Most advances in data science are advances in predictive analytics, not necessarily in causal inference. The ability to tease out causation from correlation will be critical for using machine learning, and AI more generally. In a study of the pharmaceutical industry’s data use, we found that AI applications expanded the drug pipeline, but the quality of the drugs, or their likelihood of making it through FDA trials, was no higher than that of drugs found by humans.
The pandemic has created sweeping changes in how people shop, work, and live. What does that mean for data-driven decision-making in an uncertain world?
Deep learning and machine learning are largely predictive technologies. They use historical patterns to predict what’s going to happen next, but they have less ability to make causal inferences.
In a state of uncertainty, you need to make sure that A is really causing B, not just that A is correlated with B. For example, Google Flu Trends initially worked very well at predicting the flu, but over time it started picking up search terms like “basketball,” which has nothing to do with the flu but happened to be popular in the fall. You see lots of that in decision-making as well. Something seems very plausible, but it’s actually a spurious correlation.
However, if you can capture dynamic, real-time information and run A/B tests to really know the causal element, then deep learning and machine learning are great tools for figuring out what is driving a business up or down. This might work well in e-commerce, where A/B tests are cheap, but for many really important questions A/B testing is hard or next to impossible. Thus, human judgment will always be involved when you use data to make important decisions.
One of the most interesting pieces of your research used nanodata to predict real-estate trends. Has COVID changed the ability to predict?
I started that project during the Great Recession, and the fidelity of the predictions was very good. It’s in turbulent times that the dynamic nature of data gives you the edge. Unfortunately, I haven’t seen that during COVID-19. Maybe that’s because the nature of the change is different. The financial crisis was the collapse of a housing bubble. It was specifically about real estate. This time the change is systemic. All industries got socked.
What are some of the characteristics of companies that leverage data well?
Number one, they have the infrastructure set up. The dirty work is done. They have the basic technology to pull all the data together and clean it. Ninety percent of the effort is actually the infrastructure: setting up the cloud, setting up databases, and data warehousing. The superstar data scientist you hire is actually the last mile.
Number two, they are very cognizant of the problems data science can help solve and the ones it cannot, and where to put humans in the decision-making process.
You’ve also shown how collaborative technologies can boost productivity and help junior, minority, and female employees get an edge. Have you seen any evidence of that under COVID?
I have seen some evidence in recruiting and virtual meetings. For recruiting, a lot of HR departments, especially in high-tech sectors, adopted best practices for online interviews since face-to-face wasn’t available.
I can see a lot of good things coming out of that. Many traditional biases may not carry over online. You’re less likely to ask “What fraternity were you in?” when you know the interview is being recorded. But we also lose the fidelity of the face-to-face interaction. So there are pros and cons.
For virtual meetings, I’ve seen evidence of more participation on Zoom conference calls. People who are generally very quiet in in-person meetings tend to use the chat function in virtual meetings. At our annual conferences, we generally don’t see a lot of women or minorities voicing their opinions by raising a hand and standing up; that’s very intimidating for someone who has just joined the community. Zoom calls somehow reduce the status effect. You feel like anybody can ask a question on chat. It’s good to hear a greater diversity of voices in this online format.
What are you most excited about or worried about in data science?
Before the pandemic, one popular topic was automation and jobs, specifically whether robots were taking over all of our work and creating mass unemployment. I recently co-authored a paper looking at robot adoption, employment, and productivity in Canada, and we found the opposite effect. Automation, measured by robot adoption, increased employment and productivity. In the short run, that’s a good thing. Firms adopted robots not because they wanted to reduce costs but because they wanted to improve service and product quality.
The not-so-great thing we found is that the employment change is not uniform. We see a sharp increase in employment for low- and high-skilled labor but a drastic decrease for middle-skilled labor and managers. What do we do with displaced middle-skilled workers? How do we change the nature of work so we don’t end up with a small group of professional workers and a massive number of low-skilled workers with no hope of advancing? That’s what I worry about.
If automation led to reinvestments and ultimately increases in the number of jobs, that suggests the firms that were already winning were the ones that made the changes. Does this have winner-take-all effects?
Your intuition is spot on. It’s not about adopting robots to directly replace workers. It’s about the robot-adopting firms getting so much better than the non-adopting firms that they take market share, and the non-adopters have to lay off workers.
Firms that invested in technology and digital acceleration are seeing disproportionate benefits. Has COVID accelerated that trend?
Absolutely. You need to start investing in data analytics technologies now; you don’t have a choice. You see the productivity gap widening between firms that adopt AI and firms that do not. COVID-19, unfortunately, might have accelerated that process. You will likely see dramatic industry turbulence, similar to what happened when electricity replaced steam engines.
It is important to know the mechanism through which automation affects jobs. If robot-adopting firms displace less productive firms, as opposed to robots directly replacing workers, the policy implication is completely different. A robot tax, like the one proposed in the EU a couple of years ago, may be misguided. We should be thinking about how to help firms use these technologies to improve productivity, and how to mitigate the negative consequences associated with skill polarization.
How can we mitigate those inequality effects?
Robot automation can’t come close to doing what humans can do, so we need to figure out what added value we can provide.
To mitigate the inequality, we have to invest more in education and training, and maybe entrepreneurship. Not every middle-skilled worker is going to be replaced. People will still need to repair robots and work with robots. There will be a whole range of retraining around how to use AI and how to curate data. There’s a lot of dirty work involved in making data available for the fancy algorithms to process. And every time a new technology is born, there’s a lot of process re-engineering. In that process, you’ll discover new types of jobs.