A Year In Review, Data Science in 2020

ABOUT THE DATA STANDARD

The Data Standard is the premier community of accomplished data enthusiasts and pundits. The organization was founded to connect data scientists, analysts, engineers, architects, and enthusiasts when the world went remote with the COVID-19 pandemic. Built for introverts with love, empathy, and mindfulness, The Data Standard hosts exclusive monthly Zoom events, community-oriented podcasts, and trending blog posts. Visit The Data Standard’s website for more information!

Events 

The Data Standard hosts monthly round table events over Zoom with active members of the community. We discuss the application of data science, artificial intelligence, and machine learning to society. These events are an extraordinary opportunity to hear from industry thought leaders with unique perspectives. Check out datastandard.io for more information!

Podcasts

The Data Standard’s new podcast series on leaders in the data science space is available on Spotify and Buzzsprout. Sponsored by Pandio, the podcast features accomplished executives who share their professional stories to inspire the next generation of data enthusiasts. By encouraging young data enthusiasts to be more empathic and helping introverts appreciate the importance of networking, The Data Standard’s podcast has proven to be one of the top data science podcasts in the country.

Blogs

The Data Standard’s weekly blogs cover informative and trending topics in the field of data science. From new trends in AI and machine learning to features on some of the most prominent data science influencers, The Data Standard’s blog covers it all! Check it out on The Data Standard’s website blog page!

EXECUTIVE SUMMARY

The Sexiest Job of the 21st Century. There is no question that data scientists are in hot demand with new emerging technologies like machine learning and AI revolutionizing the corporate world. A 2012 Harvard Business Review article by Thomas Davenport and D.J. Patil examines the prominence of big data in all aspects of society, along with the influx of data scientists to follow with it. Looking back on 2020, data science has become more prominent than ever. As companies begin to understand the value of the volumes of data they produce, they have an increased desire to hire skilled data scientists that can use data to gain insight, draw relevant conclusions, and make a positive impact on the company.

But which industries are more active than others when it comes to investing in data science and hiring data scientists? 

To understand the landscape a bit better, the team used LinkedIn profile data to build an up-to-date analysis of current trends and movement in the data science space. The analysis of 28,664 LinkedIn profiles provided many interesting details on industries, demographics, hiring trends, and gender disparities relevant to data science in 2020. Together, these attributes outline the trajectory data science can expect in the coming year.

So what’s next? Several sources claim that data science is an exponentially expanding field, and universities around the world have either shifted existing programs or created entirely new programs to meet this demand. But what exactly does the future hold? The team investigated four industries where data science is having an extraordinary impact: financial services, healthcare, cybersecurity, and retail.

In financial services, more companies are emphasizing data science and data analysis to become more cost-efficient and productive. While several surveys indicate that investment in AI and machine learning will increase over the coming years, many companies have so far failed to bring AI/ML insights into their business flow or to influence business decisions at a high level. As a result, the financial services industry is expected to continue investing in these emerging technologies in the hope of incorporating them effectively and profiting from them.

With the COVID-19 pandemic generating an incredible amount of data each day in 2020, we have already seen the first signs of an explosive boom in the healthcare industry. With the increased usage of wearables like Fitbits and Apple Watches, which provide more data integration opportunities, there has already been an increased demand for data scientists and computing ability in the healthcare sector, and both will be in even greater demand in the coming years. Healthcare analytics alone is expected to grow to a $50 billion market by 2024.

Cybersecurity has also evolved to give data science a larger role. Study after study has shown that there is more widespread interest in and implementation of AI and machine learning to stop cyber breaches and detect fraud instantaneously. And with the proliferation of technology seen in 2020, from cell phones and computers to self-driving cars, data science has proven to be of great importance to the industry.

The retail industry has also seen investment in AI, ML, and data science grow dramatically. With the pandemic in 2020, we have already seen a remarkable increase in technology use in the retail industry, with more people relying on their phones to have food and other items delivered rather than physically going out to shop. A full suite of digital tools is entering this industry, and data scientists are needed to leverage them.

Overall, this report is aimed at helping the data science community — data science professionals, enthusiasts, and aspiring data scientists — gain a better understanding of the industries that are heavily investing in data science and which areas of the country have the largest densities of data scientists.

The team believes all will find this a valuable resource to understand the trends driving the expansion of this exciting field.

METHODOLOGY

Data in this report include third-party information cited within the text, as well as original analysis of self-reported information from LinkedIn, including all publicly visible personal and company profiles, skills, professional experiences, and education posted by profile owners between May and July 2020.

The team queried LinkedIn for profiles in the US with the term “data scientist” in their profiles and then attempted to verify that these accounts were valid and active by enriching the data and testing the associated corporate email addresses. From that exercise, the team was able to validate 28,664 profiles in the United States.

The analysis of the top 10 cities, job titles, and industries is specific to this LinkedIn dataset. All of the analyses were carried out in Python. Values for each chart were determined by counting the occurrences of each unique location, job title, or industry within the dataset and sorting the counts in descending order for the question of interest. For example, when analyzing the top 10 industries, the team counted the number of times each industry appears in the dataset and then sorted the industries by those counts in descending order.
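The counting step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the team's actual code: the sample records, the "industry" field name, and the top_n helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sample records; the real dataset and its field names are not
# published in this report, so "industry" is an assumed key.
profiles = [
    {"industry": "Information Technology and Services"},
    {"industry": "Financial Services"},
    {"industry": "Information Technology and Services"},
    {"industry": "Retail"},
    {"industry": "Financial Services"},
    {"industry": "Information Technology and Services"},
    {"industry": None},  # profiles with a missing value are skipped
]


def top_n(records, field, n=10):
    """Count how often each value of `field` occurs and return the n most
    frequent values as (value, count) pairs, sorted in descending order."""
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common(n)


top_industries = top_n(profiles, "industry", n=2)
for industry, count in top_industries:
    print(f"{industry}: {count}")
```

The same helper could be reused for cities or job titles by changing the field name.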

The analysis of women in data science was not as straightforward. Since gender is not reported on profiles, the team used a Python package called generate_gender, which relies on a list of ~40,000 names and the probability of each name being associated with a certain gender. This allowed the team to match 50.1% of the names with reasonable accuracy. The remaining names were randomly assigned a gender, weighted so that men account for 75% and women for the remaining 25%, based on third-party sources on the gender split in the industry.
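The two-stage approach described above (a name lookup followed by a weighted random fallback) might look roughly like the sketch below. It does not use the generate_gender package, whose API is not shown in this report; the NAME_TO_GENDER dictionary, the assign_gender function, and the sample names are hypothetical stand-ins.

```python
import random

# Hypothetical stand-in for the ~40,000-name list; NOT the package's real data
# or API.
NAME_TO_GENDER = {"maria": "female", "james": "male", "linda": "female"}


def assign_gender(first_name, rng, p_male=0.75):
    """Return (gender, matched).

    If the name is in the lookup table, use it. Otherwise fall back to a
    random draw weighted 75% male / 25% female, as described in the text.
    """
    known = NAME_TO_GENDER.get(first_name.strip().lower())
    if known is not None:
        return known, True
    guess = rng.choices(["male", "female"], weights=[p_male, 1 - p_male])[0]
    return guess, False


rng = random.Random(2020)  # fixed seed so the fallback is reproducible
names = ["Maria", "James", "Sam", "Alex"]
results = [assign_gender(name, rng) for name in names]

# Share of names resolved by the lookup rather than the random fallback.
matched_share = sum(1 for _, matched in results if matched) / len(results)
print(f"Matched by lookup: {matched_share:.1%}")
```

Because roughly half of the names fall through to the random draw, gender-specific results built this way are best read as directional estimates.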

It’s important to emphasize that this is self-reported data and will therefore miss certain segments of the market. However, the team is confident that it provides a reliable directional view of this exciting profession.

DATA SCIENCE PROFESSIONALS IN 2020

Top Cities for Data Scientists

In analyzing the most popular cities for data scientists using the LinkedIn dataset, the team found that San Francisco, the home base of technology, ranked at the top of the list, though with only 7% of all data scientists. Following San Francisco is New York City, which is home to 4.6% of all data scientists. Interestingly, the top two cities for data scientists make up just over 10% of the data, showing how widely data scientists are spread throughout the US. Other tech hubs in Boston, Austin, and Washington, D.C. followed in the rankings. The top 10 are rounded out by Atlanta, Chicago, Houston, and Dallas (see Fig. 1).

Fig. 1: Top 10 Cities for Data Science Jobs

Source: The Data Standard analysis

Taking a deeper look at some of the data, the team found that San Francisco is home to 557 companies and 76 industries that hire data scientists. In New York, there are 497 companies and 75 industries, and in Boston, there are 212 companies and 45 industries (see Fig. 2). Even the last city on the list, Los Angeles, still has 135 companies that hire data scientists. The compiled table portrays cities that are ideal for data scientists who are looking for new opportunities without having to relocate.
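As an illustration of how per-city figures like these can be derived, the sketch below counts the distinct companies and industries seen in each city. The records, field names, and the breadth_by_city helper are assumptions for the example, not the team's actual data or code.

```python
from collections import defaultdict

# Hypothetical records; company names and field names are illustrative only.
records = [
    {"city": "San Francisco", "company": "Acme", "industry": "Internet"},
    {"city": "San Francisco", "company": "Acme", "industry": "Internet"},
    {"city": "San Francisco", "company": "Globex", "industry": "Financial Services"},
    {"city": "Boston", "company": "Initech", "industry": "Higher Education"},
]


def breadth_by_city(rows):
    """Map each city to (number of distinct companies, number of distinct
    industries) found among the profiles located there."""
    companies = defaultdict(set)
    industries = defaultdict(set)
    for row in rows:
        companies[row["city"]].add(row["company"])
        industries[row["city"]].add(row["industry"])
    return {
        city: (len(companies[city]), len(industries[city]))
        for city in companies
    }


breadth = breadth_by_city(records)
for city, (n_companies, n_industries) in breadth.items():
    print(f"{city}: {n_companies} companies, {n_industries} industries")
```

Using sets means that several profiles at the same employer count that employer only once.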

Fig. 2: Top Cities for Data Science Jobs by Number of Companies & Industries

Source: The Data Standard analysis

While these cities may have the most companies hiring data scientists, that doesn’t necessarily mean they offer the highest salaries. A 2018 analysis by TechRepublic found that data science salaries were highest in Scottsdale, Arizona; Portland, Oregon; Houston; Seattle; and the Bay Area.

Other factors data scientists must weigh when choosing a job location include the cost of living and the standard of living, two aspects that the Open Data Science Conference looked at last year in addition to salary. The ODSC analysis listed Raleigh-Durham, Phoenix, Atlanta, Boston, and Palo Alto as the top cities in the US to be hired as a data scientist.

Top Industries & Companies for Data Scientists

Predictably, the industries that hire the most data scientists are those that work most closely with technology: information technology, financial services, computer software, and the internet (see Fig. 3). Interestingly, the next greatest concentration of data scientists is in industries such as higher education/health care, insurance, retail, and telecoms. While these industries are not as close to technology, they still generate significant amounts of data, which opens up more opportunities for data scientists and analysts.

Fig. 3: Top 10 Industries for Data Science Jobs

Source: The Data Standard analysis

Looking more closely at these industries, the analysis found that 291 companies in the information technology and services industry had data science roles, along with 212 companies in financial services, 249 in computer software, 158 in internet, and 153 in higher education (see Fig. 4). The comparison between financial services and computer software is especially interesting. While Fig. 3 shows that more data scientists work in the financial services industry, Fig. 4 shows that fewer companies have data science roles in financial services than in computer software. On average, then, an individual company in financial services employs more data scientists than an average company in the computer software industry.
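The reasoning in this comparison is simple arithmetic: more profiles spread across fewer companies gives a higher average per company. The sketch below uses the company counts from Fig. 4, but the profile counts are hypothetical placeholders, since the report does not state them in the text.

```python
# Company counts as reported in the text (Fig. 4).
companies = {"Financial Services": 212, "Computer Software": 249}

# Profile counts are hypothetical placeholders, NOT figures from the report.
# They only respect the ordering shown in Fig. 3 (financial services higher).
profiles = {"Financial Services": 2000, "Computer Software": 1800}


def profiles_per_company(industry):
    """Average number of data scientist profiles per company in an industry."""
    return profiles[industry] / companies[industry]


for industry in companies:
    print(f"{industry}: {profiles_per_company(industry):.1f} per company")
```

Whatever the true profile counts are, a larger numerator over a smaller denominator always yields the larger average.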

Fig. 4: Top 5 Data Science Industries

Source: The Data Standard analysis

Additionally, the analysis found that job titles related to data science had different concentrations across industries. For example, data scientists are most prevalent in information technology and services, financial services, computer software, internet, and management consulting, which are the top five industries (see Fig. 5). In contrast, job titles such as data analyst, researcher, software engineer, and data engineer are prevalent in some industries that are not in the top five for data scientists (for example, Hospital & Healthcare under data analyst). This suggests that certain industries are looking for positions similar to data scientists, such as data analysts and engineers, rather than data scientists themselves, emphasizing the distinct differences between these seemingly related roles.

Fig. 5: Top Job Titles by Industry

Source: The Data Standard analysis

To go one step further, the team researched some of the top companies hiring data scientists across a variety of industries. A review by Glassdoor identified the top five employers as Accenture, Amazon, Apple, Facebook, and Fidelity Investments. Specifically during the pandemic, according to a 2020 Open Data Science Conference article, companies with job openings in the healthcare, technology, banking, and aerospace industries are:

  1. Healthcare: CVS, Aetna, Vertex Pharmaceuticals, Thermo Fisher Scientific
  2. Technology: Facebook, Google, Apple, Microsoft, Oracle, NVIDIA
  3. Banking: JP Morgan Chase, Hartford Insurance Group, Capital One, US Bancorp
  4. Aerospace: Randstad Technologies, Honeywell, Northrop Grumman, General Dynamics

Top Titles & Roles in Data Science

Data science is a relatively new field, less than 20 years old. The term was coined by former White House CTO DJ Patil and Cloudera founder Jeff Hammerbacher when they worked together at LinkedIn. As early as 2005, however, the National Science Board had called for the creation of the role of a data scientist to manage growing collections of data. Before “data scientist” became a buzzword, professionals who focused on data and analytics might be called “statistician” or “analyst.”

The team’s review of professional titles in data science found that about 50% of professionals hold the title of “data scientist.” Other titles with some prevalence included data analyst, researcher, and data engineer (see Fig. 6).

Fig. 6: Top 10 Job Titles in Data Science

Source: The Data Standard analysis

Along with that, data scientists work at 3,062 companies, in 120 industries, and in 1,072 cities (see Fig. 7). These figures are significantly higher than those for data analysts, who work at 1,830 companies, in 114 industries, and in 622 cities. The term “data scientist” is increasingly used to describe professionals who work with and analyze data, which is interesting given that the term was hardly used 15 years ago.

Fig. 7: Top 5 Job Titles: Top Companies, Industries & Cities

Source: The Data Standard analysis

The cities with the largest numbers of professionals holding the job title “data scientist” are San Francisco, New York City, Boston, Austin, and Washington, D.C. (see Fig. 8). Interestingly, San Francisco ranks as the top city to find a job for all seven of the job titles the team analyzed, followed by New York City. Another notable observation is that Washington, D.C. is the fifth-ranked city for the title “data scientist,” yet is not one of the top five cities for any other role except “consultant.”

Fig. 8: Top 5 Cities by Job Title

Source: The Data Standard analysis

Women in Data Science: Roles, Industries & Cities

Women make up between 23% and 31% of all data science professionals, meaning the gender distribution in data science is greatly imbalanced. Women are extremely important in developing team goals and contributing different perspectives. BetterBuys’ article, “Women in Data Science,” found that companies in the top quartile for gender diversity are 15% more likely to exceed national financial medians. It also found that tech companies with female founders perform 63% better than ones with founding teams composed entirely of men. While women are greatly underrepresented in the data science professional pool, they are still helping to shape the emerging field. According to a diversity study by NESTA, research publications authored by women track slightly higher than those authored by men. While women remain a minority in technology and data science, their presence, however small, may do more to benefit a team’s performance.

In the analysis, the top industries for women differ slightly from those for the general data scientist population. IT still accounts for the lion’s share of female data scientists, followed by financial services, higher education, computer software, and management consulting (see Fig. 9). Notably, the retail and pharmaceutical industries rank in the top 10 for women but not for the general population.

Fig. 9: Top Industries for Women in Data Science

Source: The Data Standard analysis

In the analysis of the top cities for women in data science, female data scientists are mostly found in San Francisco, Boston, New York, New York City, and Atlanta, ranking in that order (see Fig. 10), similar to the combined dataset.

Fig. 10: Top Cities for Women in Data Science

Source: The Data Standard analysis

Additionally, the analysis showed that the top job titles listed for women in data science (see Fig. 11) follow a distribution similar to that of the full dataset (see Fig. 7). The majority of female professionals in data science hold the title “data scientist,” with “data analyst” a significant distance behind.

Fig. 11: Top Job Titles for Women in Data Science

Source: The Data Standard analysis

THE FUTURE OF DATA SCIENCE

As data science comes into its own in the professional world, more educational institutions are now offering specific data science degrees and training. Yet as it stands right now, many companies are still struggling to fill their data science roles. A recent study from the McKinsey Institute predicts that by 2024, there will be 250,000 unfilled data science positions. As more companies operationalize AI and ML in their business, the need for capable data scientists will only grow. Gartner predicts that by 2024, 75% of all US companies will have moved from piloting to operationalizing AI. In the coming years, thousands if not millions of data-oriented jobs will be created to design and maintain AI systems.

Given the emerging nature of data science as a whole, most active data scientists did not initially set out to practice data science. When they were candidates for their bachelor’s degrees, the term “data science” simply did not exist. While the term may be a buzzword, the demand is completely valid. As data science makes its way into almost every industry, demand has increased rapidly. With data science, companies can turn their data stores into informative and quantitative insights that drive their business decisions, and in a world that is continuously innovating in both software and hardware, data science is critical.

It’s clear that data science is and will continue to be a very important catalyst in shaping the next iterations of business practices. The aforementioned predictions give insight into where the market as a whole is heading, and it should come as no surprise that a data-driven world needs data scientists. In the next few pages, the report breaks down the financial, healthcare, cybersecurity, and retail sectors to uncover the unique challenges they face and outline the trajectory of their growth.

The Future of Financial Services

The financial services industry contributed a massive 7.6% to U.S. GDP in 2019, and its potential continues to grow as the industry is expected to increasingly incorporate AI to handle its data stores. Examining the future of financial services, the World Economic Forum and the University of Cambridge noted that AI would be a very important driver of growth, with 77% of their survey respondents expecting to put a focus on AI.

As more companies adopt AI, there is more opportunity for data scientists. With a reach extending to both businesses and individuals, each firm in the financial sector has an overabundance of data. To turn this mass of data into useful insights, each firm will have to operationalize an AI initiative that can address general privacy concerns and handle subjective labeling conventions and architectures. In a WEF survey, 64% of respondents reported plans to use AI to generate new revenue potential through new products and processes, process automation, risk management, customer service, and client acquisition. Currently, however, only 16% of respondents have the technologies and plans in place across all of those areas, though 56% have already adopted AI to manage risk.

These findings are echoed in a study by the Economist Intelligence Unit, in which 86% of respondents said they planned to invest more in AI-related technologies and projects over the next five years, reflecting those respondents’ expectation that 51% to 75% of their workloads would be supported by AI over that time. In a study by Deloitte, 63% of CFOs expected that the time allocation of the finance workforce would continue to shift toward analysis, prediction, and decision support, a clear sign of the need for data science experts.

Fig. 12: Plans to significantly increase AI R&D spending in the short term within two years by current R&D spending segment

Source: Transforming Paradigms: A Global AI in Financial Services Survey, World Economic Forum & University of Cambridge

When asked about barriers to implementation, respondents named talent as a critical factor. The WEF report named “access to talent” as a chief concern, the largest single hurdle to successful AI (see Fig. 13).

Fig. 13: Barriers to AI implementation.

Source: Transforming Paradigms: A Global AI in Financial Services Survey, World Economic Forum & University of Cambridge

Managerial professionals are seeing this talent gap as a very significant issue. Charles Phillips, managing director at Deloitte Consulting LLP, told Workday: “Talent models for digital finance are tilting toward data science and business partnering, but many finance organizations don’t have the right people with the right skills in place to make the shift … Data gurus—such as statisticians and data scientists, and even behavioral scientists—will be critical in helping the finance function of the future turn data into fresh perspectives and strategic insight.” It is evident that as time goes on, companies will be investing more and more into data scientists and into their AI/ML initiatives with the hopes of gaining successful results.

The Future of Healthcare

As the COVID-19 pandemic continues and the numbers of cases and deaths rise across the globe, more attention has been drawn to data, analytics, and AI in the healthcare sector. Data scientists have played a key role in utilizing data from the pandemic to direct the policies of world governments to slow the spread of the virus and protect vulnerable groups. There has already been a large increase in the healthcare market and healthcare data storage in the last year, and this is expected to rise dramatically in the coming years. BIS Research expects the market will grow to $28 billion by 2025 and the following reports reflect this trend. One firm projected an increase in healthcare data storage from $2.40 billion in 2018 to $9.30 billion by 2027. That storage will be needed for the explosive growth of healthcare data, which was 153 exabytes in 2013 and has already grown to 2,314 exabytes. 

In addition to the increase in data and storage, the growing consumer adoption of Fitbits, Apple Watches, and other wearable devices is paving the way for the expansion of healthcare AI, projected to be a $52 billion industry by 2022. Recently, a health study by Scripps Research has been working with app and wearable device data to predict whether a user has COVID-19 using machine learning and data analysis. With the use of wearable devices among individuals and hospital patients set to increase in the next few years, it is clear there will be a larger demand for data scientists, analysts, and machine learning engineers in the healthcare sector. It is estimated that healthcare analytics alone will be a $50 billion market by 2024, and looking back, it is evident that 2020 was a big step forward for data science in healthcare.

Consumers are buying highly data-driven products. Wearables such as Fitbit contribute to what is projected to be a $52 billion industry by 2022, with at least one-third of the consumer population already using wearable devices. The growth is also boosted by hospitals’ use of wearables. A report in Harvard Business Review anticipated that a full 90% of hospitals would use wearable devices.

These trends point toward increased reliance on data, reflected in an analysis by Markets and Markets of the healthcare analytics market. The group estimates that it will be a $50 billion global market by 2024 (see Fig. 14).

Fig. 14: Healthcare Analytics Market, By Region (USD Billion) 2017-2024

Source: Markets and Markets, “Healthcare Analytics Market Report 2019”

The Future of Cybersecurity

An explosion of wearable and Internet of Things (IoT) devices has created an increased demand for cybersecurity and, consequently, cybersecurity experts. Research firm Frost & Sullivan estimates that by 2030, there will be 91 billion devices, with 10 connected devices per human. With each of these devices generating a plethora of sensitive data, the need to secure users’ information has never been greater. It is predicted that criminals will steal 33 billion records by 2023, with the average cost of a single breach around $8 million.

Data scientists use AI and machine learning to identify, predict, and block threats, thus prompting increased investment in AI. A study by MeriTalk found that 84% of respondents used data to block threats. Accordingly, research firm Markets and Markets expects an increase in investment in AI over the coming years, up to $35 billion by 2025. A study by Capgemini found that budgets for AI have been increasing by an average of 29%. In the domain of cybersecurity, AI makes all the difference: Nearly two-thirds of survey respondents said that AI lowered the cost of detecting and responding to breaches, by an average of 12%.

As with all areas of data science, the talent gap is an issue as well. The MeriTalk study found that talent was a top barrier to big data adoption in cybersecurity (see Fig. 15). ISHIR, a software developer, predicts that the demand for data scientists will outstrip the supply by 500% over the next 10 years: “Despite the proliferation of new tools for data computation and data security, there will be a massive lack of good data scientists and cybersecurity professionals,” the company wrote in a report. “Tools for handling big data cannot perfectly input and output the right context for the data to make it useful for companies. It still needs to be deciphered with a human at the controls. Likewise, simply putting a tool in place to handle cybersecurity is not enough.” Along with the investment in AI and machine learning, finding skilled and experienced data scientists will be essential for the future of the field, and 2020 has proven to be a great step in the right direction for cybersecurity.

Fig. 15: Barriers to adoption of big data in cybersecurity

Source: MeriTalk: Navigating the Cybersecurity Equation

The Future of Retail

In the modern age, retail services rely on artificial intelligence (AI) to optimize customer experience, forecast sales trends, and supplement inventory management. By replacing what was once intuition with intelligence, retail services can enhance every aspect of their service with these emerging technologies. As more companies integrate AI into their DevOps (Development and Operations) departments, AI is no longer an auxiliary option, but rather a requirement to stay competitive. Retail will see AI investments and data grow dramatically, with expansion aimed at digitization from the supply chain to the online store as retailers grapple with the fallout from the pandemic. According to Accenture, “Companies will need to apply a digital lens to leapfrog ahead. Advanced data sciences. Collecting consumer data is no longer enough. Data mining for insights that systematize enhanced decision-making is no longer a nice-to-have option, but rather a need-to-have component of the business.”

Juniper Research estimates that retailer spending on AI will grow to $7.3 billion in 2022 and the “Artificial Intelligence for Retail Applications” report from Omdia expects that figure to hit $9.8 billion by 2025. The firm expects that $37.3 billion will be spent globally on AI-driven solutions by 2025.

This tracks with Gartner’s research that indicates 77% of retailers will adopt AI by 2021. In their study, robotics for warehouse picking will be the most prevalent use case. Additional focus will be on supply chain, combining AI solutions with RFID, IoT, and electronic shelf labels to improve accuracy in demand forecasting and fulfillment.

Chatbots and VR/AR will also play an important role in the evolution of retail. Built on AI, chatbots can interactively answer frequently asked questions, recommend products, and collect valuable data from the customer before connecting them to an actual person. According to a report from Goldman Sachs, retailers will spend $1.6 billion on AR/VR over the next five years. Grandview Research puts the total retail spend on chatbots at $1.23 billion by 2025. With the ability to close the gap between intuition and insights, more retailers will operationalize AI as a catalyst for their sales.

CONCLUSION

The future of data science is bright, and as the world moves toward a more data-driven space, all industries will soon follow. With innovations in wearable technology, chatbots, inventory robots, and more, data is steadily positioning itself as an informative business currency. Information technology stands at the forefront of the data science industry, with financial services close behind. However, with the financial sector’s planned hiring surge, this gap may close soon. Additionally, data science as a profession is spreading into major cities, with San Francisco hiring the most data-oriented professionals. From true data scientists to data analysts, researchers, software engineers, data engineers, and more, San Francisco is leading the way.

One interesting observation from the analysis pertains to women in data science. As the World Economic Forum Global Gender Gap Report 2020 notes: “31% of those with the relevant [data science] skill set are women even though only 25% of the roles are held by women.” Over the next few years, it will be interesting to observe how this gap either closes or widens. 

Now is the best time to be a data scientist. Most companies are planning to invest in ML and AI initiatives shortly, and this means that the need for data scientists has never been higher. Since data science is a new and growing field, most companies are struggling to find professionals with the skill set required to turn data into insights and maintain the deeply convoluted ML and AI systems. As universities and other educational institutions integrate data science programs into their curriculum, the world readily welcomes the next generation of data scientists. 

Transitioning into 2021, make sure to keep an eye out for The Data Standard’s podcasts and blogs, which will feature insights on the newest trends in data science, artificial intelligence, and machine learning. 

CONTRIBUTIONS

Thank you for the leadership and direction of everyone at The Data Standard and their help facilitating the creation of this report and supporting the team in every iteration. Additionally, the team would like to sincerely thank Pandio for supporting The Data Standard and helping make all of this possible. 

Catherine Tao – Data Scientist, Executive Podcast Host, and Producer: Sector Analysis, Sector Visualization, Overseeing Direction of Report, Content Writing, Content Formatting

Stephanie Moore – Data Scientist, Technical Writer Lead: Women Analysis, Women Visualization, Overseeing Direction of Report, Content Writing, Content Formatting

Joseph Fallon – Data Scientist, Technical Writer: Content Writing, Content Formatting

Koosha Jadbabaei – Data Scientist, Technical Writer: Content Writing, Content Formatting

Laura Rich – Content Researcher, Research Content Writer: Content Writing, Content Formatting
