At UC Berkeley, Imperfect Data Doesn’t Stop COVID-19 Research

No model is ever perfect, but when it comes to something as critical as COVID-19, it would be nice to at least start with good data.

Data scientists are doing their best with what they have, several of them noted at an April 7 roundtable discussion hosted by the University of California, Berkeley. As part of the webcast discussion, they shared the work they’re doing on COVID-19-related problems in public health, vaccines, supply chains and elections.

With so much that is still unknown about the virus and its effects, “you have to work with the data you have, and not the data you want,” said Maya Petersen, an associate professor of biostatistics and epidemiology at UC Berkeley’s School of Public Health.

The webcast was sponsored by the school’s Division of Computing, Data Science, and Society and is part of an ongoing Berkeley Conversations series on the virus.

The university is bringing a range of disciplines and datasets together to work toward solutions, including:

Using hospitalization data to determine future infection rates

Local governments and health care providers have an urgent need to understand how the epidemic is playing out in their regions and how many people are likely to be infected, so they can forecast hospitalizations and the need for such things as ICU beds and ventilators, Petersen said. “The data to guide these decisions is imperfect, but decisions still have to be made,” she said.

UC Berkeley scientists are collaborating with Kaiser Permanente groups in California and Washington state to use available statistics on the number of patients hospitalized with COVID-19 as the basis for making carefully reasoned inferences about where the virus is likely to occur. Governments in Finland and Canada have adopted some of their models, Petersen said. The next step is refining decision-making tools to guide how and when local governments can safely relax interventions like shelter-in-place orders to let communities go back to normal “in a safe way,” she said.
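
To give a rough sense of the kind of reasoning involved, the sketch below back-calculates implied infections from a hospitalization count and projects hospitalizations forward. It is a minimal illustration only, not the Berkeley/Kaiser Permanente model; the hospitalization rate and growth rate here are assumed for the example.

```python
# Minimal sketch: inferring infections from hospitalization counts.
# The 5% hospitalization rate and 5% daily growth rate are illustrative
# assumptions, not figures from the Berkeley/Kaiser Permanente work.

def estimate_infections(hospitalized, hosp_rate=0.05):
    """Rough total infections implied by a hospitalization count."""
    return hospitalized / hosp_rate

def project_hospitalizations(current_hospitalized, daily_growth, days):
    """Project future hospitalizations under simple exponential growth."""
    return current_hospitalized * (1 + daily_growth) ** days

implied = estimate_infections(200)                  # 200 / 0.05 = 4000
in_two_weeks = project_hospitalizations(200, 0.05, 14)

print(round(implied))       # 4000
print(round(in_two_weeks))  # 396
```

Even this toy version shows why the data matter: a small error in the assumed hospitalization rate shifts the implied infection count substantially, which is one reason real models treat such parameters with careful uncertainty quantification.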

Designing and testing vaccines with genome data

Among other things, data analytics is helping determine the likeliest candidates to participate in COVID-19 vaccine trials, said Michael Eisen, an adjunct professor of genetics and development. The biomedical field is also using genomic data to identify potential drug targets for the virus.

UC Berkeley researchers collecting data on the coronavirus’ genome have found that it is evolving more slowly than other viruses. This means that any work being done now to target the coronavirus’ spike protein composition is likely to still be viable in a year or so, when a vaccine could be available, Eisen said.

Building supply chain models to prepare for future pandemics

COVID-19 pummeled U.S. supply chains, leaving health care institutions scrambling to locate adequate stocks of N95 masks and other personal protective equipment (PPE) and consumers hoarding toilet paper and cleaning supplies against the possibility of running out before the crisis ends. Advance forecasting could have prevented panic buying and stockpiling, said Max Shen, chair of UC Berkeley’s department of industrial engineering and operations research.

Data scientists can use the current crisis to redesign supply chains so it’s easier to predict the amount of inventory to keep on hand, whether that pertains to ventilators for hospitals or toilet paper for paper products manufacturers. Data could also be used to develop more flexible manufacturing models, Shen said. That would allow governments to negotiate contracts for emergency supplies in advance of needing them so when a crisis happens, “firms can quickly start production to make the necessary equipment,” he said. “Right now, when you do it after the fact, it takes a lot of time to identify firms and negotiate, which wastes time.”
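
One standard tool for deciding how much inventory to keep on hand is the newsvendor model, sketched below. This is a textbook illustration of the general approach, not the specific models Shen discussed; the demand figures and costs are assumed for the example.

```python
# Minimal sketch of a newsvendor-style stocking rule, a standard tool in
# inventory planning; demand and cost figures here are illustrative.
from statistics import NormalDist

def newsvendor_quantity(mean_demand, sd_demand, underage_cost, overage_cost):
    """Stock level balancing the cost of running out (underage) against
    the cost of holding excess inventory (overage), assuming normally
    distributed demand."""
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mean_demand, sd_demand).inv_cdf(critical_ratio)

# For emergency supplies like N95 masks, running out costs far more than
# overstocking, so the rule recommends stocking well above mean demand.
q = newsvendor_quantity(mean_demand=10_000, sd_demand=2_000,
                        underage_cost=9.0, overage_cost=1.0)
print(round(q))  # roughly 12563
```

The model makes Shen’s point concrete: when the cost of a shortage dominates, the optimal policy deliberately overstocks, but computing that policy requires exactly the demand data he says is missing.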

But getting the data needed to create those models is difficult, Shen said. There’s no single source of information on the U.S. supply of ventilators and PPE, or on demand for those goods. “If we had all the data the supply-demand problem would be a little easier to solve,” he said. “We need help to get the data.”

Data’s role in developing online and mail-in voting

Improved cybersecurity and digital identification systems could let people vote by mail and avoid situations like what happened in the Wisconsin primary, when people risked becoming infected because they had to stand in line for hours to vote in person, said Henry Brady, dean of UC Berkeley’s Goldman School of Public Policy. Online voter registration would make it easier to vote by mail or online, and that’s essentially a data challenge because it entails collecting records such as driver’s licenses that states use to identify people, Brady said.

Before people could cast online ballots, voting systems would need beefed-up cybersecurity. And if more states switched to voting by mail, they would need to put data-based systems in place for processing absentee votes, and then design processes to tally them. “AI could be used to read signatures,” Brady said.

COVID-19 is putting data science in the spotlight, and UC Berkeley’s work underscores how an imperfect process continues to evolve.

Michelle V. Rafter is a Portland, Ore., business reporter who writes frequently about the intersection of data analytics, work and business. On Twitter, follow her @michellerafter.
