River Campus Libraries data grants enabled student research on the gender gap in STEM, website fingerprinting, and automation.
Author: Matthew Cook
The three data grant recipients with social science librarian Kathy Wu

As of 2021, only 146 universities in the United States (three percent) were designated R1, meeting the Carnegie Foundation’s definition of the highest level of research activity. The University of Rochester is an R1 university.

For more than six decades, Rochester has been a member of the Association of American Universities (AAU), a highly selective group of 65 public and private institutions in North America considered the highest echelon of universities for excellence in research production. As a top research university, Rochester is committed to supporting knowledge creation by its faculty and students, a commitment detailed in the 2030 strategic plan, Boundless Possibility.

Rochester students conduct research and apply their knowledge in the lab and the field, independently and with industry partners, locally and all over the world. In doing so, they engage with, learn from, and are part of discoveries that influence how society approaches a wide array of challenges.

The River Campus Libraries (RCL) is enabling this work.

In 2020, the RCL launched the Data Grant Program, which enables undergraduate and graduate students in Arts, Sciences & Engineering, the Warner School of Education and Human Development, and Simon Business School to purchase data sets critical to their research.

On February 14, the fall 2023 recipients presented their projects in Rush Rhees Library's Welles-Brown Room. Below are summaries of their research.

Lipeng Chen next to a bookshelf in the Welles-Brown Room
Gender gap in STEM

According to the American Association of University Women (AAUW), women account for only 34 percent of the STEM workforce. The AAUW report “Why So Few?” suggests that the gap stems from societal influences (such as stereotypes and bias) and from learning environments.

There are several reasons to be concerned about this. One major reason is that a lack of diversity can hinder innovation. Another is that STEM jobs offer higher wages, so when most of them go to men, gender inequality in the labor market is exacerbated. The gap is also self-perpetuating: female students who might be interested in pursuing STEM jobs have few chances to interact with women professors or professionals.

A project conducted by Lipeng Chen, a fifth-year PhD student in the Department of Economics, explores the impact external role models have on STEM-related outcomes. Through a yearlong intervention in the English curriculum of randomly selected 10th-grade classrooms at a Chinese high school, Chen is studying how TED Talk and TED-Ed videos featuring present-day and famous women scientists and engineers influence students.

“The biggest takeaway of my research is that role models can encourage more female students to enter the male-dominated STEM fields,” Chen says. “What's even cooler is that it can be accomplished through a very light-touch and low-cost program.”

Chen’s study shows marked increases in female students’ enjoyment of learning about physics, chemistry, and biology, in their belief that they are just as good at math as male students, and in their sense that they are suited to pursue a science track in their education.

Steven Hai
Website fingerprinting

Proton VPN, a popular virtual private network (VPN) provider, claims, “Our anonymous VPN service keeps your browsing history private and enables an internet without surveillance.” It’s a standard claim among VPN providers, whose services create secure, encrypted connections between their servers and users’ devices.

The problem for VPN users is that bad actors don’t need to break the encryption algorithms to gain access to private information. Since the 1990s, researchers have been able to use metadata to “fingerprint” the websites and servers that users visit. It’s familiar territory for Steven Oufan Hai ’24, a computer science major and one of the 2023 RCL data grant winners, whose project sought to determine how well commercial, for-profit VPN services protect consumers against fingerprinting attacks. In the project, “Realistic Website Fingerprinting: An Evaluation of Information Leakage in Modern VPNs through Supervised Learning,” he examines whether current VPN technologies deter state-of-the-art traffic analysis attacks.

Hai trained and evaluated both custom and state-of-the-art supervised machine learning models on VPN network data collected over two semesters. Bad news for VPN users: his preliminary findings confirm that VPN traffic offers little resistance to machine learning-based classification attacks. For those aiming to protect their online privacy, Hai suggests pairing VPNs with private browsers such as Brave and DuckDuckGo.

Hai will make all of the project’s code and data publicly available and intends to submit a paper to the Privacy Enhancing Technologies Symposium. “I hope other researchers use our datasets and discover unique insights about modern VPN technologies,” he says.

Kyeongmin Park
Quality-driven automation

When Henry Ford’s Model T proved to be a win for the Ford Motor Company, the company faced the fundamental production question: How do we maximize volume and minimize cost? Its answer: an assembly line. The assembly line has evolved significantly since the early 20th century, and companies now incorporate varying degrees of automation (technologies used to reduce or remove human involvement).

Why use automation? The reasons haven’t really changed since the days of the Model T. It offers business owners higher productivity and larger profits and provides consumers with better quality and lower prices. Research conducted by Kyeongmin Park, a fourth-year PhD student in the Department of Economics, explores quality as the motivation for US automakers to incorporate robots.

Park posits that as the economy grows, consumers demand higher-quality goods, requiring firms to make upgrades, and robots provide a distinct advantage in quality. Workers displaced by robots might dispute that point, but Park’s research offers compelling micro-evidence that undercuts their case.

An analysis of data on product quantity, price, quality rating, and inputs at US automotive assembly plants shows clear gains in consumer welfare from goods whose quality was improved through the use of robots. Park’s data grant gave him a direct measure of plant-level industrial robot adoption.

“I gained access to a private data service company that provides information on the timing and extent of industrial robot imports by US automotive plants,” says Park. “Without the help of the data grant, I would have been responsible for covering all data fees myself.”

One thing that tends to get lost in student research is that, like faculty research, it comes at a cost. The data students need to investigate their topics adequately is commonly too expensive for them to cover out of pocket. So the RCL data grants aren’t a bonus; they are usually what keeps research from being left incomplete or never getting started. ∎

For more student research, check out the spring 2023 grant recipients. For questions about data purchasing or the Data Grant Program, please contact Kathy Wu, social science librarian for business, economics, government information, and law. And if you are interested in supporting the grant program, please contact Pamela Jackson, senior director of advancement for the River Campus Libraries.