AI Models Misjudge Rule Violations: Human Versus Machine Decisions
Summary: Researchers found that AI models often fail to replicate human decisions about rule violations, tending toward harsher judgments. The cause lies in the training data, which is typically labeled descriptively rather than normatively, leading the models to interpret rule violations differently than humans do.
The discrepancy could have serious real-world consequences, such as stricter judicial sentences. The researchers therefore suggest improving dataset transparency and matching the training context to the deployment context to build more accurate models.
Key Facts:
Source: MIT
In an effort to improve fairness or reduce backlogs, machine-learning models are sometimes designed to mimic human decision-making, such as deciding whether social media posts violate toxic content policies.
But researchers from MIT and elsewhere have found that these models often do not replicate human decisions about rule violations. If models are not trained with the right data, they are likely to make different, often harsher judgments than humans would.
In this case, the "right" data are those that have been labeled by humans who were explicitly asked whether items defy a certain rule. Training involves showing a machine-learning model millions of examples of this "normative data" so it can learn a task.
But data used to train machine-learning models are typically labeled descriptively — meaning humans are asked to identify factual features, such as, say, the presence of fried food in a photo.
If "descriptive data" are used to train models that judge rule violations, such as whether a meal violates a school policy that prohibits fried food, the models tend to over-predict rule violations.
This drop in accuracy could have serious implications in the real world. For instance, if a descriptive model is used to make decisions about whether an individual is likely to reoffend, the researchers’ findings suggest it may cast stricter judgments than a human would, which could lead to higher bail amounts or longer criminal sentences.
"I think most artificial intelligence/machine-learning researchers assume that the human judgments in data and labels are biased, but this result is saying something worse.
"These models are not even reproducing already-biased human judgments because the data they’re being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment.
"This has huge ramifications for machine learning systems in human processes," says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Ghassemi is senior author of a new paper detailing these findings, which was published today in Science Advances. Joining her on the paper are lead author Aparna Balagopalan, an electrical engineering and computer science graduate student; David Madras, a graduate student at the University of Toronto; David H. Yang, a former graduate student who is now co-founder of ML Estimation; Dylan Hadfield-Menell, an MIT assistant professor; and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.
Labeling discrepancy
This study grew out of a different project that explored how a machine-learning model can justify its predictions. As they gathered data for that study, the researchers noticed that humans sometimes give different answers if they are asked to provide descriptive or normative labels about the same data.
To gather descriptive labels, researchers ask labelers to identify factual features — does this text contain obscene language? To gather normative labels, researchers give labelers a rule and ask if the data violates that rule — does this text violate the platform's explicit language policy?
Surprised by this finding, the researchers launched a user study to dig deeper. They gathered four datasets to mimic different policies, such as a dataset of dog images that could be in violation of an apartment's rule against aggressive breeds. Then they asked groups of participants to provide descriptive or normative labels.
In each case, the descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appears aggressive. Their responses were then used to craft judgments. (If a user said a photo contained an aggressive dog, then the policy was violated.)
The labelers did not know the pet policy. On the other hand, normative labelers were given the policy prohibiting aggressive dogs, and then asked whether it had been violated by each image, and why.
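The descriptive pipeline above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the feature names and the rule that maps features to a judgment are assumptions standing in for the study's setup, where a flagged factual feature (e.g., "the dog appears aggressive") is mechanically converted into a policy violation.

```python
# Hypothetical sketch of the descriptive-labeling pipeline: labelers flag
# factual features, and a fixed rule converts those flags into judgments.

# Descriptive labels collected per image (feature names are illustrative).
descriptive_labels = [
    {"dog_appears_aggressive": True,  "dog_is_large": False},
    {"dog_appears_aggressive": False, "dog_is_large": True},
]

def judgment_from_features(features):
    """Apply the policy mechanically: the rule prohibits aggressive dogs,
    so an 'aggressive' flag alone implies a violation."""
    return features["dog_appears_aggressive"]

derived_judgments = [judgment_from_features(f) for f in descriptive_labels]
print(derived_judgments)  # [True, False]
```

A normative labeler, by contrast, sees the policy itself and answers the violation question directly, so no such conversion step is needed.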
The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting.
The disparity, which they computed using the absolute difference in labels on average, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent for the dog images.
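The disparity metric itself is simple to state: the mean absolute difference between the two sets of labels. The snippet below uses invented label values purely to show the computation; the numbers are not from the paper.

```python
# Illustrative computation of the disparity metric: the average absolute
# difference between descriptive-derived and normative labels.
# The label values here are invented for demonstration.
descriptive = [1, 1, 0, 1, 1]  # 1 = "violation" under the descriptive pipeline
normative   = [1, 0, 0, 1, 0]  # 1 = "violation" per normative labelers

disparity = sum(abs(d - n) for d, n in zip(descriptive, normative)) / len(normative)
print(disparity)  # 0.4, i.e., the labels disagree on 40% of items
```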
"While we didn't explicitly test why this happens, one hypothesis is that maybe how people think about rule violations is different from how they think about descriptive data. Generally, normative decisions are more lenient," Balagopalan says.
Yet data are usually gathered with descriptive labels to train a model for a particular machine-learning task. These data are often repurposed later to train different models that perform normative judgments, like rule violations.
Training troubles
To study the potential impacts of repurposing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model using descriptive data and the other using normative data, and then compared their performance.
They found that if descriptive data are used to train a model, it will underperform a model trained to perform the same judgments using normative data. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation.
And the descriptive model's accuracy was even lower when classifying objects that human labelers disagreed about.
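One way to quantify the over-prediction described above is to compare the two models' false-positive rates against normative ground truth. The sketch below uses made-up predictions to show the comparison; the specific values are assumptions, not the paper's results.

```python
# Hypothetical comparison: false-positive rate (falsely predicted violations)
# for a descriptively trained vs. a normatively trained model.
def false_positive_rate(preds, truth):
    """Fraction of true non-violations the model flags as violations."""
    fp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 0)
    negatives = sum(1 for t in truth if t == 0)
    return fp / negatives

truth             = [0, 0, 1, 0, 1, 0]  # normative ground-truth labels
descriptive_model = [1, 0, 1, 1, 1, 0]  # over-predicts violations
normative_model   = [0, 0, 1, 0, 1, 1]

print(false_positive_rate(descriptive_model, truth))  # 0.5
print(false_positive_rate(normative_model, truth))    # 0.25
```

In this toy setup the descriptively trained model flags twice as many non-violations, mirroring the direction of the effect the researchers report.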
"This shows that the data do really matter. It is important to match the training context to the deployment context if you are training models to detect if a rule has been violated," Balagopalan says.
It can be very difficult for users to determine how data have been gathered; this information can be buried in the appendix of a research paper or not revealed by a private company, Ghassemi says.
Improving dataset transparency is one way this problem could be mitigated. If researchers know how data were gathered, then they know how those data should be used.
Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. This idea, known as transfer learning, is something the researchers want to explore in future work.
They also want to conduct a similar study with expert labelers, like doctors or lawyers, to see if it leads to the same label disparity.
"The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting.
"Otherwise, we are going to end up with systems that are going to have extremely harsh moderations, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don't," Ghassemi says.
Funding: This research was funded, in part, by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Chair.
Author: Adam Zewe
Source: MIT
Contact: Adam Zewe – MIT
Image: The image is credited to Neuroscience News
Original Research: Open access. "Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data" by Marzyeh Ghassemi et al. Science Advances
Abstract
Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data
As governments and industry turn to increased use of automated decision systems, it becomes essential to consider how closely such systems can reproduce human judgment.
We identify a core potential failure, finding that annotators label objects differently depending on whether they are being asked a factual question or a normative question.
This challenges a natural assumption maintained in many standard machine-learning (ML) data acquisition procedures: that there is no difference between predicting the factual classification of an object and an exercise of judgment about whether an object violates a rule premised on those facts.
We find that using factual labels to train models intended for normative judgments introduces a notable measurement error.
We show that models trained using factual labels yield significantly different judgments than those trained using normative labels and that the impact of this effect on model performance can exceed that of other factors (e.g., dataset size) that routinely attract attention from ML researchers and practitioners.