Abstract
The semantic understanding of numbers requires associating them with context. However, powerful neural networks can overfit spurious correlations between context and numbers in the training corpus, producing contextual bias that may impair the network's estimation of number magnitude when making inferences on real-world data. To investigate the resilience of current methodologies against contextual bias, we introduce a novel out-of-distribution (OOD) numerical question-answering (QA) dataset in which specific correlations between context and numbers appear in the training data but not in the OOD test data. On this dataset, we evaluate the robustness of different numerical encoding and decoding methods when confronted with contextual bias. Our findings indicate that encoding methods incorporating more detailed digit information exhibit greater resilience against contextual bias. Inspired by this finding, we propose a digit-aware position embedding strategy, and the experimental results demonstrate that this strategy is highly effective in improving the robustness of neural networks against contextual bias.
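The abstract does not specify how the digit-aware position embedding is constructed, so the following is only a minimal sketch of the general idea: index each digit's position embedding by its place value (ones, tens, hundreds, ...) rather than by its token offset, so magnitude information is encoded explicitly and independently of the surrounding context. The function name and the sinusoidal formulation are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def digit_aware_position_embedding(number_str: str, d_model: int = 8) -> np.ndarray:
    """Hypothetical sketch: return one embedding vector per digit, where
    the embedding is indexed by the digit's place value (rightmost digit
    -> place 0), not by its token position in the sentence."""
    digits = number_str.lstrip("-")
    n = len(digits)
    rows = []
    for i in range(n):
        place = n - 1 - i  # place-value exponent: "2464" -> places 3,2,1,0
        # Transformer-style sinusoidal encoding of the place value (assumption)
        vec = np.array([
            np.sin(place / (10000 ** (2 * (j // 2) / d_model))) if j % 2 == 0
            else np.cos(place / (10000 ** (2 * (j // 2) / d_model)))
            for j in range(d_model)
        ])
        rows.append(vec)
    return np.stack(rows)

# One embedding row per digit; the ones digit of any number gets the
# same place-0 embedding, regardless of the number's length or context.
emb = digit_aware_position_embedding("2464")
```

Because the embedding depends only on place value, a model sees the same positional signal for the ones digit of "7" and of "2464", which is the kind of explicit digit-magnitude information the abstract credits with improving robustness.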
| Original language | English |
|---|---|
| Pages (from-to) | 2464-2482 |
| Number of pages | 19 |
| Journal | KSII Transactions on Internet and Information Systems |
| Volume | 18 |
| Issue number | 9 |
| DOIs | |
| State | Published - 30 Sep 2024 |
Keywords
- Contextual bias
- Natural language processing
- Number magnitude estimation
- Out of distribution
- Question answering