Abstract
Deep learning-based shadow detection methods primarily pursue higher accuracy and often overlook inference efficiency, which matters for downstream applications. This work reduces the number of patches processed during the feed-forward pass and proposes a faster shadow detection framework, FasterSD, built on a vision transformer. We find that most bright regions converge to a stable state even at early stages of the feed-forward pass, revealing massive computational redundancy. Based on this observation, we introduce a token pausing strategy that locates these simple patches and stops refining their feature representations (i.e., tokens), so that most computational resources are devoted to the remaining challenging patches. Specifically, we use the predicted posterior entropy as a proxy for prediction correctness, and design a random pausing scheme that lets the model meet flexible runtime requirements by directly adjusting the pausing configuration, without retraining. Extensive experiments on three shadow detection benchmarks (i.e., SBU, ISTD, and UCF) demonstrate that FasterSD runs 12× faster than the state-of-the-art shadow detector with comparable performance. The code will be available at https://github.com/wuwen1994/FasterSD.
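The idea of using posterior entropy to decide which tokens to pause can be illustrated with a minimal sketch. It is not the FasterSD implementation: the function names `binary_entropy` and `split_tokens`, the fixed `threshold`, and the per-token shadow probabilities are all assumptions made for illustration, and the random pausing scheme mentioned in the abstract is not modelled here.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli prediction with shadow probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries no entropy
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def split_tokens(shadow_probs, threshold):
    """Split token indices into (active, paused) by prediction entropy.

    Hypothetical rule: a token whose entropy falls below `threshold` is
    treated as confidently predicted and is paused, i.e. excluded from
    further refinement; all other tokens stay active.
    """
    active, paused = [], []
    for idx, p in enumerate(shadow_probs):
        if binary_entropy(p) < threshold:
            paused.append(idx)
        else:
            active.append(idx)
    return active, paused


# Illustrative per-token shadow probabilities from an intermediate layer.
probs = [0.01, 0.48, 0.97, 0.60, 0.02]
active, paused = split_tokens(probs, threshold=0.3)
print("active:", active)  # tokens that would still be refined
print("paused:", paused)  # tokens whose features would be frozen
```

In this toy setting, raising `threshold` pauses more tokens and lowers the cost of later layers, which is one plausible way a pausing configuration could trade accuracy for speed without retraining.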
| Original language | English |
|---|---|
| Article number | 104589 |
| Journal | Computer Vision and Image Understanding |
| Volume | 263 |
| DOIs | |
| State | Published - Jan 2026 |
Keywords
- Posterior entropy
- Scene understanding
- Shadow detection
- Token pausing
- Vision transformer