Abstract: Remote sensing cross-modal text–image retrieval (RSCTIR) has emerged as a fundamental task in remote sensing analysis, aiming to bridge the semantic gap between visual and textual modalities ...