MultiLTR: Text Ranking with a Multi-Stage Learning-to-Rank Approach
Abstract
1. Introduction
- We propose a multi-stage ranking approach. The initial stage constructs a set of retrieval models, each built from features derived from a single text field. The subsequent re-ranking stages build local and global rankers using learning-to-rank techniques.
- We introduce a mid-evaluation strategy to select the best-performing model: a set of candidate models is evaluated before the final ranking list is generated, and the model achieving the highest performance serves as the final ranking model (a minimal selection sketch follows this list).
- We conduct empirical experiments on two publicly available benchmark datasets and use three different LTR algorithms. We evaluate the results using widely adopted assessment metrics.
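To make the mid-evaluation strategy concrete, here is a minimal sketch of the selection step; it is not the authors' implementation, and the `evaluate` callable (e.g., mean NDCG@10 over held-out queries) and the candidate names are assumptions:

```python
from typing import Callable, Dict, TypeVar

Ranker = TypeVar("Ranker")

def mid_evaluate(candidates: Dict[str, Ranker],
                 evaluate: Callable[[Ranker], float]) -> str:
    """Score each candidate model on held-out data and return the name of
    the best one; that model then produces the final ranking list."""
    scores = {name: evaluate(model) for name, model in candidates.items()}
    return max(scores, key=scores.get)
```

In the architecture described below, `candidates` would hold the global re-rankers (e.g., MR_Global, RF_Global) built in the later stages.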
2. Literature Review
2.1. Multi-Stage Ranking Architectures
2.2. Neural Ranking Models
2.3. Learning-to-Rank Techniques
3. Methodology
3.1. Architecture of the Proposed MultiLTR Approach
Algorithm 1: Pseudocode for the MultiLTR Approach.
3.2. Layers in the Proposed MultiLTR Approach
3.2.1. Input Layer
3.2.2. Feature Clustering Layer
3.2.3. Initial Ranking Layer
3.2.4. Local Re-Ranking Layer
3.2.5. Normalization Layer
3.2.6. Local Re-Ranker Selection Layer
3.2.7. Global Re-Ranking Layer
3.2.8. Mid-Evaluation Layer
3.2.9. Final Re-Ranking Layer and Output
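Putting the layers above together, the following is a minimal runnable sketch of the staged flow, assuming pointwise random-forest regressors as stand-ins for the LTR algorithms and synthetic data in place of LETOR features; the initial ranking layer (3.2.3) is folded into the per-field local re-rankers here, so this illustrates the data flow rather than the paper's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Input and feature-clustering layers (3.2.1-3.2.2): 40 features split into
# five hypothetical field clusters (body, anchor, title, url, wholedoc).
X = rng.normal(size=(1000, 40))
y = (X[:, :8].mean(axis=1) > 0).astype(float)  # toy relevance labels
field_clusters = [list(range(i, i + 8)) for i in range(0, 40, 8)]

# Local re-ranking layer (3.2.4): one ranker per field cluster.
local_rankers = [
    (cols, RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:, cols], y))
    for cols in field_clusters
]

# Normalization layer (3.2.5): min-max normalize each local ranker's scores.
def normalized_local_scores(X):
    scores = []
    for cols, model in local_rankers:
        s = model.predict(X[:, cols])
        scores.append((s - s.min()) / (s.max() - s.min() + 1e-12))
    return np.column_stack(scores)

# Global re-ranking layer (3.2.7): a ranker over the stacked local scores.
# In the full approach, several global re-rankers are built and the
# mid-evaluation layer (3.2.8) picks the best one on held-out queries.
global_ranker = RandomForestRegressor(n_estimators=100, random_state=0)
global_ranker.fit(normalized_local_scores(X), y)
final_scores = global_ranker.predict(normalized_local_scores(X))  # 3.2.9
```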
4. Experiments and Results
4.1. Datasets
4.2. Evaluation Measures
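Since the experiments report NDCG@k and MAP, a small reference sketch of both measures follows; it uses the exponential-gain form of DCG common in LETOR-style evaluation, which is an assumption about the exact variant used in the paper:

```python
import numpy as np

def dcg_at_k(rel, k):
    """DCG@k with gain (2^rel - 1) and discount log2(rank + 1)."""
    rel = np.asarray(rel, dtype=float)[:k]
    return float(((2 ** rel - 1) / np.log2(np.arange(2, rel.size + 2))).sum())

def ndcg_at_k(rel, k):
    """NDCG@k: DCG normalized by the ideal DCG (IDCG) of the same labels."""
    idcg = dcg_at_k(sorted(rel, reverse=True), k)
    return dcg_at_k(rel, k) / idcg if idcg > 0 else 0.0

def average_precision(rel):
    """AP for binary labels in ranked order; MAP is its mean over queries."""
    rel = np.asarray(rel, dtype=int)
    precisions = np.cumsum(rel) / np.arange(1, rel.size + 1)
    return float((precisions * rel).sum() / max(rel.sum(), 1))

# Example: graded labels of the top five documents returned for one query.
print(ndcg_at_k([2, 1, 0, 2, 0], k=5))
```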
4.3. Built Rankers
4.4. Results
- When employing LN_Local as the local re-ranker, the most effective model uses RF_Global as the global re-ranker, achieving an improvement of 8.62%; MR_Global follows closely with an improvement of 6.95%;
- For RN_Local as the local re-ranker, the optimal model applies MR_Global as the global re-ranker, yielding an improvement of 7.39%, with RF_Global following at 6.58%;
- In the case of RF_Local, the best-performing model integrates MR_Global as the global re-ranker, producing an improvement of 8.84%, while RF_Global ranks second with a 6.69% improvement.
- With LN_Local as the local re-ranker, RF_Global emerges as the most effective global re-ranker, delivering a 13.39% improvement in NDCG@5, closely followed by MR_Global with a 12.11% increase;
- For RN_Local, RF_Global remains the optimal choice, achieving a 14.69% improvement, with MR_Global trailing slightly at 13.89%;
- When applying RF_Local, RF_Global again leads with a 12.29% improvement, while MR_Global secures the second-best performance at 10.40% (a worked check of these relative-improvement computations follows this list).
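The percentages in the first list above can be reproduced from the @5 scores of the MQ2007 comparison table in this section, taking BM25 as the baseline; the NDCG@5 figures in the second list follow the same formula against their respective baselines. A quick arithmetic check, with values copied from that table:

```python
def relative_improvement(score: float, baseline: float) -> float:
    """Relative gain over a baseline, in percent."""
    return 100.0 * (score - baseline) / baseline

# MultiLTR vs. BM25 at @5 on MQ2007:
print(f"{relative_improvement(0.4225, 0.3890):.2f}%")  # LN: 8.61% (reported as 8.62%)
print(f"{relative_improvement(0.4243, 0.3951):.2f}%")  # RN: 7.39%
print(f"{relative_improvement(0.4494, 0.4129):.2f}%")  # RF: 8.84%
```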
5. Discussion
5.1. Comparative Analysis
5.2. Ablation Results
5.3. Time Consumption
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
LTR | learning-to-rank
MultiLTR | multi-stage learning-to-rank
fLTR | field learning-to-rank
DCG | discounted cumulative gain
IDCG | ideal discounted cumulative gain
NDCG | normalized discounted cumulative gain
AP | average precision
MAP | mean average precision
MR | MART
RF | random forest
RB | RankBoost
RN | RankNet
LM | LambdaMART
LN | ListNet
CA | coordinate ascent
AR | AdaRank
References
- Yang, H.; Gonçalves, T. Field features: The impact in learning to rank approaches. Appl. Soft Comput. 2023, 138, 110183.
- Clarke, C.L.; Culpepper, J.S.; Moffat, A. Assessing efficiency–effectiveness tradeoffs in multi-stage retrieval systems without using relevance judgments. Inf. Retr. J. 2016, 19, 351–377.
- Zhang, L.; Zhang, Y.; Long, D.; Xie, P.; Zhang, M.; Zhang, M. A two-stage adaptation of large language models for text ranking. In Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 11880–11891.
- Zheng, K.; Zhao, H.; Huang, R.; Zhang, B.; Mou, N.; Niu, Y.; Song, Y.; Wang, H.; Gai, K. Full stage learning to rank: A unified framework for multi-stage systems. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 3621–3631.
- Liu, Z.; Li, C.; Xiao, S.; Li, C.; Lian, D.; Shao, Y. Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture with Configurable Depth and Width. arXiv 2025, arXiv:2501.16302.
- Nogueira, R.; Yang, W.; Cho, K.; Lin, J. Multi-stage document ranking with BERT. arXiv 2019, arXiv:1910.14424.
- Fan, Y.; Xie, X.; Cai, Y.; Chen, J.; Ma, X.; Li, X.; Zhang, R.; Guo, J. Pre-training methods in information retrieval. Found. Trends® Inf. Retr. 2022, 16, 178–317.
- Lu, J.; Hall, K.; Ma, J.; Ni, J. HYRR: Hybrid Infused Reranking for Passage Retrieval. arXiv 2022, arXiv:2212.10528.
- Wang, B.; Li, M.; Zeng, Z.; Zhuo, J.; Wang, S.; Xu, S.; Long, B.; Yan, W. Learning Multi-Stage Multi-Grained Semantic Embeddings for E-Commerce Search. arXiv 2023, arXiv:2303.11009.
- Huang, P.S.; He, X.; Gao, J.; Deng, L.; Acero, A.; Heck, L. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, San Francisco, CA, USA, 27 October–1 November 2013; pp. 2333–2338.
- Zhang, H.; Wang, S.; Zhang, K.; Tang, Z.; Jiang, Y.; Xiao, Y.; Yan, W.; Yang, W.Y. Towards personalized and semantic retrieval: An end-to-end solution for e-commerce search via embedding learning. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 2407–2416.
- Qiu, Y.; Zhao, C.; Zhang, H.; Zhuo, J.; Li, T.; Zhang, X.; Wang, S.; Xu, S.; Long, B.; Yang, W.Y. Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4424–4428.
- Hai Le, N.; Gerald, T.; Formal, T.; Nie, J.Y.; Piwowarski, B.; Soulier, L. CoSPLADE: Contextualizing SPLADE for Conversational Information Retrieval. In Proceedings of the European Conference on Information Retrieval, Dublin, Ireland, 2–6 April 2023; Springer: Cham, Switzerland, 2023; pp. 537–552.
- Yang, D.; Zhang, Y.; Fang, H. An exploration study of mixed-initiative query reformulation in conversational passage retrieval. arXiv 2023, arXiv:2307.08803.
- Gao, L.; Dai, Z.; Callan, J. Rethink training of BERT rerankers in multi-stage retrieval pipeline. In Advances in Information Retrieval: Proceedings of the 43rd European Conference on IR Research, ECIR 2021, Virtual Event, 28 March–1 April 2021, Proceedings, Part II 43; Springer: Cham, Switzerland, 2021; pp. 280–286.
- Nogueira, R.; Cho, K. Passage Re-ranking with BERT. arXiv 2019, arXiv:1901.04085.
- Yilmaz, Z.A.; Wang, S.; Yang, W.; Zhang, H.; Lin, J. Applying BERT to document retrieval with birch. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, Hong Kong, China, 3–7 November 2019; pp. 19–24.
- Lin, S.C.; Yang, J.H.; Nogueira, R.; Tsai, M.F.; Wang, C.J.; Lin, J. Multi-stage conversational passage retrieval: An approach to fusing term importance estimation and neural query rewriting. ACM Trans. Inf. Syst. (TOIS) 2021, 39, 48.
- Guo, J.; Fan, Y.; Pang, L.; Yang, L.; Ai, Q.; Zamani, H.; Wu, C.; Croft, W.B.; Cheng, X. A deep look into neural ranking models for information retrieval. Inf. Process. Manag. 2020, 57, 102067.
- Craswell, N.; Mitra, B.; Yilmaz, E.; Campos, D.; Lin, J. MS MARCO: Benchmarking ranking models in the large-data regime. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 1566–1576.
- Yates, A.; Nogueira, R.; Lin, J. Pretrained transformers for text ranking: BERT and beyond. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 1154–1156.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Leonhardt, J.; Beringer, F.; Anand, A. Exploiting Sentence-Level Representations for Passage Ranking. arXiv 2021, arXiv:2106.07316.
- Mitra, B.; Diaz, F.; Craswell, N. Learning to match using local and distributed representations of text for web search. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 1291–1299.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Ahmadi, K.; Gathwala, A.; Osajima, J.; Hsiao, D.; Das, P. SLLIM-Rank: A Multi-Stage Item-to-Item Recommendation Model using Learning-to-Rank. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 2264–2268.
- Lee, J.; Bernier-Colborne, G.; Maharaj, T.; Vajjala, S. Methods, Applications, and Directions of Learning-to-Rank in NLP Research. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16–21 June 2024; pp. 1900–1917.
- Dato, D.; MacAvaney, S.; Nardini, F.M.; Perego, R.; Tonellotto, N. The Istella22 Dataset: Bridging Traditional and Neural Learning to Rank Evaluation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 3099–3107.
- Qin, Z.; Yan, L.; Zhuang, H.; Tay, Y.; Pasumarthi, R.K.; Wang, X.; Bendersky, M.; Najork, M. Are neural rankers still outperformed by gradient boosted decision trees? In Proceedings of the ICLR 2021, Virtual, 3–7 May 2021.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Statist. 2001, 29, 1189–1232.
- Burges, C.; Shaked, T.; Renshaw, E.; Lazier, A.; Deeds, M.; Hamilton, N.; Hullender, G. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 89–96.
- Xu, J.; Li, H. AdaRank: A boosting algorithm for information retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands, 23–27 July 2007; ACM: New York, NY, USA, 2007; pp. 391–398.
- Metzler, D.; Croft, W.B. Linear feature-based models for information retrieval. Inf. Retr. 2007, 10, 257–274.
- Wu, Q.; Burges, C.J.; Svore, K.M.; Gao, J. Adapting boosting for information retrieval measures. Inf. Retr. 2010, 13, 254–270.
- Cao, Z.; Qin, T.; Liu, T.Y.; Tsai, M.F.; Li, H. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; ACM: New York, NY, USA, 2007; pp. 129–136.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Köppel, M.; Segner, A.; Wagener, M.; Pensel, L.; Karwath, A.; Kramer, S. Pairwise learning to rank by neural networks revisited: Reconstruction, theoretical analysis and practical performance. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Würzburg, Germany, 16–20 September 2019; Springer: Cham, Switzerland, 2019; pp. 237–252.
- Jia, Y.; Wang, H.; Guo, S.; Wang, H. PairRank: Online pairwise learning to rank by divide-and-conquer. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 146–157.
- Yuan, K.; Kuang, D. Deep Pairwise Learning To Rank For Search Autocomplete. arXiv 2021, arXiv:2108.04976.
- Ai, Q.; Bi, K.; Guo, J.; Croft, W.B. Learning a deep listwise context model for ranking refinement. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 135–144.
- Sharma, A. Listwise Learning to Rank with Deep Q-Networks. arXiv 2020, arXiv:2002.07651.
- Pang, L.; Xu, J.; Ai, Q.; Lan, Y.; Cheng, X.; Wen, J. SetRank: Learning a permutation-invariant ranking model for information retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 499–508.
- Swezey, R.; Grover, A.; Charron, B.; Ermon, S. PiRank: Scalable Learning To Rank via Differentiable Sorting. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2021; Volume 34.
- Chen, Z.; Eickhoff, C. PoolRank: Max/Min Pooling-based Ranking Loss for Listwise Learning & Ranking Balance. arXiv 2021, arXiv:2108.03586.
- Keshvari, S.; Ensan, F.; Yazdi, H.S. ListMAP: Listwise learning to rank as maximum a posteriori estimation. Inf. Process. Manag. 2022, 59, 102962.
- Chen, F.; Fang, H. An Exploration of Learning-to-re-rank Using a Two-step Framework for Fair Ranking. In Proceedings of the TREC, Online, 15–19 November 2022.
- Han, S.; Wang, X.; Bendersky, M.; Najork, M. Learning-to-Rank with BERT in TF-Ranking. arXiv 2020, arXiv:2004.08476.
- Awan, Z.; Kahlke, T.; Ralph, P.; Kennedy, P. Bi-Encoders based Species Normalization–Pairwise Sentence Learning to Rank. arXiv 2023, arXiv:2310.14366.
- Richards, J.A. Feature reduction. In Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 2022; pp. 403–446.
- Liu, T.Y. Learning to rank for information retrieval. Found. Trends® Inf. Retr. 2009, 3, 225–331.
- Cabello-Solorzano, K.; Ortigosa de Araujo, I.; Peña, M.; Correia, L.; Tallón-Ballesteros, A.J. The impact of data normalization on the accuracy of machine learning algorithms: A comparative analysis. In Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications, Salamanca, Spain, 5–7 September 2023; Springer: Cham, Switzerland, 2023; pp. 344–353.
- Järvelin, K.; Kekäläinen, J. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) 2002, 20, 422–446.
- Qin, T.; Liu, T.Y. Introducing LETOR 4.0 datasets. arXiv 2013, arXiv:1306.2597.
- Qin, T.; Liu, T.Y.; Xu, J.; Li, H. LETOR: A benchmark collection for research on learning to rank for information retrieval. Inf. Retr. 2010, 13, 346–374.
- Zhang, E.; Zhang, Y. Average Precision. In Encyclopedia of Database Systems; Springer: Boston, MA, USA, 2009; pp. 192–193.
- Sanderson, M. Review of: Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press, 2008; ISBN-13 978-0-521-86571-5; xxi + 482 pages. Nat. Lang. Eng. 2010, 16, 100–103.
- Beitzel, S.M.; Jensen, E.C.; Frieder, O. MAP. In Encyclopedia of Database Systems; Springer: Boston, MA, USA, 2009; pp. 1691–1692.
- Xiong, C.; Dai, Z.; Callan, J.; Liu, Z.; Power, R. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 55–64.
- Fan, Y.; Guo, J.; Lan, Y.; Xu, J.; Zhai, C.; Cheng, X. Modeling diverse relevance patterns in ad-hoc retrieval. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 375–384.
- Tang, Z.; Yang, G.H. DeepTileBars: Visualizing term distribution for neural information retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 289–296.
- Fox, E.A.; Shaw, J.A. Combination of multiple searches. In NIST Special Publications SP; National Institute of Standards and Technology: Gaithersburg, MD, USA, 1994; Volume 243. Available online: https://trec.nist.gov/pubs/trec2/t2_proceedings.html (accessed on 12 February 2025).
- Askari, A.; Abolghasemi, A.; Pasi, G.; Kraaij, W.; Verberne, S. Injecting the score of the first-stage retriever as text improves BERT-based re-rankers. Discov. Comput. 2024, 27, 15.
Notation | Meaning |
---|---|
T | data collection |
t | a document in T
q | a query in T
f | the original, unclustered features provided by or extracted from a data collection T
s | the total number of features extracted from a data collection T
d | the number of fields from which features are extracted
k | the number of features included in a field
f_ij | the feature j from the field i
r_m | the ranker of the mth layer
H | final result |
Dataset | Selected Features | Queries | Labeled Query–Document Pairs |
---|---|---|---|
MQ2007 | 40 | 1700 | 69,623 |
MQ2008 | 40 | 800 | 15,211 |
Feature | Body | Anchor | Title | URL | Wholedoc | Other Place
---|---|---|---|---|---|---
TF | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
IDF | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
TF × IDF | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
DL | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
BM25 | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
LMIR.ABS | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
LMIR.DIR | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
LMIR.JM | ∘ | ∘ | ∘ | ∘ | ∘ | ⊗ |
Length of URL | − | − | − | ⊗ | − | − |
Number of slashes in URL | − | − | − | ⊗ | − | −
Number of child pages | − | − | − | − | − | ⊗
Number of inlinks | − | − | − | − | − | ⊗ |
Number of outlinks | − | − | − | − | − | ⊗ |
PageRank | − | − | − | − | − | ⊗ |
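To illustrate how the feature-clustering layer might organize the features in this table, here is a hedged sketch; the cluster layout and feature names are assumptions patterned on the LETOR feature set, not the paper's exact grouping:

```python
# Content features computed on each of the five text fields.
CONTENT_FEATURES = ["TF", "IDF", "TFxIDF", "DL", "BM25",
                    "LMIR.ABS", "LMIR.DIR", "LMIR.JM"]
FIELDS = ["body", "anchor", "title", "url", "wholedoc"]

clusters = {field: [f"{name}({field})" for name in CONTENT_FEATURES]
            for field in FIELDS}

# Document-level features that live outside the text fields.
clusters["url"] += ["Length of URL", "Number of slashes in URL"]
clusters["other"] = ["Number of child pages", "Number of inlinks",
                     "Number of outlinks", "PageRank"]

for field, features in clusters.items():
    print(f"{field}: {len(features)} features")
```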
| Alg. | Features | @5 | @10 | @15 | @20 | @30 | Alg. | Features | @5 | @10 | @15 | @20 | @30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MR | baseline | 0.4150 | 0.4429 | 0.4721 | 0.5030 | 0.5604 | RN | baseline | 0.3951 | 0.4243 | 0.4560 | 0.4870 | 0.5457 |
| | body | 0.3958 | 0.4243 | 0.4551 | 0.4856 | 0.5436 | | body | 0.3416 | 0.3734 | 0.4053 | 0.4382 | 0.5030 |
| | anchor | 0.3477 | 0.3764 | 0.4109 | 0.4415 | 0.5064 | | anchor | 0.3457 | 0.3777 | 0.4122 | 0.4433 | 0.5094 |
| | title | 0.4073 | 0.4345 | 0.4625 | 0.4935 | 0.5533 | | title | 0.4054 | 0.4344 | 0.4664 | 0.4968 | 0.5545 |
| | url | 0.4055 | 0.4329 | 0.4639 | 0.4931 | 0.5500 | | url | 0.3988 | 0.4298 | 0.4630 | 0.4934 | 0.5504 |
| | wholedoc | 0.3977 | 0.4286 | 0.4576 | 0.4890 | 0.5495 | | wholedoc | 0.3618 | 0.3977 | 0.4321 | 0.4639 | 0.5254 |
| RB | baseline | 0.4046 | 0.4333 | 0.4665 | 0.4968 | 0.5527 | AR | baseline | 0.4013 | 0.4300 | 0.4601 | 0.4909 | 0.5480 |
| | body | 0.3682 | 0.3991 | 0.4299 | 0.4608 | 0.5215 | | body | 0.3439 | 0.3717 | 0.4052 | 0.4386 | 0.5044 |
| | anchor | 0.3454 | 0.3823 | 0.4145 | 0.4476 | 0.5111 | | anchor | 0.3308 | 0.3640 | 0.3958 | 0.4268 | 0.4958 |
| | title | 0.4048 | 0.4353 | 0.4680 | 0.4980 | 0.5539 | | title | 0.3023 | 0.3353 | 0.3672 | 0.4004 | 0.4657 |
| | url | 0.4018 | 0.4327 | 0.4667 | 0.4963 | 0.5519 | | url | 0.3079 | 0.3475 | 0.3865 | 0.4213 | 0.4893 |
| | wholedoc | 0.3405 | 0.3796 | 0.4153 | 0.4485 | 0.5122 | | wholedoc | 0.3094 | 0.3451 | 0.3806 | 0.4157 | 0.4837 |
| CA | baseline | 0.4087 | 0.4386 | 0.4704 | 0.4997 | 0.5560 | LM | baseline | 0.4197 | 0.4478 | 0.4781 | 0.5084 | 0.5642 |
| | body | 0.3656 | 0.3922 | 0.4222 | 0.4571 | 0.5194 | | body | 0.3919 | 0.4210 | 0.4546 | 0.4863 | 0.5440 |
| | anchor | 0.3444 | 0.3798 | 0.4145 | 0.4462 | 0.5105 | | anchor | 0.3468 | 0.3783 | 0.4131 | 0.4465 | 0.5095 |
| | title | 0.4071 | 0.4345 | 0.4669 | 0.4957 | 0.5531 | | title | 0.4062 | 0.4364 | 0.4655 | 0.4951 | 0.5530 |
| | url | 0.4059 | 0.4371 | 0.4688 | 0.4985 | 0.5545 | | url | 0.4046 | 0.4352 | 0.4667 | 0.4954 | 0.5523 |
| | wholedoc | 0.3534 | 0.3895 | 0.4258 | 0.4596 | 0.5223 | | wholedoc | 0.4030 | 0.4353 | 0.4661 | 0.4968 | 0.5537 |
| LN | baseline | 0.3890 | 0.4185 | 0.4482 | 0.4795 | 0.5390 | RF | baseline | 0.4129 | 0.4389 | 0.4694 | 0.4995 | 0.5569 |
| | body | 0.3412 | 0.3650 | 0.3964 | 0.4304 | 0.4965 | | body | 0.3941 | 0.4212 | 0.4511 | 0.4818 | 0.5423 |
| | anchor | 0.3382 | 0.3733 | 0.4036 | 0.4349 | 0.5022 | | anchor | 0.3444 | 0.3725 | 0.4076 | 0.4385 | 0.5048 |
| | title | 0.3957 | 0.4252 | 0.4542 | 0.4849 | 0.5432 | | title | 0.4065 | 0.4342 | 0.4613 | 0.4901 | 0.5496 |
| | url | 0.3876 | 0.4202 | 0.4533 | 0.4854 | 0.5428 | | url | 0.4036 | 0.4327 | 0.4629 | 0.4921 | 0.5485 |
| | wholedoc | 0.3939 | 0.4209 | 0.4525 | 0.4846 | 0.5453 | | wholedoc | 0.3352 | 0.3687 | 0.4026 | 0.4360 | 0.5033 |
| Alg. | Features | @5 | @10 | @15 | @20 | @30 | Alg. | Features | @5 | @10 | @15 | @20 | @30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MR | baseline | 0.4586 | 0.5036 | 0.5179 | 0.5257 | 0.5328 | RN | baseline | 0.4345 | 0.4843 | 0.5010 | 0.5072 | 0.5157 |
| | body | 0.4469 | 0.4943 | 0.5077 | 0.5165 | 0.5257 | | body | 0.4055 | 0.4575 | 0.4771 | 0.4858 | 0.4948 |
| | anchor | 0.4299 | 0.4812 | 0.4989 | 0.5061 | 0.5153 | | anchor | 0.4122 | 0.4688 | 0.4859 | 0.4940 | 0.5036 |
| | title | 0.4496 | 0.4977 | 0.5096 | 0.5167 | 0.5258 | | title | 0.4446 | 0.4931 | 0.5084 | 0.5161 | 0.5254 |
| | url | 0.4616 | 0.5086 | 0.5229 | 0.5308 | 0.5383 | | url | 0.4604 | 0.5076 | 0.5214 | 0.5299 | 0.5375 |
| | wholedoc | 0.4517 | 0.5005 | 0.5155 | 0.5230 | 0.5308 | | wholedoc | 0.4224 | 0.4726 | 0.4910 | 0.4988 | 0.5085 |
| RB | baseline | 0.4550 | 0.5003 | 0.5157 | 0.5230 | 0.5301 | AR | baseline | 0.4364 | 0.4850 | 0.5004 | 0.5075 | 0.5161 |
| | body | 0.4055 | 0.4575 | 0.4771 | 0.4858 | 0.4948 | | body | 0.4085 | 0.4667 | 0.4839 | 0.4912 | 0.5011 |
| | anchor | 0.4251 | 0.4763 | 0.4933 | 0.5011 | 0.5118 | | anchor | 0.4099 | 0.4664 | 0.4842 | 0.4934 | 0.5032 |
| | title | 0.4521 | 0.5020 | 0.5153 | 0.5225 | 0.5307 | | title | 0.3618 | 0.4118 | 0.4291 | 0.4356 | 0.4470 |
| | url | 0.4681 | 0.5107 | 0.5239 | 0.5322 | 0.5403 | | url | 0.3550 | 0.4164 | 0.4409 | 0.4506 | 0.4636 |
| | wholedoc | 0.4481 | 0.4922 | 0.5089 | 0.5162 | 0.5255 | | wholedoc | 0.4343 | 0.4813 | 0.4957 | 0.5049 | 0.5140 |
| CA | baseline | 0.4553 | 0.5016 | 0.5150 | 0.5224 | 0.5304 | LM | baseline | 0.4619 | 0.5050 | 0.5180 | 0.5267 | 0.5338 |
| | body | 0.4289 | 0.4792 | 0.4960 | 0.5029 | 0.5127 | | body | 0.4457 | 0.4944 | 0.5108 | 0.5184 | 0.5279 |
| | anchor | 0.4351 | 0.4851 | 0.5019 | 0.5101 | 0.5196 | | anchor | 0.4267 | 0.4779 | 0.4963 | 0.5044 | 0.5127 |
| | title | 0.4686 | 0.5142 | 0.5284 | 0.5350 | 0.5432 | | title | 0.4587 | 0.5014 | 0.5163 | 0.5240 | 0.5331 |
| | url | 0.4692 | 0.5134 | 0.5267 | 0.5348 | 0.5435 | | url | 0.4612 | 0.5063 | 0.5212 | 0.5308 | 0.5386 |
| | wholedoc | 0.4509 | 0.4970 | 0.5110 | 0.5182 | 0.5277 | | wholedoc | 0.4532 | 0.5002 | 0.5164 | 0.5248 | 0.5318 |
| LN | baseline | 0.4341 | 0.4851 | 0.5011 | 0.5080 | 0.5156 | RF | baseline | 0.4513 | 0.4985 | 0.5127 | 0.5200 | 0.5282 |
| | body | 0.4080 | 0.4585 | 0.4778 | 0.4869 | 0.4957 | | body | 0.4459 | 0.4985 | 0.5123 | 0.5201 | 0.5298 |
| | anchor | 0.4128 | 0.4678 | 0.4855 | 0.4934 | 0.5030 | | anchor | 0.4277 | 0.4778 | 0.4966 | 0.5040 | 0.5137 |
| | title | 0.4346 | 0.4864 | 0.5029 | 0.5103 | 0.5189 | | title | 0.4556 | 0.5040 | 0.5170 | 0.5248 | 0.5340 |
| | url | 0.4545 | 0.5038 | 0.5182 | 0.5262 | 0.5348 | | url | 0.4630 | 0.5089 | 0.5241 | 0.5329 | 0.5401 |
| | wholedoc | 0.4177 | 0.4728 | 0.4898 | 0.4965 | 0.5067 | | wholedoc | 0.4587 | 0.5054 | 0.5212 | 0.5280 | 0.5365 |
Dataset | MR | RF | RN | RB | LM | AR | CA | LN
---|---|---|---|---|---|---|---|---
MQ2007 | 5 | 17 | 10 | 10 | 4 | 0 | 10 | 15
MQ2008 | 5 | 17 | 10 | 7 | 4 | 0 | 10 | 15
| LTR Alg. | Method | @5 | @10 | @15 | @20 | @30 |
|---|---|---|---|---|---|---|
| Neural Model | KNRM | 0.3790 | 0.4120 | 0.4256 | 0.4309 | 0.4324 |
| | HiNT | 0.4630 | 0.4900 | 0.5102 | 0.5253 | 0.5358 |
| | DeepTileBars | 0.3980 | 0.4340 | 0.4507 | 0.4605 | 0.4651 |
| LN | BM25 | 0.3890 | 0.4185 | 0.4482 | 0.4795 | 0.5390 |
| | fLTR | 0.3957 | 0.4193 | 0.4482 | 0.4796 | 0.5413 |
| | MultiLTR | 0.4225 | 0.4460 | 0.4756 | 0.5063 | 0.5645 |
| RN | BM25 | 0.3951 | 0.4243 | 0.4560 | 0.4870 | 0.5457 |
| | fLTR | 0.4051 | 0.4298 | 0.4603 | 0.4893 | 0.5504 |
| | MultiLTR | 0.4243 | 0.4477 | 0.4779 | 0.5072 | 0.5649 |
| RF | BM25 | 0.4129 | 0.4389 | 0.4694 | 0.4995 | 0.5569 |
| | fLTR | 0.4286 | 0.4489 | 0.4775 | 0.5063 | 0.5657 |
| | MultiLTR | 0.4494 | 0.4647 | 0.4920 | 0.5207 | 0.5804 |
| LTR Alg. | Method | @5 | @10 | @15 | @20 | @30 |
|---|---|---|---|---|---|---|
| Neural Model | KNRM | 0.1567 | 0.2254 | 0.2683 | 0.2941 | 0.3193 |
| | HiNT | 0.1916 | 0.2683 | 0.3218 | 0.3583 | 0.3953 |
| | DeepTileBars | 0.1647 | 0.2377 | 0.2840 | 0.3140 | 0.3430 |
| LN | BM25 | 0.1610 | 0.2292 | 0.2824 | 0.3273 | 0.3981 |
| | fLTR | 0.1566 | 0.2232 | 0.2748 | 0.3187 | 0.3897 |
| | MultiLTR | 0.1698 | 0.2432 | 0.2996 | 0.3465 | 0.4168 |
| RN | BM25 | 0.1461 | 0.2137 | 0.2670 | 0.3099 | 0.3822 |
| | fLTR | 0.1650 | 0.2344 | 0.2887 | 0.3323 | 0.4039 |
| | MultiLTR | 0.1754 | 0.2476 | 0.3037 | 0.3501 | 0.4210 |
| RF | BM25 | 0.1672 | 0.2432 | 0.3006 | 0.3481 | 0.4202 |
| | fLTR | 0.1710 | 0.2455 | 0.3020 | 0.3489 | 0.4212 |
| | MultiLTR | 0.1829 | 0.2569 | 0.3148 | 0.3631 | 0.4340 |
| LTR Alg. | Model | @5 | @10 | @15 | @20 | @30 |
|---|---|---|---|---|---|---|
| LN | MultiLTR | 0.4225 | 0.4460 | 0.4756 | 0.5063 | 0.5645 |
| | MultiLTR w/o feature clustering | 0.4181 | 0.4451 | 0.4722 | 0.5310 | 0.5600 |
| | MultiLTR w/o local ranking | 0.4112 | 0.4352 | 0.4650 | 0.4950 | 0.5550 |
| | MultiLTR w/o normalization | 0.4202 | 0.4441 | 0.4730 | 0.5030 | 0.5620 |
| RN | MultiLTR | 0.4243 | 0.4477 | 0.4779 | 0.5072 | 0.5649 |
| | MultiLTR w/o feature clustering | 0.4201 | 0.4430 | 0.4730 | 0.5030 | 0.5620 |
| | MultiLTR w/o local ranking | 0.4150 | 0.4401 | 0.4700 | 0.5012 | 0.5601 |
| | MultiLTR w/o normalization | 0.4220 | 0.4460 | 0.4761 | 0.5060 | 0.5630 |
| RF | MultiLTR | 0.4494 | 0.4647 | 0.4920 | 0.5207 | 0.5804 |
| | MultiLTR w/o feature clustering | 0.4451 | 0.4600 | 0.4870 | 0.5170 | 0.5770 |
| | MultiLTR w/o local ranking | 0.4350 | 0.4550 | 0.4801 | 0.5111 | 0.5703 |
| | MultiLTR w/o normalization | 0.4471 | 0.4620 | 0.4890 | 0.5190 | 0.5780 |
Share and Cite
Yang, H.; Gonçalves, T. MultiLTR: Text Ranking with a Multi-Stage Learning-to-Rank Approach. Information 2025, 16, 308. https://doi.org/10.3390/info16040308