Using the context and order of the words in a sentence can lead to a better understanding of that sentence. Pre-trained language models have recently achieved great success in natural language processing, and among them the BERT algorithm has become increasingly popular. This has not yet been investigated for the Persian language and remains a challenge in the Persian web domain. In this article, the embedding of the Persian words that form a sentence is investigated using the BERT algorithm. In the proposed approach, a model is trained on a Persian web dataset, and the final model is produced by two stages of fine-tuning with different architectures. Finally, the features of the model are extracted and evaluated in document ranking. The model's accuracy improves on that of the other investigated models, exceeding the multilingual BERT model by at least one percent. In addition, applying the fine-tuning process with the proposed structure to other existing models improves model and embedding accuracy after each fine-tuning stage. This process improves the accuracy of Persian web ranking by around 5%.
In today's information age, efficient document ranking plays a crucial role in information retrieval systems. This article proposes a new approach to document ranking using embedding models, with a focus on the BERT language model, to improve ranking results. The proposed approach uses word embedding methods to capture the semantics of user queries and document content. By converting textual data into semantic vectors, the relationships and similarities between queries and documents are evaluated with the proposed ranking formulas at lower cost. To improve accuracy, these formulas consider several factors, including word embedding vectors, keyword location, and the impact of valuable words on ranking based on semantic vectors. Comparative experiments and analyses were conducted to evaluate the proposed formulas. The empirical results show that the proposed approach achieves higher accuracy than common ranking methods. They indicate that using embedding models, and combining them in the proposed ranking formulas, significantly improves ranking accuracy, reaching 0.87 in the best case. This study contributes to better document ranking and demonstrates the potential of the BERT embedding model for improving ranking performance.
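The general idea of scoring documents against a query by comparing semantic vectors can be illustrated with a minimal sketch. It is not the article's ranking method: the abstract does not specify how keyword location or valuable words are weighted, so those factors are left out, and plain cosine similarity is assumed as the similarity measure. The three-dimensional toy vectors stand in for BERT embeddings, and the names `cosine` and `rank_documents` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings produced by a BERT-style model (assumption).
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}

ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real system the vectors would come from an embedding model rather than being written by hand, and additional factors such as those named in the abstract would adjust the raw similarity score.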