Multimodal Composed Image Retrieval Using Querying-Transformer

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/95554

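Title:	Multimodal Composed Image Retrieval Using Querying-Transformer
Author:	楊歷恆; Yang, Alex Li-Heng
Contributor:	Department of Computer Science and Information Engineering
Keywords:	image search; Composed Image Retrieval; deep learning; attention
Date:	2024-07-23
Uploaded:	2024-10-09 17:00:39 (UTC+8)
Publisher:	National Central University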
Abstract:	Composed Image Retrieval (CIR) lets users find a specific image by combining a visual reference with descriptive text, which addresses the limitations of traditional text-only search. In this thesis, we propose a CIR system built on the Querying-Transformer (Qformer). The Qformer integrates image and text data through a transformer-based architecture and captures complex relationships between the two modalities. Incorporating an Image-Text Matching (ITM) loss improves the accuracy of image-text matching and the alignment between visual and textual representations. We also apply residual learning within the Qformer model to preserve essential visual information, so that the features of the original image are retained throughout learning. To validate the approach, we ran experiments on the FashionIQ and CIRR datasets. The results show that the proposed system significantly outperforms existing models, achieving higher recall across categories, and indicate its potential to improve the precision and relevance of image retrieval in practical applications.
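The abstract reports results as recall metrics. As a minimal sketch of how a Recall@K score is commonly computed for retrieval, the following assumes each query has exactly one target image and a list of similarity scores over the candidate gallery. The function name `recall_at_k` and the toy scores are hypothetical illustrations, not taken from the thesis, and the thesis's exact evaluation protocol for FashionIQ and CIRR may differ.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose target is among the k highest-scoring candidates.

    scores[q][i] is the similarity between query q and candidate image i.
    targets[q] is the index of the single correct candidate for query q
    (an assumption of this sketch).
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one query is required")

    hits = 0
    for row, target in zip(scores, targets):
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (hypothetical): three queries, three candidate images each.
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.1],
]
toy_targets = [0, 2, 0]

r1 = recall_at_k(toy_scores, toy_targets, k=1)
r2 = recall_at_k(toy_scores, toy_targets, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

In this toy case only the first query ranks its target first, while all three rank it within the top two, so Recall@2 is higher than Recall@1.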
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

All items in NCUIR are protected by original copyright.
