Overview of Research Projects, 2024

張智星


Summary Table

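Chinese Title | English Title | Principal Investigator | Funding Agency | Project Number | Funding Amount | Start Date | End Date
生成式對話之臺灣人文知識探勘系統計畫 | TAIHUCAIS: TAIwan HUmanities Conversational AI Knowledge Discovery System | 張智星 | National Science and Technology Council (國科會) | | 2000000 | 2024/1/1 | 2026/12/31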

Details

  1. 生成式對話之臺灣人文知識探勘系統計畫

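    • English title: TAIHUCAIS: TAIwan HUmanities Conversational AI Knowledge Discovery System
    • Project number:
    • Principal investigator: 張智星
    • Funding agency: National Science and Technology Council (國科會)
    • Project period: 2024/1/1 to 2026/12/31
    • Keywords: Machine learning, large language model, chatbot
    • Abstract: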
      Taiwan possesses many unique and valuable humanities collections, but they are held by different institutions, so users must search each database separately to find what they need. With the recent emergence of conversational large language models (LLMs), knowledge discovery has been shifting from keyword search toward natural-language questions and multi-turn dialogue that return integrated information. Current general-purpose LLMs, however, do not meet the specialized needs of humanities work, and their Chinese output does not always fit the Taiwanese context. To make Taiwanese humanities data easier to retrieve and to serve the professional needs of humanities researchers and practitioners, this project plans to combine Taiwan's existing first-hand, full-text humanities databases with a general-purpose language model developed in Taiwan, building a platform for database retrieval and dialogue. This conversational knowledge discovery system, TAIHUCAIS (TAIwan HUmanities Conversational AI Knowledge Discovery System, 台鵠開示), is tailored to Taiwanese humanities scholars. It will connect existing databases, generate the information users need, respond in dialogue that fits Taiwanese topics and context, and improve the knowledge discovery experience. Users will no longer have to search each database individually; instead, they will pose questions in natural language and interact with the system through a single interface, making knowledge discovery more convenient, efficient, and precise, and better matched to their needs. The system is expected to increase the use of Taiwanese humanities databases and, through its conversational interaction, to draw more people to explore the many facets of Taiwan's humanities. Moreover, turning general-purpose conversational language models into specialized ones is an important goal across professional fields worldwide. If TAIHUCAIS is built successfully, it will be a significant technological innovation that other specialized domains can reference and apply.