Exploring the Impact of Using Sequential Financial News on Stock Price Prediction: A Case Study of Text Mining and Deep Learning

Please use this persistent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93206

Title:	Exploring the Impact of Using Sequential Financial News on Stock Price Prediction: A Case Study of Text Mining and Deep Learning
Author:	Lin, Chung-En (林崇恩)
Contributor:	Department of Information Management
Keywords:	text mining; natural language processing; stock price prediction; continuous data; machine learning; deep learning
Date:	2023-07-18
Uploaded:	2024-09-19 16:48:06 (UTC+8)
Publisher:	National Central University
Abstract:	Stock price prediction plays a crucial role in financial markets and has long been an important research topic, because it directly affects investment strategy, risk management, market analysis, trade execution, and capital allocation. However, stock prices are driven by many complex factors, which makes them difficult to predict. Earlier research relied mainly on historical price data, technical indicators, and time-series models. More recent studies apply text mining to financial news and social-media posts, pairing various text-feature models with machine-learning and deep-learning classifiers and comparing their predictive performance. Few studies, however, have examined whether the news input should cover a single day or several consecutive days. This study therefore compares single-day news data with data spanning consecutive days and measures the effect on prediction performance. The results show that (1) consecutive-day data outperforms single-day data; (2) models pre-trained on financial-domain vocabulary outperform general-purpose text-feature models; (3) prediction performance differs little between news content, news headlines, and content plus headlines; and (4) removing insignificant data labels improves prediction performance. Finally, news content was combined with same-day stock price labels, and a regression analysis was used to compute RMSE. Five consecutive days of news gave the lowest RMSE, meaning that predicted prices were closest to actual prices in that setting. Comparing text features and models over five consecutive days, averaged FinBERT features yielded the lowest RMSE, and the random forest (RF) classifier outperformed the other classifiers in terms of RMSE.
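The abstract contrasts single-day news with news from several consecutive days, uses averaged FinBERT features, and scores price predictions with RMSE. The sketch below illustrates those three ideas under our own assumptions; it is not the author's code. It assumes each day's news has already been encoded into token embeddings (plain lists of floats stand in for encoder output such as FinBERT's), that a multi-day sample is built by concatenating the daily vectors in date order, and that the target is the value on the last day of the window. The names `mean_pool`, `consecutive_day_windows`, and `rmse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> Vector:
    """Average token vectors into one document vector, skipping padding (mask == 0)."""
    if not token_embeddings:
        raise ValueError("token_embeddings is empty")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vec):
                totals[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask marks no real tokens")
    return [t / count for t in totals]


def consecutive_day_windows(daily_features: Sequence[Vector],
                            targets: Sequence[float],
                            days: int) -> Tuple[List[Vector], List[float]]:
    """Build one sample per window of `days` consecutive days.

    Assumption (hypothetical design): the daily vectors in a window are
    concatenated in date order, and the sample's target is the value for the
    last day of the window. days=1 reduces to the single-day setting.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    if len(daily_features) != len(targets):
        raise ValueError("need one target per day")
    X: List[Vector] = []
    y: List[float] = []
    for end in range(days - 1, len(daily_features)):
        window = daily_features[end - days + 1:end + 1]
        X.append([value for day in window for value in day])
        y.append(targets[end])
    return X, y


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between actual and predicted prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    # Toy numbers only; real inputs would come from a text encoder.
    doc_vec = mean_pool([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [1, 1, 0])
    print(doc_vec)  # [2.0, 3.0]
    X, y = consecutive_day_windows([[0.1], [0.2], [0.3], [0.4]],
                                   [10.0, 11.0, 12.0, 13.0], days=3)
    print(X, y)  # [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]] [12.0, 13.0]
    print(rmse([3.0, 5.0], [1.0, 7.0]))  # 2.0
```

Setting `days=1` corresponds to the single-day setting and `days=5` to the five-consecutive-day setting the abstract reports as having the lowest RMSE; the resulting `X` and `y` could then be passed to whichever classifier or regressor is being compared. Concatenation keeps the day order visible to the model, whereas averaging the daily vectors would keep the feature size fixed; which one the thesis used is not stated here.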
Appears in Collections:	[Graduate Institute of Information Management] Master's and Doctoral Theses

All items in NCUIR are protected by their original copyright.
