Based on Bayesian Theory: Social…
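Abstract
With the continued development and maturation of Web 2.0 technology, social tagging systems have emerged. Social tagging inherits the user freedom and initiative that Web 2.0 promotes. In a social tagging environment, users can add suitable tags according to their own understanding of an information resource, and they can also consult tags that other people have already used. This mechanism lets users select resources according to their own needs and organize them according to their own understanding, which reflects the proactive and personalized character of social tagging systems.
Because social tagging is inherently bottom-up, no unified rule constrains what counts as a "suitable" tag. A resource could often be described clearly with a few phrases, but because users differ in background knowledge and level of understanding, the tags they produce frequently exhibit ambiguity, synonymy, and homonymy. In addition, web resources that have rarely been tagged tend to be overlooked by users who are browsing now, so many highly valuable resources go unnoticed. These phenomena make it very difficult for new users to search for and obtain information resources.
To address these problems, this paper uses Bayesian theory together with related topic clustering algorithms to mine the topics of information resources in a social tagging environment. The tag sets produced when many users tag a given resource are cleaned and grouped, so that each resource ends up with a small set of representative tags. The main contributions of this paper are as follows:
(1) In view of the polysemy and synonymy present in social tagging, latent semantic mining theory from text mining is applied to social tagging. A resource-tag matrix is constructed to mine the semantic space between resources and tags, which effectively resolves the confusion of word senses in user tagging;
(2) Using a three-layer Bayesian network, a topic allocation based on latent Dirichlet allocation is constructed; on this basis, latent topics are mined and effectively classified and summarized;
(3) Combining the prior knowledge and sample space of Bayesian theory, a topic-space classification is proposed that further refines the identification of resource attributes and improves on the first two pieces of work.
This research not only enriches the theory of information organization and retrieval, but also provides an effective way to identify information topics and user preferences.
Keywords: social tagging; topic clustering; latent semantics; hierarchical Bayes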
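Contribution (1) can be illustrated with a minimal sketch. It assumes that the latent semantic step is a truncated singular value decomposition of the resource-tag matrix, which is the standard form of latent semantic analysis; the source does not show the paper's actual procedure. The tag names, the counts, the helper names `lsa_tag_vectors` and `cosine`, and the choice of `k=2` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are resources, columns are tags, and each
# entry counts how many users applied that tag to that resource.
tags = ["python", "py", "programming", "recipe", "cooking"]
counts = np.array([
    [4, 0, 2, 0, 0],  # resource 0: tagged "python" and "programming"
    [0, 3, 2, 0, 0],  # resource 1: tagged "py" and "programming"
    [0, 0, 0, 5, 3],  # resource 2: a cooking page
    [0, 0, 0, 2, 4],  # resource 3: another cooking page
], dtype=float)


def lsa_tag_vectors(matrix, k):
    """Return k-dimensional tag vectors from a rank-k truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row of the result is one tag in the latent semantic space.
    return vt[:k].T * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vecs = lsa_tag_vectors(counts, k=2)

# In the raw matrix, "python" and "py" never label the same resource,
# so their count columns are orthogonal.
raw_sim = cosine(counts[:, 0], counts[:, 1])

# In the latent space they are pulled together because both co-occur
# with "programming"; "recipe" stays on a separate axis.
sim_synonyms = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[3])

print(f"raw python/py similarity:       {raw_sim:.3f}")
print(f"latent python/py similarity:    {sim_synonyms:.3f}")
print(f"latent python/recipe similarity: {sim_unrelated:.3f}")
```

On this toy matrix the two synonymous tags end up nearly parallel in the latent space even though they never appear on the same resource, which is the kind of synonym merging the contribution describes. Real tag data would need weighting and a larger `k`.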
Abstract
With the continued development and maturation of Web 2.0 technology, social tagging systems have emerged. Social tagging carries forward the user freedom and initiative that Web 2.0 emphasizes. In a social tagging environment, users add tags that reflect their own understanding of an information resource, and they can also refer to tags that other people have already used. This mechanism lets users select resources according to their needs and organize them according to their own understanding, which gives social tagging systems their proactive and personalized character.
However, because social tagging is a bottom-up process, no uniform rule constrains what counts as the "right" tag. A resource that could be described clearly with a few phrases instead attracts tags that are ambiguous, synonymous, or polysemous, because users differ in background knowledge and level of understanding. At the same time, web resources that have rarely been tagged tend to be ignored by users browsing today, so a large number of highly valuable resources are overlooked. These phenomena make it very hard for new users to search for and obtain information resources.
To address these problems, this paper combines Bayesian theory with topic clustering algorithms to mine the topics of information resources in a social tagging environment. The tag sets that many users generate for a particular resource are cleaned and classified, so that each resource ends up with only a small set of representative tags. The main contributions of this paper are as follows:
(1) In view of the polysemy and synonymy present in social tagging, latent semantic mining theory from text mining is applied to social tagging. Building a resource-tag matrix and mining the semantic space between resources and tags effectively resolves the semantic confusion that arises during annotation;
(2) A three-layer Bayesian network is used to build a topic model based on latent Dirichlet allocation; on this basis, latent topics are mined and effectively classified and summarized;
(3) Bayesian prior knowledge and the sample space are combined to propose a topic-space classification, which further refines the identification of resource attributes and improves on the first two contributions.
This research not only enriches the theory of information organization and retrieval, but also provides an effective way to identify information topics and user preferences.
Keywords: social tagging; topic clustering; latent semantic analysis; Bayesian hierarchical model
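Contribution (2) names latent Dirichlet allocation, whose three layers are the corpus-level Dirichlet priors, the per-resource topic mixture, and the per-tag topic assignment. The sketch below fits such a model with collapsed Gibbs sampling. This is one common inference method and is an assumption here, since the source does not say how the paper estimates the model. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy tag ids are all hypothetical.

```python
import numpy as np


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for latent Dirichlet allocation.

    docs: list of lists of tag ids, one inner list per resource.
    Returns (theta, phi): per-resource topic mixtures and
    per-topic tag distributions.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))  # topic counts per resource
    n_kw = np.zeros((n_topics, n_words))    # tag counts per topic
    n_k = np.zeros(n_topics)                # total tag count per topic

    # Start from a random topic assignment for every tag occurrence.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional of this occurrence's topic given the rest.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_words * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior mean estimates under the Dirichlet priors.
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi


# Hypothetical toy input: tag ids 0-2 are programming tags, 3-4 are cooking tags.
docs = [[0, 0, 2, 1], [1, 2, 0, 2], [3, 4, 3, 4], [4, 3, 3, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, n_words=5)
print(np.round(theta, 2))  # one row per resource, one column per topic
print(np.round(phi, 2))    # one row per topic, one column per tag
```

Each row of `theta` describes how strongly a resource belongs to each latent topic, and the highest-probability tags in a row of `phi` can serve as that topic's representative tags. Because the sampler is stochastic, the topic indices and exact values depend on the seed.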
Contents
Abstract (Chinese)
Abstract (English)
Contents (Chinese)
Contents (English)
Chapter 1 Introduction
1.1 Research background and significance
1.2 State of research
1.2.1 State of social tagging research in China and abroad
1.2.2 State of research on Web text topic mining techniques
1.3 Research content, technical approach, and organization
1.3.1 Research content
1.3.2 Technical approach
1.3.3 Organization of the paper
1.4 Innovations
Chapter 2 Overview of social tagging systems and related Bayesian algorithms
2.1 Overview of social tagging
2.1.1 Concept of social tagging
2.1.2 Elements of social tagging
2.1.3 Social…