UPOL Portal - Browsing


Surname (birth surname)	First name	Student ID	Title	Supervisor	Opponent	Type of thesis	Defence date
VALIHRACH	Dan	F220727	Užití disperze nízkofrekventovaného lexika k určení autorství textu	Faltýnek Dan	Benešová Martina	master's thesis (diplomová práce)	29.05.2024

Thesis information: Užití disperze nízkofrekventovaného lexika k určení autorství textu

Basic information

The document you are accessing is subject to the Copyright Act. Violating it may expose you to criminal prosecution!
Name	VALIHRACH Dan
Academic year	2023/2024
Assigning department	KOL
Defence date	29. 5. 2024
Type of thesis	master's thesis (diplomová práce)
Thesis status	Completed thesis, successfully defended (DUO).
Completeness of required data	All required data on this thesis have been filled in.
Main topic	Užití disperze nízkofrekventovaného lexika k určení autorství textu
Main topic in English	Using low-frequency lexical dispersion to determine text authorship
Title as given by the student	Užití disperze nízkofrekventovaného lexika k určení autorství textu
Title as given by the student in English	Using low-frequency lexical dispersion to determine text authorship
Parallel title	-
Subtitle	-
Supervisor	Faltýnek Dan, doc. Mgr. Ph.D.
Opponent	Benešová Martina, Mgr. Ph.D.
Abstract	This thesis examines how the lexical dispersion of words in a text can be used for authorship attribution. It analyses the dispersion of words in a corpus of fiction texts and attempts to classify the lexicon into classes according to how the words are dispersed. Dispersion is described using a purely quantitative approach. The attribution potential of lexical groups defined by their dispersion and frequency is analysed by computing dissimilarity matrices and applying hierarchical clustering and multidimensional scaling. The texts of the fiction corpus are processed with the author's own scripts. The goal of the thesis is to find the lexis that best identifies the authorship of a text and to assess the viability of lexical dispersion as a stylometric feature. Particular attention is paid to low-frequency lexis, in contrast to the high-frequency lexis commonly used in similar research.
Keywords	dispersion, low-frequency lexis, authorship attribution, stylometry, quantitative linguistics, superhapax
Length of the thesis	79 pp. (111,690 characters)
Language	CZ
Assignment guidelines	This thesis will deal with the use of dispersion measurements of low-frequency words for determining authorship and for finding specific characteristics of given authors. The primary goal will be to test methods for the automatic machine extraction of low-frequency words that carry information about authors. The methods will be tested and evaluated on texts of previously identified authorship. The output of the thesis will be the identification of the most suitable method for the automatic extraction of low-frequency lexis carrying authorial information, or alternatively the identification of the advantages and disadvantages of several different methods.
Recommended literature
Grzybek, P. (2013). Homogeneity and heterogeneity within language(s) and text(s): theory and practice of word length modeling. In: Köhler, R., Altmann, G. (eds.), Issues in Quantitative Linguistics 3. Lüdenscheid: RAM, 66–99.
Grzybek, P. (2007). History and methodology of word length studies: The State of the Art. In: Grzybek, P. (ed.), Contributions to the Science of Text and Language. Dordrecht: Springer, 15–90.
Koppel, M., Schler, J., and Argamon, S. (2009). Computational methods in authorship attribution. J. Assoc. Inf. Sci. Technol., 60, 9–26.
Mikros, G. K. (2009). Content words in authorship attribution: an evaluation of stylometric features in a literary corpus. In: Köhler, R. (ed.), Studies in Quantitative Linguistics, vol. 5. Lüdenscheid: RAM, 61–75.
Mikros, G. K., Argiri, E. K. (2007). Investigating topic influence in authorship attribution. In: B. Stein, M. Koppel & E. Stamatatos (eds.), Proceedings of the SIGIR 2007 International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (Vol. 276, pp. 29–35). Amsterdam, Netherlands: CEUR.
Peng, F., Schuurmans, D., Keselj, V., and Wang, S. (2003). Language independent authorship attribution using character level language models. In: 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL.
Zhao, Y., & Zobel, J. (2005). Effective and scalable authorship attribution using function words. In: Proceedings of 2nd Asian Information Retrieval Symposium, 174–189.
Loose attachments	1 script in the Python programming language, 2 scripts in the R programming language
Attachments bound in the thesis	-
Taken from the library	No
Record of the defence proceedings	-
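
Illustrative sketch (not part of the portal record): the abstract describes measuring word dispersion, selecting low-frequency lexis, and comparing texts through dissimilarity matrices. The Python fragment below is a minimal sketch of that kind of pipeline and is not the script attached to the thesis. All function names are hypothetical, and the two measures used here are assumptions chosen for brevity: dispersion is approximated by the share of equal-sized text parts that contain a word, and dissimilarity by the Jaccard distance between the low-frequency vocabularies of two texts.

```python
from collections import Counter


def dispersion_range(tokens, word, n_parts=4):
    """Share of equal-sized parts of the text that contain `word` (0.0 to 1.0)."""
    size = max(1, len(tokens) // n_parts)
    # Cut the text into n_parts slices; the last slice absorbs the remainder.
    parts = [tokens[i * size:(i + 1) * size] for i in range(n_parts - 1)]
    parts.append(tokens[(n_parts - 1) * size:])
    return sum(word in part for part in parts) / n_parts


def low_frequency_words(tokens, max_count=2):
    """Set of words occurring at most `max_count` times (assumed threshold)."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count <= max_count}


def jaccard_distance(a, b):
    """1 minus the overlap of two word sets; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(texts, max_count=2):
    """Pairwise Jaccard distances between the low-frequency lexis of each text."""
    names = list(texts)
    features = {n: low_frequency_words(texts[n], max_count) for n in names}
    matrix = [[jaccard_distance(features[r], features[c]) for c in names]
              for r in names]
    return names, matrix


# Toy input standing in for tokenised fiction texts.
texts = {
    "text_a": "the cat sat on the mat".split(),
    "text_b": "the dog sat on the log".split(),
}
names, matrix = dissimilarity_matrix(texts, max_count=1)
```

A square matrix of this kind is the usual input for hierarchical clustering or multidimensional scaling, the two methods named in the abstract; libraries such as SciPy and scikit-learn provide both.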
