|
Instructors
|
-
Bond Francis Charles, Ph.D.
|
|
Course content
|
1. Intro
2. Markup and Annotation
3. Metadata for Corpus Work
4. Multimodal and Multilingual Corpora
5. A Survey of Available Corpora
6. DIY Corpora, Web as Corpus, Processing Raw Text, SQL
7. Statistics (Excel)
8. Encoding, Tokenization + CJK Corpora
9. Case Studies (Pronouns)
10. The Czech National Corpus
11. Project Presentations
12. Corpora and Language Engineering
13. Conclusion
|
|
Learning activities and teaching methods
|
|
Not specified
|
|
Learning outcomes
|
This course is an introduction to the fast-growing field of corpus linguistics. It aims to familiarise students with key concepts and common methods used in the construction of language corpora, as well as with tools developed for searching and using major corpora such as the British National Corpus, the Czech National Corpus, and CJK corpora. Students will gain hands-on experience in pre-editing, annotating, and searching corpora. Criteria and methods for evaluating corpora and analytical tools will also be discussed. This lays the groundwork for research using big data. The main aim of the module is for students to master the use of text corpora in linguistic research and data analysis.
|
|
Prerequisites
|
Not specified
|
|
Assessment methods and criteria
|
Not specified
Assignment 1: investigate modality
Assignment 2: write/improve a Wikipedia page
Assignment 3: harvest or analyze data/prompt AI
|
|
Recommended literature
|
|