| | Short tests | Assignments | Final project (Report & Presentation) | Total |
|---|---|---|---|---|
| 1. | 8% | 8% | 8% | 24% |
| 2. | 9% | 9% | 8% | 26% |
| 3. | 9% | 9% | 8% | 26% |
| 4. | 9% | 9% | 6% | 24% |
| Total | 35% | 35% | 30% | 100% |

| No. | Class schedule | HW assignments (including preparation for and review of the class) | Time required |
|---|---|---|---|
| 1. | Course Overview and PC Environment Setup: This session outlines the course objectives and structure, explains how the course will be conducted, and introduces the assigned textbook and the required software, including installation procedures. Students will also set up a Python environment on their own computers and complete basic programming exercises. | Prepare for Short Test 1. From this session onward, students must bring a laptop, earphones, and the textbook to class. Read Chapter 1 of the textbook before the next class. If you were unable to install the software or set up the Python environment during class, complete the setup by the third week. Submit the Python file created with ZipcodeAPI. | 100 minutes |
| 2. | Chapter 1: Collecting Language Data: Students will deepen their understanding of corpus construction for linguistic research, including the design, collection, management, and preprocessing of data, which form the foundation for subsequent analyses. | Prepare for Short Test 2. Submit a report on the design of a hypothetical "Japanese Magazine Corpus" that is balanced and representative. Read Chapter 2 of the textbook before the next class. | 100 minutes |
| 3. | Chapter 2: Quantifying Language (Part 1): Students will load and examine corpora in text analysis software. Using existing corpora, they will practice the fundamental operations of the software, including morphological analysis, frequency counting, and graph creation, which are the preliminary steps of linguistic data analysis. | Prepare for Short Test 3. Analyze the provided existing corpus and submit both the resulting file and a report. | 100 minutes |
| 4. | Chapter 2: Counting Words (Part 2): Students will use Python to send API requests that collect linguistic data and construct a corpus. They will then conduct a co-occurrence network analysis using text analysis software. | Prepare for Short Test 4. Submit a report that includes two types of co-occurrence network diagrams. Read Chapter 3 of the textbook before the next class. | 100 minutes |
| 5. | Chapter 3: Examining Data Characteristics: Students will learn methods for calculating the proportions of words within a corpus and for quantifying characteristics of linguistic data, including lexical density, variance, interquartile range, MVR values, and TF-IDF scores. | Prepare for Short Test 5. Calculate the adjusted frequency of a specific word across multiple corpora and submit your observations as a report. Read Chapter 4 of the textbook before the next class. | 100 minutes |
| 6. | Chapter 4: Visualizing Data: Students will practice visualizing frequency tables with graphs. By learning appropriate visualization methods and the pitfalls to avoid, they will deepen their understanding of how to represent the characteristics of corpora. As an exercise, students will create histograms from literary works of the Meiji to Showa periods and learn to compare the mean frequencies of groups using t-tests. | Prepare for Short Test 6. Construct a corpus from works by Dazai Osamu and Akutagawa Ryunosuke, create histograms, and submit a report based on t-test results. Read Chapter 5 of the textbook before the next class. | 100 minutes |
| 7. | Chapter 5: Examining Differences in Data: Students will practice inferential statistics by conducting tests of independence (chi-square tests) to examine whether observed differences in proportions are statistically significant rather than due to chance. They will also learn to calculate effect sizes, which express the magnitude of a difference independently of sample size, so that conclusions do not rest on statistical significance alone. | Prepare for Short Test 7. Conduct a chi-square analysis using the provided data and submit a report summarizing the results. Read Chapter 6 of the textbook before the next class. | 100 minutes |
| 8. | Chapter 6: Extracting Data Characteristics: Students will continue to apply inferential statistics, conducting tests of independence (chi-square tests) and multiple comparisons with the Bonferroni correction. Through an exercise comparing the proportions of verbs in works by Natsume Soseki, students will practice identifying the characteristic features of corpora. | Prepare for Short Test 8. Conduct a chi-square analysis using three works by Natsume Soseki and submit a report summarizing the results. Read Chapter 8 of the textbook before the next class. | 100 minutes |
| 9. | Chapter 8: Observing Changes in Data / Linguistic Analysis Using Machine Learning (1): Students will learn several types of machine learning approaches. As an example of supervised learning, they will examine relationships among multiple variables by performing correlation analysis followed by regression analysis. As a further supervised learning application, students will write Python code that calls the sentiment analysis service of Google Cloud Platform through its API to quantify emotional information in language data, such as positive and negative sentiment. | Prepare for Short Test 9. Perform sentiment analysis on hotel reviews using Python-based API requests, apply correlation and regression analyses to the resulting scores, and submit a report summarizing the findings. | 100 minutes |
| 10. | Chapter 10: Grouping Data / Linguistic Analysis Using Machine Learning (2): As examples of unsupervised learning, students will learn methods for analyzing qualitative linguistic data, including hierarchical cluster analysis and correspondence analysis. Hierarchical cluster analysis groups items according to the distances between them. Correspondence analysis, much like principal component analysis, compresses qualitative features into a low-dimensional space so that latent patterns can be visualized in scatter plots. Students are expected to read Chapter 10 of the textbook before this class. | Prepare for Short Test 10. Continue the analysis of hotel reviews begun in the previous session. | 100 minutes |
| 11. | Corpus Construction Using Python: Students will learn techniques for collecting linguistic data from the web through API requests and web scraping, with the goal of constructing their own corpora independently. The session includes hands-on exercises in both techniques. | Prepare for Short Test 11. Retrieve comments using the YouTube API, and collect review data and external variables by web scraping from approved websites. | 100 minutes |
| 12. | Linguistic Analysis Using Machine Learning (3): Supervised Learning: As an example of supervised learning, students will use Python to build models that vectorize words and documents with the Word2Vec and Doc2Vec algorithms implemented in the gensim library. Using the resulting vectors, they will conduct similarity analysis based on cosine similarity, reduce dimensionality through principal component analysis with the scikit-learn library, and visualize the results in scatter plots. | Understand the mechanisms of machine learning and deep learning, convert textual data into numerical representations, and apply statistical and machine learning–based analytical methods. | 300 minutes |
| 13. | Linguistic Analysis Using Machine Learning (4): Advanced Topics: Students will examine pretrained large language models, exemplified by BERT (a Transformer-based model), within the framework of supervised learning. The session outlines their underlying mechanisms and the concept of fine-tuning, and explores their application to inference tasks. | Understand the mechanisms of machine learning and deep learning, convert textual data into numerical representations, and apply statistical and machine learning–based analytical methods. | 300 minutes |
| 14. | Final Project Presentations: Students will present the results of their analyses to their peers, take part in question-and-answer sessions, and evaluate one another's work. | Collect and analyze linguistic data using methods learned in the course and/or approaches of your own choosing, and prepare presentation slides (PPT) of the results. In the final session (Session 14), present the analysis to the class, respond to peer questions, and take part in mutual evaluation. | 300 minutes |
| Total | - | - | 2,000 minutes |
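
Sessions 7 and 8 cover the test of independence, effect sizes, and the Bonferroni correction. As an illustration, here is a minimal pure-Python sketch under stated assumptions:

- The counts are hypothetical.
- Cramér's V is assumed as the effect-size measure; the textbook may use a different one.
- The p-value is omitted. It could be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(statistic, dof)`.

```python
# Illustrative sketch only: Pearson chi-square test of independence,
# Cramér's V as an (assumed) effect size, and a Bonferroni-adjusted alpha.
# The counts below are hypothetical, not course data.
from math import sqrt


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    No continuity (Yates) correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def cramers_v(table, chi2):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))


def bonferroni_alpha(alpha, num_comparisons):
    """Bonferroni correction: test each comparison at alpha / m."""
    return alpha / num_comparisons


# Hypothetical counts: rows are two works, columns are (verbs, other tokens).
table = [[120, 880],
         [90, 910]]
chi2, dof = chi_square_independence(table)
v = cramers_v(table, chi2)
print(f"chi2 = {chi2:.3f}, dof = {dof}, Cramér's V = {v:.3f}")

# Three works give three pairwise comparisons, so each is tested at 0.05 / 3.
print(f"adjusted alpha = {bonferroni_alpha(0.05, 3):.4f}")
```

With these hypothetical counts the statistic is about 4.79 with 1 degree of freedom. That exceeds the 0.05 critical value of about 3.84, yet V is only about 0.05. This illustrates the point made in Session 7: a statistically significant difference can still be small in magnitude.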

| Ways of feedback | Details (if "Other" is selected) |
|---|---|
| Feedback during class | |

| Work experience | Work experience and its relevance to the course content, if applicable |
|---|---|
| N/A | Not applicable |

