Cognitive Science of Language

Course title

4M125000

Purpose of class

This course aims to provide interdisciplinary knowledge by integrating linguistic theory with techniques from information science. Students will learn to objectively analyze language used in everyday social contexts and acquire practical skills that can be applied not only to academic research but also to real-world tasks such as customer analysis and survey analysis in professional settings.

The course is structured so that, in the first half (up to Session 8), students focus on textbook-based instruction to acquire classical methods for processing linguistic data, including frequency analysis and statistical approaches. In the second half of the course (from Session 9 onward), building on the textbook content, students learn more advanced methods for processing linguistic data using machine learning and deep learning techniques.
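
As a rough illustration of the frequency analysis covered in the first half, the sketch below counts words and normalizes the counts per 1,000 tokens. The token list is invented toy data and the function name `relative_frequencies` is hypothetical; real Japanese text would first need morphological analysis to split it into words, and the course's own software may work differently.

```python
from collections import Counter

def relative_frequencies(tokens, per=1000):
    """Return each word's frequency normalized per `per` tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * per / total for word, count in counts.items()}

# Invented, pre-tokenized toy data (assumption: tokenization already done).
tokens = ["猫", "は", "魚", "を", "食べる", "猫", "は", "寝る"]

counts = Counter(tokens)              # raw frequency of each word
freqs = relative_frequencies(tokens)  # frequency per 1,000 tokens

for word, count in counts.most_common(3):
    print(word, count, freqs[word])
```

Normalizing by corpus size is what makes frequencies comparable across corpora of different lengths.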

Course content

In today’s information-oriented society, data mining techniques for extracting knowledge from large-scale data have become increasingly important. In this course, Japanese is used as the primary object of analysis, and students learn methods for quantitatively and qualitatively analyzing linguistic data by applying techniques from text mining and statistics in order to extract meaningful insights.

Specifically, the course covers the collection of linguistic data through API requests using Python, hands-on practice with software tools required for analysis, formulation of hypotheses and selection of appropriate analytical methods, as well as interpretation and reporting of analytical results. The course is conducted through a combination of lectures and practical exercises. Students are expected to prepare for each class by reading the assigned textbook in advance and to engage in review activities, including preparation for short quizzes and completion of applied assignments.
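
The data-collection step described above might look roughly like the following sketch, which uses only the Python standard library. The endpoint `https://api.example.com/search`, the query parameters, and the `results`/`text` field names are hypothetical placeholders, not the APIs actually used in class. The demonstration at the bottom runs offline on a hand-written payload.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint for illustration only; not the API used in class.
BASE_URL = "https://api.example.com/search"

def build_url(base_url, params):
    """Append URL-encoded query parameters to the base URL."""
    return f"{base_url}?{urlencode(params)}"

def extract_texts(payload):
    """Pull the 'text' field out of each record in a decoded JSON payload."""
    return [item["text"] for item in payload.get("results", [])]

def fetch_texts(base_url, params, timeout=10):
    """Send a GET request and return the text fields of the JSON response."""
    with urlopen(build_url(base_url, params), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_texts(payload)

# Offline demonstration with an invented payload (no network access needed).
sample_payload = json.loads(
    '{"results": [{"text": "部屋が広い"}, {"text": "朝食が良い"}]}'
)
sample_url = build_url(BASE_URL, {"q": "hotel", "limit": 2})
sample_texts = extract_texts(sample_payload)
print(sample_url)
print(sample_texts)
```

Keeping URL construction and response parsing in separate functions lets each be checked without a live connection.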

Goals and objectives

1. Students will be able to appropriately collect linguistic data, construct corpora, and perform preprocessing in order to prepare data suitable for analysis.
2. Students will be able to appropriately analyze linguistic data and scale data using programming languages and text mining software.
3. Students will be able to appropriately select analytical methods to test formulated hypotheses, conduct analyses, and logically interpret the resulting findings.
4. Students will be able to summarize analytical results in the form of an academic report and clearly communicate their findings.

Relationship between 'Goals and Objectives' and 'Course Outcomes'

	Short tests	Assignments	Final project (Report & Presentation)	Total.
1.	8%	8%	8%	24%
2.	9%	9%	8%	26%
3.	9%	9%	8%	26%
4.	9%	9%	6%	24%
Total.	35%	35%	30%	-

Language

Japanese

Class schedule

	Class schedule	HW assignments (Including preparation and review of the class.)	Amount of Time Required
1．	Course Overview and PC Environment Setup: This session provides an overview of the course objectives and structure, explains how the course will be conducted, and introduces the assigned textbook and required software, including installation procedures. In addition, students will set up a Python environment on their own computers and engage in basic programming exercises.	Preparation for the upcoming short test 1. From this point forward, students are required to bring a laptop, earphones, and the textbook.	100minutes
1．		Prepare for class by reading Chapter 1 of the textbook. If you were unable to install the software or set up the Python environment during class, make sure to complete the setup by the third week. Additionally, submit the Python file created with ZipcodeAPI.
2．	Chapter 1: Collecting Language Data: Students will deepen their understanding of corpus construction for linguistic research, including the design, collection, management, and preprocessing of data, which form the foundation for subsequent analyses.	Preparation for the upcoming short test 2	100minutes
2．		Submit a report on the design of a hypothetical "Japanese Magazine Corpus" that ensures balance and representativeness. Prepare for the next class by reading Chapter 2 of the textbook.
3．	Chapter 2: Quantifying Language (Part 1): Students will use text analysis software to load and examine corpora. Using existing corpora, they will practice fundamental operations of text analysis software, including morphological analysis, frequency counting, and graph creation, which constitute the preliminary steps of linguistic data analysis.	Preparation for the upcoming short test 3	100minutes
3．		Analyze the provided pre-existing corpus and create both a file and a report for submission.
4．	Chapter 2: Counting Words (Part 2): Students will use the Python programming language to perform API requests in order to collect linguistic data and construct a corpus. They will then conduct co-occurrence network analysis using text analysis software.	Preparation for the upcoming short test 4	100minutes
4．		Submit a report that includes two types of co-occurrence network diagrams. Prepare for the next class by reading Chapter 3 of the textbook.
5．	Chapter 3: Examining Data Characteristics: Students will learn methods for calculating the proportion of words within a corpus and for quantitatively representing characteristics of linguistic data, including lexical density, variance, interquartile range, MVR values, and TF-IDF scores.	Preparation for the upcoming short test 5	100minutes
5．		Calculate the adjusted frequency of a specific word across multiple corpora and submit your observations as a report. Prepare for the next class by reading Chapter 4 of the textbook.
6．	Chapter 4: Visualizing Data: Students will practice visualizing frequency tables using graphs. By learning appropriate methods and points of caution in visualization, they will deepen their understanding of how to represent characteristics of corpora. As an exercise, students will create histograms based on literary works from the Meiji to Showa periods and learn methods for comparing frequency distributions using t-tests.	Preparation for the upcoming short test 6	100minutes
6．		Construct a corpus from works by Dazai Osamu and Akutagawa Ryunosuke, create histograms, and submit a report based on t-test results. Prepare for the next class by reading Chapter 5 of the assigned textbook.
7．	Chapter 5: Examining Differences in Data: Students will practice inferential statistical methods by conducting tests of independence (chi-square tests) to examine whether observed differences in data proportions are statistically meaningful rather than due to chance. In addition, they will learn how to calculate effect sizes, which represent the magnitude of differences independently of sample size, rather than relying solely on statistical significance.	Preparation for the upcoming short test 7	100minutes
7．		Conduct a chi-square analysis using the provided data and submit a report summarizing the results. Prepare for the next class by reading Chapter 6 of the assigned textbook.
8．	Chapter 6: Extracting Data Characteristics: Students will continue to apply inferential statistical methods by conducting tests of independence (chi-square tests) and performing multiple comparisons using the Bonferroni correction. Through an exercise comparing the proportion of verbs in works by Natsume Soseki, students will practice identifying characteristic features of corpora.	Preparation for the upcoming short test 8	100minutes
8．		Conduct a chi-square analysis using three works by Natsume Soseki and submit a report summarizing the results. Prepare for the next class by reading Chapter 8 of the assigned textbook.
9．	Chapter 8: Observing Changes in Data — Linguistic Analysis Using Machine Learning (1): Students will learn several types of machine learning approaches. As an example of supervised learning, they will examine relationships among multiple variables by applying regression analysis following correlation analysis. In addition, as another supervised learning application, students will use sentiment analysis provided by Google Cloud Platform via API access (with Python code) to quantify emotional information in language data, such as negative and positive sentiment.	Preparation for the upcoming short test 9	100minutes
9．		Perform sentiment analysis on hotel reviews using Python-based API requests, apply correlation and regression analyses to the resulting scores, and submit a report summarizing the findings.
10．	Chapter 10: Grouping Data — Linguistic Analysis Using Machine Learning (2): As an example of unsupervised learning, students will learn methods for analyzing qualitative linguistic data, including hierarchical cluster analysis and correspondence analysis. These methods represent latent patterns in data as distances and compress qualitative features into low-dimensional space for visualization in scatter plots, in a manner similar to principal component analysis. Students are expected to prepare for this class by reading Chapter 10 of the assigned textbook.	Preparation for the upcoming short test 10	100minutes
10．		Continue the analysis of hotel reviews conducted in the previous session.
11．	Corpus Construction Using Python: Students will learn techniques for collecting linguistic data from the web through API requests and web scraping, with the goal of independently constructing their own corpora. This session includes hands-on exercises in API requests and web scraping.	Preparation for the upcoming short test 11	100minutes
11．		Retrieve comments using the YouTube API, and collect review data and external variables via web scraping from approved websites.
12．	Linguistic Analysis Using Machine Learning (3): Supervised Learning: As an example of supervised learning, students will use Python to build models that vectorize words and documents using the Word2Vec and Doc2Vec algorithms implemented in the gensim library. Using the resulting vector representations, students will conduct similarity analysis based on cosine similarity, perform dimensionality reduction through principal component analysis using the scikit-learn library, and visualize the results with scatter plots.	Understand the mechanisms of machine learning and deep learning, convert textual data into numerical representations, and apply statistical and machine learning–based analytical methods.	300minutes
12．			300minutes
13．	Linguistic Analysis Using Machine Learning (4): Advanced Topics: As an advanced topic, students will examine pretrained large language models, represented by BERT (a Transformer-based model), within the framework of supervised learning. The session provides an overview of their underlying mechanisms and the concept of fine-tuning, and explores their application to inference tasks.	Understand the mechanisms of machine learning and deep learning, convert textual data into numerical representations, and apply statistical and machine learning–based analytical methods.	300minutes
14．	Final Project Presentations: Students will present the results of their analyses to their peers, engage in question-and-answer sessions, and conduct mutual evaluations.	Prepare presentation slides (PPT) and present the analysis results to the class. Engage in peer questions and mutual evaluation of the analyses.	300minutes
14．		Collect and analyze linguistic data using methods learned in the course and/or self-selected approaches, and present the results in PPT format during the final session (Session 14).
Total.	-	-	2300minutes
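
For reference, the chi-square test of independence, effect size, and Bonferroni correction named in Sessions 7 and 8 can be sketched as below. This is a minimal standard-library illustration for a 2x2 table only, without continuity correction. The counts are invented, the function names are hypothetical, and in class a statistics package would normally be used instead.

```python
import math

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value, phi), where phi is the effect size.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(statistic / 2))
    phi = math.sqrt(statistic / n)
    return statistic, p_value, phi

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons

# Invented counts: rows are two texts, columns are (verbs, other words).
table = [[120, 880], [90, 910]]
statistic, p_value, phi = chi_square_2x2(table)

# Assumption: this is one of three pairwise comparisons among three texts.
threshold = bonferroni_alpha(0.05, 3)
print(f"chi2={statistic:.3f}, p={p_value:.4f}, phi={phi:.3f}")
print("significant after correction:", p_value < threshold)
```

With these invented counts the difference falls below the uncorrected 0.05 level but not below the corrected threshold, and the effect size is small, which is why the course treats significance and effect size separately.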

Evaluation method and criteria

Students will be evaluated based on short quizzes (35%), assignments (35%), and the final project (presentation slides and presentation) (30%). A total score of 60% or higher is required to pass the course.

A score of 60% corresponds to a level at which students have a general understanding of the knowledge required for linguistic research, are able to conduct analyses independently using appropriate methods, report their results, and critically examine potential issues in their analyses.

Feedback on exams, assignments, etc.

Ways of feedback	Specific contents about "Other"
Feedback in the class

Textbooks and reference materials

1. Software and Materials:
Because the in-class exercises involve the analysis of Japanese language data, students are required to bring a personal laptop computer running a Japanese-supported Windows operating system to every class session. The software used in this course is not compatible with macOS, tablets, or smartphones; therefore, a Windows-based laptop is mandatory. As video materials are also used, students should prepare earphones.

2. Required Textbook:
Kobayashi, Yuichiro. "Kotoba no Data Science," Asakura Publishing, 2024. (As of March 2024, JPY 2,700)

3. Supplementary Materials:
Lecture slides, fill-in worksheets, video materials, and programming-related materials will be distributed during the course.

Prerequisites

Prepare a laptop computer running a Japanese-supported Windows operating system and earphones. Consider a topic of personal interest related to the relationship between language and society.

Office hours and How to contact professors for questions

The instructor is available to students before and after each session or via email to answer questions.

Regionally-oriented

Non-regionally-oriented course

Development of social and professional independence

Course that cultivates an ability for utilizing knowledge
Course that cultivates basic problem-solving skills

Active-learning course

Most classes are interactive

Course by professor with work experience

Work experience	Work experience and relevance to the course content if applicable
N/A	Not applicable

Education-related SDGs: the Sustainable Development Goals

4.QUALITY EDUCATION
9.INDUSTRY, INNOVATION AND INFRASTRUCTURE

Last modified : Sat Mar 14 14:07:38 JST 2026