Course title
4M1250001
Cognitive Science of Language

SHINTANI Mayu
Course content
In today's society, often referred to as the information age, data mining techniques for extracting knowledge from large amounts of data are gaining increasing attention. This course focuses on the use of text mining techniques to extract valuable knowledge from language data, such as Japanese and English, which are commonly used in everyday life. In this course, students will learn methods for data collection and corpus construction in language research using programming, data preprocessing, hypothesis formulation, selection of analysis methods, operation of analysis software, attempts at qualitative analysis from a linguistic perspective in addition to quantitative analysis, and methods for discussing and reporting results. The course consists of both lectures and practice. In addition, students are required to read assigned textbooks and articles before each class and engage in test preparation and assignments as part of their review.
Purpose of class
This course aims to help students acquire knowledge that combines linguistic insight from the humanities with text-mining techniques from computer science. By objectively analyzing the language used in everyday life, students will acquire practical skills that are useful both in research and after entering the workforce.
Goals and objectives
  1. Collect linguistic data, build a corpus, perform preprocessing, and prepare data for appropriate analysis.
  2. Select analysis methods, and interpret the results appropriately.
  3. Use text mining tools and programming to process and analyze linguistic data.
  4. Analyze and explain linguistic phenomena using qualitative and quantitative methods.
  5. Write academic reports and communicate linguistic findings appropriately.
Relationship between 'Goals and Objectives' and 'Course Outcomes'

Goal     Short tests   Assignments   Final project (Academic Report)   Total
1.       10%           8%            8%                                26%
2.       10%           8%            8%                                26%
3.       10%           8%            8%                                26%
4.       10%           6%            6%                                22%
Total.   40%           30%           30%                               -
Language
Japanese
Class schedule

Class schedule HW assignments (Including preparation and review of the class.) Amount of Time Required
1. Class outline description:
Students will learn where this course fits in the curriculum and how classes are conducted, and will be introduced to the textbook and software used (including installation).
Students will also set up an environment for using Python on their PCs and practice simple programming.
Preparation for the upcoming short test 1
From this point forward, students are required to bring a laptop, earphones, and the textbook.
100 minutes
Prepare for class by reading Chapter 1 of the textbook.
If you were unable to install the software or set up the Python environment during class, make sure to complete the setup by the third week. Additionally, submit the Python file created with ZipcodeAPI.
2. Chapter 1: "Collecting Language Data":

Students will deepen their understanding of corpus creation for linguistic research, focusing on data design, collection, management, and preprocessing, which will be essential for future research.
Preparation for the upcoming short test 2 100 minutes
Submit a report assuming the creation of a "Japanese Magazine Corpus" with balance and representativeness.
Prepare for the next class by reading Chapter 2 of the textbook.
3. Chapter 2: "Quantifying Language (Part 1)":

Students will use software for text quantification to load and process corpora. Using an existing corpus, they will practice basic operations of text analysis software, including morphological analysis, frequency counting, and graph creation, as the preliminary steps for further analysis.
Preparation for the upcoming short test 3 100 minutes
Analyze the provided pre-existing corpus and create both a file and a report for submission.
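As a rough illustration of the frequency counting practiced in this session, the following minimal Python sketch counts word frequencies in a short English text. It is not the text analysis software used in class: the regular-expression tokenizer is an assumed stand-in for morphological analysis (Japanese text would need a dedicated morphological analyzer), and the sample sentence is made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each token to its raw frequency."""
    return Counter(tokenize(text))


# Made-up sample text standing in for a corpus file.
sample = "The cat sat on the mat. The dog sat too."
freq = word_frequencies(sample)

# Print the two most frequent words with their counts.
for word, count in freq.most_common(2):
    print(word, count)
```

A frequency table like this is the starting point for the graphs created later in the course.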
4. Chapter 2: "Quantifying Language (Part 2)":

Students will practice building their own corpus. They will collect language data and construct a corpus by making API requests with Python. Additionally, they will create a corpus manually to observe how the two data creation methods differ.
Preparation for the upcoming short test 4 100 minutes
Submit a report that includes two types of co-occurrence network diagrams.
Prepare for the next class by reading Chapter 3 of the textbook.
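The API-based corpus construction in this session can be sketched roughly as follows. The endpoint URL, the parameter names `q` and `limit`, and the response fields `items` and `text` are all hypothetical, chosen only for illustration; the API actually used in class will differ. The sketch builds a request URL and turns a JSON response body into corpus lines, using a made-up response instead of a real network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used for illustration only.
BASE_URL = "https://api.example.com/search"


def build_request_url(keyword, limit=10):
    """Compose the request URL for a keyword search."""
    return BASE_URL + "?" + urlencode({"q": keyword, "limit": limit})


def extract_texts(response_body):
    """Return the non-empty 'text' field of each item in a JSON response."""
    payload = json.loads(response_body)
    texts = []
    for item in payload.get("items", []):
        text = item.get("text", "").strip()
        if text:
            texts.append(text)
    return texts


# Made-up response body standing in for what such an API might return.
sample_body = json.dumps({
    "items": [
        {"text": "Language changes over time. "},
        {"text": ""},
        {"text": "Corpora make that change visible."},
    ]
})

url = build_request_url("corpus", limit=5)
corpus_lines = extract_texts(sample_body)
print(url)
print(corpus_lines)
```

In a real script, the response body would come from a call such as `urllib.request.urlopen(url)`, and the extracted lines would be saved to a text file that serves as the corpus.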
5. Chapter 3: "Exploring Data Overview":

Students will learn how to calculate word proportions within a corpus and extract key characteristics from language data.
Preparation for the upcoming short test 5 100 minutes
Calculate the adjusted frequency of a specific word across multiple corpora and submit your observations as a report.
Prepare for the next class by reading Chapter 4 of the textbook.
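The adjusted frequency in this session's assignment can be illustrated with a small sketch. It assumes the common convention of frequency per million words; the textbook may use a different base, and the counts below are made up.

```python
def adjusted_frequency(raw_count, corpus_size, per=1_000_000):
    """Scale a raw count to a frequency per `per` words (default: per million)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * per / corpus_size


# Made-up counts of one word in two corpora of different sizes.
freq_a = adjusted_frequency(150, 500_000)    # smaller corpus
freq_b = adjusted_frequency(240, 1_200_000)  # larger corpus

print(freq_a, freq_b)
```

Although the word has a higher raw count in the larger corpus (240 versus 150), its adjusted frequency is lower there, which is why raw counts should not be compared directly across corpora of different sizes.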
6. Chapter 4: "Visualizing Data":

Students will practice visualizing frequency tables through graphs. They will learn to represent corpus features by creating scatter plots (for correlation and regression analysis), histograms, and mosaic plots.
Preparation for the upcoming short test 6 100 minutes
Create graphs using a pre-existing corpus and submit your observations as a report.
Prepare for the next class by reading Chapter 5 of the textbook.
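A scatter plot used for correlation analysis is usually read alongside a correlation coefficient. The following minimal sketch computes Pearson's r in plain Python; the choice of Pearson's r and the data points are assumptions for illustration, not taken from the textbook.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Made-up paired measurements, e.g. two features measured on five texts.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```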
7. Chapter 5: "Examining Data Differences":

Students will practice using inferential statistics, specifically the chi-square test for independence, to determine whether there are significant differences in word proportions within the corpus. They will also learn about effect size.
Preparation for the upcoming short test 7 100 minutes
Perform a chi-square analysis using the given data and submit the results as a report.
Prepare for the next class by reading Chapter 6 of the textbook.
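The chi-square test for independence and its effect size can be sketched for the simplest case, a 2x2 table of one word against all other words in two corpora. This sketch assumes Pearson's chi-square without continuity correction and the phi coefficient as the effect size; the textbook may present other variants, and the counts are made up.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def phi_coefficient(chi2, n):
    """Effect size for a 2x2 table: the square root of chi-square over n."""
    return math.sqrt(chi2 / n)


# Made-up counts: the target word vs. all other words in two 1,000-word corpora.
a, b = 30, 970   # corpus A: target word, other words
c, d = 10, 990   # corpus B: target word, other words

chi2 = chi_square_2x2(a, b, c, d)
p = p_value_df1(chi2)
phi = phi_coefficient(chi2, a + b + c + d)
print(round(chi2, 3), round(p, 4), round(phi, 3))
```

Here the difference is statistically significant at the 1% level, yet phi is small, which illustrates why effect size is reported alongside the test result.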
8. Chapter 6: "Extracting Data Features":

Students will continue practicing inferential statistics, focusing on the chi-square test and the likelihood ratio test for independence. This time, they will compare word proportions within the total word count of two corpora to identify distinguishing features. Additionally, students will learn methods to compare corpora using standardized frequencies that do not rely on total word count.
Preparation for the upcoming short test 8 100 minutes
Perform a chi-square analysis using the given data and submit the results as a report.
Prepare for the next class by reading Chapter 7 of the textbook.
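The likelihood ratio test mentioned in this session can be sketched in the same 2x2 setting. The sketch assumes the four-cell form of the log-likelihood ratio statistic, G² = 2 Σ O ln(O / E); the textbook may use a simplified variant, and the counts are made up.

```python
import math


def log_likelihood_2x2(a, b, c, d):
    """Log-likelihood ratio statistic G^2 for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table must contain at least one observation")
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Made-up counts: the target word vs. all other words in two 1,000-word corpora.
g2 = log_likelihood_2x2(30, 970, 10, 990)
print(round(g2, 2))
```

Like the chi-square statistic, G² for a 2x2 table is compared against the chi-square distribution with 1 degree of freedom.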
9. Chapter 7: "Measuring the Strength of Data Associations":

Students will learn various methods for measuring co-occurrence strength. Using the collected language data, they will create co-occurrence networks and practice visualizing co-occurrences.
Preparation for the upcoming short test 9 100 minutes
Create a co-occurrence network using a self-made corpus and interpret meaningful results from the extracted concepts, then submit a report.
Prepare for the next class by reading Chapter 8 of the textbook.
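One widely used measure of co-occurrence strength is pointwise mutual information (PMI). The sketch below is an assumed example, not necessarily one of the measures covered in the textbook: it treats each sentence as one co-occurrence window and uses made-up, pre-tokenized sentences.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(sentences):
    """Return {(word1, word2): PMI} for word pairs sharing at least one sentence.

    Probabilities are estimated as the share of sentences containing the item.
    """
    n = len(sentences)
    word_df = Counter()  # number of sentences containing each word
    pair_df = Counter()  # number of sentences containing both words
    for tokens in sentences:
        types = sorted(set(tokens))
        word_df.update(types)
        pair_df.update(combinations(types, 2))
    return {
        pair: math.log2(count * n / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }


# Made-up tokenized sentences.
sentences = [
    ["strong", "tea"],
    ["strong", "tea"],
    ["heavy", "rain"],
    ["heavy", "rain"],
    ["strong", "rain"],
]
pmi = cooccurrence_pmi(sentences)
for pair, score in sorted(pmi.items()):
    print(pair, round(score, 3))
```

Pairs with positive PMI co-occur more often than chance would predict; in a co-occurrence network, such scores can be used to decide which edges to draw.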
10. Chapter 8: "Observing Data Changes":

Students will use regression analysis to examine changes in language usage over time.
Preparation for the upcoming short test 10 100 minutes
Students are expected to submit the assignments given in class.
Prepare for the next class by reading Chapter 10 of the textbook.
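The regression analysis in this session can be sketched as a simple least-squares line fitted to a word's adjusted frequency over time. Simple linear regression is an assumption here (the textbook may use other models), and the yearly figures are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up frequencies per million words of one expression, by year.
years = [2000, 2005, 2010, 2015, 2020]
freqs = [120, 135, 150, 170, 175]

slope, intercept = fit_line(years, freqs)
print(round(slope, 2))
```

A positive slope suggests that use of the expression is increasing; with these figures the fitted line rises by about 2.9 occurrences per million words per year.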
11. Chapter 10: "Grouping Data":

Students will learn methods for correspondence analysis and topic modeling.
Preparation for the upcoming short test 11 100 minutes
Students are expected to submit the assignments given in class.
12. Building and Analyzing a Corpus for the Final Assignment (1):

Students will collect the language data necessary for the final assignment, build a corpus, and analyze it using the techniques learned throughout the course.
Collect language data using the methods learned in class, construct a corpus, and analyze it to write the final report. In doing so, students must use at least three of the analytical methods learned during the course. 300 minutes
300 minutes
13. Building and Analyzing a Corpus for the Final Assignment (2):

Students will collect the language data necessary for the final assignment, build a corpus, and analyze it using the techniques learned throughout the course.
Collect language data using the methods learned in class, construct a corpus, and analyze it to write the final report. In doing so, students must use at least three of the analytical methods learned during the course. 300 minutes
14. Presentation and review of the final project:

Each student will present the results of their analysis and discuss the methodology and results with their peers. The class will also discuss people's perceptions of language and society, as well as the social trends revealed by the analysis of language.
For the presentation: submit a final report on the theme presented in class, and be prepared to explain the contents of your analysis to your peers. 300 minutes
Total. - - 2300 minutes
Evaluation method and criteria
Grades are based on short tests (40%), assignments (30%), and the final project (academic report) (30%). An overall score of 60% or higher is required to pass.
Students reach the 60% level if they have a general understanding of the knowledge required for linguistic research and can analyze data on their own using the necessary methods, report and discuss the results, and identify future tasks.
Feedback on exams, assignments, etc.
Ways of feedback: Feedback in class
Textbooks and reference materials
1. Software: A laptop computer running Windows is required for the exercises in each class, since the software we will be using is not compatible with Macs, tablets, or smartphones.
2. Also, please prepare earphones for the video material.
3. Textbook: "An Introduction to Data Science for Linguistics" by Yuichiro Kobayashi, Asakura Shoten (2,700 yen).
4. Other materials: Papers, video materials, etc.
Prerequisites
Prepare a laptop with Windows and earphones. Please think about the linguistic phenomena you are interested in.
Office hours and How to contact professors for questions
  • The instructor is available to students before and after each session or via email to answer questions.
Regionally-oriented
Non-regionally-oriented course
Development of social and professional independence
  • Course that cultivates an ability for utilizing knowledge
  • Course that cultivates basic problem-solving skills
Active-learning course
About half of the classes are interactive
Course by professor with work experience
Work experience Work experience and relevance to the course content if applicable
N/A Not applicable
Education related SDGs: the Sustainable Development Goals
  • 4.QUALITY EDUCATION
Last modified : Tue Sep 17 18:24:53 JST 2024