Schedule: 2009-09-11 (12:45 - 13:30)
Parallel Session 5 (Room A-31)
Title: How should we build a word list for teaching Academic Japanese? - A straightforward approach
Authors: Yoshiko Muraki, Kaori Miyatake, Kohji Shibano
Abstract: This study aims to reconsider the procedure for building a word list for language teaching. Various word lists have been made for vocabulary teaching. Leech, Rayson, and Wilson (2001) compiled word frequency lists based on the British National Corpus (BNC). Coxhead (2000) developed a word list for English for academic purposes (EAP). Coxhead's word list is derived from written academic texts (3.5 million words).
Oshio et al. (2008) reported on the process of making a new word list for the Japanese Language Proficiency Test (JLPT), which is to be revised from 2010. The new word list (Yamauchi 2008) is based on Japanese dictionaries and spoken language data.
The problem here is what kind of corpus data is most appropriate for building a word list for teaching Academic Japanese. The word lists of Leech, Rayson, and Wilson (2001) and Oshio et al. (2008) are intended to reflect language in daily life, whereas Coxhead (2000) is limited to academic vocabulary. However, learners of Japanese who want to enter higher education must learn both daily-use and academic language.
Thus we have developed a corpus of texts used in Japanese high school education. It comprises 160 textbooks and contains about 12 million words (Yu 2008). The Japanese textbook corpus reflects native Japanese students' vocabulary. The textbook corpus was processed with the Japanese POS tagger ChaSen (Matsumoto 2003). The largest Japanese corpus is the Google Japanese N-gram (Google 2007), comprising 255 billion words. The Google Japanese 1-gram data comprises 2,565,428 words. Together, the textbook corpus and the Google Japanese N-gram reflect both academic and daily use of the Japanese language. We have tabulated a word list by subject based on the Japanese textbook corpus. Words are selected by a weighted sum of three values: the number of subjects in which a word appears, its textbook frequency, and its Google frequency. To validate the resulting word list, we test it with Japanese university students for their passive and active vocabulary. We also compare the word list against existing word lists, including the JLPT word lists.
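The weighted-sum selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or any scaling, so everything specific here is an assumption: the weight values, the division of each value by its maximum so the three terms are comparable, and the names WordStats and score_words are hypothetical and not the authors' actual procedure.

```python
from dataclasses import dataclass


@dataclass
class WordStats:
    word: str
    n_subjects: int     # number of school subjects whose textbooks contain the word
    textbook_freq: int  # occurrences in the textbook corpus
    google_freq: int    # occurrences in the Google 1-gram data


def score_words(stats, w_subjects=0.4, w_textbook=0.3, w_google=0.3):
    """Rank words by a weighted sum of three max-normalized values.

    The weights and the max-normalization are illustrative assumptions;
    the abstract only says that a weighted sum is used.
    """
    if not stats:
        return []
    # Guard against division by zero when every count in a column is 0.
    max_subj = max(s.n_subjects for s in stats) or 1
    max_text = max(s.textbook_freq for s in stats) or 1
    max_goog = max(s.google_freq for s in stats) or 1
    scored = [
        (
            s.word,
            w_subjects * s.n_subjects / max_subj
            + w_textbook * s.textbook_freq / max_text
            + w_google * s.google_freq / max_goog,
        )
        for s in stats
    ]
    # Highest score first: these are the strongest candidates for the list.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up counts for placeholder words, for illustration only.
    sample = [
        WordStats("word_a", 10, 100, 1000),
        WordStats("word_b", 5, 50, 500),
        WordStats("word_c", 1, 100, 10),
    ]
    for word, score in score_words(sample):
        print(word, round(score, 3))
```

In such a scheme, a larger weight on the number of subjects would favor words that cut across the curriculum, while a larger weight on the Google frequency would favor words common in everyday web text.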

Leech, G., Rayson, P., and Wilson, A. (2001). Word frequencies in spoken and written English. London: Longman. [The accompanying Web site is at www.comp.lancs.ac.uk/ucrel/bncfreq/flists.html (14 February 2009)]
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213-238.
Oshio, Kazumi, et al. (2008). Toward a Vocabulary list for the new Japanese language proficiency test. The Japan Foundation Japanese-Language Education Bulletin, 3, 71-86.
Yamauchi, Hiroyuki (Ed.). (2008). A Tentative Plan for Japanese Standard –Vocabulary. Tokyo: Hituzi shobo.
Matsumoto, Yuji (2003). Japanese Morphological Analysis System ChaSen. (Available from http://chasen-legacy.sourceforge.jp/).
Google (2007). Web Japanese N-gram ver.1. (Available from http://www.gsk.or.jp/catalog.html).
Keywords: vocabulary teaching, word list, Academic Japanese, textbook corpus, Google Japanese N-gram
Main topic: Corpora
Biodata: Yoshiko Muraki, graduate student at Tokyo University of Foreign Studies. Kaori Miyatake, graduate student at Tokyo University of Foreign Studies. Kohji Shibano, Professor at the Research Institute for Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies.
Type of presentation: Paper presentation
Paper category: Research
Target educational sector: Higher education
Language of delivery: English
EU-funded project: No