A Reassessment of Chomsky’s View on the Use of Corpus Databases in Linguistic Research: Between Theoretical Challenges and Empirical Opportunities
DOI:
https://doi.org/10.1234/ic.v1i1.62245
Keywords:
corpus database, linguistics, artificial intelligence, syntax, digital data
Abstract
In the generative linguistics tradition, Noam Chomsky has consistently rejected the use of empirical corpus data to study language structure, especially in syntax. He holds that native-speaker intuition is more important in language study and argues that corpus data is unreliable because it is affected by variation and does not reveal true linguistic competence. However, with the rapid growth of artificial intelligence and language technologies, the availability of large corpus databases, and the increasing need for broader empirical analysis, this view has again become a matter of debate in contemporary linguistic research. This paper re-examines Chomsky's arguments against corpus use by applying a corpus-based method to syntax, an approach that can help clarify universal syntactic structures. The challenges of using corpora include their limited ability to show native-speaker competence, the lack of negative data, their inability to reflect how the mind works, and the possibility of biased or limited data. At the same time, corpus-based research opens new opportunities: access to billions of words from many sources and text types, advanced technology for identifying morphosyntactic patterns, and big data for testing hypotheses and theories. In conclusion, integrating corpus-based research with theory is essential today. Corpus data is not an enemy of theory; it is a valuable tool that supports and strengthens modern linguistic analysis.
Published
2025-12-07
