A Theoretical Review of Multimodal Teaching Methods for Enhancing English Oral Communication Competence
DOI:
https://doi.org/10.14456/au-ejir.2026.9
Keywords:
Multimodal Teaching, English Oral Communication, Systematic Theoretical Review, Input-Interaction-Output Pedagogy, Multimodal Communicative Competence Model (MCCM)
Abstract
Guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, this systematic theoretical review synthesizes studies of multimodal teaching interventions published between 2019 and 2024 to evaluate how visual, auditory, gestural, and interactive modalities influence English language learners’ oral communication competence. A search of five databases (Web of Science, Scopus, ERIC, PubMed, and Google Scholar) identified 73 records; 51 underwent full-text screening, and 41 studies met the inclusion criteria for final analysis. Findings indicate that multimodal role-play and task-based activities consistently enhance learners’ willingness to communicate and sustain their engagement. Computer-assisted pronunciation training (CAPT) and automated speech recognition (ASR) tools produce moderate improvements in intelligibility when incorporated into scaffolded instructional cycles. Conversational AI chatbots provide additional speaking opportunities but yield meaningful gains only under structured pedagogical guidance. Immersive virtual and augmented reality environments reliably lower speaking anxiety and increase motivation, though their effects on fluency and accuracy depend on task complexity and cognitive load management. The review’s distinctive contribution is the Multimodal Communicative Competence Model (MCCM) and a three-stage Input-Interaction-Output pedagogy, which together offer a coherent framework for aligning multimodal affordances with oral competence development. Practical implications include improved instructional design, multimodal assessment practices, and targeted teacher training to support effective integration.
License
Copyright (c) 2026 Jian Zu, Marilyn Fernandez Deocampo

This work is licensed under a Creative Commons Attribution 4.0 International License.

