Audio versus Multimodal Input: A Case Study of Speech Perception among Learners in English as a Foreign Language (EFL) Context

Authors

  • Sartika Putri Sailuddin, Universitas Negeri Jakarta
  • Zainal Rafli, Universitas Negeri Jakarta
  • Muhammad Kamal bin Abdul Hakim, Universitas Negeri Jakarta

DOI:

https://doi.org/10.34050/els-jish.v8i4.49140

Keywords:

speech perception, multimodal input, EFL

Abstract

This study investigates how audio-only and multimodal (audio plus visual) techniques affect speech perception among Indonesian EFL learners. Using a quasi-experimental design, 60 third-semester English literature students were divided into two groups that received either an audio recording or an audio-visual video of the same narrative, “The Little Red Hen,” followed by a 20-item speech perception test and questionnaires on emotional engagement and learning satisfaction. Results show that the multimodal group achieved significantly higher comprehension scores than the audio-only group, with a large effect size indicating a substantial advantage of visual cues such as facial expressions and gestures in supporting listening. Correlation analyses also revealed significant positive relationships between emotional engagement, learning satisfaction, and speech perception in both conditions, with stronger coefficients for the multimodal group. These findings suggest that multimodal input not only improves comprehension by reducing cognitive load and enriching contextual information but also enhances affective factors that are crucial for successful language learning. The study recommends that EFL educators incorporate multimodal materials to optimize listening instruction and calls for further research on the long-term impact of different visual cue types in varied learning contexts.
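The abstract reports a large effect size for the group difference and positive correlations between affective measures and speech perception, but it does not state which statistics were used. As a minimal sketch, assuming Cohen's d (pooled standard deviation) for the group comparison and Pearson's r for the correlations, the two quantities could be computed as follows. All scores below are hypothetical illustration values, not data from the study, and the function and variable names are likewise assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = mean(x), mean(y)
    covariance_sum = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance_sum / (spread_x * spread_y)


# Hypothetical scores on a 20-item test (NOT the study's data).
multimodal_scores = [16, 17, 15, 18, 14, 17]
audio_scores = [12, 13, 11, 14, 10, 13]
# Hypothetical engagement ratings for the multimodal group (NOT the study's data).
engagement = [3.8, 4.2, 3.5, 4.6, 3.1, 4.0]

d = cohens_d(multimodal_scores, audio_scores)
r = pearson_r(engagement, multimodal_scores)
print(f"Cohen's d = {d:.2f}, Pearson r = {r:.2f}")
```

By Cohen's conventional benchmarks, a d of about 0.8 or higher is usually described as large, which is the sense in which the abstract's "large effect size" is most commonly read.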

Published

2025-12-26

How to Cite

Putri Sailuddin, S., Rafli, Z., & Kamal bin Abdul Hakim, M. (2025). Audio versus Multimodal Input: A Case Study of Speech Perception among Learners in English as a Foreign Language (EFL) Context. ELS Journal on Interdisciplinary Studies in Humanities, 8(4), 1291–1301. https://doi.org/10.34050/els-jish.v8i4.49140

Section

Articles
