Publications
See Google Scholar for more
# denotes equal contributions
* denotes corresponding authors
2026
- [RMAL] How do language models handle emotional content in video game localization? A computational linguistics approach. Xiaojing Zhao, Emmanuele Chersoni, Chu-Ren Huang, and 1 more author. Research Methods in Applied Linguistics, 2026.
This study employs emotion analysis, a natural language processing technique, to examine how language models handle emotional content compared to human translators in video game localization. The analysis is based on a corpus consisting of Chinese subtitles from Black Myth: Wukong, their official English translations, and translations generated by a language model. The findings reveal that, despite similarities between humans and the language model in their translation of emotions, differences exist. Human translators often neutralize emotions through context-dependent strategies, such as omission, addition, and substitution, to address cultural sensitivities and enhance player engagement. In contrast, the language model relies on direct translation to preserve diverse emotions, including negative ones. Such an approach may risk misalignment with the preferences of target audiences due to limited adaptation of tone and cultural nuances. In addition, occasional mistranslation and hallucination were also found. This study highlights the promise of integrating language models into localization workflows and demonstrates the potential of emotion analysis for assessing translation accuracy.
@article{zhao2026language, title = {How do language models handle emotional content in video game localization? A computational linguistics approach}, author = {Zhao, Xiaojing and Chersoni, Emmanuele and Huang, Chu-Ren and Xu, Han}, journal = {Research Methods in Applied Linguistics}, volume = {5}, number = {1}, pages = {100294}, year = {2026}, publisher = {Elsevier}, }
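A minimal sketch of the kind of emotion-analysis pipeline the abstract describes, assuming an off-the-shelf English emotion classifier from the Hugging Face hub and toy subtitle pairs; the paper's actual corpus, models, and annotation procedure are not reproduced here.

```python
# Sketch: compare the emotion label assigned to the official human translation
# of a subtitle line with the label assigned to an LLM translation of the same
# line. Model name and example sentences are illustrative assumptions only.
from transformers import pipeline

emotion_en = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # example classifier
)

pairs = [
    # (official human translation, LLM translation) of the same Chinese line
    ("You dare defy the heavens?", "How dare you defy the heavens, you wretch!"),
    ("Let us move on.", "Let's get going."),
]

for human_tt, llm_tt in pairs:
    human_label = emotion_en(human_tt)[0]["label"]
    llm_label = emotion_en(llm_tt)[0]["label"]
    agree = "same" if human_label == llm_label else "DIFFERENT"
    print(f"{agree:9s}  human={human_label:10s} llm={llm_label:10s}")
```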
2025
- [Perspectives] Translating vulgar language in video game localization: a case study of Black Myth: Wukong. Xiaojing Zhao, Emmanuele Chersoni, Chu-Ren Huang, and 1 more author. Perspectives, 2025.
This study examines the translation of vulgar language in the localization of the video game Black Myth: Wukong, using a corpus of original Chinese subtitles and their English translation. It investigates how the offensiveness of vulgar language varies after translation, the strategies employed, and their impact on the localization process. The findings reveal that the overall offensiveness of vulgar language in the source text was mitigated, suggesting an alignment with the domestication approach to address cultural sensitivities. Specifically, vulgar language related to religion, gender, and social morals tended to be mitigated through softening, omission, or implicitation. On the other hand, vulgar language that enhances character interactions was mostly preserved or intensified to emphasize the game’s themes, enrich character development, and drive the plot, thereby improving player immersion. These results highlight the delicate balance localization attempts to achieve between translation accuracy and cultural acceptability, contributing to the creation of a cross-cultural gaming experience.
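A minimal sketch of the mitigated/preserved/intensified comparison described above, assuming hand-assigned offensiveness ratings on aligned subtitle pairs; the ratings, scale, and example IDs are invented placeholders, not the paper's annotation scheme.

```python
# Sketch: bucket each aligned source/target pair by how the offensiveness of
# vulgar language changes after translation. The data below is invented.
from collections import Counter

aligned_pairs = [
    # source vs. target offensiveness on a toy 0-3 scale
    {"id": "wukong_0001", "src_off": 3, "tgt_off": 1},  # softened
    {"id": "wukong_0002", "src_off": 2, "tgt_off": 2},  # preserved
    {"id": "wukong_0003", "src_off": 1, "tgt_off": 2},  # intensified
    {"id": "wukong_0004", "src_off": 2, "tgt_off": 0},  # omitted / implicitated
]

def strategy(src: int, tgt: int) -> str:
    if tgt < src:
        return "mitigated"
    if tgt > src:
        return "intensified"
    return "preserved"

counts = Counter(strategy(p["src_off"], p["tgt_off"]) for p in aligned_pairs)
print(counts)  # e.g. Counter({'mitigated': 2, 'preserved': 1, 'intensified': 1})
```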
@article{zhao2025translating, title = {Translating vulgar language in video game localization: a case study of Black Myth: Wukong}, author = {Zhao, Xiaojing and Chersoni, Emmanuele and Huang, Chu-Ren and Xu, Han}, journal = {Perspectives}, pages = {1--20}, year = {2025}, publisher = {Taylor \& Francis}, }
- [LM4DH@RANLP] Can LLMs Help Sun Wukong in his Journey to the West? A Case Study of Language Models in Video Game Localization. Xiaojing Zhao, Han Xu, Huacheng Song, and 2 more authors. In Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities, Sep 2025.
Large language models (LLMs) have demonstrated increasing proficiency in general-purpose translation, yet their effectiveness in creative domains such as game localization remains underexplored. This study examines the role of LLMs in game localization in terms of both linguistic quality and sociocultural adequacy through a case study of the video game Black Myth: Wukong. Results indicate that LLMs demonstrate adequate competence in accuracy and fluency, achieving performance comparable to human translators. However, limitations remain in the literal translation of culture-specific terms and offensive language. Human oversight is required to ensure nuanced cultural authenticity and sensitivity. Insights from human evaluations also suggest that current automatic metrics and the Multidimensional Quality Metrics framework may be inadequate for evaluating creative translation. Finally, varying human preferences in localization make it ambiguous for LLMs to learn optimal translation strategies. The findings highlight the potential and shortcomings of LLMs as collaborative tools in game localization workflows. Data are available at https://github.com/zcocozz/wukong-localization.
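A minimal sketch of the automatic-metric side of such an evaluation, assuming sacrebleu is installed and using placeholder sentences; this is not the paper's evaluation harness, and the human/MQM judgments it discusses are not modeled here.

```python
# Sketch: score LLM translations against the official human translation with
# standard automatic MT metrics (corpus-level BLEU and chrF).
import sacrebleu

llm_outputs = [
    "How dare you defy the heavens?",
    "The old monkey bowed and said nothing.",
]
human_references = [
    "You dare defy the heavens?",
    "The old monkey bowed, saying nothing.",
]

bleu = sacrebleu.corpus_bleu(llm_outputs, [human_references])
chrf = sacrebleu.corpus_chrf(llm_outputs, [human_references])
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```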
@inproceedings{zhao-etal-2025-llms-help, title = {Can {LLM}s Help Sun Wukong in his Journey to the West? A Case Study of Language Models in Video Game Localization}, author = {Zhao, Xiaojing and Xu, Han and Song, Huacheng and Chersoni, Emmanuele and Huang, Chu-Ren}, booktitle = {Proceedings of the First Workshop on Natural Language Processing and Language Models for Digital Humanities}, month = sep, year = {2025}, address = {Varna, Bulgaria}, publisher = {INCOMA Ltd., Shoumen, Bulgaria}, pages = {164--173}, }
- [ACL] Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention. Zhaoxin Feng, Jianfei Ma, Emmanuele Chersoni, and 2 more authors. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025.
Autoregressive Large Language Models (LLMs) demonstrate exceptional performance in language understanding and generation. However, their application in text embedding tasks has been relatively slow, along with the analysis of their semantic representation in probing tasks, due to the constraints of the unidirectional attention mechanism. This paper aims to explore whether such constraints can be overcome by enabling bidirectional attention in LLMs. We tested different variants of the Llama architecture through additional training steps, progressively enabling bidirectional attention and unsupervised/supervised contrastive learning. Our results show that bidirectional attention improves the LLMs’ ability to represent subsequent context but weakens their utilization of preceding context, while contrastive learning training can help to maintain both abilities.
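A conceptual sketch of the core change the paper studies: in self-attention, the causal mask is what restricts each token to preceding context, and dropping it yields bidirectional attention. The toy tensors below are illustrative only and unrelated to the Llama variants and contrastive-learning setups trained in the paper.

```python
# Sketch: the same single-head self-attention computed with and without a
# causal mask. Only the final position sees identical context in both cases.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d_model = 5, 8
x = torch.randn(1, seq_len, d_model)  # toy "hidden states": (batch, seq, dim)
q, k, v = x, x, x                     # no learned projections in this toy example

causal_out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
bidir_out = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# The causal mask itself: row i may attend to columns <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
print(causal_mask.int())
print("outputs differ at non-final positions:",
      not torch.allclose(causal_out, bidir_out))
```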
@inproceedings{feng-etal-2025-learning, title = {Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention}, author = {Feng, Zhaoxin and Ma, Jianfei and Chersoni, Emmanuele and Zhao, Xiaojing and Bao, Xiaoyi}, editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher}, booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = jul, year = {2025}, address = {Vienna, Austria}, publisher = {Association for Computational Linguistics}, pages = {23226--23245}, isbn = {979-8-89176-251-0}, }
- [Preprint] Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures. Tyler A. Chang, Catherine Arnett, Abdelrahman Eldesokey, and 335 more authors. arXiv preprint arXiv:2510.24081, 2025.
To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.
@article{chang2025global, title = {Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures}, author = {Chang, Tyler A. and Arnett, Catherine and Eldesokey, Abdelrahman and Sadallah, Abdelrahman and Kashar, Abeer and Daud, Abolade and Olanihun, Abosede Grace and Mohammed, Adamu Labaran and Praise, Adeyemi and Sharma, Adhikarinayum Meerajita and Gupta, Aditi and Iyigun, Afitab and Simplício, Afonso and Essouaied, Ahmed and Chorana, Aicha and Eppa, Akhil and Oladipo, Akintunde and Ramesh, Akshay and Dorkin, Aleksei and Kondoro, Alfred Malengo and Aji, Alham Fikri and Çetintaş, Ali Eren and Hanbury, Allan and Dembele, Alou and Niksarli, Alp and Arroyo, Álvaro and Bajand, Amin and Khanna, Amol and Chkhaidze, Ana and Condez, Ana and Mkhonto, Andiswa and Hoblitzell, Andrew and Tran, Andrew and Poulis, Angelos and Majumder, Anirban and Vacalopoulou, Anna and Wong, Annette Kuuipolani Kanahele and Simonsen, Annika and Kovalev, Anton and S, Ashvanth. and Lana, Ayodeji Joseph and Kinay, Barkin and Alhafni, Bashar and Busole, Benedict Cibalinda and Ghanem, Bernard and Nathani, Bharti and Đurić, Biljana Stojanovska and Agbonile, Bola and Bergsson, Bragi and Fischer, Bruce Torres and Tutar, Burak and Çınar, Burcu Alakuş and Kane, Cade J. Kanoniakapueo and Udomcharoenchaikit, Can and Arnett, Catherine and Helwe, Chadi and Nerella, Chaithra Reddy and Liu, Chen Cecilia and Nwokolo, Chiamaka Glory and España-Bonet, Cristina and Amol, Cynthia and Lee, DaeYeop and Arad, Dana and Dzenhaliou, Daniil and Pugacheva, Daria and Choi, Dasol and Abolade, Daud and Liu, David and Semedo, David and Popoola, Deborah and Mataciunas, Deividas and Nyaboke, Delphine and Kumar, Dhyuthy Krishna and Glória-Silva, Diogo and Tavares, Diogo and Goyal, Divyanshu and Lee, DongGeon and Anajemba, Ebele Nwamaka and Grace, Egonu Ngozi and Mickel, Elena and Tutubalina, Elena and Herranen, Elias and Anand, Emile and Habumuremyi, Emmanuel and Ajiboye, Emuobonuvie Maria and Yulianrifat, Eryawan Presma and Adenuga, Esther and Rudnicka, Ewa and Itiola, Faith Olabisi and Butt, Faran Taimoor and Thekkekara, Fathima and Haouari, Fatima and Tjiaranata, Filbert Aurelian and Laakom, Firas and Grasso, Francesca and Orabona, Francesco and Periti, Francesco and Solomon, Gbenga Kayode and Ngo, Gia Nghia and Udhehdhe-oze, Gloria and Martins, Gonçalo and Challagolla, Gopi Naga Sai Ram and Son, Guijin and Abdykadyrova, Gulnaz and Einarsson, Hafsteinn and Hu, Hai and Saffari, Hamidreza and Zaidi, Hamza and Zhang, Haopeng and Shairah, Harethah Abu and Vuong, Harry and Kuulmets, Hele-Andra and Bouamor, Houda and Yu, Hwanjo and Debess, Iben Nyholm and Deveci, İbrahim Ethem and Hanif, Ikhlasul Akmal and Cho, Ikhyun and Calvo, Inês and Vieira, Inês and Manzi, Isaac and Daud, Ismail and Itzhak, Itay and Iuliia and Alekseenko and Belashkin, Ivan and Spada, Ivan and Zhelyazkov, Ivan and Brinton, Jacob and Isbarov, Jafar and Čibej, Jaka and Čuhel, Jan and Kocoń, Jan and Krito, Jauza Akbar and Purbey, Jebish and Mickel, Jennifer and Za, Jennifer and Kunz, Jenny and Jeong, Jihae and Dávalos, Jimena Tena and Lee, Jinu and Magalhães, João and Yi, John and Kim, Jongin and Chataignon, Joseph and Imperial, Joseph Marvin and Thevakumar, Jubeerathan and Land, Judith and Jiang, Junchen and Kim, Jungwhan and Sirts, Kairit and R, Kamesh and V, Kamesh and Tshinu, Kanda Patrick and Kukk, Kätriin and Ponkshe, Kaustubh and Huseynova, Kavsar and He, Ke and Buchanan, Kelly and Sarveswaran, Kengatharaiyer and Zaman, Kerem and Mrini, Khalil and Kyars, 
Kian and Kruusmaa, Krister and Chouhan, Kusum and Krishnakumar, Lainitha and Sánchez, Laura Castro and Moscoso, Laura Porrino and Choshen, Leshem and Sencan, Levent and Øvrelid, Lilja and Alazraki, Lisa and Ehimen-Ugbede, Lovina and Thevakumar, Luheerathan and Thavarasa, Luxshan and Malik, Mahnoor and Keita, Mamadou K. and Jangid, Mansi and Santis, Marco De and García, Marcos and Suppa, Marek and D'Ciofalo, Mariam and Ojastu, Marii and Sikander, Maryam and Narayan, Mausami and Skandalis, Maximos and Mehak, Mehak and Bozkurt, Mehmet İlteriş and Workie, Melaku Bayu and Velayuthan, Menan and Leventhal, Michael and Marcińczuk, Michał and Potočnjak, Mirna and Shafiei, Mohammadamin and Sharma, Mridul and Indoria, Mrityunjaya and Habibi, Muhammad Ravi Shulthan and Kolić, Murat and Galant, Nada and Permpredanun, Naphat and Maugin, Narada and Corrêa, Nicholas Kluge and Ljubešić, Nikola and Thomas, Nirmal and de Silva, Nisansa and Joshi, Nisheeth and Ponkshe, Nitish and Habash, Nizar and Udeze, Nneoma C. and Thomas, Noel and Ligeti-Nagy, Noémi and Coulibaly, Nouhoum and Faustin, Nsengiyumva and Buliaminu, Odunayo Kareemat and Ogundepo, Odunayo and Fejiro, Oghojafor Godswill and Funmilola, Ogundipe Blessing and God'spraise, Okechukwu and Samuel, Olanrewaju and Oluwaseun, Olaoye Deborah and Akindejoye, Olasoji and Popova, Olga and Snissarenko, Olga and Chiemezie, Onyinye Anulika and Kinay, Orkun and Tursun, Osman and Moses, Owoeye Tobiloba and Joshua, Oyelade Oluwafemi and Fiyinfoluwa, Oyesanmi and Gamallo, Pablo and Fernández, Pablo Rodríguez and Arora, Palak and Valente, Pedro and Rupnik, Peter and Ekiugbo, Philip Oghenesuowho and Sahoo, Pramit and Prokopidis, Prokopis and Niau-Puhipau, Pua and Yahya, Quadri and Mignone, Rachele and Singhal, Raghav and Kadiyala, Ram Mohan Rao and Merx, Raphael and Afolayan, Rapheal and Rajalakshmi, Ratnavel and Ghosh, Rishav and Oji, Romina and Solis, Ron Kekeha and Guerra, Rui and Zawar, Rushikesh and Bashir, Sa'ad Nasir and Alzaabi, Saeed and Sandeep, Sahil and Batchu, Sai Pavan and Kantareddy, SaiSandeep and Pranida, Salsabila Zahirah and Buchanan, Sam and Rutunda, Samuel and Land, Sander and Sulollari, Sarah and Ali, Sardar and Sapkota, Saroj and Tautvaisas, Saulius and Sen, Sayambhu and Banerjee, Sayantani and Diarra, Sebastien and M, SenthilNathan. and Lee, Sewoong and Shah, Shaan and Venkitachalam, Shankar and Djurabaeva, Sharifa and Ibejih, Sharon and Dutta, Shivanya Shomir and Gupta, Siddhant and Suárez, Silvia Paniagua and Ahmadi, Sina and Sukumar, Sivasuthan and Song, Siyuan and A., Snegha and Sofianopoulos, Sokratis and Simon, Sona Elza and Benčina, Sonja and Gvasalia, Sophie and More, Sphurti Kirit and Dragazis, Spyros and Kaufhold, Stephan P. and S, Suba. 
and AlRashed, Sultan and Ranathunga, Surangika and Someya, Taiga and Pungeršek, Taja Kuzman and Haklay, Tal and Jibril, Tasi'u and Aoyama, Tatsuya and Abashidze, Tea and Cruz, Terenz Jomar Dela and Blevins, Terra and Nikas, Themistoklis and Idoko, Theresa Dora and Do, Thu Mai and Chubakov, Tilek and Gargiani, Tommaso and Rathore, Uma and Johannesen, Uni and Ugwu, Uwuma Doris and Putra, Vallerie Alexandra and Kumar, Vanya Bannihatti and Jeyarajalingam, Varsha and Arzt, Varvara and Nedumpozhimana, Vasudevan and Ondrejova, Viktoria and Horbik, Viktoryia and Kummitha, Vishnu Vardhan Reddy and Dinić, Vuk and Sewunetie, Walelign Tewabe and Wu, Winston and Zhao, Xiaojing and Diarra, Yacouba and Nikankin, Yaniv and Mathur, Yash and Chen, Yixi and Li, Yiyuan and Xavier, Yolanda and Belinkov, Yonatan and Abayomi, Yusuf Ismail and Alyafeai, Zaid and Shan, Zhengyang and Tam, Zhi Rui and Tang, Zilu and Nadova, Zuzana and Abbasi, Baber and Biderman, Stella and Stap, David and Ataman, Duygu and Schmidt, Fabian and Gonen, Hila and Wang, Jiayi and Adelani., David Ifeoluwa}, journal = {arXiv preprint arXiv:2510.24081}, year = {2025}, }
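A minimal sketch of how two-choice commonsense benchmarks of this kind are commonly scored: the model is given each candidate completion and the one with the higher log-likelihood is taken as its answer. The model name (gpt2 as a placeholder) and the example item are assumptions; Global PIQA's own prompts and evaluation harness are not reproduced here.

```python
# Sketch: pick the more plausible of two completions by comparing total
# sequence log-likelihood under a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM can be scored the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sequence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # loss is the mean negative log-likelihood per predicted token
    return -out.loss.item() * (ids.shape[1] - 1)

prompt = "To keep dumplings from sticking to the steamer, "
options = ["line it with cabbage leaves.", "line it with aluminum nails."]
scores = [sequence_logprob(prompt + o) for o in options]
print("model picks:", options[scores.index(max(scores))])
```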
2023
- [Preprint] ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models. Yikang Liu, Ziyin Zhang, Wanyang Zhang, and 5 more authors. arXiv preprint arXiv:2304.07666, Jul 2023.
AI generated content (AIGC) presents a considerable challenge to educators around the world. Instructors need to be able to detect such text generated by large language models, either with the naked eye or with the help of tools. There is also a growing need to understand the lexical, syntactic and stylistic features of AIGC. To address these challenges in English language teaching, we first present ArguGPT, a balanced corpus of 4,038 argumentative essays generated by 7 GPT models in response to essay prompts from three sources: (1) in-class or homework exercises, (2) TOEFL and (3) GRE writing tasks. Machine-generated texts are paired with a roughly equal number of human-written essays at three score levels, matched in essay prompts. We then hire English instructors to distinguish machine essays from human ones. Results show that when first exposed to machine-generated essays, the instructors achieve an accuracy of only 61% in detecting them, but the number rises to 67% after one round of minimal self-training. Next, we perform linguistic analyses of these essays, which show that machines produce sentences with more complex syntactic structures, while human essays tend to be lexically more complex. Finally, we test existing AIGC detectors and build our own detectors using SVMs and RoBERTa. Results suggest that a RoBERTa model fine-tuned on the training set of ArguGPT achieves above 90% accuracy in both essay- and sentence-level classification. To the best of our knowledge, this is the first comprehensive analysis of argumentative essays produced by generative large language models. Machine-authored essays in ArguGPT and our models will be made publicly available at https://github.com/huhailinguist/ArguGPT.
@article{liu2023argugpt, title = {ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models}, author = {Liu, Yikang and Zhang, Ziyin and Zhang, Wanyang and Yue, Shisen and Zhao, Xiaojing and Cheng, Xinyuan and Zhang, Yiwen and Hu, Hai}, journal = {arXiv preprint arXiv:2304.07666}, year = {2023}, }
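A minimal sketch of the SVM-style detector mentioned in the abstract, assuming scikit-learn and a handful of invented stand-in sentences rather than the ArguGPT corpus; the resulting "predictions" are meaningless and only show the shape of the pipeline.

```python
# Sketch: a TF-IDF representation fed to a linear SVM that separates
# machine-generated from human-written essays. The training texts are toy
# placeholders, not ArguGPT data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "In conclusion, it is evident that technology has transformed education.",
    "Furthermore, the aforementioned factors contribute significantly to this trend.",
    "Honestly, I think my teacher was wrong about that essay, but whatever.",
    "We argued about it for an hour and never really settled anything.",
]
labels = ["machine", "machine", "human", "human"]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
detector.fit(texts, labels)

print(detector.predict(
    ["Moreover, it can be concluded that the benefits outweigh the drawbacks."]
))
```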