Hannah Kirk
2024

- [j1] Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale: The benefits, risks and bounds of personalizing the alignment of large language models to individuals. Nat. Mac. Intell. 6(4): 383-392 (2024)
- [c12] Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Kirk, Hinrich Schütze, Dirk Hovy: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models. ACL (1) 2024: 15295-15311
- [c11] Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin Van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo: Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation. FAccT 2024: 388-406
- [c10] Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale: Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models. GoodIT 2024: 231-239
- [c9] Paul Röttger, Hannah Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy: XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models. NAACL-HLT 2024: 5377-5400
- [i28] Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models. CoRR abs/2402.16786 (2024)
- [i27] Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin Van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo: Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation. CoRR abs/2403.12075 (2024)
- [i26] Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt D. Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Subhra S. Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren: Introducing v0.5 of the AI Safety Benchmark from MLCommons. CoRR abs/2404.12241 (2024)
- [i25] Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew M. Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale: The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models. CoRR abs/2404.16019 (2024)
- [i24] Cailean Osborne, Jennifer Ding, Hannah Rose Kirk: The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub. CoRR abs/2405.13058 (2024)
- [i23] Andrew M. Bean, Simi Hellsten, Harry Mayne, Jabez Magomere, Ethan A. Chi, Ryan Chi, Scott A. Hale, Hannah Rose Kirk: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages. CoRR abs/2406.06196 (2024)
- [i22] Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt: Modulating Language Model Experiences through Frictions. CoRR abs/2407.12804 (2024)
- [i21] Shachar Don-Yehiya, Ben Burtenshaw, Ramón Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, Tzu-Sheng Kuo, Wenting Zhao, Idan Shenfeld, Andi Peng, Mikhail Yurochkin, Atoosa Kasirzadeh, Yangsibo Huang, Tatsunori Hashimoto, Yacine Jernite, Daniel Vila-Suero, Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen: The Future of Open Human Feedback. CoRR abs/2408.16961 (2024)

2023

- [c8] Hannah Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale: The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. EMNLP 2023: 2409-2430
- [c7] Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk: VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution. NeurIPS 2023
- [c6] Mark Mazumder, Colby R. Banbury, Xiaozhe Yao, Bojan Karlas, William Gaviria Rojas, Sudnya Frederick Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Will Cukierski, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Raje, Max Bartolo, Evan Sabri Eyuboglu, Amirata Ghorbani, Emmett D. Goodman, Addison Howard, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, D. Sculley, Tzu-Sheng Kuo, Jonas W. Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen K. Paritosh, Ce Zhang, James Y. Zou, Carole-Jean Wu, Cody Coleman, Andrew Y. Ng, Peter Mattson, Vijay Janapa Reddi: DataPerf: Benchmarks for Data-Centric AI Development. NeurIPS 2023
- [c5] Hannah Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger: SemEval-2023 Task 10: Explainable Detection of Online Sexism. SemEval@ACL 2023: 2193-2210
- [i20] Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi: Auditing large language models: a three-layered approach. CoRR abs/2302.08500 (2023)
- [i19] Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger: SemEval-2023 Task 10: Explainable Detection of Online Sexism. CoRR abs/2303.04222 (2023)
- [i18] Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale: Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback. CoRR abs/2303.05453 (2023)
- [i17] Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, Mark R. Leiser, Saif Mohammad: Assessing Language Model Deployment with Risk Cards. CoRR abs/2303.18190 (2023)
- [i16] Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, Rafael Mosquera, Addison Howard, Will Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo: Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models. CoRR abs/2305.14384 (2023)
- [i15] Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain: Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets. CoRR abs/2305.15407 (2023)
- [i14] Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk: VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution. CoRR abs/2306.12424 (2023)
- [i13] Hannah Rose Kirk, Angus R. Williams, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale: DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures. CoRR abs/2307.16811 (2023)
- [i12] Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy: XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models. CoRR abs/2308.01263 (2023)
- [i11] Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale: Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West. CoRR abs/2309.08573 (2023)
- [i10] Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale: The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models. CoRR abs/2310.02457 (2023)
- [i9] Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale: The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values. CoRR abs/2310.07629 (2023)
- [i8] Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger: SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models. CoRR abs/2311.08370 (2023)

2022

- [c4] Hannah Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski: Handling and Presenting Harmful Text in NLP Research. EMNLP (Findings) 2022: 497-510
- [c3] Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Hannah Kirk, Aleksandar Shtedritski, Max Bain: A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning. AACL/IJCNLP (1) 2022: 806-822
- [c2] Hannah Kirk, Bertie Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale: Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate. NAACL-HLT 2022: 1352-1368
- [i7] Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain: A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning. CoRR abs/2203.11933 (2022)
- [i6] Leon Derczynski, Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen: Handling and Presenting Harmful Text. CoRR abs/2204.14256 (2022)
- [i5] Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk: Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements. CoRR abs/2205.11374 (2022)
- [i4] Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale: Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning. CoRR abs/2209.10193 (2022)

2021

- [c1] Hannah Rose Kirk, Yennie Jun, Filippo Volpin, Haider Iqbal, Elias Benussi, Frédéric A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano: Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models. NeurIPS 2021: 2611-2624
- [i3] Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frédéric A. Dreyer, Aleksandar Shtedritski, Yuki Markus Asano: How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases. CoRR abs/2102.04130 (2021)
- [i2] Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki Markus Asano: Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset. CoRR abs/2107.04313 (2021)
- [i1] Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale: Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate. CoRR abs/2108.05921 (2021)