
Limit your CO2 impact
Mitigation strategies
We have collected suggestions for concrete strategies that the individual researcher can take to reduce their carbon footprint when using generative AI in research. This list will be expanded continuously as we learn about new techniques. We created this list for Human-Computer Interaction researchers, but we believe it will be relevant to other human-focused researchers outside Machine Learning-heavy disciplines. Please contact us if you have suggestions, comments, or critiques.
Efficiency is not enough
Increasing efficiency by reducing the amount of compute does not necessarily translate into energy savings, and reducing energy use does not necessarily reduce carbon emissions. Operational carbon emissions are a function of both energy and carbon intensity, which depends on time and location, and energy itself is a complex function of several factors that metrics of compute (e.g., FLOPS, number of parameters, and runtime) do not fully capture [Wright et al. 2023]. Increasing efficiency can also have unintended effects, such as a rebound effect in which a more efficient system leads to greater overall use of that system over time. Finally, the hardware platforms on which ML compute runs come with their own sustainability challenges, which efficiency gains in ML models alone cannot mitigate.
Reporting your carbon footprint
Our main call to action is to encourage the HCI community to report their carbon footprint when doing research with GenAI, or at least to report the indicators used to calculate it (i.e., model, usage, and size of input/output files, as well as the number of test runs and the number of users/interactions for GenAI research involving prototypes or user tests). Furthermore, researchers could estimate potential carbon footprints in funding applications for GenAI research.
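As a concrete starting point, emission-tracking libraries can log the energy use and estimated CO2e of compute jobs so the numbers are available at reporting time. The sketch below uses the open-source codecarbon package (one option among several; Carbontracker, cited below, is another); the experiment function is a hypothetical placeholder for your own pipeline.

```python
# Minimal sketch: wrap a GenAI experiment in an emissions tracker so energy
# use and estimated CO2e can be reported alongside the results.
# Assumes the open-source `codecarbon` package (pip install codecarbon);
# run_experiment() is a hypothetical placeholder for your own pipeline.
from codecarbon import EmissionsTracker

def run_experiment():
    # ... prompt a model, generate images, run a user-test backend, etc.
    pass

tracker = EmissionsTracker(project_name="genai-hci-study", output_dir="emissions_logs")
tracker.start()
try:
    run_experiment()
finally:
    emissions_kg = tracker.stop()  # returns estimated kg CO2e
    print(f"Estimated footprint: {emissions_kg:.4f} kg CO2e")
```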
Time and place
When planning research experiments with GenAI, there is a range of trade-offs that can be made to reduce the carbon footprint. When performing computationally intensive tasks (such as developing or using GenAI models), scheduling the computations in low-carbon-intensity locations and at low-carbon-intensity times can result in substantial reductions in the carbon footprint [Anthony et al. 2020, Henderson et al. 2020]. For instance, the same compute job performed in Iceland can have about a 15x smaller carbon footprint than if it were run in the US. Similarly, scheduling jobs during low-carbon-intensity times can result in up to a 7x reduction in overall carbon emissions [Zhao et al. 2024].
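To make the effect of location and time concrete, operational emissions can be approximated as energy multiplied by the grid's carbon intensity at the time and place of execution. The sketch below compares the same job under a few illustrative intensity values; the numbers are placeholders, and real values should come from a grid-data source such as Electricity Maps or your cloud provider's region reports.

```python
# Minimal sketch: estimate operational emissions as energy x carbon intensity.
# The intensity values below are illustrative placeholders, not measurements;
# look up current values for your region and time from a grid-data provider.
JOB_ENERGY_KWH = 12.0  # hypothetical energy use of one compute job

carbon_intensity_g_per_kwh = {
    "low-carbon grid (e.g. hydro/geothermal)": 30,
    "average grid": 400,
    "high-carbon grid / peak hours": 700,
}

for grid, intensity in carbon_intensity_g_per_kwh.items():
    emissions_kg = JOB_ENERGY_KWH * intensity / 1000  # gCO2e/kWh -> kg CO2e
    print(f"{grid}: {emissions_kg:.1f} kg CO2e")
```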
How much data is enough?
Energy consumption grows almost linearly with the task load, i.e., longer prompts, more images, or images of higher resolution cost more energy. Experiment designers should consider whether increasing the task load to high ranges is always necessary. Power analysis can help determine the number of participants needed to detect an effect, as sketched below. When using models for user tests, experiment designers can also consider whether there should be a limit on prompting, or display a visible counter showing users how many times they have prompted, to raise their awareness of their own usage. Finally, researchers can consider whether it is worth prototyping a whole new generative AI system, or whether they could arrive at comparable results using existing platforms and a Wizard-of-Oz setup.
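A minimal power-analysis sketch, assuming the statsmodels package and an expected medium effect size (Cohen's d = 0.5, a placeholder assumption): it estimates how many participants per group a two-sample comparison would need, so that no more GenAI interactions are generated than the study actually requires.

```python
# Minimal sketch: a priori power analysis to size a two-group study before
# generating GenAI output. Effect size, alpha, and power are assumptions to
# be replaced with values justified by prior work or pilot data.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed medium effect (Cohen's d)
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
)
print(f"Approx. participants needed per group: {n_per_group:.0f}")
```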
Are new datasets necessary?
We encourage HCI researchers to carefully consider whether the generation of large amounts of GenAI output is necessary, or whether their research question can be answered using existing datasets. Generally, researchers should publish their data adhering to the FAIR guidelines, so that other researchers can reuse the data instead of regenerating it and increasing the overall carbon footprint; a minimal metadata sketch follows below. By far the most carbon-intensive HCI research phase (aside from developing, training, and fine-tuning new models) on average in the CHI 2024 corpus is the Data Collection phase, which is largely caused by large-scale open-ended explorations of model output [Inie, Falk & Selvan 2025].
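One lightweight way to make generated output reusable is to store it with enough metadata (model, version, prompt, generation parameters, license) that others can cite and reuse it rather than regenerate it. The sketch below writes such a record as JSON; the field names are an illustrative assumption, not a prescribed schema, and FAIR-aligned repositories (e.g. Zenodo or OSF) typically ask for similar information.

```python
# Minimal sketch: save GenAI output together with reuse-friendly metadata.
# Field names are illustrative, not a standard schema; adapt them to the
# metadata requirements of your chosen FAIR-aligned repository.
import json
from datetime import date

record = {
    "model": "example-model-name",        # placeholder model identifier
    "model_version": "v1.0",              # placeholder version
    "prompt": "Describe a sustainable HCI study design.",
    "parameters": {"temperature": 0.7, "max_tokens": 256},
    "output": "...generated text...",     # the actual model output
    "generated_on": date.today().isoformat(),
    "license": "CC-BY-4.0",
}

with open("genai_output_record.json", "w") as f:
    json.dump(record, f, indent=2)
```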
Choosing the right models
We encourage HCI researchers to consider whether it is possible to use task-specific models rather than large “multi-purpose” models, which are orders of magnitude more energy-expensive to use [Luccioni et al. 2024]. Read more about task-specific models here.
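As an illustration, many HCI studies that only need, say, sentiment labels can use a small fine-tuned classifier instead of prompting a large general-purpose model. A minimal sketch, assuming the Hugging Face transformers library and the publicly available DistilBERT SST-2 checkpoint; any comparably small task-specific model would serve the same point.

```python
# Minimal sketch: use a small task-specific model for a narrow task
# (here sentiment classification) instead of prompting a large
# general-purpose GenAI model. Assumes the `transformers` package and the
# public DistilBERT SST-2 checkpoint.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

responses = [
    "I found the interface delightful to use.",
    "The system kept losing my input, very frustrating.",
]
for text, result in zip(responses, classifier(responses)):
    print(f"{result['label']} ({result['score']:.2f}): {text}")
```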
Quality-Checking Prompts
Researchers can consider using tools to improve their prompts before interacting with a GenAI model, to see which prompts best fit their research goal. Hao and colleagues developed a tool for optimizing prompts for text-to-image GenAI models [Hao et al. 2023]. Alternatively, in research where users have to interact with GenAI models, users could be taught prompt-engineering strategies tailored to the specific research goal, to reduce the amount of unusable output.
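A minimal sketch of refining a draft prompt before spending compute on image generation, assuming the prompt-optimization checkpoint released with Hao et al. 2023 is available on the Hugging Face Hub under the id microsoft/Promptist (an assumption worth verifying, along with the exact input format in the model card); the refined prompt would then be passed to the actual text-to-image model.

```python
# Minimal sketch: refine a draft prompt before spending compute on image
# generation. Assumes the prompt-optimization checkpoint from Hao et al. 2023
# is published on the Hugging Face Hub as "microsoft/Promptist" (an
# assumption); check the model card for the expected input format.
from transformers import pipeline

prompt_optimizer = pipeline("text-generation", model="microsoft/Promptist")

draft = "a cat sitting in a rainy window"
refined = prompt_optimizer(draft, max_new_tokens=50, num_return_sequences=1)
print(refined[0]["generated_text"])
```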
Reporting and Reviewing Negative Results
Research publications that report negative results have greatly decreased across domains, which risks contributing to publication bias or positive-outcome bias [Fanelli 2012, Mlinarić et al. 2017]. Contributions that illustrate when not to design technology are still valuable. Reporting negative results (for example, unexpectedly large carbon footprints) could prompt discussions on whether the direction a GenAI research experiment aims for is worth pursuing. The same principle is important to keep in mind when reviewing submissions, so as not to reject papers that report negative results on the grounds that their contribution is unimportant.
Conference participation
As an example case, the ACM CHI Conference on Human Factors in Computing Systems 2024 had at least 3,995 in-person attendees in Honolulu. According to the Hawaii Climate Portal, the average visitor to “the most remote inhabited archipelago from any continental land mass on the planet” travels about 7,000 miles and causes an emission burden of 1.8 tons CO2e from a round-trip flight. For 3,995 attendees traveling unaccompanied (without partners or children), this totals 7,191 tons CO2e in travel footprint alone, equivalent to 1,553 gasoline-powered passenger vehicles driven for one year. We believe physical conference presentation is invaluable for building strong research communities, but we encourage the community to carefully consider alternative ways of supporting collaboration and network building, such as facilitating and attending local clusters of conferences, stronger shared publication pipelines between journal submissions and conference presentations, and better virtual participation.
References
Lasse F. Wolff Anthony, Benjamin Kanding, and Raghavendra Selvan. 2020. Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models. ICML Workshop on Challenges in Deploying and Monitoring Machine Learning Systems. arXiv:2007.03051
Daniele Fanelli. 2012. Negative results are disappearing from most disciplines and countries. Scientometrics 90, 3 (2012), 891–904. https://doi.org/10.1007/s11192-011-0494-7
Yaru Hao, Zewen Chi, Li Dong, and Furu Wei. 2023. Optimizing Prompts for Text-to-Image Generation. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 66923–66939. https://proceedings.neurips.cc/paper_files/paper/2023/file/d346d91999074dd8d6073d4c3b13733b-Paper-Conference.pdf
Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, and Joelle Pineau. 2020. Towards the systematic reporting of the energy and carbon footprints of machine learning. Journal of Machine Learning Research 21, 248 (2020), 1–43
Nanna Inie, Jeanette Falk, and Raghavendra Selvan. 2025. How CO2STLY Is CHI? The Carbon Footprint of Generative AI in HCI Research and What We Should Do About It. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI).
Sasha Luccioni, Yacine Jernite, and Emma Strubell. 2024. Power hungry processing: Watts driving the cost of AI deployment? In The 2024 ACM Conference on Fairness, Accountability, and Transparency. 85–99.
Anamarija Mlinarić, Martina Horvat, and Vesna Šupak Smolčić. 2017. Dealing with the positive publication bias: Why you should really publish your negative results. Biochemia Medica 27, 3 (2017), 447–452. https://doi.org/10.11613/BM.2017.030201
Dustin Wright, Christian Igel, Gabrielle Samuel, and Raghavendra Selvan. 2023. Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI. arXiv:2309.02065 [cs.LG] https://arxiv.org/abs/2309.02065
Yiyang Zhao, Yunzhuo Liu, Bo Jiang, and Tian Guo. 2024. CE-NAS: An End-to-End Carbon-Efficient Neural Architecture Search Framework. arXiv preprint arXiv:2406.01414 (2024)