“AI” resource use is a hot topic these days.1 Back in September 2023 AP had a story about Microsoft’s water use in Iowa. In December 2023, The Guardian ran a piece about how the energy footprint of AI is underappreciated (headline: “Why AI is a disaster for the climate”). And on social platforms, I see a steady-ish stream of people discussing AI energy use (often, but not always, negatively). I’ve written a few threads about this on Bluesky and decided to collect and expand on my thoughts here.
I’m working mainly from two recent papers about AI resource use:
Li et al. (2023), “Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models”: this one’s mostly about AI water use. It also has some really great discussion of scopes.
Luccioni et al. (2023), “Power Hungry Processing: Watts Driving the Cost of AI Deployment?”: this one’s mostly about AI energy use. It has some good coverage of different model uses.
TL;DR: If you play an hour of UHD PS5 while eating a quarter pounder, the gaming puts you at energy parity with ~72 image generations, and the burger puts you at water parity with ~17k GPT queries (roughly 35 days of querying at ~current GPT-4 use limits). If you're minding your meat consumption and eating an Impossible burger instead, you reach water parity at ~700 GPT queries per burger (~35 hours of querying).
Scopes and additionality
Any discussion of energy/water use impacts should (probably) start with mention of scopes and additionality. So: what are the “scopes” of resource use/GHG emissions?
Scope 1: direct resource use/emissions by the entity itself. Scope 1 impacts occur wherever the final goods/services are produced (e.g., at the data center).
Scope 2: indirect resource use/emissions due to the entity’s purchase of electricity, steam, heat, or cooling. Scope 2 impacts occur where the intermediate good/service is produced (e.g., at the power plant), but they wouldn’t happen without demand for the final good/service.
Scope 3: indirect value chain effects due to one entity’s demands which are scope 1 or 2 to another entity, e.g., emissions/water use associated with chip production for data center use would be scope 3 for the data center, but scope 1 or 2 for the chip manufacturer.
Most people talking about AI resource use seem to be talking about scope 1 and 2 impacts. Given how data centers work, I agree with Li et al. (2023) that it’s reasonable to consider “electricity demand to run the things” and “water demand to cool the things” as scope 1 uses. Water use at the power plants would be within scope 2.
Next, additionality. When can we say that an activity has actually caused an increase (or decrease) in something we care about? To assess this, we want to know whether the activity in question led to additional impacts.2 That is, if the activity didn’t occur, what else would have happened instead? This is a big deal in assessing carbon offsets: when someone supplies an offset they’re saying, “I would have emitted X tons of carbon, but since you paid me I won’t.” If the offset isn’t additional it means your payment wasn’t necessary to avoid the emissions. Assessing additionality is hard, but it’s a useful device to think through the issues involved in AI resource use more clearly.
Simple energy use math
From Luccioni et al. (2023), the most energy-intensive use of AI models seems to be image generation: about 0.0029 kWh per image. An hour of UHD gaming on a Playstation 5 is about 0.21 kWh. So about 72 AI images gets you to parity with 1 hour of UHD PS5 gaming. Image captioning, the next-most energy-intensive application, is about 0.000064 kWh/query. So about 3,300 queries to get to parity with an hour of UHD PS5 gaming.
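That arithmetic as a quick script (the per-task figures are from Luccioni et al.; the PS5 number is a rough estimate of UHD gaming draw, not a measurement):

```python
# Energy-parity arithmetic (per-task figures from Luccioni et al. 2023;
# the PS5 figure is a rough estimate of UHD gaming draw, not a measurement).
KWH_PER_IMAGE_GEN = 0.0029    # image generation, the most energy-intensive task
KWH_PER_CAPTION = 0.000064    # image captioning, the next-most intensive
KWH_PER_PS5_UHD_HOUR = 0.21   # one hour of UHD gaming on a PlayStation 5

images_per_ps5_hour = KWH_PER_PS5_UHD_HOUR / KWH_PER_IMAGE_GEN    # ~72
captions_per_ps5_hour = KWH_PER_PS5_UHD_HOUR / KWH_PER_CAPTION    # ~3,281

print(f"~{images_per_ps5_hour:.0f} image generations per hour of UHD PS5")
print(f"~{captions_per_ps5_hour:,.0f} captions per hour of UHD PS5")
```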
Simple water use math
Li et al. (2023) estimate it takes about 500 ml of water per 20-70 queries to GPT-3 across scopes 1 and 2, depending on where the data center is located (Table 1).3 To be conservative about GPT-4, let’s assume we only get 5 queries for every 500 ml of water, or 100 ml per query. It takes about 1,847 gallons (~6.99 million ml) of water to produce 1 lb of beef. That’s about 69,000 GPT-4 queries per lb of beef.
GPT-4 has usage limits on the order of 50 queries every 3 hours (seems to vary between 25-100). Let’s call it 60 queries every 3 hours to make the numbers nice, so 20 queries per hour. That’s 3,450 hours — just over 143 days — of querying whenever possible per lb of beef.
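The beef math works out roughly as follows (same conservative 100 ml/query assumption as above; small differences from the text come from rounding):

```python
# Water-parity arithmetic (conservative GPT-4 assumption of 100 ml/query,
# derived from Li et al.'s 500 ml per 20-70 GPT-3 queries).
ML_PER_QUERY = 100            # conservative: 5 queries per 500 ml
ML_PER_GALLON = 3785.41
GALLONS_PER_LB_BEEF = 1847
QUERIES_PER_HOUR = 20         # ~60 queries per 3 hours at GPT-4 use limits

ml_per_lb_beef = GALLONS_PER_LB_BEEF * ML_PER_GALLON  # ~6.99 million ml
queries_per_lb_beef = ml_per_lb_beef / ML_PER_QUERY   # ~69,900 (rounded to 69,000 in the text)
hours = queries_per_lb_beef / QUERIES_PER_HOUR

print(f"~{queries_per_lb_beef:,.0f} queries per lb of beef, "
      f"~{hours / 24:.0f} days of constant querying")
```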
To be fair, beef is a very water-intensive product. Impossible burgers use quite a bit less water: about 18.5 gallons (~70,000 ml) (water use assessment on page 30). That’s about 700 GPT-4 queries per Impossible burger, or 35 hours of querying.
What if you expanded the aperture to agriculture overall? California uses about 34 million acre-feet (~11 trillion gallons) of water for irrigation annually. If all of the additional water use Microsoft reported between 2021 and 2022 (0.431 billion gallons globally) is for AI, then we’re talking about 0.004% of California’s agricultural water use. Total agricultural water use in the US is on the order of 30 trillion gallons per year.
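The unit conversion is worth making explicit (1 acre-foot is 325,851 gallons):

```python
# Unit check: acre-feet to gallons, then Microsoft's reported 2021->2022
# water-use increase as a share of California's irrigation water.
GALLONS_PER_ACRE_FOOT = 325_851
ca_irrigation_gallons = 34e6 * GALLONS_PER_ACRE_FOOT  # ~11.1 trillion gallons
msft_increase_gallons = 0.431e9                       # global, all uses

share = msft_increase_gallons / ca_irrigation_gallons
print(f"~{share:.4%} of California's annual irrigation water")  # ~0.0039%
```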
Emissions and scopes
So far I haven’t said much about emissions associated with AI. While the papers do discuss this, the picture is a bit more complex. As with EVs, the emissions impacts depend on the characteristics of the grid and the energy sources the data centers are using.
Luccioni et al. assume emissions and energy use are tightly coupled and switch between the two depending on what they want to highlight. That’s a reasonable assumption (and to be fair, they do their analysis on a single data center, AWS us-west-2), but as mentioned the coupling coefficient will depend on the specifics of where the compute happens. Li et al. take a more detailed approach to quantifying emissions, using EPA’s average CO2e per unit of power (0.954 pounds per kilowatt-hour) to convert AI power use into emissions. This factor will most likely decrease over time as the grid greens.
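As an illustration of that coupling, here’s a sketch combining Luccioni et al.’s per-task energy figures with the average grid factor Li et al. use (assuming, unrealistically, that the national-average factor applies to the data center in question):

```python
# Rough emissions per AI task: Luccioni et al.'s energy figures times the
# EPA national-average grid factor. Illustrative only; the right factor
# depends on the specific grid serving the data center.
LB_CO2E_PER_KWH = 0.954
GRAMS_PER_LB = 453.592

def grams_co2e(kwh: float) -> float:
    """Convert energy use into grams of CO2e at the average grid factor."""
    return kwh * LB_CO2E_PER_KWH * GRAMS_PER_LB

print(f"image generation: ~{grams_co2e(0.0029):.2f} g CO2e")    # ~1.25 g
print(f"image captioning: ~{grams_co2e(0.000064):.3f} g CO2e")  # ~0.028 g
```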
Li et al. (2023) go into some depth on AI model water use.4 Based on their estimates, 85-134 TWh of energy use for AI by 2027 could amount to 4.2-6.6 billion cubic meters of combined scope-1 and scope-2 water withdrawal. As they note, “[i]f the US hosts half of the global AI workloads… AI may take up about 0.5-0.7% of [US] annual water withdrawal.” This does not seem huge to me!
Scope 1 and 2 energy use/emissions impacts are (I argue) what it takes to run the AI data center. For new entrants who wouldn’t use data centers if not for AI (e.g., OpenAI, Anthropic), it’s safe to call all energy use and emissions impacts additional. For companies that aren’t pivoting out of their existing data center businesses (e.g., Microsoft, Google), the energy use and associated emissions are only additional to the extent that they didn’t displace expansions of existing data center business lines. Who knows to what extent such displacement might be a thing; seems reasonable to go ahead and call all of it additional.
As Li et al. point out, it’s worth distinguishing between two types of water use: water withdrawal (pulling freshwater out of aquifers or diverting surface flows) and water consumption (water withdrawn that evaporates, gets embodied into some final good, or for other reasons isn’t discharged back into the environment “right away”). While one may argue AI uses a lot of water, the more impactful question is whether it consumes a lot of water. Let’s dig into that a bit.
Scope 1 water use is all about cooling the data centers. Some of the water is going to evaporate, and what’s left is hot. That heat is an industrial pollutant so the water can’t be discharged into a water body till it’s close to the water body’s temperature.5 But realistically the water is going to be recirculated, either with some cooling (in a closed-loop part of the system) or evaporative loss (in an open-loop part of the system). “Consumption” here is mostly evaporative loss.
Scope 2 water use is all about the grid. Thermoelectric plants like coal and natural gas need water to keep the equipment cool so they can keep pumping out electricity. As the grid greens, this impact will likely trend downwards. Again, “consumption” here is mostly evaporative loss.
What about scope 3?
Scope 3 water use/energy use/emissions impacts are all about supply chain and components. It’s much less clear what’s going on here. Electronics manufacturing tends to involve toxic chemicals in water, and it takes a bunch of energy to make it usable again (to the extent that’s doable/economical). Shipping chips around the world produces a bunch of emissions. Fabs (where chips are made) use a lot of energy. In general, scope 3 impacts are hard to measure.
While scope 1 and 2 impacts are relatively straightforward to attribute to data center demands, it’s also hard to suss out how “additional” scope 3 impacts are. To the extent that the same chips would have been made and sold to someone else, the additionality of AI data center uses is unclear. Similarly, if AI-optimized chips (e.g., GPUs, DLPs) have lower scope 3 impacts than non-AI chips that could have been made with similar facilities, the marginal impact of diverting investment to AI chip production is actually lower emissions. To be clear, I have no idea what the truth is here — I’m just trying to illustrate why the scope 3 piece is trickier than scopes 1 and 2. What we would really want to know is how much incremental chip production occurred for AI, what incremental chips were produced, and then compare lifecycle impacts of those vs the other chips. Someone should look into that.
Yeah yeah but won’t this all get worse?
To the extent companies and products are going to use a lot more AI stuff going forward, one could argue their impacts will only increase. This is a fair argument. Yes, total AI resource use will likely increase. On the other hand, there are (at least) two factors that give me pause before proclaiming doom by scaling:
Since data center energy and water use are priced at almost every stage in the pipeline, there are strong incentives to reduce the resource footprint of individual models.
The ways these models will be used at scale aren’t really clear yet. People are still experimenting, with venture capital and large firms subsidizing lots of uses to see what sticks.6 To the extent they eventually enable people to do things they were already doing at lower resource costs, there may be net resource savings from AI model deployment.
1 is a simple fact of markets and profit maximization. Yes, firms will expand AI model use to the extent they can earn profits doing it, potentially even reshaping society (to the extent they can) to make their AI bets more profitable. But at the margin of actually using those AI models, the price system incents them to find ways to eke out performance gains, reducing resource use (cost, really) per model runtime. We’re already seeing this play out in the chip space broadly (e.g., government-industry initiatives) and in AI specifically.7 Greater energy efficiency will also enable companies to build more embedded systems products with these models — I’m not saying that’s necessarily good (I generally hate IoT things), just that it’s another channel incenting energy efficiency in these models even as their total deployment scale increases. The net effect will depend on both, but I think it’s reasonable to speculate that total resource use for AI will increase relative to today.
(I will be less sanguine on this channel if firms successfully lobby for special data center energy/water pricing. This kind of thing has been a big issue with agricultural water use in places like California, where some users got cheap water and grew water-intensive crops like alfalfa. Not what an efficient price system would incent in a region facing water scarcity. There’s been some action on this front, so we’ll see how it goes.)
2 is more complicated. I may get into this more in another post: any new technology functions as a mix of substitutes and complements to existing tasks/technology uses. People today are really focused on the substitution cases for AI, particularly image and language models, but there are also use cases where these might complement human effort (e.g., 1, 2). These complementarities may enable reduced energy use.
Let’s consider a very simple example. Imagine an in-IDE language model that fills out boilerplate code quickly and lets a developer finish an otherwise-4-hour task in 2 hours. If they use the time savings to get up and go for a 2-hour walk, the model could reduce the developer’s energy use by up to 2 hours of computer time.8 This is a super-simplified example just to illustrate the point about complementarities. The truth is we don’t know how this will play out yet, and any attempt at measurement will need to be very careful about comparing what’s actually done to the counterfactual activities done in the absence of AI (e.g., see the extensive discussion about baseline selection here). I want to emphasize our ignorance on this question: I don’t yet know if this will happen, and neither does anyone else. We can all speculate now; only people in the future will know.
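A toy version of that accounting, with made-up wattages (the 50 W workstation and 210 W PS5 draws are assumptions for illustration, not measured values):

```python
# Toy accounting for the complementarity example. The 50 W workstation and
# 210 W PS5 draws are assumptions for illustration, not measured values.
WORKSTATION_KW = 0.050
PS5_UHD_KW = 0.210
HOURS_SAVED = 2

def net_kwh_change(replacement_kw: float) -> float:
    """Net energy change when the saved computer hours are spent on an
    activity drawing replacement_kw instead."""
    return HOURS_SAVED * (replacement_kw - WORKSTATION_KW)

print(f"{net_kwh_change(0.0):+.2f} kWh")         # walk: -0.10, a net saving
print(f"{net_kwh_change(PS5_UHD_KW):+.2f} kWh")  # gaming instead: +0.32, rebound
```

The second line is the rebound case flagged in the footnote: if the saved hours go to something more energy-intensive than the workstation, the “saving” flips sign.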
I know, I know, “it’s not real intelligence,” “AI has been lots of different things over the years,” “language/image models is more precise,” … . Look, people are calling this AI, so I’m going to run with that language here.
Economists would call this a “marginal” impact, but people tend to think of “marginal” as “small” and “additional” is a bit clearer.
If I’m following the paper correctly, the “500 ml per 20-70 queries” estimate includes training as well as inference. Training is an interesting piece of the resource cost because it’s (kind of) a fixed cost across model use. So as the same model performs more inference, the training component should account for a declining share of the model’s average total resource cost (assuming it doesn’t need to be trained more or get replaced with a new model). Li et al. consider the cost for a fixed time interval rather than across scales of inference. The marginal water use of an inference appears to be substantially lower than the headline figure: using the average electricity water intensity of 2.177 L/kWh at Microsoft’s Taiwan data center and the official GPT-3 inference energy use of 0.004 kWh/page implies a little under 9 ml of water per page. I don’t know how many queries add up to a page of output, but it seems reasonable to infer that the average total resource cost of these models has a nontrivial downward-sloping region, i.e., economies of scale in resource use.
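The marginal-cost arithmetic in this footnote is just:

```python
# Marginal (inference-only) water use per page of GPT-3 output, using the
# figures cited in this footnote for Microsoft's Taiwan data center.
LITERS_PER_KWH = 2.177   # average electricity water intensity, Taiwan DC
KWH_PER_PAGE = 0.004     # GPT-3 inference energy per page of output

ml_per_page = LITERS_PER_KWH * KWH_PER_PAGE * 1000
print(f"~{ml_per_page:.1f} ml per page")  # well under the 100 ml/query headline
```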
Love to see UCR researchers doing good work. Especially on water consumption. Fact of questionable funness: before I started working on space, I was working on water management, particularly in Australia.
Newton’s law of cooling gives us a reasonable way to bound the time it would take to cool the water. If it’s freezing out, you need to wait about a day to let 1 cubic meter of boiling hot water in a 1 meter diameter vessel cool to 1 C. Since you need to get it to close to ambient temperature to avoid heat pollution, warmer ambient temperatures mean less time. So if you want the data centers to let the water back into the environment, it seems doable.
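A sketch of that bound, backing out the cooling rate constant from the day-to-cool claim above (so the constant is tied to the 1 cubic meter, 1 meter diameter vessel assumption):

```python
import math

# Newton's law of cooling: T(t) = T_amb + (T0 - T_amb) * exp(-k * t).
# Back out k from the claim above (boiling water cools to 1 C in ~1 day at a
# freezing 0 C ambient); k is specific to the 1 m^3 / 1 m diameter setup.
DAY_S = 86_400
k = math.log((100 - 0) / (1 - 0)) / DAY_S  # per second

def hours_to_reach(t_target: float, t_ambient: float, t0: float = 100.0) -> float:
    """Hours for water starting at t0 to cool to t_target, both in Celsius."""
    return math.log((t0 - t_ambient) / (t_target - t_ambient)) / k / 3600

print(f"{hours_to_reach(1, 0):.0f} h to get within 1 C of a 0 C ambient")    # 24, by construction
print(f"{hours_to_reach(21, 20):.0f} h to get within 1 C of a 20 C ambient") # a bit faster
```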
While this allows for more use overall, it doesn’t change the logic of point 1 unless the subsidies are specifically in the form of energy/water price reductions. The intuition is the same as a carbon fee and dividend: a general pot of money from a VC/bigcorp lets the AI operator buy lots of things, but at the margin energy/water prices still incent use efficiencies to maximize what fits into the budget constraint.
To be fair this is about SVMs, not diffusion (image) or transformer (language) models. But the principle holds.
If they do something energy-intensive with that time — say play PS5 — those savings could become net increases in energy use. This is an old issue with energy-saving technologies.