We hear a lot these days about how expensive and environmentally ‘unfriendly’ AI is, especially Large Language, Video and Audio Models.
To a large extent, these concerns are valid: large models need significant hardware resources, running at peak performance for a considerable amount of time (anywhere from weeks to months) during training and validation.
When one considers the specification of the hardware used for these activities (multiple top-end GPUs and CPUs, coupled with large amounts of fast RAM and storage), it is easy to deduce that the power consumption of owning and operating a large AI model is high.
The figures would probably become ‘stratospheric’ if one were to consider the power consumption of the environments that provide a globally available, high-quality service, such as OpenAI’s ChatGPT, Microsoft’s Copilot or Cohere’s models.
Depending on how the electricity behind this consumption is generated, and bearing in mind that it powers not only the compute infrastructure but also the cooling systems, one could argue that modern AI adds significantly to global pollution.
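To make the scale concrete, below is a rough back-of-envelope calculation in Python. Every figure in it – cluster size, per-GPU draw, training duration, PUE and grid carbon intensity – is an illustrative assumption, not a measurement of any particular model or provider.

```python
# Back-of-envelope estimate of training energy and emissions.
# All figures below are illustrative assumptions, not measurements.

NUM_GPUS = 1_000            # assumed size of the training cluster
GPU_POWER_KW = 0.7          # assumed average draw per GPU (kW), including host share
TRAINING_DAYS = 60          # assumed wall-clock training time
PUE = 1.3                   # assumed Power Usage Effectiveness (cooling/facility overhead)
GRID_KG_CO2_PER_KWH = 0.4   # assumed grid carbon intensity (kg CO2 per kWh)

hours = TRAINING_DAYS * 24
it_energy_kwh = NUM_GPUS * GPU_POWER_KW * hours      # energy drawn by the compute itself
facility_energy_kwh = it_energy_kwh * PUE            # add cooling and facility overhead
emissions_tonnes = facility_energy_kwh * GRID_KG_CO2_PER_KWH / 1_000

print(f"IT energy:       {it_energy_kwh:,.0f} kWh")
print(f"Facility energy: {facility_energy_kwh:,.0f} kWh")
print(f"Emissions:       {emissions_tonnes:,.0f} tonnes CO2")
```

Even with these fairly conservative assumptions, a single training run lands in the gigawatt-hour range, which is why the energy source and the cooling overhead matter so much.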
Along the same lines, there are also concerns about the democratisation of AI ownership: the need to ‘own’ and operate such powerful and expensive compute infrastructure puts premium AI ownership out of reach for all but the largest corporate titans of the industry. We seem to have come full circle, from the democratisation of IT resources through the adoption of Cloud compute, to scarcity and concentration in the hands of a few corporations in the world of AI.
It is certainly the case, I hope you agree, that the costs of training such large and capable AI models are most likely out of reach for the average professional or small to medium-sized IT department. The business case would probably not stack up for a large department either, even if the data were available. This means we are restricted to using what is already made available by Google, Microsoft, OpenAI and the like.
Fine-tuning with a LoRA approach does help to some degree in adapting an open-source LLM to one’s needs, as does quantisation in helping deploy a large model on more cost-efficient infrastructure. However, LoRA always carries the risk of catastrophic forgetting, and you would need a large environment to perform the quantisation yourself, which defeats the purpose (unless you rely on someone else to release a quantised model). What you gain in size reduction, you lose in capability – mainly accuracy.
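As an illustration of what these two techniques look like in practice, here is a minimal sketch combining them with the Hugging Face transformers, peft and bitsandbytes libraries; the base model name, target modules and hyper-parameters are placeholders you would swap for your own choices.

```python
# A minimal sketch of LoRA fine-tuning on top of a 4-bit quantised base model.
# The model name and hyper-parameters are placeholders, not recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # placeholder; any causal LM you are licensed to use

# 4-bit quantisation keeps the frozen base weights small enough for modest hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb_config)

# LoRA trains small low-rank adapter matrices instead of the full weight set,
# which is what keeps the fine-tuning cost (and hardware requirement) down.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters
```

Only the adapter weights are updated, so the frozen base retains its general knowledge; the catastrophic-forgetting risk comes from how far the adapters pull the combined model towards narrow fine-tuning data.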
However, I believe that the future is not so bleak. The reasons I am optimistic are two-fold:
When it comes to power consumption, most AI infrastructure is Cloud based. Most Cloud providers have a significant interest in reducing the cost of providing this infrastructure, which is translating into novel ideas such as:
- the placement of datacentres underwater (Microsoft’s Project Natick, for example – although they appear to have abandoned this approach recently)
- the use of renewables through the deployment of datacentres in normally sunny areas
Both of these approaches should become even more prevalent, and the growing use of nuclear energy (which has a low-pollution footprint) will only improve matters.
I also believe that many of the early cost-of-ownership and operation figures released in various white papers (e.g. ‘BloombergGPT: A Large Language Model for Finance’) are based on experiments that did not have efficiency in mind.
For example, the paper mentions that the training environment was AWS SageMaker. SageMaker is a great environment for setting things up and getting to results fast, but it is not a cost-effective way to run large AI workloads in production on AWS. The other paper that comes to mind is the one describing the training of BLOOM, which was sponsored by the French government. I believe that, within reason, training costs at least are (or can become) lower than what has been published.
It is certainly the case that the hardware industry is responding to the demand with specialised, AI-focused chipsets, which should drive performance up and the cost of AI compute down. We are also seeing AMD hardware being tentatively adopted by AI frameworks as an alternative to Nvidia, which should help with hardware costs in the medium to longer term.
Not to be outdone, the academic field is also responding with new network paradigms in the form of Kolmogorov–Arnold Networks (KANs), which promise more efficient learning: you only need to retrain the part of the network relevant to your specific data, albeit at the cost of slower training.
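For a feel of the idea, below is a conceptual PyTorch sketch of a single KAN layer. The reference implementation (pykan) parameterises the learnable edge functions with B-splines; this sketch uses a small Gaussian basis purely for brevity, and all sizes are arbitrary assumptions.

```python
# Conceptual sketch of a Kolmogorov-Arnold (KAN) layer: a learnable univariate
# function on every edge, summed at each output node. Gaussian basis functions
# stand in for the B-splines of the reference implementation.
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    def __init__(self, in_features: int, out_features: int, num_basis: int = 8):
        super().__init__()
        # Fixed grid of basis-function centres on [-1, 1]; the learnable part
        # is the per-edge coefficient attached to each basis function.
        self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
        self.coeffs = nn.Parameter(torch.randn(out_features, in_features, num_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        # Evaluate every basis function at every input feature.
        basis = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2) / 0.1)  # (batch, in, basis)
        # Each edge (i -> j) applies its own learnable univariate function,
        # and the edge outputs are summed at the node: no fixed activation,
        # no conventional weight matrix.
        return torch.einsum("bik,oik->bo", basis, self.coeffs)

# Example: a tiny two-layer KAN for a 4-dimensional input.
model = nn.Sequential(KANLayer(4, 16), KANLayer(16, 1))
out = model(torch.randn(32, 4))   # shape: (32, 1)
```

Because each coefficient only influences its edge function near one basis centre, training on data from a new input region mostly updates the coefficients local to that region – which is the property behind the ‘retrain only part of the network’ claim.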
What really excites me, however, is the possibility of wide adoption of quantum computing.
With the adoption of a computing paradigm that, so far, appears to be many times more powerful than current hardware, the tantalising prospect of being able to train and operate complex large models at a fraction of the cost would lead to a democratisation of ‘serious’ AI capability.
Assuming a ‘reasonable’ unit cost, at least in the medium to longer term, the compute power of quantum computers should allow us not only to democratise the training and operation of large models, but also to do so more efficiently from a power-consumption perspective. I am a great believer that quantum computing is the next step of Moore’s law, especially in the AI field.
What has your experience been so far? Have you considered the costs of operating a large AI model and wondered how you could improve them? Have you reconsidered fine-tuning your own model due to costs, even though you may have had the right data to do so? Are you also looking forward to the promise of quantum computing for AI?
I would love to hear your thoughts and experiences.