Software For The Environmental Sciences

by Eduard Kamburjan — 21 December 2023
Topics: interview

Prof. Gordon Blair is Head of Environmental Digital Strategy at the UK Centre for Ecology & Hydrology (UKCEH). He is also a Distinguished Professor of Distributed Systems at Lancaster University and Co-Director of the Centre of Excellence in Environmental Data Science (CEEDS), a joint initiative between UKCEH and Lancaster University.

You work closely with environmental scientists and focus, among other things, on how they use a lot of models and how the software they write is not optimal. Can you elaborate a little on where the challenges and problems are right now?

I think the first thing to say is that software is increasingly important in the environmental sciences. The underlying modeling work is extremely complex, and the models can easily be over a million lines of code. The code has a complex internal structure, and it evolves over time, but the people who write it are environmental scientists, mathematicians or physicists who do not have training in computer science or software engineering. That is proving to be a big problem right now, and the challenge is getting worse because we are trying to get more ambitious in the modeling world. There’s a whole new area called digital twins, which is pushing the modeling paradigm. And without the software engineering expertise, we are just not able to develop projects of that magnitude. So there’s a whole raft of software engineering and software sciences skill sets that are needed to address these big challenges.

Is the problem that there is no training in writing software in those fields, or that we do not give them the right tools?

I think it’s a bit of both, but largely the former. We have a lot of discussions within my organization, the UK Centre for Ecology & Hydrology, where I always talk about the difference between programming and software engineering to the environmental science domain experts. They do not necessarily see that. They think people write a program and then the job is done. But as a trained software engineer, I think about software architecture, I think about patterns, I think about writing code that’s extensible and flexible and rigorous and so on. And that’s not in the mindset because people have picked up programming as a necessity and have not been trained in the broader aspects of software engineering.

Also, it’s quite surprising that most of the code that’s written in the environmental sciences is still written in Fortran. There are lots of great tools out there, but you cannot necessarily take advantage of all of them, because they target more contemporary languages.

What can the software sciences contribute? Do you think that we are at the point where we can use, for example, model-based software engineering approaches, or do we need to do a little bit more there?

Do a little bit more. I gave a keynote at MODELS a few years ago, which was really enjoyable. It’s a fantastic community. And the message in my keynote is still very valid today. I was basically saying that they are sitting on a goldmine. I love the work done in model-based software engineering, how you can deal with an abstraction that’s meaningful in the domain and derive robust software from that model. That whole concept is fantastic and there are some great tools there. But in that keynote, I challenged the community in two directions. Firstly, I think the tools are a little bit clunky: in the world of environmental sciences, people do not want to invest lots of time in learning a tool and having to learn XML to use it. They just want to do it. Secondly, I’m not convinced we have cracked the domain-specific bit. Domain-specific means co-design, and in computer science we are still inclined to interpret domain-specific in terms of how we think. For example, many domain-specific languages resemble finite state machines or some other abstraction that is familiar to computer scientists. But the language of an environmental scientist or a modeller or a data scientist in that domain might be quite different. So I think we have to work harder as a discipline at really listening and really understanding the language different disciplines use.

You mentioned digital twins as an approach. In what sense do you understand them, and how can they be useful for environmental science?

Well, let me be quite specific, because I’m not talking about digital twins generically. I try to be quite specific about what it means for the environmental sciences, and I’m quite clear in my own mind. I’m looking at a new paradigm of modeling that does better environmental science. And the way of doing that is to bring together process models and data models, look at the system as a whole and then look at the synergies between process models and data models to understand such systemic aspects.

And that’s useful because in the environmental sciences, there is a lot of investment in physical models and models of scientific processes that capture equations representing, for example, the hydrology and the flow and the fluxes. There are a lot of them around, but they have high levels of uncertainty and are heavily parameterized, so on their own they are not always helpful. And then alongside that we have this real plethora of new data available to the environmental scientists. Lots and lots of data that is so complex it’s hard to analyze, even with new techniques emerging from AI. We need to look at the interactions between the data and the process model and see where they can learn from each other.

I wrote a short paper on how some emergent properties that you see in the data might inform the process model to change its behavior or inject some process that you might not have captured yet in your process model. That sounds quite far-fetched to environmental scientists, but in computer science, these adaptive systems are quite well understood: injecting new behaviors, fitting models to contexts, context awareness. There’s great work on self-organizing systems. Then your modelling becomes a continuum, a learning process, where your model, whether it’s of a river catchment or of weather patterns, is constantly evolving to represent the current data, at the current time, in the current place. That, for me, is quite a fundamental step forward in modelling. And that is what I mean by digital twin. As a software engineer, I would call it a framework. You place modelling in context, challenge it and improve it.

That is quite different from how engineers understand digital twins, which is all about feedback loops and so on.

A lot of environmental digital twins are now converging towards having a big platform using knowledge representation and machine learning. That is also a different kind of modeling. What are your experiences with bringing these techniques into software-driven applications?

I think of data science as being a spectrum. I think of it as a toolbox. There’s a toolbox of techniques out there. And at one end, there are the AI techniques, such as knowledge graphs and deep neural networks. At the other end, there are some very useful simpler techniques. Simple algorithms that will, for example, detect changes in time series, or reason about extreme values. If you have a distribution, and you are interested in extreme events, you are looking at the tails of the distribution, and those statistical techniques help with that.
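As a rough illustration of the simpler end of that toolbox, here is a minimal Python sketch of two such techniques: a one-sided CUSUM detector for an upward shift in a time series, and an empirical quantile as a crude stand-in for proper extreme value analysis of the upper tail. The function names, the slack and threshold values, and the river-level numbers are all invented for illustration and are not taken from any UKCEH tool.

```python
from statistics import mean, stdev


def cusum_upward_shift(series, baseline_len, slack=0.5, threshold=5.0):
    """Return the first index at which an upward shift is flagged, or None.

    The first `baseline_len` points are assumed (hypothetically) to show
    normal behaviour; deviations are measured in units of the baseline
    standard deviation.
    """
    baseline = series[:baseline_len]
    mu, sigma = mean(baseline), stdev(baseline)
    cumulative = 0.0
    for i, value in enumerate(series):
        z = (value - mu) / sigma
        # Accumulate only excursions above the slack; never go below zero.
        cumulative = max(0.0, cumulative + z - slack)
        if cumulative > threshold:
            return i
    return None


def empirical_quantile(sample, q):
    """Return the q-quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(sample)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Made-up river levels in metres: stable at first, then a step increase.
levels = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0,
          1.6, 1.7, 1.65, 1.8, 1.7]
change_index = cusum_upward_shift(levels, baseline_len=10)
tail_level = empirical_quantile(levels, 0.95)
```

With these made-up numbers the detector flags the first reading after the step, and the 95% quantile sits among the post-step values. A real analysis of extremes would fit a tail model rather than read a quantile off a small sample.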

Training an emulator instead of running one of the big complex process models can really help and can allow you to study the sensitivities and uncertainties in that model. It’s a raft of techniques working together. Back to my toolbox, we do have software that’s used by environmental scientists, but there’s also a lot of fuss around green IT and how we can improve the energy efficiency or reduce the resource consumption of computing.

Can you maybe just summarize your opinion on that?

I have a split view, which I have captured in a recent short paper. I’m an enthusiast about technology. I’ve been in technology all my life, and I get excited about digital innovation and how it can transform the world in a positive direction. By positive, I mean that we are using digital technology to help us understand and manage our way through this climate crisis. There’s of course the positive side of this impact, but you have to be aware of the negative side. I’ve done some work with colleagues, where we have looked particularly at the carbon emissions of technology. I am deeply concerned that these carbon emissions are very significant and are probably underestimated. And we are at a time when carbon emissions have to be reduced drastically. I think we have a responsibility to think about the carbon emissions of technology, to raise awareness of them, to make sure they are incorporated in computer science education, and to make sure that software engineers are trained to think about this.

There is one other point I would like to make. There’s great innovation, and many people are thinking about green IT and how it can make things better through reducing energy consumption. Our experience and insights are that this will not make any difference whatsoever because of rebound effects: if you make something more efficient, you use more of it. So if we make our environmental models run 10 times more efficiently on our HPC, what will happen is that we will run them 100 times rather than 10 times. In the end, it is not a solution to look at energy efficiency alone or to just make technology more efficient. We have to look at the problem more systematically and be aware of the rebounds, and then work to resolve the rebounds. At that point, you need economists in the room: How do you design a system with efficiency gains that does not rebound? Do you have to bring in a constraint? Is that a carbon tax representing the actual cost to the environment, not as an externality, but as an actual cost? Technologies like compiler analysis and optimization work well as a building block, but they have to be placed in the context of something systemic so that you benefit from the efficiency gains, and it does not just rebound on you. I mean, it’s very interesting and very challenging, but it’s a difficult message to hear, and we have talked a lot about this with different stakeholders, and they do not like it because it’s not the answer they want.
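The rebound arithmetic in the HPC example above can be made concrete with a small Python sketch. The energy figure per run is a made-up number, and "10 times more efficient" is assumed here to mean a tenth of the energy per run. Only the factor of 10 and the run counts of 10 and 100 come from the example itself.

```python
def total_energy(runs, energy_per_run_kwh):
    """Total energy used by a batch of model runs."""
    return runs * energy_per_run_kwh


# Hypothetical baseline: 10 runs at 500 kWh each (illustrative figure only).
baseline_energy_per_run = 500.0
before = total_energy(runs=10, energy_per_run_kwh=baseline_energy_per_run)

# A 10x efficiency gain cuts the energy per run to a tenth...
efficient_energy_per_run = baseline_energy_per_run / 10
# ...but with the rebound, 100 runs are performed instead of 10.
after = total_energy(runs=100, energy_per_run_kwh=efficient_energy_per_run)

# Without any rebound, the same 10 runs would use a tenth of the energy.
no_rebound = total_energy(runs=10, energy_per_run_kwh=efficient_energy_per_run)

print(before, after, no_rebound)  # prints "5000.0 5000.0 500.0"
```

Under these assumptions the efficiency gain is cancelled exactly: total energy use is the same before and after, and the saving only appears if the number of runs is held constant.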

Is there some advice you would give somebody in computer science who wants to help fight climate change?

I came to this area quite late in my career, and I wish I’d done it earlier. It’s not abandoning your discipline. I know that “applied” can be a dirty word sometimes, and there is sometimes this mindset that there is pure computer science and applied computer science, and somehow the applied is less meaningful. I would say ignore that. I think you only live once. You have a certain skill set. Do something meaningful with the skill sets and the life you have, and that’s so important. Actually, I’ve found that by doing this much more applied work, I’m still being challenged as a computer scientist. I’m still wrangling with deep computer science problems, but I’m doing it in a context that’s meaningful. And there’s nothing like the real world to really evaluate what you are doing. It pushes you, there are surprises. And it makes you a better computer scientist, I can assure you. So if there’s anybody out there that’s going to read this, come on in. We need computer scientists in the environmental sciences. There are not many computer scientists or software engineers in the environmental world. You will be embraced, and you will find challenges that will deeply stretch you and be very meaningful.