Photo of Guy Van den Broeck in a wooden auditorium

The Weather in Australia Is Beautiful, the Kangaroos Are TACAS

by Elizabeth Polgreen — 10 April 2026
Topics: Interview

Guy Van den Broeck begins by describing himself as “a bit of an outsider.” We are talking ahead of his keynote talk at TACAS, one of the top conferences in formal methods. He may seem a surprising choice for a TACAS speaker, given that his career has largely taken place outside the formal methods community, covering machine learning, databases, probabilistic reasoning, and knowledge representation. Listening to him, however, it quickly becomes clear why a room full of formal methods researchers should pay attention to what he has to say.

Because the central problem now facing AI is one that the formal methods community has been studying for decades: correctness.

The progress of AI, particularly large language models (LLMs), in recent years has been dizzying. LLMs can now generate software projects, solve maths problems, and assist in many everyday tasks. But despite their impressive abilities, they offer no guarantees of correctness.

At their core, LLMs generate outputs by sampling from probability distributions over tokens—like flipping coins to produce words. The question Guy’s group asks is: How can we guide this generation process so that the model produces outputs that satisfy formal requirements?

For more than a decade, his group has worked on integrating machine learning with symbolic reasoning. “Ten or fifteen years ago, this meant combining graphical models such as Bayesian networks or Markov random fields with first-order logic. Today, it mostly means figuring out how to get large language models to work well with formal methods.”

The key idea in his recent work is to integrate symbolic reasoning directly into the decoding process of a language model. “We intercept what the language model is doing at every token it generates and perform some form of formal reasoning about how likely each choice is to achieve the overall goals. We then reweight our sampling using Bayes’ rule.” This biases the model toward making choices that are more likely to lead to a correct outcome.
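The reweighting step can be sketched in a few lines. This is an illustrative toy, not Guy’s actual system: the model’s next-token distribution p(token | prefix) is multiplied by an estimate of p(goal | prefix + token) and renormalized, which is Bayes’ rule for p(token | prefix, goal) up to a constant. The token names, probabilities, and the `goal_likelihood` estimator are all invented for the example.

```python
def reweight(next_token_probs, goal_likelihood, prefix):
    """Bias a next-token distribution toward a goal.

    next_token_probs: dict mapping token -> p(token | prefix).
    goal_likelihood: function (prefix, token) -> estimated p(goal | prefix + token).
    Returns a renormalized distribution proportional to
    p(token | prefix) * p(goal | prefix + token).
    """
    scores = {t: p * goal_likelihood(prefix, t)
              for t, p in next_token_probs.items()}
    z = sum(scores.values())
    if z == 0:
        return dict(next_token_probs)  # no token helps; fall back to the base model
    return {t: s / z for t, s in scores.items()}

# Toy example: the goal is that the text eventually reaches the word "TACAS".
base = {"weather": 0.5, "TACAS": 0.1, "cliff": 0.4}
goal = lambda prefix, tok: 1.0 if tok == "TACAS" else 0.2
posterior = reweight(base, goal, prefix="The kangaroos are ")
```

After reweighting, tokens the estimator judges likely to reach the goal gain probability mass, but the base model’s preferences still matter; the generation stays fluent rather than being forced.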

Many approaches in the literature aim to achieve a similar goal. “Silicon Valley spends billions of dollars training models to produce correct outputs by using gradients to update model parameters. Alternatively, you can use brute force: generate a thousand outputs and pick the best one. What we are doing instead is reasoning during generation: ‘if I generate this token, what is the probability that I will do a good job later?’”

One immediate question is how this relates to constrained decoding. “In some sense, it is similar, but constrained decoding is a hard problem and, in my view, still unsolved. The mainstream approach is essentially to mask choices that would violate the specification, like slamming on the brakes just before driving off a cliff. The problem is that this is not goal-oriented. If the goal lies further in the future, many choices could still lead to it, so you still don’t know which ones to make.”
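The “slamming on the brakes” approach he describes can be sketched as follows; the tokens and the violation check are invented for illustration. Note the limitation Guy points out: the mask only looks one step ahead, so it zeroes out immediately illegal tokens but cannot prefer tokens that make a distant goal more reachable.

```python
def mask_and_renormalize(next_token_probs, violates):
    """Mainstream constrained decoding: zero out tokens that would
    immediately violate the specification, then renormalize the rest."""
    kept = {t: p for t, p in next_token_probs.items() if not violates(t)}
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()} if z else {}

# Toy example: the specification forbids one specific token.
base = {"foo(": 0.6, "DROP TABLE": 0.3, "bar(": 0.1}
safe = mask_and_renormalize(base, violates=lambda t: t == "DROP TABLE")
```

The surviving tokens keep their relative order from the base model, which is exactly the problem: nothing here steers the model toward choices that pay off later.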

“Consider a specification that says ‘generate a sentence that ends with the word TACAS.’ Most structured generation tools will produce something nonsensical, like ‘The weather in Australia is beautiful, the kangaroos are TACAS,’ because they don’t know how to start the sentence in a way that builds toward the goal.” Guy’s controlled decoding, on the other hand, will build a far more reasonable sentence by considering the goal right from the first token.

So is this the future? Will major AI companies adopt formal reasoning techniques in the coming years? “I don’t like interviews like this because I don’t like to be a fortune-teller—I’m not into crystal ball predictions. But I do think there is a lot of value in this technology. Large companies are already doing constrained generation in the mainstream way, so I think they will pick it up.”

He continues: “Some people argue that we are hitting a wall in terms of what can be achieved through scaling alone (GPT-5 was perhaps a bit of a disappointment), and it’s clear that researchers are starting to look for alternative ways to make a difference.”

“The second point is that the game of scale can only be won by two or three different players. Everyone else has to be more clever. Our techniques can enable better performance with smaller models. In a recent project on generating code for robotics pipelines, we took an open-source 7B parameter LLM and doubled its accuracy on planning tasks.”

As for lessons the formal methods community can take from his experience: “My approach to neurosymbolic AI has always been to keep the logic simple. There are many machine learning problems where just a small amount of logic can make a big difference.”

He gives a concrete example: “Take monotonicity. In the deep learning era, people trained models to predict house prices from feature vectors. Intuitively, we know that if everything else is equal, the bigger the house, the higher the price. But machine learning models trained purely on data didn’t have this basic invariance in their predictions. If you use formal methods to enforce such simple invariants, accuracy goes up.”
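One simple way to enforce such an invariant is to post-process a black-box model’s predictions so that a larger house never gets a lower price; the sketch below (a running maximum over predictions sorted by size, with made-up numbers) is just one illustration of the idea. Real systems might instead use monotone network architectures or isotonic regression.

```python
def enforce_monotone(sizes, predictions):
    """Post-process predictions so they are non-decreasing in size.

    Walks the examples in order of increasing size and raises each
    prediction to the running maximum seen so far.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    fixed = list(predictions)
    best = float("-inf")
    for i in order:
        best = max(best, fixed[i])
        fixed[i] = best
    return fixed

sizes = [50, 120, 80]
preds = [200.0, 310.0, 180.0]  # the 80 m^2 house is priced below the 50 m^2 one
monotone = enforce_monotone(sizes, preds)  # -> [200.0, 310.0, 200.0]
```

After the fix, the 80 m² house is raised to at least the 50 m² house’s price, so the “bigger house, higher price” invariant holds everywhere.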

It might be counter to formal methods researchers’ usual way of thinking, but Guy believes the way forward for formal methods adoption in AI is straightforward: “Keep the logic simple, show the value in what that can do already, and then slowly, we can spoon-feed them more logic and fancier formalisms.”