VUGENE – New Approaches For “Getting Data to Confess”

Juozas Gordevičius, CTO and Founder of VUGENE

Over recent decades, improved understanding of the omics sciences has been crucial to the development of new treatments and screening methods for disease.

So far, genomics has stood out from the crowd, becoming widely accepted as a tool capable of inducing real change. It has already enabled great strides to be made in cancer detection and treatment, thanks to the advent of advanced liquid biopsy technologies, and has significantly improved our ability to treat rare diseases.

A recent highlight in the genomics field came in May of this year. A baby boy, aged between six and seven months, received a tailored CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) therapeutic aimed at restarting his body’s natural enzyme production. The baby’s inability to produce enzymes was due to a metabolic disease known as severe carbamoyl phosphate synthetase 1 (CPS1) deficiency, a rare and life-threatening urea cycle disorder.

After three doses of the lipid-nanoparticle based therapy, which targeted his liver, he is now growing well, marking a huge milestone for the genomics field.

In addition to genomics, other omics sciences are making their impact felt. Transcriptomics is becoming increasingly important to our understanding of cancer. Epigenomics is allowing researchers to more precisely link lifestyle and disease. And metabolomics is enabling diagnostics manufacturers to more accurately predict cancer at earlier stages.

However, while each field is having its own renaissance, everyone in the life sciences sphere acknowledges one thing: each omics science has its strengths, but together they could be so much more.

For Juozas Gordevičius, it was this idea of unity that led to the creation of his company, VUGENE.

“VUGENE is a data science company that wishes to simplify the data analysis and interpretation step for biomedical research,” says Gordevičius.

In particular, VUGENE wants to induce change in the preclinical setting – the stage at which researchers are trying to either work backwards from the presentation of a disease to identify a cause, or work forwards from the discovery of a molecule to see how it could act within the body.

The biggest problem facing researchers, explains Gordevičius, is how rapidly both our data science capabilities and our understanding of the body are evolving.

“We have so many gene expression experiments looking at understanding DNA methylation differences, understanding the transcription at a single cell or spatial level… people look into proteomics, metabolomics and so on,” he says. “Each of these biomedical data types needs a very specific approach to analyze and interpret. And each field is vast and rapidly evolving.”

“Bioinformatics is a discipline that combines computational expertise with biological expertise. Having a person who knows both to such a high level is close to impossible,” he argues.

VUGENE has worked to solve this. The company has now built a platform capable of taking biomedical data of all types, processing it, and outputting clear, concise results.

Getting Data to Confess

VUGENE does have its competitors – there are companies out there that offer analysis tools which promise easier, faster interpretation of complex biomedical data.

But Gordevičius is quick to highlight that while many of these can deliver great results, they often fall short because they do not properly acknowledge the “inherent complexity of the data being studied and the additional hurdles presented by the experimental design.”

“Things like working out how many cell types are present in your data, or how to detect and remove or account for a batch effect in your data – these are things that you can do with just a mouse or code, but you really need to know what you are doing, which is where VUGENE comes in,” he says.

Gordevičius also notes that because VUGENE does not generate any data for researchers itself, it can be impartial in how that data is analyzed – VUGENE does not need to overfit data to prove that any machinery or laboratory equipment works.

“If you torture the data for long enough, it will confess,” he remarks. “One thing I see too often in data science is attempts to change all possible parameters during the data analysis. You have a million different ways to normalize data, and another million ways to impute the data, filter the data, remove outliers – all of which can change the outcome. If you play around with these parameters, with just the outcome in mind, you will eventually choose a set of parameters that gives the result you want.”

VUGENE avoids this by using publicly available data as a comparator – by seeing whether findings also present themselves in other datasets, VUGENE can be more confident in the results it provides to its customers.

“For any customer who comes to us with their data, we look for publicly available, similar data and apply it to a similar problem,” he explains. “We can not only look for similar patterns, but also assess what is new in our customer’s data, allowing them to make original observations.”
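The parameter-tuning trap Gordevičius describes can be illustrated with a small simulation. The sketch below is not VUGENE's method. It assumes two groups drawn from the same distribution (so there is no real effect), a handful of hypothetical preprocessing choices (`TRANSFORMS` and `TRIMS`), and a simple normal-approximation test. It compares how often one pipeline fixed in advance reports a "significant" difference with how often the best of all pipelines does.

```python
import itertools
import math
import random
from statistics import NormalDist, mean, stdev


def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical preprocessing options; each is defensible on its own.
TRANSFORMS = {"raw": lambda x: x, "log": math.log, "sqrt": math.sqrt}
TRIMS = [0, 1, 2, 3]  # number of extreme values dropped from each end


def run_pipeline(a, b, transform, trim):
    """Apply one transform/trim combination to both groups, then test."""
    def prep(values):
        vals = sorted(transform(v) for v in values)
        return vals[trim:len(vals) - trim] if trim else vals
    return p_value(prep(a), prep(b))


def simulate(n_experiments=500, n_per_group=30, alpha=0.05, seed=0):
    """Return (fixed_rate, flexible_rate) of false positives on null data."""
    rng = random.Random(seed)
    fixed_hits = flexible_hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: any "finding" is false.
        a = [rng.lognormvariate(0, 0.5) for _ in range(n_per_group)]
        b = [rng.lognormvariate(0, 0.5) for _ in range(n_per_group)]
        p_vals = [
            run_pipeline(a, b, t, k)
            for t, k in itertools.product(TRANSFORMS.values(), TRIMS)
        ]
        fixed_hits += p_vals[0] < alpha        # pre-specified: raw data, no trimming
        flexible_hits += min(p_vals) < alpha   # keep whichever pipeline "works"
    return fixed_hits / n_experiments, flexible_hits / n_experiments


fixed_rate, flexible_rate = simulate()
print(f"pre-specified pipeline: {fixed_rate:.3f}")
print(f"best of {len(TRANSFORMS) * len(TRIMS)} pipelines: {flexible_rate:.3f}")
```

Because the flexible analyst keeps the smallest p-value across every pipeline, their false-positive rate can never be lower than that of the pre-specified pipeline. Checking a finding against independent data, as described above, is one way to guard against this.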

What is AI to VUGENE?

Defining AI is tricky.

The European Commission was widely lambasted after it published the draft iteration of the EU AI Act – legislation which has imposed some curbs on the development of AI in Europe, specifically targeting sensitive applications like healthcare and defense – due to how badly it had defined AI.

Originally, AI was classed as any code making use of “statistical methods,” meaning that software calculating simple values like means, medians and modes – the kind of mathematics most ten-year-olds can do – would have become highly regulated in the bloc.

Gordevičius believes that AI is found at the cutting edge of data science. He explains that for years, data analysis tools used Random Forest methods to make predictions – a supervised learning algorithm that builds a “forest” of decision trees and merges their results to improve predictive accuracy and control overfitting. However, these methods faced a limitation: even as more data was fed into them, the predictions did not improve.
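For readers unfamiliar with the idea, the "forest" principle can be sketched in a few lines. This is a deliberately simplified stand-in, not a full Random Forest and not VUGENE's code: each "tree" is a one-split decision stump, trained on a bootstrap resample of the data with a random subset of features, and the forest merges the stumps by majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Find the single-feature threshold split with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # every value identical: fall back to a constant prediction
        label = majority(labels)
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap resamples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Merge the individual predictions by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data (made up): two well-separated clusters in two dimensions.
data_rng = random.Random(1)
rows = ([[data_rng.gauss(-2, 0.3), data_rng.gauss(-2, 0.3)] for _ in range(20)]
        + [[data_rng.gauss(2, 0.3), data_rng.gauss(2, 0.3)] for _ in range(20)])
labels = [0] * 20 + [1] * 20
forest = fit_forest(rows, labels)
print(predict_forest(forest, [-2.0, -2.0]), predict_forest(forest, [2.0, 2.0]))
```

Real implementations grow much deeper trees and average many of them, which is what helps control overfitting. The sketch only shows the resample, fit and vote structure.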

A “revolution” occurred when multilayered neural networks began to evolve.

“These deep neural networks with multiple layers have become very impressive,” he says. “At first, they weren’t very good, due to only being fed a small amount of data. However, as this amount increased, they have improved (aside from the backpropagation algorithm invented to train NNs). To me, this is what AI is – where neural networks are excelling with multiple layers of variables and analysis.”

VUGENE also makes use of agentic AI. Agentic AI is still reliant on neural networks to function – what makes it “agentic” is that you can provide it with prompts or ask it questions, a bit like ChatGPT.

These agents are used by VUGENE to automate, or make easier, the data extraction steps involved in analysis. Each piece of lab equipment, each technician, each lab will produce data that is subtly different – agentic AI can streamline any normalization without introducing further human biases.

Challenges and Highlights of Multi-omics

Even with an innovative platform, multi-omics still poses challenges.

Gordevičius points out that any omics experiment suffers from the same problem most of medical science does – small sample sizes.

But because the dimensionality of multi-omics is so high – there are a vast number of variables involved – even sample sizes in the thousands are sometimes insufficient.

“You can have an experiment with 100 samples, and people will call it big,” says Gordevičius. “1000 samples, and people will call it very big. But just with DNA methylation, we are measuring 28 million CpG sites – that’s the dimensionality of our data. You really need more data samples than you have dimensions to do things seriously.”

“The reality is that we still need to use dimensionality reduction tools from classic machine learning… a lot of the buzz around AI makes use of a large amount of data applied to a small number of dimensions… however, in biotech we still have to deal with a limited number of data samples, huge dimensionality and an enormously noisy signal.”
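A short simulation helps show why having far more dimensions than samples is dangerous. The sketch below is an illustration under stated assumptions, not a model of real methylation data: every "feature" is pure random noise, the outcome is random too, and the hypothetical function `best_chance_correlation` simply reports the strongest correlation that turns up by chance.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_chance_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and pure-noise features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# With a fixed seed, the features of a smaller run are a prefix of those in a
# larger run, so the reported maximum can only stay the same or grow.
for n_features in (10, 100, 2000):
    r = best_chance_correlation(n_samples=20, n_features=n_features)
    print(f"{n_features:>5} noise features, 20 samples: best |r| = {r:.2f}")
```

None of these features carries any signal, yet the best-looking one becomes more convincing as the number of features grows. Reducing dimensionality before modelling limits how many chances the analysis gets to find such a coincidence.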

Challenges aside, Gordevičius believes multi-omics is the only true way forward for many human sciences.

“You don’t gain a very good understanding of disease by looking into only one layer of cellular control – like epigenomics or transcriptomics. People therefore really want to question things like how does DNA methylation work with transcription, and then the metabolome,” he says.

“At VUGENE, we are asking how these different layers communicate with each other. And that seems to be a very interesting take on multi-omics in general… for example, we can observe a minor change in methylation, and a minor change in metabolite abundance, which by themselves might not mean much. But if you combine the two, you suddenly see in a certain disease that a relationship between them breaks down in sick patients.”
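The kind of pattern described in this quote can be mimicked with simulated numbers. The sketch below uses made-up data and hypothetical names and does not reproduce VUGENE's analysis: in the "healthy" group a metabolite level tracks methylation closely, while in the "sick" group the two vary independently, with only a small shift in their averages.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate_group(n, coupled, shift, rng):
    """Simulate (methylation, metabolite) values for n hypothetical patients."""
    methylation, metabolite = [], []
    for _ in range(n):
        m = rng.gauss(shift, 1.0)
        if coupled:
            b = m + rng.gauss(0, 0.3)  # metabolite follows methylation
        else:
            # Same overall spread as the coupled case, but no link to m.
            b = rng.gauss(shift, math.sqrt(1.0 + 0.3 ** 2))
        methylation.append(m)
        metabolite.append(b)
    return methylation, metabolite


rng = random.Random(0)
healthy_m, healthy_b = simulate_group(200, coupled=True, shift=0.0, rng=rng)
sick_m, sick_b = simulate_group(200, coupled=False, shift=0.1, rng=rng)

print(f"mean methylation shift: {sum(sick_m) / 200 - sum(healthy_m) / 200:+.2f}")
print(f"mean metabolite shift:  {sum(sick_b) / 200 - sum(healthy_b) / 200:+.2f}")
print(f"correlation, healthy:   {pearson(healthy_m, healthy_b):.2f}")
print(f"correlation, sick:      {pearson(sick_m, sick_b):.2f}")
```

Looked at one layer at a time, the two groups differ only slightly. The difference becomes clear when the relationship between the layers is compared.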

Turning Data into Understanding

Currently, VUGENE acts as a service provider – customers approach with a challenge they wish to solve, which VUGENE takes away to work on by itself.

“We ask them to give us a very short write-up of the research problem that they are facing, the questions that they want to be answered. And then for the description of customer samples, we usually get that through an interview,” explains Gordevičius. “So we talk to the customer, we get to know each other. This also builds VUGENE’s credibility since they want to know that we know what we are doing.”

VUGENE is evolving, however. Gordevičius explains that, due to the human back-and-forth required to build out a program of operations, set timeframes and then deliver results, projects currently revolve around one-month milestones.

Behind the scenes, though, VUGENE’s analysis is much faster. In time, the company wishes to transition to become more of a software-as-a-service provider – customers will be able to access a VUGENE portal, input their data, discuss the outcomes with VUGENE staff or agentic AI, and receive results in just a few hours.

“As the field of data science changes, so do our customers’ needs. We want VUGENE to become recognized as the trusted first stop for anyone conducting complex bioinformatics studies,” concludes Gordevičius.

Photo credits: Andrej Vasilenko
Written by: Barnaby Pickering