How we handle language in the metaverse could set the tone for its future
“Heads up. Conversations like this can be intense. Don’t forget the human behind the screen.”
Twitter’s dialogue warning is the latest move in a long-running battle to help us be more civil with each other online. Perhaps more troubling is the fact that we are training large-scale AI language models on data from online conversations that are often toxic. No wonder we see those prejudices returned to us in machine-generated language. What if, as we build the metaverse (effectively the next version of the web), we used AI to filter out toxic dialogue for good?
A Facetune for language?
Researchers are already doing a lot to fine-tune the accuracy of AI language models. In multilingual translation models, for example, a human in the loop can make a huge difference. Human editors can verify that cultural nuances are correctly reflected in a translation and, in doing so, train the algorithm to avoid similar errors in the future. Think of humans as a fine-tuning mechanism for our AI systems.
If you imagine the metaverse as a kind of scaled-up SimCity, this type of AI translation could instantly make us all multilingual when we talk to each other. A borderless society could level the playing field for people (and their avatars) who speak less common languages and potentially foster better cross-cultural understanding. It could even open up new opportunities for international trade.
Still, using AI as a Facetune for language raises serious ethical questions. Yes, we can introduce some control over the style of language, flag cases where the models don’t work as expected, or even change the literal meaning. But how far is too far? How can we continue to encourage diversity of opinion while limiting abusive or offensive remarks and behavior?
A framework for algorithmic fairness
One way to make language algorithms less biased is to train them on synthetic data in addition to data from the open internet. Synthetic data can be generated from relatively small “real” data sets.
Synthetic datasets can be created to reflect the real-world population (not just the loudest talkers on the internet). It is relatively easy to see where the statistical properties of a certain dataset are out of whack and therefore where synthetic data could be better deployed.
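To make the idea of spotting skewed statistical properties concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how any particular system works: the function names, the language codes, and the target shares are all hypothetical, and the approach (comparing each group's share of a dataset with a chosen target share, then counting how many synthetic examples would close the gap without discarding real data) is just one simple way to frame the problem.

```python
from typing import Dict


def representation_gap(counts: Dict[str, int],
                       target_shares: Dict[str, float]) -> Dict[str, float]:
    """Observed share minus target share for each group.

    Positive values mean the group is over-represented in the data,
    negative values mean it is under-represented.
    """
    total = sum(counts.values())
    return {g: counts[g] / total - target_shares[g] for g in target_shares}


def synthetic_examples_needed(counts: Dict[str, int],
                              target_shares: Dict[str, float]) -> Dict[str, int]:
    """Synthetic examples to add per group so every group reaches its target share.

    Assumes target shares are positive and sum to 1. No real example is
    removed: the dataset grows until the most over-represented group
    shrinks to its target share.
    """
    # Smallest dataset size at which every group's real count fits its target share.
    new_total = max(counts[g] / target_shares[g] for g in target_shares)
    return {
        g: max(0, round(target_shares[g] * new_total) - counts[g])
        for g in target_shares
    }


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not real statistics.
    scraped = {"en": 800, "pt": 150, "sw": 50}       # examples per language
    target = {"en": 0.5, "pt": 0.3, "sw": 0.2}       # desired share per language

    print(representation_gap(scraped, target))
    print(synthetic_examples_needed(scraped, target))
```

In this toy setup the over-represented language needs no additions, while the two under-represented ones are topped up with synthetic examples. Choosing the target shares is the hard, value-laden part, and that is a question the code cannot answer.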
All of this raises the question: will synthetic data play a critical role in creating fair and equitable virtual worlds? Could our decisions in the metaverse even affect how we think and speak in the real world? If the endgame of these technological decisions is a more civil global discourse that helps us understand each other, synthetic data could be worth its weight in algorithmic gold.
Yet, tempting as it is to think that we can press a button and improve behavior to build a virtual world in a whole new image, this is not a question for technologists alone. It is unclear whether corporations, governments, or individuals will control the rules governing fairness and standards of behavior in the metaverse. With so many conflicting interests in the mix, it would be wise to listen to leading tech experts and consumer advocates on how to proceed. It is perhaps delusional to assume that all competing interests will form a collaborative consortium, but it is imperative that we create one so that the discussion about unbiased language AI can happen now. Every year of inaction means that dozens, if not hundreds, of metaverses will have to be retrofitted to meet whatever standards eventually emerge. The questions surrounding what it means to have a truly accessible virtual ecosystem need to be discussed now, before the mass adoption of the metaverse, which will be here before we know it.
Vasco Pedro is co-founder and CEO of the AI-powered language operations platform Unbabel. He has spent more than a decade in academic research focused on language technologies and previously worked at Siemens and Google, where he helped develop technologies to better understand data computation and language.