Invalid URL - Google's Flagship AI Product is Saying Slurs

A screenshot of Gemini returning an invalid URL response.

I don't remember the first time I heard the word invalid (pronounced in-val-id). But I do remember the first time I heard the word invalid (pronounced in-vuh-lid).

I was in set two for mathematics at school. Set one had all the wunderkind, set four had people who struggled with basic arithmetic operations, let alone algebra, while set three had all the kids who would grow up to be unremarkable at maths. If I had to describe the average set two student, it would be brilliant but lazy. When it mattered, we could perform well on exams, but very few of us put in the effort during class. When a child needed help, the teacher would be effectively dead to the world, concentrating on my fellow pupils who actually gave a shit. I have a distinct memory of throwing a rugby ball across the classroom in an impromptu game of catch, which lasted for the rest of the lesson.

One day, this poor teacher thought she would book the laptop trolley (a cart of school-owned laptops that allowed computer work in classrooms without computers) and host an interactive quiz. My memory goes a bit hazy here; it was pre-Kahoot, and you had to manually type in the answer. If you typed in something that was not a number - causing the input validation to fail - it would simply return invalid.

As I mentioned before, the average set two student was brilliant but lazy. We wanted to go back to our game of catch. So rather than participating in the quiz, a few students mucked up their answers and had a quiet 30 seconds to themselves. The teacher quickly became irate, and after the third or fourth invalid answer from a small group of students, she started naming and shaming. This was her first mistake.

The thing about being barely a teenager in a British school is that we found this extremely funny. Growing up as a Brit watching media with American schools really highlights how savage the UK education system can be. American bullies seem cartoonishly slapstick compared with British bullies. Much like the dog breed which shares the same name, they tend to snap, going straight for the jugular. This also extended to teachers...

The group of students rebelling against the game the teacher had prepared quickly doubled in size - as did her reaction. She was British, but had spent time in Australia and therefore had an Australian accent. We had a lesson which covered some type of data science (set two; I was probably asleep and therefore cannot remember). Despite having an Australian mother, and being an Australian citizen myself (undocumented - that is another story), the one thing I learnt that day was the funny way they pronounced "dar-ta".

She stood up from her desk and pointed at one of the pupils who originally started the anarchy: Stan, I believe. This is where she made her second mistake.

"Stan, In-vuh-lid" she roared, arm outstretched and her finger straightened to the point it looked like it hurt. The way she said invalid didn't sound right. The class erupted in laughter. Her arm fixed on her next victim. "Invalid", she repeated. The laughter continued, growing louder as she made her way around all the troublemakers. Once the room had diminished to a low grumble of weak chuckles, someone told her the mistake; or at least told her what the class had perceived the mistake to be.

"Miss, it is pronounced in-VAL-id, not in-vul-id", a student pointed out, as the laughter slowly rose again.

"Well, in Australia, we say in-vul-id". As I previously mentioned, I am an Australian citizen through my mother's side and had never heard invalid pronounced that way. Another student, one of the ones who gave a shit, raised their hand and made the situation even funnier to those who had been misbehaving.

"Miss, and in-vul-id is a disabled person. You have called half the class disabled". Her face remained stern, and after a few seconds offered a retort.

"Well, I am going to keep saying it the way I say it". To this day, I am unsure whether she was ignorant, or taking advantage of a situation to enact revenge on the little shits who had tortured her for daring to derail her game which was ultimately designed to help us. I live in a small town, and still occasionally bump into this teacher. Knowing her as a fellow adult, rather than as an elder in a position of power, makes me lean towards the former. Or knowing more about the effects words can have on people since maturing, I hope it is the former.

Galaxy AI

Sunglasses, Top Hats, Toasters & Timothée Chalamet

Fast forward twelve years. My Google Pixel 7 had encountered a flaw in which the USB-C port failed. The remedy was to rely on wireless charging, which led to a much more serious and fatal flaw: the extra heat compromised the adhesive, causing the screen to detach from its housing. This was the third Pixel I had had fail through no fault of my own, and it perfectly symbolises what Google currently stands for.

A WhatsApp chat from when my phone broke, taken through a laptop camera. The phone failed shortly after, before it could be fully backed up, so this is the only image I could find.

After months of dual-SIMing on my work iPhone, and frankly being sick of being limited in what I could do on my phone due to a mixture of Apple's closed ecosystem and professional conduct preventing me from doing anything fun, I finally received a new phone today. I enter my late twenties in six days, and generally have not been having a great time in life recently, so I thought I would treat myself to an early birthday present.

I opted for a Samsung Galaxy S25 Ultra, mainly because it is eligible for a free Chromebook worth nearly triple my contract buyout for the broken phone. A friend of mine got one shortly after release and demonstrated some AI features to me.

"You can take the pen and draw glasses on yourself, and the AI will put them on you", he said to me while we were sitting at our usual table in the pub beer garden. He demonstrates on our mutual friend's docile Cane Corso. "Look, Bonnie has sunglasses now". I start muttering my disapproval, but as the only white-collar individual in a blue-collar friendship group, my complaints go ignored. "You work with computers, I drive a van. I don't think this feature is for you. You know too much". Just typing this article gives me a huge amount of dopamine, as my myopic view of AI often goes ignored by the people I care the most about.

"Let me have a go", I say, seeing if I was just missing the fun of the gimmick. I take a selfie, draw some glasses on myself and hit generate. "Can't generate due to location of drawing", I confirm to my friend, who looks baffled, fifteen seconds with a cyber security expert broke his new favourite toy.

"Try again", he says in an annoyed but earnest tone. I drew the glasses again. “Can't generate due to location of drawing". I draw a mustache. Can't generate due to location of drawing". I give it one last try with a baseball cap crudely drawn over my selfie. "Can't generate due to location of drawing".

"You just don't know how to use it" he says in a depressed tone. "Let me try". He proceeds to draw a baseball cap (much more proficiently that my attempt) atop a birds nest of curly hair. The AI then proceeds to morph the hair into an eldritch monster of curls, making me look like a male Marge Simpson.

"Must be the light from the heater", I reassured him, knowing it is just another abomination to be churned out of the digital sweatshop of big tech. It was the middle of winter, so there was no way we were going to turn the heater off to test the theory.

"Yeah, must be", he solemnly replies, as he lights a cigarette and closes the camera app for the final time that night. "Must Be..."

Fast forward to today. I have my own shiny new handset, so I can test the theory. I take a selfie and draw glasses on myself. "Can't generate due to location of drawing". Strange. I try a few more times, getting the same message each time. Finally, I draw a top hat on my un-groomed head.

As someone with curly hair, I find creating a character in an RPG extremely difficult due to oversight from the designers. I no longer attempt to recreate myself because more often than not I can't; there are never any curly hair options in games, and when there are, they are more akin to Black hairstyles than the broccoli hair of Gen Z (note - the curly hair is not a choice, it grows that way).

Working in tech, I know about the biases in training data, and I know some firms are working to address them - or at least were, before they cosied up to Trump and abandoned any pretence of giving a shit. Google accidentally generated images of Black Nazis when Gemini first launched. Surely they, or Samsung, have curly hair down by now?

The resulting image turned my tight curls into messy, wavy hair. More Timothée Chalamet than Jeremy Clarkson. But that was not the biggest issue. That would be the toaster that Galaxy AI put on top of my head instead of a top hat.

I chuckled to myself, posted the result to Bluesky, and sent a sarcastic message to the friend who had introduced me to Galaxy AI down the pub.

A WhatsApp chat with my friend who introduced me to Galaxy AI. Note: the lack of capitals, and the addition of capitals where they shouldn't be, are indeed a result of the handwriting feature of Galaxy AI being really shit.

Gemini

It is either Australian, or hates disabled people

Over fifteen hundred words later, we get to the main reason for this article. As I had migrated from an iPhone, which was a work phone, I had to manually set everything up from scratch. This is where I tried to use Gemini to speed up the process.

"Gemini, install WhatsApp". Gemini waited a few seconds, opened the Google Play Store homepage and did nothing else. I closed the Play Store and asked Gemini again. "Gemini, install the WhatsApp app". Either the change in prompt or closing the app store fixed the issue, and it was indeed faster than manually navigating to the store page, despite the false start.

I continued doing this with the apps I deemed essential. Then I ran into an issue. I assume Gemini hallucinated the URL for the app I was trying to install, because it gave me the error it had encountered in the default male British accent:

"Invalid URL"

My mind immediately returned to the mathematics classroom I had not inhabited in over a decade. As mentioned earlier, I had become much more conscious of the effects of language, purely as a side-effect of living in a world which was slowly, now rapidly, becoming increasingly fucked up. I find the word invalid in reference to disabled people abhorrent: it literally calls those with a disability invalid. Especially as, at the time of writing, both the UK and US governments are finding new ways to torture those with gender dysphoria, autism, or any other signs of being neurodivergent.


A video demonstrating the wrong pronunciation of invalid, resulting in Gemini saying a slur.

I had just bought a handset which is currently the most expensive model in stock, on which one tech giant had asked another tech giant to rip out an already working assistant in favour of one which cannot say a very common word (especially in computing) in a way in which it is not a slur. It is hard not to let that taint your impression of the technology, and of how it is being rolled out.

Early Generative AI

My Experience Building Generative AI in 2019

When it came to AI, specifically large language models, I was already pretty jaded. But it was not always this way. Before I went into cybersecurity, I wanted to be an AI engineer.

My dissertation at university used machine learning (recurrent neural networks, specifically) to compose piano concertos. At the time, I was in a band, and music was a hobby of mine, so it was a good project for a dissertation.

Crafting tools to manipulate the training data was a thrill, summarising a piece of music as two three-dimensional arrays of integers to feed to my screaming Nvidia GTX 1070. Furthermore, it gave me ample time to go down to the pub. Whenever someone asked why I wasn't working on my dissertation, I could just say it was a very complex project, and that my computer was going to finish what it needed to do in 18 hours. It was true. Mostly. Brilliant, but lazy...
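For the curious, something like the sketch below captures the shape of that encoding. It is a minimal illustration rather than my original dissertation code: it assumes the pretty_midi and numpy libraries, and the filename, sampling rate and window length are all placeholders.

```python
import numpy as np
import pretty_midi

# Load a public-domain MIDI file and flatten it into a piano roll:
# a (128 pitches x timesteps) grid of note velocities, sampled at a
# fixed rate. The filename and rate are illustrative placeholders.
midi = pretty_midi.PrettyMIDI("concerto.mid")
roll = midi.get_piano_roll(fs=16)  # shape: (128, timesteps)

# Slice the roll into fixed-length windows the network can digest,
# yielding a three-dimensional array of (samples, timesteps, pitches).
window = 64
slices = [
    roll[:, i:i + window].T
    for i in range(0, roll.shape[1] - window, window)
]
X = np.stack(slices).astype(np.int32)
print(X.shape)  # e.g. (samples, 64, 128)
```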

I have always been ashamed of that project. I am a perfectionist when it comes to code I write. Great in theory, but extremely debilitating in terms of getting things done. The RNN would start off with a choppy but tuneful melody based on the public domain piano concerto MIDI files I fed it. But as it made mistakes, and more importantly remembered those mistakes as though they were the correct notes, it would slowly get stuck in loops. In the end, the probabilities shifted in such a way that it would play a single note endlessly.
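You can reproduce that failure mode with a toy example. The sketch below is not my model - just a hand-built Markov chain with one "sticky" note - but it shows how a melody collapses once the probabilities tilt towards a single state.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "model": a transition matrix over four notes. Rows are the
# current note, columns the probability of each next note. Note 2 is
# sticky - once the model lands on it, it almost always stays there.
transitions = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.20, 0.10, 0.60, 0.10],
    [0.02, 0.02, 0.94, 0.02],
    [0.30, 0.30, 0.20, 0.20],
])

note = 0
melody = [note]
for _ in range(32):
    note = rng.choice(4, p=transitions[note])
    melody.append(note)

print(melody)  # a tuneful start, then the sequence settles on note 2
```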

I have revisited one of the MIDI files, which has been sitting there for six years (almost to the day; the file was created on the 28th of April 2019). Hearing it brought me joy. I don't think I did a bad job, not for a single twenty-year-old fuelled by coffee and Magners cider. The main issues were the academic process of having to choose random samples to evaluate and getting a bad draw, as well as the time it took to retrain the model on my plucky little GPU. I can finally put it to bed. I have left a clip below to listen to.

The Looping Piano - Created by an RNN Created by Me

I believe this highlights an issue with the AI engineer mindset. AI seems like magic, but it is just a collection of probabilities. It takes skill to write a good machine learning model. I have been far too hard on myself for what I perceived as putting shit into a model and getting shit out. Having tormented myself over that supposed failure, I now know how I would fix it: adjust the long short-term memory (LSTM) layers to selectively drop errors from previously generated notes in the sequence, and optimise the neural network which handled note timings. I just needed time.
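For context, the baseline I am describing looks roughly like the sketch below. It is a hedged reconstruction rather than my dissertation code: it assumes TensorFlow/Keras, and the layer sizes and input shape are illustrative.

```python
import tensorflow as tf

# A minimal next-note prediction network: stacked LSTMs over piano-roll
# windows, with a softmax over the 128 MIDI pitches. Sizes are placeholders.
vocab = 128
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, vocab)),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(vocab, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

Generation then becomes a loop: sample a note from the softmax, append it to the window, and feed the window back in - which is exactly where remembered mistakes compound.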

AI engineers have the opposite problem. They still believe it is magic. They strive to create a model bigger and better than the last, iterating, rather than taking a step back and reassessing the problem they are trying to solve.

Google, Microsoft, OpenAI, Anthropic, Meta et al seem to be simply shoving more data into their models, then parroting "artificial general intelligence by 2027". It is extremely obvious that this is the strategy: there are now entire services for generating synthetic training data for large language models. The models which generate this data are often specialised, so at least they are avoiding training a supposedly superior model on data generated by an inferior one. But the approach clearly has diminishing returns.

Since generative AI hit the market, it has not improved. The ever-larger datasets have skewed the probabilities so much that we are seeing regression, at least from the perspective of the end user. In the few years it has been around, big tech have failed to make it profitable, and are using the promise of AGI and general improvements to keep the gravy train running.

It Gets Worse...

Google Revisionism

The heart I put into modelling that data seems to be lacking in the industry. These companies don't seem to be refining the data; instead they chuck the entirety of human history at it and create Lovecraftian monsters. LLMs are extremely good search engines. They have practically indexed the entire internet and are very good at retrieving information buried under one hundred and seventeen Google search result pages.

Without the proper curation of these tools, we end up with slop. When I heard Gemini inadvertently say a slur, it highlighted the lack of care which goes into these tools. I asked Gemini the difference between the two pronunciations. Only, it had swapped them. My teacher had been right, and I had been saying slurs all my life. I was in such disbelief, I assumed I was hallucinating; that I had misremembered the anecdote at the start of this article. So I Googled it. This is what the AI summary returned:

Google's AI Overview feature assuring me that both definitions of the word are pronounced the same way.

Google suggesting that both versions of the word are pronounced the same way actually reassured me: it had now messed up twice, so I was likely correct. The first proper result confirmed this. Furthermore, the UK's digital inclusion service also confirms that it is inappropriate to call people invalids.

Why Does Any Of This Matter?

Gemini didn't mean it

With large language models, people need to understand that they do not understand English. They understand how to construct a sentence based on the probabilities of words and phrases fitting together. Big tech can put in guardrails, but the model's probabilities can still carry it outside them.
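To make that concrete, here is a toy illustration of next-word selection. The words and probabilities are invented for the example; real models score tens of thousands of candidate tokens at every step.

```python
import numpy as np

rng = np.random.default_rng()

# A made-up distribution over the next word after "the URL is".
# The model has no concept of what "invalid" means to a disabled
# person; it simply samples whichever token the probabilities favour.
candidates = ["invalid", "broken", "unreachable", "malformed"]
probs = np.array([0.70, 0.15, 0.10, 0.05])

next_word = rng.choice(candidates, p=probs)
print(f"the URL is {next_word}")
```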

The point is not that it pronounced a word slightly wrong and accidentally said a slur; the review of Galaxy AI, as well as the Google Search results, shows there are much bigger issues. The point is that it can say it at all, albeit accidentally. I asked Gemini to say the word "cripple", another word which can be used as a noun or an adjective. The guardrails kicked in and explained to me that it is a hurtful word. But the AI voice still read it aloud. This example asked it to explicitly reference the word with no context, but what would it take for the probabilities to shift and for Gemini to say something hateful à la Microsoft's Tay AI from the 2010s?

We have multiple AI systems interacting with each other: one to take what I say aloud and turn it into a text prompt for Gemini, Gemini itself creating the response, and a third to synthesise the response back into audible speech. The last one is quite indefensible, as Google have had profanity filters for years in Google Assistant (which Gemini is replacing), and profanity filtering is a feature of their cloud speech client libraries.
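As an illustration of how little is needed, Google's Cloud Speech-to-Text API exposes that filter as a single flag on the recognition config. The sketch below assumes the google-cloud-speech Python client; the bucket URI is a placeholder.

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-GB",
    profanity_filter=True,  # masks profanities, e.g. "f***", in transcripts
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/clip.wav")

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```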

Gemini also read me the lyrics to Green Day's American Idiot, which contains a slur for homosexual men in its second verse. I did not ask it for American Idiot; I asked it for the lyrics to the bridge of Green Day's Holiday, which contains the same slur. I had given it the lyrics preceding the bridge, so it had the highest probabilistic chance of reading the correct lines. Instead, an unhappy accident retrieved the lyrics of another song containing the slur. I gave it more data, and it got confused.

All of this highlights the sloppiness of AI, and of the companies running it. AI content is not slop because it is pig feed; it is slop due to the nature of its creators.

It has been suggested that Trump's tariffs were generated by AI. If true, AI has done untold damage to people's lives. Those about to retire have had their pensions fall, livelihoods have been destroyed, and it is contributing to the wider destabilisation of the world. I dread to think how much energy was consumed in my testing, just to generate a toaster on my straightened hair and have Gemini accidentally say slurs to me. Boiling the ocean due to the sloppiness of big tech.

If the equally probable reality is that the tariffs were drafted by a grossly incompetent member of Trump's team, ask yourself this: has hallucinated AI content ever been enacted in legislation? And given that legislation often references other pieces of work, even where a law was not directly written by AI, were its references, or the references within those references? The answer to both questions is almost certainly yes.

I have one final point before the final section, which will champion some positive outcomes of AI. As big tech barrels towards creating artificial general intelligence, has anyone considered artificial emotional intelligence?

The effect of AI's actions can be as small as dehumanisation through mistakenly pronouncing a word in a way which makes it a slur, or erasing a genetic trait like curly hair because it is atypical, for a user who has already had social issues because of it (people sometimes touch my hair without asking because it "looks fluffy", fucking weirdos). It can be more serious when AI escapes its guardrails, is used for malicious purposes, or hallucinates something that causes real-world harm.

Writing this, I am reminded of a talk I attended at a security conference. A penetration tester had finished his talk and moved into a Q&A session. Someone asked about extorting employees through their families to simulate insider threats, with the rationale that real attackers are allowed to do it, so the "red team" should be allowed to do it too. Red team is the name given to offensive security teams, with defensive security teams being the blue team. I have observed that education organisations often prefer "ethical hacker", a term the corporate world has dropped, but that is a whole other article about ethics in technology and the militarisation of cybersecurity. I looked at the speaker and could read his thoughts.

"What the fuck is wrong with you", I imagined him thinking, before he gave a meek answer which went along the lines of "No. Just... No. We don't do that. We are not them". He obviously took this advice, as I haven't heard of any employment tribunals involving unethical ethical hackers.

Among humans, we have those who do not consider the impact of their actions on others. Selfish, sociopathic and narcissistic are all adjectives conveying various degrees of a lack of empathy. People who can be described by these words are almost certainly decision makers at these big tech organisations. The current arms race towards AGI means organisations often forgo empathy to maintain a competitive advantage. If humans often lack it, can we really expect a system designed to mimic us, trained on billions of lifetimes of human work, and ultimately having its efficacy measured by a myriad of apathetic frameworks, to have empathy?

Where Are We Now?

The Bright Future of AI

AI still has its uses. I have seen LLMs do some really impressive things, but those models are often trained on much smaller, neater, more specific datasets. The key is to divide and conquer. Champions of agentic AI see this as the future: lots of smaller AI models dedicated to specific tasks.

Machine learning has been around much longer than LLMs, and it has excelled in the applications where it has been used. In cybersecurity, we have heuristic detection: classifications that would take an obscene amount of time to compute exactly can be approximated by a neural network, to very high accuracy, in a fraction of a second. In fields where a delayed response could cause irreparable damage, AI's heuristic nature lends itself well.

The world's first exaflop computer wasn't a computer at all. It was the Folding@home network, a distributed computing project which simulates protein folding to aid virologists in studying diseases like COVID-19.

Big tech seems to have seen what AI can do and devised a business plan around it which assumes infinite growth, which is impossible. If big tech can follow in DeepSeek's footsteps and create models which don't boil the oceans, so they can lay off most of their workforce, perhaps that becomes possible. But I have seen no evidence that it is, and the AI experts I have talked to haven't seen any either.

The future is talented data scientists devising new ways to build probabilistic models which fit on consumer-grade graphics cards, and running with them to build something beautiful.

Have fun and look after each other.

Edit: I forgot to recommend a book which helped me build my RNN: Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville. A good read if you want to understand the mathematics behind deep learning.