Machine learning is starting to take over the economy the same way the internet already has. A variety of newly available tools have caused a great deal of controversy in art circles and free software communities. These AI-powered tools are effectively game-changers in the domains they disrupt.
Hailed by techno-utopians of all stripes, these advancements are a huge part of driving the so-called Fourth Industrial Revolution. If we are not careful, we will run straight into the trap of allowing these tools to be weaponized against everyone. That is why understanding how these tools work, at least on a basic level, is important.
You may have heard the phrase “data is the new oil”. That “oil” is fuel for massively powerful machine learning algorithms. No publicly (or easily) accessible data, media, or information is safe from the “data beast”. The data beast is hungry for any and all information; no matter how trivial or private, it wants everything. That’s because every addition of well-collected information is invaluable in building a wide array of powerful machine learning models.
What is a machine learning ‘model’?
A model is the sum total of all the insights a machine learning system “learns” while analyzing large amounts of data. The larger and more complicated the model, the more sophisticated the problems it can be used to solve. For example, imagine you wanted to be an Olympic athlete but hadn’t decided on a sport. An AI could use a model trained on data from every previous athlete, such as their physical attributes, record, and diet, to give you the “life plan” it calculates is most likely to succeed.
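To make that concrete, here’s a minimal sketch using scikit-learn. The athlete numbers are entirely invented for illustration; the point is that training distills raw data into a small set of learned weights, and those weights are the model.

```python
# A minimal sketch of what a "model" is: a set of learned parameters.
# The athlete data below is made up purely for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [height_cm, weight_kg, training_hours_per_week]
athletes = [
    [185, 90, 30],   # rower
    [170, 60, 25],   # distance runner
    [160, 55, 28],   # gymnast
    [190, 95, 32],   # rower
    [168, 58, 26],   # distance runner
    [158, 52, 30],   # gymnast
]
sports = ["rowing", "running", "gymnastics",
          "rowing", "running", "gymnastics"]

model = LogisticRegression(max_iter=1000)
model.fit(athletes, sports)  # "training" distills the data into weights

# The fitted coefficients ARE the model; the raw data can now be discarded.
print(model.predict([[182, 85, 31]]))  # e.g. ['rowing']
```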
Models are “trained” by analyzing massive, well-categorized datasets. This means that the raw data itself often isn’t enough: it needs to be carefully curated into a highly detailed and correct set of information, and it’s quite easy to fall short of that. Any deviation from reality will distort a model’s outputs, which gives rise to concerns about AI models treating people unfairly. A model trained on data we collect effectively holds a mirror up to ourselves: whatever small slice of reality we simplify and package for training is how the model sees the world.
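Here’s a toy illustration of that mirror effect, again with made-up numbers: a sampling flaw in the training data quietly becomes a rule the model confidently applies to everyone.

```python
# A toy example of "the model mirrors its data": a flaw in how the training
# set was collected becomes a hard rule in the trained model.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical loan records: [income_k, neighborhood_id].
# By accident of collection, every record from neighborhood 2 is a rejection,
# so the tree learns "neighborhood 2 => reject" regardless of income.
X = [[95, 1], [80, 1], [60, 1], [85, 2], [90, 2]]
y = ["approve", "approve", "approve", "reject", "reject"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# A high earner from neighborhood 2 is rejected purely because of the
# distorted slice of reality the model was trained on.
print(model.predict([[120, 2]]))  # ['reject']
```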
The only way to fully solve that problem would be a highly invasive, all-pervasive surveillance and control system that tracks nearly every atom in the area of concern. Besides being highly impractical, it’s highly unlikely people would knowingly choose that kind of world. It’s what we unknowingly choose that is the problem: what systems are we building by refusing to take an active role in shaping the future?
Employee of the month: Terminator
Enormous resources are invested in understanding how robots can recognize, track, and destroy things.
When it comes to the workforce, soldiers and police may be the most at risk of being automated away entirely. Such a shift could be used to eliminate any trace of accountability or morality in such institutions.
AI has “learned to code”
This demo is quite impressive. The presenter uses GitHub Copilot to build a simple program with spoken words instead of typing. It’s initially quite striking how big a win this kind of technique is for making development more accessible. It is fair to be skeptical; live tech demonstrations are often highly staged to smooth over real-life issues. That said, this specific goal does not seem very far away.
Are software developers obsolete now? Likely not for a while; even using these tools requires some technical skill and understanding. It’s also much harder to make guarantees about the safety or security of systems that weren’t carefully designed by humans. That’s not to say powerful analysis tools couldn’t be built from these models to strengthen security.
The dark side
GitHub is what’s known as a code forge: a place for developers to host their code and collaborate on improving it. Different forges have different features, but GitHub, owned by Microsoft, is the largest and most popular. Developers often choose to distribute their code under various licenses, which dictate the terms under which the code can be used in other software projects.
Code is just information, and nothing has stopped GitHub from training Copilot on all the data on their platform, and potentially other publicly accessible repositories as well. It is arguable that Microsoft has violated the licenses of many of these projects; for those who use Copilot to make new projects, the situation is not clear at all.
Can a robot create a beautiful masterpiece?
Gearheads aren’t alone; creatives are now struggling with the same issues. DeviantArt, a large platform for sharing artistic works, announced DreamUp, their equivalent of tools like Stable Diffusion. Trained on a massive collection of categorized creative works, the model can be used to create entirely new works.
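DreamUp’s internals aren’t public, so the following is only a sketch of the general technique using an open model via the Hugging Face diffusers library. The model name and prompt are just examples, not anything DeviantArt actually uses.

```python
# A minimal sketch of text-to-image generation with an open model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # weights trained on scraped images
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # a GPU is effectively required for reasonable speed

# Everything the model can draw was distilled from its training set,
# which is exactly what the artists' questions below are about.
image = pipe("a lighthouse in a thunderstorm, oil painting").images[0]
image.save("lighthouse.png")
```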
The artists raised some concerns about how this was done:
Did DeviantArt train their model on artists’ art without explicit permission?
How can artists opt out of such data-mining?
Will DeviantArt remove artists’ art from the model or dataset(s) if the artist did not wish for their works to be used for training AI?
Concerns
As you can see, these problems are highly generalizable. Sure, it’s programmers and digital artists today, but how far away are authors, musicians, or even lawyers from being assimilated? It is tempting to nit-pick particular implementations and say that the flaws of today are inherent fixtures of AI problem solving. This is naive, because it is hard to predict which problems will be solved quickly and which will take ages.
Copyright & Trademark concerns
Is using publicly accessible content to train AIs a derivative work? Are models fair use? Even if one is fully within their rights to use an artist’s works to train a model, what about use of the artist’s name in the creation of new works? If someone trained a model on your voice and published songs of you singing, should you have any right to demand compensation or restrict how it’s used?
Asymmetric advantages
It is naturally impossible to know who is making models out of what data unless those entities choose to disclose that information. This means there is no way of knowing how much of your data has been used to train the models used to manipulate you. What recourse will you really have when violations take place in secret?
Large platforms have many inherent advantages over smaller projects, even when there isn’t any malice or bad behavior. Amazon has been caught using its platform to produce knock-offs of viable products from smaller sellers.
Is machine learning too powerful not to use? In any competitive environment, can you really afford not to use performance-enhancing tools, regardless of the ethical or legal considerations? How can people and organizations set conditions that don’t incentivize massive fraud and privacy violations?
Workforce displacement
Since we’re already seeing problems in education, the wider workforce likely isn’t far behind. Just because your entire profession can’t be automated doesn’t mean machine learning can’t simply be used to assimilate away the most viable parts of your business model. Even despite a so-called labor shortage, workers are having a difficult time negotiating higher wages and better working conditions.
Cybernetics, novel biologics, and performance-enhancing drugs are all things people may feel forced to take on to compete in a rapidly changing work environment. While we should be concerned with the consolidation of state power through CBDCs, we must also be vigilant against corporations using the very same mechanisms to control people. We do not have the luxury of passively allowing “futurists” to decide what they’re willing to have us all sacrifice for their own greater wealth. As this continues, the road ahead looks quite grim indeed. The following video is about a fictional setting in the game Warframe; it’s a warning worth heeding.
Ethics concerns
When it comes to surveillance and algorithmic manipulation, much of the work takes place out of sight and out of mind. There are real questions of ownership, rights, and autonomy when it comes to how personal and private information is used. It’s important not to be defeatist on this point. Just because information can be used in particular ways doesn’t mean we can’t recognize such use as a crime. A great many terrible things are nearly impossible to stop; the solution isn’t to accept them but to create a proper framework for addressing them.
The bright side
It’s not all bleak and terrible; there is good news in this story too. Just as these tools are powerful, they can be used for good. At its core, machine learning is a useful way to extract actionable information from massive datasets. Creating large, accurate, ethically-sourced public datasets can open fascinating new possibilities.
New public goods
Mozilla Common Voice is a project where people collaborate on creating a massive, diverse, high-quality voice dataset. This can be used to power useful models like Text to Speech for all and DeepSpeech, which you can use to create your own voice-controlled applications. There is also a very cute demo of someone using Stable Diffusion to place an AR elephant in a tree. Despite its silliness, it shows how effective this technology can be in creating new and fascinating experiences.
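As a rough sketch of what building on these datasets looks like, here is the basic DeepSpeech transcription flow in Python. The file names are placeholders for the model files published on the DeepSpeech releases page, and the audio file is assumed to exist.

```python
# A minimal sketch of offline speech-to-text with Mozilla's DeepSpeech
# (trained in part on Common Voice data).
import wave
import numpy as np
import deepspeech

model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

# DeepSpeech expects 16-bit, 16 kHz, mono PCM audio.
with wave.open("command.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # e.g. "turn on the kitchen lights"
```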
Of course, there are tons of medical advances based on using machine learning to recognize patterns. These are fantastic for things like diagnosing issues from x-rays or other data.
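For a sense of what these pattern-recognizers look like under the hood, here’s a toy convolutional network in PyTorch. To be clear, this is an illustration of the technique only, nowhere near a validated diagnostic system.

```python
# A toy sketch of the pattern-recognition idea behind medical imaging models:
# a small convolutional network classifying grayscale scans as normal/abnormal.
import torch
import torch.nn as nn

class TinyScanClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 64x64 -> 32x32
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, 2),       # logits: normal vs. abnormal
        )

    def forward(self, x):
        return self.net(x)

model = TinyScanClassifier()
fake_batch = torch.randn(4, 1, 64, 64)  # stand-in for preprocessed x-rays
print(model(fake_batch).shape)          # torch.Size([4, 2])
```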
To reiterate: machine learning clearly isn’t evil; the impetus to spy on everyone in order to dominate them absolutely is. There are responsible ways to collect, store, and curate data, and they need to be enforced at a minimum, and ideally actively supported.
Taking action
Fundamentally, we the people have to choose: either allow our collective destiny to be dictated to us by those who have no regard for humanity, or take on the great effort of wrestling control back into our own hands. As radical as this sounds, it doesn’t require a massive shift.
It has never been more urgent to stop contributing to tyrannical systems of control. If you’re involved in making top-down control a reality, I urge you to reconsider the downsides of building a future you likely don’t want to live in. It is high time we not only step away from top-down control, but assist others in doing so.
On a higher level, it’s worth coming up with new standards and approaches to how information is collected, stored, and shared. Anything we can do to break away from the cloud goes a long way in mitigating nefarious technocratic forces. Storing your data yourself goes a long way toward limiting how it can be used against you.