The back stories of Gen AI

Jun 25 2024

I became fascinated by generative art a long time ago, years and years ago. It may have begun around 2008, when my then academic leader Jenny Collier took me to Simon Schofield's exhibition, I think somewhere in UCL. Simon was an academic at London Met in the Multimedia school who left shortly afterwards to work at Nottingham Trent. I'd already seen community-built 'generative' art experiences based in Macromedia Director, a bit like an endless journey into virtual spaces, each space built by someone different, very trippy, but Simon's work introduced textural, patterned code art. His was very process intensive, at very high resolution. The images in the exhibition were large scale and very impressive. Here's Simon's website.

JavaScript libraries have been able to visualise data for quite some time (e.g. D3 or Raphael) and can now do really spookily clever things. 'Creative coding' with JavaScript libraries that visualise code or data, the XML code trees of SVG, and images encoded as Base64 all fascinated me. I loved how images were code. Seeing and knowing how code or data can generate image content is very creative. I remember showing students back in 2008/2009 the SVG code for their vector (non-rasterised) images using Inkscape. They couldn't believe their eyes. Using a data URI in the src attribute of an HTML img element to embed Base64-encoded image content is powerful and versatile. It is not especially optimised, but it can be very useful in some situations. Seeing how clever creative coders can program and render incredibly beautiful patterns and concepts is truly very lovely. Though I'm loath to share X/Twitter links these days, hashtags like #creativecoding offer excellent windows on what is happening in these coding communities.
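To make the 'images are code' point concrete, here is a minimal sketch in Python of how a few lines of SVG text can become a Base64 data URI that an HTML img element can display. The SVG content, the function names and the alt text are all invented for illustration; this is not the material I showed students, just one way the idea might look.

```python
import base64

# Hypothetical example image: a tiny hand-written SVG (a red circle).
SVG_SOURCE = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="red"/>'
    "</svg>"
)


def svg_to_data_uri(svg_text: str) -> str:
    """Encode SVG markup as Base64 and wrap it in a data URI."""
    encoded = base64.b64encode(svg_text.encode("utf-8")).decode("ascii")
    return f"data:image/svg+xml;base64,{encoded}"


def img_tag(data_uri: str, alt: str) -> str:
    """Build an HTML img element whose src carries the image itself."""
    return f'<img src="{data_uri}" alt="{alt}">'


if __name__ == "__main__":
    uri = svg_to_data_uri(SVG_SOURCE)
    print(img_tag(uri, "red circle"))
    # Compare the size of the raw markup with the size of the data URI.
    print(len(SVG_SOURCE), len(uri))
```

Pasting the printed tag into an HTML file should show the circle with no separate image file at all. Base64 encoding makes the payload roughly a third larger than the original bytes, which is part of why this approach is handy for small images rather than a general optimisation.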

Machine Learning has been with us for decades. Everything we get in sophisticated recommender systems is ML: pattern recognition of keywords or metadata matching according to various factors like best match, most popular, related to, came from/go to, and outlier factors (the secret sauces of Google SEO). Textual recognition isn't new, it is the ML we use every day in Google, Bing, Brave, DuckDuckGo or Ecosia. But what is now being generated by the new tools creates new pieces of content, and content is information and information is knowledge. Recommender systems do not currently distinguish arbitrarily matched, ad-hoc amalgamated sources that have been mashed up (to use the old term) into new content from authentic human-generated knowledge content, and I have no idea if they can without a lot of further training. The mashup gen AI content (aka slime or slop) ends up on a par with the 'real' knowledge in terms of search results.

The problem with prompt 'teaching'

Does anyone who designs creative content educational assessments using gen AI image apps realise that prompt farms/communities have been in existence for at least two years? These are people sharing prompt types, prompts developed for specific apps, or very specific design or composition, or 'in the style of' numerous known artists, or game styles, vaporwave, futuristic, cyberpunk, 3D, photorealistic, hyperrealistic etc. These have been around for quite some time, very comprehensive guides and templates, much is free, some is paywalled. Does anyone who designs educational course assessments that use textual gen AI output realise that text prompt structures and examples have been available for at least 18 months and continue to be published by many experts, often accessible for free? This applies to summarising, essay structures and plans, whole essays, project planning, forming good gen AI questions, story design, narrative voices, better grammar, job applications, CVs, etc etc etc.

All of this is available online and is very hard to prove as plagiarism, because in itself it isn't plagiarism; it's how you use the apps*. Essentially, gen AI works on the principle of copy and imitate. But using the gen AI apps isn't 'thinking', it's 'operator' skills. It is, or at least might be interpreted as, digital literacy skills: in/out techniques. It's like using Boolean search or asking effective search questions to match your problem. But these skills need to be deeply embedded in prior knowledge and thinking. This is how it has worked for coders for years, decades. Millions of code snippets and forum threads have always been there, but without understanding your problem, they are useless. You don't know how to use them, or which one could be helpful, or how to customise and edit them to your needs. If you're simply teaching how to pump out arbitrary marketing images full of very generic people doing particular tasks, what are you actually teaching?

Remember, what you think you're teaching may not be what is actually happening in the brains of some of your students. I had a masters class of web design students and attempted to continuously highlight the hugely important skill of how to search to problem solve. I exhorted students to Google their problems and challenges with code etc, to see how to improve their searches, what produced the most useful results, how to record their progress as critical problem solving skills. A student actually lodged a complaint about me that I was just 'asking them to Google' everything. Remember that.

Yes, we should teach digital AI tools literacies, but not in the very simplistic ways that seem to be the current approach. We need to teach full criticality, embedding knowledge snippets, judgement of writing quality, matching code solutions to problem solving, as well as spotting fakes, untruths, over-simplification, poor sources, rampant fakery, artist plagiarism... Perhaps all of this is being taught, but frankly, going on the recruitment ads, it's very hard to tell, and it probably isn't.

Image: generated image of cats playing guitars in the style of Yuko Shimizu using a Hugging Face script

What about consent and liability?

Two things are pertinent here: the obligation to use ethically questionable tools, and the liability of using them. We should not be obliging students or staff to use these tools unless they give explicit, detailed, informed consent. I don't think this has even been considered yet as an aspect of both legal and ethical practice. By not asking, the university, the individual academic, and the student or students (jointly and severally) are potentially liable for complicity in intentional or unintentional copyright infringement, in spreading deep fakes or otherwise untrue information or media, in spreading malware or other security vulnerabilities (open doors), or in other possible information malpractice. No one is considering any of these risks when embracing gen AI at scale in their institution. I don't think anyone has talked about this at all, as far as I know (I stand to be corrected). What if a student unwittingly puts together various sections of generated content, uploads it to the LMS and perhaps to other content repositories that may be exposed to outside access (password protected or not), and part of that content (code or text) is a security risk or malware? Or even just a copyright infringement, or a case of libel? 'What if' scenarios tend to be underplayed by those who wish to buy into fancy new toys or the new project that will make them look great and ahead of the competition (this is a very common problem in tech project management), but the unintended consequences of ill-thought-out actions can be very bad. What are the university contingency plans for any of this? Did anyone even consider it?

And of course, what about the growing number of young people who really do not want to be involved with AI? Of course, there are many young people who believe in the power of AI tools and want to get really stuck in, but there is a growing number who do not. And what about the staff who really do not want to become involved in widespread embrace of gen AI tools, content or task replacement? "It's my job I have to do it" seems to be the default position of staff obligation and acquiescing to what they in fact are not very convinced about or even distrust.

Other issues

There are many other extremely important and pertinent questions swirling around the closed world of proprietary gen AI, as well as the implementation and use of open source gen AI. I will try to follow up this post with further thoughts on some of these issues.

  • What about who owns the tools?
  • What is the cost to my university of using these tools?
  • What about the unimaginable scale of the cost to the planet?
  • What about my rights to my own content?
  • What is the impact of training data web scraping on Creative Commons licensed content?
  • What is the growing impact to the open knowledge web?
  • Will I lose my job? (staff)
  • Will I need to rethink my career plans? (student)
  • To be continued ...

Image: generated image of an astronaut in the style of Yuko Shimizu using a Replicate script

In conclusion - is gen AI simply a plagiarism tool?

I recently did 3 tests for AI image generation, 2 with Hugging Face scripts, and 1 with a Replicate script. I wanted to test (again) how good gen AI is at imitating a known artist.

I used the following very simple bare bones prompts:

  • A boy and girl astronaut in the style of Yuko Shimizu
  • Cats playing guitars in the style of Yuko Shimizu
  • Astronaut in the style of Yuko Shimizu

The images here are the results of those tests. Make up your own mind if you think gen AI art apps are just plagiarism tools, and how those tools know who Yuko Shimizu is.

- - - - -

* This is the point that connectivism persistently makes about future networked systems learning: there is no need to know, there is only the ability to know how to find out. In the past we considered this to be the skill of navigating information repositories or using search engines effectively, but this has mutated into a more generalised skill of AI sourcing.

Image: generated image of a boy and a girl astronaut in the style of Yuko Shimizu using a Hugging Face script
