Consider the enormous publicity that AI is receiving today: it is regarded as one of the biggest revolutions of our time. On many counts it is, but we should note that on many other counts AI is not that clever.
I was trying to get a popular image generation AI platform to draw me a blackboard containing specific text that I supplied. It stubbornly failed to do so. I tried half a dozen different prompts to help it understand what I wanted. However much I tried to improve clarity and reduce complexity, it would give me pictures of a blackboard with random shapes and letters, pictures of the human brain, and diagrams of clouds with "AI" written in some corner. But it would not give me the text I wanted. Finally, I asked it to write ABCD on a blackboard. Even then the output was wrong; it is displayed as the title image of this post. This suggests that the capability of the image generator, a form of vertical AI, is limited to recognising images and colours and generating creative images, colours and textures.
Image generators clearly cannot follow even simple commands that ask them to display text, while LLMs and the latest SLMs such as Microsoft’s Phi-3 can manipulate reams of text but, in their text-only form, cannot handle images. Hence AI is still immature. We have certainly discovered ways for machines to learn by themselves in specific domains without explicit coding. Yet even there, we have not fully understood the methods or the mechanics behind the learning that happens in neural networks and analytical models. We are not able to explain many parts of how the learning occurs.
In my book “Applied Human-Centric AI”, I refer to the capability to replace a ‘human function’, such as that of an artist, as Human Function Replacement (HFR) (page 153). HFR can be thought of as a specific skill set for performing a group of interconnected role tasks, like those of an artist. Scaffolding techniques (page 341) refer to linking tasks together to achieve the capability to perform a series of tasks. I have also discussed the unexplained behaviour of DNNs and tools to investigate and fix the issues (pages 337 and 643).
We are also not yet able to connect image generators and LLMs seamlessly, although this is an area of research, viz. GILL (Generating Images with LLMs). Such a connection would be an interconnection between HFRs and would take us to the next level of AI capabilities. This brings us to an important and debatable question: should the AI industry focus on connecting the two generator models to work together, since this is a more easily attainable, differentiating and revenue-generating goal? Or should it first thoroughly understand the workings of each vertical AI product before venturing to interconnect such vertical AI capabilities?
In all such cases of innovation and progress, a thorough and exhaustive understanding of the capabilities of a new technology is essential to make sustainable progress without human loss and suffering. Unfortunately, this has not been the practice, particularly since the early 2000s and the emergence of the free internet and social media. The landscape in which we live has changed inexorably since then, in life-defining ways.
Let us just say that we need to be conservative and exercise caution so as not to upset our peaceful ways of living with a giant leap into the unknown. With that in mind, my recommendation is to take the conservative path: gain greater understanding first, and let that lead to sustainable progress and growth later.