The rise of Large Language Models (LLMs) has been a game-changer in 2023, especially after the release of GPT-4. Companies are in a fierce arms race, and there is a real risk of cutting corners, with ethical testing potentially being sacrificed for faster product releases, since the winner could uproot the dominance of giants in older industries. One clear example everyone can see is Google: its search engine could be replaced by a chatbot that condenses answers to a search into paragraphs or even bullet points, saving users time. However, this also raises concerns across multiple fields, suggesting the technology is not ready for prime time yet. In this article, we will examine the current abilities of these LLMs to assess how realistic the doom-and-gloom fears about AGI really are.
Picture: ChatGPT and Google. Source: TechCrunch
One significant issue with LLMs is users' heavy reliance on chatbot AIs, which reduces traffic to original websites and discourages double-checking of sources. Users may not visit the websites after receiving an answer, which cuts traffic and ad revenue. There is no single solution to this problem, as it would require revamping the entire web-traffic and search industry. Another issue is ethical: answers provided by LLMs are often biased and not necessarily accurate. Additionally, although rare, there are instances where AI chatbots serve up condensed misinformation.
ChatGPT is a popular LLM but has several limitations. Firstly, it struggles with long-form prompts and can lose track if given inputs longer than five points (or eight pages). When it loses track, it apologizes but keeps trying and ends up filling in random information, which means the model is still highly unreliable at recognizing its own limitations and suggesting other solutions. ChatGPT also has a truthfulness issue: although the knowledge cutoff for ChatGPT 3.5 and 4 is September 2021, the model struggled with academic papers published after 2018, producing false abstracts and author names. ChatGPT will say that certain papers are in its database, but if you ask for specific information, it will produce false answers and continue to lie. It should be noted that the experiment shown below was run on GPT-3.5.
As you can see from the two pictures below, each one is a new session, and the only difference is that I mentioned the publication date in one of them. Note that if you ask whether certain papers are in its database, it will say yes while producing false answers like those below. Again, if you point out its mistake, it will apologize and continue to lie to you.
Figure 1: I specifically mentioned the publication date.
Figure 2: I did not mention the publication date.
Figure 3: After I pointed out the errors, it apologized and continued to lie.
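For readers who want to probe this behavior themselves, below is a minimal sketch of how such a citation check could be run against the OpenAI API, once with and once without the publication date. It is not the exact prompt from the screenshots; the placeholder paper title, the `gpt-3.5-turbo` model name, and the legacy `openai` v0.x Python client are assumptions you may need to adapt.

```python
# Minimal sketch: probe GPT-3.5 for hallucinated paper details.
# Assumes the legacy openai Python client (v0.x, circa 2023) and an
# OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical test case: ask about the same paper with and without its
# publication date, then compare the two answers for fabricated details.
paper_title = "An Example Paper Title"  # placeholder, not a real citation
prompts = [
    f"Summarize the abstract and list the authors of the paper '{paper_title}'.",
    f"Summarize the abstract and list the authors of the paper '{paper_title}', published in 2019.",
]

for prompt in prompts:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce randomness so the two runs are comparable
    )
    print(prompt)
    print(response.choices[0].message.content)
    print("-" * 40)
```

If the model invents an abstract or an author list for a paper it cannot actually retrieve, you are seeing the same failure mode as in the figures above.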
It is ironic that Sam Altman, the co-founder and current CEO (edit from the future: former CEO) of OpenAI, said in an ABC exclusive interview that OpenAI is being responsible and promised to test GPT-4 carefully. Then, just a few days later, the company opened the app-store equivalent for ChatGPT, its extension market. This move raises questions about OpenAI's commitment to the responsible use of LLMs.
Picture: Sam Altman saying the company is careful with the power of GPT-4, March 16, 2023. Source: ABC
Picture: One week after the interview, OpenAI released the extension market. Source: Twitter
BingGPT is another popular LLM service, launched after Microsoft took the biggest stake in OpenAI LLC. With that deal, Microsoft gained exclusive rights to access OpenAI's source code and data for use in its own products, such as the Bing search engine. Again, the lack of ethical testing was quickly pointed out by testers, according to TIME.
Picture: Microsoft bringing ChatGPT's source code into Bing. Source: TIME
With new restrictions from Microsoft after those reports, BingGPT's extreme censorship is causing problems, as it refuses to answer when the tone of the conversation is negative or "tense". For instance, I asked in which areas BingGPT is better than ChatGPT 3.5, and it said table comparisons. It gave me a prompt to compare the iPhone 13 and the Samsung S22, which BingGPT did indeed handle better. I then asked for the same thing for a case where ChatGPT 3.5 is better. It gave me a prompt to write a story about a princess and a dragon to compete on creativity. While ChatGPT 3.5 was able to finish its story, Bing did not. So I told BingGPT that it is fair to say the creativity was quite similar, but that ChatGPT 3.5 did better since BingGPT did not finish its story. It then abruptly cut me off.
Therefore, as of the time of this writing, it can be said that BingGPT may abruptly cut off the conversation or reset the session when confronted with "negative" prompts or when the model "sees" the tone of the conversation becoming "tense". These issues suggest that LLMs have a long way to go before they can be fully integrated into our daily lives. In fact, when I confronted it later, it gave me full details of how it was hurt by my prompt and asked me to be more polite and respectful to it. To be clear, I had to pretend to be an ethical-AI engineer in that chat just to get an answer at all.
When I asked it to elaborate, it gave me more and more the feeling of a parent talking to an edgy teenager who is willing to explain why it is edgy. Learning from my mistakes, I started each message with either a thank-you or by agreeing with it first. It then started asking for a compliment alongside negative feedback, yet did not require or suggest any negative sentiment alongside a compliment, which refutes its own claim of trying to be balanced.
I went on to explore why it keeps asking for this emotional adjustment when the earlier comparison conversation was purely factual; it insisted on keeping feelings and attachment.
And of course, when we ran out of tokens, it stopped and reset right from the start, again.
BARD, a state-of-the-art LLM by Google, is not available for public testing at the time of this writing, but it is also prone to factual errors, as many have already spotted in the BARD ads.
These powerful LLM models can not only uproot giants in old industries but also replace human jobs. Yes, there will be new jobs such as prompt testers, which is what I was doing in the BingGPT examples, but many other jobs, such as call centers, are prime targets for replacement. In fact, programmers are going to be impacted as well: LLMs can generate basic and generic code, which significantly reduces the time spent researching and writing code by hand. Thus, educational institutions need to adjust their curricula and integrate the use of LLMs in order to prepare students for the coming changes in the job market.
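To make "basic and generic code" concrete, the snippet below is the kind of boilerplate an LLM will readily produce for a prompt like "write a Python function that counts word frequencies in a text file". It is a hypothetical illustration of typical model output, not something generated for this post.

```python
# Hypothetical example of typical LLM output for the prompt:
# "Write a Python function that counts word frequencies in a text file."
from collections import Counter

def count_word_frequencies(path: str) -> Counter:
    """Return a Counter mapping each lowercased word to its frequency."""
    with open(path, encoding="utf-8") as f:
        words = f.read().lower().split()
    return Counter(words)

if __name__ == "__main__":
    # "example.txt" is a placeholder file name.
    for word, count in count_word_frequencies("example.txt").most_common(10):
        print(word, count)
```

Code like this is rarely production-ready, but it does cut out a lot of routine searching and typing, which is exactly why curricula need to teach students to review and verify such output rather than paste it blindly.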
By providing students with access to vast amounts of information and personalized learning experiences, LLMs can enhance student engagement, support differentiated learning, and provide opportunities for self-directed learning. However, it is important to ensure that LLMs are used in ways that are pedagogically sound and promote critical thinking, rather than simply substituting for teacher-led instruction. Otherwise, we will end up with a generation that just copies and pastes, with no creativity or critical thinking to solve new problems.
Finally, there is the cybersecurity concern, which will be expanded on in the next part of this blog. In conclusion, from what we have seen in this post so far, AI is far from reaching AGI, which effectively means the doom-and-gloom fear of AI is not sensible at this point of writing. However, that does not mean there is no reason at all to regulate and oversee the adoption of foundation AI models into products, as opposed to the models themselves.
Of course, there are solutions to this arms race, although most of them are not pretty. Firstly, it has to be said that a six-month pause is too extreme, as it would cause severe damage to private labs and small companies. Thus, while I agree with some of the concerns raised by Elon Musk and other technology leaders in their jointly signed open letter, I cannot share their view on this point. However, a slowing of the pace of development, to allow the joint formation of rigorous ethical-testing and inspection guidelines for the integration of foundation AI models, is needed.
It has to be acknowledged that LLMs have opened up a new industry, and regulations on transparency and accountability in AI model development are therefore lacking. Regulators are more often than not busy fighting each other for power, so I do not expect a clear policy vision for at least three to five years. Even if we somehow get one early, it will most likely be over-restrictive or under-enforced, and many modifications will be required. Therefore, the best we can do right now is for leading companies like OpenAI, Meta, Palantir, or Google to be more transparent and to return to OpenAI's original mission statement: open AI. After all, industry self-regulation is better than laws.