Beyond the hype, artificial intelligence can be fascinating, but ethically responsible use of AI calls for understanding how the technology works.
It is the age of artificial intelligence (AI). AI is steadily changing the way we live and work, and its popularity is driving adoption in enterprise and consumer apps. AI presents a huge opportunity for developers to infuse apps with solutions powered by generative AI and large language models (LLMs), and also to boost their own productivity.
However, AI conversations these days often feel like buzzword bingo with a high barrier to entry. Things may seem less complex with a better understanding of the underlying technologies. So, let’s burst some myths and demystify modern AI.
Today’s generative AI is part of the broader artificial intelligence (AI) field, which aims to create computers that mimic human intelligence. The beginnings of AI can be traced back to the 1950s, with evolution through various types of machine learning (ML) and deep learning, enabling computers to learn from existing data to make better decisions or predictions.
Generative AI today employs layers of neural networks to process huge amounts of data and to understand and converse in human language. Given existing data and well-crafted prompts, generative AI can produce satisfactory text, code, images, videos, audio and various other types of output.
An AI model is essentially a computational system that has been trained on a set of data to recognize patterns or make decisions without human intervention. Traditional AI models often operate in a silo with a high cost of development and maintenance: they are trained to work on specific tasks within a specific data context. In the past, the data used to train AI models was labeled by humans; nowadays, AI models can learn on their own and figure out connections and patterns in the data.
Modern generative AI operates on foundational, general-purpose large language models (LLMs). This is a tech-tonic shift in how AI models function and in the amount of compute they run on. Today’s large language models can be trained on enormous amounts of varied data, which can then be adapted to produce many kinds of output from human input in natural language.
Compared to traditional machine learning (ML), which learns from human-tagged data within a given context, today’s LLMs are fed enormous quantities of data without human intervention. The goal is unsupervised learning to figure out patterns in training data.
There are several popular public large language models: the GPT family from OpenAI, BERT from Google, Llama from Meta, Orca from Microsoft and more. While the popular LLMs are all general purpose, there are differences in the training data used, and, as such, unique areas where each of them shines.
Given the data that a large language model is trained on and the prompt being asked, a generative AI response represents a probability across possible solutions; it is not meant to be an exact science. While there is a high likelihood of responses being the same or similar for a given trained AI model and the same prompt, output responses can differ, and therein lies the non-predictability of generative AI.
A common parameter when interacting with LLMs is temperature, which controls variability. Higher temperatures mean more risk and increase the likelihood of more random responses, which can be good for creativity, while lower temperatures provide more predictability.
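Here is a minimal sketch of how temperature is typically passed to a chat endpoint, using the OpenAI Python SDK. The model name, prompt and temperature values are illustrative assumptions; any chat-capable model that exposes a temperature parameter behaves similarly.

```python
# Minimal sketch: same prompt, two temperatures (model name and values are placeholders).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "Suggest a name for a hiking app."

# Low temperature: responses tend to be more predictable and repetitive.
predictable = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.1,
)

# High temperature: responses are more varied, which can help with creative tasks.
creative = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.2,
)

print(predictable.choices[0].message.content)
print(creative.choices[0].message.content)
```

Run the same script a few times and the low-temperature answers will tend to repeat, while the high-temperature ones drift more.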
Contrary to popular opinion, AI models do not continuously learn; the training data used to train them is timestamped. Essentially, the knowledge base of a generative AI model is fixed at a certain point in time: when the training data was fed to the model. AI models do not learn about the latest events or news unless they are explicitly retrained on them or their knowledge is augmented with trusted data sources.
For scalability reasons, generative AI endpoints are stateless by design: it would be difficult to carry the baggage of context with every interaction. That said, many AI applications, like chat or code completion, do require the context of an entire conversation to be effective. Most generative AI development platforms include agents and tooling to preserve conversational context while talking to AI endpoints.
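The usual pattern behind that tooling is simply resending the accumulated message history with every call. A minimal sketch, again assuming the OpenAI Python SDK and a placeholder model name:

```python
# Minimal sketch: the endpoint is stateless, so the full history is resent each turn.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,  # the endpoint itself remembers nothing between calls
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is a large language model?"))
print(ask("Summarize that in one sentence."))  # works only because history was resent
```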
Large language models understand human communication through natural language processing (NLP), and the common unit for processing prompts is the token. Given a prompt, LLMs break things up into tokens to understand what is being asked and then predict the next token that most likely makes sense.
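To see tokenization in action, here is a small sketch using the tiktoken library. The cl100k_base encoding is one used by several OpenAI models; other models tokenize text differently, so the exact split is an assumption for illustration.

```python
# Minimal sketch: how a sentence breaks into tokens (encoding choice is illustrative).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
text = "Large language models predict the next token."

token_ids = encoding.encode(text)
print(len(token_ids), "tokens:", token_ids)

# Decode each token id individually to see how the sentence was split up.
print([encoding.decode([t]) for t in token_ids])
```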
This exchange of tokens is not limitless: every large language model has a limit on the number of tokens it can handle, and this also dictates pricing. The context window is the total number of tokens used in the course of a conversation; once the limit is reached, the model starts forgetting things from earlier in the conversation to make space for newer tokens.
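Applications typically enforce this themselves by trimming old turns before each call. A minimal sketch, with a deliberately tiny token budget and simplified counting as assumptions:

```python
# Minimal sketch of a sliding context window: drop the oldest turns once a budget is exceeded.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
MAX_CONTEXT_TOKENS = 200  # deliberately tiny budget, for illustration only

def count_tokens(messages):
    return sum(len(encoding.encode(m["content"])) for m in messages)

def trim_to_window(messages):
    # Drop the oldest messages (after the system prompt) until the budget fits.
    trimmed = list(messages)
    while count_tokens(trimmed) > MAX_CONTEXT_TOKENS and len(trimmed) > 1:
        trimmed.pop(1)  # index 0 is assumed to be the system prompt
    return trimmed

conversation = [{"role": "system", "content": "You are a helpful assistant."}]
# ... user/assistant turns get appended here; trim before each API call:
conversation = trim_to_window(conversation)
```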
For all their smartness, generative AI models can’t actually do things. They cannot look up news, do precise math or classify with accuracy. Large language models simply hold the vast amount of knowledge from which they have done unsupervised learning. Given a prompt, LLMs predict an answer that is the closest possible match to the knowledge they already have. Modern LLMs are exceptionally good, but it helps to understand that it is not an exact science, just the highest probability of what the answer should be.
Given the vast amount of code in public repositories, AI models have gotten very good at learning patterns and predicting code to accomplish given tasks. Coding assistants like GitHub Copilot also learn from existing codebases and predict code within a given context—this can be exceptionally accurate for popular programming languages.
However, just like everything else, code generated by AI models is just a prediction based on timestamped knowledge—not guaranteed to be compilable or correct code. Human intervention is required and software developers are always the “pilots” in charge. AI is only here to help with productivity and automate repeatable tasks.
While large language models bring a wealth of smartness, developers don’t have to integrate LLMs directly inside their apps to benefit from AI infusion; middleware in the form of assistant APIs and orchestration frameworks can make things easier. Popular frameworks that wrap AI model APIs and configurations for easier consumption include LangChain, Semantic Kernel, AutoGen and more. While there may be differences in programming models, the goal is the same: to assist developers by abstracting complicated setups for easier integration.
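As one example of what that abstraction looks like, here is a minimal sketch using LangChain (one of the frameworks named above) to wrap a chat model behind a reusable prompt template. The model name and prompt text are placeholder assumptions.

```python
# Minimal sketch: LangChain handles prompt formatting and the underlying API call.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
prompt = ChatPromptTemplate.from_template(
    "Explain {topic} to a busy developer in two sentences."
)

# Compose the prompt and the model into a single reusable chain.
chain = prompt | llm
print(chain.invoke({"topic": "retrieval-augmented generation"}).content)
```

Semantic Kernel and AutoGen offer similar building blocks in their own programming models; the common thread is keeping model plumbing out of application code.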
There are lots of industry verticals that have reservations about leveraging public generative AI cloud services; this is where the conversations about secure, ethical and responsible AI need to happen. However, some of the benefits of modern AI models can be leveraged behind enterprise firewalls. Local embeddings and vector databases can be beneficial without being as computationally expensive. Embeddings are useful for semantic search, where natural language strings are converted into numerical vectors; closer vectors indicate that terms are similar in meaning. Documents can be vectorized and fed into AI models, which opens up the potential for deeply contextual learning within AI models to support enterprise workflows.
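A minimal sketch of local semantic search with embeddings, using the sentence-transformers library: the model choice, documents and query are illustrative assumptions, and everything runs on local hardware with no cloud calls.

```python
# Minimal sketch: embed documents locally and find the closest match to a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # downloads once, then runs locally

documents = [
    "Employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days.",
    "The VPN is required for accessing internal systems.",
]
doc_vectors = model.encode(documents)

query_vector = model.encode("Can I work from home?")
scores = util.cos_sim(query_vector, doc_vectors)[0]  # closer vectors = more similar

best = int(scores.argmax())
print(documents[best], float(scores[best]))
```

In practice, the document vectors would live in a vector database rather than an in-memory list, but the similarity lookup works the same way.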
Most AI conversations gravitate toward popular AI services and generative AI with large language models. There are certain verticals, however, that are very sensitive about using public AI, even at the expense of less capable natural language processing (NLP): think governments, military, healthcare and such.
Small language models (SLMs) may be a better fit for certain scenarios. They have a much smaller footprint in terms of parameters and tokens, can be hosted behind firewalls, and their training data can be more contextual. Microsoft’s Phi-3 has gotten some attention, as have Google Gemini Nano and Llama-2-13b.
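A minimal sketch of running an SLM entirely on local hardware with the Hugging Face transformers library; the specific Phi-3 model id, prompt and generation settings are assumptions for illustration, and a recent transformers version is assumed.

```python
# Minimal sketch: generate text from a small language model running locally.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # downloads once, then runs offline
)

prompt = "Summarize why small language models suit air-gapped environments:"
result = generator(prompt, max_new_tokens=80)
print(result[0]["generated_text"])
```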
LLMs are trained on vast volumes of data and use billions of parameters to generate outputs based on that training data. However, common challenges include drawing on non-authoritative sources, fabricating information and presenting out-of-date or generic information.
Retrieval-augmented generation (RAG) is the process of optimizing the output of an AI model so that it references an authoritative, predetermined knowledge source outside of its training data: a way to ground responses in reality. With RAG, enterprises gain greater control over AI-generated responses, and users build trust as LLMs produce accurate information with source attribution.
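Putting the earlier pieces together, here is a minimal RAG sketch: retrieve the most relevant document with local embeddings, then ground the prompt in it. The model names, documents and question are illustrative assumptions, not a production pipeline.

```python
# Minimal sketch of RAG: retrieve an authoritative snippet, then answer only from it.
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
documents = [
    "Policy 12: Refunds are issued within 14 days of a return being received.",
    "Policy 7: Customers can return items within 60 days of purchase.",
]
doc_vectors = embedder.encode(documents)

question = "How long do refunds take?"
scores = util.cos_sim(embedder.encode(question), doc_vectors)[0]
context = documents[int(scores.argmax())]  # the authoritative source to ground on

client = OpenAI()
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer only from the provided context and cite it."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

Because the model is told to answer only from the retrieved context, the response can cite the policy it used, which is exactly the source attribution that builds trust.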
Sam Basu is a technologist, author, speaker, Microsoft MVP, gadget-lover and Progress Developer Advocate for Telerik products. With a long developer background, he now spends much of his time advocating modern web/mobile/cloud development platforms on Microsoft/Telerik technology stacks. His spare time calls for travel, fast cars, cricket and culinary adventures with the family. You can find him on the internet.