Exploring the nuts, bolts and biases of AI

By Andrew “A.J.” Tibbetts, BridgeTower Media Newswires, July 1, 2025

In Brief

AI tools generate outputs based on probabilities, not facts
“Hallucinated” legal citations reflect statistical patterns, not truth
Bias arises from gaps or skew in training data, not malicious design
Lawyers must understand tool design, training, and input sensitivity

Much of the discussion around artificial intelligence in legal publications focuses on AI risks and mitigation strategies. Lawyers should be well versed in the subject, both because it figures in their discussions with clients and because it is an important consideration when they choose tools for their own practices.

An understanding of risk, however, is aided by an understanding of the technology. Entire textbooks have been written on the topic, but a brief technical survey may help in appreciating where AI risks come from.

While AI has caught the public imagination lately, the technology is not new. Some essential AI mathematics date from the 1950s, and many of today’s common techniques were developed in the ’80s and ’90s. Improvements in storage capacities, processing speeds, network communication speeds, and data availability over the past 15 years made these techniques commercially viable and prompted the recent innovation wave that continues to accelerate.

Over these decades, diverse AI mathematical techniques have been proposed, each adapted for different tasks. Each technique is fundamentally an application of statistics and mathematics, tied to the determination of probabilities. Specific mathematics (the AI tool’s “model”) are applied to the provided inputs, and the output is typically the one to which the mathematics assigns the highest probability of being accurate from among the potential outputs.

Of course, the “highest probability” output is not necessarily correct. Most text generation AI tools, for example, are not querying databases or the web in an effort to find a right answer. They instead generate, in response to input, sentences that are highly likely to at least look correct, by outputting a sequence of words in which each word is the one most likely to follow the words before it in a sentence on that topic. Which word is most likely to come next is guided by the specific samples with which the tool was trained; with different samples, different sequences of words might be generated.
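
The word-by-word process described above can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how any commercial tool is built: it assumes a made-up three-sentence training text, looks only at the single preceding word (real tools consider far more context), and uses the hypothetical helper names train_bigrams and generate.

```python
from collections import Counter, defaultdict

# Tiny made-up training text; a real tool trains on vastly more data.
TRAINING_TEXT = (
    "the court held that the statute applies . "
    "the court held that the claim fails . "
    "the court found that the statute applies ."
)

def train_bigrams(text):
    """Count, for each word, which words followed it in the training text."""
    followers = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def generate(followers, start, max_words=8):
    """Repeatedly append the most frequent follower of the last word."""
    output = [start]
    for _ in range(max_words - 1):
        options = followers.get(output[-1])
        if not options:
            break  # the last word was never followed by anything in training
        output.append(options.most_common(1)[0][0])
    return " ".join(output)

model = train_bigrams(TRAINING_TEXT)
sentence = generate(model, "the")
print(sentence)
```

The sketch never checks whether its output is true. It only strings together words that frequently followed one another in its samples, and training it on different text changes what it produces.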

When a lawyer asks such a text generation tool to draft a brief, it should thus be unsurprising that the draft may include citations to cases that look like they could exist but do not. The term “hallucination” for such an occurrence masks the fact that the tool is simply outputting a sequence of high-likelihood words for legal text. Legal text includes citations, citations follow patterns, and the tool outputs words that match citation patterns.
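
A rough sketch can show how text that merely matches a citation pattern yields citations that look valid but correspond to nothing. It is a simplification that recombines fields rather than modeling word probabilities, and every citation in it is an invented placeholder, not a real case. The names KNOWN_CITATIONS, CITATION_PATTERN and recombine are hypothetical.

```python
import re
from itertools import product

# Invented placeholder citations for illustration only; none is a real case.
KNOWN_CITATIONS = [
    "Alpha v. Beta, 101 F.3d 202 (1st Cir. 1997)",
    "Gamma v. Delta, 303 F.3d 404 (2d Cir. 2002)",
]

# The "shape" of a citation: parties, volume, reporter, page, court, year.
CITATION_PATTERN = re.compile(
    r"^(\w+) v\. (\w+), (\d+) F\.3d (\d+) \((\w+) Cir\. (\d{4})\)$"
)

def recombine(citations):
    """Build every citation-shaped string obtainable by mixing the sample fields."""
    fields = [CITATION_PATTERN.match(c).groups() for c in citations]
    columns = list(zip(*fields))  # all first parties, all second parties, ...
    results = []
    for p, d, vol, page, circuit, year in product(*columns):
        results.append(f"{p} v. {d}, {vol} F.3d {page} ({circuit} Cir. {year})")
    return results

candidates = recombine(KNOWN_CITATIONS)
plausible_but_unknown = [c for c in candidates if c not in KNOWN_CITATIONS]
print(len(candidates), len(plausible_but_unknown))
```

Every candidate passes the pattern check, yet only the two original samples were ever in the data. Matching the form of a citation says nothing about whether the case exists.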

Similar mathematics and processes underlie image generation errors such as extra fingers. Fingers are depicted where there is a high likelihood of fingers appearing, even if there are too many of them overall. When an AI tool is asked to generate a summary of a document, transcript, etc., the output is a set of words that has a high probability but isn’t necessarily accurate or comprehensive.

This understanding of the technology may also help explain how AI bias arises. When an AI tool is learning probabilities during its training, often it is learning relationships between example pairs of inputs and outputs. Later, if a given input is a good match to one of the sample inputs, the tool might generate an output like one of the sample outputs and perform reliably.

If the sample data did not cover a particular situation, though, the AI tool may not perform reliably for an input for that situation. Putting this into the human population context, it is possible that an AI tool’s sample data may not reflect underserved populations well, and the tool may be unreliable or inaccurate for inputs relating to such populations.
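
The coverage problem can be illustrated with a minimal sketch, assuming invented numeric samples and a simple nearest-match rule (one of many possible techniques, chosen here only for brevity). The hypothetical predict function always returns an answer, even for an input unlike anything it was trained on.

```python
# Invented training pairs: (input value, label).
SAMPLES = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def predict(x, samples=SAMPLES):
    """Return the label of the closest training input and how far away it was."""
    nearest_input, label = min(samples, key=lambda s: abs(s[0] - x))
    return label, abs(nearest_input - x)

covered = predict(1.5)     # close to the training data
uncovered = predict(50.0)  # far outside anything seen in training
print(covered, uncovered)
```

Both calls return a label with equal apparent confidence. Only the distance to the nearest sample, which many tools do not surface to the user, hints that the second answer rests on little relevant training data.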

But AI bias may also arise from real-world bias. In a well-known example from a decade ago, a tech company sought to train an AI tool to assist with first-level resume review by giving it a training set of resumes previously received, along with which applicants had been hired from those resumes. Without being prompted to do so, the AI tool learned from that prior data that men had been more likely to be hired and so ranked male applicants higher than female applicants when given new resumes.
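
The mechanism in this example can be sketched with synthetic data. The records below are invented and the scoring rule is a hypothetical simplification (averaging historical hire rates), not the method any company actually used. The names HISTORY, learn_hire_rates and score are assumptions for illustration.

```python
from collections import defaultdict

# Synthetic, invented history: (experience band, gender, was hired).
HISTORY = [
    ("senior", "M", True), ("senior", "M", True), ("senior", "F", False),
    ("senior", "F", True), ("junior", "M", True), ("junior", "M", False),
    ("junior", "F", False), ("junior", "F", False),
]

def learn_hire_rates(history):
    """Estimate the historical hire rate for each (feature, value) pair."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for band, gender, was_hired in history:
        for key in (("band", band), ("gender", gender)):
            total[key] += 1
            hired[key] += int(was_hired)
    return {key: hired[key] / total[key] for key in total}

def score(rates, band, gender):
    """Average the learned hire rates of the applicant's feature values."""
    return (rates[("band", band)] + rates[("gender", gender)]) / 2

rates = learn_hire_rates(HISTORY)
male_score = score(rates, "senior", "M")
female_score = score(rates, "senior", "F")
print(male_score, female_score)
```

Nothing in the code instructs it to prefer men. Two applicants with the same experience band receive different scores solely because the historical outcomes differed by gender.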

There is always a risk of “baking in” real-world bias to an AI model when training with real-world data. Builders of AI tools should be sensitive to bias in data. Purchasers of tools should be sensitive to how a tool was trained and how bias was managed.

Because AI is driven by applying mathematics and probabilities to an input to produce an output, the risks associated with using a specific AI tool for a specific task will be informed by the precise nature of the AI’s mathematics, whether the input matches the type and format of inputs the tool is configured to receive, and whether those mathematics and training are even sufficient for the particular input that is provided.

Taking these points one at a time, there are many different AI techniques, and different ones work better or worse for different tasks. When choosing a tool for a task, it is therefore important to choose a tool with mathematics that work well for that task. AI techniques that are good for generating images, for example, may not be suitable for analyzing images to reliably detect medical conditions.

It is also important to note that an AI tool’s mathematics may expect a certain style of input, phrased or formatted in a certain way. If the input deviates from expectation, the analysis of the input may be undermined and the output may be unreliable or inaccurate.

Knowing the style of inputs that are required to generate a good answer is therefore critical. And if different inputs can provide better outputs, that is helpful to know, too.

For text-based tools, ambiguity in your input undermines the mathematics. A best practice is to use simple, precise language, as if giving instructions to a child.

And due to the bias or training insufficiency mentioned above, even if the mathematics are appropriate for the task and the input matches the expected form, the output may be inaccurate or of poor quality if the training didn’t effectively cover the specific type of input provided. It can thus be helpful to know before using a tool how the model was trained and for what tasks.

Perhaps the most important tip when using AI is to ensure that a human double-checks its work. The fundamental nature of AI discussed above underlies many regulatory requirements that a human be positioned to review and overrule the AI, particularly in high-risk circumstances.

AI is indisputably a useful tool that will increasingly be used in law and other fields, but it should be understood as a tool rather than a replacement.

Andrew “A.J.” Tibbetts is an intellectual property and technology shareholder at Greenberg Traurig in Boston. A former software engineer, he counsels on matters related to software-implemented tech across a range of industries, from networking, financial technology and natural language processing to life sciences, AI, medical records and medical devices.