Company can be Nice

In the pursuit of knowledge, agreement is evidence (sorta) and disagreement is information. In the spirit of this idea, my language modeling efforts have both a debate mode and a solo analysis mode. Let's talk about why.

Working hypothesis

The best description I've heard for LLMs is that they're "mansplaining as a service." In the spirit of such an activity, they are notoriously overconfident, even when they're wrong—maybe even especially when they're wrong.

I've played with different methods of trying to rein in some of their hallucinatory shenanigans, but one idea that I'm particularly smitten with is having a "skeptic" agent whose sole purpose is to make life hard for the other agents.

I don't know if I've succeeded here in increasing the final output's quality—and realistically I won't for a while—but it's been interesting to implement.

Structure of a debate

When the debate engine first came together in April 2025, the process looked like this:

  1. Five Junior Linguists propose and argue for a root word's possible definition, starting the ensemble approach.
  2. Senior Linguist synthesizes the Junior Linguists' arguments into one argument for including the new root word, completing the ensemble approach.
  3. Skeptic points out flaws in the Senior Linguist's reasoning and argues why the proposed root should not be added to the dictionary.
  4. Senior Linguist responds and provides additional support for the root word's inclusion in the new dictionary.
  5. Skeptic has the final word and offers any closing criticisms of the new support.
  6. Adjudicator evaluates the debate and declares the root either Accepted or Rejected.
  7. Glossator writes a dictionary-style definition, if and only if the Adjudicator decided the proposed root should be accepted into the dictionary.
  8. Archivist is called to summarize the debate for a quick read when going through the logs.

Total number of LLM calls: 11.
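In sketch form, that first version was a fixed sequence of calls. This is only a sketch: `call_llm(role, prompt)` is a hypothetical stand-in for however the agents are actually invoked, and the real prompts are far longer.

```python
# Sketch of the April 2025 flow; call_llm(role, prompt) is a hypothetical helper
# that sends a role-specific prompt to whatever model is configured and returns its reply.
def debate_v1(root, evidence, call_llm):
    proposals = [
        call_llm("junior_linguist", f"Propose and argue for a definition of '{root}': {evidence}")
        for _ in range(5)
    ]
    case = call_llm("senior_linguist", f"Synthesize these proposals into one argument: {proposals}")
    objection = call_llm("skeptic", f"Point out flaws in this case: {case}")
    rebuttal = call_llm("senior_linguist", f"Respond with additional support: {objection}")
    closing = call_llm("skeptic", f"Give closing criticisms of: {rebuttal}")
    verdict = call_llm("adjudicator", f"Evaluate the debate, declare Accepted or Rejected:\n{case}\n{objection}\n{rebuttal}\n{closing}")
    # The Glossator only runs when the Adjudicator accepts the root.
    entry = call_llm("glossator", f"Write a dictionary-style definition for '{root}'.") if "Accepted" in verdict else None
    summary = call_llm("archivist", "Summarize the debate for the logs.")
    return verdict, entry, summary
```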

This structure has shifted bit by bit. Rather than walking through every iteration, let me share where it stands now, as of October 2025:

  1. Five Junior Linguists propose and argue for a root word's possible definition and inclusion.
  2. If two or more of the five Junior Linguists produce very similar output, the remaining Junior Linguists are cut from the process and it skips directly to the next step.
  3. Senior Linguist synthesizes the Junior Linguists' arguments as before.
  4. Skeptic points out flaws in the Senior Linguist's reasoning, just as before.
  5. Senior Linguist defends their proposal, responding to the Skeptic's objections, and indicates whether their position has changed.
  6. Adjudicator weighs in on whether the debate should continue. If there is enough information to stop, the Adjudicator says STOP and the process moves directly to the Glossator; if not, it says CONTINUE and specifies criteria the Senior Linguist and Skeptic must address. Assuming it continues...
  7. Skeptic addresses the Adjudicator's points, argues further against the word's inclusion, and indicates whether their position has changed.
  8. Senior Linguist defends their proposal again, responding to the Skeptic's objections, attempts to accomplish the Adjudicator's tasks, and indicates whether their position has changed.
  9. The Adjudicator is looped in again to decide whether the debate should continue. If it does, this step and the two above repeat, up to four times in total, after which the Adjudicator is skipped and the process moves directly to the Glossator.
  10. Glossator decides, based on the debate, whether the word should be added. Either way, a JSON record is generated for the word.
  11. Archivist is called to summarize the debate for a quick read when going through logs.

Total number of LLM calls: 8–22.
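Here's a rough sketch of how the consensus shortcut and the adjudication loop fit together. As before, `call_llm` and `too_similar` are hypothetical stand-ins, not the actual implementation.

```python
def debate_v2(root, evidence, call_llm, too_similar, max_rounds=4):
    # Junior Linguists, with an early exit once two or more proposals look alike.
    proposals = []
    for _ in range(5):
        proposals.append(call_llm("junior_linguist", f"Argue for including '{root}': {evidence}"))
        if too_similar(proposals):  # near-duplicates found: skip the remaining juniors
            break

    case = call_llm("senior_linguist", f"Synthesize these proposals: {proposals}")
    objection = call_llm("skeptic", f"Point out flaws in this case: {case}")
    defense = call_llm("senior_linguist", f"Defend the proposal against: {objection}")

    # Adjudication loop: STOP ends the debate; CONTINUE hands criteria back to the
    # Skeptic and Senior Linguist. After max_rounds rounds the Adjudicator is skipped.
    for _ in range(max_rounds):
        ruling = call_llm("adjudicator", f"Say STOP or CONTINUE (with criteria):\n{case}\n{objection}\n{defense}")
        if "STOP" in ruling:
            break
        objection = call_llm("skeptic", f"Address the criteria and argue against inclusion: {ruling}")
        defense = call_llm("senior_linguist", f"Address the criteria and defend inclusion: {objection}")

    entry = call_llm("glossator", f"Decide inclusion for '{root}' and emit its JSON entry.")
    summary = call_llm("archivist", "Summarize the debate for the logs.")
    return entry, summary
```

With two Junior Linguists and an immediate STOP, that comes to 8 calls; with all five juniors and four full adjudication rounds, it tops out at 22.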

Impatience

Sometime in June of 2025, I started to explore the idea of a solo analysis mode. The reason was simple: LLMs debating each word can take a while, especially when run locally on a gaming laptop without any bonus GPUs assisting.1 Remote calls to deepseek-r1:0528 could take anywhere from one to two minutes to complete. As seen above, that means anywhere from 8 to 44 minutes per cluster, meaning a single root word could take hours!2 😐
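The back-of-the-envelope arithmetic behind that range:

```python
calls_min, calls_max = 8, 22   # LLM calls per debate, from the steps above
minutes_min, minutes_max = 1, 2  # observed range for a remote deepseek-r1:0528 call
print(calls_min * minutes_min, calls_max * minutes_max)  # 8 44 (minutes per cluster)
```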

Given that there are thousands of possible root words to go through, this translates into weeks of running the enochian-analysis script, 24/7, to complete all of the ngrams.3 To date, I have not been satisfied enough with the architecture to let the debate engine run through all of the possible root words.

Needless to say, I needed a quicker alternative.

The solo analysis

As it sounds, this is where I feed as much relevant information as possible into a single prompt to an LLM and record the output.
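In sketch form, it's one call instead of a debate. The `call_llm` helper is the same hypothetical stand-in as above, and the actual prompt (see the gist below) is far more detailed.

```python
def solo_analysis(root, evidence, call_llm):
    # All of the relevant information, packed into a single prompt; the reply is recorded as-is.
    prompt = (
        f"You are analyzing the candidate root '{root}'.\n"
        f"Evidence (words containing the root, shared meanings, context):\n{evidence}\n"
        "Decide whether this root belongs in the dictionary and reply with a JSON object."
    )
    return call_llm("solo_analyst", prompt)
```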

For an idea of what that prompting looks like, I've decided to provide a gist.

Unlike the debate process, the solo analysis process has run through all the possible root words—twice.

Which is better?

I don't know yet. I have aligned the solo analysis and the debate process to generate similar output: a standardized JSON. This means I can compare the two, cluster by cluster, once the process is finished for both.
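Since both modes emit one standardized JSON record per cluster, the comparison itself can be fairly mechanical. A minimal sketch, with hypothetical field names standing in for whatever the real schema uses:

```python
import json
from pathlib import Path

def compare_runs(debate_path, solo_path):
    # "cluster_id" and "accepted" are hypothetical placeholders for the
    # standardized JSON's actual field names.
    debate = {r["cluster_id"]: r for r in json.loads(Path(debate_path).read_text())}
    solo = {r["cluster_id"]: r for r in json.loads(Path(solo_path).read_text())}
    shared = debate.keys() & solo.keys()
    disagreements = [c for c in shared if debate[c]["accepted"] != solo[c]["accepted"]]
    agreement_rate = 1 - len(disagreements) / len(shared)
    return agreement_rate, disagreements
```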

This provides an excellent opportunity to see if the skeptic agent idea I cooked up has any real worth.

Anyway, that's all for now—thanks for reading!

ps: if you want to see what debates look like, go here!

Footnotes

  1. Although, at this point, I had shifted to using OpenRouter for the vast majority of my calls, and the local implementation I had started with had not been entirely retired.

  2. I talk about what I mean by clusters elsewhere; to summarize: a cluster is a group of meanings shared across multiple words that contain the proposed root.

  3. Obviously I'm still going to do it anyway—for science! For reference, there are presently roughly 18,467 ngrams—only about half of which occur across more than one word.