Englang is my company's LLM, and I am starting a new series of articles to describe it.

To help you learn how language models work, I built a simple one in a few hundred lines of code, with no external toolkit.


Copyright© Miklos Szegedi, 2023.

Chatbots are not new. They were widespread on chat applications like ICQ twenty years ago. I remember friends playing with them eight years ago. Alexa was a surprisingly good one with voice, yet still cumbersome to use. ChatGPT from OpenAI is not new either. So what makes the difference? The results of its most advanced models are remarkable.

AI is not a new revolution either. Most of the algorithms have been available for a long time, but the hardware was not there to apply them conveniently.

Quite a few companies approached me with the suggestion to implement a chatbot.

Let's say you want to implement an AI model based on the suggestion of a business partner. What are the things that you need to consider?

First of all, most AI models rely in some way on Python and TensorFlow toolkits. This is nice, but if you start generating revenue with AI, you need to make sure that you have options. Having options lowers your costs and helps you avoid vendor lock-in. Contact me if you need alternative vendors.

ChatGPT is great, but it is still a toy. I remember asking a derivative chatbot about its version, and when it responded with something other than GPT, it crashed.

Llama from Facebook is a better option. It is downloadable. Unfortunately, the download is tied to your account, which in theory allows different accounts to receive different copies.

Real AI models must be certified, just like CPAs certify your books.

Make sure that they are downloadable.

Make sure you can completely run them on your own box or cloud node.

Certify that the results are the same as the cloud version.

You can then run either the cloud version or the downloaded one.
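One way to check the "same results" requirement is to send the same prompts to both versions and compare a fingerprint of the answers. The sketch below is only an illustration: `local_model` and `cloud_model` are hypothetical stand-ins for your two deployments, and it assumes that both produce deterministic output for a given prompt.

```python
import hashlib
from typing import Callable, Iterable


def fingerprint(model: Callable[[str], str], prompts: Iterable[str]) -> str:
    """Hash every (prompt, answer) pair into one SHA-256 digest."""
    digest = hashlib.sha256()
    for prompt in prompts:
        answer = model(prompt)
        # Length-prefix each field so that field boundaries are unambiguous.
        for field in (prompt, answer):
            data = field.encode("utf-8")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def same_results(local, cloud, prompts) -> bool:
    """True when both models answer every prompt identically."""
    prompts = list(prompts)
    return fingerprint(local, prompts) == fingerprint(cloud, prompts)


# Hypothetical stand-ins; replace them with calls to your real deployments.
def local_model(prompt: str) -> str:
    return prompt.upper()


def cloud_model(prompt: str) -> str:
    return prompt.upper()


print(same_results(local_model, cloud_model, ["The horse runs."]))
```

An auditor can store the digest next to the prompt list and repeat the check after every model update.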

Ask your provider whether the model is labeled. Labeled models help you figure out what happened when something goes wrong. A model that requires two hundred researchers to fix a rare mistake will generate expenses.

Our model is a simple one, built just for demonstration. Everything is in Englang. Even an accountant can read and certify the model and the training set. There is no need for experts.

It is unsupervised artificial intelligence. That means it takes the training data and generates as many consistent results as possible. You can then index the results and use them for autocomplete, chat, or spell checking.
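To make the indexing step concrete, here is a minimal sketch of a prefix index used for autocomplete. It is not the Englang implementation; the function names `build_index` and `complete` are made up for this illustration, and the generated sentences are typed in by hand.

```python
from collections import defaultdict


def build_index(sentences):
    """Map every leading run of words to the sentences that start with it."""
    index = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for end in range(1, len(words)):
            prefix = " ".join(words[:end]).lower()
            index[prefix].append(sentence)
    return index


def complete(index, typed):
    """Return the known sentences that begin with what the user typed."""
    key = " ".join(typed.split()).lower()
    return index.get(key, [])


generated = [
    "The horse is an animal.",
    "The dog is an animal.",
    "The turkey is an animal.",
]
index = build_index(generated)
print(complete(index, "The turkey"))  # ['The turkey is an animal.']
```

The same index could back a chat or a spell checker by changing how the key is derived from the user input.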

It will help you understand artificial intelligence and how it works.

The basic training set consists of the following sentences.

The horse runs. The horse is an animal.

The dog runs. The dog is an animal.

The turkey runs.

This is nice, but we expect to be able to generate more logic from it.

This is what generative artificial intelligence is about.

We run the Englang model. Englang is the name of our feature set. It stands for engineering language, and its rules are the same as those of English.

The horse runs. The horse is an animal.

The dog runs. The dog is an animal.

The turkey runs. The turkey is an animal.

The {1} runs. The {1} is an animal.

You will notice that a new model was generated, with a rule that applies to the turkey as well.
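The step from the horse and dog sentences to the `{1}` rule can be sketched in a few lines. This is my simplified illustration, not the actual Englang code: it assumes two examples with the same number of words, and the helper names `generalize` and `apply_rule` are hypothetical.

```python
def is_slot(word):
    """A slot looks like {1}, {2}, and so on."""
    return word.startswith("{") and word.endswith("}")


def generalize(a, b):
    """Replace the words where two equally long texts differ with numbered slots."""
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        raise ValueError("this sketch only aligns texts of equal length")
    slots = {}  # (word in a, word in b) -> slot name
    out = []
    for wa, wb in zip(words_a, words_b):
        if wa == wb:
            out.append(wa)
        else:
            out.append(slots.setdefault((wa, wb), "{%d}" % (len(slots) + 1)))
    return " ".join(out)


def apply_rule(rule, text):
    """Bind slots from the leading words of text, then fill in the whole rule."""
    rule_words, text_words = rule.split(), text.split()
    if len(text_words) > len(rule_words):
        return None
    bindings = {}
    for rw, tw in zip(rule_words, text_words):
        if is_slot(rw):
            if bindings.setdefault(rw, tw) != tw:
                return None  # the same slot would need two different words
        elif rw != tw:
            return None
    if any(is_slot(rw) and rw not in bindings for rw in rule_words):
        return None  # a slot was never bound by the text
    return " ".join(bindings.get(rw, rw) for rw in rule_words)


rule = generalize(
    "The horse runs. The horse is an animal.",
    "The dog runs. The dog is an animal.",
)
print(rule)                                  # The {1} runs. The {1} is an animal.
print(apply_rule(rule, "The turkey runs."))  # The turkey runs. The turkey is an animal.
```

The rule is plain text, so the person who certifies the training set can read it too.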

This will be useful for generating code completion results. It can also generate answers to questions.

The bonus is that it is just a few hundred lines of code. You can see what it does.

The first step is tokenization. It splits the training set into manageable chunks.
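As an illustration, a tokenizer for the training set above can be as small as the sketch below. The chunk sizes are my assumption (sentences first, then words and punctuation marks); the real Englang tokenizer may split differently.

```python
import re


def tokenize(text):
    """Split text into sentences, then each sentence into word and punctuation tokens."""
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    return [re.findall(r"\w+|[^\w\s]", sentence) for sentence in sentences]


tokens = tokenize("The horse runs. The horse is an animal.")
print(tokens)
# [['The', 'horse', 'runs', '.'], ['The', 'horse', 'is', 'an', 'animal', '.']]
```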

The next step is matching rules and sentences at random. We use a Monte Carlo algorithm. This family of algorithms avoids ordering and complexity, so it is well suited to artificial intelligence.

Fewer constraints also let them run on more machines, so that you can scale.

This allows us to set a five-second training limit, which should be enough for simple documents.
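The random matching and the time limit fit together naturally: keep drawing a random sentence and a random rule until the clock runs out. The sketch below shows the idea under my own assumptions. The rule is written by hand as a (condition, conclusion) pair with a single `{1}` slot, the names `match` and `train` are hypothetical, and the demo uses a much shorter budget than five seconds so that it finishes quickly.

```python
import random
import time

SLOT = "{1}"

FACTS = [
    "The horse runs.", "The horse is an animal.",
    "The dog runs.", "The dog is an animal.",
    "The turkey runs.",
]
# Hand-written rule for this sketch: (condition, conclusion).
RULES = [("The {1} runs.", "The {1} is an animal.")]


def match(pattern, sentence):
    """Return the word bound to the slot, or None if the sentence does not fit."""
    pattern_words, sentence_words = pattern.split(), sentence.split()
    if len(pattern_words) != len(sentence_words):
        return None
    binding = None
    for pw, sw in zip(pattern_words, sentence_words):
        if pw == SLOT:
            if binding not in (None, sw):
                return None
            binding = sw
        elif pw != sw:
            return None
    return binding


def train(facts, rules, seconds=5.0, seed=0):
    """Pair random sentences with random rules until the time budget is spent."""
    rng = random.Random(seed)
    known = list(facts)
    seen = set(known)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sentence = rng.choice(known)
        condition, conclusion = rng.choice(rules)
        subject = match(condition, sentence)
        if subject is None:
            continue
        derived = conclusion.replace(SLOT, subject)
        if derived not in seen:
            seen.add(derived)
            known.append(derived)
    return known


model = train(FACTS, RULES, seconds=0.1)
print(model[-1])
```

Because no draw depends on the order of the previous ones, several machines could run the same loop with different seeds and merge their results afterwards.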

You can download it here: hop

Continued ....
