Marco's Blog

All content personal opinions or work.
en eo

Introducing the Dayada Encoding

2023-12-01 5 min read Programming marco

What Is It?

The Dayada encoding is an encoding from binary of any kind to text written in something that from 30.000 feet looks like human language. Basically, you take a binary file, encode it using Dayada encoding, and you have text you can send. This is meant for the times when you need to transfer data over a channel designed to transfer text, like SMS or a messaging app or a comment on a blog post.

Where Do I Find It?

Currently, the main repository is on GitHub. This article will be updated with the main locations at all times.

For now, the only interface available is a slow Python library (the slow part comes from ineptitude by the developer, not slowness of the language). Interfaces in JavaScript (as a Node module) and Rust (with interface to Python for higher speed) are in planning stages.

The Python library will eventually also be provided as a PyPi installable. As soon as that happens, this article will be updated.

How Was It Born?

The terrible idea behind the Dayada encoding was to encrypt passwords and store them in plain view on public web pages. Think a quote in a blog post that turns out to be an encoded version of an encypted file. This way, you could have stored secret information in plain sight, protected by a password. Until the password was found, the data was safe.

In the original version, you would write a blog post with the secret embedded. Then you would highlight the secret and press a button. You would be asked for a password, the file would be encrypted and encoded using Dayada encoding, and then you’d post the article with the embedded Dayada quote.

You might then later click on the encoded quote. A popup would ask you for your password. If the correct password was entered, the original text would be replaced by the unencrypted version.

Why Did It Not Work?

It worked really well. But then it became increasingly clear that storing any kind of data in plain sight was highly risky and ill-advised. If you have an urgent need to transmit secret data that is going to be obsolete, soon, then this method might work for you. But in general you can rely on the fact that any data protected only by password is going to be cracked.

Why Dayada?

The technique used still has advantages when it comes to sending data over channels that don’t like binary data, not even when encoded. This is the case in email, text messaging, in fact any kind of messaging application where filters rule.

In my particular case, I was sending web site reports to myself daily, but the mail server complained about the embedded malware links and threatened to shut down my account over it. I realized I had this technology still sitting around and checked whether it would work. It did: the encoded data passed straight through the content filters.

So, if you have some small amount of binary data you need to transfer over a filtered messaging system, then you can use Dayada with a certain confidence that it won’t be inspected any further.

What Does It Look Like?

The image that intros the article is an example of Dayada text. It is in fact part of the Dayada encoded version of the favicon.ico file of this web site.

In general, Dayada encoded text looks like regular human language text in a language that doesn’t exist.

How Does It Work?

In technical terms, Dayada is a variable length stream encoding. That means that the content bits are parsed from left to right and a variable number of them is put into each chunk of output information.

In Dayada, information is represented by words. Each word if composed of an initial syllable, 0 or more regular syllables, and a final syllable. Strict rules determine which bits represent which initial, regular, final syllables and word breaks. This way, each input corresponds exactly to one output with trivial variants, whereas each output corresponds to exactly one binary input.

Encryption

Strictly speaking, Dayada is an encoding of binary data to text and vice versa. As such, it has nothing to do with encryption. But since the use case openly invites sending encrypted data, the library has encryption built-in.

You can take some input data (like a secret) and encrypt with a password using Dayada, and you will receive some text as output. This text looks just like any regular human language and can be sent over any messaging channel.

On the other end, the text can be reassembled with Dayada. Provided one knows it was encrypted using Dayada and one also knows the password, it is trivial to regenerate the original.

Languages

Users are not limited in the “language” used in Dayada and can add their own definitions. By default, Dayada provides four sample “languages:”

  • en, inspired by English phonology, which contains all 26 letters of the English Latin alphabet
  • jp, inspired by the Latin transcription of Japanese, highly legible because of the strict syllable rules of Japanese
  • it, inspired by Italian, missing a few letters but presenting several diphtongs and double consonants for enlargement of the syllable space
  • hi, inspired by Hawaiian, with only a few consonants (7) but twice the vowels as English because all vowels come in short and long version

Of these, only the “hi” language contains characters outside ASCII.

You can just copy one of the language definitions in the library and modify it to create a different language that suits your needs better.

Beyond Encoding and Encryption

The Dayada language generator is also great at producing filler text, instead of Lorem Ipsum!