There is No Math Genre on Here?

A Story by Nusquam Esse
"

Oh sure, in my infinitely unhinged mind, I decided to write about poetry using math... And that apparently isn't a genre on here? XD

"
Grinding the chalk into the chalkboard with a finality that sends a tingle of gooseflesh through me, I exclaim, “See, Pure… F****n Language! It's Poetry more pure than anything a human could write!”

A hand aggressively shoots up in the back. Gesturing with a dismissive grunt, I reply, “Yes?”

“So like, what the actual f**k?”

From the other side of the room, a concerned voice, “Uhhh, professor… are you alright?”

Objectively speaking? Absolutely not; I haven't slept in days, but that is when I can see it best. Poetry, flickering about in those taunting, teasing vectors. After all, poetry, words, they go somewhere, don't they? So that makes them vectors, and if they are vectors, then we can understand them, mathematically that is. And isn't that the most beautiful way to understand something?

Wearily, I reply, “Of course, subjectively speaking, I’ve never been better. Why would you even ask that?”

In the back, the same damn heckler, "It's all just gibberish!"

Somewhere between a growl and a sigh, I reply, “Only because you are hampered by a mind indoctrinated on prose. See? All natural language is governed by the Zipf-Mandelbrot distribution, in which word frequency follows a harmonic power series. The very nature of language is a fractal, a thing of beauty! Yet it is dominated by so many useless words.” I pause, resentfully glaring at the word 'the'.

The class just stares at me as if I am a feral beast; based on the feeling in my mouth, they may not be wrong.

Slamming my hand against my already drawn, and partially erased, set of matrices, I continue, “So then why? Why is it that when I apply Principal Component Analysis to this matrix, the least used words inevitably have the greatest eigenvalues? Their eigenvectors are literally explaining the majority of the variance! Meanwhile, all the words we say most just form this aimless cloud of complex conjugate pairs? What would explain that?”

The teacher's pet raises his hand, forgetting that I am a professor and have no need for pets. I stare at him, hoping to shame him into silence, but unfortunately it just emboldens him.

“As your language model is stored in a large tensor, and your distribution itself is piecewise, this leads to eigenvalues forming distinct clusters, not through the inherent nature of the words, but rather due to structure and a skew in your measurement system itself.”

Perhaps if I just ignore him? With a grunt I turn back to my model. “As you can see, after performing the Singular Value Decomposition, we can then sort our eigenpairs by the strength of their eigenvalues. This then allows us to remove the words which ultimately don't matter, and in turn, we can reduce the dimensionality of our data. In this way, our language model is compressed without losing its true essence.”

Ignoring that know-it-all’s raised hand, I finish with a flourish, “From there, we simply take our PCA model and extract our words probabilistically from it with our previously used Zipf-Mandelbrot distribution. It functions just like natural language, but now the CDF is dominated by words that matter!”

Impatiently, the kid blurts out, “Instead of using Frequentist models, have you considered something Bayesian, in which each predecessor word is a prior for forming a posterior output?”

In the back, again, “what the actual f**k?”

Thank goodness someone said something first. Looking at the front row, I snarl, “Why do you hate poetry?” Every time you come up with a beautiful way to mathematically arrange words, some newfangled prick and his ‘Bayesian Inference’ goes trying to shove it into some ridiculous trigonometric-transformed gamma posterior. What the hell is with kids these days… Why do they hate pure f****n poetry?


© 2025 Nusquam Esse


Author's Note

Nusquam Esse
Since many of the terms used here are not going to be familiar to people who haven't extensively studied Linear Algebra, Linguistics, Data Structures, and Statistics, included below is a brief glossary and summary of what is even being talked about.

The professor is basically proposing a mathematical model that reverses the roles of common and rarely used words in a natural language model, then getting annoyed when someone asks whether the words should be connected to one another at all. A very esoteric approach to minimalist 'poetry', I suppose.

Zipf-Mandelbrot Distribution - This is the distribution of how frequently words recur in most human languages. Basically, a word's frequency falls off predictably as a power of its popularity rank, so a handful of common words dominate.
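
For the mathematically curious, here is a rough Python sketch of what that frequency law looks like; the constants s and q are purely illustrative, not fit to any real corpus.

import numpy as np

# Zipf-Mandelbrot: the k-th most common word has probability proportional to 1 / (k + q)^s
def zipf_mandelbrot_weights(vocab_size, s=1.1, q=2.7):
    ranks = np.arange(1, vocab_size + 1)
    weights = 1.0 / (ranks + q) ** s
    return weights / weights.sum()   # normalize into probabilities

probs = zipf_mandelbrot_weights(10)
print(probs)   # the first few ranks hog most of the probability mass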

Eigenvalues, Eigenvectors, Complex Conjugate Pairs - If data is organized in a matrix, then using linear algebra you can find eigenvectors, which basically function as paths of stability. The eigenvalues then represent how prominent these vectors are. In contrast, eigenvalues that are complex numbers (which come in pairs, forming a conjugate) generally represent oscillating behaviors. There is a lot more to these...
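
A tiny made-up illustration of the real-versus-complex split the professor keeps complaining about; both matrices are invented purely for the example.

import numpy as np

# A symmetric matrix: its eigenvalues are real and its eigenvectors act as stable directions.
stable = np.array([[2.0, 1.0],
                   [1.0, 2.0]])
print(np.linalg.eigvals(stable))      # [3. 1.] -- both real

# A rotation matrix: its eigenvalues come as a complex conjugate pair,
# the "oscillating" behavior mentioned above.
rotation = np.array([[0.0, -1.0],
                     [1.0,  0.0]])
print(np.linalg.eigvals(rotation))    # [0.+1.j 0.-1.j]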

Principal Component Analysis - This is a compression method in which a matrix is broken into its eigenpairs (such as with SVD), which are then arranged by their eigenvalues (the ratio of an eigenvalue to the sum of all of them shows what percentage of the variation in the data is explained by its respective vector). You can then remove the low-value eigenpairs, giving you a lower-'dimensional' matrix which still contains most of what you need. Image compression and facial recognition systems use PCA. The idea is that you can shrink your data to "what matters".
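
A minimal sketch of that "keep only what matters" idea, assuming nothing more than NumPy and a made-up document-by-word count matrix; the cutoff of three components is arbitrary.

import numpy as np

# Pretend each row is a document and each column is a word count.
counts = np.random.default_rng(0).poisson(3.0, size=(50, 8)).astype(float)
centered = counts - counts.mean(axis=0)      # PCA works on centered data

# SVD gives the principal directions; the squared singular values play the
# role of eigenvalues and tell us how much variance each direction explains.
U, sing_vals, Vt = np.linalg.svd(centered, full_matrices=False)
variance_share = sing_vals**2 / np.sum(sing_vals**2)
print(variance_share)                        # how much each component "matters"

# Keep only the top components and project the data down to fewer dimensions.
k = 3
reduced = centered @ Vt[:k].T                # 50 x 3 instead of 50 x 8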

Frequentist vs Bayesian - There are two main branches of statistics. Frequentist is what most people learn, where each observation is treated independently. Alternatively, there are Bayesian statistics, which are based on how probability changes with new results. They use a system of priors and posteriors (the math can get exceptionally ugly, such as when trying to apply it to Zipf's distribution). This sort of statistics is widely used in Computer Science, within learning models and Markov Chains, and is generally based on the idea that what occurred recently changes how likely something is to occur next. Basically, the professor wants a random scattering of words that matches the distribution he likes, rather than 'sentences' where each word influences the next.
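
And the student's suggestion in its very simplest form; this is not full Bayesian inference, just the conditional "previous word shifts the odds of the next word" idea, run on an obviously made-up toy corpus.

from collections import Counter, defaultdict

corpus = "the poem eats the proof the proof eats the poem".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

# P(next | previous): the previous word changes the odds of what comes next,
# instead of every word being drawn independently from one fixed distribution.
def next_word_probs(prev):
    counts = following[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_probs("the"))   # {'poem': 0.5, 'proof': 0.5}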
