The algorithm is a Bayesian classifier, commonly used for spam detection.
When you mark a message as spam in your inbox, the internal database is trained to recognize similar future messages as spam.
A similar method is used in I Write Like's style analyzer: we trained it on numerous texts from famous authors by extracting words and various stylistic features, such as sentence structure and punctuation. When analyzing your text, the algorithm extracts the same kinds of features and determines which author's training texts they match most closely.
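To make the idea of "extracting words and stylistic features" more concrete, here is a minimal Python sketch. It is not I Write Like's actual code: the function name `extract_features`, the feature prefixes, and the sentence-length buckets are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def extract_features(text: str) -> Counter:
    """Turn a text into counts of word and style features.

    Hypothetical feature set for illustration only; the real analyzer
    may use different features.
    """
    features = Counter()

    # Word features: lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", text.lower()):
        features["word:" + word] += 1

    # Punctuation features: count a few common marks.
    for mark in re.findall(r"[,;:!?\-]", text):
        features["punct:" + mark] += 1

    # Sentence-structure features: bucket each sentence by word count.
    # The bucket boundaries (8 and 20 words) are arbitrary assumptions.
    for sentence in re.split(r"[.!?]+", text):
        n = len(sentence.split())
        if n == 0:
            continue  # skip empty pieces left over from splitting
        bucket = "short" if n <= 8 else "medium" if n <= 20 else "long"
        features["sentlen:" + bucket] += 1

    return features
```

Calling `extract_features("Call me Ishmael. Stop, wait!")` would count each word once, one comma, one exclamation mark, and two short sentences. A classifier can then compare such counts against the counts collected from each author's texts.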
Writing style is not precisely defined. Different people may describe an author's style differently, leading to varying opinions. Therefore, determining the accuracy of human evaluations of styles can be challenging. Algorithmic evaluation is even more complicated.
I Write Like employs a statistical analysis of word choice and sentence structure. We first ran this analysis on the texts of each author in our database. The program then applies the same metrics to your text and calculates probabilities to identify the author with the most similar metrics.
While such analysis can't be 100% accurate, it's still fun.
At this time, I Write Like's style analyzer only supports English.
If you provide text in a different language, it may still produce a result based on some sentence structure metrics, but the outcome is meaningless.
The training corpus contained 50 authors. You can find the list of authors here.
The Bayesian classifier used by I Write Like's style analyzer is a machine learning algorithm. However, it is considerably simpler and less resource-intensive than large language models (LLMs) like ChatGPT, which most people associate with artificial intelligence (AI).
Both I Write Like's style analyzer and large language models utilize statistics about text, but they do so in different ways.
The Bayesian classifier analyzes the statistical properties of text to make predictions or classifications based on probabilities derived from training data. It calculates the likelihood of a given piece of text belonging to different categories (in our case, authors).
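As a rough illustration of how such probabilities can be calculated, here is a small naive Bayes classifier in Python. This is a sketch under stated assumptions, not the code that runs on I Write Like: the class name `NaiveBayes`, its method names, and the use of add-one (Laplace) smoothing are our choices for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial naive Bayes over feature counts (illustrative only)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # author -> feature counts
        self.docs = Counter()               # author -> number of training texts
        self.vocab = set()                  # every feature seen in training

    def train(self, author, features):
        """Add one training text, given as a Counter of features."""
        self.counts[author].update(features)
        self.docs[author] += 1
        self.vocab.update(features)

    def log_scores(self, features):
        """Return log P(author) + sum of log P(feature | author) per author."""
        total_docs = sum(self.docs.values())
        scores = {}
        for author, counts in self.counts.items():
            total = sum(counts.values())
            score = math.log(self.docs[author] / total_docs)
            for feature, n in features.items():
                # Add-one smoothing so unseen features do not zero out the score.
                p = (counts[feature] + 1) / (total + len(self.vocab))
                score += n * math.log(p)
            scores[author] = score
        return scores

    def probabilities(self, features):
        """Normalize the log scores into probabilities that sum to 1."""
        scores = self.log_scores(features)
        top = max(scores.values())  # subtract the max for numerical stability
        exp = {a: math.exp(s - top) for a, s in scores.items()}
        z = sum(exp.values())
        return {a: v / z for a, v in exp.items()}

    def classify(self, features):
        """Return the author with the highest score."""
        scores = self.log_scores(features)
        return max(scores, key=scores.get)
```

To use it, call `train(author, Counter(...))` once per training text, then pass the feature counts of a new text to `classify` or `probabilities`. Working in log space avoids multiplying many tiny probabilities together, which would otherwise underflow on long texts.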
Large language models, on the other hand, use neural networks trained on huge amounts of text to make predictions.
We open-sourced previous versions of I Write Like. You can find them on our GitHub account.
The latest version of the style analyzer that you see now uses the same (slightly improved) algorithm as the Go version, and, with a few differences, the Racket version. The original version released in 2010 used our Python-based Bayesian classifier.
We do not store your texts on our servers. We cannot read them.
The latest version that you are using right now doesn't even send texts to our servers for analysis; it does all the processing on your device, locally. You can even use it offline.
We do not use your texts for any analytical or training purposes because we can't even see them.
I Write Like was architected as a text processor app that runs on your device, available in a browser via a secure TLS connection.