On my laptop, I have two keyboard layouts installed: QWERTY and Dvorak. If I want to switch between them, I simply press Win+Space. But for some reason, Windows likes to switch my keyboard layouts automatically whenever I switch windows or change tabs. It is a bizarre behavior which sometimes results in me typing text kjak pssv; pgvd kjg; (translation: that looks like this). The gibberish results when I try to type in Dvorak and the layout is set to QWERTY.
Last night as usual, Windows switched the keyboard layout on me while I was working on an assignment. But instead of some nonsensical text following, my keystrokes produced an actual word. I wondered if there were any words that, when typed in QWERTY, would result in a valid word in Dvorak.
The analysis
To find out, I wrote a small Python script to go through a list of words and emulate the keystrokes of each word. Internally, it uses two dicts to map the keys in QWERTY to the keys in Dvorak, and vice versa. For example, the location of the “H” key in QWERTY corresponds to “D” in Dvorak. Using all the mappings, I could determine, for example, that “HELLO”, when typed with the same QWERTY keystrokes but using Dvorak, would result in “D>NNR”. This is obviously not a valid word, but given over 250 thousand words in the English language, there must be some keystroke sequences that produce valid words in both layouts.
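To make the keystroke emulation concrete, here is a minimal sketch of the idea. It is not the code from the actual script: the names QWERTY_KEYS, DVORAK_KEYS and type_in_dvorak are made up for illustration, and the tables assume standard US QWERTY and US Dvorak layouts with shifted (uppercase) characters.

```python
# Illustrative sketch only; these names are assumptions, not taken from explore.py.
# Position i in QWERTY_KEYS is the same physical key as position i in DVORAK_KEYS
# (shifted characters, standard US layouts assumed).
QWERTY_KEYS = 'QWERTYUIOP' 'ASDFGHJKL:' 'ZXCVBNM<>?'
DVORAK_KEYS = '"<>PYFGCRL' 'AOEUIDHTNS' ':QJKXBMWVZ'

# str.maketrans builds a character-to-character translation table.
QWERTY_TO_DVORAK = str.maketrans(QWERTY_KEYS, DVORAK_KEYS)


def type_in_dvorak(word: str) -> str:
    """Return what the QWERTY keystrokes for `word` produce under Dvorak."""
    return word.upper().translate(QWERTY_TO_DVORAK)


if __name__ == "__main__":
    for word in ("HELLO", "SOAP", "MAMMA"):
        print(word, "->", type_in_dvorak(word))
```

A translation table is used here instead of a plain dict lookup only because it keeps the sketch short; a dict with a per-character loop does the same job.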
The dictionary that I used was the official Scrabble dictionary. As a text file, it contains one word per line, and over 250,000 entries.
The results
After running the Python script, I was surprised at the results. Over two hundred valid word pairs resulted from keystrokes in QWERTY and Dvorak. Some entries in this list are quite obscure, such as “DUMA -> EGMA”, but some very interesting ones surfaced. For example, “SOAP” typed with QWERTY keystrokes is “ORAL” in Dvorak, while “MAMMA” comes out unchanged, because the M and A keys are in the same place in both layouts. Additionally, most words in the list are very short: no more than six characters.
Going in the other direction, with the user typing in Dvorak while the computer is set to QWERTY, produces exactly the same number of words. However, the word pairs are different; namely, most pairs are swapped. This is to be expected: if the key that produces a character α in QWERTY produces β in Dvorak, then the reverse mapping must send β back to α.
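The symmetry can be checked with a small sketch. Everything here is illustrative: the names Q2D, D2Q and find_pairs are assumptions rather than the script's own, the layout tables assume standard US layouts, and the four-word sample stands in for the real dictionary.

```python
# Illustrative sketch; names and the tiny sample word list are assumptions.
QWERTY_KEYS = 'QWERTYUIOP' 'ASDFGHJKL:' 'ZXCVBNM<>?'
DVORAK_KEYS = '"<>PYFGCRL' 'AOEUIDHTNS' ':QJKXBMWVZ'

# Forward mapping and its inverse. Inverting works because no two
# QWERTY keys produce the same Dvorak character.
Q2D = dict(zip(QWERTY_KEYS, DVORAK_KEYS))
D2Q = {dvorak: qwerty for qwerty, dvorak in Q2D.items()}


def find_pairs(words, mapping):
    """Return the set of (typed, produced) pairs where both are in `words`."""
    pairs = set()
    for word in words:
        produced = "".join(mapping.get(ch, ch) for ch in word)
        if produced in words:
            pairs.add((word, produced))
    return pairs


if __name__ == "__main__":
    sample = {"SOAP", "ORAL", "MAMMA", "HELLO"}
    forward = find_pairs(sample, Q2D)
    backward = find_pairs(sample, D2Q)
    # Swapping each forward pair should give exactly the backward pairs.
    print({(b, a) for a, b in forward} == backward)
```

For a word list made up only of the letters A to Z, swapping every forward pair yields the backward pairs, which is why both directions produce the same count.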
The analysis
I mentioned before that the letters “A” and “M” are in the same position on both keyboard layouts. These letters could be thought of as “pivot” letters, as their positions in each pair of words would be the same.
To measure this and a couple more metrics, I wrote another script, matchAnalyze.py, that reads any matches file and processes it line by line. Here are the data that I gathered:
Total number of word pairs | 231 |
% pairs containing “A” | 76.62% |
% pairs containing “M” | 14.72% |
% pairs containing “A” and/or “M” | 77.06% |
Distribution of word lengths | (chart not included) |
Percent distribution of word lengths | (chart not included) |
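A rough sketch of how such metrics could be computed is below. This is not matchAnalyze.py itself: the function name analyze and the result keys are hypothetical, and the input is assumed to be tab-separated lines in the format that explore.py prints. Because “A” and “M” sit on the same key in both layouts, a pair either contains them in both words or in neither, so checking the first word of each pair is enough.

```python
from collections import Counter


def analyze(lines):
    """Summarize tab-separated word pairs (hypothetical helper, not the real script)."""
    # Keep only the first word of each non-blank "WORD<TAB>WORD" line.
    firsts = [line.split("\t")[0].strip().upper() for line in lines if line.strip()]
    total = len(firsts)

    def pct(count):
        # Percentage of all pairs; 0.0 for an empty input to avoid dividing by zero.
        return 100.0 * count / total if total else 0.0

    lengths = Counter(len(word) for word in firsts)
    return {
        "total": total,
        "pct_a": pct(sum("A" in word for word in firsts)),
        "pct_m": pct(sum("M" in word for word in firsts)),
        "pct_a_or_m": pct(sum("A" in word or "M" in word for word in firsts)),
        "length_counts": dict(sorted(lengths.items())),
    }


if __name__ == "__main__":
    # Three pairs mentioned in this post, in the assumed file format.
    print(analyze(["SOAP\tORAL\n", "MAMMA\tMAMMA\n", "DUMA\tEGMA\n"]))
```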
The raw data
For those interested in the raw data, I have uploaded two text files with all the entries.
Dvorak to QWERTY
QWERTY to Dvorak
The technical details
The user runs the program in the following format:
py explore.py dictionary-file.txt -<which-way>
To parse the text file, each line is converted to UPPERCASE, then added to a set. In Python this is a one-liner using a list comprehension:
words = set([line.strip().upper() for line in open(sys.argv[1])])
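The one-liner above leaves the file handle for Python to close on its own. A slightly more defensive variant, sketched here with a hypothetical helper name load_words and an assumed UTF-8 encoding, closes the file explicitly and skips blank lines:

```python
def load_words(path):
    """Read one word per line into an uppercase set, skipping blank lines.

    Hypothetical helper; the UTF-8 encoding is an assumption about the file.
    """
    with open(path, encoding="utf-8") as handle:
        # A set comprehension avoids building an intermediate list.
        return {line.strip().upper() for line in handle if line.strip()}
```

In the script this would be called as words = load_words(sys.argv[1]), with sys imported at the top.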
The script has a map at the top to map QWERTY keys to Dvorak keys, and vice versa.
qwertyToDvorak = {
    "Q": "\"",
    "W": "<",
    "E": ">",
    "R": "P",
    "T": "Y",
    # and so on
}
# basically, a reverse mapping of the above
dvorakToQWERTY = {v: k for k, v in qwertyToDvorak.items()}
For every single word in the dictionary, it converts the keystrokes using the mappings, and prints to stdout each time it finds a match. Again, Python makes this very convenient with comprehensions.
# Mode is an enum representing "QWERTY to Dvorak" or "Dvorak to QWERTY"
def getEquiv(word, mode):
    convert = qwertyToDvorak if mode == Layouts.DVORAK else dvorakToQWERTY
    return "".join([convert.get(char.upper(), char) for char in word])

for word in words:
    cWord = getEquiv(word, mode)
    if cWord in words:
        print("%s\t%s" % (word, cWord))
The source code
You can check out the source code at https://github.com/g-liu/keyboard-layout-explorer.
I came looking for this information because I use Dvorak and when I tried to type Brad with the keyboard unknowingly on QWERTY, it typed Noah! Also, I’ve noticed that dry becomes hot in the same situation.
That’s some crazy trivia! Thanks for sharing!