Abstract

QWERTY has been the de facto layout for English text input since its invention in 1874. Its continued use has raised concerns about its ergonomic shortcomings. Previous attempts at layout creation have usually relied on manual observations of typing data rather than a predictive model. To address this issue, we propose a methodology that incorporates both corpus data from 22 million English websites and 8,228 hours of real-world typing data from participants. The corpus data is processed into bigrams and their occurrence counts. The typing data is preprocessed to exclude user-made typos, and then each bigram is tabulated along with its associated typing times. A regression model (MAE: 26 ms, R²: 0.76) for predicting typing time is constructed using each bigram's average typing time, frequency, and positional features on the keyboard. A layout's cost is calculated as the sum, over all bigrams, of the bigram's occurrence count multiplied by its predicted typing time, which estimates the layout's total typing time across the corpus. Simulated annealing is then employed to settle on a near-global minimum of the cost function by iteratively swapping keys and heuristically accepting these swaps. The result is a keyboard layout with an estimated 9% reduction in typing time relative to QWERTY.
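
The cost function and search described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `layout_cost`, `anneal`, and `toy_predict_time` are hypothetical, the distance-based predictor stands in for the regression model, the geometric cooling schedule and Metropolis acceptance rule are assumed (the abstract says only that swaps are accepted heuristically), and the six-key layout and bigram counts are invented for the demonstration.

```python
import math
import random


def toy_predict_time(pos_a, pos_b):
    # Hypothetical stand-in for the regression model:
    # 100 ms base time plus 30 ms per unit of distance between the keys.
    return 100.0 + 30.0 * math.dist(pos_a, pos_b)


def layout_cost(layout, bigram_counts, predict_time):
    # Sum over bigrams of occurrence count times predicted typing time.
    return sum(
        count * predict_time(layout[a], layout[b])
        for (a, b), count in bigram_counts.items()
    )


def anneal(layout, bigram_counts, predict_time,
           steps=2000, t_start=500.0, t_end=1.0, seed=0):
    # Simulated annealing over key swaps. The temperature is expressed
    # in the same units as the cost (an assumption of this sketch).
    rng = random.Random(seed)
    current = dict(layout)
    current_cost = layout_cost(current, bigram_counts, predict_time)
    best, best_cost = dict(current), current_cost
    keys = sorted(current)
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(keys, 2)
        current[a], current[b] = current[b], current[a]
        new_cost = layout_cost(current, bigram_counts, predict_time)
        delta = new_cost - current_cost
        # Always accept improvements; accept a worse layout with
        # probability exp(-delta / temp) (Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            # Reject the move: undo the swap.
            current[a], current[b] = current[b], current[a]
    return best, best_cost


# Invented toy data: six keys in a single row and a few bigram counts.
positions = [(0, col) for col in range(6)]
start_layout = dict(zip("abcdef", positions))
bigram_counts = {("a", "f"): 50, ("f", "a"): 40, ("b", "c"): 5, ("d", "e"): 3}

start_cost = layout_cost(start_layout, bigram_counts, toy_predict_time)
best_layout, best_cost = anneal(start_layout, bigram_counts, toy_predict_time)
print(f"start cost: {start_cost:.0f} ms, best cost found: {best_cost:.0f} ms")
```

The sketch recomputes the full cost after every swap for clarity. With a real corpus, one would more likely update only the bigrams that involve the two swapped keys.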

College

College of Science & Engineering

Department

Computer Science

Campus

Winona

First Advisor/Mentor

Mingrui Zhang

Second Advisor/Mentor

Sudharsan Iyengar

Location

Ballroom - Kryzsko Commons

Start Date

4-18-2024 1:00 PM

End Date

4-18-2024 2:00 PM

Presentation Type

Poster Session

Format of Presentation or Performance

In-Person

Session

2a=1pm-2pm

Poster Number

38

Optimizing Keyboard Layouts for English Text
