Abstract
QWERTY has been the de facto layout for English text input since its invention in 1874. Its continued use has raised concerns about its ergonomic shortcomings. Previous attempts at layout creation have usually relied on manual observations of typing data rather than a predictive model. To address this issue, we propose a methodology that incorporates both corpus data from 22 million English websites and 8,228 hours of real-world typing data from participants. The corpus data is processed into bigrams and their occurrence counts. The typing data is preprocessed to exclude user-made typos, and each bigram is then tabulated along with its associated typing times. A regression model (MAE: 26 ms, R^2: 0.76) for predicting typing time is constructed using each bigram's average typing time, frequency, and positional features on the keyboard. A layout's cost is the sum, over all bigrams, of each bigram's occurrence count multiplied by its predicted typing time, which estimates the layout's total typing time across the corpus. Simulated annealing is then employed to settle on a near-global minimum of the cost function by iteratively swapping keys and heuristically accepting these swaps. The result is a keyboard layout with an estimated 9% reduction in typing time relative to QWERTY.
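The cost function and the key-swapping search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `layout_cost`, `anneal`, and `toy_predict_time` are hypothetical, as are the geometric cooling schedule, the Metropolis acceptance rule, and the four-key toy data. In particular, `toy_predict_time` is only a stand-in for the regression model, which the abstract does not specify in detail.

```python
import math
import random


def layout_cost(layout, bigram_counts, predict_time):
    """Sum of (occurrence count x predicted typing time) over all bigrams."""
    total = 0.0
    for (first, second), count in bigram_counts.items():
        total += count * predict_time(layout[first], layout[second])
    return total


def anneal(layout, bigram_counts, predict_time,
           steps=2000, t_start=1000.0, t_end=1.0, seed=0):
    """Search for a low-cost layout by swapping two keys per step.

    Assumed details (not from the source): geometric cooling from
    t_start to t_end and the Metropolis acceptance rule.
    """
    rng = random.Random(seed)
    current = dict(layout)  # copy so the caller's layout is untouched
    current_cost = layout_cost(current, bigram_counts, predict_time)
    best, best_cost = dict(current), current_cost
    keys = sorted(current)

    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(keys, 2)
        current[a], current[b] = current[b], current[a]  # propose a swap
        new_cost = layout_cost(current, bigram_counts, predict_time)
        delta = new_cost - current_cost
        # Always accept improvements; accept worse layouts with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
    return best, best_cost


def toy_predict_time(pos_a, pos_b):
    """Hypothetical stand-in for the regression model: a base time in ms
    plus a penalty proportional to the Manhattan distance between keys."""
    distance = abs(pos_a[0] - pos_b[0]) + abs(pos_a[1] - pos_b[1])
    return 100.0 + 20.0 * distance


# Toy data: four keys on one row and made-up bigram counts.
start_layout = {"a": (0, 0), "b": (0, 1), "c": (0, 2), "d": (0, 3)}
toy_counts = {("a", "d"): 50, ("d", "a"): 40, ("b", "c"): 1}

start_cost = layout_cost(start_layout, toy_counts, toy_predict_time)
best_layout, best_cost = anneal(start_layout, toy_counts, toy_predict_time)
print(start_cost, best_cost)
```

Because the sketch keeps the best layout seen so far, the returned cost is never higher than the starting cost. In practice the temperature range would need to be scaled to the magnitude of the cost differences in the real corpus.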
College
College of Science & Engineering
Department
Computer Science
Campus
Winona
First Advisor/Mentor
Mingrui Zhang
Second Advisor/Mentor
Sudharsan Iyengar
Location
Ballroom - Kryzsko Commons
Start Date
4-18-2024 1:00 PM
End Date
4-18-2024 2:00 PM
Presentation Type
Poster Session
Format of Presentation or Performance
In-Person
Session
2a=1pm-2pm
Poster Number
38
Included in
Optimizing Keyboard Layouts for English Text