Accurately Measuring Behavioral Biometric Performance

Justin Macorin
Apr 12, 2021

Biometric security systems use the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) to measure accuracy and performance. The lower these numbers, the more accurate the system is.

However, FAR and FRR return different values at different times and under different circumstances. Therefore, we should scrutinize these numbers to ensure they accurately depict a system’s real performance.

This article will touch on measuring keyboard, mouse, and touch dynamics to ensure real, accurate, uniform, and predictable results.

What are FAR and FRR?

FAR (False Acceptance Rate): the percentage (%) of intruder attempts that are wrongly accepted.

FRR (False Rejection Rate): the percentage (%) of authorized-user attempts that are wrongly rejected, locking the user out.
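In terms of raw counts, both rates are simple ratios. The following is a minimal sketch, assuming we have already counted how many intruder and authorized-user attempts were made and how each was decided. The function name `far_frr` and the example counts are hypothetical and chosen only for illustration.

```python
def far_frr(impostor_accepted, impostor_total, genuine_rejected, genuine_total):
    """Return (FAR, FRR) as percentages from raw attempt counts."""
    if impostor_total <= 0 or genuine_total <= 0:
        raise ValueError("both attempt totals must be positive")
    far = 100.0 * impostor_accepted / impostor_total  # intruders wrongly accepted
    frr = 100.0 * genuine_rejected / genuine_total    # authorized users wrongly rejected
    return far, frr


# Hypothetical counts, for illustration only.
far, frr = far_frr(impostor_accepted=2, impostor_total=20_000,
                   genuine_rejected=50, genuine_total=5_000)
print(f"FAR = {far:.2f}%, FRR = {frr:.2f}%")  # FAR = 0.01%, FRR = 1.00%
```

Note that the denominators are attempts, not minutes: the same two percentages can come from very different amounts of underlying data.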

Our goal is to ensure FAR and FRR are as low as possible while maintaining good security and user experience levels.

We cannot always assume that a FAR of 0.01% and an FRR of 1% are good. This is because we don’t know how much data or time was used to generate these results.
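One way to see why the amount of data matters is the statistical "rule of three": if a test observes zero false acceptances in N independent intruder attempts, the 95% upper bound on the true FAR is still roughly 3/N. The sketch below computes the exact version of that bound. It assumes attempts are independent, which real behavioral data may not satisfy, and the function name `far_upper_bound` is hypothetical.

```python
def far_upper_bound(impostor_attempts, confidence=0.95):
    """Upper bound on the true FAR (in %) when zero false acceptances
    were observed in `impostor_attempts` independent attempts."""
    if impostor_attempts <= 0:
        raise ValueError("impostor_attempts must be positive")
    alpha = 1.0 - confidence
    # Solve (1 - p) ** n = alpha for p: the largest error rate that would
    # still produce zero errors in n attempts with probability alpha.
    return 100.0 * (1.0 - alpha ** (1.0 / impostor_attempts))


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} attempts, zero errors -> FAR could still be up to "
          f"{far_upper_bound(n):.4f}%")
```

Under these assumptions, a perfect score over 100 intruder attempts is still consistent with a FAR of nearly 3%, so a reported FAR of 0.01% only means something when it is backed by a large number of attempts.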

How much data and time were used to calculate system performance?

We must understand how time and circumstance affect accuracy levels. For example, a user may get up to get a coffee or go to the bathroom; during this time, behavioral analysis is impossible to perform as no data is actively generated.

Keyboard dynamics performance

We can only gather keyboard dynamics data when a user is typing on their keyboard. Keyboard dynamics rely on “keypress” events to generate enough statistical data to make a prediction.

Time, in minutes, is often not a good approach to measure keyboard FAR and FRR because a user may use their keyboard in different ways throughout the day. For example, suppose a user is browsing the internet, performing research, reading an article, or playing a point-and-click game. In these cases, we can safely assume that minimal keyboard activity is being generated. We can also assume that the mouse is being used significantly more.

A good alternative is to measure performance based on the number of “keypress” events. By doing so, every measurement is based on the same amount of data, which eliminates variation and keeps the calculation under our control.
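The difference between the two approaches can be sketched in a few lines. The example below assumes keypress events are available as a sorted list of timestamps in seconds. The function names and the sample timestamps are hypothetical, not taken from any particular product.

```python
def counts_by_time(timestamps, seconds):
    """Count events in consecutive time blocks of `seconds`, starting at the first event."""
    if not timestamps:
        return []
    start = timestamps[0]
    n_blocks = int((timestamps[-1] - start) // seconds) + 1
    counts = [0] * n_blocks
    for t in timestamps:
        counts[int((t - start) // seconds)] += 1
    return counts


def windows_by_count(timestamps, size):
    """Split events into consecutive windows of exactly `size` events.
    An incomplete trailing window is dropped."""
    return [timestamps[i:i + size]
            for i in range(0, len(timestamps) - size + 1, size)]


# Hypothetical session: a burst of typing, a pause, then two shorter bursts.
keypress_times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0,
                  95.0, 95.3, 95.6,
                  180.0, 180.2, 180.4]

print(counts_by_time(keypress_times, seconds=60))                   # [6, 3, 0, 3]
print([len(w) for w in windows_by_count(keypress_times, size=4)])  # [4, 4, 4]
```

With one-minute blocks, the amount of data behind each prediction swings between zero and six keypresses. With count-based windows, every prediction is made from the same number of events, so FAR and FRR figures are comparable from one window to the next.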

Mouse dynamics performance

Like the keyboard, we can only gather mouse dynamics data when a user is actively using their mouse. Mouse dynamics rely on mouse events, such as clicks and movement, to make predictions.

Time, once again, is not a good approach to measure mouse FAR and FRR because a user may use their mouse in different ways throughout the day. For example, suppose a user uses a command-line terminal application or writes lots of content. In these cases, mouse movement will be minimal as it is not required. On the other hand, keyboard activity will increase.

A better alternative is to measure mouse dynamics performance based on fixed amounts of activity, such as a combination of pixel distance traveled and number of click events. By doing so, we eliminate variation and control the calculation.
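As a rough sketch of this idea, the function below groups mouse events into evaluation windows that close only once the pointer has traveled a minimum pixel distance and a minimum number of clicks has occurred. The event format `(kind, x, y)`, the function name, and the thresholds are all assumptions made for illustration.

```python
import math


def mouse_windows(events, min_distance, min_clicks):
    """Group mouse events into windows that close once both thresholds are met.

    events: iterable of (kind, x, y) with kind being "move" or "click".
    """
    windows, current = [], []
    distance, clicks = 0.0, 0
    last_pos = None
    for kind, x, y in events:
        if last_pos is not None:
            # Pixel distance from the previous event. The position carries
            # over across windows, so no travel is lost at a window boundary.
            distance += math.hypot(x - last_pos[0], y - last_pos[1])
        last_pos = (x, y)
        if kind == "click":
            clicks += 1
        current.append((kind, x, y))
        if distance >= min_distance and clicks >= min_clicks:
            windows.append(current)
            current, distance, clicks = [], 0.0, 0
    return windows  # events after the last full window are left out


# Hypothetical event stream, for illustration only.
events = [("move", 0, 0), ("move", 30, 40), ("click", 30, 40),
          ("move", 90, 120), ("click", 90, 120),
          ("move", 90, 220), ("click", 90, 220)]

windows = mouse_windows(events, min_distance=100, min_clicks=1)
print([len(w) for w in windows])  # [4, 2]
```

Each window now represents the same minimum amount of mouse activity regardless of how long the user took to produce it, which is the property a time block cannot guarantee.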

Touch dynamics performance

Smartphones are different from a mouse and keyboard. They are compact and constantly require finger motion for navigation and interaction. Touch dynamics relies on a user touching, swiping, and scrolling on a touchscreen.

Time, in this case, may be used to calculate FAR and FRR since a user is much more likely to interact with their device uniformly across the day. For example, suppose a user checks their email, messaging, and social media applications every hour. In that case, we can assume that a lot of tapping, swiping, and scrolling is performed. However, if a user reads a long article, email, or comment, these actions will be reduced — but not eliminated.

Even so, it is good practice to control the FAR and FRR calculation so that variation in user activity never affects it. Therefore, performance should be measured using swipe and scroll distances combined with the number of touch events.

Conclusion

Time is a poor basis for measuring behavioral biometric performance and accuracy. The number of user actions is the most critical factor in determining the accuracy of a system.

It is also important to note that a system will generate better predictions given more data. For example, a user who continuously uses their mouse for 10 minutes will create more data than a user who only uses their mouse for 10 seconds.

It is best to generate FAR and FRR performance ratings based on user actions instead of time blocks.
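In practice, this could mean reporting FAR and FRR separately for each action budget, for example "after 50 actions" and "after 200 actions", rather than "after 5 minutes". The sketch below assumes each evaluation trial is recorded as `(actions_used, is_genuine, accepted)`. That record format, the function name, and the toy trials are hypothetical and exist only to exercise the code.

```python
from collections import defaultdict


def rates_by_action_count(trials):
    """Return {actions_used: (FAR %, FRR %)} from evaluation trials.

    trials: iterable of (actions_used, is_genuine, accepted).
    A rate is None when no attempts of that type exist for a given count.
    """
    tally = defaultdict(lambda: {"imp": 0, "imp_acc": 0, "gen": 0, "gen_rej": 0})
    for actions, is_genuine, accepted in trials:
        t = tally[actions]
        if is_genuine:
            t["gen"] += 1
            if not accepted:
                t["gen_rej"] += 1   # authorized user locked out
        else:
            t["imp"] += 1
            if accepted:
                t["imp_acc"] += 1   # intruder got in
    report = {}
    for actions in sorted(tally):
        t = tally[actions]
        far = 100.0 * t["imp_acc"] / t["imp"] if t["imp"] else None
        frr = 100.0 * t["gen_rej"] / t["gen"] if t["gen"] else None
        report[actions] = (far, frr)
    return report


# Toy trials, made up purely to show the output shape.
trials = [
    (50, True, True), (50, True, False), (50, False, True), (50, False, False),
    (200, True, True), (200, True, True), (200, False, False), (200, False, False),
]
for actions, (far, frr) in rates_by_action_count(trials).items():
    print(f"{actions} actions: FAR = {far}%, FRR = {frr}%")
```

A report in this shape makes the data requirement explicit: a reader can see how many user actions the system needs before its FAR and FRR reach a given level.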

I support organizations in strengthening their data and machine learning capabilities to better defend against next-generation cyber threats.