JeFaPaTo - A joint toolbox for blinking analysis and facial features extraction

Analyzing facial features and expressions is a complex task in computer vision. The human face is intricate, with significant variations in shape, texture, and appearance. In medical contexts, facial structures and movements that deviate from the norm are particularly important to study and require precise analysis to understand the underlying conditions. Because facial expressions are produced solely by the facial muscles, which are innervated by the facial nerve, facial palsy can lead to severe impairments in facial movement. One affected area of interest is the subtle movement involved in blinking. Blinking is an intricate, spontaneous process that is not yet fully understood and requires high-resolution, time-specific analysis to be understood in detail. However, many computer vision techniques demand programming skills for automated extraction and analysis, which makes them less accessible to medical professionals who may not have these skills. The Jena Facial Palsy Toolbox (JeFaPaTo) has been developed to bridge this gap. It uses cutting-edge computer vision algorithms and offers a user-friendly interface for those without programming expertise. The toolbox makes advanced facial analysis more accessible to medical experts and simplifies its integration into their workflow.

data, which is essential for a thorough analysis of the blinking process, as most blinks are shorter than 100 ms. We developed JeFaPaTo to go beyond simple eye-state classification and offer a method to extract complete blinking intervals for detailed analysis. We aim to provide a custom tool that is easy for medical experts to use, abstracting away the complexity of the underlying computer vision algorithms and high temporal resolution processing, and enabling them to analyze blinking behavior without programming skills. An existing approach by Kwon et al. [2013] for high temporal resolution videos uses only one frame every 5 ms and requires manual measurement of the upper and lower eyelid margins. Other methods require additional sensors, such as electromyography (EMG) or magnetic search coils, to measure eyelid movement [VanderWerf et al., 2003, 2007]. Such sensors require additional human resources and are unsuitable for routine clinical analysis. JeFaPaTo is a novel approach that combines the advantages of high temporal resolution video data [Kwon et al., 2013] and computer vision algorithms [Soukupova, 2016] to analyze blinking behavior.
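To make the 100 ms figure concrete: an event of duration t recorded at frame rate f spans roughly N = t · f frames. The frame rates below are illustrative; 30 FPS stands for a typical consumer camera, and 240 FPS is the rate of the recording summarized in Figure 1.

```latex
N = t \cdot f, \qquad
N_{30\,\mathrm{FPS}} = 0.1\,\mathrm{s} \times 30\,\mathrm{s}^{-1} = 3, \qquad
N_{240\,\mathrm{FPS}} = 0.1\,\mathrm{s} \times 240\,\mathrm{s}^{-1} = 24
```

A blink that covers only about three frames at 30 FPS leaves little room to resolve the closing and reopening phases, whereas 24 frames at 240 FPS sample the eyelid motion far more densely.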

Overview of JeFaPaTo
JeFaPaTo is a Python-based [Riverbank Computing Limited, 2023] program to support medical and psychological experts in analyzing blinking and facial features in high temporal resolution video data. We follow a two-way approach to encourage both programmers and non-programmers to use the tool. On the one hand, we provide a programming interface for efficiently processing high temporal resolution video data, automatic facial feature extraction, and specialized blinking analysis functions. This interface is extendable, allowing the easy addition of new or existing facial-feature-based processing functions (e.g., mouth movement analysis [Hochreiter et al., 2023] or MRD1/MRD2 [Chen et al., 2021]). On the other hand, we offer a graphical user interface (GUI) written entirely in Python that enables non-programmers to use the full analysis functionality, visualize the results, and export the data for further analysis. All functionalities of the programming interface are accessible through the GUI, with additional input validation, making it easy for medical experts to use. JeFaPaTo is designed to be extendable and transparent and is under joint development by computer vision and medical experts to ensure high usability and relevance for the target group.
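The text does not spell out how the extendable interface looks. As a purely hypothetical sketch of how a facial-feature-based processing function could plug into such an interface: the names `FacialFeature`, `MouthWidth`, and `extract_frame` are illustrative assumptions and not JeFaPaTo's actual API, and the landmark indices are placeholders.

```python
from abc import ABC, abstractmethod

import numpy as np


class FacialFeature(ABC):
    """Hypothetical plug-in base class for a per-frame facial feature."""

    name = "feature"

    @abstractmethod
    def compute(self, landmarks: np.ndarray) -> dict:
        """Map one frame's landmark array (shape (N, 3)) to named scalar values."""


class MouthWidth(FacialFeature):
    """Toy feature: 2D distance between two mouth-corner landmarks.

    The default indices are placeholders, not real mediapipe indices.
    """

    name = "mouth_width"

    def __init__(self, left_idx: int = 0, right_idx: int = 1):
        self.left_idx = left_idx
        self.right_idx = right_idx

    def compute(self, landmarks: np.ndarray) -> dict:
        # use only x and y, ignoring the estimated depth value
        diff = landmarks[self.left_idx, :2] - landmarks[self.right_idx, :2]
        return {self.name: float(np.linalg.norm(diff))}


def extract_frame(landmarks: np.ndarray, features: list) -> dict:
    """Run every registered feature on one frame and merge the results."""
    row = {}
    for feature in features:
        row.update(feature.compute(landmarks))
    return row


# Usage: two fake landmarks whose x/y distance is 5
frame = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.1]])
print(extract_frame(frame, [MouthWidth()]))  # {'mouth_width': 5.0}
```

In this design, each new feature only has to implement `compute`, so adding a measure such as MRD1/MRD2 would not touch the per-frame extraction loop. This is one common plug-in pattern, not necessarily the one JeFaPaTo uses.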
JeFaPaTo leverages the mediapipe library [Lugaresi et al., 2019, Kartynnik et al., 2019] to extract facial landmarks and blend shape features from video data at a processing speed of 60 FPS (on modern hardware). From the landmarks, we compute the EAR (Eye-Aspect-Ratio) [Soukupova, 2016] for both eyes over the whole video. Additionally, JeFaPaTo detects blinks, matches the blinks of the left and right eye, and computes medically relevant statistics. Furthermore, the GUI provides a visual summary of the video, shown in Figure 1, and the data can be exported in various formats for further independent analysis. The visual summary lets medical experts quickly get an overview of the blinking behavior. As shown in Figure 1, the blinks per minute are shown as a histogram over time in the upper axis, and the delay between left and right eye closures is shown in the right axis. The main plot is a scatter plot of the EAR score for the left and right eye, where the dots indicate the detected blinks, with the rolling mean and standard deviation shown as a line plot. This summary condenses the blinking behavior throughout the video into a compact overview, enabling a quick, individualized analysis of each video.
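The quantities in the visual summary can be approximated from the blink apex frames alone. The sketch below assumes the apex frames of each eye are already known. It pairs left and right blinks with a simple greedy nearest-apex rule within a tolerance, which is an assumption since the exact matching rule in JeFaPaTo may differ. It then computes the left/right delay in milliseconds and the blink count per one-minute bin. All function names are hypothetical.

```python
import numpy as np


def match_blinks(left_apex, right_apex, max_offset):
    """Pair left/right apex frames greedily by nearest apex (hypothetical rule).

    A left blink is paired with the closest still-unpaired right blink if the
    two apexes are at most max_offset frames apart; otherwise it stays unpaired.
    """
    right = np.asarray(right_apex)
    used = set()
    pairs = []
    for lf in left_apex:
        for j in np.argsort(np.abs(right - lf)):
            j = int(j)
            if j in used:
                continue
            if abs(int(right[j]) - lf) <= max_offset:
                pairs.append((int(lf), int(right[j])))
                used.add(j)
            break  # only the nearest unpaired candidate is considered
    return pairs


def left_right_delay_ms(pairs, fps):
    """Signed delay (right apex minus left apex) in milliseconds per pair."""
    return np.array([(right_f - left_f) * 1000.0 / fps for left_f, right_f in pairs])


def blinks_per_minute(apex_frames, fps, duration_s):
    """Number of blinks in each consecutive one-minute bin."""
    n_bins = int(np.ceil(duration_s / 60.0))
    edges = np.arange(n_bins + 1) * 60.0
    counts, _ = np.histogram(np.asarray(apex_frames) / fps, bins=edges)
    return counts


# Usage at 240 FPS; a tolerance of 24 frames corresponds to 100 ms
pairs = match_blinks([100, 5000, 20000], [102, 5010, 30000], max_offset=24)
print(pairs)  # [(100, 102), (5000, 5010)]
print(blinks_per_minute([2400, 7200, 16800], fps=240, duration_s=120))  # [2 1]
```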
We leverage PyQt6 [Riverbank Computing Limited, 2023, The Qt Company, 2023] and pyqtgraph [Campagnola, 2020] to provide a cross-platform GUI for easy usage. To support and simplify the usage of JeFaPaTo, we provide a

Functionality and Usage
JeFaPaTo was developed to support medical experts in extracting, analyzing, and studying blinking behavior. Hence, the correct localization of facial landmarks is of high importance and forms the first step in the analysis of each frame. Once a user provides a video in the GUI, the tool performs automatic face detection, and the user can adjust the bounding box if necessary. Using mediapipe [Lugaresi et al., 2019, Kartynnik et al., 2019], the tool can extract 468 facial landmarks and 52 blend shape features. To describe the state of the eye, we use the Eye-Aspect-Ratio (EAR) [Soukupova, 2016], a standard measure for blinking behavior computed from the 2D coordinates of the landmarks. The ratio ranges between 0 and 1, where 0 indicates a fully closed eye and higher values indicate an open eye; most people have an EAR score between 0.2 and 0.4. This measure describes the ratio between the vertical and horizontal distances between the landmarks, resulting in a detailed approximation of the motion of the upper and lower eyelids. Please note that all references to the left and right eye are based on the subject's perspective.
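For reference, Soukupova [2016] defines the EAR from six landmarks per eye: p_1 and p_4 at the eye corners, p_2 and p_3 on the upper lid, and p_6 and p_5 on the lower lid opposite them. Which mediapipe landmark indices JeFaPaTo assigns to p_1 to p_6 is not stated in this section.

```latex
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}
```

The numerator sums the two vertical lid distances, and the factor 2 in the denominator averages them against the single horizontal eye width. As the lids close, the numerator approaches 0 while the denominator stays roughly constant.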
We denote this measure as EAR-2D-6; the six facial landmarks selected for each eye are shown in Figure 2. The EAR values are computed for each frame without any temporal smoothing. As mediapipe [Lugaresi et al., 2019, Kartynnik et al., 2019] is a monocular depth reconstruction approach for faces, each landmark contains an estimated depth value. To leverage this information and minimize the influence of head rotation, we offer the EAR-3D-6 feature as an alternative, computed from the 3D coordinates of the landmarks. However, first experiments indicated that the 2D approach is sufficient for analyzing blinking behavior.
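Here is a minimal NumPy sketch of EAR-2D-6 and EAR-3D-6. It assumes the six landmarks of one eye are already ordered p1 to p6 as in the formula of Soukupova [2016]. The function name `eye_aspect_ratio` is hypothetical and not part of JeFaPaTo's API. A single function covers both variants: passing only the x and y columns gives EAR-2D-6, and passing x, y, and the estimated depth gives EAR-3D-6.

```python
import numpy as np


def eye_aspect_ratio(points):
    """EAR of Soukupova (2016) from six landmarks ordered p1..p6.

    points has shape (..., 6, D): D=2 gives EAR-2D-6, D=3 gives EAR-3D-6.
    Leading dimensions (e.g. frames) are preserved, so a whole video can be
    processed at once without any temporal smoothing.
    """
    points = np.asarray(points, dtype=float)
    p1, p2, p3, p4, p5, p6 = (points[..., i, :] for i in range(6))
    vertical = np.linalg.norm(p2 - p6, axis=-1) + np.linalg.norm(p3 - p5, axis=-1)
    horizontal = np.linalg.norm(p1 - p4, axis=-1)
    return vertical / (2.0 * horizontal)


# Toy open eye: corners 3 units apart, lids 0.6 units apart, so EAR = 1.2 / 6 = 0.2
open_eye = np.array([[0, 0], [1, 0.3], [2, 0.3], [3, 0], [2, -0.3], [1, -0.3]])
closed_eye = open_eye.copy()
closed_eye[[1, 2, 4, 5], 1] = 0.0  # upper and lower lids meet

video = np.stack([open_eye, closed_eye])  # shape (frames=2, 6, 2)
ear_2d = eye_aspect_ratio(video)

# EAR-3D-6: append a depth column (all zeros here, so it matches the 2D result)
video_3d = np.concatenate([video, np.zeros((2, 6, 1))], axis=-1)
ear_3d = eye_aspect_ratio(video_3d)
```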
JeFaPaTo optimizes I/O reads by using several queues for loading and processing the video, keeping RAM usage manageable. The processing pipeline extracts the landmarks and facial features, such as the EAR score, for each frame and includes a validity check ensuring that the eyes were visible. On completion, all values are stored in a CSV file, either for external tools or for further processing in JeFaPaTo to obtain insights into a person's blinking behavior, as shown in Figure 1. Blink detection and extraction use the scipy.signal.find_peaks algorithm [Virtanen et al., 2020], and the time series can be smoothed if necessary. We automatically match the left and right eye blinks based on the time of apex closure. Additionally, we use the prominence of each blink to distinguish between 'complete' and 'partial' blinks, based on either a user-provided threshold (for each eye) or an automatic threshold computed with Otsu's method [Otsu, 1979]. The automatic threshold is computed from all extracted blinks of each eye individually. Because blinking behavior is highly individual, the graphical user interface (GUI) enables experts to manually adjust the estimated blinking state as needed. Additional functions are included for calculating blinking statistics: the blink rate (blinks per minute), the mean and standard deviation of the Eye Aspect Ratio (EAR) score, the inter-blink delay, and the blink amplitude. A GUI for the JeFaPaTo codebase is provided, as depicted in Figure 3, to facilitate usage by individuals with limited programming expertise and to streamline data processing.
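This is a self-contained sketch of the automatic 'complete'/'partial' split. It applies Otsu's criterion (maximizing the between-class variance) by evaluating every cut between the sorted prominence values of one eye, instead of first binning them into a histogram. Libraries such as scikit-image use a binned variant, and this section does not state which one JeFaPaTo uses. The helpers `otsu_threshold` and `classify_blinks` are hypothetical names.

```python
import numpy as np


def otsu_threshold(values):
    """Otsu's threshold for a 1-D sample, searched over all split points.

    Evaluates every cut between consecutive sorted values and returns the
    midpoint that maximizes the between-class variance w0*w1*(mu0-mu1)^2.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n < 2:
        raise ValueError("need at least two values")
    best_t, best_var = x[0], -1.0
    for k in range(1, n):  # class 0 = x[:k], class 1 = x[k:]
        if x[k] == x[k - 1]:
            continue  # no threshold can separate identical values
        w0, w1 = k / n, (n - k) / n
        var_between = w0 * w1 * (x[:k].mean() - x[k:].mean()) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = 0.5 * (x[k - 1] + x[k])
    return best_t


def classify_blinks(prominences, threshold=None):
    """Label each blink of one eye 'complete' or 'partial' by its prominence.

    A user-provided threshold is used if given; otherwise Otsu's method is
    applied to all blinks of this eye. Returns (labels, threshold).
    """
    p = np.asarray(prominences, dtype=float)
    t = otsu_threshold(p) if threshold is None else threshold
    return np.where(p >= t, "complete", "partial"), t


# Two well-separated groups: the threshold falls between 0.07 and 0.20
labels, t = classify_blinks([0.05, 0.06, 0.07, 0.20, 0.22, 0.25])
```

The exhaustive search is quadratic in the number of blinks, which stays cheap for the few hundred blinks of a typical recording and avoids choosing a bin count.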
In Figure 3, we show the blinking analysis graphical user interface, which is composed of four main areas. We give a short overview of each area to provide a better understanding of the tool's capabilities. The A-Area visualizes the selected EAR time series for the left (blue line) and right eye (red line) over time. Additionally, after successful blinking detection and extraction, the detected 'complete' blinks (pupil not visible) are shown as dots and 'partial' blinks (pupil visible) as triangles. If the user selects a blink in the table in the B-Area, the graph automatically highlights and zooms into the corresponding region to allow a detailed analysis.
The B-Area contains the main table of blinking extraction results, and the user can select a blink to visualize the corresponding period in the EAR plot. The table contains the main properties of each blink: the EAR score at the blink apex, the prominence of the blink, the internal width in frames, the blink height, and the automatically detected blinking state (none, partial, complete). If the original video is available, the user can drag and drop it into the D-Area of the GUI, and the video will jump to the corresponding frame so that the blinking state can be corrected manually. The content of the table is used to compute the blinking statistics and the visual summary. These statistics are also shown in separate tabs of the B-Area, and the user can export the data as a CSV or Excel file for further analysis.

Figure 3: The graphical user interface (GUI) designed for blinking analysis, based on the Eye Aspect Ratio (EAR) metric. This interface comprises four primary components.
The C-Area is the control area, where the user can load the extracted EAR scores from a file and select the corresponding columns for the left and right eye (an automatic pre-selection is made). The user can choose the parameters for the blinking extraction, such as the minimum prominence, the minimum distance between blinks, and the minimum blink width. Additionally, users can define the decision threshold for classifying 'partial' blinks should the 'auto' mode prove inadequate. After extraction, the blinking state can be corrected directly in the table; the blinking statistics can then be computed and the visual summary generated.
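The extraction parameters listed above correspond to keyword arguments of `scipy.signal.find_peaks` (`prominence`, `distance`, `width`). The sketch below assumes they are passed that way and that blinks are the local minima of the EAR series. The function name `detect_blinks` and its default values are illustrative assumptions, not JeFaPaTo defaults. It returns the apex EAR, prominence, and width in frames, similar to the columns of the B-Area table.

```python
import numpy as np
from scipy.signal import find_peaks


def detect_blinks(ear, min_prominence=0.1, min_distance=50, min_width=5):
    """Find blink apexes as local minima of an EAR time series.

    min_distance and min_width are in frames. find_peaks searches for maxima,
    so the negated signal is used to locate the EAR minima.
    """
    ear = np.asarray(ear, dtype=float)
    peaks, props = find_peaks(
        -ear, prominence=min_prominence, distance=min_distance, width=min_width
    )
    return [
        {
            "apex_frame": int(f),
            "ear_apex": float(ear[f]),
            "prominence": float(props["prominences"][i]),
            # width measured at half the prominence (find_peaks default rel_height=0.5)
            "width_frames": float(props["widths"][i]),
        }
        for i, f in enumerate(peaks)
    ]


# Synthetic EAR: baseline 0.3 with a deep dip at frame 200 and a shallow one at 600
t = np.arange(1000)
ear = (
    0.3
    - 0.25 * np.exp(-(((t - 200) / 6.0) ** 2))
    - 0.08 * np.exp(-(((t - 600) / 6.0) ** 2))
)
blinks = detect_blinks(ear, min_prominence=0.05)
```

Raising `min_prominence` above the shallow dip's depth (e.g. to 0.1) drops it, which shows why this parameter is exposed in the GUI rather than fixed.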
The D-Area displays the current video frame if the user supplies the original video. While this feature is optional, it helps when manually correcting the blinking state.

Extracted Medically Relevant Statistics
We provide a set of statistics relevant to the medical analysis of blinking behavior that are valuable to healthcare experts (see Table 1). JeFaPaTo is being developed in partnership with medical professionals to ensure that the included statistics are relevant. Future updates may incorporate new statistics based on feedback from medical experts. A sample score file is available in the 'examples/' directory of the repository, enabling users to evaluate the functionality of JeFaPaTo without recording a video.

Platform Support
As JeFaPaTo is written in Python, it can be used on any platform that supports Python and the underlying libraries. We recommend using Anaconda [ana, 2020] to create encapsulated Python environments, reducing interference with already installed libraries and possible version mismatches. The script dev_init.sh automatically creates a custom environment with all dependencies, with main.py as the entry point for running JeFaPaTo. Users can also install the dependencies manually from the 'requirements.txt' file, although we recommend at least creating a virtual environment. As JeFaPaTo is designed to be used by medical experts, we provide a graphical user interface (GUI) to simplify usage during clinical studies and routine analysis. For each release, we provide a standalone executable for Windows 11, Linux (Ubuntu 22.04), and MacOS (version 13+ for Apple Silicon and Intel). We offer a separate branch for MacOS versions before 13 (Intel), which does not include blend shape extraction, to support older hardware. The authors and medical partners conduct all user interface and experience tests on Windows 11 and MacOS 13+ (Apple Silicon).

Ongoing Development
JeFaPaTo has reached its first stable release and will continue to be developed to support the analysis of facial features and expressions. Given the potential of high temporal resolution video data to yield novel insights into facial movements, we aim to incorporate standard 2D measurement-based features into our analysis. An issue frequently associated with facial palsy is synkinesis, characterized by involuntary facial muscle movements concurrent with voluntary movements of other facial muscles, such as the eye closing involuntarily when the patient smiles. Hence, a joint analysis of the blinking pattern and mouth movement could help to better understand the underlying processes. The EAR is sensitive to head rotation. A careful experimental setup can reduce the influence of head rotation, but this is not always possible.
To support the analysis of facial palsy patients, we plan to implement 3D head pose estimation to correct future EAR scores for head rotation.

Figure 1: The plot presents a visual summary of blinking patterns captured over 20 minutes, recorded at 240 frames per second (FPS). It illustrates the temporal variation in paired blinks, quantifies the blink rate as blinks per minute, and characterizes the distribution of the time discrepancy between left and right eye closures.

Figure 2: Visualization of the Eye-Aspect-Ratio for the left (blue) and right (red) eye inside the face.