Reading Notes: Geometric Graph Representation with Learnable Graph Structure and Adaptive AU Constraint

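This is a paper in IEEE Transactions on Affective Computing on micro-expression recognition (MER); these are my notes after reading it. As you would expect from a Fellow's paper in a Transactions journal, the work is solid. The paper uses facial landmarks as the main input and action units (AUs) as a constraint, takes three frames (onset, apex and offset) and performs MER with a GNN. It proposes several sub-modules for feature extraction, designs a new loss function, and makes the adjacency matrix learnable.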

Abstract

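the facial landmark is a low-dimensional and compact modality, which achieves lower computational cost and potentially concentrates on ME-related movement features.

Facial landmarks are a low-dimensional, compact modality: they are cheaper to compute and potentially focus on the movement features relevant to micro-expressions (MEs).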

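a geometric two-stream graph network is constructed to aggregate the low-order and high-order geometric movement information from facial landmarks to obtain discriminative ME representation.

A geometric two-stream graph network aggregates low-order and high-order geometric movement information from the facial landmarks to obtain a discriminative ME representation.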

a self-learning fashion is introduced to automatically model the dynamic relationship between nodes even long-distance nodes.

A self-learning scheme automatically models the dynamic relationships between nodes, including nodes that are far apart.

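an adaptive action unit loss is proposed to reasonably build the strong correlation between landmarks, facial action units and MEs.

An adaptive action unit loss is proposed to build, in a principled way, the strong correlation between landmarks, facial action units and MEs.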

1. Introduction

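in this paper, we make the first step to study the discriminability of facial landmarks for MER and employ only landmarks as input, for exploring its advantages compared to image-based input.

The paper takes a first step in studying how discriminative facial landmarks are for MER, using only landmarks as input in order to explore their advantages over image-based input.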

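this paper introduces a learnable adjacency matrix to learn a more reasonable and flexible graph structure based on a pre-defined graph structure.

Starting from a pre-defined graph structure, the paper introduces a learnable adjacency matrix to learn a more reasonable and flexible graph structure.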

our work uses the AU labels as a constraint to learn the geometric features related to AUs from facial landmarks.

AU labels are used as a constraint so that AU-related geometric features are learned from the facial landmarks.

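considering the AU losses in different layers may have different contribution, this paper proposes an adaptive AU loss to automatically learn the constraint intensity of AU loss in different layers.

Since AU losses at different layers may contribute differently, the paper proposes an adaptive AU loss that automatically learns how strongly the AU loss should constrain each layer.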

The paper's four main contributions are:

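1) The graph-based models are employed to extract discriminative geometric movement features with spatial-temporal information only from facial landmarks.

Graph-based models extract discriminative geometric movement features, carrying spatial-temporal information, from facial landmarks alone.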

2) To comprehensively explore geometric movement features from facial landmark, a Geometric Two-Streams Graph Network (GTS-GN) is proposed to aggregate the low-order and higher-order geometric information from facial landmarks.

To explore the geometric movement features of facial landmarks comprehensively, a Geometric Two-Streams Graph Network (GTS-GN) is proposed that aggregates their low-order and high-order geometric information.

3) To overcome the shortcoming of the fixed adjacency matrix, this paper proposes a Learnable Adjacency Matrix (LAM) to learn a reasonable and flexible graph structure. As a result, LAM can automatically model the discriminative relationship between facial landmarks for MER.

To overcome the limitation of a fixed adjacency matrix, the paper proposes a Learnable Adjacency Matrix (LAM) that learns a reasonable, flexible graph structure. LAM can therefore automatically model the discriminative relationships between facial landmarks for MER.

4) Based on the strong correlations between AUs and MEs, an Adaptive AU (AAU) loss is proposed to automatically explore a more reasonable and efficient way to introduce AU information. AAU loss can adaptively constrain the geometric features in a multi-scale fashion and emphasize the contributions of AU information at different semantic levels.

Building on the strong correlation between AUs and MEs, an Adaptive AU (AAU) loss is proposed to find a more reasonable and efficient way of injecting AU information. The AAU loss adaptively constrains the geometric features at multiple scales and emphasizes the contribution of AU information at different semantic levels.

2. Related Work

The experimental results have proved the effectiveness of the graph-based methods in MER.

Experimental results have shown that graph-based methods are effective for MER.

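However, these local region-based methods still take up computing resources to extract the features in the local area. For some discriminative features, the computational cost still is huge, e.g. Optical Flow. In addition, region cropping is also a not trivial issue.

However, these local-region methods still spend computation on extracting features within each region, and for some discriminative features, such as optical flow, that cost is still large. Region cropping is also non-trivial. (These are the drawbacks of computing directly on image information.)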

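we explore more effective modules and components to aggregate spatial and temporal information in facial landmarks. Since the landmarks are structure data lying in the Non-Euclidean space, a new graph model is designed to learn the geometric feature representations of MEs from the facial landmarks.

The authors explore more effective modules and components for aggregating the spatial and temporal information in facial landmarks. Because landmarks are structured data in a non-Euclidean space, a new graph model is designed to learn geometric feature representations of MEs from them.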

According to the way to construct graph, these methods can be divided into two types: landmark-based and AU-based.

By how the graph is constructed, these methods fall into two types: landmark-based and AU-based.

However, on one hand, the existing works with landmark-based graph aggregated dynamic information, e.g. optical flow and magnified shape feature, which still spends too much computational cost to extract these features. On the other hand, the works with AU-based graph need extra learning model to extract AU features as node feature, which increases the model complexity and computational cost.

On the one hand, existing landmark-based graph methods aggregate dynamic information such as optical flow and magnified shape features, which are still expensive to extract. On the other hand, AU-based graph methods need an extra learned model to extract AU features as node features, which increases model complexity and computational cost.

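instead of providing much complicated and costly appearance features or not easily accessible AU features, we provide a simple and much efficient way that directly takes landmark coordinate-based geometric features as node features.

Instead of complicated, costly appearance features or hard-to-obtain AU features, the authors offer a simple and efficient alternative: geometric features based on landmark coordinates are used directly as node features.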

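Different from these methods, we first pre-defined a fixed adjacency matrix based on facial muscle structure. Then, we define a LAM that is added to the fixed adjacency matrix. LAM can be updated automatically to learn a more reasonable relationship between nodes.

Unlike these methods, the authors first pre-define a fixed adjacency matrix based on facial muscle structure, then define a LAM that is added to this fixed matrix. The LAM is updated automatically during training to learn more reasonable relationships between nodes.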
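To make the "fixed matrix plus learnable matrix" idea concrete, here is a minimal sketch in plain Python. The 3-node graph, the values in `A_L` and the helper names are illustrative assumptions, not taken from the paper; in a real model `A_L` would be a trainable parameter updated by backpropagation.

```python
def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def aggregate(adj, feats):
    """New feature of node i = adjacency-weighted sum of all node features."""
    n, c = len(feats), len(feats[0])
    return [[sum(adj[i][j] * feats[j][k] for j in range(n)) for k in range(c)]
            for i in range(len(adj))]


# Hypothetical pre-defined graph: nodes 0 and 1 are linked, node 2 only has a self-loop.
A_FIXED = [[1.0, 1.0, 0.0],
           [1.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

# Hypothetical learned offsets: a new long-distance link between nodes 0 and 2.
A_L = [[0.0, 0.0, 0.3],
       [0.0, 0.0, 0.0],
       [0.3, 0.0, 0.0]]

feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

out_fixed = aggregate(A_FIXED, feats)                   # fixed structure only
out_lam = aggregate(add_matrices(A_FIXED, A_L), feats)  # fixed + learnable structure
```

With the fixed matrix alone, node 0 never sees node 2; after adding `A_L`, part of node 2's feature flows into node 0, which is the kind of long-distance relationship the notes describe.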

the proposed AAU loss aggregate AU information without extra models. It can constrain multi-scale features in an adaptive way to reasonably construct the strong relationship between facial landmarks, AUs and MEs.

The proposed AAU loss brings in AU information without any extra model. It adaptively constrains multi-scale features so as to build the strong relationship between facial landmarks, AUs and MEs.

3. Proposed Method

Specifically, this work constructs a geometric movement graph (GM-Graph), designs SS module to deal with GM-Graph, and builds the GTS-GN model. In addition, two key components (LAM and AAU loss) are proposed to be applied to the designed module and model.

Concretely, the paper constructs a geometric movement graph (GM-Graph), designs the SS module to process it, and builds the GTS-GN model. Two key components, LAM and the AAU loss, are then applied to the module and the model.

these three frames contain abundant movement information and remove a lot of redundant information in the entire ME video.

The onset, apex and offset frames contain rich movement information while discarding much of the redundancy in the full ME video.

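the landmarks of the onset, apex and offset frames in ME videos are taken as nodes in GM-Graph to capture the ME movements

The landmarks of the onset, apex and offset frames of an ME video are the nodes of the GM-Graph, which captures the ME movement. (This is the role of the GM-Graph.)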

we choose the landmarks around the mouth, eyebrows and nose to construct the graph based on their contributions.

Landmarks around the mouth, eyebrows and nose are selected to build the graph, based on how much they contribute.

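based on the G_o of the onset, apex and offset frames, the GM-Graph G_GM is constructed to establish the spatial-temporal relationship between facial landmarks.

From the graphs G_o of the onset, apex and offset frames, the GM-Graph G_GM is constructed to establish the spatial-temporal relationships between facial landmarks.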

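only landmark coordinates n=(x,y) are adopted as nodes features to study the effectiveness of landmarks, and higher-order semantic features (distance and angle between landmarks) are added to explore the interaction of low and high-order geometric information.

Landmark coordinates n = (x, y) alone are used as node features to study how effective landmarks are, and higher-order semantic features (the distance and angle between landmarks) are added to explore how low-order and high-order geometric information interact.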

we define two types of node features that (x,y) is Type A, and (x,y,D,α) is Type B, where (x,y) is the landmark coordinates.

Two types of node features are defined: (x, y) is Type A and (x, y, D, α) is Type B, where (x, y) are the landmark coordinates.
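The two feature types can be sketched as follows. The notes only say that D and α are a distance and an angle between landmarks; measuring both relative to a single reference landmark (for example the nose tip) is an assumption made here for illustration, and the paper may define them differently. The function names are hypothetical.

```python
import math


def type_a(landmark):
    """Type A node feature: the raw coordinates (x, y)."""
    x, y = landmark
    return (x, y)


def type_b(landmark, reference):
    """Type B node feature: (x, y, D, alpha).

    Assumption: D is the Euclidean distance from the landmark to a reference
    landmark and alpha is the angle of that offset in radians.
    """
    x, y = landmark
    dx, dy = x - reference[0], y - reference[1]
    return (x, y, math.hypot(dx, dy), math.atan2(dy, dx))


feature_a = type_a((3.0, 4.0))
feature_b = type_b((3.0, 4.0), reference=(0.0, 0.0))
```

Type A keeps only the low-order information, while Type B appends the higher-order distance and angle terms to the same coordinates.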

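Inspired by CNN+LSTM [20] that extracts spatial-temporal features separately, we design SS module to aggregate spatial and temporal information in GM-Graph.

Inspired by CNN+LSTM [20], which extracts spatial and temporal features separately, the SS module is designed to aggregate the spatial and temporal information in the GM-Graph. (This is the role of the SS module.)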

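GCN can simultaneously extract the spatial and temporal features in the spatial-temporal graph. However, it cannot focus on the extraction for the temporal features, which maybe neglect some small movement features.

A GCN can extract spatial and temporal features from a spatial-temporal graph at the same time, but it cannot concentrate on the temporal features and may therefore miss some small movements. This is the weakness of a plain GCN.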

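SS module adopts GCN to aggregate the spatial information for three frames, respectively. Then, TCN is adopted to aggregate the temporal information between three frames.

The SS module applies a GCN to each of the three frames separately to aggregate spatial information, then applies a TCN to aggregate temporal information across the three frames.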
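The "spatial first, then temporal" order can be sketched in plain Python. This is a toy version under stated assumptions: the spatial step is a single ReLU(A·X·W) graph convolution, the temporal step is a 1-D convolution whose kernel spans all three frames, and every number below is made up for illustration. The paper's actual layer definitions are not reproduced here.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def spatial_step(adj, x, w):
    """Graph convolution on one frame: ReLU(A X W)."""
    return relu(matmul(matmul(adj, x), w))


def temporal_step(frames, kernel):
    """Temporal convolution over the frame axis, applied per node and channel."""
    n, c = len(frames[0]), len(frames[0][0])
    return [[sum(kernel[t] * frames[t][i][j] for t in range(len(frames)))
             for j in range(c)] for i in range(n)]


def ss_module(adj, frames, w, kernel):
    """Spatial aggregation per frame, then temporal aggregation across frames."""
    return temporal_step([spatial_step(adj, x, w) for x in frames], kernel)


adj = [[0.5, 0.5], [0.5, 0.5]]       # hypothetical normalized 2-node graph
w = [[1.0, 0.0], [0.0, 1.0]]         # identity weights, to keep the arithmetic visible
onset = [[0.0, 0.0], [2.0, 2.0]]
apex = [[1.0, 1.0], [3.0, 3.0]]
offset = [[0.0, 0.0], [2.0, 2.0]]
kernel = [-1.0, 2.0, -1.0]           # responds to the apex deviating from onset/offset

ss_out = ss_module(adj, [onset, apex, offset], w, kernel)
```

Separating the two steps lets the temporal kernel respond specifically to how the apex frame differs from the onset and offset frames, which is the small-movement signal the notes say a plain GCN may miss.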

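Then, L is used for Fourier transform.

Using the normalized adjacency matrix together with the graph Fourier transform (L here is presumably the normalized graph Laplacian) lets information propagate more effectively over the spatial graph and helps extract node features. This is a common operation in GCN models, aimed at making better use of the graph structure.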

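As the calculation of eigenvectors matrix is expensive, a Chebyshev polynomial with R-th order is employed to well-approximate the filter g_φ

The R-th order Chebyshev polynomial is used to simplify the computation, since it avoids the expensive eigenvector decomposition.

The derivation is fairly involved and I did not dig into it.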
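For readers who want a feel for the approximation, the sketch below follows the standard Chebyshev spectral filtering recipe used in ChebNet-style GCNs, not necessarily the paper's exact formulation. It uses the recurrence T_0(L̃)x = x, T_1(L̃)x = L̃x, T_k(L̃)x = 2·L̃·T_{k-1}(L̃)x − T_{k-2}(L̃)x and returns Σ_k θ_k·T_k(L̃)x. The matrix `l_tilde` (a rescaled Laplacian) and the coefficients `theta` are hypothetical values chosen for illustration.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def chebyshev_filter(l_tilde, x, theta):
    """Apply sum_k theta[k] * T_k(l_tilde) to the graph signal x.

    Only matrix-vector products are needed, so no eigendecomposition of the
    Laplacian is ever computed.
    """
    t_prev = x                               # T_0 x
    out = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return out
    t_curr = matvec(l_tilde, x)              # T_1 x
    out = [o + theta[1] * v for o, v in zip(out, t_curr)]
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k x = 2 * L~ * T_{k-1} x - T_{k-2} x
        t_next = [2.0 * a - b for a, b in zip(matvec(l_tilde, t_curr), t_prev)]
        out = [o + theta[k] * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


l_tilde = [[0.0, 0.5], [0.5, 0.0]]   # hypothetical rescaled Laplacian of a 2-node graph
signal = [1.0, 0.0]
filtered = chebyshev_filter(l_tilde, signal, theta=[1.0, 1.0, 1.0])
```

Each additional order reaches one hop further in the graph, so the polynomial order controls the size of the neighbourhood the filter can see.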

we introduce LAM expressed as A_L to learn a more reasonable relationship between nodes.

The LAM, denoted A_L, is introduced to learn more reasonable relationships between nodes.

TCN is employed to extract temporal features

A TCN is used to extract the temporal features.

this paper proposes a novel graph model called GTS-GN to process low-order and high-order geometric features in two streams.

The paper proposes a new graph model, GTS-GN, that processes low-order and high-order geometric features in two separate streams.

Different from these works, GTS-GN tries to fuse the two features at the earlier layer, not limited to the last layer.

Unlike those works, GTS-GN tries fusing the two kinds of features at an earlier layer rather than only at the last layer.

Two streams adopt the same structure that stacks several SS modules with the same number. After several SS modules, the outputs of two streams are added together. The added features are inputted into several SS modules or Full Connected (FC) layer to continue aggregate geometric feature information. Finally, the softmax is used to classify features and predict the ME categories.

The two streams share the same structure, each stacking the same number of SS modules. After these modules, the outputs of the two streams are added together, and the summed features are fed into further SS modules or a fully connected (FC) layer to continue aggregating geometric information. Finally, softmax classifies the features and predicts the ME category.
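The fusion and classification steps can be sketched as below. The stream outputs are treated as already flattened feature vectors, and the FC weights, bias and three-class setup are hypothetical placeholders; only the order of operations (element-wise addition, FC layer, softmax) comes from the description above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(stream_a, stream_b, fc_weights, fc_bias):
    """Add the two stream outputs, apply one FC layer, return class probabilities."""
    fused = [a + b for a, b in zip(stream_a, stream_b)]
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(fc_weights, fc_bias)]
    return softmax(logits)


stream_a = [1.0, 0.0]                 # hypothetical output of the Type A stream
stream_b = [0.0, 1.0]                 # hypothetical output of the Type B stream
fc_weights = [[1.0, 0.0],             # one row per ME class (3 classes assumed)
              [0.0, 1.0],
              [0.0, 0.0]]
fc_bias = [0.0, 0.0, 0.0]

me_probs = fuse_and_classify(stream_a, stream_b, fc_weights, fc_bias)
```

Because the fusion is a plain addition, both streams must produce features of the same shape, which is consistent with the two streams sharing one structure.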

our model builds this correlation by learning the geometric movement features related to AUs before the ME classification layer.

The model builds this correlation by learning AU-related geometric movement features before the ME classification layer.

we introduce multi-label AU loss before the ME classification layer and ME loss in the classification layer.

A multi-label AU loss is applied before the ME classification layer, and the ME loss is applied at the classification layer.

The construction of the overall loss function is shown in the figure.

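In order to better aggregate low-order and high-order information individually before fusion, AAU loss constrains the geometric features after fusing the two streams.

So that low-order and high-order information can each be aggregated on its own before fusion, the AAU loss constrains the geometric features only after the two streams have been fused.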

we introduce the learnable weights to these losses for emphasizing the contributions of the features at different semantic levels

Learnable weights are attached to these losses to emphasize the contributions of features at different semantic levels.

to overcome the two problems, we take probability form as the weight to get AAU Loss

To overcome the two problems, the weights are expressed in probability form, which gives the AAU loss.

ME labels are employed to calculate cross-entropy loss as ME loss (L_ME) in the final classification layer.

At the final classification layer, the ME labels are used to compute a cross-entropy loss, which serves as the ME loss (L_ME).
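The loss design can be sketched as follows. Several points are assumptions rather than the paper's definition: the "probability form" is implemented as a softmax over one learnable parameter per layer, the per-layer AU loss is a multi-label binary cross-entropy, and the balancing factor `lam` is a hypothetical hyperparameter. The paper's exact formula is not reproduced in these notes.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multilabel_bce(probs, labels):
    """Multi-label AU loss: mean binary cross-entropy over all AUs."""
    return -sum(y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS)
                for p, y in zip(probs, labels)) / len(labels)


def cross_entropy(probs, target_index):
    """ME loss: cross-entropy for a single-label prediction."""
    return -math.log(probs[target_index] + EPS)


def aau_loss(layer_au_probs, au_labels, layer_params):
    """Weight each layer's AU loss by softmax(layer_params), so weights sum to 1."""
    weights = softmax(layer_params)
    losses = [multilabel_bce(p, au_labels) for p in layer_au_probs]
    return sum(w * l for w, l in zip(weights, losses))


def total_loss(me_probs, me_label, layer_au_probs, au_labels, layer_params, lam=0.1):
    """ME loss plus a down-weighted AAU term (lam is a hypothetical factor)."""
    return cross_entropy(me_probs, me_label) + lam * aau_loss(
        layer_au_probs, au_labels, layer_params)


au_labels = [1, 0]                               # two AUs: first active, second not
layer_au_probs = [[0.5, 0.5], [0.9, 0.1]]        # AU predictions from two layers
aau = aau_loss(layer_au_probs, au_labels, layer_params=[0.0, 0.0])
loss = total_loss([0.7, 0.2, 0.1], 0, layer_au_probs, au_labels, [0.0, 0.0])
```

Normalizing the weights keeps the combined AU term on the same scale as a single AU loss, so the model cannot shrink the loss simply by driving every layer weight toward zero.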

4. Experiments

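The experimental section is thorough. In the comparison experiments the authors' method naturally comes out ahead; the ablation studies test how several key components affect overall performance, as well as the choice of some hyperparameters.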

It demonstrates that the magnified movement of facial landmarks can effectively represent the muscle movement related to MEs. In addition, ME images include redundancy information, which may decrease the MER performance. Instead, facial landmark is a more compact modality which can retain discriminative geometric features for MER and achieves promising performance.

This shows that the magnified movement of facial landmarks effectively represents ME-related muscle movement. ME images also contain redundant information that may hurt MER performance, whereas facial landmarks are a more compact modality that retains discriminative geometric features and achieves promising performance.

These results demonstrate that compared with image-based methods, landmark-based methods have an obvious advantage in calculation cost and efficiency.

These results show that landmark-based methods have a clear advantage over image-based methods in computational cost and efficiency.

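compared with the image-based methods, the landmark-based graph methods have much higher computational and parameter efficiency with a competitive recognition rate.

Compared with image-based methods, landmark-based graph methods are far more efficient in computation and parameters while keeping a competitive recognition rate.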

In addition, image-based methods also are greatly affected in complex environments, thus, MER under complex environments is another unsolved and challenging task.

Image-based methods are also strongly affected by complex environments, so MER under complex environments remains another unsolved and challenging task.

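how to better combine the appearance features with the geometric features is also worthy of further study.

How best to combine appearance features with geometric features also deserves further study. (This is one direction for future work that the authors point out.)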

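It demonstrates that the features extracted by SS module are more discriminative than those extracted by GCN. Also, extracting spatial and temporal geometric features separately is a better choice for MER.

The features extracted by the SS module are more discriminative than those extracted by a GCN, and extracting spatial and temporal geometric features separately is the better choice for MER.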

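SS-GN+Type A is superior to SS-GN+Type B, which proves that simply introducing high-order geometric information cannot ensure performance improvement.

SS-GN with Type A features outperforms SS-GN with Type B features, which shows that simply adding high-order geometric information does not guarantee better performance.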

It demonstrates that GTS-GN can aggregate the low-order and high-order geometric information more effectively than SS-GN.

GTS-GN aggregates low-order and high-order geometric information more effectively than SS-GN.

Overall, SS module can effectively aggregate spatial-temporal information in GM-Graph, and the GM-Graph with only geometric features as node features can be processed by graph model to get promising results.

Overall, the SS module effectively aggregates the spatial-temporal information in the GM-Graph, and a GM-Graph whose node features are purely geometric can be processed by a graph model with promising results.

It demonstrates that LAM is effective for MER to learn a more reasonable adjacency matrix.

This shows that LAM is effective at learning a more reasonable adjacency matrix for MER.

From this figure, the different layers have different LAMs, and as the layer increases, the average value of the learned LAM is smaller.

The figure shows that each layer learns a different LAM, and that the average value of the learned LAM decreases as the layers get deeper.

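LAM can learn the relationship between different facial muscle regions. These regions do not have connections in the pre-defined adjacency matrix,

LAM can learn relationships between different facial muscle regions, even though these regions are not connected in the pre-defined adjacency matrix.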

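Overall, LAM can consider the difference between different layers to build the relationship between different facial organs in multi-scales. In this way, more reasonable adjacency matrices can be learned to aggregate the node features.

Overall, LAM accounts for the differences between layers and builds relationships between different facial organs at multiple scales, so that more reasonable adjacency matrices are learned for aggregating node features.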

Overall, the performance is improved after using AU loss.

Overall, performance improves once the AU loss is used.

It demonstrates that AAU loss is helpful to learn more discriminative features for recognizing MEs.

This shows that the AAU loss helps the model learn more discriminative features for recognizing MEs.

The above results show that before the classification layer, the use of AU information to constraint features can improve the performance of MER.

These results show that constraining the features with AU information before the classification layer improves MER performance.

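AAU loss is superior to AU loss. It demonstrates that it is more advantageous to adaptively constrain multi-scale features in multiple layers than fixedly constrain the features of a certain layer.

The AAU loss outperforms the plain AU loss, which shows that adaptively constraining multi-scale features across several layers works better than applying a fixed constraint to the features of one layer.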

AAU loss can constrain the features to achieve the aggregation of high-level AU information in more early layers.

The AAU loss constrains the features so that high-level AU information is already aggregated in earlier layers.

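comparing with not using AAU loss, the extracted features have some regularities on the first two layers after using AAU loss.

Compared with training without the AAU loss, the features extracted in the first two layers show some regularity once the AAU loss is used.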

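Overall, in earlier layers, the AAU loss-constrained model focuses on learning high-level AU features from facial landmarks, while in deeper layers, it focuses on learning high-level ME features from high-level AU features.

Overall, in the earlier layers the AAU-constrained model focuses on learning high-level AU features from facial landmarks, while in the deeper layers it focuses on learning high-level ME features from those AU features.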

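14 points set has an obvious advantage and the performance of 68 points set is worst. It turns out that under including the key information of eyebrows, nodes and mouth, with the point number increases, the performance drops.

The 14-point set has a clear advantage and the 68-point set performs worst. As long as the key information around the eyebrows, nose (the quoted text reads "nodes", presumably a typo) and mouth is included, performance drops as the number of points increases.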

it is a better choice to take AU information as auxiliary information and remain the maintain dominance of ME loss.

The better choice is to treat AU information as auxiliary and keep the ME loss dominant.

Overall, different from these existing methods, the compact landmarks input makes the proposed method achieving a competitive performance with a lot less computational cost.

Overall, unlike these existing methods, the compact landmark input lets the proposed method reach competitive performance at a much lower computational cost.

5. Conclusion

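We first customized a GM-Graph based on the facial landmarks of three key frames to model the geometric and dynamic information in ME videos.

The authors first build a custom GM-Graph from the facial landmarks of three key frames to model the geometric and dynamic information in ME videos.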

SS module was proposed to learn the deep spatial and temporal features of GM-Graph.

The SS module is proposed to learn deep spatial and temporal features of the GM-Graph.

it's more suitable to introduce both GCN and TCN to separately aggregate the spatial and temporal information.

It is more suitable to use both a GCN and a TCN, aggregating spatial and temporal information separately.

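Therefore, a new graph model GTS-GN was proposed, which models information interaction and takes better use of complementary information from two types of geometric features.

A new graph model, GTS-GN, is therefore proposed; it models information interaction and makes better use of the complementary information in the two types of geometric features.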

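LAM can automatically learn a more reasonable graph structure that builds the relationship between different facial muscle regions and between different nodes.

LAM automatically learns a more reasonable graph structure, building relationships both between different facial muscle regions and between individual nodes.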

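AAU loss can adaptively constrain the multi-scale movement features to aggregate AU information with an efficient way, learning more discriminative ME features.

The AAU loss adaptively constrains the multi-scale movement features, aggregating AU information efficiently and learning more discriminative ME features.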

This framework is more valuable for practical applications due to its low computational costs and at the same time comparable and even superior performance.

With its low computational cost and comparable or even superior performance, the framework is more valuable for practical applications.
