Low Level Multimodal Fusion

This component adds machine learning on top of the feature-extraction components by combining features from individual modalities, supporting early fusion and emotion recognition. Learning operates on sequences of samples rather than on isolated samples, and it handles non-atomic feature sets (speech prosody features, facial features, etc.), which is particularly relevant when facial expressivity is investigated. The component exploits the short-term memory of recurrent neural networks to adapt to features that change over time, for example eyebrows being raised or a mouth opening. It is trained on data in text (CSV) format and supports both static and dynamic training.
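As an illustration only, the idea of early fusion followed by a recurrent layer can be sketched in a few lines of Python. This is not the component's implementation: the function names (`load_csv_features`, `early_fusion`, `elman_forward`), the CSV layout (one row per frame, one column per feature), the choice of a simple Elman-style layer, and the hand-picked untrained weights are all assumptions made for the example.

```python
import csv
import io
import math


def load_csv_features(text):
    """Parse CSV text into a list of per-frame feature vectors (floats).

    Assumed layout: one row per frame, one column per feature.
    """
    return [[float(v) for v in row] for row in csv.reader(io.StringIO(text)) if row]


def early_fusion(*modalities):
    """Concatenate the per-frame feature vectors of every modality."""
    if len({len(m) for m in modalities}) != 1:
        raise ValueError("all modalities must have the same number of frames")
    return [sum(frames, []) for frames in zip(*modalities)]


def elman_forward(sequence, w_in, w_rec, bias):
    """Run a simple (Elman-style) recurrent layer over a sequence.

    Returns the hidden state after each frame. The previous hidden state
    feeds back into the next step, which gives the short-term memory.
    """
    hidden = [0.0] * len(bias)
    states = []
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states


# Hypothetical two-frame inputs: two prosody features and one facial feature.
prosody = load_csv_features("0.1,0.2\n0.3,0.4\n")
face = load_csv_features("1.0\n0.0\n")
fused = early_fusion(prosody, face)

# Hand-picked, untrained weights: one hidden unit that reads only the facial feature.
states = elman_forward(fused, w_in=[[0.0, 0.0, 1.0]], w_rec=[[1.0]], bias=[0.0])
```

In this toy run the facial feature is active in the first frame only, yet the hidden state after the second frame is still non-zero, because the recurrent connection carries the earlier activation forward. A real system would learn the weights from labelled sequences rather than fix them by hand.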

Availability of the component: The reference contacts are Amaryllis Raouzaiou and Stelios Asteriadis of the Institute of Communication and Computer Systems - National Technical University of Athens.
Notes: A prototype version of the component can be provided upon request.

CALLAS
