Plain English Summary
Background and study aims
Autism Spectrum Disorders (ASD) are neurodevelopmental disorders characterized by social and emotional deficits. As a consequence, autistic people may have general difficulties behaving in social contexts, expressing their emotions appropriately and inferring others’ mental states. In particular, autistic people may have difficulty distinguishing between emotional intonations in voices during conversations, which can cause misunderstandings and negatively affect their social lives. Drug-free training based on interactions with a humanoid social robot has recently been developed and has been found to help autistic children improve their social abilities. The ultimate aim of this study is therefore to develop a new kind of social robot-based training using NAO, a 57-cm-tall, friendly-looking talking humanoid robot, to help autistic children distinguish emotions in voices. As many autistic people are particularly interested in technological devices and digital environments, synthetic voices will be used to facilitate learning and to progress gradually toward understanding human voices. Because the participants are Mexican and emotional expression differs between cultures, the researchers first need to identify the acoustic cues in human voices that allow emotional intonations to be differentiated as uttered in a Mexican way. These results will be compared with those obtained from a Castilian Spanish database. Once the researchers know how emotions are differentiated in voices, they will be able to digitally synthesize emotional utterances that approximate human voices to a greater or lesser extent.
Finally, these newly synthesized voices will be used in robot-based training to improve emotional discrimination abilities in autistic children.
Who can participate?
Experiment 1: typically-developed adults (aged 19 to 35) and children (aged 9 to 11)
Experiments 2 and 3: autistic and typically-developed children aged 9 to 11
What does the study involve?
Participants in the first experiment will be typically-developed adults and children who will utter words with a particular emotional intonation (e.g. anger, sadness) into a microphone. Sessions last 1 hour for adults and 40 minutes for children. Participants in the second experiment are typically-developed and autistic children who will listen to the emotional utterances recorded in experiment 1 and to the synthesized voices newly created from these recordings while their brain activity is recorded by electrodes placed on their head. They also have to identify the emotion conveyed by each word. One session lasts 35 minutes. Finally, the synthesized voice selected in experiment 2 and the human voices are used in training based on interactions with the NAO robot, which is programmed to tell interactive stories (Social Stories™) that include relevant cues to improve the understanding of emotional intonation. Participants are autistic children. Typically-developed children participate only in the assessment sessions (baseline, mid-training, end of training and two follow-up sessions, in which emotion understanding is evaluated by the same method as in experiment 2). The training comprises 15 sessions: sessions 1, 7, 13, 14 and 15 are assessment sessions and sessions 2 to 13 are training sessions, so sessions 7 and 13 include both assessment and training. Sessions 14 and 15 take place 2 and 4 weeks after the end of the training, respectively. Each assessment lasts 35 minutes and each training session lasts 10 minutes. Parents complete the Autism Spectrum Rating Scales (20 minutes) at sessions 1, 14 and 15.
What are the possible benefits and risks of participating?
No risks are anticipated in any of the three experiments. Participants in experiments 1 and 2 will help advance understanding of, respectively, cross-cultural differences in emotional intonation and the brain-activity and behavioural correlates of emotion processing in autistic and typically-developed children. Experiment 3 aims to improve autistic children’s ability to distinguish emotional intonations in voices.
Where is the study run from?
Tecnologico de Monterrey, Monterrey (Mexico)
When is the study starting and how long is it expected to run for?
February 2020 to June 2023
Who is funding the study?
National Council of Science and Technology (Mexico)
Who is the main contact?
Mathilde Marie Duville
Electroencephalographic correlate of Mexican Spanish emotional speech processing in Autism Spectrum Disorders: towards a Social Story™ and robot-based intervention
1. Cross-cultural differences between Castilian and Mexican Spanish in emotional prosody can be highlighted by analyzing duration, intensity and spectral characteristics of emotional speech signals.
2. In autistic children, electroencephalographic and behavioural correlates of emotional speech processing approximate those of non-pathological children when speech is conveyed by synthesized voices.
3. A drug-free Social Story™ and robot-based intervention that uses synthesized voices improves the ability of autistic children to differentiate emotional prosodies.
Approved 14/07/2020, Ethical Committee of the School of Medicine of Tecnologico de Monterrey (Av. Ignacio Morones Prieto 3000 Pte., Col. Los Doctores, CP: 64710, Monterrey, Nuevo León, México; +81 (0)88 88 21 07; email@example.com), ref: autismoEEG2020
Experiment 1: Observational single-center analytical case-control retrospective study
Experiment 2: Observational single-center analytical case-control retrospective study
Experiment 3: Interventional single-center analytical prospective randomized controlled trial
Primary study design
Secondary study design
Randomised controlled trial
Quality of life
Patient information sheet
See additional files
Autism spectrum disorders
Experiment 1: observational trials – non-pathological participants
Speech data will be recorded from 8 non-professional volunteer actors (two adult males, two adult females, two male children and two female children) recruited through a public announcement posted at Instituto Tecnologico de Monterrey, Campus Monterrey. Participants will be asked to read through and familiarize themselves with the entire word dataset and will be given as much time as they need. They will then be instructed to read each word once with a given emotional intonation: neutral, happy, angry, sad, disgusted or afraid. All words for a given emotion will be uttered consecutively to facilitate this process. Participants will be asked to wait at least 5 seconds between two utterances to ensure a baseline emotional state before each recording. Written informed consent will be obtained from all participants at the beginning of the session. Each session will last approximately 1 hour for adults and 40 minutes for children.
Experiment 2: observational trials – high-functioning autistic and non-pathological children aged 9-11
The task will take place in a standard office room, where the participant will be comfortably seated in front of a computer screen. Before the experiment starts, the EEG cap will be fitted on the participant’s head and the equipment for recording electroencephalographic activity will be set up. Once the participant is ready and calm, the experimental task will start. Emotional and neutral words uttered by the human voices recorded in Experiment 1 and by the synthesized voices created in Experiment 2 will be presented in random order. After each utterance, neutral, happy, angry, sad, disgusted and fearful face emojis will appear on the computer screen, and the participant will have to choose the one that best fits the emotion conveyed by the word’s prosody. Utterances will be played through headphones at 60 dBA and separated by an 8-second interval so that the participant returns to a baseline emotional state before each word. For a task of approximately 35 minutes, about 24 stimuli per emotion (anger, disgust, fear, neutral, sadness and happiness) will be presented, for a total of 144 stimuli. After the task has been explained and before the actual paradigm starts, practice trials will be presented (one for each emotion). The participant will be told that he/she does not have to respond to these trials: each utterance will be played through the headphones and the correct response will then be shown automatically by circling the correct emoji on the screen. Immediately afterwards, the participant will be told that he/she will have to respond to the following stimuli, and the task will start. Instructions will be written on the screen and explained verbally to ensure they are correctly understood before the task starts. Written informed consent will be obtained from all participants and their legal guardians before the session starts.
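The trial structure above can be checked with a short Python sketch of the stimulus counts and timing. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the protocol does not say how the random order is generated or how human and synthesized voices are split within each emotion, and the word and response durations passed to the timing function are assumed values, not protocol figures.

```python
import random

# Emotion categories and counts named in the protocol.
EMOTIONS = ["anger", "disgust", "fear", "neutral", "sadness", "happiness"]
STIMULI_PER_EMOTION = 24   # "about 24 stimuli per emotion"
ISI_SECONDS = 8.0          # interval between utterances


def build_stimulus_order(seed: int) -> list[str]:
    """Return a shuffled list of emotion labels, 24 per emotion (144 in total).

    Hypothetical helper: the voice (human or synthesized) of each stimulus is
    not modelled, because the protocol does not give the split.
    """
    trials = [emotion for emotion in EMOTIONS for _ in range(STIMULI_PER_EMOTION)]
    rng = random.Random(seed)  # seeded so a participant's order is reproducible
    rng.shuffle(trials)
    return trials


def estimated_minutes(n_trials: int, word_s: float, response_s: float) -> float:
    """Rough task length: each trial = utterance + emoji response + 8 s interval.

    word_s and response_s are assumed durations, not protocol values.
    """
    return n_trials * (word_s + response_s + ISI_SECONDS) / 60.0
```

With an assumed 1 s utterance and 5 s response, estimated_minutes(144, 1.0, 5.0) gives 33.6 minutes, in line with the approximately 35-minute task; the 8-second intervals alone account for 19.2 minutes.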
Experiment 3: interventional trials – high-functioning autistic and non-pathological children aged 9-11
Typically-developed participants will be recruited through a public announcement posted at Tecnologico de Monterrey, Campus Monterrey. Autistic participants will be recruited by contacting the parents of autistic children attending San Jose Hospital of Tecnologico de Monterrey.
The intervention will consist of one-to-one sessions held six times per week for 2 weeks. Each session will last approximately ten minutes and will be composed of two Social Stories™. Presentation order will be random and different for each participant. The stories will be told by the NAO social robot, programmed using the Choregraphe software. Emotional and neutral speech will be uttered using the synthesized voice selected in Experiment 2. Each Social Story™ will be written to train emotion differentiation and recognition through prosody, and questions about the emotion conveyed will be asked. Correct responses will lead to social reinforcement (e.g. “Well done!” in Spanish, uttered with a happy prosody by the synthesized voice). An incorrect response or no response will result in the robot repeating the part of the story that contains the information needed to answer the question. The question will then be asked again, and a correct answer will lead to social reinforcement. If the answer is still incorrect, the robot will tell the child the correct answer. After a correct answer, the robot will explain that in daily life this emotion will usually not be conveyed by a synthesized voice but by the people around the child, with a more natural voice. The robot will then repeat the same emotional utterance with the human voice and ask which emotion is conveyed, following the same procedure: correct responses lead to social reinforcement, while an incorrect response or no response leads to repetition of the relevant part of the story and a second attempt, after which a correct answer is reinforced and an incorrect one is corrected by the robot. Each session will start with NAO inviting the participant to play the “emotion game” and explaining the instructions. The robot’s movements and eye LEDs will be programmed for this introductory part only.
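The prompting sequence described above can be summarized as a small decision procedure. The sketch below is illustrative only: it is not NAOqi or Choregraphe code, the action labels and function names are hypothetical, and it assumes (one reading of the text) that the human-voice question follows only when the synthesized-voice question was eventually answered correctly. A missing response is represented as None.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical action labels; the protocol describes these steps only in prose.
REINFORCE = "social reinforcement"
REPEAT = "repeat story passage"
TELL = "robot tells correct answer"
EXPLAIN = "explain everyday voices and replay with human voice"


def ask_with_retry(answers: Iterator[Optional[str]], target: str) -> Tuple[bool, List[str]]:
    """Ask one question with at most two attempts.

    A None answer, or running out of answers, counts as no response.
    Returns whether the child eventually answered correctly, plus the robot actions.
    """
    if next(answers, None) == target:
        return True, [REINFORCE]
    actions = [REPEAT]  # incorrect or absent response: replay the relevant passage
    if next(answers, None) == target:
        actions.append(REINFORCE)
        return True, actions
    actions.append(TELL)  # second failure: the robot gives the answer
    return False, actions


def question_flow(answers: Iterable[Optional[str]], target: str) -> List[str]:
    """Synthesized-voice question, then (after a correct answer) the human-voice question."""
    responses = iter(answers)
    correct, actions = ask_with_retry(responses, target)
    flow = ["synthesized: " + a for a in actions]
    if correct:  # assumption: the human-voice step follows only a correct answer
        flow.append(EXPLAIN)
        _, human_actions = ask_with_retry(responses, target)
        flow += ["human: " + a for a in human_actions]
    return flow
```

For example, question_flow([None, "sadness"], "anger") yields a repetition of the story passage followed by the robot giving the answer, with no human-voice step.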
During the Social Stories™, NAO will remain still in order to focus the child’s attention on the audio information. Throughout each session, NAO will be remotely controlled by the researcher using a Wizard-of-Oz method through the Graphical User Interface (GUI) provided by the system. Written informed consent will be obtained from all participants and their legal guardians before the intervention starts. A total of 20 participants will take part in this experiment and will be divided into four groups:
Group 1: autistic children receiving robot, synthesized and human voices intervention (5 participants)
Group 2: autistic children receiving robot and human voice intervention (5 participants)
Group 3: autistic children not receiving the intervention (5 participants)
Group 4: typically-developed children not receiving the intervention (5 participants)
Group 1 will be the experimental group and groups 2 to 4 will be controls. Children in group 2 will receive the same intervention as children in group 1, except that only human voices will be used. This group will help to evaluate the efficacy of using synthesized voices in the intervention. Autistic participants will be randomly assigned to groups 1, 2 or 3 (using the Excel random function). All four groups will be included in the pre-intervention (1st session), mid-intervention (7th session), end-of-intervention (13th session) and follow-up assessments (2 and 6 weeks after the end of the intervention). Please refer to the outcome measures for more details about the assessments.
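The allocation step can be illustrated with a seeded shuffle in Python, used here as a stand-in for the Excel random function named in the protocol. This is a sketch under assumptions: the function name and participant identifiers are hypothetical, and it assumes a balanced allocation of exactly five autistic children per group, as implied by the group sizes. Typically-developed children (group 4) are not randomized.

```python
import random

GROUPS = ("Group 1", "Group 2", "Group 3")  # autistic arms only
PER_GROUP = 5


def allocate(participant_ids, seed):
    """Randomly allocate 15 autistic participants to groups 1-3, five per group.

    Stand-in for the Excel random function; the balanced five-per-group
    design is an assumption based on the stated group sizes.
    """
    ids = list(participant_ids)
    if len(ids) != PER_GROUP * len(GROUPS):
        raise ValueError("expected 15 autistic participants")
    rng = random.Random(seed)  # a recorded seed makes the allocation auditable
    rng.shuffle(ids)
    return {group: ids[i * PER_GROUP:(i + 1) * PER_GROUP] for i, group in enumerate(GROUPS)}
```

Shuffling the full list once and slicing it into consecutive blocks guarantees equal group sizes, which independent per-participant random draws would not.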
Primary outcome measure
1. Autistic symptomatology measured by the Autism Spectrum Rating Scales. Parents of all participants will complete this questionnaire at pre-intervention (1st session) and at the follow-up sessions (2 and 6 weeks after the end of the intervention).
2. Electroencephalographic correlates and behavioural responses to an emotional prosody processing task, measured with a 24-channel EEG system (Greentek Gelfree S3, mBrain Train), digitized and amplified at a sampling rate of 500 Hz (pass band: 0.1-100 Hz). Electrode placement will follow the international 10-20 system. Measurements will be taken at pre-intervention (1st session), mid- and end-of-intervention (sessions 7 and 13) and follow-up (2 and 6 weeks after the end of the intervention). The emotional prosody task will be similar to the one described for Experiment 2.
Secondary outcome measures
Experiment 1: cross-cultural differences in emotional prosody between Mexican and Castilian Spanish, measured by statistical analysis of the following acoustic features:
1.1. Rate (number of syllables per second)
1.2. Mean, maximum, minimum and standard deviation of pitch in hertz
1.3. Jitter local and jitter ppq5 in percentages
1.4. Shimmer local and shimmer apq5 in percentages
1.5. Harmonics-to-noise ratio in decibels
1.6. Root mean square energy in volts
1.7. Mean, maximum, minimum intensity in decibels
1.8. Mean first, second and third formant frequencies in hertz
1.9. Mean first, second and third formant bandwidths in hertz
1.10. First 13 Mel-frequency cepstral coefficients (MFCCs)
Each participant will be recorded in a single session.
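The rate, perturbation and energy measures in items 1.1, 1.3, 1.4 and 1.6 reduce to simple formulas once the glottal periods and their peak amplitudes have been extracted. The sketch below assumes Praat-style definitions of jitter (local), jitter (ppq5), shimmer (local) and shimmer (apq5); the record does not name the extraction software, the function names are hypothetical, and Praat additionally applies period floor/ceiling and maximum-factor constraints that are omitted here. Pitch, intensity, formants, HNR and MFCCs require a dedicated speech-analysis tool and are not reproduced.

```python
import math
from statistics import mean


def speech_rate(n_syllables: int, duration_s: float) -> float:
    """Rate (item 1.1): syllables per second."""
    return n_syllables / duration_s


def rms(samples: list) -> float:
    """Root mean square of a signal; in volts only if the samples are calibrated."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def local_perturbation(values: list) -> float:
    """Jitter (local) on periods, or shimmer (local) on peak amplitudes, in %:
    mean absolute difference between consecutive values / mean value."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def five_point_quotient(values: list) -> float:
    """Jitter ppq5 (periods) or shimmer apq5 (amplitudes), in %:
    mean absolute difference between each value and the average of it and
    its four closest neighbours, divided by the mean value (needs >= 5 values)."""
    devs = [abs(values[i] - mean(values[i - 2:i + 3])) for i in range(2, len(values) - 2)]
    return 100.0 * mean(devs) / mean(values)
```

For example, alternating periods of 10 ms and 11 ms give a local jitter of about 9.6 %, while perfectly regular periods give 0 %.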
Experiment 2: electroencephalographic and behavioural correlates of the discrimination of emotional prosodies uttered by synthesized and human voices, measured by the amplitudes of the P200, N300 and Late Positive Potential (LPP) components and by the number of correct and incorrect responses. Each participant will be recorded in a single session.
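ERP component amplitudes such as these are commonly quantified as the mean baseline-corrected voltage within a latency window. The sketch below shows that computation for one channel at the 500 Hz sampling rate stated for the EEG system. The epoch limits (-200 to 1000 ms), the 200 ms pre-stimulus baseline and the latency windows are hypothetical values chosen for illustration, since the record does not define them, and peak amplitude would be an equally valid measure.

```python
from statistics import mean

FS_HZ = 500  # sampling rate of the EEG system described in the protocol

# Hypothetical latency windows (ms after word onset); not specified in the protocol.
WINDOWS_MS = {"P200": (150, 250), "N300": (250, 350), "LPP": (400, 800)}


def sample_index(t_ms: float, epoch_start_ms: float, fs_hz: int = FS_HZ) -> int:
    """Convert a latency in ms to an index into an epoch starting at epoch_start_ms."""
    return round((t_ms - epoch_start_ms) * fs_hz / 1000)


def component_amplitudes(epoch_uv: list, epoch_start_ms: float = -200, fs_hz: int = FS_HZ) -> dict:
    """Mean amplitude (µV) in each window, relative to the pre-stimulus baseline."""
    baseline = mean(epoch_uv[:sample_index(0, epoch_start_ms, fs_hz)])
    amplitudes = {}
    for name, (start_ms, end_ms) in WINDOWS_MS.items():
        i0 = sample_index(start_ms, epoch_start_ms, fs_hz)
        i1 = sample_index(end_ms, epoch_start_ms, fs_hz)
        amplitudes[name] = mean(epoch_uv[i0:i1]) - baseline
    return amplitudes
```

At 500 Hz each sample covers 2 ms, so a -200 to 1000 ms epoch holds 600 samples and the baseline occupies the first 100.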
Overall trial start date
Overall trial end date
Reason abandoned (if study stopped)
Participant inclusion criteria
1. Healthy male and female adults and children (age range for children: 9-11 years old, age range for adults: 19-35 years old).
2. Participants must have grown up in Mexico to ensure a Mexican way of conveying emotions
1. High-functioning autistic volunteers aged 9-11 years old
2. All participants will have been diagnosed by a clinician according to DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition) or ICD-10 (International Statistical Classification of Diseases and Related Health Problems, Tenth Revision) criteria
3. Autistic symptomatology will be assessed with the Autism Spectrum Rating Scales (ASRS, parent report). A T-score greater than 60 (slightly elevated to very elevated scores for autistic behaviors) on the Social/Communication, DSM-IV-TR (Text Revision), Peer and Adult Socialization and Social/Emotional Reciprocity domains will be required for inclusion.
Typically-developed (TD) volunteers aged 9-11 years will also be included in this study. As for autistic children, the Autism Spectrum Rating Scales (ASRS) will be used, and children with scores below 60 on all scales will be included.
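The ASRS-based inclusion rules for the two groups can be written as two small predicates. This is a sketch under assumptions: the function names are hypothetical, scores are T-scores keyed by scale name, reading "Peer and Adult Socialization" as two separate scales is an interpretation, and the sketch follows the wording literally, so a T-score of exactly 60 satisfies neither criterion. Only the ASRS thresholds are covered; the diagnostic and exclusion criteria are not.

```python
# ASRS scales named in the protocol for autistic inclusion. Splitting "Peer and
# Adult Socialization" into two scales is an assumption of this sketch.
AUTISM_INCLUSION_SCALES = (
    "Social/Communication",
    "DSM-IV-TR",
    "Peer Socialization",
    "Adult Socialization",
    "Social/Emotional Reciprocity",
)
T_CUTOFF = 60


def meets_autistic_asrs_criterion(t_scores: dict) -> bool:
    """T-score strictly greater than 60 on every listed scale."""
    return all(t_scores[scale] > T_CUTOFF for scale in AUTISM_INCLUSION_SCALES)


def meets_td_asrs_criterion(t_scores: dict) -> bool:
    """T-score strictly below 60 on every scale supplied."""
    return all(t < T_CUTOFF for t in t_scores.values())
```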
Autistic and TD participants in this study will have to meet the same criteria as those described for Experiment 2. However, participants from Experiment 2 will not be included in Experiment 3, to avoid any bias caused by prior exposure to the synthetic voices under study.
Target number of participants
Experiment 1: two adult females (19-35 years old), two adult males (19-35 years old), two male children (9-11 years old) and two female children (9-11 years old)
Experiment 2: sample size will be estimated by statistical power analysis using the G*Power 3.1 software, with an alpha of 0.05 and a power of 0.8. Estimated sample: 20 autistic and 20 non-pathological children (9-11 years old).
Experiment 3: 15 autistic children and 5 non-pathological children (9-11 years old)
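The record states the alpha level and power for Experiment 2 but not the statistical test or the expected effect size. As a rough illustration only, the standard normal-approximation formula for a two-sided comparison of two independent group means, n per group = 2(z_{1-α/2} + z_{1-β})² / d², can be evaluated with the Python standard library. The function name and the effect sizes used below are assumptions, and G*Power's exact t-test calculation gives slightly larger numbers (64 rather than 63 per group for d = 0.5).

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group, two-sided two-sample comparison.

    effect_size_d is Cohen's d; the protocol does not state which value was assumed.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)
```

For example, under an assumed large effect (d = 0.8) the approximation gives 25 children per group.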
Participant exclusion criteria
1. Participants will be excluded if they have been diagnosed with any pathology that affects emotional behavior, hearing or speech at the time of the study
2. Participants presenting with illness symptoms affecting voice timbre at the time of the study
Any TD or autistic participant will be excluded if:
1. She/he is taking medication affecting the central or peripheral nervous system
2. She/he has hearing loss or deficits
3. She/he has a history of a developmental pathology and/or any disease affecting behavior and nervous system, apart from ASD for autistic participants
Autistic and TD participants in this study will have to meet the same criteria as those described for Experiment 2
Recruitment start date
Recruitment end date
Countries of recruitment
Trial participating centre
Tecnologico de Monterrey
Ave. Eugenio Garza Sada 2501
Consejo Nacional de Ciencia y Tecnología (ref: 1061809)
National Council of Science and Technology, Mexico, CONACYT
Funding Body Type
Funding Body Subtype
Results and Publications
Publication and dissemination plan
1. Planned protocol study publication
2. Planned publication in a high-impact peer-reviewed journal:
Experiment 1: 30/11/2021
Experiment 2: 30/06/2022
Experiment 3: 28/02/2024
IPD sharing statement
The data-sharing plans for the current study are unknown and will be made available at a later date.
Intention to publish date
Participant level data
To be made available at a later date
Basic results (scientific)
- ISRCTN18117434_PIS.pdf Uploaded 08/10/2020