Assessment of Parent–Child Interaction Quality from Dyadic Dialogue

The quality of parent–child interaction is critical for child cognitive development. The Dyadic Parent–Child Interaction Coding System (DPICS) is commonly used to assess parent and child behaviors. However, manual annotation of DPICS codes by parent–child interaction therapists is a time-consuming task. To assist therapists in the coding task, researchers have begun to explore the use of artificial intelligence in natural language processing to classify DPICS codes automatically.

  • parent–child interaction
  • DPICS
  • text classification
  • natural language processing

1. Introduction

Although the quality of parent–child interaction (PCI) profoundly impacts a child’s cognitive and socio-emotional development, achieving high-quality PCI can be challenging [1,2]. Parent–child interaction therapy (PCIT) is a therapeutic approach designed to help parents of children with early behavior problems improve their relationship with their child and manage their child’s behavior effectively [3]. PCIT is linked to favorable outcomes for both children and families, reducing child behavior problems and alleviating family stress [4,5]. The Dyadic Parent–Child Interaction Coding System (DPICS) was developed in tandem with PCIT to monitor treatment progress. The DPICS quantifies child and parent behaviors in dyadic interaction and has been extensively employed to assess parent–child interaction quality and treatment outcomes. DPICS is typically coded manually by a trained therapist or research staff [6]. This can be problematic, as the time spent training coders to fidelity is costly. Additionally, when large amounts of data are collected, the time spent coding can significantly delay the research process.
Artificial intelligence is an emerging trend propelled by the rapid advancement of machine learning and deep learning technologies. Its goal is to create intelligent agents capable of completing tasks in a manner similar to humans. State-of-the-art and even superhuman results have been attained in many fields, including AlphaGo in the game of Go, Boston Dynamics’ Atlas in whole-body robotics, and, more recently, the conversational dialogue agent ChatGPT. Within natural language processing, pre-trained deep learning language models such as BERT and GPT have gained growing popularity [7,8]. Giving computers the ability to understand human language has long been a goal of artificial intelligence in natural language processing, and pre-trained models are fed massive collections of raw documents in the hope of learning relationships among words and sentences.
Labeling DPICS codes is a laborious and time-consuming task for both experts and therapists. To assist PCIT therapists, Huber et al. introduced the SpecialTime system, designed to offer parents feedback as they practice PCIT skills at home [9]. SpecialTime automatically classifies child-directed dialogue acts into the eight DPICS classes.

2. Assessment of Parent–Child Interaction Quality from Dyadic Dialogue

2.1. Text Feature Extraction

2.1.1. Text Representation

When working with text in machine learning models, we need to convert the text into numerical vectors so that the models can process it. Two common methods for achieving this are one-hot encoding and integer encoding. One-hot encoding generates a vector whose length matches the vocabulary size and places a “1” in the index that corresponds to the word. This approach is inefficient because most values in the resulting vector are zero. In contrast, integer encoding assigns a unique integer value to each word. While this approach creates a dense vector that can be more efficient for machine learning models, it does not capture any relationships between the words, meaning that there is no inherent similarity between the encoded values of two words. For example, the integer values assigned to “he” and “she” have no relationship to each other, despite their semantic similarity. This limitation can pose challenges for specific natural language processing tasks, especially those that require a nuanced understanding of relationships between words.
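To make this contrast concrete, the following minimal Python sketch uses a hypothetical four-word vocabulary to show why one-hot vectors are mostly zeros and why integer codes carry no notion of similarity; the words and helper names are illustrative only.

```python
# Minimal sketch (illustrative only): one-hot vs. integer encoding for a toy vocabulary.
vocab = ["he", "she", "went", "home"]          # hypothetical 4-word vocabulary
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's index; all other entries are 0."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

def integer_encode(word):
    """Return a single dense integer identifier for the word."""
    return word_to_index[word]

print(one_hot("she"))         # [0, 1, 0, 0] -- mostly zeros
print(integer_encode("he"))   # 0
print(integer_encode("she"))  # 1 -- the values 0 and 1 say nothing about semantic similarity
```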
Apart from one-hot and integer encoding, earlier techniques such as Bag of Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF) have been used to convert text into numerical vectors [10,11]. BoW and TF-IDF are both statistical measures, and several variants exist, such as n-gram models and smoothed versions of TF-IDF.

Bag of Words

The Bag of Words (BoW) technique is extensively employed as a text representation method in NLP. It converts a piece of text into a collection of individual words or terms along with their respective frequencies [10]. To create a BoW model, the text is pre-processed to remove stopwords and punctuation. Each word in the pre-processed text is then tokenized and counted, resulting in a dictionary of unique words and their frequencies. The text is ultimately represented as a vector whose length corresponds to the size of the dictionary. Despite its widespread use, BoW has several limitations. First, BoW disregards the order and context of words in the text, potentially losing crucial information about the meaning and context of individual words. Second, the vocabulary can be very large, resulting in a high-dimensional vector space that is computationally expensive and memory-intensive. Third, stopwords, which are common words like “the” and “a”, can dominate the frequency counts and mislead the model. Finally, most documents contain only a small subset of the vocabulary, resulting in sparse vectors that make it difficult to compare documents or compute similarity measures.
Despite these weaknesses, BoW remains a widespread and effective technique for tasks such as text classification and sentiment analysis, especially when combined with other techniques like feature selection and dimensionality reduction [12,13,14,15].
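A minimal scikit-learn sketch of the BoW pipeline described above is shown below; the example sentences are made up, and scikit-learn is assumed to be available.

```python
# Minimal Bag-of-Words sketch using scikit-learn's CountVectorizer (toy sentences).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "You built a tall tower",
    "You built a tower and a house",
]

vectorizer = CountVectorizer(stop_words="english")  # drops common stopwords like "a" and "and"
bow = vectorizer.fit_transform(docs)                # sparse document-term count matrix

print(vectorizer.get_feature_names_out())  # learned vocabulary, e.g. ['built' 'house' 'tall' 'tower']
print(bow.toarray())                       # per-document word counts; word order is discarded
```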

Term Frequency–Inverse Document Frequency

Term Frequency–Inverse Document Frequency (TF-IDF) is a statistical measure employed to determine the relevance of words in a text document or corpus. TF-IDF frequently serves as a weighting factor in information retrieval searches, text mining, and user modeling [16].
TF-IDF is composed of two metrics: term frequency (TF) and inverse document frequency (IDF). The TF score measures how often words appear in a particular document. In simple words, TF counts the occurrences of words in a document. The weight of a term is directly proportional to its frequency in the document. This implies that words appearing more frequently in a document are assigned a higher weight [11]. In contrast, IDF measures the rarity of words in the text, assigning more importance to infrequently used words in the corpus that may carry significant information. By integrating IDF, TF-IDF reduces the significance of frequently occurring terms while amplifying the importance of less common terms [17].
TF-IDF has been one of the most widely used methods in NLP and machine learning for tasks like document classification, text summarization, sentiment classification, and spam message detection. For example, it can identify the most relevant words in a document, which can then serve as features in a classification model. A survey conducted in 2015 on text-based recommender systems found that 83% of them used TF-IDF [18]. Furthermore, many previous studies have demonstrated the effectiveness of TF-IDF for tasks like automated text classification and sentiment analysis [19,20,21,22,23]. However, TF-IDF has limitations: it does not capture the semantic meaning of words in a sequence or consider the order in which terms appear, and it can be biased towards longer documents, which generally receive higher scores than shorter ones.
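The following short sketch, again assuming scikit-learn and using a made-up corpus, illustrates how TF-IDF down-weights terms that occur in every document and up-weights rarer ones.

```python
# Minimal TF-IDF sketch with scikit-learn (toy three-document corpus).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "good job building the tower",
    "put the block on the tower",
    "what color is the block",
]

tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(corpus)   # rows = documents, columns = terms

# Terms appearing in every document (e.g. "the") get the lowest IDF;
# terms appearing in a single document (e.g. "color") get the highest.
for term, idf in zip(tfidf.get_feature_names_out(), tfidf.idf_):
    print(f"{term:10s} idf={idf:.2f}")
```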

2.1.2. Word Embedding

Word embeddings are a form of representation learning employed in NLP that helps computers comprehend the relationships between words. Humans readily understand relationships between words such as man and woman, or cat and dog. Word embeddings represent these relationships as numeric vectors in an n-dimensional space: words with similar meanings share comparable representations, so two related words are depicted by nearly identical vectors positioned closely in the vector space. The technique has been used effectively in various NLP tasks, such as sentiment analysis and machine translation. However, creating effective word embeddings is a central challenge in NLP because their quality can strongly affect the performance of downstream tasks. Moreover, dense word representations in a lower-dimensional space can be more useful and allow models to train faster, making the creation of effective word embeddings a critical research area.

Word2Vec

Word2Vec is a popular technique for learning word embeddings using shallow neural networks, developed by Mikolov et al. [24]. Word2Vec comprises two distinct models: Continuous Bag of Words (CBOW) and Continuous Skip-gram. The CBOW model predicts the middle word from the surrounding context words, while Skip-gram predicts the surrounding words given a target word. In CBOW, the context consists of a few words before and after the middle word [25].
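A minimal sketch of training a Word2Vec model is given below, assuming the gensim library (version 4 or later) is available; the toy sentences are made up and far too small to yield meaningful embeddings, but they show how the CBOW/Skip-gram choice is exposed through the `sg` parameter.

```python
# Minimal Word2Vec sketch with gensim (toy corpus, illustrative only).
from gensim.models import Word2Vec

sentences = [
    ["you", "built", "a", "tall", "tower"],
    ["she", "put", "the", "block", "on", "the", "tower"],
    ["he", "put", "the", "block", "in", "the", "box"],
]

# sg=0 selects CBOW (predict the middle word from its context);
# sg=1 would select Skip-gram (predict context words from the middle word).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=50)

print(model.wv["tower"].shape)            # (50,) dense embedding for "tower"
print(model.wv.similarity("he", "she"))   # cosine similarity between two word vectors
```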

Global Vectors for Word Representation

Global Vectors for Word Representation (GloVe) is an algorithm that generates word embeddings by using matrix factorization techniques on a word-context matrix. To create the word-context matrix, a large corpus is scanned for each term, and context terms within a window defined by a window size before and after the term are counted. The resulting matrix contains co-occurrence information for each word (the rows) and its context words (the columns). To account for the decreasing importance of words as their distance from the target word increases, a weighting function is used to assign lower weights to more distant words [26,27].
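The sketch below builds the kind of distance-weighted co-occurrence matrix that GloVe then factorizes; it is illustrative only (pure NumPy, toy corpus) and omits the actual embedding-fitting step.

```python
# Minimal sketch of a distance-weighted word-context co-occurrence matrix (GloVe-style counting).
import numpy as np

corpus = "you built a tall tower you built a house".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
window = 2

cooc = np.zeros((len(vocab), len(vocab)))
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            distance = abs(i - j)
            cooc[idx[word], idx[corpus[j]]] += 1.0 / distance  # closer context words weigh more

print(vocab)   # rows/columns of the matrix
print(cooc)    # co-occurrence counts, down-weighted by distance
```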

2.1.3. Transformer

In a study by Vaswani et al. (2017), an attention-based architecture called the Transformer was introduced [28]. Transformers are a type of sequence transduction model that relies solely on attention rather than recurrence. This approach allows more global relationships to be considered across longer input and output sequences. As a result, Transformers have recently been applied to a wide range of natural language processing challenges.
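At the core of the Transformer is scaled dot-product attention; the following NumPy sketch (random toy tensors standing in for learned query, key, and value projections) shows the computation, not any particular library’s implementation.

```python
# Minimal sketch of scaled dot-product attention, the building block of the Transformer [28].
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # each output mixes values from all positions

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))   # 5 tokens, 8-dimensional queries
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```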

Bidirectional Encoder Representations from Transformers

BERT is a self-supervised model for learning language representations that was released by Google AI in 2018 [8]. BERT introduces a masked bidirectional language modeling objective that leverages context from both directions to predict randomly masked tokens, allowing it to better capture contextualized word associations. BERT belongs to the class of models known as Transformers and comes in two variants: BERT-Base, which incorporates 110 million parameters, and BERT-Large, which boasts 340 million parameters. BERT relies on an attention mechanism to generate high-quality, contextualized word embeddings [28]. The attention mechanism captures word associations based on the words to the left and right of each word as it passes through each BERT layer during training. Compared with traditional techniques like BoW and TF-IDF, BERT produces far richer word embeddings, thanks to its pretraining on large corpora such as English Wikipedia and BooksCorpus. BERT has been successfully applied to many NLP tasks, including language translation [29,30,31].
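A minimal sketch of extracting contextualized embeddings from a pre-trained BERT checkpoint is shown below, assuming the Hugging Face transformers and torch packages are installed; the input sentence is made up.

```python
# Minimal sketch: contextualized BERT embeddings via Hugging Face Transformers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("You built a really tall tower!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per (sub)word token, conditioned on the whole sentence.
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, num_tokens, 768])
```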

DistilBERT

DistilBERT is a highly efficient and cost-effective variant of the BERT model that was developed by distilling BERT-base. With 40% fewer parameters than bert-base-uncased, DistilBERT is both small and lightweight. Additionally, it runs 60% faster than BERT while maintaining an impressive 97% performance on the GLUE language understanding benchmark [32].
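As a brief illustration, the sketch below loads a publicly available DistilBERT checkpoint fine-tuned for sentiment classification through the Hugging Face pipeline API; the checkpoint name and input text are examples, not part of the work described here.

```python
# Minimal sketch: using a fine-tuned DistilBERT model via the Hugging Face pipeline API.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # public example checkpoint
)
print(classifier("Great job building that tower!"))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```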

RoBERTa

Yinhan Liu et al. proposed a robust approach called the Robustly Optimized BERT-Pretraining Approach (RoBERTa) in 2019, which aims to improve upon the original BERT model for pretraining natural language processing (NLP) systems [33]. RoBERTa shares the same architecture as BERT, but incorporates modifications to the key hyperparameters and minor embedding tweaks to increase robustness. Unlike BERT, RoBERTa does not use the next-sentence pretraining objective, and instead trains the model with much larger mini-batches and learning rates. Additionally, RoBERTa is trained using full sentences, dynamic masking, and a larger byte-level byte-pair encoding (BPE) technique. RoBERTa has been widely adopted in downstream NLP tasks and has achieved outstanding results compared to other models [34,35,36].

2.2. Text Classification

Text classification is also referred to as text tagging or text categorization. The aim is to categorize and classify text into organized groups. Text classifiers can automatically analyze provided text and assign a set of pre-defined tags or categories based on its content.
While human experts remain the most reliable method for text classification, manual classification is a complex, tedious, and costly task. With the advancement of NLP, text classification has become increasingly important, particularly in areas such as sentiment analysis, topic detection, and language detection. Various machine learning and deep learning methods have been employed for sentiment analysis, with Twitter being a popular data source [37,38,39,40]. Supervised methods, including decision trees, random forests, logistic regression, support vector machines (SVMs), and naive Bayes, have been used to train classifiers [41,42]. However, supervised approaches require labeled data, which can be expensive to obtain. To address this, unsupervised learning methods have been proposed, such as that of Pandarachalil et al. [43].
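As a simple illustration of such a supervised pipeline, the sketch below trains a naive Bayes classifier on a tiny made-up dataset with scikit-learn; the texts, labels, and model choice are illustrative and not drawn from the studies cited above.

```python
# Minimal supervised text-classification sketch (scikit-learn; tiny made-up dataset).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great job", "nice tower", "stop that now", "don't throw the blocks"]
labels = ["praise", "praise", "command", "command"]   # hypothetical tags

clf = make_pipeline(CountVectorizer(), MultinomialNB())  # BoW features + naive Bayes
clf.fit(texts, labels)
print(clf.predict(["what a nice house"]))  # -> ['praise'] on this toy data
```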

2.3. Dyadic Parent–Child Interaction Coding System and Parent–Child Interaction Therapy

The Dyadic Parent–Child Interaction Coding System, fourth edition (DPICS-IV), is a structured behavioral observation tool that assesses essential parent and child behaviors in standardized situations. DPICS-IV has proven to be a valuable adjunct to PCIT and has also been used extensively to evaluate other parenting interventions and research objectives [6]. Over the years, DPICS has been utilized in studies addressing a wide range of clinical and research questions. Nelson et al. highlight the development of DPICS and discuss its current usage as a treatment process or outcome variable. The authors also summarize the ways in which DPICS has been adapted and describe the process by which it can be adapted further [44].
The DPICS-IV scoring system is based on frequency counts of ten main categories: Neutral Talk, Labeled Praise, Unlabeled Praise, Behavior Description, Reflection, Information Question, Descriptive Question, Direct Commands, Indirect Commands, and Negative Talk. However, in previous work, eight categories were commonly used, where Information Question and Descriptive Question were combined into a single Question category and Indirect Commands and Direct Commands were combined as Commands [9,45,46]. Both Cañas et al. and Huber et al. have suggested that not all DPICS codes are equally important for therapy outcomes and have placed more emphasis on Negative Talk. In addition, Cañas et al. found that the DPICS Negative Talk factor demonstrated a high discriminant capacity (AUC = 0.90) between samples, and a cut-off score of 8 allowed the classification of mother–child dyads with 82% sensitivity and 89% specificity [45].
The process of labeling DPICS codes manually for each sentence in a conversation is time-consuming and labor-intensive and requires trained experts; confirmatory factor analysis is then used to verify the factor structure of the observed variables [47]. To reduce this burden, Huber et al. developed SpecialTime, an automated system that classifies transcript segments into one of eight DPICS classes. The system uses a linear support vector machine trained on text feature representations obtained from TF-IDF and part-of-speech tags, and achieves an overall accuracy of 78% when evaluated by the authors on an expert-labeled corpus [9].
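The sketch below is in the spirit of that classifier but is not the authors’ implementation: a linear SVM over a union of word TF-IDF features and TF-IDF features computed on part-of-speech tag sequences. It assumes scikit-learn and NLTK are installed (with NLTK’s tokenizer and tagger data downloaded), and the utterances and DPICS labels are hypothetical.

```python
# Minimal SpecialTime-style sketch (not the authors' code): linear SVM over TF-IDF word
# features plus TF-IDF features of part-of-speech tag sequences.
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

def to_pos_strings(texts):
    """Replace each utterance by its sequence of POS tags, joined into a string."""
    return [" ".join(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(t))) for t in texts]

features = FeatureUnion([
    ("words", TfidfVectorizer()),                        # lexical TF-IDF features
    ("pos", make_pipeline(FunctionTransformer(to_pos_strings),
                          TfidfVectorizer())),           # TF-IDF over POS-tag sequences
])

clf = make_pipeline(features, LinearSVC())

# Hypothetical training data: utterances labeled with DPICS classes.
utterances = ["good job", "put the block here", "what color is this", "you built a tower"]
dpics_codes = ["Praise", "Command", "Question", "Behavior Description"]
clf.fit(utterances, dpics_codes)
print(clf.predict(["hand me the red block"]))
```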
PCIT helps parents improve the quality of their interactions with children who have behavior problems. The therapy instructs parents to use effective dialogue during interactions with their children [48].

References

  1. Jeong, J.; Franchett, E.E.; Ramos de Oliveira, C.V.; Rehmani, K.; Yousafzai, A.K. Parenting interventions to promote early child development in the first three years of life: A global systematic review and meta-analysis. PLoS Med. 2021, 18, e1003602.
  2. Nilsen, F.M.; Ruiz, J.D.; Tulve, N.S. A meta-analysis of stressors from the total environment associated with children’s general cognitive ability. Int. J. Environ. Res. Public Health 2020, 17, 5451.
  3. Eyberg, S.M.; Boggs, S.R.; Algina, J. Parent-child interaction therapy: A psychosocial model for the treatment of young children with conduct problem behavior and their families. Psychopharmacol. Bull. 1995, 31, 83–91.
  4. Thomas, R.; Abell, B.; Webb, H.J.; Avdagic, E.; Zimmer-Gembeck, M.J. Parent-child interaction therapy: A meta-analysis. Pediatrics 2017, 140, e20170352.
  5. Valero Aguayo, L.; Rodríguez Bocanegra, M.; Ferro García, R.; Ascanio Velasco, L. Meta-analysis of the efficacy and effectiveness of parent child interaction therapy (PCIT) for child behaviour problems. Psicothema 2021, 33, 544–555.
  6. Eyberg, S.M. Dyadic Parent-Child Interaction Coding System (DPICS): Comprehensive Manual for Research and Training; PCIT International, Incorporated: Riverside, CA, USA, 2013.
  7. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
  8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  9. Huber, B.; Davis, R.F., III; Cotter, A.; Junkin, E.; Yard, M.; Shieber, S.; Brestan-Knight, E.; Gajos, K.Z. SpecialTime: Automatically detecting dialogue acts from speech to support parent-child interaction therapy. In Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare, Trento, Italy, 20–23 May 2019; pp. 139–148.
  10. Harris, Z.S. Distributional structure. Word 1954, 10, 146–162.
  11. Luhn, H.P. A statistical approach to mechanized encoding and searching of literary information. IBM J. Res. Dev. 1957, 1, 309–317.
  12. El-Din, D.M. Enhancement bag-of-words model for solving the challenges of sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 244–252.
  13. HaCohen-Kerner, Y.; Miller, D.; Yigal, Y. The influence of preprocessing on text classification using a bag-of-words representation. PLoS ONE 2020, 15, e0232525.
  14. Huang, C.R.; Lee, L.H. Contrastive approach towards text source classification based on top-bag-of-word similarity. In Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, Cebu City, Philippines, 20–22 November 2008; pp. 404–410.
  15. Yan, D.; Li, K.; Gu, S.; Yang, L. Network-based bag-of-words model for text classification. IEEE Access 2020, 8, 82641–82652.
  16. Rajaraman, A.; Ullman, J.D. Mining of Massive Datasets; Cambridge University Press: Cambridge, UK, 2011.
  17. Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 1972, 28, 11–21.
  18. Beel, J.; Gipp, B.; Langer, S.; Breitinger, C. Paper recommender systems: A literature survey. Int. J. Digit. Libr. 2016, 17, 305–338.
  19. Christian, H.; Agus, M.P.; Suhartono, D. Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech Comput. Math. Eng. Appl. 2016, 7, 285–294.
  20. Ghag, K.; Shah, K. SentiTFIDF–Sentiment classification using relative term frequency inverse document frequency. Int. J. Adv. Comput. Sci. Appl. 2014, 5, 36–176.
  21. Hakim, A.A.; Erwin, A.; Eng, K.I.; Galinium, M.; Muliady, W. Automated document classification for news article in Bahasa Indonesia based on term frequency inverse document frequency (TF-IDF) approach. In Proceedings of the 2014 6th International Conference on Information Technology and Electrical Engineering (ICITEE), Yogyakarta, Indonesia, 7–8 October 2014; pp. 1–4.
  22. Sjarif, N.N.A.; Azmi, N.F.M.; Chuprat, S.; Sarkan, H.M.; Yahya, Y.; Sam, S.M. SMS spam message detection using term frequency-inverse document frequency and random forest algorithm. Procedia Comput. Sci. 2019, 161, 509–515.
  23. Suhartono, D.; Purwandari, K.; Jeremy, N.H.; Philip, S.; Arisaputra, P.; Parmonangan, I.H. Deep neural networks and weighted word embeddings for sentiment analysis of drug product reviews. Procedia Comput. Sci. 2023, 216, 664–671.
  24. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26.
  25. Mikolov, T.; Le, Q.V.; Sutskever, I. Exploiting similarities among languages for machine translation. arXiv 2013, arXiv:1309.4168.
  26. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
  27. Schütze, H.; Manning, C.D.; Raghavan, P.; Schtze, H. Relevance Feedback and Query Expansion. Introduction to Information Retrieval; Cambridge University Press: New York, NY, USA, 2008.
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
  29. Gao, Z.; Feng, A.; Song, X.; Wu, X. Target-dependent sentiment classification with BERT. IEEE Access 2019, 7, 154290–154299.
  30. Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943.
  31. Müller, M.; Salathé, M.; Kummervold, P.E. COVID-twitter-bert: A natural language processing model to analyse COVID-19 content on twitter. arXiv 2020, arXiv:2005.07503.
  32. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
  33. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692.
  34. Adoma, A.F.; Henry, N.M.; Chen, W. Comparative analyses of bert, roberta, distilbert, and xlnet for text-based emotion recognition. In Proceedings of the 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 18–20 December 2020; pp. 117–121.
  35. Cortiz, D. Exploring transformers in emotion recognition: A comparison of bert, distillbert, roberta, xlnet and electra. arXiv 2021, arXiv:2104.02041.
  36. Tarunesh, I.; Aditya, S.; Choudhury, M. Trusting roberta over bert: Insights from checklisting the natural language inference task. arXiv 2021, arXiv:2107.07229.
  37. Diyasa, I.G.S.M.; Mandenni, N.M.I.M.; Fachrurrozi, M.I.; Pradika, S.I.; Manab, K.R.N.; Sasmita, N.R. Twitter sentiment analysis as an evaluation and service base on python textblob. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1125, 012034.
  38. Gupta, B.; Negi, M.; Vishwakarma, K.; Rawat, G.; Badhani, P.; Tech, B. Study of Twitter sentiment analysis using machine learning algorithms on Python. Int. J. Comput. Appl. 2017, 165, 29–34.
  39. Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of tricks for efficient text classification. arXiv 2016, arXiv:1607.01759.
  40. Wagh, R.; Punde, P. Survey on sentiment analysis using twitter dataset. In Proceedings of the 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March 2018; pp. 208–211.
  41. Ahammed, M.T.; Gloria, A.; Oion, M.S.R.; Ghosh, S.; Balaii, P.; Nisat, T. Sentiment Analysis using a Machine Learning Approach in Python. In Proceedings of the 2022 International Conference on Communication, Computing and Internet of Things (IC3IoT), Chennai, India, 10–11 March 2022; pp. 1–6.
  42. Singh, J.; Tripathi, P. Sentiment analysis of Twitter data by making use of SVM, Random Forest and Decision Tree algorithm. In Proceedings of the 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, 18–19 June 2021; pp. 193–198.
  43. Pandarachalil, P.; Sendhilkumar, S.; Mahalakshmi, G.S. Twitter sentiment analysis for large-scale data: An unsupervised approach. Cogn. Comput. 2015, 7, 254–262.
  44. Nelson, M.M.; Olsen, B. Dyadic parent–child interaction coding system (DPICS): An adaptable measure of parent and child behavior during dyadic interactions. In Handbook of Parent-Child Interaction Therapy: Innovations and Applications for Research and Practice; Springer: Berlin/Heidelberg, Germany, 2018; pp. 285–302.
  45. Cañas, M.; Ibabe, I.; Arruabarrena, I.; De Paúl, J. The dyadic parent-child interaction coding system (DPICS): Negative talk as an indicator of dysfunctional mother-child interaction. Child. Youth Serv. Rev. 2022, 143, 106679.
  46. Cotter, A.M. Psychometric Properties of the Dyadic Parent-Child Interaction Coding System (DPICS): Investigating Updated Versions Across Diagnostic Subgroups. Ph.D. Thesis, Auburn University, Auburn, AL, USA, 2016.
  47. Cañas Miguel, M.; Ibabe Erostarbe, I.; Arruabarrena Madariaga, M.I.; Paúl Ochotorena, J.D. Dyadic parent-child interaction coding system (Dpics): Factorial structure and concurrent validity. Psicothema 2021, 33, 328–336.
  48. Cotter, A.M.; Brestan-Knight, E. Convergence of parent report and child behavior using the Dyadic Parent-Child Interaction Coding System (DPICS). J. Child Fam. Stud. 2020, 29, 3287–3301.