Nowcasting Unemployment: Comparison
Please note this is a comparison between Version 1 by Vaida Pilinkiene and Version 2 by Rita Xu.

In contrast to earlier studies that used only a small number of search queries or limited themselves to job agency explorations, the researchers incorporated keywords from the following six dimensions of Google Trends searches: job search, benefits, and application; mental health; violence and abuse; leisure search; consumption and lifestyle; and disasters.

  • LSTM
  • neural networks
  • unemployment
  • nowcasting

1. Introduction

Due to the negative effects of the COVID-19 pandemic on the global economy and the rising rates of unemployment, the need for more accurate unemployment forecasting models has become increasingly urgent. Recent developments in the nowcasting literature indicate an attempt to provide a better and more timely approximation of the unemployment situation, which might benefit government institutions that are preparing to take proactive measures against a surge in joblessness. In essence, nowcasting is a technique used to predict the behavior of the economy in real time or as close as possible to real time, allowing governments to make more informed decisions about stimulus rather than relying on lagging monthly indicators (Giannone et al. 2008; Bańbura et al. 2013). Numerous authors have already reported successes in nowcasting unemployment, car sales, GDP, household consumption with credit card transaction data, foreign arrivals, and even building permits (Choi and Varian 2009a; Bańbura et al. 2010; Barreira et al. 2013; Pavlicek and Kristoufek 2015; Rusnák 2016; Coble and Pincheira 2017; Richardson and Mulder 2018; Antolini and Grassini 2018; Nymand-Andersen and Pantelidis 2018; Giovannelli et al. 2020; Aastveit et al. 2020). Although the reported effectiveness of nowcasting models differs from author to author, a general consensus has emerged that nowcasting helps improve prediction accuracy.
Despite the growing literature on nowcasting, much needs to be discussed and perfected regarding the models and variable choices themselves. For instance, one source of data that has been widely popular amongst researchers is user internet search activity, such as Google Trends data (Askitas and Zimmermann 2009; D’Amuri 2009; Choi and Varian 2009b; Chadwick and Sengul 2012; Barreira et al. 2013; Smith 2016; Naccarato et al. 2017; Nymand-Andersen and Pantelidis 2018; Nagao et al. 2019; Anttonen 2018; Dilmaghani 2019; Borup et al. 2021; Mulero and García-Hiernaux 2021). According to Ettredge et al. (2005), such data are useful because, through search activity, citizens reveal information about their intentions, needs, wants, and interests. In many cases, people who lose their jobs use Google to help them find unemployment benefit agencies and new job openings, as indicated by the fact that 93.63% of all mobile searches are performed through Google (Netmarketshare 2022). As a result, authors such as Mulero and García-Hiernaux (2021) and Naccarato et al. (2017) report that using Google data increased predictive accuracy by 10–25% or reduced forecasting errors. Another advantage of using Google Trends is that it provides data at various frequencies, which benefits the nowcasting procedure.
On the other hand, some authors, including Nagao et al. (2019) and Barreira et al. (2013), produced mixed results when forecasting unemployment. Google Trends data did not increase accuracy in all countries, and specific preconditions were required for search activity data to work. As Smith (2016) puts it, the challenge with search data is that one must determine which keywords are relevant and how many keywords should be incorporated to best mirror the search queries of a jobless or soon-to-be-jobless person, which later translate into unemployment numbers. In the case of Nagao et al. (2019), only two keywords were used; by expanding the keyword dictionary, the results might have changed substantially. Thus, despite some success stories, how to correctly mine the data from Google Trends remains a real challenge that requires considerable future improvements.
Related to the keyword selection issue, few attempts have been made to incorporate other socially important keywords that could be highly related to unemployment figures. For example, many authors only included keywords such as “jobs” or “job offers” but did not include other keyword dimensions, such as “online casino”, “anxiety”, “depression”, and “leisure activities”. Researchers such as Mousteri et al. (2018) and Liu et al. (2021) found that joblessness and psychological distress are interrelated; thus, the expansion of the keyword corpus could lead to a better Google Trends forecast.
Furthermore, despite recent developments in machine learning, there have been relatively few attempts to use neural networks in nowcasting unemployment. Much of the literature deploys dynamic factor models (DFM), ARIMA, or mixed data sampling (MIDAS) (D’Amuri 2009; Smith 2016; Chernis and Sekkel 2017). However, according to Hopp (2021), the long short-term memory (LSTM) neural network approach has an edge over the DFM in accuracy, speed, and volume. Thus, the machine learning approach could be explored more within the nowcasting context.
Therefore, the aim of this paper is to investigate a multidimensional approach to Google search query data in nowcasting USA initial jobless claims using LSTM neural networks. To our knowledge, this is the first paper that expands Google keyword dimensions to include psychological factors, gambling, and other activities to nowcast initial jobless claims using neural networks. In addition, this paper tests different feature extraction strategies, offering future researchers recommendations regarding what to focus on when conducting keyword selection to avoid common pitfalls. The findings of this study can help central banks and other government institutions expand their variable horizons to achieve better forecast accuracy. The analysis period also covers the COVID-19 timeline.

2. Forecasting Unemployment Using Google Trends

One of the first papers to consider Google Trends for unemployment nowcasting was written by Choi and Varian (2009a). The authors attempted to nowcast initial jobless claims for unemployment benefits by using the keywords “Jobs” and “Welfare & Unemployment”. The results revealed that the out-of-sample mean absolute error was reduced by 15.74% and 12.90% for the long- and short-term models, respectively. Papers by D’Amuri (2009) and Askitas and Zimmermann (2009) followed in the same year. D’Amuri (2009) limited the analysis to a single keyword, “jobs”, whereas Askitas and Zimmermann (2009) used four different categories of keywords: “unemployment office or agency”, “unemployment rate”, “Personnel Consultant”, and keywords that relate to the most popular job search engines in Germany. Both studies concluded that Google Trends helped reduce forecasting errors. As depicted in Table 1, Chadwick and Sengul (2012) nowcasted Türkiye’s unemployment rate using keywords that related to “looking for a job”, “job announcements”, “CV”, and “career” and provided supportive evidence for using Google Trends. Fondeur and Karamé (2013) achieved success in forecasting French youth unemployment with only one keyword, “job”; the RMSE was improved on average by 13.6%. Several recent papers still use a limited range of keywords. For instance, Aaronson et al. (2022), Maas (2019), and Larson and Sinclair (2021) employed a single search term (“unemployment”), D’Amuri and Marcucci (2017) used “jobs”, whereas Simionescu (2020) and Simionescu and Cifuentes-Faura (2021) used two terms, “unemployment” and “job offers”. Despite the use of only one or two keywords, Google Trends improved accuracy.
Table 1. A summary of Google keywords used for unemployment forecasts by previous authors.
On the other hand, some researchers report mixed results. Pavlicek and Kristoufek (2015) used the keywords “work” or “jobs” for the Visegrad countries in multiple languages. The authors concluded that for the Czech Republic and Hungary, user search trails can be integrated easily; however, for Poland and Slovakia, the results were not consistent. Similar findings were reported by Nagao et al. (2019), who limited themselves to only two terms: “jobs” and “job offer”. The authors found no consistent improvement in accuracy, particularly for the long-term nowcasts, whereas Barreira et al. (2013), using “unemployment” and “unemployment benefits”, found an improvement in three out of the four countries analyzed. Unfortunately, few studies have attempted to incorporate a higher query volume. Among these papers is one by Schiavoni et al. (2021), who studied the Netherlands’ job market. The authors included 85 keyword search terms but limited the query to words that strongly relate to the job process (e.g., CV, cover letter, job vacancies). A slightly different approach was adopted by Caperna et al. (2020) and Yi et al. (2021), who suggested not only examining the “unemployment” keyword but also checking the most-searched terms and filtering them to determine which related to work and jobs. Yi et al. (2021) obtained 25 keywords for the USA region and successfully integrated them into the forecasting model. In the case of Caperna et al. (2020), the number of search terms varied widely across countries; for example, 3 keywords were chosen for Estonia and 178 for Italy, whereas for some countries, such as Luxembourg or Malta, none were found. The benefits of Google Trends reported by Caperna et al. (2020) also varied, and further methods were required to find significant relationships.
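The error reductions quoted in this section (for example, a lower out-of-sample mean absolute error or RMSE once search data are added) are relative improvements over a baseline model. The following is a minimal sketch of that calculation. The function names and all numbers are illustrative assumptions and do not come from any of the cited studies.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pct_reduction(baseline_error, augmented_error):
    """Percentage by which the augmented model lowers the baseline error."""
    return 100.0 * (baseline_error - augmented_error) / baseline_error


# Made-up out-of-sample values, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
baseline = [90.0, 120.0, 110.0, 140.0]    # model without search data
augmented = [95.0, 115.0, 115.0, 135.0]   # model with search data

improvement = pct_reduction(mae(actual, baseline), mae(actual, augmented))
print(f"MAE reduced by {improvement:.2f}%")
```

A positive value means the model with search data has the lower error; a negative value would mean that the search data made the nowcast worse.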

3. Additional Keyword Opportunities

The analysis of the existing studies reveals that most authors limited themselves to one or two basic keywords, and any expansion of keywords was largely confined to the job application procedure. This approach led some researchers to conclude that Google Trends offered no additional benefit to nowcasting models or that trends are inconsistent over longer periods. However, these shortcomings, and the scope for further improving the models, could be partially attributed to the Google Trends keyword mining process. By only considering one keyword for forecasting, the authors cast aside the big picture and the nuances of behavioral economics that are profoundly related to unemployment numbers. For instance, Liu et al. (2021) and Wanberg et al. (2019) found strong links between unemployment and mental health. In times of downturn, the fear of job loss can trigger mental distress, particularly when loss of employment benefits is taken into account. The study by Breuer (2015) argued that unemployment increases the risk of suicide, while Butterworth et al. (2012) found that poor mental health leads to a longer duration of unemployment for both men and women. Furthermore, studies by Sotis (2021) and Askitas (2015) found that when labor market conditions deteriorate, searches for health symptoms proliferate; therefore, a dimension of mental health keywords (MHKs) could provide substantial benefits to model precision. Unfortunately, to our knowledge, no study is available for nowcasting unemployment that includes MHKs. Furthermore, the trade-off between leisure and labor has long been a contentious issue among labor economists. More importantly, Havitz et al. (2004) and Goodman et al. (2016) claim that unemployed people spend more time in leisure to mitigate the stress of joblessness. In the Internet age, leisure time is frequently spent consuming free entertainment, such as YouTube, online video games, and downloading movie torrents via sites such as “The Pirate Bay” (Lehdonvirta 2013).
A highly regarded study in this area was undertaken by Dilmaghani (2019), who included torrent websites as a proxy for leisure activities and managed to nowcast unemployment numbers more accurately. However, leisure-related keywords still have more to offer. For instance, Frangos et al. (2011) discovered that people involved in unemployment programs were more likely to develop a problematic pornography habit. This is because, according to Uzieblo and Prescott (2020), pornography may act as a stress reliever in times of high anxiety. Similarly, an association between unemployment and gambling persists. Khanthavit (2021) found that unemployment and gambling have a circular relationship, while Pallesen et al. (2021) and Mallorquí-Bagué et al. (2017) determined that internet gaming disorder was more prominent among single unemployed persons. Thus, search activity related to online gaming, gambling platforms, and sites such as “OnlyFans” could help to improve the Google Trends mining process. Another negative behavior associated with unemployment is domestic abuse (Anderberg et al. 2016; Bhalotra et al. 2021). According to Anderberg et al. (2016), women who are at risk of job loss become more economically reliant on their partners. This reliance can then lead male partners who have a predisposition to violence to reveal their abusive tendencies. Thus, high female unemployment leads to an elevated risk of intimate partner violence. As a way out, people may use Google searches to find public shelter. A study by Berniell and Facchini (2021) reported a high Google search intensity in 11 countries for keywords such as “domestic abuse” or “domestic violence hotline” during the COVID-19 lockdowns. The strong correlation between unemployment and domestic abuse might also assist nowcasting. Many other important keywords should also be incorporated into the nowcasting model.
Some of these include “gig economy”, “recipes”, “hobbies”, “Netflix”, “payday loans”, “lottery tickets”, “alcohol”, “layoffs”, “auto insurance”, “hardship letter”, or words related to comfort food (Dávalos et al. 2011; Askitas 2015; Agarwal et al. 2016; Chadi and Hetschko 2017; Huang et al. 2018; Gabrielyan and Just 2020; McKinsey & Company 2020; Gamze and Aydan 2021).
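One way to organize such a multidimensional keyword corpus is a simple mapping from dimension to search terms. The sketch below is hypothetical. The dimension names loosely follow the dimensions listed at the top of this entry, and the example keywords are taken from terms mentioned in the text. Their assignment to dimensions, and the averaging of keyword series into one index per dimension, are illustrative assumptions rather than the authors' feature extraction strategy. The disasters dimension is left out because the text gives no example keywords for it.

```python
from statistics import mean

# Hypothetical keyword dictionary; the grouping is an assumption for illustration.
KEYWORD_DIMENSIONS = {
    "job_search": ["jobs", "job offers", "unemployment benefits"],
    "mental_health": ["anxiety", "depression"],
    "violence_abuse": ["domestic abuse", "domestic violence hotline"],
    "leisure": ["Netflix", "hobbies", "online casino"],
    "consumption_lifestyle": ["payday loans", "recipes", "auto insurance"],
}


def dimension_indices(series_by_keyword):
    """Average the available keyword series of each dimension, period by period.

    series_by_keyword maps a keyword to a list of search-interest values
    (one value per period, all lists of equal length). Dimensions without
    any available keyword series are skipped.
    """
    indices = {}
    for dimension, keywords in KEYWORD_DIMENSIONS.items():
        available = [series_by_keyword[k] for k in keywords if k in series_by_keyword]
        if available:
            # zip(*available) walks through the series one period at a time.
            indices[dimension] = [mean(period) for period in zip(*available)]
    return indices


# Toy search-interest values on a 0-100 scale, made up for illustration.
toy = {
    "anxiety": [40, 60, 80],
    "depression": [20, 40, 100],
    "jobs": [50, 55, 70],
}
result = dimension_indices(toy)
print(result)
```

Collapsing each dimension into a single index is only one option; every keyword series could equally be kept as a separate model input.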

4. Nowcasting and Machine Learning

Previous studies used a wide range of methods to nowcast unemployment numbers. Mulero and García-Hiernaux (2021), Larson and Sinclair (2021), Simionescu (2020), Caperna et al. (2020), Nagao et al. (2019), Dilmaghani (2019), D’Amuri and Marcucci (2017), Pavlicek and Kristoufek (2015), Chadwick and Sengul (2012), and Choi and Varian (2009a) used some form of autoregressive model. Yi et al. (2021) suggested a PRISM approach, and Schiavoni et al. (2021) used the well-known DFM, whereas Maas (2019) applied the MIDAS model. Although each model has its own strengths and weaknesses, the recent developments in neural networks deserve deeper exploration. As stated by Brownlee (2018) and Bruneckiene et al. (2021), neural networks may capture non-linear patterns in the data that are difficult to detect using conventional econometric methods, and they can handle a higher volume of data. Unfortunately, few studies have attempted to forecast unemployment numbers using Google Trends in conjunction with LSTMs. Fenga and Son-Turan (2022) used a feed-forward neural network for counterfactual predictions, whereas Singhania and Kundu (2020) deployed the LSTM model for monthly data. Singhania and Kundu (2020) first attempted to use the VAR model but soon faced a major drawback: the VAR model can only capture the relationship between the search trends of a limited number of keywords, because its complexity and number of parameters grow with every keyword added. As such, the authors used LSTMs as an alternative. In the end, the LSTMs significantly outperformed the VAR model but were not used for nowcasting. These results support the use of the LSTM model in this paper, as it can handle a large number of keywords efficiently.
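The scaling argument against the VAR model can be made concrete by counting coefficients. The sketch below rests on stated assumptions: an unrestricted VAR(p) with an intercept in every equation, and a single standard LSTM layer with one bias vector per gate (some libraries store two bias vectors per gate, which changes the count slightly). The function names, lag order, and hidden size are illustrative and are not taken from Singhania and Kundu (2020) or from this paper.

```python
def var_param_count(n_series, n_lags):
    """Coefficients in an unrestricted VAR(p) with intercepts.

    Each of the n_series equations has n_series * n_lags lag coefficients
    plus one intercept, so the total grows with the square of n_series.
    """
    return n_series * (n_series * n_lags + 1)


def lstm_param_count(n_inputs, n_hidden):
    """Weights in one LSTM layer (input, forget, cell, and output gates).

    Each gate has n_hidden * n_inputs input weights, n_hidden * n_hidden
    recurrent weights, and n_hidden biases, so for a fixed hidden size the
    total grows only linearly with the number of input series.
    """
    return 4 * (n_hidden * (n_inputs + n_hidden) + n_hidden)


# Compare growth as more keyword series are added (4 lags, 32 hidden units).
for n_keywords in (2, 25, 100):
    print(
        n_keywords,
        var_param_count(n_keywords, n_lags=4),
        lstm_param_count(n_keywords, n_hidden=32),
    )
```

For a handful of keywords the VAR is the smaller model, but its coefficient count grows quadratically with the number of series, whereas the LSTM layer grows linearly for a fixed hidden size. Parameter counts alone do not determine forecast accuracy; they only illustrate why a VAR becomes hard to estimate as the keyword set expands.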