Despite the growing literature on nowcasting, much needs to be discussed and perfected regarding the models or variable choices themselves. For instance, one source of data that has been widely popular amongst researchers is user internet search activity, such as Google Trends data (
Askitas and Zimmermann 2009;
D’Amuri 2009;
Choi and Varian 2009b;
Chadwick and Sengul 2012;
Barreira et al. 2013;
Smith 2016;
Naccarato et al. 2017;
Nymand-Andersen and Pantelidis 2018;
Nagao et al. 2019;
Anttonen 2018;
Dilmaghani 2019;
Borup et al. 2021;
Mulero and García-Hiernaux 2021). According to
Ettredge et al. (
2005), such data are useful because, through their search activity, citizens reveal information about their intentions, needs, wants, and interests. In many cases, people who lose their jobs turn to Google to find unemployment benefit agencies and new job openings, which is plausible given that 93.63% of all mobile searches are performed through Google (
Netmarketshare 2022). As a result, authors such as
Mulero and García-Hiernaux (
2021) and
Naccarato et al. (
2017) report that using Google data increased predictive accuracy by 10–25% or otherwise reduced forecasting errors. Another advantage of Google Trends is that its data are available at several frequencies, which benefits the nowcasting procedure.
On the other hand, authors, including
Nagao et al. (
2019) and
Barreira et al. (
2013), produced mixed results when forecasting unemployment. Google Trends data did not increase accuracy in all countries, and specific preconditions had to be met for search activity data to work. As
Smith (
2016) puts it, the challenge with search data is that one must determine which keywords are relevant and how many keywords should be incorporated to best mirror a jobless or soon-to-be-jobless person’s search query that later translates to unemployment numbers. In the case of
Nagao et al. (
2019), only two keywords were used; by expanding the keyword dictionary, the results might have changed substantially. Thus, despite some success stories, the question of how to correctly mine the data from Google Trends is a real challenge that requires considerable future improvements.
Therefore, the aim of this paper is to investigate a multidimensional approach to using Google search query data for nowcasting US initial jobless claims with LSTM neural networks. To our knowledge, this is the first paper that expands the Google keyword dimensions to include psychological factors, gambling, and other activities in order to nowcast initial jobless claims using neural networks. In addition, this paper tests different feature extraction strategies and offers future researchers recommendations on what to focus on during keyword selection to avoid common pitfalls. The findings of this study can help central banks and other government institutions expand their variable horizons to achieve better forecast accuracy. The analysis period also covers the COVID-19 timeline.
2. Forecasting Unemployment Using Google Trends
One of the first papers that considered Google Trends for unemployment nowcasting was written by
Choi and Varian (
2009a). The authors attempted to nowcast initial jobless claims for unemployment benefits by using the keywords “Jobs” and “Welfare & Unemployment”. The results revealed that the out-of-sample mean absolute error was reduced by 15.74% and 12.90% for the long- and short-term models, respectively. This work was followed in the same year by papers from
D’Amuri (
2009) and
Askitas and Zimmermann (
2009). The work of
D’Amuri (
2009) was limited to a single keyword, “jobs”, whereas
Askitas and Zimmermann (
2009) used four different categories of keywords: “unemployment office or agency”, “unemployment rate”, “Personnel Consultant”, and keywords that relate to the most popular job search engines in Germany. Both studies concluded that Google Trends helped reduce forecasting errors.
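The error reductions quoted in these studies are relative improvements of a search-augmented model over a baseline. The following minimal sketch shows how such a figure can be computed. All numbers are hypothetical and chosen only for illustration; they are not taken from any of the cited papers, and the function names are our own.

```python
from statistics import fmean


def mae(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return fmean([abs(a - f) for a, f in zip(actual, forecast)])


def pct_reduction(baseline_error, augmented_error):
    """Relative error reduction (in percent) of the augmented model over the baseline."""
    return 100.0 * (baseline_error - augmented_error) / baseline_error


# Hypothetical out-of-sample values (thousands of claims), for illustration only.
actual = [210.0, 215.0, 230.0, 260.0, 240.0]
baseline = [205.0, 222.0, 221.0, 248.0, 250.0]   # assumed: autoregressive model only
augmented = [207.0, 219.0, 226.0, 254.0, 246.0]  # assumed: same model plus a search index

mae_base = mae(actual, baseline)
mae_aug = mae(actual, augmented)
reduction = pct_reduction(mae_base, mae_aug)
print(f"baseline MAE={mae_base:.2f}, augmented MAE={mae_aug:.2f}, reduction={reduction:.1f}%")
```

The same calculation applies to the RMSE or any other loss, provided both models are evaluated on the same out-of-sample period.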
As depicted in
Table 1,
Chadwick and Sengul (
2012) nowcasted Türkiye’s unemployment rate using keywords that related to “looking for a job”, “job announcements”, “CV”, and “career” and provided supportive evidence for using Google Trends.
Fondeur and Karamé (
2013) successfully forecast French youth unemployment with a single keyword, “job”, improving the RMSE by 13.6% on average. Several recent papers still use a limited range of keywords. For instance,
Aaronson et al. (
2022),
Maas (
2019), and
Larson and Sinclair (
2021) employed a single search term (“unemployment”),
D’Amuri and Marcucci (
2017) used “jobs”, whereas
Simionescu (
2020) and
Simionescu and Cifuentes-Faura (
2021) used two terms, “unemployment” and “job offers”. Despite relying on only one or two keywords, these studies found that Google Trends improved accuracy.
Table 1. A summary of Google keywords used for unemployment forecasts by previous authors.
On the other hand, some researchers claim to have achieved mixed results.
Pavlicek and Kristoufek (
2015) used the keywords “work” or “jobs” for Visegrad countries in multiple languages. The authors concluded that for the Czech Republic and Hungary, user search trails can be integrated easily; however, for Poland and Slovakia, the results were not consistent. Similar findings were achieved by
Nagao et al. (
2019), who limited themselves to only two terms: “jobs” and “job offer”. The authors found no consistency regarding improved accuracy, particularly when considering the long-term nowcasts, whereas
Barreira et al. (
2013), using “unemployment” and “unemployment benefits”, found an improvement in three out of the four countries analyzed.
Unfortunately, few studies have attempted to incorporate a higher query volume. Among these papers is one by
Schiavoni et al. (
2021), who studied the Netherlands’ job market. The authors included 85 keyword search terms but limited the query to words that strongly relate to the job process (e.g., CV, cover letter, job vacancies). A slightly different approach was adopted by
Caperna et al. (
2020) and
Yi et al. (
2021), who suggested examining not only the “unemployment” keyword but also the most-searched terms, filtering them to identify those related to work and jobs.
Yi et al. (
2021) obtained 25 keywords for the USA region and successfully integrated them into the forecasting model. In the case of
Caperna et al. (
2020), the number of search terms varied widely; for example, 3 keywords were chosen for Estonia and 178 for Italy, whereas for some countries, such as Luxembourg or Malta, none were found. The benefits of Google Trends reported by
Caperna et al. (
2020) also varied, and further methods were required to find significant relationships.
3. Additional Keyword Opportunities
The analysis of the existing studies reveals that most authors limited themselves to one or two basic keywords, and any expansion of the keyword set was largely confined to the job application process. This approach led some researchers to conclude that Google Trends offered no additional benefit to nowcasting models or that the gains are inconsistent over longer periods. However, these shortcomings could be partially attributed to the Google Trends keyword mining process. By considering only one keyword for forecasting, the authors overlook the bigger picture and the nuances of behavioral economics that are profoundly related to unemployment numbers. For instance,
Liu et al. (
2021) and
Wanberg et al. (
2019) found strong links between unemployment and mental health. In times of downturn, the fear of job loss can trigger mental distress, particularly when loss of employment benefits is taken into account. The study by
Breuer (
2015) argued that unemployment increases the risk of suicide, while
Butterworth et al. (
2012) found that poor mental health leads to a longer duration of unemployment for both men and women. Furthermore, studies by
Sotis (
2021) and
Askitas (
2015) found that when labor market conditions deteriorate, searches for health symptoms proliferate; therefore, a dimension of mental health keywords (MHKs) could provide substantial benefits to model precision. Unfortunately, to our knowledge, no study is available for nowcasting unemployment that includes MHKs.
Furthermore, the trade-off between leisure and labor has long been a contentious issue among labor economists. More importantly,
Havitz et al. (
2004) and
Goodman et al. (
2016) claim that unemployed people spend more time in leisure to mitigate the stress of joblessness. In the Internet age, leisure time is frequently spent consuming free entertainment, such as YouTube, online video games, and downloading movie torrents via sites such as “The Pirate Bay” (
Lehdonvirta 2013). A highly regarded study in this area was undertaken by
Dilmaghani (
2019), who included torrent websites as a proxy for leisure activities and managed to nowcast unemployment numbers more accurately. However, leisure-related keywords still have more to offer. For instance,
Frangos et al. (
2011) discovered that people involved in unemployment programs were more likely to develop a problematic pornography habit. This is because, according to
Uzieblo and Prescott (
2020), pornography may act as a stress reliever in times of high anxiety. Similarly, there is evidence of an association between unemployment and gambling.
Khanthavit (
2021) found that unemployment and gambling have a circular relationship, while
Pallesen et al. (
2021) and
Mallorquí-Bagué et al. (
2017) determined that internet gaming disorder was more prominent among single unemployed persons. Thus, including search activity for online gaming, gambling platforms, and sites such as “OnlyFans” could help to improve the Google Trends mining process.
Another negative behavior associated with unemployment is domestic abuse (
Anderberg et al. 2016;
Bhalotra et al. 2021). According to
Anderberg et al. (
2016), women who are at risk of job loss become more economically reliant on their partners. This reliance can then lead male partners who have a predisposition to violence to reveal their abusive tendencies. Thus, high female unemployment leads to an elevated risk of intimate partner violence. As a way out, people may use Google searches to find public shelter. A study by
Berniell and Facchini (
2021) reported a high Google search intensity in 11 countries for keywords such as “domestic abuse” or “domestic violence hotline” during the COVID-19 lockdowns. The strong correlation between unemployment and domestic abuse might also assist nowcasting.
Many other important keywords should also be incorporated into the nowcasting model. These include “gig economy”, “recipes”, “hobbies”, “Netflix”, “payday loans”, “lottery tickets”, “alcohol”, “layoffs”, “auto insurance”, “hardship letter”, or words related to comfort food (
Dávalos et al. 2011;
Askitas 2015;
Agarwal et al. 2016;
Chadi and Hetschko 2017;
Huang et al. 2018;
Gabrielyan and Just 2020;
McKinsey & Company 2020;
Gamze and Aydan 2021).
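One simple way to organize such a multidimensional keyword dictionary is sketched below. The keywords are quoted from the text above, but the dimension labels, the grouping, and the averaging rule are illustrative assumptions and not the feature extraction strategy tested in this paper. The weekly index values are hypothetical.

```python
from statistics import fmean

# Keywords are quoted from the text; the dimension labels are an assumed grouping.
KEYWORD_DIMENSIONS = {
    "job_search": ["unemployment", "jobs", "job offers"],
    "leisure": ["The Pirate Bay", "Netflix", "hobbies"],
    "financial_stress": ["payday loans", "hardship letter", "lottery tickets"],
    "domestic_abuse": ["domestic abuse", "domestic violence hotline"],
}


def dimension_index(series_by_keyword, keywords):
    """Average the weekly search indices of the available keywords, week by week."""
    available = [series_by_keyword[k] for k in keywords if k in series_by_keyword]
    if not available:
        return []  # no data for this dimension
    return [fmean(week) for week in zip(*available)]


def build_features(series_by_keyword, dimensions=KEYWORD_DIMENSIONS):
    """Return one composite series per keyword dimension."""
    return {name: dimension_index(series_by_keyword, kws)
            for name, kws in dimensions.items()}


# Hypothetical weekly search indices on a 0-100 scale, for illustration only.
series = {
    "unemployment": [40, 60, 100],
    "jobs": [50, 50, 80],
    "Netflix": [70, 75, 90],
    "payday loans": [10, 20, 30],
    "hardship letter": [30, 20, 10],
}
features = build_features(series)
for name, values in features.items():
    print(name, values)
```

Averaging within a dimension is only one option; each keyword could equally be passed to the model as a separate input, which is where the number of inputs grows quickly.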
4. Nowcasting and Machine Learning
Previous studies used a wide range of methods to nowcast unemployment numbers.
Mulero and García-Hiernaux (
2021),
Larson and Sinclair (
2021),
Simionescu (
2020),
Caperna et al. (
2020),
Nagao et al. (
2019),
Dilmaghani (
2019),
D’Amuri and Marcucci (
2017),
Pavlicek and Kristoufek (
2015),
Chadwick and Sengul (
2012) and
Choi and Varian (
2009a) used some form of autoregressive model.
Yi et al. (
2021) suggested a PRISM approach, and
Schiavoni et al. (
2021) used the well-known dynamic factor model (DFM), whereas
Maas (
2019) attempted the MIDAS model. Although each model has its own strengths and weaknesses, the recent developments in neural networks deserve deeper exploration. As stated by
Brownlee (
2018) and
Bruneckiene et al. (
2021), neural networks may provide a solution for capturing non-linear patterns in the data that can be difficult to detect using conventional econometric methods, and they can also accommodate a higher volume of data. Unfortunately, few studies have attempted to forecast unemployment numbers using Google Trends in conjunction with LSTMs.
Fenga and Son-Turan (
2022) used a feed-forward neural network for counterfactual predictions, whereas
Singhania and Kundu (
2020) deployed the LSTM model for monthly data. According to
Singhania and Kundu (
2020), they first attempted to use a VAR model but soon faced a major drawback: because its complexity and number of parameters grow with each added series, a VAR model can capture the relationships among the search trends of only a limited number of keywords. The authors therefore turned to LSTMs as an alternative. In the end, the LSTMs significantly outperformed the VAR model but were not used for nowcasting. These results support the use of the LSTM model in this paper, as it can handle a large number of keywords efficiently.
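The scaling argument can be made concrete by counting parameters. The sketch below compares an unrestricted VAR(p) on k series, which has k(kp + 1) coefficients including intercepts, with a single LSTM layer of h hidden units followed by a scalar output, which has 4(h(k + h) + h) + (h + 1) weights under the common one-bias-per-gate convention. The lag order p = 4 and hidden size h = 32 are arbitrary assumptions chosen for illustration and are not taken from Singhania and Kundu (2020).

```python
def var_param_count(k, p):
    """Coefficients in an unrestricted VAR(p) with k variables.

    Each of the k equations has k * p lag coefficients plus one intercept.
    The error covariance matrix is not counted.
    """
    return k * (k * p + 1)


def lstm_param_count(k, h):
    """Weights in one LSTM layer (k inputs, h hidden units) plus a scalar output.

    Each of the four gates has an h-by-k input matrix, an h-by-h recurrent
    matrix, and a bias vector of length h (one-bias-per-gate convention).
    The output layer adds h weights and one bias.
    """
    return 4 * (h * (k + h) + h) + (h + 1)


LAGS = 4      # assumed lag order, illustration only
HIDDEN = 32   # assumed hidden size, illustration only

counts = {k: (var_param_count(k, LAGS), lstm_param_count(k, HIDDEN))
          for k in (2, 25, 100)}
for k, (n_var, n_lstm) in counts.items():
    print(f"k={k:>3}: VAR={n_var:>6}  LSTM={n_lstm:>6}")
```

The VAR count grows quadratically in the number of series, whereas the LSTM count grows linearly for a fixed hidden size. For a handful of series the VAR is the smaller model, so the advantage appears only once the keyword set becomes large.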