Topic Review
Extreme values statistic
One of the pillars of the experimental sciences is sampling. Estimates for a population are made from the analysis of samples drawn from it. Distributions fall into two main groups, continuous and discrete, and the present study applies to the continuous ones. One challenge of sampling is its accuracy, in other words how representative the sample is of the population from which it was drawn. A related challenge is the presence of outliers: observations wrongly collected that do not actually belong to the population under study. The present study proposes a statistic (and a test) intended for use with any continuous distribution to detect outliers, by constructing the confidence interval for the extreme value in the sample at a preselected risk of being in error, depending on the sample size. The proposed statistic is operational for known distributions (those with a known probability density function) and also depends on the statistical parameters of the population.
  • 1.1K
  • 29 Oct 2020
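As a rough illustration of the idea in the entry above, the sketch below builds an interval for the sample maximum from the textbook fact that, for n independent draws from a continuous distribution with CDF F, the maximum has CDF F(x)^n. This is not necessarily the statistic proposed in the study; the function name `max_value_interval` and the choice of a normal population are assumptions made only for the example.

```python
from statistics import NormalDist


def max_value_interval(dist: NormalDist, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) interval for the maximum of n i.i.d. draws.

    Uses P(max <= x) = F(x) ** n, so the p-quantile of the maximum is
    F^{-1}(p ** (1 / n)). Hypothetical helper, not the study's statistic.
    """
    if n < 1:
        raise ValueError("sample size must be at least 1")
    lower = dist.inv_cdf((alpha / 2) ** (1 / n))
    upper = dist.inv_cdf((1 - alpha / 2) ** (1 / n))
    return lower, upper


# Assumed population: standard normal, known parameters.
population = NormalDist(mu=0.0, sigma=1.0)
low, high = max_value_interval(population, n=10, alpha=0.05)

# An observed maximum above `high` would be flagged as a possible outlier
# at the chosen risk level.
print(f"95% interval for the maximum of 10 draws: [{low:.3f}, {high:.3f}]")
```

The interval widens and shifts upward as the sample size grows, which is why the sample size enters the test alongside the risk level and the population parameters.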
Topic Review
Bayesian Analysis in Social Sciences
Given the reproducibility crisis (or replication crisis), more psychologists and social-cultural scientists are getting involved with Bayesian inference. Therefore, the current article provides a brief overview of programs (or software) and steps to conduct Bayesian data analysis in social sciences. 
  • 1.1K
  • 23 Jul 2021
Topic Review
Application of Biological Domain Knowledge
Integrative approaches that use biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes that considers both the statistical metrics applied to the gene expression data and the biological background information provided as external datasets.
  • 1.0K
  • 19 Feb 2021
Topic Review
Diffusion of Solar PV Energy
Solar photovoltaic energy (solar PV) is considered a very attractive solution among renewable energy sources (RES), especially for households. According to the most recent IEA report on renewables, the growth of renewable power capacity at the world level reached another record in 2021, driven by solar photovoltaic energy; solar PV alone accounted for more than half of all renewable power expansion in 2021, followed by wind and hydropower.
  • 977
  • 28 Apr 2022
Topic Review
Statistical Methods for Food Composition Database Analysis
A food composition database (FCDB) or nutrient database is a compilation of the chemical composition of food and beverage items, obtained from chemical analyses, estimations from published literature, or unpublished laboratory reports. A summary of the statistical methods that have been directly applied to food composition databases and datasets is described here.
  • 956
  • 08 Jun 2022
Topic Review
Sports Analytics
Sports analytics are a collection of relevant historical statistics that can provide a competitive advantage to a team or individual. Through the collection and analysis of these data, sports analytics inform players, coaches and other staff in order to facilitate decision making both during and prior to sporting events. The term "sports analytics" was popularized in mainstream sports culture following the release of the 2011 film Moneyball, in which Oakland Athletics General Manager Billy Beane (played by Brad Pitt) relies heavily on analytics to build a competitive team on a minimal budget. There are two key aspects of sports analytics: on-field and off-field analytics. On-field analytics deals with improving the on-field performance of teams and players, including questions such as "which player on the Red Sox contributed most to the team's offense?" or "who is the best wing player in the NBA?". Off-field analytics deals with the business side of sports. It focuses on helping a sport organization or body surface patterns and insights through data that would help increase ticket and merchandise sales, improve fan engagement, and so on. Off-field analytics essentially uses data to help rightsholders make decisions that lead to higher growth and increased profitability. As technology has advanced in recent years, data collection has become more in-depth and can be conducted with relative ease. Advancements in data collection have allowed sports analytics to grow as well, leading to the development of advanced statistics and machine learning, as well as sport-specific technologies that let teams run game simulations prior to play, improve fan acquisition and marketing strategies, and even understand the impact of sponsorship on each team and its fans. Another significant impact of sports analytics on professional sports relates to sports gambling.
In-depth sports analytics have taken sports gambling to new levels: whether through fantasy sports leagues or nightly wagers, bettors now have more information at their disposal to aid decision making. A number of companies and webpages have been developed to provide fans with up-to-the-minute information for their betting needs.
  • 893
  • 14 Nov 2022
Topic Review
Bayesian Nonlinear Mixed Effects Models
Nonlinear mixed effects models have become a standard platform for analysis when data take the form of continuous, repeated measurements of subjects from a population of interest and the temporal profiles of subjects commonly follow a nonlinear tendency. While frequentist analysis of nonlinear mixed effects models has a long history, Bayesian analysis of the models received comparatively little attention until the late 1980s, primarily due to the time-consuming nature of Bayesian computation. Since the early 1990s, Bayesian approaches for the models have emerged to leverage rapid developments in computing power, and have recently received significant attention due to (1) their superiority in quantifying the uncertainty of parameter estimation; (2) their utility in incorporating prior knowledge into the models; and (3) their flexibility in matching the increasing complexity of scientific research arising from diverse industrial and academic fields.
  • 888
  • 23 Mar 2022
Topic Review
Partial Area Under the ROC Curve (PAUC)
The Partial Area Under the ROC Curve (pAUC) is a metric for the performance of a binary classifier. It is computed from the receiver operating characteristic (ROC) curve, which illustrates the diagnostic ability of a given binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The area under the ROC curve (AUC) is often used to summarize the diagnostic ability of the classifier in a single number. The AUC is simply defined as the area of the ROC space that lies below the ROC curve. However, in the ROC space there are regions where the values of FPR or TPR are unacceptable or not viable in practice. For instance, in the region where FPR is greater than 0.8, more than 80% of negative subjects are incorrectly classified as positives, which is unacceptable in many real cases. As a consequence, the AUC computed over the entire ROC space (i.e., with both FPR and TPR ranging from 0 to 1) can provide misleading indications. To overcome this limitation of the AUC, it was proposed to compute the area under the ROC curve only in the part of the ROC space that corresponds to interesting (i.e., practically viable or acceptable) values of FPR and TPR.
  • 783
  • 21 Oct 2022
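The pAUC described in the entry above can be sketched in a few lines. The example below is a minimal illustration, assuming the region of interest is bounded only by a maximum FPR (restrictions on TPR are not handled); the function names `roc_points` and `partial_auc` and the toy labels and scores are made up for the example.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR) from (0, 0) to (1, 1), one per distinct score."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every item tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points: Sequence[Tuple[float, float]], fpr_max: float) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, fpr_max]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Cut the last segment at fpr_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 1 = positive, 0 = negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print("AUC :", partial_auc(points, 1.0))
print("pAUC:", partial_auc(points, 0.5))
```

Setting `fpr_max` to 1.0 recovers the ordinary AUC, so the same function covers both cases. The unnormalized pAUC is bounded by `fpr_max`, which is worth keeping in mind when comparing values across different cut-offs.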
Topic Review
Homoscedasticity
In statistics, a sequence (or a vector) of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used. Assuming a variable is homoscedastic when in reality it is heteroscedastic (/ˌhɛtəroʊskəˈdæstɪk/) results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.
  • 782
  • 31 Oct 2022
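To make the definition in the entry above concrete, the sketch below compares the sample variances of two groups of observations. It is only a crude descriptive check, not a formal test of homoscedasticity; the helper name `variance_ratio` and the data are invented for the example.

```python
from statistics import variance


def variance_ratio(group_a, group_b):
    """Ratio of the larger sample variance to the smaller one (always >= 1).

    A ratio near 1 is consistent with equal variances; a large ratio
    hints at heteroscedasticity. Assumes neither group has zero variance.
    """
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical data: the second group is the first one scaled by 2,
# so its variance is 4 times larger.
low_spread = [1.0, 2.0, 3.0, 4.0, 5.0]
high_spread = [2.0, 4.0, 6.0, 8.0, 10.0]

ratio = variance_ratio(low_spread, high_spread)
print("variance ratio:", ratio)
```

In practice the ratio would be judged against a reference distribution rather than read off directly, since sample variances fluctuate even when the underlying variances are equal.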
Topic Review
Aesthetical Evaluation with Stochastic Analysis
Stochastic calculus is used for the objective evaluation of the variability present in aesthetic attributes of paintings and landscapes.
  • 729
  • 24 Nov 2020