First, the data of all power stations need to be transmitted to the company’s PV intelligent operation and maintenance cloud platform through 5G, ZigBee, and other wireless communication means. Then, the similarity among the power stations in terms of geography, equipment, and climate is analyzed with a similarity analysis method to obtain the set of power stations that meet the prerequisites for virtual collection. Further, the best RPSs in the whole PV system are selected through clustering or intelligent optimization algorithms for deploying the sensing equipment, so as to accurately estimate the operation data of the whole PV system.
3. Process and Challenges of DPVS Virtual Collection
The virtual collection of DPVS data is a new field that few scholars have studied. Therefore, to facilitate understanding, this section compares the steps of virtual collection with those of other methods and analyzes their similarities and differences. Focusing on the similarities, we give directions for DPVS virtual collection research; focusing on the differences, we summarize the challenges faced by DPVS virtual collection. This approach to elaboration is novel for the review literature and can help the reader clearly understand the connections and differences between virtual collection and other studies.
3.1. Similarity Analysis of Regional DPVS
PV data are closely correlated with external conditions and with factors such as the geographic location of the installation site. Environmental factors have a significant impact on the accuracy of the virtual collection model. Therefore, one of the prerequisites for realizing virtual collection is that the station to be collected and the RPS have similar external factors. From the data point of view, we want the data sets to follow similar distribution patterns as much as possible, thus providing higher-quality input data for supervised learning. This similarity makes the virtual collection more robust and helps keep the virtually collected data accurate even when the weather changes.
To illustrate, Badong County in southwestern Hubei Province, China, and Jiangning District in Nanjing City, Jiangsu Province, produce widely different power data because of their different terrain, meteorology, and other conditions. Figure 4 shows the power output of PV stations in Badong County and Jiangning District on a typical summer day. For the DPVS of Jiangning District, using the DPVS operation data of Badong County for data inference would seriously reduce the accuracy of the virtual collection because of the extremely low similarity between them. Therefore, it is necessary to define, in advance and by similarity analysis, clusters of PV stations that satisfy the similarity requirement.
Figure 4. Badong county and Jiangning district DPVS output on a typical day.
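One simple way to quantify the similarity of output trends between two stations is the Pearson correlation of their daily power curves. The following Python sketch is illustrative only: the function names, the 0.9 threshold, and the sample curves are assumptions made here, not values taken from the studies reviewed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length power curves."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("curves must have the same length (at least 2 samples)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat curve carries no shape information
    return cov / sqrt(vx * vy)


def similar_pairs(curves, threshold=0.9):
    """Return station pairs whose curves correlate at or above the threshold."""
    names = sorted(curves)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if pearson(curves[a], curves[b]) >= threshold
    ]


# Hypothetical daily curves (kW): A and B share a shape, C has a midday dip.
curves = {
    "A": [0, 1, 3, 5, 3, 1, 0],
    "B": [0, 2, 6, 10, 6, 2, 0],
    "C": [0, 2, 4, 1, 4, 2, 0],
}
pairs = similar_pairs(curves)
print(pairs)  # [('A', 'B')]
```

Correlation captures only the shape of the curve, not its magnitude, so a practical similarity analysis would likely combine it with the geographic, equipment, and climate factors discussed in this section.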
Many factors affecting the PV output state are coupled with each other [6]. The main factors influencing the solar energy conversion process are shown in Figure 5. The solar irradiance received by the PV module is significantly influenced by the geographical location and meteorological conditions. Climate is the general, long-timescale pattern of atmospheric state and weather processes in a given area, and it is an important factor affecting the level of light resources; meteorology refers to short-timescale physical phenomena of the atmosphere, such as temperature and clouds. Secondly, the conversion of solar irradiance to power is closely related to the selection of equipment, the design of the station, and electrical efficiency. After this series of energy conversion processes, the final PV power output is obtained. Therefore, similarity analysis can be performed from two perspectives: influencing factors (causes) and power output trends (results). However, from the perspective of influencing factors, similarity is difficult to analyze because the factors affecting PV output are numerous, differ significantly in dimension, and have complex types of characteristics. From the perspective of the PV output trend, the trend changes are complicated and the time scale is long, which makes the trend characteristics challenging to analyze.
Figure 5. Factors affecting DPVS power output.
The importance of data similarity is also reflected in many areas of research. In Ref. [7], an anomaly identification and reconstruction model based on curve similarity analysis with a BP neural network is proposed for detecting anomalies and compensating for missing PV historical data. Similar to virtual collection, the method also requires the power of neighboring PV stations. Considering the periodicity of PV power, Ref. [8] proposes a data cleaning method based on approximate periodic time series, effectively improving the quality of PV data. Considering the uncertainty of PV power generation due to variations in weather conditions, Ref. [9] proposes a prediction framework combining similar day selection techniques. In this framework, the authors first screen for external variables that can accurately capture the similarity between different days and then, based on these variables, select historical days that are highly similar to the day to be predicted, thus improving the prediction accuracy. Although the research methods of the above studies differ, they all aim to obtain higher-quality data.
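The idea of similar day selection can be sketched as a nearest-neighbour search over external variables. The Python example below is a minimal illustration, not the framework of Ref. [9]: it assumes the external variables have already been screened, and the variable choice (temperature and cloud cover), the min-max scaling, the Euclidean distance, and all names are assumptions introduced here.

```python
from math import sqrt


def select_similar_days(history, target, k=2):
    """Return the k historical days whose external variables are closest to target.

    history maps a day label to a tuple of external variables;
    target is the tuple for the day to be predicted.
    """
    n = len(target)
    # Min-max scale each variable so that no single unit dominates the distance.
    columns = [[v[i] for v in history.values()] + [target[i]] for i in range(n)]
    low = [min(c) for c in columns]
    span = [(max(c) - min(c)) or 1.0 for c in columns]

    def scale(v):
        return [(v[i] - low[i]) / span[i] for i in range(n)]

    scaled_target = scale(target)

    def distance(v):
        return sqrt(sum((a - b) ** 2 for a, b in zip(scale(v), scaled_target)))

    return sorted(history, key=lambda day: distance(history[day]))[:k]


# Hypothetical data: (daily mean temperature in °C, cloud cover fraction).
history = {
    "d1": (30.0, 0.10),
    "d2": (22.0, 0.80),
    "d3": (29.0, 0.20),
    "d4": (18.0, 0.90),
}
similar = select_similar_days(history, target=(31.0, 0.15), k=2)
print(similar)  # ['d1', 'd3']
```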
3.2. RPS Selection for Virtual Collection
Selecting the RPSs is the most crucial step in the virtual collection process. The RPSs’ real-time power data are input into the computational intelligence algorithm as multidimensional features to estimate the output of all regional DPVS. We aim to select the subset of DPVS in the region from which the data of the other stations can be estimated most accurately. The key to this step is to identify the sensor locations where the most important data are collected, so that the status of all DPVS can be monitored. Therefore, we analyze the differences and associations between the RPS selection step and other methods from the perspectives of input data and equipment placement.
As shown in Figure 6, from the perspective of input data, the selection of RPSs can be approximated as the feature selection problem of machine learning. Both aim to improve the accuracy of the results as much as possible by selecting RPSs (features). Therefore, although there are few studies on the selection of reference power stations, the relatively mature theory of feature selection can provide inspiration. In data mining, the quality of the input features strongly affects the model’s performance, so many scholars have studied the feature selection problem. Ref. [10] systematically examines the existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyzes the differences and connections among the various sparse learning models. Ref. [11] proposes a new incremental feature selection method that is robust to dynamically ordered data. Ref. [12] proposes a grasshopper optimization algorithm that solves the binary optimization problem of selecting, from a large set of original features, a subset that better characterizes the data attributes, thus improving the classification accuracy. The above studies propose effective treatments of the feature selection problem, which can provide a theoretical reference for selecting RPSs, such as transforming RPS selection into a combinatorial optimization problem. However, it is worth noting that if a power station is selected as an RPS, it is used as an input feature, and the remaining power stations become the stations to be collected. Thus, the RPS selection problem for virtual collection is similar to the high-dimensional feature selection problem [13], yet different from feature selection in traditional regression and classification [14] problems. Therefore, because the choice of RPSs also determines which stations must be estimated, choosing reasonable RPSs is more challenging than ordinary feature selection.
Figure 6. RPS selection and feature selection process.
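To make the combinatorial view of RPS selection concrete, the following Python sketch applies greedy forward selection, a common feature selection heuristic. It is only an illustration under strong assumptions: each non-selected station is estimated from a single RPS by a least-squares line through the origin (a deliberately simple stand-in for a computational intelligence model), and the function names and sample curves are hypothetical.

```python
def fit_error(ref, tgt):
    """Sum of squared errors of the least-squares fit tgt ≈ a * ref."""
    den = sum(r * r for r in ref)
    a = sum(r * t for r, t in zip(ref, tgt)) / den if den else 0.0
    return sum((t - a * r) ** 2 for r, t in zip(ref, tgt))


def set_error(selected, data):
    """Total error when each non-selected station is estimated from its best RPS."""
    return sum(
        min(fit_error(data[r], data[s]) for r in selected)
        for s in data
        if s not in selected
    )


def greedy_rps(data, k):
    """Greedy forward selection: repeatedly add the station that lowers the error most."""
    selected = []
    while len(selected) < k:
        candidates = [s for s in sorted(data) if s not in selected]
        best = min(candidates, key=lambda s: set_error(selected + [s], data))
        selected.append(best)
    return selected


# Hypothetical power curves: A, B, C share one shape; D and E share another.
data = {
    "A": [0, 2, 5, 2, 0],
    "B": [0, 4, 10, 4, 0],
    "C": [0, 1, 2.5, 1, 0],
    "D": [0, 4, 1, 4, 0],
    "E": [0, 2, 0.5, 2, 0],
}
rps = greedy_rps(data, k=2)
print(rps, set_error(rps, data))
```

With these curves the greedy search picks one station from each shape group, so every remaining station can be reconstructed from one of the two RPSs. Note that, unlike ordinary feature selection, the set of targets changes with every candidate subset, which is the difference the text highlights.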
From a sensor placement perspective, the selection of RPSs can also be inspired by the data aggregation point (DAP) selection problem in smart meters. As shown in Figure 7, both DAP selection and RPS selection can be regarded as the optimal configuration of transmission nodes in the system. DAPs are selected to reduce data redundancy and bandwidth requirements: data are aggregated locally at the sensor or at intermediate nodes to form high-quality information and to reduce the number of packets sent to the base station, thus saving energy and bandwidth. Ref. [15] treats DAP placement as a mixed integer programming problem and proposes a new heuristic algorithm that minimizes installation, transmission, and delay costs to select the optimal DAP placement location. Ref. [16] proposes an improved k-means clustering algorithm to assign DAPs, significantly reducing the number of DAPs installed.
Figure 7. RPS selection and DAP selection process.
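The clustering view of placement can be sketched with plain k-means (Lloyd's algorithm) over station coordinates, followed by choosing the station nearest to each centroid as the site for a device. This is not the improved k-means of Ref. [16]; the deterministic initialization, the function names, and the coordinates are assumptions made for illustration.

```python
from math import dist  # available in Python 3.8+


def kmeans(points, k, iters=50):
    """Plain k-means on coordinate tuples; returns (centroids, labels)."""
    centroids = list(points[:k])  # deterministic initialization for the sketch
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def nearest_sites(points, centroids):
    """Index of the real station closest to each centroid."""
    return [min(range(len(points)), key=lambda i: dist(points[i], c)) for c in centroids]


# Hypothetical station coordinates (km) forming two spatial groups.
stations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(stations, k=2)
sites = nearest_sites(stations, centroids)
print(labels, sites)  # [0, 0, 0, 1, 1, 1] [0, 3]
```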
Although DAP selection and RPS selection have certain commonalities, many challenges remain to be studied. DAPs are chosen to find, among all smart meter (SM) layout points, those with the lowest transmission and delay costs, so that the data of the whole system can be aggregated and transmitted. RPSs, in contrast, are chosen as a subset of the PV stations in the regional PV system so that the data of the whole system can be estimated. Therefore, the elements considered in the selection of RPSs are more diverse. In addition to communication and equipment costs, the accuracy with which different RPS sets estimate the data of the whole system must be considered, as well as the spatiotemporal coupling characteristics, which current studies lack.
3.3. Data Inference for Regional DPVS
The final step of the virtual collection technique is to infer the operational data of the whole DPVS through an artificial intelligence algorithm. This step maps the relationship between the RPSs and the whole system by building a computational intelligence model between the RPSs and the power stations to be collected in the region, using the data from the RPSs selected in the second step as the input. This step is similar to the method used in PV prediction techniques: both require a certain amount of historical data as a driver to obtain the unknown PV output power.
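The mapping step can be sketched as fitting a model on paired historical data and then applying it to a real-time RPS reading. The Python example below uses ordinary least squares with a single RPS as a deliberately simple stand-in for the computational intelligence model; the function names and the sample values are assumptions made for illustration.

```python
def fit_linear(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("RPS history must contain at least two distinct values")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def infer(model, rps_reading):
    """Estimate the target station's current output from a real-time RPS reading."""
    slope, intercept = model
    return max(0.0, slope * rps_reading + intercept)  # PV output cannot be negative


# Hypothetical paired history (kW): RPS output and the station to be collected.
rps_history = [0.0, 10.0, 20.0, 30.0]
target_history = [0.0, 8.0, 16.0, 24.0]

model = fit_linear(rps_history, target_history)
estimate = infer(model, rps_reading=25.0)
print(round(estimate, 3))  # 20.0
```

The model is trained offline on history but queried with the RPS's current measurement, so the result is an estimate of the present output rather than a forecast.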
There is relatively little research in the industry on DPVS virtual data inference; most studies focus only on PV power prediction, using historical data, real-time weather, and other environmental information to predict PV power output. Fortunately, current DPVS power prediction algorithms are relatively mature and can provide some theoretical references for virtual DPVS data collection. However, data inference in virtual collection differs from traditional PV prediction in how the model is constructed and used. Virtual data collection estimates the current PV power output in real time through a data inference model, whereas a PV predictor estimates the future power output. The input to the virtual collection model is real-time PV data from the RPSs, and the input to the PV predictor is historical operational data and environmental information. Because of this real-time nature, the data inference model for virtual collection must be more robust and more accurate than a PV prediction model.
4. Virtual Collection Methods for DPVS
The previous section introduced the specific implementation steps of virtual collection and their purposes, and pointed out the urgent need for solutions to the challenges these steps face. Therefore, this section provides theoretical support for the development of virtual collection technology by summarizing the methods applicable to DPVS similarity analysis, RPS selection, and DPVS data inference. Figure 8 summarizes the various methods for DPVS virtual collection.
![](/media/common/202212/mceclip0-63906c551aa7e.png)
Figure 8. Summary of virtual collection methods.
5. Application Scenarios of Virtual Collection Technology
As the scale of DPVS expands, its application scenarios are becoming increasingly complex and variable. The acquisition of operation and maintenance information often suffers from incomplete data collection, transmission congestion, and high collection and transmission costs. Therefore, to draw more scholars' attention to the practical value of virtual collection, this paper summarizes several application scenarios of virtual collection based on multi-source information, including but not limited to the following:
- Anomaly detection in DPVS operation data.
- DPVS fault diagnosis.
- Recovery of missing DPVS data.
- Real-time collection of DPVS operation data.
Figure 9 summarizes the four application scenarios and the significance of DPVS virtual collection technology.
![](/media/common/202212/mceclip1-63906d5664133.png)
Figure 9. Application scenarios of DPVS virtual collection technology.