Introduction
Digitalization has become a pivotal factor in the optimization and advancement of photovoltaic (PV) systems. An IIoT/data platform in the PV industry can significantly enhance the efficiency, reliability, and performance of solar power systems.
If you have questions or want to contribute, please join the discussion on the forum.
Use-cases
- Real-Time Monitoring: sensors and devices can be installed on solar panels and other components of a PV system to continuously collect data.
- Performance Optimization: data can help optimize the performance of PV systems (identifying inefficiencies or underperforming panels, reducing downtime, enabling proactive maintenance schedules…)
- Energy Management: IIoT-enabled PV systems can be integrated with smart grids and energy management systems (optimizing energy storage and distribution, balancing supply and demand…)
- Fault Detection and Diagnostics: AI/ML algorithms can detect anomalies and faults in the PV system, such as panel shading or soiling, or electrical faults in panels or inverters… (see the sketch after this list)
- Grid Integration: facilitate the integration of PV systems with the grid by enhancing grid stability with real-time data on power generation, supporting grid services like frequency regulation and voltage control…
- Customer Engagement and Reporting: for residential and commercial solar power users, IIoT data can provide detailed insights into energy generation and consumption, user-friendly dashboards and mobile applications for monitoring system performance, alerts and notifications for maintenance needs or performance issues, reports on energy savings and environmental benefits…
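To make the fault-detection use-case more concrete, here is a minimal sketch that flags timestamps where measured power falls well below what the measured irradiance would suggest. The column names, the threshold and the linear power model are simplified assumptions for illustration, not a production algorithm.

```python
# Minimal sketch of the fault-detection idea above: flag timestamps where the
# measured power is far below what the irradiance would suggest. Threshold,
# column names and the linear power model are simplified assumptions.
import pandas as pd

def flag_underperformance(df: pd.DataFrame, rated_kw: float, threshold: float = 0.7) -> pd.DataFrame:
    """Return rows where measured power is below `threshold` times the expected power.

    `df` is assumed to have 'irradiance_w_m2' and 'ac_power_kw' columns.
    Expected power is approximated linearly from plane-of-array irradiance.
    """
    expected_kw = rated_kw * df["irradiance_w_m2"] / 1000.0
    # Only look at daytime points (expected power above 10% of rated power)
    # to avoid flagging nights and dawn/dusk transitions.
    mask = (expected_kw > 0.1 * rated_kw) & (df["ac_power_kw"] < threshold * expected_kw)
    return df[mask]
```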
Data platform
In deliverable 1.4, we provided guidance on how to implement a data platform for renewable energy actors that need to perform analytics at scale on their data and share it with different stakeholders.
In deliverable 7.2 (to be published end of 2024), we complemented those recommendations by collecting feedback from SERENDI PV partners based on their own experience with different aspects of data collection, exchange and storage, as well as feedback from stakeholders external to the project, collected through a survey.
Cloud vs on-premises to store and process data
Cloud computing is a must-have in any modern data processing and storage platform. It allows fast and easy scalability and access to modern IT stacks at a reasonable cost.
| Aspect | Cloud | On-premises |
| --- | --- | --- |
| Scalability | Easily scalable (purchase of more storage/computing) | Requires procurement and configuration of additional hardware |
| Redundancy | Easily configured with geographical separation; most cloud providers have data centers in many countries | Requires procurement of additional hardware and premises; a second location may be needed to store data in case of a hazard at the first location |
| Supervision | Tools are available off the shelf | Must be set up; can be cumbersome with many tools from many vendors |
| Data security | Certified security with major solutions, but reliant on a third party | Dependent on own measures and policies; enables tighter control |
| Data compliance (e.g. GDPR) | Configurable, but reliant on a third party | Dependent on own measures and policies, but easily demonstrable |
| Connectivity | Requires an internet connection | Requires connection to the local network or the internet; operations can continue while disconnected from the internet |
Datalake/SQL database
Datalake
Raw IIoT data are usually stored locally before being sent (partly or fully, depending on needs/volume) to cheap cloud storage (such as S3 in AWS, Blob Storage/Data Lake in Azure…). These data tend to be large, with many variables sampled at short time intervals. For easy and fast access to this plant data, time-series databases such as TimescaleDB or InfluxDB can be an option, as can cloud services such as Azure Time Series Insights (https://azure.microsoft.com/en-gb/services/time-series-insights/#product-overview) or Amazon Timestream (https://aws.amazon.com/timestream).
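As an illustration of the datalake approach, the following sketch pushes a batch of plant readings to cloud object storage as date-partitioned Parquet files. The bucket path and column names are hypothetical, and the snippet assumes pandas with pyarrow and s3fs installed.

```python
# Minimal sketch: pushing a batch of local IIoT readings to cheap cloud
# storage as date-partitioned Parquet files (bucket and column names are
# hypothetical). Requires pandas, pyarrow and s3fs.
import pandas as pd

readings = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-06-01", periods=288, freq="5min"),
        "plant_id": "plant_001",
        "ac_power_kw": 0.0,          # placeholder values
        "irradiance_w_m2": 0.0,
    }
)
readings["date"] = readings["timestamp"].dt.date

# One Parquet dataset per plant, partitioned by day, so later queries can
# read only the days they need.
readings.to_parquet(
    "s3://my-pv-datalake/raw/plant_001/",
    partition_cols=["date"],
    index=False,
)
```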
SQL database
Cloud/Managed SQL Databases: having a SQL database to store the results of data matching and analysis on the raw data is very practical. These results are usually already focused on useful data, so their volume is much smaller than the raw data. Using a database allows faster and simpler handling and can back your own, more controlled acquisition systems, such as an API service. Databases can also be used for all “static” or “almost-static” data.
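A minimal sketch of storing aggregated results in a managed SQL database, assuming SQLAlchemy and a PostgreSQL instance; the connection string, table and column names are hypothetical.

```python
# Minimal sketch: storing aggregated analysis results in a managed SQL
# database with SQLAlchemy (connection string, table and column names are
# hypothetical).
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://user:password@db-host:5432/pv_analytics")

daily_kpis = pd.DataFrame(
    {
        "plant_id": ["plant_001"],
        "day": [pd.Timestamp("2024-06-01").date()],
        "energy_kwh": [1234.5],
        "performance_ratio": [0.82],
    }
)

# Append the day's KPIs; the table is created on first use.
daily_kpis.to_sql("daily_kpis", engine, if_exists="append", index=False)
```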
Data exchange
Data transfer solutions and protocols are needed to enable simple, fast and reliable exchange of data. The type of transfer depends on several factors, including the quantity of data to be transferred, its format, the required security, and the required speed.
HTTP REST API for small data
HTTP REST APIs are a very common standard for data exchange, with many tools and libraries supporting them. Most cloud providers offer API endpoints that can scale with load and can be called efficiently in parallel. Recent API endpoints use standardized JSON data structures in request and response payloads.
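As a sketch, the snippet below fetches measurements for several plants in parallel from a JSON REST endpoint; the URL, query parameters and token are hypothetical.

```python
# Minimal sketch: fetching measurements for several plants in parallel from a
# JSON REST API (endpoint URL, query parameters and token are hypothetical).
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "https://api.example.com/v1/plants/{plant_id}/measurements"
HEADERS = {"Authorization": "Bearer <token>"}

def fetch_measurements(plant_id: str) -> dict:
    response = requests.get(
        API_URL.format(plant_id=plant_id),
        headers=HEADERS,
        params={"start": "2024-06-01", "end": "2024-06-02"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

plant_ids = ["plant_001", "plant_002", "plant_003"]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch_measurements, plant_ids))
```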
SFTP or Cloud storage for large data
If the data are stored on-premises, Secure File Transfer Protocol (SFTP) from the server is the recommended option.
Cloud services offer temporary data sharing using a “temporary key” (e.g. a pre-signed URL) that expires after a given date. This allows data to be shared for only a few hours or days and with a limited set of people, so that the data are not exposed later if the “key” is stolen.
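A minimal sketch of such a temporary key using an AWS S3 pre-signed URL (Azure offers the equivalent with SAS tokens); bucket and object names are hypothetical.

```python
# Minimal sketch: generating a time-limited "temporary key" for a file in
# AWS S3 using a pre-signed URL (bucket and object names are hypothetical).
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-pv-datalake", "Key": "exports/plant_001_june.parquet"},
    ExpiresIn=48 * 3600,  # the link stops working after 48 hours
)
print(url)  # share this URL instead of the file or permanent credentials
```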
File format for exchange
There are two categories of file formats: human-readable and non-human-readable. Performance-wise, it is usually better to use a non-human-readable format, as these are more compact and are sometimes very well optimized for specific use cases (columnar storage, hierarchical data, sparse matrices…).
Human-readable
The most common human-readable file formats are CSV, JSON and XML. Since around 2010, most people have moved from XML to JSON, as it is easier to use and has fewer constraints.
Non-human-readable
The main advantage of non-human-readable formats is that data types are preserved in the file, so they are handled properly as long as the code also uses the proper types. Many type-conversion mistakes are thereby avoided.
There are many non-human-readable file formats targeted at different usages. The two most useful in the energy field are:
- Apache Parquet: the Parquet columnar storage format has recently gained a lot of traction and is widely used in the Big Data ecosystem. It is fast to read and also allows parallel processing of the data. More details here: https://en.wikipedia.org/wiki/Apache_Parquet. There is an effort to move forward with the Apache Arrow project: https://en.wikipedia.org/wiki/Apache_Arrow
- HDF / NetCDF: these are older file formats, initially designed for supercomputing and complex types such as arrays. They have very good performance. https://en.wikipedia.org/wiki/Hierarchical_Data_Format
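The short sketch below writes the same small table as CSV and as Parquet to illustrate the type-preservation point made above; column names are hypothetical and pandas with pyarrow is assumed.

```python
# Minimal sketch: exchanging the same table as CSV and Parquet to illustrate
# why typed, columnar formats are preferred (column names are hypothetical).
import pandas as pd

measurements = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-06-01", periods=4, freq="15min"),
        "plant_id": ["plant_001"] * 4,
        "ac_power_kw": [0.0, 12.3, 25.1, 31.8],
    }
)

# CSV: fine for small files, but all type information is lost and must be
# re-parsed on read.
measurements.to_csv("measurements.csv", index=False)
from_csv = pd.read_csv("measurements.csv")  # "timestamp" comes back as a string

# Parquet: types are stored in the file and restored as-is, and columns can
# be read selectively.
measurements.to_parquet("measurements.parquet", index=False)
from_parquet = pd.read_parquet("measurements.parquet", columns=["timestamp", "ac_power_kw"])
```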
Data granularity
A data periodicity of 5 to 15 minutes is usually enough for most photovoltaic use-cases:
- Actual Energy data (PV production, site consumption) from customers, potentially linked with forecasted/actual irradiation/temperature
- PV model to forecast PV energy for the next hours/days and/or PV simulation data to evaluate a PV project
- Performance analysis: from 15 minutes to longer time ranges (daily, weekly, monthly, quarterly or yearly).
Sometimes, a resolution of one second or finer is needed to take decisions for use-cases related to system services or energy demand.
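As an illustration, the sketch below downsamples 5-minute power readings to the 15-minute and daily granularities typically used for performance analysis; column names are hypothetical and pandas is assumed.

```python
# Minimal sketch: downsampling 5-minute power readings to 15-minute and daily
# granularities (column names are hypothetical). Assumes a pandas DataFrame
# indexed by timestamp.
import pandas as pd

power = pd.DataFrame(
    {"ac_power_kw": 20.0},
    index=pd.date_range("2024-06-01", periods=288, freq="5min", name="timestamp"),
)

# Mean power over each 15-minute window.
power_15min = power.resample("15min").mean()

# Daily energy in kWh: average power (kW) times 24 hours.
energy_daily = power.resample("1D").mean() * 24
energy_daily = energy_daily.rename(columns={"ac_power_kw": "energy_kwh"})
```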