Data-Driven Model for Synthetic Data of Engineering Systems

While designing an intelligent engineering system with mechatronic components and software, engineers have many options. They can use

The first two tools are very similar. They use engineering relationships established through physics and solve those relationships through approximate or closed-form techniques. The third method is purely based on data. The first two methods can produce excellent results when the approximations in the physics modelling are properly accounted for with statistical measures. Still, they become increasingly complicated for complex systems and systems of systems, and they require deep domain knowledge. Even when all of that is available, the behaviours of the systems, as observed through measurable variables, tend to drift from the predicted behaviours because of the various approximations. These issues pose challenges for AI specialists and cyber-physical system engineers trying to develop a smart system. Synthetic data, if available, is an ideal solution for their work on complex engineering systems.

Unfortunately, there is a real scarcity of such data in today’s industry. In my conversations with many senior engineering managers of global OEMs, I found that there is not enough data on dynamic systems even within their manufacturing organizations. The situation becomes even bleaker for data on system behaviors at the extreme values of the system parameters, which software engineers popularly refer to as “corner cases” and “impending failures”.

While designing an engineering system with hierarchical control, be it an IoT, DCS, or SCADA system, software engineers need synthetic data for simulations and for building the right AI systems. This is where the corner cases become important. We have seen how the autonomous car and drone industries leverage synthetic data. A similar effort is necessary for the rest of the engineering systems if autonomous and semi-autonomous behaviors are to be built to realize the dream of Industry 4.0.

Most engineering systems emit time series through their observable and measurable I/O variables. Though a lot of research effort is spent on images, languages, and communication networks, there is a relative dearth of research on generating synthetic time series data for engineering machines with multi-dimensional I/Os, which covers essentially all the machines and systems used in the engineering space. The diagram below shows two states of a very simple machine to support the discussion that follows.

Where:
X(t) and Y(t) are the input and output variables, which are time series that change with time. They could be, for example, the velocity of and the force on the system.

(P1, P2) and (P1’, P2’) are the system parameters in the two states (in simple terms, mass, stiffness, damping coefficients, etc.), which may change with time.
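To make the notation concrete, the sketch below simulates a hypothetical version of the simple machine as a mass-spring-damper. It assumes X(t) is an applied force, Y(t) is the resulting velocity, P1 is stiffness and P2 is damping, with a unit mass; these choices, the helper name `simulate_machine`, and all numeric values are illustrative assumptions, not details taken from the original diagram.

```python
import numpy as np

def simulate_machine(stiffness, damping, force, dt=0.01, mass=1.0):
    """Integrate m*a + c*v + k*x = F(t) with semi-implicit Euler.

    Hypothetical illustration: stiffness (k) and damping (c) stand in for
    the system parameters (P1, P2), `force` is the input series X(t), and
    the returned velocity is the output series Y(t).
    """
    x, v = 0.0, 0.0
    velocity = np.empty(len(force))
    for i, f in enumerate(force):
        a = (f - damping * v - stiffness * x) / mass  # Newton's second law
        v += a * dt  # update velocity first (semi-implicit Euler)
        x += v * dt  # then position, using the updated velocity
        velocity[i] = v
    return velocity

t = np.arange(0, 10, 0.01)
X = np.sin(2 * np.pi * 0.5 * t)            # input: sinusoidal force
Y_state1 = simulate_machine(4.0, 0.5, X)    # state (P1, P2)
Y_state2 = simulate_machine(9.0, 2.0, X)    # state (P1', P2')
```

Driving both parameter sets with the same input yields two different output series, which is the kind of condition-labeled data the rest of this article is concerned with.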

Creating a synthetic data model that can predict the multi-dimensional, time-dependent I/O variables for different system parameters and their transitions is equivalent to creating a dynamic model of the system. That is exactly what is required to simulate the corner cases and understand state transitions.
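Conceptually, a synthetic data model of this kind maps continuous condition variables (the system parameters) to an output time series. The deliberately naive baseline below fits a separate linear regression of the output on the parameters at every time step. It is not a GAN and is only adequate when outputs depend linearly on the parameters, but it shows the input/output contract such a model has. The function names `fit_conditional_model` and `generate`, and the toy data, are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_conditional_model(conditions, series):
    """Fit, for every time step t, a linear map y_t ~ a_t + B_t @ p.

    conditions: (n_samples, n_params) continuous condition variables p
    series:     (n_samples, n_steps) output series observed for each p
    Returns a (n_params + 1, n_steps) coefficient matrix.
    """
    design = np.hstack([np.ones((len(conditions), 1)), conditions])  # intercept column + p
    coeffs, *_ = np.linalg.lstsq(design, series, rcond=None)
    return coeffs

def generate(coeffs, p):
    """Synthesize an output series for a (possibly unseen) condition vector p."""
    return np.concatenate([[1.0], p]) @ coeffs

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
conditions = rng.uniform(0.5, 2.0, size=(20, 2))  # 20 labeled runs, 2 parameters
series = conditions[:, [0]] * np.sin(t) + conditions[:, [1]] * np.cos(t)  # toy data, linear in p
coeffs = fit_conditional_model(conditions, series)
synthetic = generate(coeffs, np.array([1.2, 0.8]))  # condition not seen in training
```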

A model of this type can be trained on existing time series data collected experimentally from the same class of machines. These time series, each labeled by its unique parameter values (also called condition or regression variables), are the main source for creating an alternative dynamic model of the system. Such a model can be used for

Not many tools are available to software engineers for handling conditional time series. And when the condition variables are continuous, which is true for most engineering systems, the existing work is simply not sufficient. VAR and LSTM models, which can predict multivariate time series, can also be used to synthesize time series, but their usage is limited to simple cases. A high level of success in synthesizing real-life engineering time series has been achieved by GAN-based algorithms, many of which use recurrent networks such as LSTMs in their generators or discriminators.
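As a sketch of the simplest case, the code below fits a first-order vector autoregression, VAR(1), by ordinary least squares and then iterates it with Gaussian noise to synthesize a new multivariate series. The helper names, the first-order model, and the Gaussian noise are illustrative assumptions; statistics libraries such as statsmodels also provide VAR models.

```python
import numpy as np

def fit_var1(data):
    """Least-squares fit of a VAR(1) model z_t = c + A z_{t-1} + e_t.

    data: (n_steps, n_vars) multivariate series, e.g. columns X(t) and Y(t).
    Returns the intercept c, coefficient matrix A and residual covariance.
    """
    past = np.hstack([np.ones((len(data) - 1, 1)), data[:-1]])  # rows [1, z_{t-1}]
    coeffs, *_ = np.linalg.lstsq(past, data[1:], rcond=None)
    c, A = coeffs[0], coeffs[1:].T
    residuals = data[1:] - past @ coeffs
    return c, A, np.cov(residuals, rowvar=False)

def simulate_var1(c, A, cov, z0, n_steps, rng):
    """Generate a synthetic series by iterating the model with Gaussian noise."""
    noise = rng.multivariate_normal(np.zeros(len(z0)), cov, size=n_steps)
    out = np.empty((n_steps, len(z0)))
    z = np.asarray(z0, dtype=float)
    for i in range(n_steps):
        z = c + A @ z + noise[i]
        out[i] = z
    return out

rng = np.random.default_rng(42)
A_true = np.array([[0.6, 0.2], [-0.1, 0.5]])
c_true = np.array([0.1, -0.2])
cov_true = 0.01 * np.eye(2)
# Stand-in for measured I/O data; in practice this would come from the machine.
observed = simulate_var1(c_true, A_true, cov_true, np.zeros(2), 20000, rng)
c_hat, A_hat, cov_hat = fit_var1(observed)
synthetic = simulate_var1(c_hat, A_hat, cov_hat, observed[-1], 500, rng)
```

A linear, first-order model like this cannot represent strongly nonlinear dynamics or rare corner cases it has never seen, which is why its usage is limited to simple cases.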

RCGAN and TSGAN are two good algorithms for synthetic time series generation. DoppelGANger (DG), which improves on previous algorithms, deserves special discussion in this context. Although earlier conditional GAN (CGAN) algorithms also handle condition variables, DG treats the condition variables and the I/O data with two interdependent discriminators. DG mitigates mode collapse, which its authors observed when value ranges vary widely across samples, by normalizing each time series individually and generating its min and max values as additional attributes. In a very crude sense, DG uses a two-step process: it synthesizes individual time series for a condition and then regresses over those conditions to establish the relationship between their time series data. This is very intuitive from a dynamic system point of view, especially when the relationships are linear or nearly linear.
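The per-sample normalization idea can be sketched in a few lines: each series is scaled on its own, and the information needed to undo the scaling becomes extra per-sample attributes that a generator would learn alongside the condition variables. The DG paper describes storing each series' max and min; the sketch below encodes them equivalently as a midpoint and a half-range. This is an illustrative NumPy sketch under those assumptions, with hypothetical helper names, not the DG implementation, and it omits the GAN itself.

```python
import numpy as np

def auto_normalize(series):
    """Scale each row of `series` (n_samples, n_steps) to [-1, 1] independently.

    Returns the normalized series plus two extra per-sample attributes,
    (max + min) / 2 and (max - min) / 2, from which the scale can be restored.
    """
    hi = series.max(axis=1, keepdims=True)
    lo = series.min(axis=1, keepdims=True)
    center = (hi + lo) / 2
    half_range = np.maximum((hi - lo) / 2, 1e-12)  # guard against constant series
    normalized = (series - center) / half_range
    extra_attributes = np.hstack([center, half_range])
    return normalized, extra_attributes

def denormalize(normalized, extra_attributes):
    """Map (generated) normalized series back to physical units."""
    center, half_range = extra_attributes[:, :1], extra_attributes[:, 1:]
    return normalized * half_range + center

# Two runs of very different magnitude, e.g. a nominal case and a corner case.
series = np.array([[0.0, 1.0, 2.0, 3.0],
                   [100.0, 400.0, 250.0, 700.0]])
normalized, extra = auto_normalize(series)
restored = denormalize(normalized, extra)
```

After normalization both runs span [-1, 1], so the generator does not have to cover widely different magnitudes in one output space, while the two extra attributes preserve each run's scale.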

Many of today’s algorithms are not yet mature, and their generalization properties are not well researched. With the growth of AI in engineering domains such as industrial, construction, mining, and O&G, synthetic data models will be researched further, which will also improve AI deployment in connected and autonomous machines.

References

- Zhihang Song, Zimin He, Xingyu Li, Qiming Ma, Ruibo Ming, Zhiqi Mao, Huaxin Pei, Lihui Peng, Jianming Hu, Danya Yao, and Yi Zhang. “Synthetic Datasets for Autonomous Driving: A Survey.”
- Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, and Vyas Sekar. “Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions.” IMC ’20, October 27–29, 2020, Virtual Event, USA.
- Kaleb E. Smith and Anthony O. Smith. “Conditional GAN for Time Series Generation.” arXiv, June 2020. (Proposes the Time Series Generative Adversarial Network, TSGAN.)
