Connectivity: The Key to Unlocking Big Data in Rail

Machine Learning (ML) and Artificial Intelligence (AI) are hot topics for Big Data in rail.

The value that these technologies offer in digital transformation is widely understood: operational efficiencies, cost savings, new revenues and, ultimately, competitive advantage. The problem lies with the data itself: how do you put a value on it, where does it come from, and when is enough data enough?

The Value of Data

At the start of a Big Data project, rail operators first need to define why the data is being collected. With a top-level objective in mind, it is easier to understand the value of the data and, more importantly, why that data needs to be secured.

For example, two rail use cases for Big Data could be:

  1. Enhancing passenger experiences (creating additional revenue)
  2. Predictive maintenance (cost savings)

The above examples highlight general purposes for data collection. They also give management the justification for the business case of Big Data in rail. With a value now placed on the data, rail operator IT teams can decide whether to apply AI or ML.

Is It AI or ML?

From the two example use cases above, it is possible to outline why and when to use AI, ML or both. In use case 1, rail operators will apply AI to maximise the chances of successfully upselling onboard services to customers. Here, the purpose is for the machine to think like a human and deal with the unstructured data associated with human thinking.

Take an example AI application: chatbots. The chatbot can obtain information on the passenger journey. The chatbot then interacts with the passenger by suggesting movies or reading lists based on journey time. Over time, the AI refines its suggestions based on interacting with passengers, improving the overall passenger experience. Furthermore, it allows rail operators to refine their offering and reduce expenditure on unwanted content.

In use case 2, by contrast, ML will be applied because there is a specific need to learn patterns accurately. Take the example of a metro rail car door opening. Here there is an array of structured data to collect. How many times does the door open, how long for, in what locations and under what weather conditions? The learning model ingests this data so that door maintenance can be scheduled accurately. The benefit to the rail operator is operational efficiency, by eliminating unnecessary delays due to faulty doors.
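To make the door example concrete, the structured data described above could be captured as one record per door opening and then summarised per door before it reaches a learning model. The following is a minimal sketch only: the DoorEvent fields, the identifiers and the use of temperature as a stand-in for weather conditions are all assumptions for illustration, not a description of any real onboard system.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DoorEvent:
    """One door opening. All field names are hypothetical."""
    door_id: str          # e.g. "car1-door1"
    station: str          # where the door opened
    open_seconds: float   # how long the door stayed open
    temperature_c: float  # assumed stand-in for weather conditions


def summarise_by_door(events):
    """Group events per door and compute simple per-door features."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.door_id].append(event)

    summary = {}
    for door_id, door_events in grouped.items():
        summary[door_id] = {
            "open_count": len(door_events),
            "mean_open_seconds": mean(e.open_seconds for e in door_events),
            "stations": sorted({e.station for e in door_events}),
            "min_temperature_c": min(e.temperature_c for e in door_events),
        }
    return summary


# Example data, invented for illustration.
events = [
    DoorEvent("car1-door1", "Central", 12.0, 4.5),
    DoorEvent("car1-door1", "Harbour", 18.0, 3.0),
    DoorEvent("car1-door2", "Central", 11.0, 4.5),
]
features = summarise_by_door(events)
```

A summary like this is only an input to a maintenance model; choosing and training that model is a separate step that depends on the operator's own fault history.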

Cloud Shared Responsibility Model

With a clear vision around the need and value of the data, the next step is to understand how that data will be collected and stored. Data collection is tricky, as trains consist of multiple closed systems with no interaction between them.

Cloud providers solve some of the challenges of centralising data from disparate systems. However, rail operators need to understand that securing access to the cloud and protecting the data in it are their responsibility. Under the Cloud Shared Responsibility Model, cloud providers take responsibility for the security and running of the cloud itself. The implication for the rail operator is that the safety of the data, both on its way from the edge into the cloud and once it is in the cloud, remains the rail operator’s problem.

For rail operators, the question remains: how do you securely get data off trains and into the cloud, and then keep it secure?

The Key to Big Data Is Connectivity

The most straightforward approach is to create a secure operational network per train. Its purpose is to aggregate the data from multiple train systems and to provide secure connectivity to wherever the data is needed, i.e. the cloud or on-premises.

As most Big Data projects start as a pilot, the connectivity architecture needs to be open. For example, a rail operator may start with the predictive maintenance of door operations, connecting to the sensors on the Multifunction Vehicle Bus (MVB). A later step could be enhanced services for conductors and drivers, which means connecting to the Ethernet and Wi-Fi networks.

[Figure: Passenger Experience. Secure connectivity is an essential part of protecting data and onboard systems from cyber threats.]

Security off the Rails

For Big Data in rail, there are two critical parts to protecting the data: when it is at rest and when it is in motion.

The challenge for rail operations is that data can remain on trains for long periods, either because of communication “not-spots” or because the train is powered off. When selecting a connectivity solution, rail operators should look for one that offers data caching and insist that the storage disks are encrypted.
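The caching behaviour described above is often called store-and-forward: readings are held on the train while the link is down and sent, oldest first, once it returns. The sketch below illustrates the idea under simplifying assumptions. The class name, the bounded in-memory buffer and the send callback are hypothetical; a real onboard cache would persist to disk, and the sketch assumes encryption at rest is handled by the encrypted storage underneath rather than by this code.

```python
from collections import deque


class StoreAndForwardCache:
    """Hold readings while the link is down; flush oldest-first when it returns."""

    def __init__(self, max_items):
        # Bounded buffer: once full, the oldest reading is dropped first.
        self._buffer = deque(maxlen=max_items)

    def record(self, reading):
        self._buffer.append(reading)

    def flush(self, send):
        """Send cached readings in order; stop at the first failed send."""
        sent = 0
        while self._buffer:
            if not send(self._buffer[0]):
                break  # link still down: keep the remaining readings
            self._buffer.popleft()
            sent += 1
        return sent

    def pending(self):
        return len(self._buffer)


cache = StoreAndForwardCache(max_items=3)
for reading in ["r1", "r2", "r3", "r4"]:
    cache.record(reading)  # "r1" is dropped when "r4" arrives

delivered = []


def link_down(reading):
    # Simulates a "not-spot": nothing can be sent.
    return False


def link_up(reading):
    # Simulates a working link by collecting what was sent.
    delivered.append(reading)
    return True


sent_in_not_spot = cache.flush(link_down)
sent_when_connected = cache.flush(link_up)
```

Dropping the oldest reading when the buffer fills is one design choice among several; an operator who cannot afford to lose early readings would size the cache for the longest expected outage instead.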

When it comes to the protection of data transmissions, the easiest and most efficient way is to look for private network connectivity. Private networks isolate the train and cloud from the cyber threats associated with public internet networks. Many rail operators are opting for Software-Defined Wide Area Network (SD-WAN) solutions. The benefit of an SD-WAN is the simplicity of management and the encryption of data over public networks such as those run by Mobile Network Operators (MNOs).

A further advantage to SD-WAN is the ability to seed trust in the originating traffic. When it comes to accessing the cloud, identity management or trust in the data source is an essential security best practice.
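One general way to establish trust in a data source is for each train to sign its messages with a secret key, so that the receiving side can check both who sent a message and that it was not altered. The sketch below shows this with an HMAC from the Python standard library. It illustrates the principle only and is not a description of how any SD-WAN product works; the key, the train identifier and the message fields are all invented for the example.

```python
import hashlib
import hmac
import json


def sign_message(payload, key):
    """Serialise the payload and compute an HMAC-SHA256 tag over it."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body, tag


def verify_message(body, tag, key):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Hypothetical per-train key; a real key would never be hard-coded.
TRAIN_KEY = b"example-key-for-illustration-only"

body, tag = sign_message({"train_id": "T-001", "door_open_count": 42}, TRAIN_KEY)

accepted = verify_message(body, tag, TRAIN_KEY)
tampered = verify_message(body.replace(b"42", b"43"), tag, TRAIN_KEY)
wrong_key = verify_message(body, tag, b"some-other-key")
```

Here only the untouched message signed with the expected key is accepted; a modified body or a different key fails verification.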

[Figure: Cloud Shared Responsibility Model. Simplify AI/ML deployments with open and modular architecture platforms.]

When Enough Data Is Enough

Big Data implies a need for vast amounts of storage. However, storage, whether in the cloud or on-premises, is costly. Therefore, there is a need to build intelligence at the edge, be it on trains or at the station platform.

Edge computing allows rail operators to apply intelligence close to the data source, so that the data being observed can be streamlined and only what is necessary is recorded. By running AI and ML onboard the train, the volume of data stored externally can decrease dramatically.
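A simple illustration of recording only what is necessary is a deadband filter, which keeps a sensor reading only when it differs from the last recorded value by more than a set threshold. This is a deliberately basic stand-in for onboard intelligence, not an AI or ML model; the function name, the sample values and the threshold are assumptions chosen for the example.

```python
def deadband_filter(readings, threshold):
    """Keep a reading only if it moved more than `threshold` from the last kept one."""
    kept = []
    for value in readings:
        if not kept or abs(value - kept[-1]) > threshold:
            kept.append(value)
    return kept


# Invented sample readings, e.g. a temperature sampled at a fixed interval.
raw = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0, 19.1]
recorded = deadband_filter(raw, threshold=1.0)
```

In this example seven raw readings reduce to three recorded ones, because small fluctuations around the last recorded value are discarded before anything leaves the train.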

In Summary

The key to unlocking Big Data in rail is connectivity. By streamlining data collection, rail operators can more readily deliver predictive maintenance with Machine Learning, or enhanced passenger experiences with Artificial Intelligence.

Furthermore, security should always be at the forefront of any connected train project and not be treated as an afterthought, especially when data is traversing public internet networks.

By choosing solutions designed for the edge, rail operators can consolidate and concentrate existing systems into a single computing platform. This in turn reduces IT expenditure and ultimately lowers the total cost of ownership for their Big Data programme.

To learn more about Klas and how our solutions help deliver cyber-secure Big Data in rail, please visit
