
Improving the quality of outbound transport event log data to heighten its impact on downstream consumption

Mukund Subramaniyan
8 Mar 2023

Data quality is the biggest challenge in implementing high-impact AI use cases. Correcting data quality issues at the source requires expensive investment in re-engineering the entire ecosystem and re-designing the process. As a result, downstream AI use cases are put on hold until the data quality becomes acceptable. This situation can, however, be addressed with statistics and machine-learning-powered algorithms that cleanse, refine, and enrich the data, ultimately accelerating the implementation of AI use cases.

Problem Statement 

The outbound logistics department of an automotive company struggled with poor-quality transport event log data in parts of a large dataset in its supply chain management (SCM) data platform. The poor data quality emerged from two sources. First, leaks in the data pipelines that transport the data from the on-premises Data Warehouse (DW) to the SCM platform: the data quality in the SCM platform did not fully match the DW (a warehouse under decommissioning, where the current transport events are stored), indicating leaks in the pipelines. Second, problems at the data collection source, e.g., poor logging of time events by the transport carriers, leading to a range of data quality issues: missing departure and arrival timestamps, incorrect timestamps, out-of-range timestamps, and chronological errors in the recorded timestamps of the transport movements. Together, these issues hindered the downstream consumption of the data, e.g., reporting key performance indicators on lead times and carbon dioxide emissions and supporting the advanced analytics use cases planned for the SCM platform. Improving the data quality manually in the SCM data platform was expensive, time-consuming, and hard to scale. The company sought an AI-based algorithmic solution that automatically fixes the data quality.


Solution

We partnered with the client to develop an AI algorithm that cleanses the transport data. We placed logistics expert knowledge at the center of the algorithm's development to build trust in AI-based cleansing by minimizing the risk of incorrect estimates, encoding that knowledge as rules that reflect the dynamics of the outbound logistics business. The algorithmic design had two parts. The first part identifies the defective records in the transport event log data, which we achieved by setting up a data rules configuration. The second part automatically fixes the poor data quality according to the nature of each specific data quality issue, which we achieved by combining statistical learning algorithms with the experts' knowledge encoded as rules.
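A data rules configuration of this kind can be sketched as a set of named predicates applied to each transport event record. This is a minimal illustration, not the client's actual rule set: the field names (`leg_id`, `departure_ts`, `arrival_ts`) and the 30-day out-of-range threshold are assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical rule configuration: each named rule flags one class of defect
# described in the problem statement. Field names and the duration threshold
# are illustrative assumptions, not the client's schema.
RULES = {
    "missing_departure": lambda r: r["departure_ts"] is None,
    "missing_arrival": lambda r: r["arrival_ts"] is None,
    "chronology_error": lambda r: (r["departure_ts"] is not None
                                   and r["arrival_ts"] is not None
                                   and r["arrival_ts"] < r["departure_ts"]),
    # Out-of-range: an implausibly long leg duration (threshold assumed).
    "out_of_range": lambda r: (r["departure_ts"] is not None
                               and r["arrival_ts"] is not None
                               and r["arrival_ts"] - r["departure_ts"] > timedelta(days=30)),
}

def flag_defects(record):
    """Return the names of all rules a transport event record violates."""
    return [name for name, rule in RULES.items() if rule(record)]
```

Keeping the rules in a plain configuration like this lets logistics experts review and extend them without touching the cleansing code itself.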

The AI algorithm performs two types of cleansing. First, it cleansed the historical data records from calendar year 2021, statistically imputing missing departure and arrival timestamps, recommending fixes for out-of-range timestamps, and correcting incorrect timestamps and the chronological order of the transport movements. Second, it fixes quality problems in real time, i.e., as and when the SCM platform receives transport events from the DW. We integrated the complete AI algorithm into the SCM data platform.
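One simple form of statistical imputation is to learn a typical leg duration per route from clean records and use it to fill a missing departure timestamp from the known arrival. This is a simplified sketch of that idea, not the production algorithm (which also encodes logistics rules); the `route`, `departure_ts`, and `arrival_ts` field names are assumptions.

```python
from datetime import datetime, timedelta
from statistics import median

def impute_missing_departures(records):
    """Fill missing departure timestamps using the median leg duration
    observed on the same route. A sketch under assumed field names; the
    real algorithm combines such estimates with expert-encoded rules."""
    # Learn a typical duration per route from records with valid timestamps.
    durations = {}
    for r in records:
        if r["departure_ts"] and r["arrival_ts"] and r["arrival_ts"] > r["departure_ts"]:
            durations.setdefault(r["route"], []).append(r["arrival_ts"] - r["departure_ts"])
    typical = {route: median(ds) for route, ds in durations.items()}

    # Impute: departure = arrival minus the route's typical duration.
    for r in records:
        if r["departure_ts"] is None and r["arrival_ts"] and r["route"] in typical:
            r["departure_ts"] = r["arrival_ts"] - typical[r["route"]]
    return records
```

The median is a deliberately robust choice here: a few grossly wrong historical durations on a route would skew a mean-based estimate much more.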

In addition, we fixed the leakages in the data pipelines by modifying the data engineering processes: we switched the pipelines between the DW and the SCM platform from time-based scheduled activation to activation triggered by transport movement updates. This ensured a lean flow of data from the DW to the SCM platform and removed the manual maintenance of transport movement updates in the SCM platform.
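The shift from schedule-driven to update-driven activation can be sketched as a handler that runs the pipeline only when a transport movement update arrives. The event shape and the `run_pipeline` hook are hypothetical stand-ins for the platform's actual messaging and orchestration interfaces.

```python
def on_transport_movement_update(event, run_pipeline):
    """Trigger the DW-to-SCM pipeline on a transport movement update
    instead of on a fixed schedule. `event` and `run_pipeline` are
    illustrative stand-ins, not the client's actual interfaces."""
    if event.get("type") == "transport_movement_update":
        run_pipeline(event["movement_id"])  # sync only the affected movement
        return True
    return False  # ignore unrelated events; no scheduled full reloads
```

Compared with a fixed schedule, this moves only the records that actually changed, which is what keeps the DW-to-SCM flow lean.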


Results

Using the AI algorithm, we fixed 85% of the data quality errors in the historical data from 2021 to 2022. Moreover, the insights delivered by this activity have also helped the client to:

• Actively correct data at the source (e.g., not allowing carriers to report an arrival when the departure for the corresponding transport leg is missing)

• Identify new hidden quality errors and requirements for new data rules

• Make strategic decisions on data engineering practices to maximize their impact on downstream consumption

• Demonstrate the flexibility of leveraging the existing SCM data platform for analytics use cases and accommodating a broader scope

• Accelerate the development of high-value AI use cases that use the event log data to streamline outbound logistics operations

About the Author

Mukund Subramaniyan

Senior Business Analyst
Mukund works in the manufacturing sector, mainly advising clients on topics related to digital manufacturing, data analytics, artificial intelligence, and operations transformation. He has extensive experience working with major automotive companies in India and Sweden, including original equipment manufacturers (OEMs) and suppliers. His educational and professional background integrates the technical expertise of a computer scientist with the manufacturing expertise of an engineer, ensuring that the deep insights generated through AI translate into real, measurable impact in an organization. He is passionate about transforming manufacturing operations using IIoT, AI, data, insights, and actions.