The Data Problem No One Talks About

Every serious project in container terminal optimisation eventually hits the same wall. It is not the algorithms. It is not the compute. It is not even getting buy-in from operations. It is the data.

Specifically: the gap between what a Terminal Operating System (TOS) logs and what actually happened on the ground. This gap is wide, it is persistent, and it kills more analytics and ML projects than any modelling challenge ever will. Yet it rarely gets discussed: it is unglamorous, and acknowledging it means admitting that parts of your dataset are fiction.

What a TOS actually records

A terminal operating system is the central nervous system of a container terminal. It tracks container positions, equipment assignments, vessel schedules, gate transactions, and thousands of other operational events. On paper, it is a comprehensive record of everything that happens.

In practice, it is a record of everything the system was told happened.

The distinction matters enormously. A TOS records planned events and confirmed transactions. A crane is assigned to move container X from position A to position B. The move is completed, and the system logs it. But between the assignment and the completion, a great deal can happen that never touches the database. The crane operator might reposition the spreader three times because the container was not where the system said it was. A truck might wait twelve minutes for a slot that was supposed to be clear. A straddle carrier might take a longer route because a lane was blocked by a container that was placed there "temporarily" two shifts ago and never moved.

None of this shows up in the data. The TOS sees a clean, completed move. The reality was anything but.

The categories of invisible data

After years of working with terminal data, I have come to think of the gap as falling into several categories:

Timing distortions. The TOS records when an event was logged, not always when it occurred. A container discharge might be timestamped when the crane operator confirms the move in the system, which could be seconds or minutes after the container actually hit the ground. In high-throughput operations, operators often batch-confirm moves. Your timestamps, which form the foundation of any throughput or productivity analysis, can therefore be systematically shifted, and unevenly so.
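One way to make batch confirmation visible, assuming you have per-crane confirmation timestamps, is to flag consecutive confirmations that are closer together than any plausible crane cycle. The `MoveEvent` structure and the 60-second threshold below are hypothetical placeholders; a realistic minimum cycle time should come from people who know your equipment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MoveEvent:
    crane_id: str
    container_id: str
    confirmed_at: datetime  # when the move was confirmed in the TOS, not when it happened


def flag_batch_confirmations(events, min_cycle=timedelta(seconds=60)):
    """Return IDs of containers whose confirmation is separated from a neighbouring
    confirmation on the same crane by less than the assumed minimum cycle time."""
    by_crane = {}
    for event in events:
        by_crane.setdefault(event.crane_id, []).append(event)

    flagged = set()
    for crane_events in by_crane.values():
        crane_events.sort(key=lambda e: e.confirmed_at)
        for prev, curr in zip(crane_events, crane_events[1:]):
            if curr.confirmed_at - prev.confirmed_at < min_cycle:
                # Faster than the assumed minimum cycle, so both confirmations were
                # likely entered together rather than as the moves happened.
                flagged.update((prev.container_id, curr.container_id))
    return flagged


# Usage: three confirmations within ten seconds, then a normal gap.
t0 = datetime(2024, 1, 1, 8, 0, 0)
events = [
    MoveEvent("QC1", "A", t0),
    MoveEvent("QC1", "B", t0 + timedelta(seconds=5)),
    MoveEvent("QC1", "C", t0 + timedelta(seconds=10)),
    MoveEvent("QC1", "D", t0 + timedelta(seconds=130)),
]
print(sorted(flag_batch_confirmations(events)))  # ['A', 'B', 'C']
```

Flagged records are not necessarily wrong; they are the ones whose timestamps you should not treat as event times without further evidence.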

Missing intermediate states. A container's journey through a terminal involves dozens of micro-decisions and state changes. The TOS typically captures the endpoints: arrived at gate, placed in yard, loaded on vessel. The path between those points, including the re-handles, the repositioning, and the waiting, is often invisible. You see the final position. You do not see the three other positions it occupied on the way there.

Retroactive corrections. Terminal operators routinely correct data after the fact. A container was scanned into the wrong bay. An equipment assignment was updated hours later when someone noticed the discrepancy. These corrections are operationally necessary but analytically dangerous. They create records that look clean in retrospect but never reflected the operational reality at the time decisions were being made. If you are training a model to make real-time decisions, training it on retroactively corrected data means training it on information it would never have had at decision time.
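If, and only if, your TOS keeps an append-only audit trail of changes, you can rebuild the state a decision-maker actually saw rather than the corrected one. A minimal point-in-time sketch, assuming a hypothetical `PositionRecord` that carries the time each value was written to the system:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PositionRecord:
    container_id: str
    yard_position: str
    recorded_at: datetime  # when this value was written to the system


def positions_as_of(history, as_of):
    """Reconstruct the position each container had *in the system* at `as_of`,
    ignoring any correction recorded later."""
    known = {}
    for rec in sorted(history, key=lambda r: r.recorded_at):
        if rec.recorded_at > as_of:
            break  # later records were not yet known at decision time
        known[rec.container_id] = rec.yard_position
    return known


# Usage: a container logged in the wrong block at 08:00 and corrected at 14:00.
history = [
    PositionRecord("C1", "A01-03", datetime(2024, 1, 1, 8, 0)),
    PositionRecord("C1", "A02-03", datetime(2024, 1, 1, 14, 0)),
]
print(positions_as_of(history, datetime(2024, 1, 1, 10, 0)))  # {'C1': 'A01-03'}
print(positions_as_of(history, datetime(2024, 1, 1, 15, 0)))  # {'C1': 'A02-03'}
```

Training features built this way reflect what the system believed at 10:00, not what was later known to be true. If corrections overwrite the original value in place, this reconstruction is impossible, which is itself worth knowing before you train anything.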

Unrecorded manual interventions. Experienced terminal operators constantly make small adjustments that bypass the system. A planner redirects a truck verbally. A crane operator swaps the order of two moves because they can see a conflict the system cannot. A yard supervisor places a container in a non-standard position to solve an immediate problem. These interventions are often the reason the terminal runs as well as it does, and they are almost entirely absent from the data.

System-imposed artefacts. Sometimes the data reflects the system's limitations rather than operational reality. A TOS might force a discrete status for something that is actually continuous. Equipment states might be binary, available or unavailable, when reality involves a spectrum of partial availability, reduced capacity, or imminent maintenance. The data schema imposes a structure on the world that the world does not actually have.

Why this is worse than noisy data

Noise is manageable. Statistical methods handle noise well. You can filter it, model it, account for it. The data problem in terminals is not noise. It is systematic bias and structural missingness.

The timestamps are not randomly wrong. They are biased in specific directions depending on operator behaviour, shift patterns, and system design. The missing data is not randomly missing. It is missing because certain types of events were never designed to be captured, which means the absence correlates with the very operational conditions you most want to understand.

This is the kind of data problem that does not announce itself. Your model trains. Your metrics look reasonable. Your backtests pass. And then in production, the system makes decisions that seem inexplicable, because it learned patterns from data that described a world slightly different from the one it is now operating in.

The gap between IT and operations

There is a cultural dimension to this problem that is easy to underestimate. The people who design and maintain the TOS are IT professionals. They care about data integrity, system uptime, and transaction consistency. The people who run the terminal are operations professionals. They care about moving containers efficiently, safely, and on time.

These are not conflicting goals, but they produce different relationships with data. For IT, the database is the source of truth. For operations, the yard is the source of truth, and the database is a useful but imperfect reflection of it. When these perspectives clash, and they do, quietly, every day, the result is data that satisfies the system's requirements without capturing the operational reality.

I have sat in meetings where an analyst presents throughput numbers derived from TOS data, and the terminal planners in the room exchange glances because the numbers do not match what they experienced on the ground. The data is not exactly wrong. It is just not telling the whole story.

What this means for ML and analytics

If you are building predictive models, optimisation systems, or RL agents on terminal data, you need to internalise a few realities:

Your historical data describes a filtered version of reality. Every model you train inherits the biases and gaps of the data collection process. This is not a problem you solve once during data cleaning. It is a permanent constraint on what your system can learn.

Backtesting overstates performance. When your training data and your test data share the same systematic biases, your evaluation metrics will be better than the performance you see in production. The model has learned to predict the data-generating process, not the underlying operational reality. This is a subtle form of data leakage that standard train-test splits do not catch.

Real-time and historical data behave differently. A model trained on historical data, which has been corrected, completed, and cleaned, will encounter raw, in-progress data in production. The distribution shift is not hypothetical. It is structural and guaranteed.

Domain expertise is not optional. You cannot clean terminal data with generic data-quality tools. You need people who understand what a realistic crane cycle time looks like, what a suspicious container position means, why a particular timestamp pattern indicates batch confirmation rather than real-time logging. Without this knowledge, you will clean the data into a different kind of wrong.

What actually helps

I do not have a complete solution. I am not sure one exists. But several practices have consistently made the difference between projects that work and projects that produce impressive slides:

Instrument beyond the TOS. IoT sensors, GPS tracking on equipment, and camera-based position verification provide independent data streams that can validate or supplement what the TOS records. They are expensive and complicated to deploy, but they close the gap between logged and actual events in ways that no amount of data cleaning can.
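Once an independent stream exists, one of the first things it can tell you is how far TOS timestamps drift from reality. A minimal reconciliation sketch, assuming a hypothetical sensor feed that timestamps the same events and shares container IDs with the TOS:

```python
from statistics import median


def logging_lag_by_crane(tos_events, sensor_events):
    """tos_events, sensor_events: dicts mapping container_id -> (crane_id, time_s).
    Returns {crane_id: median lag in seconds} for containers seen by both sources,
    where lag = TOS time - sensor time."""
    lags = {}
    for container_id, (crane_id, tos_time) in tos_events.items():
        if container_id in sensor_events:
            _, sensor_time = sensor_events[container_id]
            lags.setdefault(crane_id, []).append(tos_time - sensor_time)
    return {crane: median(values) for crane, values in lags.items()}


# Usage with illustrative numbers.
tos = {"C1": ("QC1", 100.0), "C2": ("QC1", 220.0), "C3": ("QC2", 305.0)}
sensor = {"C1": ("QC1", 90.0), "C2": ("QC1", 160.0), "C3": ("QC2", 300.0)}
print(logging_lag_by_crane(tos, sensor))  # {'QC1': 35.0, 'QC2': 5.0}
```

The median keeps a few mismatched records from dominating, and the per-crane breakdown shows whether the lag is uniform or, as with the timing distortions described earlier, uneven.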

Build your models with explicit uncertainty. If your timestamps might be off by thirty seconds, do not pretend they are precise. Build models that can handle temporal uncertainty through interval-based representations, probabilistic timestamps, or tolerance windows. This is less elegant than treating your data as ground truth, but it is more honest and produces more robust systems.
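As a concrete illustration of an interval-based representation, assume (hypothetically) that a logged timestamp can lag the real event by up to thirty seconds but never precede it. A cycle time computed from two such timestamps is then a range, not a single number:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A quantity known only to lie somewhere in [lo, hi] (seconds)."""
    lo: float
    hi: float


def from_logged(logged_s: float, max_lag_s: float) -> Interval:
    # Assumption: the event happened at or before the log time, by at most max_lag_s.
    return Interval(logged_s - max_lag_s, logged_s)


def duration_bounds(start: Interval, end: Interval) -> Interval:
    """Tightest bounds on (end - start) consistent with both intervals."""
    return Interval(max(0.0, end.lo - start.hi), end.hi - start.lo)


# Usage: a move logged as starting at t=0 s and ending at t=90 s.
start = from_logged(0.0, max_lag_s=30.0)
end = from_logged(90.0, max_lag_s=30.0)
print(duration_bounds(start, end))  # Interval(lo=60.0, hi=120.0)
```

A naive analysis would report a 90-second cycle; the interval says anything from 60 to 120 seconds is consistent with the log. Downstream models can consume both bounds, or sample within them, instead of a falsely precise point.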

Validate against operational knowledge. Before trusting any dataset, sit down with terminal planners and walk through specific shifts, specific vessels, specific operational scenarios. Ask them if the data matches what they remember. The discrepancies they identify will teach you more about your data quality than any statistical test.

Design for the data you actually have, not the data you wish you had. This is perhaps the most important lesson. The temptation is always to assume the data is better than it is and build sophisticated models on top of it. The more effective approach is to be honest about the data's limitations and build systems that are robust to them, even if that means simpler models, wider confidence intervals, or more conservative decision-making.

Create feedback loops. Once your system is in production, use the discrepancies between its predictions and observed outcomes to systematically identify where your data is failing you. These feedback loops are not just model improvement mechanisms. They are data quality discovery tools.
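A minimal sketch of such a loop, assuming you log each prediction alongside the observed outcome and a segment label such as shift, crane, or yard block (hypothetical names here), is to look for segments where the average error is persistently offset from zero. Random noise tends to average out; a data-collection bias usually does not.

```python
from collections import defaultdict
from statistics import mean


def biased_segments(observations, threshold, min_count=5):
    """observations: iterable of (segment_key, predicted, observed).
    Returns {segment: mean residual} for segments with at least `min_count`
    observations whose mean residual exceeds `threshold` in magnitude."""
    residuals = defaultdict(list)
    for key, predicted, observed in observations:
        residuals[key].append(observed - predicted)
    return {
        key: mean(values)
        for key, values in residuals.items()
        if len(values) >= min_count and abs(mean(values)) > threshold
    }


# Usage: day-shift errors scatter around zero; night-shift errors are offset by ~40 s.
obs = [("day", 100, 100 + r) for r in (2, -2, 1, -1, 0)]
obs += [("night", 100, 100 + r) for r in (40, 42, 38, 41, 39)]
print(biased_segments(obs, threshold=20))  # {'night': 40}
```

A flagged segment is a question for the planners, not an answer: the offset could be a model weakness, but it often turns out to be a logging habit specific to that shift or piece of equipment.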

This is not unique to terminals

I have focused on container terminals because that is where I work. But this problem is endemic across industrial environments. Manufacturing execution systems, warehouse management systems, fleet management platforms, and energy grid monitoring all share the same fundamental issue. The data was designed to support operations, not to train machine learning models. The logging granularity, the event definitions, and the timestamp precision all reflect operational priorities, not analytical ones.

Every industrial AI project eventually discovers this. The ones that succeed treat data quality not as a preprocessing step but as a core, ongoing engineering challenge that deserves as much attention, investment, and expertise as the models themselves.

Closing thoughts

The most honest thing I can say about working with terminal data is this: the data is never as good as it looks, and the gap between what is logged and what happened is where the real engineering challenge lives.

This is not a reason to abandon data-driven approaches. It is a reason to pursue them realistically. The teams that acknowledge the data problem, invest in understanding it, and build systems that are robust to it are the teams that ship systems that actually work in production.

The algorithms are the easy part. The data is where most of the work lives.