Shipping an offline-first ML model — architecture and lessons learned

The challenge

A mobile app that talks to a field measurement instrument — drilling, geotechnics, environmental surveys — has to collect signals in real time, interpret them, and surface actionable information to the operator. All of that with no internet connection, on a smartphone or a ruggedised Android tablet.

The concrete case: sensors push streams of mechanical data (pressure, torque, advance rate, depth, GPS) over a direct wireless link. From those raw signals, the app must produce a real-time classification of the ground being drilled through — no cloud, no server, entirely on the device.

The classification model (LightGBM, trained on real annotated data, then exported to ONNX) already exists, but integrating it into a React Native app raises architectural questions that go well beyond simply calling a library.

A three-layer architecture

The system is organised into three strictly separated domains:

Field layer: The measurement instrument communicates with the app over direct WiFi. No cable, no network infrastructure, no cloud. Data flows one way only: the instrument produces measurements and the app reads them. The app polls at a fixed frequency, which keeps the data fresh without saturating the channel.
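One way to poll at a fixed frequency without saturating the link is to skip a tick whenever the previous request is still in flight. The sketch below assumes a hypothetical fetchBatch function that reads the latest measurements from the instrument; the skip-on-overlap policy is our illustration, not necessarily what the app does.

```typescript
// Starts fixed-frequency polling and returns a function that stops it.
// fetchBatch is a hypothetical reader for the instrument's latest measurements.
function startPolling(fetchBatch: () => Promise<void>, periodMs: number): () => void {
  let inFlight = false;
  const timer = setInterval(() => {
    // If the previous request has not finished, skip this tick instead of
    // queueing another one: the link never carries two requests at once.
    if (inFlight) return;
    inFlight = true;
    fetchBatch().then(
      () => {
        inFlight = false;
      },
      (err: unknown) => {
        inFlight = false; // a failed poll must not stop the loop
        console.warn("poll failed", err);
      },
    );
  }, periodMs);
  return () => clearInterval(timer);
}
```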

Application layer: This is the heart of the system, itself split into three responsibilities:

Acquisition and display: the UI receives the data and renders it in real time as curves and visual representations. The operator can interact (start, pause, stop a measurement) and export results.
Transformation pipeline: raw data isn't directly consumable by the model. A chain of successive transformations (normalisation, calibration, windowing, aggregation) turns it into numeric feature vectors.
Inference and aggregation: the ONNX model runs on the device CPU and produces one prediction per measurement point; these predictions are then consolidated per depth slice to yield a stable, usable result.

Storage layer: Entirely local. The ML model is embedded in the app. Measurement data is persisted to a structured CSV file, organised by job site. User metadata (display preferences, configuration) uses native key-value storage.

┌──────────────────────────────────────────────┐
│  Field layer                                 │
│  Instrument → direct WiFi → measurement flow │
└─────────────────────┬────────────────────────┘
                      │
┌─────────────────────▼────────────────────────┐
│  Application layer                           │
│                                              │
│  ┌──────────┐  ┌──────────────┐  ┌─────────┐ │
│  │Real-time │  │Transformation│  │ONNX     │ │
│  │UI        │  │pipeline      │  │inference│ │
│  │+ export  │  │+ calibration │  │+ vote   │ │
│  └──────────┘  └──────────────┘  └─────────┘ │
└─────────────────────┬────────────────────────┘
                      │
┌─────────────────────▼────────────────────────┐
│  Storage layer                               │
│  Embedded model · CSV · Preferences          │
└──────────────────────────────────────────────┘

Key architecture decisions

1. A 100% JavaScript preprocessor

The ML model expects normalised feature vectors, not raw data. The tempting path would have been to reuse Python (numpy, pandas) for this step, since that is where the training preprocessing was written, but there is no Python runtime on a React Native smartphone.

Two options:

Write a native binding (C++/Kotlin) that reimplements the preprocessing
Do everything in pure JavaScript

We chose the second. Why? Because the preprocessing is exclusively arithmetic: medians, quantiles, sorts, per-window statistics. No matrix algebra, no GPU. In pure JavaScript, this code is:

Unit-testable with no infrastructure (Jest)
Dependency-free (no native module whose version must be kept in sync with the ONNX runtime)
Fast (a few tens of milliseconds for thousands of points)
Portable (the same code can run on iOS if needed)
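To give an idea of the arithmetic involved, here is a minimal sketch of quantiles and per-window statistics in TypeScript. The window layout (fixed-size, non-overlapping runs of samples) and the feature set (median, interquartile range, mean) are assumptions for illustration, not the project's actual features; the quantile follows numpy's default linear interpolation.

```typescript
// Linear-interpolation quantile, matching numpy's default method ("linear"):
// the position q * (n - 1) is interpolated between its two neighbouring sorted values.
function quantile(values: readonly number[], q: number): number {
  if (values.length === 0) throw new Error("quantile of an empty window");
  if (q < 0 || q > 1) throw new RangeError("q must be in [0, 1]");
  const sorted = [...values].sort((a, b) => a - b); // numeric comparator: the default sort is lexicographic
  const pos = q * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function median(values: readonly number[]): number {
  return quantile(values, 0.5);
}

// Hypothetical per-window features: median, interquartile range and mean.
interface WindowStats {
  median: number;
  iqr: number;
  mean: number;
}

// Splits a signal into consecutive, non-overlapping windows of `size` samples
// (a trailing partial window is dropped) and computes the statistics of each one.
function windowStats(signal: readonly number[], size: number): WindowStats[] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError("size must be a positive integer");
  const out: WindowStats[] = [];
  for (let start = 0; start + size <= signal.length; start += size) {
    const w = signal.slice(start, start + size);
    const mean = w.reduce((sum, v) => sum + v, 0) / w.length;
    out.push({ median: median(w), iqr: quantile(w, 0.75) - quantile(w, 0.25), mean });
  }
  return out;
}
```

Matching numpy's interpolation rule, rather than for example taking the nearest rank, is what makes a later Python/JS comparison meaningful.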

It's an interesting counterpoint to the "everything native for performance" trend. Here, JavaScript is a perfect fit.

2. Calibration against an embedded reference

A classic problem with sensor-signal classification: two identical instruments, two different days, two different operators — raw values can vary significantly, even on the same material. A model trained naively on those values would be brittle and context-dependent.

The solution: before each campaign, a baseline reference measurement is taken with the instrument running empty. The model never works on raw values, only on the deltas from that reference. This differential approach makes the system largely insensitive to hardware differences, weather conditions and mechanical wear: only what changes relative to the day's reference carries useful information.
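The differential idea can be sketched as follows, assuming each channel of the reference run is summarised by its median and that a delta is a plain subtraction. The channel names and the choice of the median are our assumptions; the real pipeline may use other statistics or an additional scaling step.

```typescript
// Hypothetical channel set; the real instrument may expose different signals.
type Channel = "pressure" | "torque" | "advanceRate";
type Sample = Record<Channel, number>;

const CHANNELS: readonly Channel[] = ["pressure", "torque", "advanceRate"];

// Robust summary of one channel of the empty reference run.
function medianOf(values: readonly number[]): number {
  if (values.length === 0) throw new Error("empty reference run");
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Computed once per campaign from the reference measurement.
function buildBaseline(reference: readonly Sample[]): Sample {
  const baseline = {} as Sample;
  for (const ch of CHANNELS) baseline[ch] = medianOf(reference.map((r) => r[ch]));
  return baseline;
}

// What reaches the feature pipeline: deltas from the day's baseline, never raw values.
function toDeltas(samples: readonly Sample[], baseline: Sample): Sample[] {
  return samples.map((s) => {
    const d = {} as Sample;
    for (const ch of CHANNELS) d[ch] = s[ch] - baseline[ch];
    return d;
  });
}
```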

It's a pattern that goes beyond geotechnics: any app that classifies from sensor signals in a real-world environment should consider some form of contextual normalisation.

3. Inference that never blocks the UI

The ONNX runtime (onnxruntime-react-native) has a little-known architectural property: the session.run() call is asynchronous on the JavaScript side, but the real work runs on a dedicated native C++ thread, independent of the main JS thread.

In practice:

The JS thread prepares the data and triggers the call
The C++ runtime spawns a secondary thread, runs inference there, and returns the result via a Promise
Meanwhile, the JS thread keeps handling the UI, the polling, the animations
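The calling pattern can be sketched with the native call injected as a function, so the snippet runs without the native module. In the app, RunFn would wrap something like session.run({ [session.inputNames[0]]: new Tensor('float32', data, dims) }) from onnxruntime-react-native; the single float32 input of shape [n, f] is an assumption about this particular model.

```typescript
// Abstraction over the native call; assumed to resolve once the native side has finished.
type RunFn = (data: Float32Array, dims: readonly [number, number]) => Promise<unknown>;

// Packs n feature vectors of length f into one row-major Float32Array of shape [n, f],
// so the whole batch crosses the JS/native boundary in a single call.
function flattenFeatures(
  rows: readonly (readonly number[])[],
): { data: Float32Array; dims: [number, number] } {
  const n = rows.length;
  const f = n > 0 ? rows[0].length : 0;
  const data = new Float32Array(n * f);
  rows.forEach((row, i) => {
    if (row.length !== f) throw new Error(`row ${i} has ${row.length} features, expected ${f}`);
    data.set(row, i * f);
  });
  return { data, dims: [n, f] };
}

// Only the packing runs on the JS thread; while the Promise is pending,
// the event loop stays free for rendering, polling and animations.
async function classifyBatch(run: RunFn, rows: readonly (readonly number[])[]): Promise<unknown> {
  const { data, dims } = flattenFeatures(rows);
  return run(data, dims);
}
```

Packing the batch into a single typed array means the JS thread does one linear copy per batch; everything heavier happens behind the Promise.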

This is exactly what you want for a real-time app: inference, even on batches of several thousand points, causes no perceptible latency on the interface.

4. Aggregation to smooth out noise

The model was trained on individual measurement points, but taken point by point its predictions are noisy and unstable. The solution: run inference on every point in a depth slice, then vote.

The mechanism is simple: each point votes for a class, the majority class wins. On a tie, average confidence decides. This majority vote smooths out outliers and keeps only the dominant behaviour of the slice.
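A minimal version of that vote, assuming each point yields a label and a confidence score (for example the probability of the predicted class). The second tie-break on label order is our addition for determinism; the text above only specifies average confidence. The labels in the usage are hypothetical.

```typescript
interface PointPrediction {
  label: string;
  confidence: number; // assumed in [0, 1], e.g. the probability of the predicted class
}

interface SliceResult {
  label: string;
  votes: number;
  meanConfidence: number;
}

// Majority vote over every point of one depth slice.
function voteSlice(points: readonly PointPrediction[]): SliceResult {
  if (points.length === 0) throw new Error("cannot vote on an empty slice");
  const tally = new Map<string, { votes: number; confidenceSum: number }>();
  for (const p of points) {
    const t = tally.get(p.label) ?? { votes: 0, confidenceSum: 0 };
    t.votes += 1;
    t.confidenceSum += p.confidence;
    tally.set(p.label, t);
  }
  const candidates: SliceResult[] = [];
  tally.forEach((t, label) => {
    candidates.push({ label, votes: t.votes, meanConfidence: t.confidenceSum / t.votes });
  });
  // Most votes wins; a tie goes to the higher mean confidence; a further tie
  // falls back to label order so the result is deterministic.
  candidates.sort(
    (a, b) =>
      b.votes - a.votes ||
      b.meanConfidence - a.meanConfidence ||
      (a.label < b.label ? -1 : a.label > b.label ? 1 : 0),
  );
  return candidates[0];
}
```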

It's a pattern found in many embedded classification systems: rather than trying to improve point-by-point accuracy (which would require far more annotated training data), you exploit the natural redundancy of the measurements to gain robustness.

5. An idempotent real-time pipeline

In real-time mode, data arrives in successive batches (polling). The challenge: don't re-infer slices that are already processed, and detect the moment a slice is complete.

The chosen architecture is a stateful singleton that:

Initialises with the calibration data (once)
Ingests new points on each polling cycle
Triggers inference only when a slice is complete (as soon as a point from the next slice arrives)
Guarantees idempotence: if the same data is passed twice, it's ignored

The lifecycle is simple: init() → process() (in a loop) → reset(). No complex state management, no state machine. A singleton, a Set for idempotence, and a slice-completion rule.
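A sketch of that lifecycle in TypeScript, under assumptions of ours: each point carries a unique id used as the idempotence key; slices are fixed-height depth bins (sliceHeight is hypothetical); depth increases as drilling progresses; and inference plus the vote are injected as a single hypothetical inferSlice callback. The singleton is simply a module-level instance.

```typescript
interface Point {
  id: string;
  depth: number;
  features: readonly number[];
}
// Hypothetical callback: runs inference on one complete slice and returns its voted label.
type InferSlice = (points: readonly Point[], baseline: readonly number[]) => Promise<string>;
interface SliceLabel {
  slice: number;
  label: string;
}

class RealtimePipeline {
  private baseline: readonly number[] | null = null;
  private inferSlice: InferSlice | null = null;
  private sliceHeight = 1;
  private highest = -Infinity; // deepest slice index seen so far
  private readonly seen = new Set<string>(); // point ids already ingested
  private readonly done = new Set<number>(); // slice indices already inferred
  private readonly pending = new Map<number, Point[]>(); // open slices

  // Called once per campaign; later calls are ignored until reset().
  init(baseline: readonly number[], sliceHeight: number, inferSlice: InferSlice): void {
    if (this.inferSlice) return;
    if (!(sliceHeight > 0)) throw new RangeError("sliceHeight must be positive");
    this.baseline = baseline;
    this.sliceHeight = sliceHeight;
    this.inferSlice = inferSlice;
  }

  // Called on every polling cycle; batches may overlap with earlier ones.
  async process(batch: readonly Point[]): Promise<SliceLabel[]> {
    const run = this.inferSlice;
    const baseline = this.baseline;
    if (!run || !baseline) throw new Error("init() must be called before process()");
    for (const p of batch) {
      if (this.seen.has(p.id)) continue; // idempotence: already-ingested points are ignored
      this.seen.add(p.id);
      const slice = Math.floor(p.depth / this.sliceHeight);
      if (this.done.has(slice)) continue; // late point for a closed slice: dropped
      this.highest = Math.max(this.highest, slice);
      const bucket = this.pending.get(slice) ?? [];
      bucket.push(p);
      this.pending.set(slice, bucket);
    }
    // Completion rule: a slice is complete once a point from a deeper slice has arrived.
    // Claim complete slices synchronously so an overlapping call cannot re-infer them.
    const jobs = Array.from(this.pending.keys())
      .filter((s) => s < this.highest)
      .sort((a, b) => a - b)
      .map((s) => {
        const points = this.pending.get(s) ?? [];
        this.pending.delete(s);
        this.done.add(s);
        return { s, points };
      });
    const results: SliceLabel[] = [];
    for (const { s, points } of jobs) results.push({ slice: s, label: await run(points, baseline) });
    return results;
  }

  reset(): void {
    this.baseline = null;
    this.inferSlice = null;
    this.highest = -Infinity;
    this.seen.clear();
    this.done.clear();
    this.pending.clear();
  }
}

// Module-level singleton shared by the polling loop.
const realtimePipeline = new RealtimePipeline();
```

By the completion rule, the deepest slice stays pending until a deeper point arrives; in this sketch, stopping a measurement would need an explicit flush for that last slice.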

What we learned

Embedded has its own rules

What works on a backend (infer in batches, load everything into memory, tolerate 500ms of latency) doesn't work on a field mobile device. The constraints are different:

Limited RAM: loading the entire dataset into memory is out of the question
Shared CPU: the JS thread runs the polling, the rendering, the animations AND the preprocessing
No cloud fallback: if the model is corrupt or missing, the app must detect it and report it without crashing
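Detecting a missing or corrupt model can be sketched as a wrapper that never throws. Here load stands in for whatever creates the session (for example InferenceSession.create(modelPath) in onnxruntime-react-native), and checking for an expected input name is our assumption about what a sanity check might look like.

```typescript
// Only the two properties this check needs; onnxruntime's InferenceSession exposes both.
interface SessionShape {
  readonly inputNames: readonly string[];
  readonly outputNames: readonly string[];
}

type LoadResult<S> = { ok: true; session: S } | { ok: false; reason: string };

// Never throws: a missing or unreadable model becomes a status the UI can show the operator.
async function loadModelSafely<S extends SessionShape>(
  load: () => Promise<S>,
  expectedInput: string,
): Promise<LoadResult<S>> {
  try {
    const session = await load();
    // A model that loads but does not look like the one the pipeline expects is treated as corrupt.
    if (session.inputNames.indexOf(expectedInput) === -1) {
      return { ok: false, reason: `model has no input "${expectedInput}" (found: ${session.inputNames.join(", ")})` };
    }
    if (session.outputNames.length === 0) {
      return { ok: false, reason: "model declares no outputs" };
    }
    return { ok: true, session };
  } catch (err) {
    return { ok: false, reason: `model failed to load: ${err instanceof Error ? err.message : String(err)}` };
  }
}
```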

The model's output format is a weak link

Models exported to ONNX can expose different output formats depending on the exporter version: int64, int32, string or dictionary, sometimes seemingly at the pipeline's whim. If the app assumes one format and receives another, the result is a silent error or a crash.

The lesson: always parse outputs defensively. Check the type, provide a fallback, log the discrepancy. A wrong prediction beats a crash, especially on a job site where restarting the app means losing time.
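A sketch of such defensive parsing for the label output. The shapes handled are assumptions based on the cases listed above: int64 labels (which onnxruntime's JavaScript API exposes as a BigInt64Array), int32 or plain numeric class indices, string labels, and a dictionary of class scores; classNames is a hypothetical index-to-name table.

```typescript
// Converts one raw label value into a class name, or null if it cannot be interpreted.
function toClassName(raw: unknown, classNames: readonly string[]): string | null {
  // int64 tensors arrive as BigInt64Array, whose elements are bigint, not number.
  const value = typeof raw === "bigint" ? Number(raw) : raw;
  if (typeof value === "string") return value; // string labels
  if (typeof value === "number") {
    // Class index; non-integer or out-of-range indices fall back to null.
    return Number.isInteger(value) ? (classNames[value] ?? null) : null;
  }
  if (value !== null && typeof value === "object") {
    // Dictionary of class -> score: keep the key with the highest numeric score.
    const scores = value as Record<string, unknown>;
    let best: string | null = null;
    let bestScore = -Infinity;
    for (const key of Object.keys(scores)) {
      const score = scores[key];
      if (typeof score === "number" && score > bestScore) {
        best = key;
        bestScore = score;
      }
    }
    return best;
  }
  return null;
}

interface ParsedLabels {
  labels: (string | null)[]; // null = unusable value, shown as "unknown" instead of crashing
  issues: string[];
}

function parseLabelOutput(data: unknown, classNames: readonly string[]): ParsedLabels {
  // Typed arrays (Int32Array, BigInt64Array, ...) are not real Arrays, so copy them into one.
  const items: unknown[] = Array.isArray(data)
    ? data
    : ArrayBuffer.isView(data) && !(data instanceof DataView)
      ? Array.from(data as unknown as ArrayLike<unknown>)
      : [data];
  const issues: string[] = [];
  const labels = items.map((item, i) => {
    const name = toClassName(item, classNames);
    if (name === null) issues.push(`row ${i}: unexpected label value ${String(item)}`);
    return name;
  });
  // Log the discrepancy for later diagnosis, but keep going.
  if (issues.length > 0) console.warn(`[model output] ${issues.length} unparsable label(s)`, issues);
  return { labels, issues };
}
```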

Python/JS cross-validation is essential

The preprocessing was developed and tested in Python (numpy, pandas) for training, then rewritten in JavaScript for the embedded side. Without validating that both implementations produce the same results on the same data, up to floating-point noise, you cannot trust the predictions.

The method: a reference dataset, two standalone executables (Python and JS), and a test battery that compares the outputs with a Float32 noise tolerance. This approach should be systematic whenever you port an ML pipeline from a research environment to a production one.
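The JS half of that comparison can be sketched as below, assuming the Python executable has already written its feature matrix to a file the test loads (loading is omitted). The tolerance rule is the one numpy.isclose uses, |a - e| <= atol + rtol * |e|; the default values here are illustrative, not the project's, and matching NaNs count as equal.

```typescript
interface Mismatch {
  row: number;
  col: number;
  expected: number;
  actual: number;
}

// Compares the JS feature matrix against the Python reference, element by element.
// An element passes if |actual - expected| <= atol + rtol * |expected| (numpy.isclose's rule);
// NaN in both positions is treated as a match.
function compareFeatures(
  expected: readonly (readonly number[])[],
  actual: readonly (readonly number[])[],
  rtol = 1e-5,
  atol = 1e-6,
): Mismatch[] {
  if (expected.length !== actual.length) {
    throw new Error(`row count ${actual.length} != ${expected.length}`);
  }
  const mismatches: Mismatch[] = [];
  expected.forEach((row, r) => {
    if (row.length !== actual[r].length) throw new Error(`row ${r}: column count differs`);
    row.forEach((e, c) => {
      const a = actual[r][c];
      const bothNaN = Number.isNaN(e) && Number.isNaN(a);
      if (!bothNaN && !(Math.abs(a - e) <= atol + rtol * Math.abs(e))) {
        mismatches.push({ row: r, col: c, expected: e, actual: a });
      }
    });
  });
  return mismatches;
}
```

Returning every mismatch, rather than failing on the first, makes it easier to tell a systematic divergence (a whole column off) from isolated rounding noise.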

Calibration, not raw data

The most important decision in the project isn't technical but methodological: never work on raw values, always on normalised deltas from a contextual reference. That's what makes the system robust in real conditions, on variable hardware, in uncontrolled environments.

To wrap up

This project illustrates a fact that's becoming hard to ignore: ML models are no longer confined to servers. You can — and should — embed them on field devices, where the data is produced, where the decision has to be made.

But embedding a model isn't just calling a library. It means rethinking the whole architecture: how to normalise inputs without Python, how to guarantee reliability without a server, how to validate consistency between training and inference, how to run it all on a smartphone in the rain, with gloves on, next to a vibrating drilling rig.

Edge AI, in 2025, is no longer a lab. It's a job site.

Learn more about Prolog: prolog-system.ai
