When dealing with video segments—especially in systems that rely on distributed cameras, cloud storage, and multiple data streams—time synchronization becomes one of the most critical challenges. Without precise timing information, aligning events between multiple feeds (e.g., video, audio, and sensor data) becomes error-prone. Rhombus has implemented a custom timestamp embedding strategy that provides millisecond precision within .mp4/.m4v video segments, going beyond the coarse timing fields traditionally found in standard ISOBMFF (ISO Base Media File Format) containers.
This guide explains what this approach means, why it’s important, and how developers can retrieve and use this timestamp for building time-aligned applications.

Understanding ISOBMFF and the "free" Atom

The ISOBMFF standard (ISO/IEC 14496-12) is the container format underlying .mp4, .m4v, .mov, and many streaming segment formats such as fragmented MP4 (fMP4). Its structure is based on boxes (atoms)—self-contained data units identified by a 4-character code (e.g., moov, mdat, free).
Standard timestamps in ISOBMFF (e.g., creation_time in the mvhd box) typically have seconds-level resolution. This is fine for some media workflows, but insufficient for multi-camera synchronization or high-speed event correlation.
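For comparison, the coarse creation_time in the mvhd box can be read directly. The sketch below is illustrative only: it scans for the mvhd box rather than walking the full box tree, and it assumes a version 0 (32-bit) mvhd.
Reading mvhd creation_time
import struct
from datetime import datetime, timedelta, timezone

def read_mvhd_creation_time(file_path):
    """Read the seconds-resolution creation_time from a version 0 mvhd box."""
    with open(file_path, "rb") as f:
        data = f.read()

    idx = data.find(b'mvhd')
    if idx == -1:
        return None

    version = data[idx + 4]            # 1-byte version immediately follows the box type
    if version != 0:
        return None                    # version 1 uses 64-bit times; not handled in this sketch

    # version (1 byte) + flags (3 bytes) precede the 32-bit creation_time
    creation_time, = struct.unpack(">I", data[idx + 8 : idx + 12])

    # mvhd times count seconds since 1904-01-01 UTC, not the Unix epoch
    return datetime(1904, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=creation_time)
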
Rhombus solves this by embedding a millisecond-precision timestamp inside a free atom. This is a non-standard, yet fully ISOBMFF-compliant, method.

The Rhombus Custom Timestamp

When Rhombus segments video (i.e., when you pull actual recorded segments rather than a live transport stream), a custom metadata signature is written into the free box:

1. Box Type: free

In ISOBMFF, free is normally a placeholder box containing unused space. Rhombus repurposes this to carry timestamp metadata.

2. Signature: rhom

The first 4 bytes of the box payload are the ASCII string rhom, identifying the data as Rhombus-specific.

3. Millisecond Timestamp

The next 8 bytes are a big-endian 64-bit integer representing the start time of the video content, measured in milliseconds since the Unix epoch (UTC).
Binary layout example:
[ free box length ][ 'free' ][ 'rhom' ][ 8-byte timestamp (ms since epoch) ]
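For testing a parser, a synthetic box matching this layout can be built in a few lines. This is only a sketch for round-trip testing, not an official Rhombus tool.
Building a Test Box
import struct
import time

def build_free_timestamp_box(timestamp_ms):
    """Build a synthetic 'free' box carrying the 'rhom' signature and a big-endian 64-bit timestamp."""
    payload = b'rhom' + struct.pack(">Q", timestamp_ms)
    size = 8 + len(payload)            # 4-byte length + 4-byte type + 12-byte payload = 20 bytes
    return struct.pack(">I", size) + b'free' + payload

box = build_free_timestamp_box(int(time.time() * 1000))
print(box.hex())   # '000000146672656572686f6d' followed by 16 hex digits of the timestamp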

Why This Is Important for Developers

This design choice unlocks precise synchronization capabilities across multiple use cases:

Multi-Camera Alignment

Align video feeds from different cameras to within 1 ms for coordinated monitoring

Sensor Fusion

Merge video with IoT sensor data (access control events, environmental readings)

Forensic Accuracy

Reconstruct events down to sub-second intervals in investigations

Reduced Drift

Avoid errors that accumulate when relying solely on client system clocks or NTP sync
For ecosystem and integration partners, this makes Rhombus video streams highly interoperable with third-party analytics, AI/ML pipelines, and real-time monitoring systems.

Retrieving the Timestamp

Parsing ISOBMFF Files

You can use open-source libraries to read the free box from an .mp4/.m4v segment and check for the rhom signature.

Implementation Examples

Python Implementation
import struct
import datetime

def extract_rhombus_timestamp(file_path):
    """
    Extract millisecond-precision timestamp from Rhombus video segment.

    Args:
        file_path: Path to the .mp4 or .m4v file

    Returns:
        tuple: (timestamp_ms, datetime object) or (None, None) if not found
    """
    with open(file_path, "rb") as f:
        data = f.read()

    # Find the 'free' box
    idx = data.find(b'free')
    if idx == -1:
        return None, None

    # Search for 'rhom' signature after 'free'
    rhom_idx = data.find(b'rhom', idx)
    if rhom_idx == -1:
        return None, None

    # Read the next 8 bytes after 'rhom' (big-endian 64-bit integer)
    timestamp_bytes = data[rhom_idx + 4 : rhom_idx + 12]
    timestamp_ms = int.from_bytes(timestamp_bytes, byteorder="big")

    # Convert to human-readable UTC time
    timestamp_dt = datetime.datetime.utcfromtimestamp(timestamp_ms / 1000.0)

    return timestamp_ms, timestamp_dt

# Example usage:
timestamp_ms, timestamp_dt = extract_rhombus_timestamp("video_segment.mp4")

if timestamp_ms:
    print(f"Timestamp (ms since epoch): {timestamp_ms}")
    print(f"UTC Time: {timestamp_dt}")
    print(f"ISO Format: {timestamp_dt.isoformat()}")
else:
    print("Rhombus timestamp not found in file")

# Sample output:
# Timestamp (ms since epoch): 1722945678123
# UTC Time: 2024-08-06 12:01:18.123000
# ISO Format: 2024-08-06T12:01:18.123000

Advanced Python Usage

Multi-Segment Processing
import os
import glob
from datetime import datetime

class RhombusTimestampExtractor:
    """Extract and manage timestamps from multiple video segments."""

    def __init__(self):
        self.timestamps = []

    def process_directory(self, directory_path, pattern="*.mp4"):
        """Process all video files in a directory."""
        files = glob.glob(os.path.join(directory_path, pattern))

        for file_path in sorted(files):
            timestamp_ms, timestamp_dt = extract_rhombus_timestamp(file_path)

            if timestamp_ms:
                self.timestamps.append({
                    'file': os.path.basename(file_path),
                    'timestamp_ms': timestamp_ms,
                    'datetime': timestamp_dt
                })

    def get_timeline(self):
        """Get chronologically sorted list of segments."""
        return sorted(self.timestamps, key=lambda x: x['timestamp_ms'])

    def find_segment_at_time(self, target_datetime):
        """Find the most recent segment starting at or before a specific time."""
        # Note: pass a timezone-aware (UTC) datetime so .timestamp() converts correctly
        target_ms = int(target_datetime.timestamp() * 1000)
        closest = None

        for segment in self.get_timeline():
            if segment['timestamp_ms'] <= target_ms:
                closest = segment
            else:
                break

        return closest

# Usage example
extractor = RhombusTimestampExtractor()
extractor.process_directory("/path/to/video/segments")

# Find segment at specific time
target = datetime(2024, 8, 6, 15, 21, 0)
segment = extractor.find_segment_at_time(target)
print(f"Segment at {target}: {segment['file']}")

Real-World Use Cases

Multi-Camera Event Reconstruction

Synchronize footage from multiple cameras to reconstruct security incidents:
Event Timeline Reconstruction
from datetime import datetime, timedelta

class EventReconstructor:
    def __init__(self):
        self.camera_segments = {}  # cameraId -> list of segments

    def add_camera_footage(self, camera_id, segment_files):
        """Add video segments from a specific camera."""
        segments = []

        for file_path in segment_files:
            timestamp_ms, timestamp_dt = extract_rhombus_timestamp(file_path)
            if timestamp_ms:
                segments.append({
                    'file': file_path,
                    'start_time': timestamp_dt,
                    'timestamp_ms': timestamp_ms
                })

        self.camera_segments[camera_id] = sorted(
            segments,
            key=lambda x: x['timestamp_ms']
        )

    def reconstruct_event(self, event_time, window_seconds=30):
        """
        Find all camera segments within time window of an event.

        Args:
            event_time: datetime of the event
            window_seconds: seconds before/after event to include
        """
        window = timedelta(seconds=window_seconds)
        start_time = event_time - window
        end_time = event_time + window

        relevant_footage = {}

        for camera_id, segments in self.camera_segments.items():
            camera_clips = []

            for segment in segments:
                # Check if the segment starts within the event window
                if start_time <= segment['start_time'] <= end_time:
                    offset = (segment['start_time'] - event_time).total_seconds()
                    camera_clips.append({
                        'file': segment['file'],
                        'start_time': segment['start_time'],
                        'offset_from_event': offset
                    })

            if camera_clips:
                relevant_footage[camera_id] = camera_clips

        return relevant_footage

# Usage
reconstructor = EventReconstructor()
reconstructor.add_camera_footage('entrance', entrance_files)
reconstructor.add_camera_footage('lobby', lobby_files)
reconstructor.add_camera_footage('parking', parking_files)

# Reconstruct event at specific time
event_time = datetime(2024, 8, 6, 15, 21, 18)
footage = reconstructor.reconstruct_event(event_time, window_seconds=30)

print(f"Footage for event at {event_time}:")
for camera_id, clips in footage.items():
    print(f"\n{camera_id}:")
    for clip in clips:
        print(f"  - {clip['file']}")
        print(f"    Offset: {clip['offset_from_event']:.2f}s")

Sensor Data Correlation

Align video with access control or environmental sensor events:
Sensor-Video Correlation
class SensorVideoCorrelator {
    constructor() {
        this.videoTimestamps = [];
        this.sensorEvents = [];
    }

    /**
     * Add video segment with its timestamp
     */
    addVideoSegment(cameraId, timestamp, videoUrl) {
        this.videoTimestamps.push({
            cameraId,
            timestamp: timestamp.timestampMs,
            datetime: timestamp.utcTime,
            url: videoUrl
        });
    }

    /**
     * Add sensor event with timestamp
     */
    addSensorEvent(sensorId, eventType, timestampMs, data) {
        this.sensorEvents.push({
            sensorId,
            eventType,
            timestamp: timestampMs,
            datetime: new Date(timestampMs),
            data
        });
    }

    /**
     * Find video coverage for a sensor event
     */
    findVideoForEvent(eventTimestampMs, cameras = null) {
        const relevantVideos = this.videoTimestamps.filter(video => {
            // Video segment starts before or at event time
            const inTimeRange = video.timestamp <= eventTimestampMs;

            // Filter by camera if specified
            const inCameraList = !cameras || cameras.includes(video.cameraId);

            return inTimeRange && inCameraList;
        });

        // Get the most recent video before the event for each camera
        const latestByCamera = {};

        relevantVideos.forEach(video => {
            if (!latestByCamera[video.cameraId] ||
                video.timestamp > latestByCamera[video.cameraId].timestamp) {
                latestByCamera[video.cameraId] = video;
            }
        });

        return Object.values(latestByCamera);
    }

    /**
     * Generate correlation report
     */
    generateCorrelationReport() {
        return this.sensorEvents.map(event => {
            const videos = this.findVideoForEvent(event.timestamp);

            return {
                event: {
                    type: event.eventType,
                    sensor: event.sensorId,
                    time: event.datetime.toISOString(),
                    data: event.data
                },
                associatedVideos: videos.map(v => ({
                    camera: v.cameraId,
                    url: v.url,
                    offset: event.timestamp - v.timestamp
                }))
            };
        });
    }
}

// Usage example
const correlator = new SensorVideoCorrelator();

// Add door access event
correlator.addSensorEvent(
    'door-entrance-1',
    'ACCESS_GRANTED',
    1722945678123,
    { userId: 'user123', cardId: '12345' }
);

// Add corresponding video
correlator.addVideoSegment(
    'camera-entrance',
    { timestampMs: 1722945670000, utcTime: new Date(1722945670000) },
    'https://media.rhombus.com/segment1.mp4'
);

// Generate report
const report = correlator.generateCorrelationReport();
console.log(JSON.stringify(report, null, 2));

Best Practices for Integration

Always confirm the rhom tag before interpreting the following bytes as a timestamp. This prevents misinterpretation of unrelated data.
def is_valid_rhombus_timestamp(data, rhom_index):
    # Verify 'rhom' signature
    if data[rhom_index:rhom_index + 4] != b'rhom':
        return False

    # Verify sufficient data for timestamp
    if len(data) < rhom_index + 12:
        return False

    # Extract and validate timestamp range
    timestamp_bytes = data[rhom_index + 4:rhom_index + 12]
    timestamp_ms = int.from_bytes(timestamp_bytes, byteorder="big")

    # Sanity check: timestamp should be reasonable
    # (between 2015 and 2050)
    min_timestamp = 1420070400000  # Jan 1, 2015
    max_timestamp = 2524608000000  # Jan 1, 2050

    return min_timestamp <= timestamp_ms <= max_timestamp
The timestamp is UTC-based. Convert it appropriately if your application needs local time.
from datetime import datetime
import pytz

def convert_to_local_time(timestamp_ms, timezone='America/New_York'):
    # Create UTC datetime
    utc_dt = datetime.utcfromtimestamp(timestamp_ms / 1000.0)
    utc_dt = pytz.utc.localize(utc_dt)

    # Convert to local timezone
    local_tz = pytz.timezone(timezone)
    local_dt = utc_dt.astimezone(local_tz)

    return local_dt

# Usage
timestamp_ms = 1722945678123
local_time = convert_to_local_time(timestamp_ms, 'America/Los_Angeles')
print(f"Local time: {local_time}")
Combine the segment start timestamp with frame timestamps for frame-accurate alignment.
class FrameTimestampCalculator {
    constructor(segmentStartMs, frameRate) {
        this.segmentStartMs = segmentStartMs;
        this.frameRate = frameRate;
        this.frameDurationMs = 1000 / frameRate;
    }

    /**
     * Calculate absolute timestamp for a specific frame
     */
    getFrameTimestamp(frameNumber) {
        const offsetMs = frameNumber * this.frameDurationMs;
        return this.segmentStartMs + offsetMs;
    }

    /**
     * Find frame number for a specific timestamp
     */
    getFrameAtTimestamp(targetTimestampMs) {
        const offsetMs = targetTimestampMs - this.segmentStartMs;
        return Math.floor(offsetMs / this.frameDurationMs);
    }
}

// Usage
const segmentTimestamp = 1722945678123;
const calculator = new FrameTimestampCalculator(segmentTimestamp, 30);

// Get timestamp of frame 150
const frameTime = calculator.getFrameTimestamp(150);
console.log('Frame 150 timestamp:', new Date(frameTime));

// Find frame at specific time
const targetTime = 1722945683123;
const frameNum = calculator.getFrameAtTimestamp(targetTime);
console.log('Frame at target time:', frameNum);
Store your parsing logic in a modular way in case Rhombus adds new metadata formats.
from datetime import datetime

class RhombusMetadataParser:
    """Extensible parser for Rhombus metadata formats."""

    VERSION = "1.0"

    def __init__(self):
        self.parsers = {
            b'rhom': self._parse_v1_timestamp
        }

    def parse(self, file_path):
        """Parse metadata from video file."""
        with open(file_path, "rb") as f:
            data = f.read()

        # Find 'free' box
        free_idx = data.find(b'free')
        if free_idx == -1:
            return None

        # Check all known signatures
        for signature, parser_func in self.parsers.items():
            sig_idx = data.find(signature, free_idx)
            if sig_idx != -1:
                return parser_func(data, sig_idx)

        return None

    def _parse_v1_timestamp(self, data, rhom_idx):
        """Parse v1 timestamp format."""
        timestamp_bytes = data[rhom_idx + 4:rhom_idx + 12]
        timestamp_ms = int.from_bytes(timestamp_bytes, byteorder="big")

        return {
            'version': 1,
            'type': 'timestamp',
            'timestamp_ms': timestamp_ms,
            'datetime': datetime.utcfromtimestamp(timestamp_ms / 1000.0)
        }

    def add_parser(self, signature, parser_func):
        """Add custom parser for new metadata formats."""
        self.parsers[signature] = parser_func

# Usage
parser = RhombusMetadataParser()
metadata = parser.parse("video_segment.mp4")

if metadata:
    print(f"Version: {metadata['version']}")
    print(f"Type: {metadata['type']}")
    print(f"Timestamp: {metadata['datetime']}")

Performance Considerations

Efficient File Reading

For large files, read only the header portion instead of the entire file

Caching Strategy

Cache extracted timestamps to avoid re-parsing the same files (a combined caching and batch-processing sketch follows the optimized parser below)

Batch Processing

Process multiple files in parallel when building timelines

Memory Management

Use streaming parsers for very large video files

Optimized File Reading

Optimized Parser
from datetime import datetime

def extract_rhombus_timestamp_optimized(file_path, max_search_bytes=100_000):
    """
    Optimized version that reads only the beginning of the file.
    Most metadata is in the first portion of MP4 files.
    """
    with open(file_path, "rb") as f:
        # Read only header portion
        data = f.read(max_search_bytes)

    # Search for 'rhom' signature
    rhom_idx = data.find(b'rhom')
    if rhom_idx == -1:
        return None, None

    # Verify we have enough data for timestamp
    if len(data) < rhom_idx + 12:
        return None, None

    timestamp_bytes = data[rhom_idx + 4:rhom_idx + 12]
    timestamp_ms = int.from_bytes(timestamp_bytes, byteorder="big")
    timestamp_dt = datetime.utcfromtimestamp(timestamp_ms / 1000.0)

    return timestamp_ms, timestamp_dt
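
The caching and batch-processing suggestions above can be combined. The sketch below (not an official Rhombus utility) memoizes results keyed by file path and modification time and uses a thread pool to build a timeline from a directory of segments; it assumes the extract_rhombus_timestamp_optimized function defined above.
Cached Parallel Timeline Builder
import os
import glob
from concurrent.futures import ThreadPoolExecutor

# Simple in-memory cache keyed by (path, mtime) so unchanged files are parsed only once
_timestamp_cache = {}

def extract_with_cache(file_path):
    key = (file_path, os.path.getmtime(file_path))
    if key not in _timestamp_cache:
        _timestamp_cache[key] = extract_rhombus_timestamp_optimized(file_path)
    return file_path, _timestamp_cache[key]

def build_timeline(directory_path, pattern="*.mp4", workers=8):
    """Extract timestamps from all matching files in parallel, returned in chronological order."""
    files = glob.glob(os.path.join(directory_path, pattern))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(extract_with_cache, files))

    # Keep only files that carried a Rhombus timestamp
    timeline = [(path, ts_ms) for path, (ts_ms, _) in results if ts_ms]
    return sorted(timeline, key=lambda item: item[1])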

Conclusion

Rhombus’ method of embedding a millisecond-precision UTC timestamp in the free atom of ISOBMFF segments provides developers with a powerful tool for precise event alignment in multi-stream environments. This approach preserves compatibility with existing video tooling while unlocking sub-second accuracy for analytics, AI, and real-time monitoring—critical for advanced integrations in the Rhombus ecosystem.
Next Steps for Developers:
  • Experiment with the ISOBMFF GitHub library to parse Rhombus segments
  • Use the MP4Box.js online viewer to visually inspect box structures
  • Incorporate timestamp extraction into your ingest pipeline for perfectly synchronized multi-source datasets

This advanced implementation guide is regularly updated to reflect the latest best practices for working with Rhombus video segments.