Hi there, I wanted to seek advice on how to advance my skills in interfacing hardware sensors to ROS 2 in C++.
I often find myself struggling to write my own ROS 2 driver for various sensors that run smoothly in real time.
IMUs are usually the easiest to pipe into the ROS 2 sensor_msgs format: they stream at such a high sampling rate that a small performance dip doesn’t hurt much. GPS has low sampling requirements, so writing a tight real-time pipeline for it is not urgent either.
What’s crucial for real-time applications are usually the camera and lidar feeds, which I find the most difficult to pipe into ROS messages.
I find my code runs poorly and usually sees a 10–20 Hz dip. For example, I wrote a ROS 2 node to convert an OpenCV video stream into ROS 2 image messages for visualization, and I often see a 10 FPS drop between the stream before and after the conversion.
In addition, I sometimes run into deadlocks and starvation at the hardware/software interfacing level. I find it challenging to settle on a standard for how many topics and nodes I should be spinning, and how to couple them with mutexes and threads.
I would greatly appreciate some guidance on interfacing hardware sensors with ROS 2 for real-time applications, or pointers to resources that cover this in depth.
Hi @Genozen, great topic of discussion! I agree that there should be a guide to interfacing hardware with ROS 2. I’ll make a point of pushing for something like this on our platform for more advanced developers.
Your camera driver example is a good starting point, and I’m curious: have you checked whether this node behaves the same way in ROS 1 as it does in ROS 2?
The reason I mention this is that I have a feeling what you’re describing could be related to DDS, the underlying network transport of ROS 2.
You can also check whether configuring the image topic’s QoS and optimizing it for large messages makes a difference:
```cpp
// Start from the standard sensor-data profile, then tune it.
auto qos = rclcpp::QoS(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_sensor_data));
qos.keep_last(10);          // shallow history: drop stale frames instead of queueing them
qos.best_effort();          // no retransmissions for lossy, high-rate data
qos.durability_volatile();  // don't store samples for late-joining subscribers
```
You have asked a very interesting question. Here is my answer, based on my experience working with sensors and perception devices together with ROS 1 and ROS 2.
We can classify sensors by data payload. I would separate them into three types: light payload, medium payload, and heavy payload sensors.
Light Payload Sensors - These are sensors that provide only a little data in a single frame.
Example: Single Proximity Sensor Arrays, IMU, GPS, Optical Wheel Encoders, Tactile Sensor Arrays, Keyboards, Mice, etc.
Medium Payload Sensors - These are sensors with a somewhat higher data payload, but not too heavy.
Example: Low-Resolution LiDARs, Grayscale-only Cameras, Camera-based Depth-only Sensors, etc.
Heavy Payload Sensors - These are the sensors that will provide a large chunk of data per frame.
Examples: High-Resolution LiDARs and 3D LiDARs, RGB Cameras, RGB-Depth Cameras, RGB-Audio Camera Transmitters, etc.
Then you have the data frequency for the sensors. Usually two types - high data rate and low data rate sensors.
High Data Rate Sensors are the ones that usually send continuous data without polling.
Low Data Rate Sensors are the ones that send (polled) non-continuous data.
Writing drivers for sensors is fairly straightforward. The basic steps are:
Read the datasheet for the sensor.
Make functions for activation/power-up and deactivation/power-down for the sensor.
Make functions for reading and writing data from and into the sensor registers.
Make a function to decode byte-encoded data (if necessary).
Finally make a ROS-based (ROS1 / ROS2 / MicroROS) program to communicate & interface with the sensor (with the right QoS setting for ROS2).
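The steps above can be sketched as a minimal driver skeleton. Note that the `MockBus` transport and the register addresses here are hypothetical stand-ins for whatever your sensor’s datasheet and bus (I2C/SPI/UART) actually specify:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical bus abstraction standing in for real I2C/SPI/UART access.
struct MockBus {
    std::map<uint8_t, uint8_t> regs;  // simulated sensor registers
    uint8_t read(uint8_t addr) { return regs[addr]; }
    void write(uint8_t addr, uint8_t value) { regs[addr] = value; }
};

// Skeleton driver following the steps above. Register addresses are
// made up for illustration; a real driver takes them from the datasheet.
class SensorDriver {
public:
    static constexpr uint8_t REG_PWR = 0x00;   // power control register (assumed)
    static constexpr uint8_t REG_DATA = 0x10;  // first data register (assumed)

    explicit SensorDriver(MockBus& bus) : bus_(bus) {}

    // Step 2: activation / deactivation
    void power_up()   { bus_.write(REG_PWR, 0x01); }
    void power_down() { bus_.write(REG_PWR, 0x00); }

    // Step 3: raw register reads
    std::vector<uint8_t> read_raw(std::size_t n) {
        std::vector<uint8_t> out;
        for (std::size_t i = 0; i < n; ++i)
            out.push_back(bus_.read(REG_DATA + static_cast<uint8_t>(i)));
        return out;
    }

    // Step 4: decode two bytes into a signed 16-bit sample (big-endian)
    static int16_t decode(uint8_t hi, uint8_t lo) {
        return static_cast<int16_t>((hi << 8) | lo);
    }

private:
    MockBus& bus_;
};
```

The final step would then wrap `read_raw`/`decode` in a ROS 2 timer callback that publishes the decoded sample with an appropriate QoS.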
Coming to your issues now: IMUs and GPS sensors are easy because the data payload is very small. Assuming you have a 10-DoF IMU, you get 10 values per frame - 3 accelerometer, 3 gyroscope, 3 magnetometer, and 1 altimeter/barometer (plus a few additional status bits). GPS messages carry around 256 (+/- 50) bits each. So, like you said, occasional packet losses are compensated by the high message rate.
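A rough back-of-envelope comparison makes the payload gap concrete. The figures here are assumed for illustration (a 10-value float32 IMU at 200 Hz versus an uncompressed VGA RGB camera at 30 FPS), not measured:

```cpp
#include <cstddef>

// Bytes per second for a sensor stream, given frame size and rate.
constexpr std::size_t bytes_per_second(std::size_t bytes_per_frame,
                                       std::size_t frames_per_second) {
    return bytes_per_frame * frames_per_second;
}

// IMU: 10 float32 values per frame at 200 Hz -> 8,000 B/s.
constexpr std::size_t imu_bps = bytes_per_second(10 * sizeof(float), 200);

// Camera: 640x480 RGB8 at 30 FPS, uncompressed -> 27,648,000 B/s.
constexpr std::size_t cam_bps = bytes_per_second(640 * 480 * 3, 30);

// The camera produces roughly 3456x the IMU's data volume, which is
// why the camera pipeline, not the IMU pipeline, is where real-time
// effort pays off.
```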
Now, talking about medium and heavy payload sensors, like in your case, cameras and lidars, the most important thing to consider is data compression, more specifically, deterministic compression rather than random compression algorithms.
There are quite a few image and data compression algorithms that use fixed hashing. By fixed hashing, I mean a function known to both the encoding and the decoding system that stays constant throughout; the same function is used to compress and decompress.
Since you have not mentioned the use for your camera stream, I assume there are only two likely uses: viewing or image perception. Viewing requires no preservation of detail, whereas perception requires preserving certain details.
For View-only purposes - you could implement a pyramidal image subsampling algorithm on the acquisition end and a pyramidal supersampling algorithm on the reception end to view the image. This obviously makes the image lossy, but it will not be hard for a human to interpret the visuals. Example: bicubic or Lanczos subsampling and supersampling.
For Perception purposes - you should implement a fixed-hashing codec so that your image is subsampled and supersampled without much loss of image features. You might also need data buffers on the acquisition and reception sides to absorb encode/decode times. Example: Laplacian or Gaussian pyramidal subsampling and supersampling, or Discrete Cosine / Wavelet Transform based subsampling and supersampling.
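As a minimal illustration of one pyramid level, here is a 2×2 averaging downsample on a raw grayscale buffer. This uses a box filter rather than the 5×5 Gaussian kernel a true Gaussian pyramid would apply, just to keep the sketch short:

```cpp
#include <cstdint>
#include <vector>

// One level of pyramidal subsampling: average each 2x2 block of an
// 8-bit grayscale image, halving both dimensions (width and height
// are assumed even). A real Gaussian pyramid would low-pass with a
// 5x5 kernel before decimating; the box filter keeps this short.
std::vector<uint8_t> downsample2x(const std::vector<uint8_t>& img,
                                  int width, int height) {
    std::vector<uint8_t> out((width / 2) * (height / 2));
    for (int y = 0; y < height / 2; ++y) {
        for (int x = 0; x < width / 2; ++x) {
            int sum = img[(2 * y) * width + 2 * x]
                    + img[(2 * y) * width + 2 * x + 1]
                    + img[(2 * y + 1) * width + 2 * x]
                    + img[(2 * y + 1) * width + 2 * x + 1];
            out[y * (width / 2) + x] = static_cast<uint8_t>(sum / 4);
        }
    }
    return out;
}
```

Each level quarters the payload, so even one or two levels significantly shrink what has to cross the ROS 2 transport.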
For image processing, it is better to run high-resolution cameras at low frame rates, around 10-15 FPS. For low-resolution images, you can use up to 30 FPS. 60 FPS (or anything above 30 FPS) is overkill for any image processing application.
For LiDARs, compression starts with accuracy. Check whether you need millimeter accuracy or can stay with centimeter accuracy. If your device is very precise in its measurements, you can use integer centimeter values, since the quantization error is only +/- 5 millimeters; that reduces noise as well, and helps compress the lidar data for buffered transmission. If you do need millimeter accuracy, use integer millimeter values and get rid of floats, since floats consume more memory and computation time.
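A minimal sketch of that quantization, assuming ranges stay under roughly 32.7 m so they fit in an `int16_t`:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Quantize lidar ranges (meters, float32) to integer millimeters
// stored as int16_t. This halves the per-point size versus float32
// and bounds the rounding error to +/- 0.5 mm. Assumes every range
// is below ~32.767 m so it fits in an int16_t.
std::vector<int16_t> quantize_mm(const std::vector<float>& ranges_m) {
    std::vector<int16_t> out;
    out.reserve(ranges_m.size());
    for (float r : ranges_m)
        out.push_back(static_cast<int16_t>(std::lround(r * 1000.0f)));
    return out;
}
```

For centimeter accuracy the same idea applies with a factor of 100, shrinking the values further and absorbing sub-centimeter noise in the rounding.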
Then you can do group encoding to compress runs of similar values. This results in variable packet lengths after compression, but it increases the effective transmission rate in the long run.
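A simple run-length encoder over the quantized values illustrates the idea; this is one possible form of group encoding, not a specific standard:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Group-encode consecutive identical values as (value, count) pairs.
// The output length varies with the data, as noted above, but runs
// of repeated ranges (e.g. a flat wall seen by a lidar) compress well.
std::vector<std::pair<int16_t, uint16_t>>
run_length_encode(const std::vector<int16_t>& data) {
    std::vector<std::pair<int16_t, uint16_t>> out;
    for (int16_t v : data) {
        if (!out.empty() && out.back().first == v &&
            out.back().second < UINT16_MAX)
            ++out.back().second;  // extend the current run
        else
            out.emplace_back(v, 1);  // start a new run
    }
    return out;
}
```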
One way to avoid having many sensor drivers running in parallel is to group the low data rate sensors together. To give you an idea, say you have an IMU running at 20 Hz, GPS at 5 Hz, and a temperature sensor at 6 Hz. Assuming you need temperature data constantly but not continuously, you can group the GPS and temperature read functions together: one ROS node reads both sensors and publishes to two topics, one for GPS and one for temperature. Given the data rate difference, you won’t be able to group the IMU with them, so the IMU gets its own acquisition node.
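A minimal, ROS-free sketch of that grouping, with ticks simulated instead of real timers (the 30 Hz base rate and tick divisors are my own choice for illustration):

```cpp
#include <utility>

// Single-threaded sketch of grouping two low-rate sensors in one read
// loop: a 30 Hz base loop services GPS every 6th tick (5 Hz) and the
// temperature sensor every 5th tick (6 Hz). In a ROS 2 node this loop
// would be a single wall timer whose callback publishes to two topics.
// Returns {gps_reads, temp_reads} over one simulated second.
std::pair<int, int> simulate_one_second() {
    int gps_reads = 0, temp_reads = 0;
    for (int tick = 0; tick < 30; ++tick) {
        if (tick % 6 == 0) ++gps_reads;   // read_gps();  publish GPS topic
        if (tick % 5 == 0) ++temp_reads;  // read_temp(); publish temperature topic
    }
    return {gps_reads, temp_reads};
}
```

Because one callback does both reads, there is no lock shared between two driver threads, which sidesteps the mutex/starvation issues the question mentions for low-rate sensors.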
At the end of the day, ROS is a huge framework with quite a few things running that need dedicated computation resources besides your sensor acquisition processes, so you will face some delay in the overall communication system. The best way to ease the data throughput is to make the data cheap to transmit; in mission-critical applications, you should try to salvage as much data as possible from the sensors without incurring packet losses. You will always find yourself choosing between performance and quality in data processing. You should find the right balance.
I hope my (super-long) explanation gives you some insight. If you need further help with anything specific that I have mentioned above, feel free to ask!