Stereo camera with cv_bridge jitters

Hello, I’m learning how to convert an RGB stereo stream into ROS 2 messages and publish them.

I’ve written a snippet that takes the image data from “map.data”, converts it using cv_bridge, and publishes it:

    // Wrap the raw buffer from map.data in a cv::Mat (no copy is made)
    cv::Mat image(height, width, CV_8UC3, map.data);

    // Convert the cv::Mat to a ROS 2 Image message
    auto image_msg = cv_bridge::CvImage(std_msgs::msg::Header(), sensor_msgs::image_encodings::RGB8, image).toImageMsg();

    image_msg->header.stamp = rclcpp::Clock(RCL_ROS_TIME).now();
    image_msg->header.frame_id = "camera_frame_" + std::to_string(cfg->camid);

    // Publish from a detached thread and log how long the publish call took
    std::thread([image_msg, cfg]() {
        auto publish_start = std::chrono::high_resolution_clock::now();

        if (cfg->camid == 0) {
            camera_publisher_node->publish_cv_image_data_cam0(image_msg);
        } else if (cfg->camid == 1) {
            camera_publisher_node->publish_cv_image_data_cam1(image_msg);
        }

        auto publish_end = std::chrono::high_resolution_clock::now();
        auto publish_duration = std::chrono::duration_cast<std::chrono::microseconds>(publish_end - publish_start).count();
        RCLCPP_INFO(rclcpp::get_logger("rclcpp"), "Publishing took %ld microseconds", static_cast<long>(publish_duration));
    }).detach();

Here is the CameraPublisher class I wrote:

class CameraPublisher : public rclcpp::Node {
public:
    CameraPublisher()
    : Node("camera_publisher") {
        // Define QoS settings
        rclcpp::QoS qos_settings = rclcpp::QoS(rclcpp::KeepLast(10))
            .reliability(RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT)
            .durability(RMW_QOS_POLICY_DURABILITY_VOLATILE);

        img_publisher_cam0_ = this->create_publisher<sensor_msgs::msg::Image>("camera0/image", qos_settings);
        img_publisher_cam1_ = this->create_publisher<sensor_msgs::msg::Image>("camera1/image", qos_settings);
    }

    void publish_cv_image_data_cam0(const std::shared_ptr<sensor_msgs::msg::Image>& image_msg) {
        img_publisher_cam0_->publish(*image_msg);
    }

    void publish_cv_image_data_cam1(const std::shared_ptr<sensor_msgs::msg::Image>& image_msg) {
        img_publisher_cam1_->publish(*image_msg);
    }

private:
    rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr img_publisher_cam0_;
    rclcpp::Publisher<sensor_msgs::msg::Image>::SharedPtr img_publisher_cam1_;
};


Although I’m getting the camera streams from cam0 and cam1, they are very jittery.
You can also see the publishing time spike sometimes…

  1. Is this the proper way to design a ROS 2 image stream for stereo?
  2. What could be the reason the image is jittery?

Hi @Genozen ,

Could you explain more precisely what you mean by “jittery”?

  1. Are the images in the camera stream fluctuating?
  2. Are you experiencing variable frame rate in an otherwise fixed frame rate mode?
  3. Are the frames having random lag between two consecutive frames?

EDIT:
After going through your video I understood these could be ruled out:

  1. Are you seeing black frames in the stream sometimes?
  2. Are you seeing corrupted image data in the stream frames?
  3. Are you experiencing interlacing in the stream frames?

There are so many things that the term “jitter” could cover. Please explain with more context.

Regards,
Girish

Thanks for the question.

Yeah, let me clarify the “jitter” a bit.

In the video you’ve seen, I was trying to move the logo back and forth smoothly… but as you saw, the video feed in the stream jumps back and forth in time, or sometimes skips frames.

The only observation that seems to correlate is the amount of time it takes to publish a frame. In the terminal in that same video, you can see some publishing times spike up to 3000 ms or more…

Another observation is that these publishing-time “spikes” happen when I either 1. echo the image topic, or 2. use RViz2 or rqt to visualize it. When the program runs without any listeners, publishing seems pretty steady…

Hi @Genozen ,

Thanks for explaining the jitter better.

I went through your video a few more times and observed that your frames, frame rate, and frame read-times are fluctuating. The “publishing time” goes up to 500 µs for non-moving frames and reaches ~3.5 s for moving frames (the frames where you move the logo).

The first approach you should take is to use a fixed frame rate and a fixed resolution. Choose a medium resolution that you see fit for your application without consuming too much network bandwidth per data packet.
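
If your capture path exposes an OpenCV VideoCapture-style interface, pinning the resolution and frame rate could look like the sketch below (the device index and values are hypothetical; your camera driver may use a different API entirely):

    #include <opencv2/videoio.hpp>

    cv::VideoCapture cap(0);                   // hypothetical device index
    cap.set(cv::CAP_PROP_FRAME_WIDTH, 640);    // fixed, medium resolution
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
    cap.set(cv::CAP_PROP_FPS, 30);             // fixed frame rate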

You have also mentioned an RGB stereo stream, but you have shown only a single stream in grayscale, which leaves me with a lot more questions about the actual behavior in stereo mode with the RGB color format.

In the case of stereo images, some form of “image stitching” needs to be done to merge the left and right images during image processing.

The code that you have posted will work for capturing images from each of the stereo cameras one at a time, but not together. You must capture frames from both cameras at the same instant, so you might have to change your program to implement a 2-thread data acquisition logic that captures image data simultaneously from the left and right cameras. It would be better to concatenate the left and right camera frames horizontally into one frame so that it can be processed later: your post-capture frame will look like [left-right] in one frame, instead of left and right in two frames, if you understand my point.
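
A minimal sketch of that 2-thread acquisition idea, assuming a hypothetical grab_frame() that stands in for your camera driver call (error handling omitted):

    #include <opencv2/core.hpp>
    #include <thread>

    // Hypothetical driver call: returns one frame from the given camera id.
    cv::Mat grab_frame(int camid);

    void capture_pair(cv::Mat& combined)
    {
        cv::Mat left, right;
        std::thread t0([&] { left  = grab_frame(0); });  // left camera
        std::thread t1([&] { right = grab_frame(1); });  // right camera
        t0.join();
        t1.join();
        cv::hconcat(left, right, combined);  // [left | right] in one frame
    }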

Working with grayscale is easy, but are you going to work only with grayscale, or will you be using RGB later on?
If you plan on using RGB, then you should think about image compression. 24-bit RGB (CV_8UC3) gets large very quickly as you increase the resolution. You might want to consider color compression; check out 8-bit RGB color conversions. You will basically convert 24-bit R8G8B8 down to 8-bit RGB. Check it out here: 8-bit RGB
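
As a rough illustration of that kind of color compression (not a cv_bridge feature, just a hand-rolled sketch): 8-bit RGB is commonly packed as RGB332, with 3 bits of red, 3 bits of green, and 2 bits of blue, which cuts the payload to a third at the cost of color fidelity:

    #include <opencv2/core.hpp>

    // Pack 24-bit RGB (CV_8UC3, RGB order) into one byte per pixel (RGB332).
    cv::Mat to_rgb332(const cv::Mat& rgb)
    {
        cv::Mat out(rgb.rows, rgb.cols, CV_8UC1);
        for (int y = 0; y < rgb.rows; ++y) {
            const cv::Vec3b* src = rgb.ptr<cv::Vec3b>(y);
            uchar* dst = out.ptr<uchar>(y);
            for (int x = 0; x < rgb.cols; ++x) {
                dst[x] = (src[x][0] & 0xE0)          // top 3 bits of R -> bits 7-5
                       | ((src[x][1] & 0xE0) >> 3)   // top 3 bits of G -> bits 4-2
                       | (src[x][2] >> 6);           // top 2 bits of B -> bits 1-0
            }
        }
        return out;
    }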

In your code, you are using an if/else condition to read from the camera streams, which will also delay the stream. You are effectively reading data from cameras 0 and 1 alternately, which cuts the frame rate exactly in half: both frames are captured by the camera at the same instant, but while you read camera 0 you lose the camera 1 data, and vice versa. Therefore your fps output will be halved.

So, addressing your issues now:

  • Frame fluctuation can be fixed by using a fixed frame rate and resolution.
  • Frame rate fluctuation can be fixed by using a lower resolution and better image capture code.
  • Frame read-times can, again, be fixed with 2-thread logic for camera data capture.

The random appearance of past frames in your image stream is due to the data buffer not getting rid of past data. To fix this, you must set up a FIFO buffer and clear it at regular intervals. The main reason for this issue is that you are not in control of the subscriber data buffers in ROS 2 (the KeepLast(10) you are using), and they are not getting cleared properly.
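
ROS 2 does not hand you that buffer directly, but one way to realize the idea on the application side is a small mutex-guarded deque that drops the oldest frame when full (a sketch, not a ROS 2 facility):

    #include <deque>
    #include <mutex>
    #include <sensor_msgs/msg/image.hpp>

    // Fixed-capacity FIFO: old frames are discarded, never replayed.
    class FrameFifo {
    public:
        explicit FrameFifo(std::size_t capacity) : capacity_(capacity) {}

        void push(sensor_msgs::msg::Image::SharedPtr frame) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (buffer_.size() >= capacity_)
                buffer_.pop_front();              // drop the oldest frame
            buffer_.push_back(std::move(frame));
        }

        sensor_msgs::msg::Image::SharedPtr pop() {
            std::lock_guard<std::mutex> lock(mutex_);
            if (buffer_.empty())
                return nullptr;
            auto frame = buffer_.front();
            buffer_.pop_front();
            return frame;
        }

    private:
        std::size_t capacity_;
        std::deque<sensor_msgs::msg::Image::SharedPtr> buffer_;
        std::mutex mutex_;
    };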

Some good practices:

  • When you are capturing camera frames from the stereo camera, do minimal processing at the data acquisition node. It is better to output both images horizontally concatenated on one topic rather than having two topics each carrying a single frame.
  • During image processing, you can un-concatenate the image data to apply stitching or depth recognition, as in the sketch below.
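
Splitting the concatenated [left | right] frame back out at the processing node is cheap; it is just two ROI views into the same buffer, with no pixel copy:

    #include <opencv2/core.hpp>
    #include <utility>

    std::pair<cv::Mat, cv::Mat> split_stereo(const cv::Mat& combined)
    {
        const int half = combined.cols / 2;
        cv::Mat left  = combined(cv::Rect(0,    0, half, combined.rows));
        cv::Mat right = combined(cv::Rect(half, 0, half, combined.rows));
        return {left, right};  // views, not copies
    }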

I know I have expressed my thoughts incoherently. I hope this helps.

Regards,
Girish


Thank you for the detailed analysis. I very much appreciate your advice.

Regarding having a fixed framerate and resolution: the camera’s driver already does a soft-sync, so frames are pretty consistent at 60 FPS, in RGB888 form. The problem seems to reside on the ROS end…

I like the suggestion of stitching the two frames together; however, when I just display the two image streams (left and right) with OpenCV’s imshow(), they appear to be just fine… Still, I think this might be a better design that I should go with in the future, although stitching two frames together seems like an additional computational cost.

I’m currently doing the image processing (converting to a ROS image) on a separate thread (the camera processing thread), which is a sub-thread of the camera capture stream (a callback that later calls the processing thread). Is this what you mean by minimal processing at the data acquisition node?

Regarding the FIFO buffer, is this something ROS 2 provides? If not, do you have any resources I could look into for implementing one? I think it’s best to take one step at a time, and fixing the “order” of the frames should be the priority; from there I’ll slowly figure out why the frame rate is so slow…

Hi @Genozen ,

This is the problem: 60 fps is too much. Try 15 or 30 fps for image capture. 60 fps with R8G8B8 (24-bit RGB) is really overkill. If you still want to use this configuration, I have nothing else to say other than “you will have problems”. 60 fps is only good for video recording/storage, not for image processing!

I think you are overthinking the “stitching” part (with things like advanced image matching and mosaicing algorithms). I am telling you to just concatenate the images together. Stitching here is basically averaging the pixel values of the overlapping image columns from the left and right cameras. You don’t necessarily have to average, either; it could also be the minimum or maximum of the overlapping pixels. The processing time for this averaging will be much less than the capture time for one frame (at 60 fps, one frame’s capture time = 1/60 s ≈ 16.67 ms). You will have at most 50% image overlap between the two cameras for an object at infinity.
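
A rough sketch of that averaging, assuming you already know how many columns overlap (the overlap_cols value is hypothetical and would come from your camera geometry):

    #include <opencv2/core.hpp>
    #include <vector>

    cv::Mat blend_overlap(const cv::Mat& left, const cv::Mat& right, int overlap_cols)
    {
        // Non-overlapping parts: left minus its right edge, right minus its left edge.
        cv::Mat left_part  = left(cv::Rect(0, 0, left.cols - overlap_cols, left.rows));
        cv::Mat right_part = right(cv::Rect(overlap_cols, 0, right.cols - overlap_cols, right.rows));

        // Overlap: average the shared columns (min or max would work too).
        cv::Mat l_ov = left(cv::Rect(left.cols - overlap_cols, 0, overlap_cols, left.rows));
        cv::Mat r_ov = right(cv::Rect(0, 0, overlap_cols, right.rows));
        cv::Mat blended;
        cv::addWeighted(l_ov, 0.5, r_ov, 0.5, 0.0, blended);

        cv::Mat out;
        cv::hconcat(std::vector<cv::Mat>{left_part, blended, right_part}, out);
        return out;
    }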

You need to use two different programs (nodes). You should not have image capture and image processing in the same program, even as two separate threads. You should have one node that always captures the camera frames and another program that does all the processing you want. This way you will have minimal processing at the data acquisition (capture) node.
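
The processing side of that split could be as small as the skeleton below (the topic name stereo/combined is made up for illustration; the capture node would publish on it):

    #include <rclcpp/rclcpp.hpp>
    #include <sensor_msgs/msg/image.hpp>

    class ImageProcessor : public rclcpp::Node {
    public:
        ImageProcessor() : Node("image_processor") {
            sub_ = create_subscription<sensor_msgs::msg::Image>(
                "stereo/combined", rclcpp::SensorDataQoS(),
                [](sensor_msgs::msg::Image::SharedPtr msg) {
                    // All heavy processing lives here, away from the capture node.
                    (void)msg;
                });
        }

    private:
        rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr sub_;
    };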

I am not exactly sure if there is a specific ROS 2 package for a FIFO buffer. Google is your friend if you want to know more about FIFO/LIFO (queue/stack) buffering. You will essentially set your KeepLast to a value of 1 or 2 and use a buffer that can hold half of the maximum (or desired) fps of your camera.
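
Applied to the QoS from your original post, that advice would just shrink the history depth so only the newest frame ever sits in the queue:

    rclcpp::QoS qos_settings = rclcpp::QoS(rclcpp::KeepLast(1))
        .reliability(RMW_QOS_POLICY_RELIABILITY_BEST_EFFORT)
        .durability(RMW_QOS_POLICY_DURABILITY_VOLATILE);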

I hope this helps.

Regards,
Girish


Thanks again for all the suggestions. I will give them a try and observe the differences!

  • Separate the capture node from the image-processing node
  • Lower the FPS or use a smaller image format
  • Implement some form of FIFO buffering
