Screen Recording on Linux with FFmpeg and NVIDIA CUDA Hardware Acceleration

Recently, one of my colleagues ran into problems with DRM "protected" content and had to resort to using TeamViewer to record content he had legitimately paid for.

This situation struck me as absurd: not the recording itself, but the fact that someone who paid for content was prevented from downloading or recording it for personal use, a topic I recently (somewhat) covered in my blog post The Broken Digital Promise.

But I digress. I was convinced there had to be a more efficient solution than two computers connected via TeamViewer, and I suspected that, whatever it was, it would involve FFmpeg in some capacity. As it turns out, FFmpeg alone was indeed the answer.

The complete script is available on my GitHub. It contains the full command along with the ability to choose which window to capture.

If you find ways to improve it (and I am sure there are many), please feel free to open a PR or email me. Anyway, let's start the guide to screen recording on Linux with FFmpeg and NVIDIA CUDA hardware acceleration.

The Command

Let's start with the complete command, and then I'll break down each part so you understand exactly what's happening:

```sh
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
	-f x11grab \
	-thread_queue_size 4096 \
	-window_id "$WINDOW_ID" \
	-video_size "${WIDTH}x${HEIGHT}" \
	-framerate 60 \
	-i "$DISPLAY" \
	-f pulse -i "$AUDIO_OUTPUT" -ac 2 \
	-c:a aac -b:a 192k \
	-c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M \
	-qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 \
	-rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 \
	-vsync 1 -r 60 \
	output.mkv
```

Breaking Down the Command

Hardware Acceleration Setup

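```sh
-hwaccel cuda -hwaccel_output_format cuda
```

These two options request CUDA hardware decoding and ask FFmpeg to keep the decoded frames in GPU memory. Strictly speaking, they only act on compressed input that NVIDIA's decoder can handle; x11grab delivers raw frames, so in this command the GPU work is done by the h264_nvenc encoder configured further down.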
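Both the CUDA hwaccel and the h264_nvenc encoder only exist if your FFmpeg build was compiled with NVIDIA support, so it is worth checking that before anything else. The sketch below assumes an ffmpeg binary on your PATH and relies on its -hwaccels and -encoders listings; the helper function names are hypothetical.

```sh
#!/bin/sh
# Minimal sketch: check whether this FFmpeg build lists the CUDA hwaccel
# and the h264_nvenc encoder. Helper names are hypothetical.

# lists_hwaccel NAME: read `ffmpeg -hwaccels` output on stdin and succeed
# if NAME appears on a line of its own.
lists_hwaccel() {
	grep -qx "[[:space:]]*$1[[:space:]]*"
}

# lists_encoder NAME: read `ffmpeg -encoders` output on stdin and succeed
# if NAME appears as a whole word (h264_nvenc, but not hevc_nvenc).
lists_encoder() {
	grep -qw "$1"
}

check_nvidia_support() {
	if ! ffmpeg -hide_banner -hwaccels | lists_hwaccel cuda; then
		echo "cuda hwaccel not available in this ffmpeg build" >&2
		return 1
	fi
	if ! ffmpeg -hide_banner -encoders | lists_encoder h264_nvenc; then
		echo "h264_nvenc encoder not available in this ffmpeg build" >&2
		return 1
	fi
	echo "ffmpeg build supports cuda and h264_nvenc"
}
```

Run check_nvidia_support once; if it fails, the full command above will not work with that particular ffmpeg binary.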

Video Input Configuration

```sh
-f x11grab \
-thread_queue_size 4096 \
-window_id "$WINDOW_ID" \
-video_size "${WIDTH}x${HEIGHT}" \
-framerate 60 \
-i "$DISPLAY"
```
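The variables WINDOW_ID, WIDTH and HEIGHT have to come from somewhere. As a minimal sketch (a hypothetical helper, not the script from my GitHub), xwininfo can let you click the target window and report its id and size. Rounding the size down to even numbers is an assumption on my part, made because 4:2:0 video works on 2×2 chroma blocks.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helper, not the GitHub script): let the user
# click a window with xwininfo and derive WINDOW_ID, WIDTH and HEIGHT.

# parse_xwininfo: read `xwininfo` output on stdin and print
# "WINDOW_ID WIDTH HEIGHT". Width and height are rounded down to even
# numbers (assumption: 4:2:0 video uses 2x2 chroma blocks).
parse_xwininfo() {
	awk '
		/Window id:/ { id = $4 }
		$1 == "Width:" { w = $2 }
		$1 == "Height:" { h = $2 }
		END {
			if (id == "" || w == "" || h == "") exit 1
			printf "%s %d %d\n", id, w - w % 2, h - h % 2
		}'
}

select_window() {
	echo "Click the window you want to record..." >&2
	info=$(xwininfo | parse_xwininfo) || return 1
	# Intentional word splitting of "ID WIDTH HEIGHT" into $1 $2 $3.
	# shellcheck disable=SC2086
	set -- $info
	WINDOW_ID=$1
	WIDTH=$2
	HEIGHT=$3
	export WINDOW_ID WIDTH HEIGHT
}
```

After select_window returns, the exported variables can be used directly in the x11grab options.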

Audio Input Configuration

```sh
-f pulse -i "$AUDIO_OUTPUT" -ac 2
```
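To record what you hear rather than a microphone, AUDIO_OUTPUT should name a "monitor" source, the capture side of an output device. A minimal sketch, assuming PulseAudio or PipeWire's pulse server and its pactl tool; the helper names are hypothetical, and `pactl get-default-sink` is only available in newer pactl versions, hence the fallback.

```sh
#!/bin/sh
# Minimal sketch: find a monitor source to use as AUDIO_OUTPUT with -f pulse.
# Assumes pactl (PulseAudio or PipeWire's pulse server); names are hypothetical.

# first_monitor_source: read `pactl list short sources` on stdin and print
# the name (second tab-separated column) of the first source whose name
# ends in ".monitor"; fail if there is none.
first_monitor_source() {
	awk -F '\t' '$2 ~ /\.monitor$/ { print $2; found = 1; exit }
		END { exit !found }'
}

# default_monitor_source: monitor of the default sink if pactl can report
# it, otherwise the first monitor source listed.
default_monitor_source() {
	if sink=$(pactl get-default-sink 2>/dev/null) && [ -n "$sink" ]; then
		printf '%s.monitor\n' "$sink"
	else
		pactl list short sources | first_monitor_source
	fi
}
```

Usage: `AUDIO_OUTPUT=$(default_monitor_source)` before running the full command.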

Audio Encoding

```sh
-c:a aac -b:a 192k
```

Video Encoding (NVIDIA Hardware Accelerated)

```sh
-c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M
```

Advanced Encoding Parameters

```sh
-qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 \
-rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1
```

Output Configuration

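```sh
-vsync 1 -r 60 \
output.mkv
```

On FFmpeg 5.1 and newer, -vsync is deprecated in favor of -fps_mode; -fps_mode cfr is the equivalent of -vsync 1 (constant frame rate).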

Concerning buffer sizes for NVIDIA

When it comes to the bit-rate settings -b:v 8M -bufsize 8M -maxrate 10M, the optimal values depend on resolution, content type and quality requirements.

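Understanding the Parameters First

  1. -b:v 8M sets the target average video bitrate, 8 Mbit/s.
  2. -maxrate 10M sets the maximum bitrate the rate control allows, 10 Mbit/s, which limits peaks during complex scenes.
  3. -bufsize 8M sets the size of the rate-control (VBV) buffer that -maxrate is checked against: a larger buffer tolerates longer bursts above the average, a smaller one enforces the limit more strictly.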
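To put -b:v 8M in perspective, here is a back-of-the-envelope sketch of the disk space a recording needs. FFmpeg's M and k suffixes are powers of ten, so 8M is 8000 kbit/s and 192k is 192 kbit/s. The helper name is hypothetical, and it ignores container overhead and assumes the encoder actually hits its average.

```sh
#!/bin/sh
# Back-of-the-envelope sketch: megabytes (10^6 bytes) per minute for a given
# average video bitrate (-b:v) plus audio bitrate (-b:a), both in kbit/s.
# Ignores container overhead; assumes the encoder hits its target average.
mb_per_minute() {
	video_kbps=$1
	audio_kbps=$2
	# kbit/s * 1000 -> bit/s, * 60 -> bits per minute, / 8 -> bytes,
	# / 1000000 -> MB; awk keeps the fractional part.
	awk -v v="$video_kbps" -v a="$audio_kbps" \
		'BEGIN { printf "%.2f\n", (v + a) * 1000 * 60 / 8 / 1000000 }'
}
```

With the current settings, `mb_per_minute 8000 192` prints 61.44, roughly 3.7 GB per hour. Plugging in the 10M maxrate instead gives an upper bound of 76.44 MB per minute.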

Considerations for Optimal Values

Resolution-Based Recommendations

For 1080p (1920×1080):

For 1440p (2560×1440):

For 4K (3840×2160):

Buffer Size Considerations

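While keeping the buffer size equal to the target bit-rate (-bufsize = -b:v) is common, you can:

  1. Lower -bufsize for tighter rate control and a steadier bitrate, at the cost of less freedom to spend extra bits on complex moments.
  2. Raise -bufsize to let the encoder shift more bits into demanding scenes, at the cost of larger short-term bitrate spikes.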

A good rule of thumb is to set -bufsize between 1-2× your target bit-rate.

Testing Approach

The best way to find optimal values is through testing:

  1. Start with settings matched to your content type from above
  2. Record short samples with different bit-rate combinations
  3. Compare quality and file size
  4. Adjust based on your specific quality requirements and storage constraints
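The testing steps above can be sketched as a small loop that records a few 10-second full-screen samples, one per bitrate, for side-by-side comparison. The bitrate list, the 1.25× maxrate ratio (taken from 8M/10M in the main command), the sample file names and the DRY_RUN switch are all assumptions for illustration; audio is left out to keep it short.

```sh
#!/bin/sh
# Minimal sketch of the testing approach: record 10-second samples at
# several bitrates. Bitrates, file names and DRY_RUN are assumptions.

# run: print the command instead of executing it when DRY_RUN=1.
run() {
	if [ "${DRY_RUN:-0}" = 1 ]; then
		echo "$@"
	else
		"$@"
	fi
}

record_samples() {
	for kbps in 4000 6000 8000 12000; do
		maxrate=$((kbps * 5 / 4))   # same 1.25 ratio as 8M / 10M
		run ffmpeg -y -f x11grab -framerate 60 -i "${DISPLAY:-:0}" \
			-t 10 \
			-c:v h264_nvenc -preset p6 -tune hq \
			-b:v "${kbps}k" -maxrate "${maxrate}k" -bufsize "${kbps}k" \
			"sample_${kbps}k.mkv"
	done
}
```

`DRY_RUN=1 record_samples` shows the four commands without recording; plain `record_samples` produces sample_4000k.mkv through sample_12000k.mkv to compare.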

If you're trying to optimize for quality while keeping file sizes reasonable, the current settings are quite balanced for most 1080p screen-recording scenarios, but don't hesitate to experiment with the values suggested above based on your specific content.

Advanced FFmpeg and NVIDIA parameters deep dive

Let's take a deep dive into those advanced encoding parameters for NVENC in FFmpeg.

-qmin 0

This sets the minimum quantization parameter (QP) to 0, which allows the encoder to use the finest possible quantization, and therefore the most bits, whenever its rate control decides a frame or block needs it.

For 8-bit H.264 the QP scale runs from 0 to 51, where lower values mean finer quantization and 0 is the finest (near-lossless) setting. Setting qmin to 0 might be unnecessary in most cases and could lead to inefficient bitrate allocation. For most content, a value of 15-18 is often more efficient while still maintaining excellent visual quality.

The extremely low value (0) might be allocating bits to imperceptible quality improvements. A potential improvement would be -qmin 15 for a better quality-to-filesize ratio, unless you specifically need near-lossless quality in some frames.
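For context on what that lower bound controls: in H.264 the quantizer step size roughly doubles for every increase of 6 in QP, starting at 0.625 for QP 0 (the exact steps for QP 0 to 5 are 0.625, 0.6875, 0.8125, 0.875, 1 and 1.125, then the pattern repeats with each value doubled every 6 QP). A worked comparison of -qmin 0 and -qmin 15:

$$
Q_{\text{step}}(QP) \approx 0.625 \cdot 2^{QP/6}
$$

$$
\frac{Q_{\text{step}}(15)}{Q_{\text{step}}(0)} = \frac{3.5}{0.625} = 5.6 \approx 2^{15/6} \approx 5.66
$$

So -qmin 15 keeps the finest step the encoder may use at 3.5, about 5.6 times coarser than QP 0's 0.625; that range of extra precision is what -qmin 0 makes available.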

-g 120

This sets the GOP (Group of Pictures) size to 120 frames, meaning a keyframe is inserted at least once every 120 frames.

At 60fps, this means a keyframe every 2 seconds. A longer GOP such as 250 frames (approximately 4.17 seconds at 60fps) might be too infrequent for screen recording, especially with applications that have scene changes (like switching windows). Shorter GOPs provide better seeking performance and recovery from packet loss, which is why 120 frames is a sensible choice for screen recordings with frequent content changes.
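The GOP arithmetic is simple but easy to get wrong when the frame rate changes. A small sketch with hypothetical helper names converts between a keyframe interval in seconds and a -g value in frames.

```sh
#!/bin/sh
# Small arithmetic sketch for choosing -g at a given frame rate.
# Helper names are hypothetical.

# gop_frames SECONDS FPS: GOP length in frames, rounded to nearest integer.
gop_frames() {
	awk -v s="$1" -v f="$2" 'BEGIN { printf "%d\n", s * f + 0.5 }'
}

# keyframe_interval FRAMES FPS: seconds between keyframes, 2 decimals.
keyframe_interval() {
	awk -v g="$1" -v f="$2" 'BEGIN { printf "%.2f\n", g / f }'
}
```

For example, `gop_frames 2 60` prints 120, while `keyframe_interval 250 60` prints 4.17. If you record at 30fps instead, `gop_frames 2 30` gives 60 for the same two-second interval.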

-bf 3

This parameter sets the maximum number of B-frames between reference frames to 3.

While B-frames improve compression efficiency, they increase encoding complexity and latency. For screen content, especially text and UI elements, B-frames can sometimes cause temporal artifacts around sharp edges. Depending on the content, -bf 2 might provide a better balance for screen recordings, particularly if there's text or UI elements with sharp edges.

-b_ref_mode middle

This sets the B-frame reference mode to "middle", meaning that in each run of consecutive B-frames the middle one is itself used as a reference for the others (a B-frame pyramid).

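NVENC supports three modes (whether each one is available depends on the GPU):

  1. disabled: B-frames are never used as references.
  2. each: every B-frame can be used as a reference.
  3. middle: only the middle B-frame of each run (number of B-frames / 2) is used as a reference.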

The "middle" setting is generally good for higher quality but might not be optimal for all content types. Screen content often has predictable motion patterns unlike natural video.

-temporal-aq 1

This enables temporal adaptive quantization, which adjusts quantization parameters based on temporal complexity.

Temporal AQ works well for natural video but can sometimes over-optimize for screen content. It may allocate too few bits to static areas that nonetheless need precise reproduction (like text). For screen recordings with lots of text or detailed UI elements, a better configuration might be -spatial-aq 1 either instead of or in addition to temporal-aq.

-rc-lookahead 20

This sets the rate control lookahead to 20 frames, allowing the encoder to analyze 20 frames ahead for better bit allocation decisions.

Higher values improve quality but increase encoding latency. For screen recording, content changes can be more abrupt than in natural video, suggesting a potential benefit from increased lookahead. If your system can handle it, -rc-lookahead 40 might provide better quality during rapid scene changes in your screen recordings.

-i_qfactor 0.75 and -b_qfactor 1.1

These set the quantizer scale factors for I-frames (0.75) and B-frames (1.1) relative to P-frames.

These values tell the encoder to use higher quality (lower QP) for I-frames and lower quality (higher QP) for B-frames compared to P-frames. For screen content, I-frames are especially important as they establish the baseline quality.

For screen recording, an even lower I-frame factor, such as -i_qfactor 0.6, may bring more benefit by keeping scene changes and static elements crisp.
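A simplified illustration of what these factors mean: scale a P-frame QP by the factor. This sketch uses a hypothetical helper and ignores the matching offsets (-i_qoffset, -b_qoffset) as well as NVENC's own rate-control decisions, so treat the numbers as a rough guide rather than what the encoder will actually pick.

```sh
#!/bin/sh
# Simplified illustration: P-frame QP scaled by -i_qfactor / -b_qfactor.
# Ignores -i_qoffset / -b_qoffset and NVENC's rate control; hypothetical helper.

# scaled_qp P_QP FACTOR: P_QP * FACTOR, rounded to the nearest integer.
scaled_qp() {
	awk -v q="$1" -v f="$2" 'BEGIN { printf "%.0f\n", q * f }'
}

for factor in 0.75 0.6 1.1; do
	printf 'factor %s: QP %s\n' "$factor" "$(scaled_qp 24 "$factor")"
done
```

With a P-frame QP of 24, the current -i_qfactor 0.75 gives I-frames around QP 18, the suggested 0.6 around QP 14, and -b_qfactor 1.1 puts B-frames around QP 26.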

Advanced Optimizations - Not Currently Used

There are several NVENC parameters not in the current command that could be beneficial:

  1. Spatial AQ: -spatial-aq 1 can improve quality in spatially complex regions like text.
  2. AQ Strength: If using spatial-aq, you can tune it with -aq-strength 8 (values 1-15, higher values provide stronger adaptation; 8 is the default).
  3. Weighted Prediction: -weighted_pred 1 can improve quality during fades and transitions.
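  4. No SCENECUT: For screen recording, -no-scenecut 1 disables adaptive I-frame insertion at scene cuts (h264_nvenc applies it when lookahead is enabled), which can help maintain consistent quality across the entire recording. The generic -sc_threshold 0 is aimed at encoders such as libx264.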
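  5. Two-Pass Encoding: -2pass 1 enables NVENC's two-pass mode, which can improve quality at the same bitrate at some cost in encoding speed. This is multi-pass encoding of each frame inside the encoder, not a separate pass over the whole file; newer FFmpeg builds also expose it as -multipass (qres or fullres).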
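As a minimal sketch of how some of these options could be combined, the hypothetical helper below prints the video options from the main command with spatial AQ and scene-cut suppression added. Whether these tweaks actually help is something to confirm with sample recordings. -no-scenecut is h264_nvenc's option for disabling scene-cut I-frames and only applies with lookahead enabled. Weighted prediction and two-pass are left out to keep it small.

```sh
#!/bin/sh
# Minimal sketch: main-command video options plus screen-content tweaks.
# The selection of tweaks is an assumption to test, not a recommendation.
# Prints the options so they can be spliced into your own ffmpeg call.
screen_video_opts() {
	set -- -c:v h264_nvenc -preset p6 -tune hq \
		-b:v 8M -bufsize 8M -maxrate 10M \
		-qmin 0 -g 120 -bf 3 -b_ref_mode middle \
		-rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 \
		-temporal-aq 1 \
		-spatial-aq 1 -aq-strength 8 \
		-no-scenecut 1
	printf '%s ' "$@"
	echo
}
```

Because none of the values contain spaces, the output can be spliced unquoted, e.g. `ffmpeg ... $(screen_video_opts) output.mkv`.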

Conclusion

With FFmpeg and NVIDIA's CUDA hardware acceleration, you can create high-quality screen recordings on Linux without overloading your CPU. The command provided gives you an excellent starting point.

Created at: 11-Apr-2025