Screen Recording on Linux with FFmpeg and NVIDIA CUDA Hardware Acceleration

Recently, one of my colleagues encountered issues with DRM "protected" content where he had to resort to using TeamViewer to record content he had legitimately paid for.

This situation struck me as absurd: not the recording itself, but the fact that someone who paid for content is prevented from downloading or recording it for personal use. I recently covered this (somewhat) in my blog post The Broken Digital Promise.

But I digress. I was convinced there must be a more efficient solution than using two computers connected via TeamViewer. Whatever the solution, I suspected it would likely involve FFmpeg in some capacity. As it turns out, FFmpeg alone was indeed the answer.

The complete script is available on my GitHub. It contains the complete command and lets you choose which window to capture.

If you find ways to improve it (and I am sure there are many), please feel free to open a PR or email me. Anyway, let's start the guide to screen recording on Linux with FFmpeg and NVIDIA CUDA hardware acceleration.

The Command

Let's start with the complete command, and then I'll break down each part so you understand exactly what's happening:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
	-f x11grab \
	-thread_queue_size 4096 \
	-window_id $WINDOW_ID \
	-video_size ${WIDTH}x$HEIGHT \
	-framerate 60 \
	-i $DISPLAY \
	-f pulse -i $AUDIO_OUTPUT -ac 2 \
	-c:a aac -b:a 192k \
	-c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M \
	-qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 \
	-rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 \
	-vsync 1 -r 60 \
	output.mkv

Breaking Down the Command

Hardware Acceleration Setup

-hwaccel cuda -hwaccel_output_format cuda

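-hwaccel cuda: Enables NVIDIA CUDA hardware acceleration for decoding
-hwaccel_output_format cuda: Makes sure decoded frames stay in GPU memory for processing
Note: x11grab delivers raw, uncompressed frames, so there is little to decode on this input; the GPU work in this command is done mainly by the h264_nvenc encoder described below.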
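
Before running the full command, it can be worth confirming that your FFmpeg build actually includes the NVENC encoder. The snippet below is a minimal sketch: the helper name has_nvenc is my own, and it simply searches the encoder listing it is given on standard input.

```shell
# Succeeds if the encoder listing read from stdin mentions h264_nvenc.
# (has_nvenc is a hypothetical helper name, not part of FFmpeg.)
has_nvenc() {
    grep -q 'h264_nvenc'
}

# Typical use (requires ffmpeg on PATH):
#   if ffmpeg -hide_banner -encoders 2>/dev/null | has_nvenc; then
#       echo "h264_nvenc is available"
#   else
#       echo "this FFmpeg build lacks h264_nvenc" >&2
#   fi
```

A build can list the encoder and still fail at run time if the NVIDIA driver is missing or too old, so treat this as a first check only.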

Video Input Configuration

-f x11grab \
-thread_queue_size 4096 \
-window_id $WINDOW_ID \
-video_size ${WIDTH}x$HEIGHT \
-framerate 60 \
-i $DISPLAY

-f x11grab: Specifies the input format as X11 screen grabbing
-thread_queue_size 4096: Raises the maximum number of packets that can be queued while reading from this input, which helps prevent dropped frames when the system is busy
-window_id $WINDOW_ID: Captures a specific window instead of the entire screen
-video_size ${WIDTH}x$HEIGHT: Sets the recording resolution
-framerate 60: Captures at 60 frames per second
-i $DISPLAY: Specifies the display to capture (typically :0)
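
The variables WINDOW_ID, WIDTH and HEIGHT have to come from somewhere. One possible way to fill them, assuming the xwininfo utility is installed, is to parse its output. The helper name parse_xwininfo below is hypothetical and is not necessarily how the script on GitHub does it.

```shell
# Read `xwininfo` output from stdin and print "<window id> <width> <height>".
# (parse_xwininfo is a hypothetical helper name used for illustration.)
parse_xwininfo() {
    awk '
        /Window id:/ { id = $4 }   # e.g.: xwininfo: Window id: 0x3e00007 "Title"
        /^ *Width:/  { w = $2 }    # e.g.:   Width: 1920
        /^ *Height:/ { h = $2 }    # e.g.:   Height: 1080
        END { printf "%s %s %s\n", id, w, h }
    '
}

# Typical use: xwininfo waits for you to click the window to record.
#   set -- $(xwininfo | parse_xwininfo)
#   WINDOW_ID=$1 WIDTH=$2 HEIGHT=$3
```

The anchored patterns skip similar lines such as "Border width", so only the window's own size is picked up.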

Audio Input Configuration

-f pulse -i $AUDIO_OUTPUT -ac 2

-f pulse: Uses PulseAudio as the audio input format
-i $AUDIO_OUTPUT: Specifies the audio source. To record what you hear, this is normally the monitor source of your output device (a source name ending in .monitor)
-ac 2: Sets audio to stereo (2 channels)
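
To find a value for AUDIO_OUTPUT, you can list the PulseAudio sources and pick a monitor source. This is a small sketch assuming pactl is available (it also works with PipeWire's PulseAudio compatibility layer); the helper name list_monitor_sources is my own.

```shell
# Print the names of monitor sources found in `pactl list short sources`
# output (read from stdin). The short listing is tab-separated and the
# source name is the second column.
list_monitor_sources() {
    awk -F '\t' '$2 ~ /\.monitor$/ { print $2 }'
}

# Typical use: take the first monitor source found.
#   AUDIO_OUTPUT=$(pactl list short sources | list_monitor_sources | head -n 1)
```

If you have several output devices, check the list by hand and choose the monitor that belongs to the device you are actually listening on.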

Audio Encoding

-c:a aac -b:a 192k

-c:a aac: Uses AAC codec for audio
-b:a 192k: Sets audio bit-rate to 192 kbps

Video Encoding (NVIDIA Hardware Accelerated)

-c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M

-c:v h264_nvenc: Uses NVIDIA's hardware H.264 encoder
-preset p6: Sets encoding preset to p6 (higher quality, slower encoding)
-tune hq: Optimizes for high quality
-b:v 8M: Sets video bit-rate to 8 Mbps
-bufsize 8M: Sets the rate-control buffer size to 8 Mbit (the value is in bits, not bytes)
-maxrate 10M: Sets maximum bit-rate to 10 Mbps

Advanced Encoding Parameters

-qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 \
-rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1

-qmin 0: Sets minimum quantization parameter to 0 (higher quality)
-g 120: Sets keyframe interval to 120 frames
-bf 3: Uses 3 B-frames between reference frames
-b_ref_mode middle: Allows the middle B-frame of each run of B-frames to be used as a reference frame
-temporal-aq 1: Enables temporal adaptive quantization for better quality
-rc-lookahead 20: Sets rate control lookahead to 20 frames
-i_qfactor 0.75: Sets I-frame quantizer factor
-b_qfactor 1.1: Sets B-frame quantizer factor

Output Configuration

-vsync 1 -r 60 \
output.mkv

-vsync 1: Selects constant frame rate sync (duplicates or drops frames to maintain sync); recent FFmpeg releases deprecate -vsync in favor of -fps_mode cfr
-r 60: Sets output frame rate to 60 fps
output.mkv: The output file name.

Concerning buffer sizes for NVIDIA

When it comes to the bit-rate settings -b:v 8M -bufsize 8M -maxrate 10M, the optimal values depend on content type and quality requirements.

Understanding the Parameters First

-b:v (video bit-rate): The target average bit-rate
-bufsize: Size of the buffer used for rate control
-maxrate: Maximum bit-rate allowed at any point

Considerations for Optimal Values

For screen recordings with minimal motion (like slideshows, programming): -b:v 4M -bufsize 4M -maxrate 6M would likely be sufficient
For content with moderate motion (like application demos, the current settings): -b:v 8M -bufsize 8M -maxrate 10M
For high-motion content (gaming, fast action): -b:v 12M -bufsize 12M -maxrate 15M
For very high quality recordings (professional use): -b:v 15M -bufsize 15M -maxrate 20M
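
If you switch between these profiles often, the four suggestions above can be wrapped in a small helper. This is only a sketch: the function name bitrate_args and the profile names (low, moderate, high, max) are labels I made up for illustration.

```shell
# Print the rate-control flags suggested above for a content profile.
# Profile names are hypothetical labels, not FFmpeg options.
bitrate_args() {
    case "$1" in
        low)      echo "-b:v 4M -bufsize 4M -maxrate 6M" ;;
        moderate) echo "-b:v 8M -bufsize 8M -maxrate 10M" ;;
        high)     echo "-b:v 12M -bufsize 12M -maxrate 15M" ;;
        max)      echo "-b:v 15M -bufsize 15M -maxrate 20M" ;;
        *)        echo "unknown content profile: $1" >&2; return 1 ;;
    esac
}

# Typical use (left unquoted on purpose so the flags split into words):
#   ffmpeg ... -c:v h264_nvenc $(bitrate_args moderate) ... output.mkv
```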

Resolution-Based Recommendations

For 1080p (1920×1080):

Low motion: 4-6 Mbps
Medium motion: 6-10 Mbps (current command settings)
High motion: 10-15 Mbps

For 1440p (2560×1440):

Increase values by approximately 50%

For 4K (3840×2160):

Low motion: 12-15 Mbps
Medium motion: 15-25 Mbps
High motion: 25-40 Mbps
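
To get a feel for what these bit-rates mean on disk, a rough estimate is (video bit-rate + audio bit-rate) × duration ÷ 8. The sketch below uses integer arithmetic and ignores container overhead, and the encoder will not hit the target exactly, so treat the result as a ballpark figure. The function name estimate_size_mb is my own.

```shell
# Rough file size in megabytes (decimal) for a recording.
# Arguments: video bit-rate in kbit/s, audio bit-rate in kbit/s, duration in seconds.
estimate_size_mb() {
    local video_kbps=$1 audio_kbps=$2 seconds=$3
    # kbit/s * s = kbit; /8 = kilobytes; /1000 = megabytes
    echo $(( (video_kbps + audio_kbps) * seconds / 8 / 1000 ))
}

# One hour at the settings used in this article (8 Mbps video, 192 kbps audio):
estimate_size_mb 8000 192 3600   # prints 3686
```

So an hour of recording at the current settings lands at roughly 3.7 GB.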

Buffer Size Considerations

While keeping the buffer size equal to the target bit-rate (-bufsize = -b:v) is common, you can:

Increase buffer size for more consistent quality (fewer quality fluctuations)
Decrease it for more responsive bit-rate changes (better for varying content)

A good rule of thumb is to set -bufsize to between 1× and 2× your target bit-rate.
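
One common way to reason about -bufsize is to express it as the time window it covers at the peak rate, that is, bufsize ÷ maxrate. The snippet below is a sketch of that arithmetic only; the function name vbv_window_ms is hypothetical, and the interpretation is a simplification of how rate control really behaves.

```shell
# Approximate time window (in milliseconds) that the rate-control buffer
# covers at the peak rate. Arguments: bufsize in kbit, maxrate in kbit/s.
vbv_window_ms() {
    local bufsize_kbit=$1 maxrate_kbps=$2
    echo $(( bufsize_kbit * 1000 / maxrate_kbps ))
}

vbv_window_ms 8000 10000    # current settings: prints 800
vbv_window_ms 16000 10000   # bufsize at 2x the target: prints 1600
```

A longer window gives the encoder more room to spend bits on a sudden change, while a shorter one keeps the bit-rate closer to the target at every moment.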

Testing Approach

The best way to find optimal values is through testing:

Start with settings matched to your content type from above
Record short samples with different bit-rate combinations
Compare quality and file size
Adjust based on your specific quality requirements and storage constraints
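
The steps above can be sketched as a small planning helper that prints the flags for a series of short test recordings. Everything here is an assumption for illustration: the function name sample_plan, the 10 second sample length (-t 10), the file naming, and the choice of a maxrate 1.25× the target (the same ratio as 8M/10M in the main command).

```shell
# Print, for each target bit-rate in Mbps, the output flags and file name
# of a 10 second test recording.
sample_plan() {
    local mbps
    for mbps in "$@"; do
        printf -- '-t 10 -b:v %sM -bufsize %sM -maxrate %sk sample_%sM.mkv\n' \
            "$mbps" "$mbps" "$(( mbps * 1250 ))" "$mbps"
    done
}

sample_plan 4 8 12
```

Each printed line replaces the bit-rate flags and the output file name in the main command. Afterwards, compare the samples side by side and check their sizes with ls -lh.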

If you're trying to optimize for quality while keeping file sizes reasonable, the current settings are actually quite balanced for most 1080p screen recording scenarios, but don't hesitate to experiment with the values I've suggested based on your specific content.

Advanced FFmpeg and NVIDIA parameters deep dive

Let's take a deep dive into those advanced encoding parameters for NVENC in FFmpeg.

`-qmin 0`

This sets the minimum quantization parameter (QP) value to 0, which is essentially telling the encoder to prioritize quality at all costs when needed.

The QP scale typically runs from 0-51, with 0 being lossless. Setting qmin to 0 might be unnecessary in most cases and could lead to inefficient bitrate allocation. For most content, a value of 15-18 is often more efficient while still maintaining excellent visual quality.

The extremely low value (0) might be allocating bits to imperceptible quality improvements. A potential improvement would be -qmin 15 for a better quality-to-filesize ratio, unless you specifically need near-lossless quality in some frames.

`-g 120`

This sets the GOP (Group of Pictures) size to 120 frames, meaning a keyframe is inserted every 120 frames.

At 60fps, this means a keyframe every 2 seconds. A longer GOP such as -g 250 would place keyframes approximately 4.17 seconds apart, which might be too infrequent for screen recording, especially with applications that have scene changes (like switching windows). Shorter GOPs provide better seeking performance and recovery from packet loss, which is why -g 120 is a reasonable choice for screen recordings with frequent content changes.
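
The relationship between -g, the frame rate and the keyframe spacing is simple arithmetic, sketched below. The function names are my own and the results use integer division.

```shell
# GOP size (in frames) for a desired keyframe spacing.
# Arguments: frames per second, spacing in seconds.
gop_frames() {
    local fps=$1 seconds=$2
    echo $(( fps * seconds ))
}

# Keyframe spacing in milliseconds for a given GOP size.
# Arguments: GOP size in frames, frames per second.
keyframe_interval_ms() {
    local gop=$1 fps=$2
    echo $(( gop * 1000 / fps ))
}

gop_frames 60 2               # prints 120
keyframe_interval_ms 250 60   # prints 4166
```

If you change -framerate, remember to scale -g with it, otherwise the keyframe spacing in seconds changes too.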

`-bf 3`

This parameter sets the maximum number of B-frames between reference frames to 3.

While B-frames improve compression efficiency, they increase encoding complexity and latency. For screen content, especially text and UI elements, B-frames can sometimes cause temporal artifacts around sharp edges. Depending on the content, -bf 2 might provide a better balance for screen recordings, particularly if there's text or UI elements with sharp edges.

`-b_ref_mode middle`

This sets the B-frame reference mode to "middle," meaning the middle B-frame in each run of consecutive B-frames may itself be used as a reference by other frames.

NVENC supports three modes:

disabled
each
middle

The "middle" setting is generally good for higher quality but might not be optimal for all content types. Screen content often has predictable motion patterns unlike natural video.

`-temporal-aq 1`

This enables temporal adaptive quantization, which adjusts quantization parameters based on temporal complexity.

Temporal AQ works well for natural video but can sometimes over-optimize for screen content. It may allocate too few bits to static areas that nonetheless need precise reproduction (like text). For screen recordings with lots of text or detailed UI elements, a better configuration might be -spatial-aq 1 either instead of or in addition to temporal-aq.

`-rc-lookahead 20`

This sets the rate control lookahead to 20 frames, allowing the encoder to analyze 20 frames ahead for better bit allocation decisions.

Higher values improve quality but increase encoding latency. For screen recording, content changes can be more abrupt than in natural video, suggesting a potential benefit from increased lookahead. If your system can handle it, -rc-lookahead 40 might provide better quality during rapid scene changes in your screen recordings.

`-i_qfactor 0.75` and `-b_qfactor 1.1`

These set the quantizer scale factors for I-frames (0.75) and B-frames (1.1) relative to P-frames.

These values tell the encoder to use higher quality (lower QP) for I-frames and lower quality (higher QP) for B-frames compared to P-frames. For screen content, I-frames are especially important as they establish the baseline quality.

For screen recording, an even lower I-frame factor, like -i_qfactor 0.6, may bring more benefit by ensuring crisp quality on scene changes and static elements.

Advanced Optimizations - Not Currently Used

There are several NVENC parameters not in the current command that could be beneficial:

Spatial AQ: -spatial-aq 1 can improve quality in spatially complex regions like text.
AQ Strength: If using spatial-aq, you can tune it with -aq-strength 8 (values 1-15, higher values provide stronger adaptation).
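Weighted Prediction: -weighted_pred 1 can improve quality during fades and transitions. NVIDIA's NVENC documentation lists it as unsupported when B-frames are enabled, so it would likely need -bf 0.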
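No scene cut: For screen recording, -no-scenecut 1 disables the adaptive insertion of I-frames at scene changes when lookahead is enabled, which can help maintain consistent quality across the entire recording. (The -sc_threshold option is used by other encoders such as libx264 and is not among the NVENC encoder's own options.)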
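Two-Pass Encoding: -2pass 1 enables NVENC's two-pass rate control, which can improve quality at the same bitrate. Recent FFmpeg builds expose this as -multipass qres or -multipass fullres. Both passes happen per frame inside the encoder, so this works during live capture, unlike classic two-pass file encoding.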
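
Pulling together some of the suggestions from the deep dive, a screen-oriented variant of the advanced flags might look like the sketch below. I have not benchmarked this combination, so treat it as a starting point for your own tests. The function name screen_tuned_opts is hypothetical.

```shell
# Print an alternative set of advanced encoder flags aimed at screen content,
# using values suggested in the sections above (an untested combination).
screen_tuned_opts() {
    echo "-qmin 15 -g 120 -bf 2 -spatial-aq 1 -aq-strength 8" \
         "-rc-lookahead 40 -i_qfactor 0.6 -b_qfactor 1.1"
}

# Typical use (left unquoted on purpose so the flags split into words):
#   ffmpeg ... -c:v h264_nvenc -preset p6 -tune hq \
#       -b:v 8M -bufsize 8M -maxrate 10M $(screen_tuned_opts) output.mkv
```

Change one flag at a time when comparing against the original command, so you can tell which setting made the difference.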

Conclusion

With FFmpeg and NVIDIA's CUDA hardware acceleration, you can create high-quality screen recordings on Linux without overloading your CPU. The command provided gives you an excellent starting point.

Created at: 11-Apr-2025