Recently, one of my colleagues ran into issues with DRM "protected" content and had to resort to using TeamViewer to record content he had legitimately paid for.
This situation struck me as absurd: not the recording itself, but the fact that someone who paid for content is prevented from downloading or recording it for personal use. I recently (somewhat) covered this in my blog post The Broken Digital Promise.
But I digress. I was convinced there must be a more efficient solution than using two computers connected via TeamViewer. Whatever the solution, I suspected it would likely involve FFmpeg in some capacity. As it turns out, FFmpeg alone was indeed the answer.
The complete script is available on my GitHub. It contains the complete command and lets you choose which window to capture.
If you find ways to improve it (and I am sure there are many), please feel free to open a PR or email me. Anyway, let's start the guide to screen recording on Linux with FFmpeg and NVIDIA CUDA hardware acceleration.
Let's start with the complete command, and then I'll break down each part so you understand exactly what's happening:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -f x11grab \
  -thread_queue_size 4096 \
  -window_id $WINDOW_ID \
  -video_size ${WIDTH}x$HEIGHT \
  -framerate 60 \
  -i $DISPLAY \
  -f pulse -i $AUDIO_OUTPUT -ac 2 \
  -c:a aac -b:a 192k \
  -c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M \
  -qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 \
  -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 \
  -vsync 1 -r 60 \
  output.mkv
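The command expects WINDOW_ID, WIDTH, HEIGHT and AUDIO_OUTPUT to be set beforehand. Here is a minimal sketch of one way to fill them in, assuming xwininfo and pactl are installed. The helper names parse_xwininfo and first_monitor_source are made up for this example and are not necessarily how the script on GitHub does it. Rounding the size down to even numbers is a conservative choice, since H.264 with 4:2:0 chroma generally expects even dimensions.

```shell
#!/bin/sh
# Hypothetical helpers for filling in the variables used by the command.

# Read `xwininfo` output on stdin and print "ID WIDTH HEIGHT".
# Width and height are rounded down to the nearest even number.
parse_xwininfo() {
    awk '
        /Window id:/ { id = $4 }
        /^ *Width:/  { w = $2 - ($2 % 2) }
        /^ *Height:/ { h = $2 - ($2 % 2) }
        END { print id, w, h }
    '
}

# Read `pactl list short sources` output on stdin and print the name of
# the first monitor source (the "what you hear" side of an output device).
first_monitor_source() {
    awk '$2 ~ /\.monitor$/ { print $2; exit }'
}

# Usage (xwininfo waits for you to click the window you want to record):
#   set -- $(xwininfo | parse_xwininfo)
#   WINDOW_ID=$1 WIDTH=$2 HEIGHT=$3
#   AUDIO_OUTPUT=$(pactl list short sources | first_monitor_source)
```

If you have several output devices, the first monitor source may not be the one you are listening to, so it is worth checking the pactl output by hand once.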
-hwaccel cuda -hwaccel_output_format cuda
-hwaccel cuda: Enables NVIDIA CUDA hardware acceleration for decoding
-hwaccel_output_format cuda: Makes sure decoded frames stay in GPU memory for processing

-f x11grab -thread_queue_size 4096 -window_id $WINDOW_ID -video_size ${WIDTH}x$HEIGHT -framerate 60 -i $DISPLAY
-f x11grab: Specifies the input format as X11 screen grabbing
-thread_queue_size 4096: Enlarges the queue of packets read from this input, which helps prevent dropped frames when the system is under heavy load
-window_id $WINDOW_ID: Captures a specific window instead of the entire screen
-video_size ${WIDTH}x$HEIGHT: Sets the recording resolution
-framerate 60: Captures at 60 frames per second
-i $DISPLAY: Specifies the display to capture (typically :0)

-f pulse -i $AUDIO_OUTPUT -ac 2
-f pulse: Uses PulseAudio as the audio input format
-i $AUDIO_OUTPUT: Specifies the audio source (set this to your output device name)
-ac 2: Sets audio to stereo (2 channels)

-c:a aac -b:a 192k
-c:a aac: Uses AAC codec for audio
-b:a 192k: Sets audio bit-rate to 192 kbps

-c:v h264_nvenc -preset p6 -tune hq -b:v 8M -bufsize 8M -maxrate 10M
-c:v h264_nvenc: Uses NVIDIA's hardware H.264 encoder
-preset p6: Sets encoding preset to p6 (higher quality, slower encoding)
-tune hq: Optimizes for high quality
-b:v 8M: Sets video bit-rate to 8 Mbps
-bufsize 8M: Sets the rate-control buffer size to 8 Mbit (FFmpeg measures this in bits, not bytes)
-maxrate 10M: Sets maximum bit-rate to 10 Mbps

-qmin 0 -g 120 -bf 3 -b_ref_mode middle -temporal-aq 1 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1
-qmin 0: Sets minimum quantization parameter to 0 (higher quality)
-g 120: Sets keyframe interval to 120 frames (one keyframe every 2 seconds at 60 fps)
-bf 3: Uses up to 3 B-frames between reference frames
-b_ref_mode middle: Allows the middle B-frame of each run of B-frames to be used as a reference
-temporal-aq 1: Enables temporal adaptive quantization for better quality
-rc-lookahead 20: Sets rate control lookahead to 20 frames
-i_qfactor 0.75: Sets I-frame quantizer factor
-b_qfactor 1.1: Sets B-frame quantizer factor

-vsync 1 -r 60 output.mkv
-vsync 1: Enables video sync method 1 (duplicates or drops frames to maintain sync)
-r 60: Sets output frame rate to 60 fps
output.mkv: The output file name.

When it comes to the bit-rate settings (-b:v 8M -bufsize 8M -maxrate 10M), the optimal values really depend on content type and quality requirements.
For 1080p (1920×1080):
-b:v 4M -bufsize 4M -maxrate 6M would likely be sufficient
-b:v 8M -bufsize 8M -maxrate 10M (the values used in the command above)

For 1440p (2560×1440):
-b:v 12M -bufsize 12M -maxrate 15M

For 4K (3840×2160):
-b:v 15M -bufsize 15M -maxrate 20M
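To get a feel for what these bit-rates mean on disk, here is a rough back-of-the-envelope sketch. The function name estimate_mb_per_min is made up for this example, and the result is only an approximation: the encoder treats -b:v as a target, so real files can come out smaller or larger.

```shell
#!/bin/sh
# Print the approximate recording size in MB per minute for a given
# video and audio bit-rate (both in kbit/s). Integer arithmetic, so the
# result is rounded down.
estimate_mb_per_min() {
    video_kbps=$1
    audio_kbps=$2
    echo $(( ($video_kbps + $audio_kbps) * 60 / 8 / 1000 ))
}

estimate_mb_per_min 8000 192     # 8M video + 192k audio  -> 61
estimate_mb_per_min 15000 192    # 15M video + 192k audio -> 113
```

So the 8M settings work out to roughly 61 MB per minute, or a bit under 4 GB per hour.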
Keeping the buffer size equal to the target bit-rate (-bufsize = -b:v) is common, but it is not the only option. A good rule of thumb is to set -bufsize between 1-2× your target bit-rate.
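The rule of thumb is easy to script. This is only a sketch: rate_opts is a made-up helper name, and fixing -maxrate at 125% of the target simply mirrors the 8M/10M pair from the command. That ratio is my assumption here, not a rule.

```shell
#!/bin/sh
# Print -b:v/-bufsize/-maxrate options for a target bit-rate.
#   $1: target bit-rate in kbit/s
#   $2: buffer size as a percentage of the target (100-200 follows the
#       1-2x rule of thumb; defaults to 100)
# maxrate is fixed at 125% of the target (assumption, see above).
rate_opts() {
    target=$1
    buf_pct=${2:-100}
    echo "-b:v ${target}k -bufsize $(( target * buf_pct / 100 ))k -maxrate $(( target * 125 / 100 ))k"
}

rate_opts 8000        # -b:v 8000k -bufsize 8000k -maxrate 10000k
rate_opts 8000 200    # -b:v 8000k -bufsize 16000k -maxrate 10000k
```

A larger buffer gives the encoder more room to spend bits on sudden changes, while a smaller one keeps the bit-rate closer to the target.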
The best way to find optimal values is through testing.
If you're trying to optimize for quality while keeping file sizes reasonable, the current settings are actually quite balanced for most 1080p screen recording scenarios, but don't hesitate to experiment with the values I've suggested based on your specific content.
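One way to run such a test is to record a short clip of the same content at each candidate bit-rate and compare the results. The sketch below records video only and captures from the top-left corner of the display rather than a single window. The helper names run and record_test_clip are made up for this example. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Record a 10-second, video-only test clip at the given bit-rate pair.
#   $1: target bit-rate (e.g. 8M), also used as the buffer size
#   $2: maximum bit-rate (e.g. 10M)
# WIDTH, HEIGHT and DISPLAY fall back to 1920, 1080 and :0 if unset.
record_test_clip() {
    run ffmpeg -f x11grab -framerate 60 \
        -video_size "${WIDTH:-1920}x${HEIGHT:-1080}" -i "${DISPLAY:-:0}" \
        -t 10 -c:v h264_nvenc -preset p6 -tune hq \
        -b:v "$1" -bufsize "$1" -maxrate "$2" \
        "test_$1.mkv"
}

# Usage: record one clip per candidate, then compare sizes and quality.
#   for pair in 4M:6M 8M:10M 12M:15M; do
#       record_test_clip "${pair%%:*}" "${pair##*:}"
#   done
#   ls -lh test_*.mkv
```

Watch the clips at full size and pay attention to text and fast scrolling, since that is where too low a bit-rate shows first.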
Let's take a deep dive into those advanced encoding parameters for NVENC in FFmpeg.
-qmin 0
This sets the minimum quantization parameter (QP) value to 0, which lets the encoder go all the way down to the lowest QP (highest quality) whenever rate control allows it.
The QP scale typically runs from 0-51, with lower values meaning higher quality and 0 being effectively near-lossless. Setting qmin to 0 might be unnecessary in most cases and could lead to inefficient bitrate allocation. For most content, a value of 15-18 is often more efficient while still maintaining excellent visual quality.
The extremely low value (0) might be allocating bits to imperceptible quality improvements. A potential improvement would be -qmin 15 for a better quality-to-filesize ratio, unless you specifically need near-lossless quality in some frames.
-g 120
This sets the GOP (Group of Pictures) size to 120 frames, meaning a keyframe is inserted every 120 frames.
At 60 fps, that is a keyframe every 2 seconds. A longer GOP such as -g 250 (a keyframe roughly every 4.17 seconds at 60 fps) might be too infrequent for screen recording, especially with applications that have scene changes (like switching windows). Shorter GOPs provide better seeking performance and recovery from packet loss, which is why -g 120 is a better fit for screen recordings with frequent content changes.
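The relationship between -g, the frame rate and the keyframe interval is simple multiplication, so if you record at a different frame rate you can recompute it. A tiny sketch (gop_size is a made-up helper name):

```shell
#!/bin/sh
# Print the -g value that yields one keyframe every N seconds.
#   $1: frame rate in fps
#   $2: desired keyframe interval in seconds
gop_size() {
    echo $(( $1 * $2 ))
}

gop_size 60 2    # 120
gop_size 30 2    # 60
```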
-bf 3
This parameter sets the maximum number of B-frames between reference frames to 3.
While B-frames improve compression efficiency, they increase encoding complexity and latency. For screen content, B-frames can sometimes cause temporal artifacts around the sharp edges of text and UI elements. Depending on the content, -bf 2 might provide a better balance for screen recordings.
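-b_ref_mode middle
This sets the B-frame reference mode to "middle", meaning that only the middle B-frame in each run of consecutive B-frames is itself used as a reference.
NVENC supports three modes:
disabled: B-frames are never used as references
each: every B-frame can be used as a reference
middle: only the middle B-frame is used as a reference
The "middle" setting is generally good for higher quality but might not be optimal for all content types. Screen content often has predictable motion patterns, unlike natural video.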
-temporal-aq 1
This enables temporal adaptive quantization, which adjusts quantization parameters based on temporal complexity.
Temporal AQ works well for natural video but can sometimes over-optimize for screen content. It may allocate too few bits to static areas that nonetheless need precise reproduction (like text). For screen recordings with lots of text or detailed UI elements, a better configuration might be -spatial-aq 1, either instead of or in addition to temporal-aq.
-rc-lookahead 20
This sets the rate control lookahead to 20 frames, allowing the encoder to analyze 20 frames ahead for better bit allocation decisions.
Higher values improve quality but increase encoding latency. For screen recording, content changes can be more abrupt than in natural video, suggesting a potential benefit from increased lookahead. If your system can handle it, -rc-lookahead 40 might provide better quality during rapid scene changes in your screen recordings. NVENC limits how deep the lookahead can go, so check FFmpeg's log output to confirm the value was accepted.
-i_qfactor 0.75 and -b_qfactor 1.1
These set the quantizer scale factors for I-frames (0.75) and B-frames (1.1) relative to P-frames.
These values tell the encoder to use higher quality (lower QP) for I-frames and lower quality (higher QP) for B-frames compared to P-frames. For screen content, I-frames are especially important as they establish the baseline quality.
For screen recording, an even lower I-frame factor, like -i_qfactor 0.6, could help more by ensuring crisp quality on scene changes and static elements.
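Putting the suggestions from this section together, a screen-content variant of the advanced options could look like the sketch below. The function name screen_tuned_opts is made up for this example, and the values are the untested suggestions from above rather than a verified optimum, so treat them as a starting point for your own comparison.

```shell
#!/bin/sh
# Print the advanced encoder options with this section's suggested
# values applied (qmin 15, bf 2, spatial AQ on, lookahead 40,
# i_qfactor 0.6). The other values are unchanged from the command.
screen_tuned_opts() {
    printf '%s %s %s\n' \
        "-qmin 15 -g 120 -bf 2 -b_ref_mode middle" \
        "-spatial-aq 1 -temporal-aq 1 -rc-lookahead 40" \
        "-i_qfactor 0.6 -b_qfactor 1.1"
}

# Usage (left unquoted on purpose so the shell splits it into arguments):
#   ffmpeg ... -c:v h264_nvenc -preset p6 -tune hq \
#       -b:v 8M -bufsize 8M -maxrate 10M \
#       $(screen_tuned_opts) -vsync 1 -r 60 output.mkv
```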
There are several NVENC parameters not in the current command that could be beneficial:
-spatial-aq 1 can improve quality in spatially complex regions like text.
-aq-strength 8 sets how strongly AQ adapts (values 1-15, higher values provide stronger adaptation).
-weighted_pred 1 can improve quality during fades and transitions.
-sc_threshold 0 disables scene change detection, which can help maintain consistent quality across the entire recording (with h264_nvenc, the encoder-specific -no-scenecut 1 option may be the one that actually controls this).
-2pass 1 enables two-pass encoding, which can significantly improve quality at the same bitrate (recent FFmpeg versions deprecate this option in favor of -multipass).

With FFmpeg and NVIDIA's CUDA hardware acceleration, you can create high-quality screen recordings on Linux without overloading your CPU. The command provided gives you an excellent starting point.
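One practical note on the additional parameters listed above: which NVENC options exist depends on your FFmpeg build and driver, so it is worth checking before adding them. FFmpeg prints the supported options with ffmpeg -h encoder=h264_nvenc. A small sketch for checking a single option (encoder_has_option is a made-up helper name):

```shell
#!/bin/sh
# Read `ffmpeg -hide_banner -h encoder=h264_nvenc` output on stdin and
# succeed if the named option (given without the leading dash) is listed.
encoder_has_option() {
    grep -q -- "^ *-$1 "
}

# Usage:
#   ffmpeg -hide_banner -h encoder=h264_nvenc | encoder_has_option spatial-aq \
#       && echo "spatial-aq is available"
```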