At 6000MHz, HAGS was on. I had also tested with HAGS off in the past; at that time the only difference was that 3200x1800 was possible at 24 to 60fps.
With 6400MHz 36-38-38 and HAGS off, 3840x2160 at 24 to 60fps is possible. I didn't test further because I am on a 60Hz 4K projector. I also checked just now that with HAGS on it still works at 3840x2160, 24 to 60fps, without any frame drops.

I use MPC-HC with MPC Video Renderer. In the past I was using madVR, but now with the new RTX Video Super Resolution I switched to MPC Video Renderer. For 4K content it makes no difference, but for 1080p and lower content I see much more improvement than with madVR's upscaling.

https://github.com/emoose/VideoRenderer … ag/rtx-1.1
This is the release GitHub link for the MPC-HC and MPC-BE players.

I specifically use this build because, strangely, the latest release didn't work on my system.
https://github.com/emoose/VideoRenderer … f8c3e2.zip

Also, for RAM I am using a Kingston Fury 6000MHz 2x16GB kit.
https://www.amazon.com/Kingston-2x16GB- … B09N5M5PH3
When I bought it there was a chip shortage and I paid around 500 USD; damn, it got much cheaper now big_smile

My RAM kit uses SK Hynix ICs. The motherboard below has excellent price/performance.
https://www.amazon.com/MSI-Z690-ProSeri … B09KKYS967

In this link you can see that SK Hynix is the best IC for DDR5 overclocking. When I bought the kit I had no idea which ICs it used, but now I am glad to see that it has SK Hynix big_smile
https://www.overclockers.com/ddr5-overclocking-guide/

I also read that all DDR5 RAM has built-in error correction, so overclocking DDR5 is now much safer for system stability.
On-Die ECC
We typically only see the Error Correction Code (ECC) implemented in the server and workstation world. Traditional performance DDR4 UDIMM does not come with ECC capabilities. The CPU is responsible for handling the error correction. Due to the frequency limitations, we’ve only seen ECC on lower-spec kits. DDR5 changes everything in this regard because ECC comes standard on every DDR5 module produced. Therefore, the system is then unburdened as it no longer needs to do the error correction.

UHD wrote:

Who wants to interpolate 4K HDR videos x5 in real time with RIFE?
Perhaps a more appropriate question is who can afford it?

After reading all the posts on this thread I come to the conclusion that it is possible, and the limitation is not the NVIDIA GeForce RTX 4090 graphics card but the RAM bandwidth.

For someone with unlimited financial resources this is the solution:

ASUS Pro WS W790E-SAGE SE motherboard
Intel Xeon W9-3495X processor
G.SKILL DDR5-6400 CL32-39-39-102 octa-channel R-DIMM memory modules

https://www.gskill.com/img/pr/2023.02.23-zeta-r5-rdimm-ddr5-announce/06-zeta-r5-rdimm-spec-table-eng.png
Source: https://www.gskill.com/community/150223 … erformance

The result?

303.76 GB/s read, 227.37 GB/s write, and 257.82 GB/s copy speed in the AIDA64 memory bandwidth benchmark, as seen in the screenshot below:

https://www.gskill.com/img/pr/2023.02.23-zeta-r5-rdimm-ddr5-announce/04-zeta-r5-rdimm-ddr5-6400-c32-16gbx8-bandwidth.png
Source: https://www.gskill.com/community/150223 … erformance
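
As a rough sanity check on those numbers: the theoretical peak of octa-channel DDR5-6400 is 6400 MT/s x 8 bytes per 64-bit channel x 8 channels, roughly 409.6 GB/s, so the measured 303.76 GB/s read is about 74% efficiency. This is my own back-of-the-envelope arithmetic, not anything from the G.SKILL page:

# Back-of-the-envelope DDR5 bandwidth check (assumed values, my own arithmetic)
def peak_bandwidth_gbs(transfer_rate_mts, channels, bytes_per_channel=8):
    # each DDR5 channel is 64 bits wide -> 8 bytes per transfer
    return transfer_rate_mts * bytes_per_channel * channels / 1000.0

octa_6400 = peak_bandwidth_gbs(6400, 8)   # ~409.6 GB/s theoretical
measured_read = 303.76                    # AIDA64 read result from the screenshot above
print(f"theoretical: {octa_6400:.1f} GB/s, efficiency: {measured_read / octa_6400:.0%}")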

Of course, the Intel Xeon W9-3495X is completely out of my reach....

The cheapest unlocked Intel Xeon would be the W5-2455X at a suggested price of $1039: https://www.anandtech.com/show/18741/in … -5-0-lanes It should be enough: if the current dual-channel DDR5-6000 allows x3 interpolation, then quad-channel DDR5-6400 should be enough for x5 real-time interpolation.
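
That scaling claim is easy to put into numbers, assuming the achievable interpolation factor scales roughly linearly with RAM bandwidth (my assumption, not a measured result):

# Rough scaling estimate, assuming interpolation factor scales ~linearly with RAM bandwidth
def peak_gbs(mts, channels):
    return mts * 8 * channels / 1000.0   # 8 bytes per 64-bit channel

dual_6000 = peak_gbs(6000, 2)   #  96.0 GB/s, the "x3 works" baseline
quad_6400 = peak_gbs(6400, 4)   # 204.8 GB/s
implied_factor = 3 * quad_6400 / dual_6000
print(f"{implied_factor:.1f}x")  # ~6.4x, so x5 should have headroom under this assumption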

I am looking for someone who has an NVIDIA GeForce RTX 4090 graphics card, at least DDR5-6000 memory and some spare time to test how RIFE real-time interpolation scales at different RAM speeds.

Alternatively, someone who would like to build an HTPC based on quad-channel or octa-channel DDR5-6400 R-DIMM memory. Octa-channel is for the 4K 240Hz screens that will soon go on sale big_smile

I am using a 12900K at a 5.2GHz all-core overclock, and I overclocked my RAM from the 6000MHz 40-40-40 XMP 3.0 profile to 6400 36-38-38 after reading your text. I was using RIFE TensorRT with 3040x1710 downscaling for 60fps interpolation; after overclocking the RAM, with my RTX 4090 and HAGS off, I can interpolate 3840x2160 at 60fps with no frame drops! Thanks for the heads up.
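
For what it's worth, the gain from that overclock is a mix of bandwidth and latency. A quick comparison using my own arithmetic (not a benchmark):

# XMP profile vs manual overclock, dual-channel DDR5
def cas_ns(cl, data_rate_mts):
    return cl * 2000.0 / data_rate_mts        # first-word latency in nanoseconds

def dual_channel_gbs(data_rate_mts):
    return data_rate_mts * 8 * 2 / 1000.0     # two 64-bit channels

for label, rate, cl in [("XMP 6000 CL40", 6000, 40), ("OC 6400 CL36", 6400, 36)]:
    print(label, f"{dual_channel_gbs(rate):.1f} GB/s", f"{cas_ns(cl, rate):.2f} ns")
# ~7% more bandwidth and ~16% lower CAS latency after the overclock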

Wow, I turned off HAGS just now and 4K 60fps runs like a breeze. Thank you guys, I never thought that would be the reason. God damn HAGS big_smile

I have a question regarding the bottleneck when using the RIFE TensorRT engine at higher resolutions. I have a 12900K overclocked to 5.2GHz on all cores, 32GB of 6000MHz DDR5, and an RTX 4090. I play 4K UHD files (bitrate around 70-100 Mbps) at 1800p with RIFE in SVP and madVR on MPC-HC. CPU usage is around 20-30%, GPU usage around 50%, and total RAM usage is around 20GB at 60fps interpolation. If I try higher resolutions, the framerate cannot keep up anymore, but the CPU and GPU usage are not much different than before. Could it be the RAM speed, or maybe the tensor cores are maxed out on the 4090, or is there some other reason? I am absolutely fine with playing videos at 1800p, since I see no difference between native 4K and 1800p even on a 160-inch projector screen, but I am still curious about what is causing this bottleneck.
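
I don't know exactly where RIFE's per-frame memory traffic goes, but looking at how the raw frame data scales with resolution already hints at why bandwidth can become the limit while the CPU/GPU utilization counters barely move (they mostly track compute, not memory stalls). A hypothetical back-of-the-envelope sketch:

# Illustrative only: raw fp16 frame sizes at the resolutions mentioned above.
# RIFE's real memory traffic per frame is larger (flow maps, intermediate buffers),
# but it scales with pixel count in the same way.
SIZES = {"3040x1710": (3040, 1710), "3200x1800": (3200, 1800), "3840x2160": (3840, 2160)}
for name, (w, h) in SIZES.items():
    mb = w * h * 3 * 2 / 1e6              # 3 channels, 2 bytes per fp16 value
    print(f"{name}: {w*h/1e6:.1f} MP, ~{mb:.0f} MB per fp16 frame")
# 2160p has ~1.44x the pixels of 1800p, so every per-pixel buffer and copy grows by the same factor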

I just checked now. It is possible to play 4K UHD movies in MPC-HC with the 12900K and 4090, using the x2 Movie setting in TensorRT, without hiccups. 60fps is too heavy, maybe in the future with Chainik's new cool updates smile However, x2 Movie with TensorRT is way better than regular interpolation, so I will use it in every case from now on. I had lost hope of having RIFE in 4K a long time ago; this update was really revolutionary for me. Great job, SVP Team!

Actually, after some experiments (thanks to dlr5668 for pointing it out), I downscaled the image to 3200x1800 in the SVP settings. Now I can play 60fps at a fixed screen refresh rate, and then use madVR's NGU Sharp upscaling to bring the image back up to 4K. It works super fluidly now.

I got the idea from NVIDIA DLSS in games: render at a lower resolution, then upscale at the end big_smile
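
For reference, the saving from that downscale is easy to quantify (my own arithmetic):

# Why the DLSS-style trick helps: the interpolation step only sees ~69% of the 4K pixels
native = 3840 * 2160
scaled = 3200 * 1800
print(f"{scaled / native:.0%} of native 4K pixels")  # ~69%, i.e. ~31% less work for RIFE
# madVR's NGU Sharp then pays the much cheaper upscaling cost afterwards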

Chainik wrote:

Performance boost: TensorRT only. Increases performance by up to ~5% and lowers memory usage, in exchange for running a 5-10-minute optimization pass in a command-line window for _every_ new video resolution. Previous runs are cached. For example, if you're opening a 1080p video for the first time, you'll wait 5+ minutes. Every other 1080p video will start instantly, but when the frame size changes due to, for example, black bars being cut off, you'll wait another 5+ minutes.

With "perf boost" off there's only ONE optimization run for all resolutions below 1440p.
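
A hypothetical sketch of what that per-resolution caching implies: the TensorRT engine is built for one exact input shape, and the cached engine file name encodes that shape, which is why cropping black bars counts as a "new" resolution. The helper below is made up for illustration; it just mimics the naming pattern visible in the trtexec logs further down in this thread:

# Made-up helper, only to illustrate why each frame size gets its own cached engine
def engine_cache_name(width, height, gpu="NVIDIA-GeForce-RTX-4090"):
    return f"rife_v4.6.onnx.{width}x{height}_fp16_trt-8502_{gpu}.engine"

print(engine_cache_name(1920, 1088))  # full-height 1080p source (padded height)
print(engine_cache_name(1920, 832))   # the same source with black bars cropped off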


Oh I see, that's great to know that it is cached for each resolution. I had no issues using Vulkan for 1080p, but at 2160p it was failing miserably and everything was in slow motion. I am mostly excited to use TensorRT for 4K 2160p videos, so I won't use the performance boost mode in the beginning and will just wait until the pass is finished. You wrote that without perf boost there is only one pass for everything up to 1440p. What is the situation for 2160p videos?
Also, could you send me the documentation link for TensorRT if there are other things to know? I am really excited about this update. Thanks a lot for this great achievement, Chainik!

Which main thread exactly, dlr5668? I would like to try 1800p if I can do it in real time without manually downscaling the video.

I see, Chainik. So as I understand it, TensorRT does not start instantly in real time like the Vulkan variant?

I have a high-end computer with a 12900K and an RTX 4090. I was using ncnn/Vulkan with VapourSynth with no issues. However, I updated SVP today and tried the new NVIDIA TensorRT option: when I activate it, a command-line window appears on the screen and the video freezes to a black screen. I waited 10 minutes but nothing happened. Then I tried closing the command-line window three times as it kept reappearing, and MPC-HC then showed an error message on the screen.

I checked the log in the error message and I am posting here what is in the log:

&&&& RUNNING TensorRT.trtexec [TensorRT v8501] # C:/Program Files (x86)/SVP 4/rife\vsmlrt-cuda\trtexec --onnx=C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx --timingCacheFile=C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x1088_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine.cache --device=0 --saveEngine=C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x1088_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine --shapes=input:1x11x1088x1920 --fp16 --tacticSources=-CUBLAS,-CUBLAS_LT --useCudaGraph --noDataTransfers --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw
[01/19/2023-02:27:19] [i] === Model Options ===
[01/19/2023-02:27:19] [i] Format: ONNX
[01/19/2023-02:27:19] [i] Model: C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx
[01/19/2023-02:27:19] [i] Output:
[01/19/2023-02:27:19] [i] === Build Options ===
[01/19/2023-02:27:19] [i] Max batch: explicit batch
[01/19/2023-02:27:19] [i] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/19/2023-02:27:19] [i] minTiming: 1
[01/19/2023-02:27:19] [i] avgTiming: 8
[01/19/2023-02:27:19] [i] Precision: FP32+FP16
[01/19/2023-02:27:19] [i] LayerPrecisions: 
[01/19/2023-02:27:19] [i] Calibration: 
[01/19/2023-02:27:19] [i] Refit: Disabled
[01/19/2023-02:27:19] [i] Sparsity: Disabled
[01/19/2023-02:27:19] [i] Safe mode: Disabled
[01/19/2023-02:27:19] [i] DirectIO mode: Disabled
[01/19/2023-02:27:19] [i] Restricted mode: Disabled
[01/19/2023-02:27:19] [i] Build only: Disabled
[01/19/2023-02:27:19] [i] Save engine: C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x1088_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine
[01/19/2023-02:27:19] [i] Load engine: 
[01/19/2023-02:27:19] [i] Profiling verbosity: 0
[01/19/2023-02:27:19] [i] Tactic sources: cublas [OFF], cublasLt [OFF], 
[01/19/2023-02:27:19] [i] timingCacheMode: global
[01/19/2023-02:27:19] [i] timingCacheFile: C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x1088_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine.cache
[01/19/2023-02:27:19] [i] Heuristic: Disabled
[01/19/2023-02:27:19] [i] Preview Features: Use default preview flags.
[01/19/2023-02:27:19] [i] Input(s): fp16:chw
[01/19/2023-02:27:19] [i] Output(s): fp16:chw
[01/19/2023-02:27:19] [i] Input build shape: input=1x11x1088x1920+1x11x1088x1920+1x11x1088x1920
[01/19/2023-02:27:19] [i] Input calibration shapes: model
[01/19/2023-02:27:19] [i] === System Options ===
[01/19/2023-02:27:19] [i] Device: 0
[01/19/2023-02:27:19] [i] DLACore: 
[01/19/2023-02:27:19] [i] Plugins:
[01/19/2023-02:27:19] [i] === Inference Options ===
[01/19/2023-02:27:19] [i] Batch: Explicit
[01/19/2023-02:27:19] [i] Input inference shape: input=1x11x1088x1920
[01/19/2023-02:27:19] [i] Iterations: 10
[01/19/2023-02:27:19] [i] Duration: 3s (+ 200ms warm up)
[01/19/2023-02:27:19] [i] Sleep time: 0ms
[01/19/2023-02:27:19] [i] Idle time: 0ms
[01/19/2023-02:27:19] [i] Streams: 1
[01/19/2023-02:27:19] [i] ExposeDMA: Disabled
[01/19/2023-02:27:19] [i] Data transfers: Disabled
[01/19/2023-02:27:19] [i] Spin-wait: Disabled
[01/19/2023-02:27:19] [i] Multithreading: Disabled
[01/19/2023-02:27:19] [i] CUDA Graph: Enabled
[01/19/2023-02:27:19] [i] Separate profiling: Disabled
[01/19/2023-02:27:19] [i] Time Deserialize: Disabled
[01/19/2023-02:27:19] [i] Time Refit: Disabled
[01/19/2023-02:27:19] [i] NVTX verbosity: 0
[01/19/2023-02:27:19] [i] Persistent Cache Ratio: 0
[01/19/2023-02:27:19] [i] Inputs:
[01/19/2023-02:27:19] [i] === Reporting Options ===
[01/19/2023-02:27:19] [i] Verbose: Disabled
[01/19/2023-02:27:19] [i] Averages: 10 inferences
[01/19/2023-02:27:19] [i] Percentiles: 90,95,99
[01/19/2023-02:27:19] [i] Dump refittable layers:Disabled
[01/19/2023-02:27:19] [i] Dump output: Disabled
[01/19/2023-02:27:19] [i] Profile: Disabled
[01/19/2023-02:27:19] [i] Export timing to JSON file: 
[01/19/2023-02:27:19] [i] Export output to JSON file: 
[01/19/2023-02:27:19] [i] Export profile to JSON file: 
[01/19/2023-02:27:19] [i] 
[01/19/2023-02:27:19] [i] === Device Information ===
[01/19/2023-02:27:19] [i] Selected Device: NVIDIA GeForce RTX 4090
[01/19/2023-02:27:19] [i] Compute Capability: 8.9
[01/19/2023-02:27:19] [i] SMs: 128
[01/19/2023-02:27:19] [i] Compute Clock Rate: 2.58 GHz
[01/19/2023-02:27:19] [i] Device Global Memory: 24563 MiB
[01/19/2023-02:27:19] [i] Shared Memory per SM: 100 KiB
[01/19/2023-02:27:19] [i] Memory Bus Width: 384 bits (ECC disabled)
[01/19/2023-02:27:19] [i] Memory Clock Rate: 10.501 GHz
[01/19/2023-02:27:19] [i] 
[01/19/2023-02:27:19] [i] TensorRT version: 8.5.1
[01/19/2023-02:27:20] [i] [TRT] [MemUsageChange] Init CUDA: CPU +436, GPU +0, now: CPU 13780, GPU 1771 (MiB)

My Vapoursynth version is Vapoursynth Filter v1.4.5 # svp with Vapoursynth R61 API R4.0
If anybody could point me in the right direction, I would be really glad.

I am also adding here what was written in the command-line window:

&&&& RUNNING TensorRT.trtexec [TensorRT v8501] # C:/Program Files (x86)/SVP 4/rife\vsmlrt-cuda\trtexec --onnx=C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx --timingCacheFile=C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x832_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine.cache --device=0 --saveEngine=C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x832_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine --shapes=input:1x11x832x1920 --fp16 --tacticSources=-CUBLAS,-CUBLAS_LT --useCudaGraph --noDataTransfers --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw
[01/19/2023-02:47:37] [i] === Model Options ===
[01/19/2023-02:47:37] [i] Format: ONNX
[01/19/2023-02:47:37] [i] Model: C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx
[01/19/2023-02:47:37] [i] Output:
[01/19/2023-02:47:37] [i] === Build Options ===
[01/19/2023-02:47:37] [i] Max batch: explicit batch
[01/19/2023-02:47:37] [i] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/19/2023-02:47:37] [i] minTiming: 1
[01/19/2023-02:47:37] [i] avgTiming: 8
[01/19/2023-02:47:37] [i] Precision: FP32+FP16
[01/19/2023-02:47:37] [i] LayerPrecisions:
[01/19/2023-02:47:37] [i] Calibration:
[01/19/2023-02:47:37] [i] Refit: Disabled
[01/19/2023-02:47:37] [i] Sparsity: Disabled
[01/19/2023-02:47:37] [i] Safe mode: Disabled
[01/19/2023-02:47:37] [i] DirectIO mode: Disabled
[01/19/2023-02:47:37] [i] Restricted mode: Disabled
[01/19/2023-02:47:37] [i] Build only: Disabled
[01/19/2023-02:47:37] [i] Save engine: C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x832_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine
[01/19/2023-02:47:37] [i] Load engine:
[01/19/2023-02:47:37] [i] Profiling verbosity: 0
[01/19/2023-02:47:37] [i] Tactic sources: cublas [OFF], cublasLt [OFF],
[01/19/2023-02:47:37] [i] timingCacheMode: global
[01/19/2023-02:47:37] [i] timingCacheFile: C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x832_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine.cache
[01/19/2023-02:47:37] [i] Heuristic: Disabled
[01/19/2023-02:47:37] [i] Preview Features: Use default preview flags.
[01/19/2023-02:47:37] [i] Input(s): fp16:chw
[01/19/2023-02:47:37] [i] Output(s): fp16:chw
[01/19/2023-02:47:37] [i] Input build shape: input=1x11x832x1920+1x11x832x1920+1x11x832x1920
[01/19/2023-02:47:37] [i] Input calibration shapes: model
[01/19/2023-02:47:37] [i] === System Options ===
[01/19/2023-02:47:37] [i] Device: 0
[01/19/2023-02:47:37] [i] DLACore:
[01/19/2023-02:47:37] [i] Plugins:
[01/19/2023-02:47:37] [i] === Inference Options ===
[01/19/2023-02:47:37] [i] Batch: Explicit
[01/19/2023-02:47:37] [i] Input inference shape: input=1x11x832x1920
[01/19/2023-02:47:37] [i] Iterations: 10
[01/19/2023-02:47:37] [i] Duration: 3s (+ 200ms warm up)
[01/19/2023-02:47:37] [i] Sleep time: 0ms
[01/19/2023-02:47:37] [i] Idle time: 0ms
[01/19/2023-02:47:37] [i] Streams: 1
[01/19/2023-02:47:37] [i] ExposeDMA: Disabled
[01/19/2023-02:47:37] [i] Data transfers: Disabled
[01/19/2023-02:47:37] [i] Spin-wait: Disabled
[01/19/2023-02:47:37] [i] Multithreading: Disabled
[01/19/2023-02:47:37] [i] CUDA Graph: Enabled
[01/19/2023-02:47:37] [i] Separate profiling: Disabled
[01/19/2023-02:47:37] [i] Time Deserialize: Disabled
[01/19/2023-02:47:37] [i] Time Refit: Disabled
[01/19/2023-02:47:37] [i] NVTX verbosity: 0
[01/19/2023-02:47:37] [i] Persistent Cache Ratio: 0
[01/19/2023-02:47:37] [i] Inputs:
[01/19/2023-02:47:37] [i] === Reporting Options ===
[01/19/2023-02:47:37] [i] Verbose: Disabled
[01/19/2023-02:47:37] [i] Averages: 10 inferences
[01/19/2023-02:47:37] [i] Percentiles: 90,95,99
[01/19/2023-02:47:37] [i] Dump refittable layers:Disabled
[01/19/2023-02:47:37] [i] Dump output: Disabled
[01/19/2023-02:47:37] [i] Profile: Disabled
[01/19/2023-02:47:37] [i] Export timing to JSON file:
[01/19/2023-02:47:37] [i] Export output to JSON file:
[01/19/2023-02:47:37] [i] Export profile to JSON file:
[01/19/2023-02:47:37] [i]
[01/19/2023-02:47:37] [i] === Device Information ===
[01/19/2023-02:47:37] [i] Selected Device: NVIDIA GeForce RTX 4090
[01/19/2023-02:47:37] [i] Compute Capability: 8.9
[01/19/2023-02:47:37] [i] SMs: 128
[01/19/2023-02:47:37] [i] Compute Clock Rate: 2.58 GHz
[01/19/2023-02:47:37] [i] Device Global Memory: 24563 MiB
[01/19/2023-02:47:37] [i] Shared Memory per SM: 100 KiB
[01/19/2023-02:47:37] [i] Memory Bus Width: 384 bits (ECC disabled)
[01/19/2023-02:47:37] [i] Memory Clock Rate: 10.501 GHz
[01/19/2023-02:47:37] [i]
[01/19/2023-02:47:37] [i] TensorRT version: 8.5.1
[01/19/2023-02:47:38] [i] [TRT] [MemUsageChange] Init CUDA: CPU +448, GPU +0, now: CPU 13279, GPU 1771 (MiB)
[01/19/2023-02:47:39] [i] [TRT] [MemUsageChange] Init builder kernel library: CPU +430, GPU +116, now: CPU 14166, GPU 1887 (MiB)
[01/19/2023-02:47:39] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[01/19/2023-02:47:39] [i] Start parsing network model
[01/19/2023-02:47:39] [i] [TRT] ----------------------------------------------------------------
[01/19/2023-02:47:39] [i] [TRT] Input filename:   C:/Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx
[01/19/2023-02:47:39] [i] [TRT] ONNX IR version:  0.0.8
[01/19/2023-02:47:39] [i] [TRT] Opset version:    16
[01/19/2023-02:47:39] [i] [TRT] Producer name:    pytorch
[01/19/2023-02:47:39] [i] [TRT] Producer version: 1.12.0
[01/19/2023-02:47:39] [i] [TRT] Domain:
[01/19/2023-02:47:39] [i] [TRT] Model version:    0
[01/19/2023-02:47:39] [i] [TRT] Doc string:
[01/19/2023-02:47:39] [i] [TRT] ----------------------------------------------------------------
[01/19/2023-02:47:39] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/19/2023-02:47:39] [i] Finish parsing network model
[01/19/2023-02:47:39] [W] Could not read timing cache from: C:\Users\onurco\AppData\Roaming\SVP4\cache\Program Files (x86)/SVP 4/rife\models\rife\rife_v4.6.onnx.1920x832_fp16_trt-8502_cudnn_I-fp16_O-fp16_NVIDIA-GeForce-RTX-4090_3dcbe72f.engine.cache. A new timing cache will be generated and written.
[01/19/2023-02:47:40] [i] [TRT] [MemUsageChange] Init cuDNN: CPU +1083, GPU +406, now: CPU 14928, GPU 2293 (MiB)
[01/19/2023-02:47:40] [i] [TRT] Global timing cache in use. Profiling results in this builder pass will be stored.