Review of the Nvidia GeForce RTX 4080 Super video accelerator based on the Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) card

General information about GeForce RTX 4080 Super

Nvidia unveiled its latest series of graphics cards in the fall of 2022, led by the top-of-the-line GeForce RTX 4090 and followed by the GeForce RTX 4080, with the GeForce RTX 4070 arriving in the spring of 2023. The Ada Lovelace architecture gives these cards outstanding performance in graphics and computing tasks, cementing their position as the most powerful on the market. Ada Lovelace GPUs outperform the previous generation in both rasterization and ray tracing. One of the key innovations was DLSS 3, which raises the frame rate further by generating additional frames from those already rendered.

The GeForce RTX 4080 stood out as one of the fastest graphics cards for gaming enthusiasts, but the market was expecting changes in the form of more affordable prices and additional performance gains. Typically, Nvidia introduces the Super line to meet these expectations, improving the performance of the base models in terms of speed and/or price. At CES 2024, the company introduced three new models in the GeForce RTX 40 line with Super suffixes: GeForce RTX 4070 Super, GeForce RTX 4070 Ti Super and GeForce RTX 4080 Super.

Nvidia presented the new video cards to refresh the line before the expected release of the full next generation, planned no earlier than the fall. The GeForce RTX 4070 Super was the first to reach the market, followed by the GeForce RTX 4070 Ti Super, and the GeForce RTX 4080 Super, released at the end of January, completed the trio. Configuration and pricing details for these Super models were previously published on our website, and today we take a closer look at the GeForce RTX 4080 Super.

Based on the specifications, the GeForce RTX 4080 Super does not offer a significant increase in performance. It stands out mainly for its lower price compared to the base GeForce RTX 4080, while keeping almost the same GPU and video memory characteristics. The recommended retail price is noticeably lower, so the launch looks more like a price cut for the GeForce RTX 4080 than the introduction of a completely new, noticeably more powerful card. Even so, the new product should offer a better price-performance ratio than its predecessor. According to Nvidia, the GeForce RTX 4080 Super is approximately twice as fast as the previous-generation GeForce RTX 3080 Ti, which is a very impressive result.

This graphics card is designed for enthusiasts looking to take advantage of the new architecture and high performance, but at a more affordable price compared to the GeForce RTX 4080. The model announced today is designed for use at the highest resolutions and maximum graphics settings, including ray tracing. The new product promises to provide high performance in any games, even in the most demanding projects with advanced graphics and ray tracing, using DLSS technology — without it, even the GeForce RTX 4090 may not be enough.

The Ada Lovelace architecture, although new, has many similarities with the previous Ampere architecture, which in turn inherits features from the Turing and Volta architectures.

GeForce RTX 4080 Super graphics accelerator
Chip code name	AD103
Production technology	5nm (TSMC 4N)
Number of transistors	45.9 billion
Core area	378.6 mm²
Architecture	unified, with an array of processors for stream processing of any type of data: vertices, pixels, etc.
DirectX hardware support	DirectX 12 Ultimate, supporting Feature Level 12_2
Memory bus	256-bit: 8 independent 32-bit memory controllers supporting GDDR6X memory
GPU frequency	up to 2550 MHz
Computing blocks	80 streaming multiprocessors, including 10240 CUDA cores for INT32 integer and FP16/FP32/FP64 floating point calculations
Tensor blocks	320 tensor cores for matrix calculations INT4/INT8/FP16/FP32/BF16/TF32
Ray tracing blocks	80 RT cores for calculating the intersection of rays with triangles and BVH bounding volumes
Texturing blocks	320 texture addressing and filtering units with support for FP16/FP32 components and support for trilinear and anisotropic filtering for all texture formats
Raster Operation Blocks (ROPs)	14 wide ROP blocks (112 pixels in total) with support for various anti-aliasing modes, including programmable ones, and for FP16/FP32 frame buffer formats
Monitor support	HDMI 2.1 and DisplayPort 1.4a support (with DSC 1.2a compression)

GeForce RTX 4080 Super graphics card specifications
Core frequency	2295/2550 MHz
Number of universal processors	10240
Number of texture blocks	320
Number of blending blocks	112
Effective memory frequency	23 GHz
Memory type	GDDR6X
Memory bus	256 bits
Memory	16 GB
Memory Bandwidth	736 GB/s
Compute Performance (FP32)	up to 52.2 teraflops
Theoretical maximum fill rate	286 gigapixels/s
Theoretical texture sampling rate	816 gigatexels/s
Bus	PCI Express 4.0 x16
Connectors	according to the manufacturer's choice
Power consumption	up to 320 W
Auxiliary power	one 16-pin connector
Number of slots occupied in the system case	according to the manufacturer's choice
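
The theoretical figures in the table follow from the unit counts and clocks listed above. The Python sketch below reproduces them under two assumptions that the table does not state explicitly: each CUDA core performs one fused multiply-add (two FP32 operations) per clock, and each ROP and texture unit handles one pixel or texel per clock at the boost frequency. The function names are illustrative only.

```python
# Inputs taken from the specification tables above.
CUDA_CORES = 10240
TEXTURE_UNITS = 320
ROPS = 112
BOOST_CLOCK_GHZ = 2.55
MEMORY_BUS_BITS = 256
MEMORY_DATA_RATE_GBPS = 23  # effective data rate per pin


def memory_bandwidth_gb_s(bus_bits, data_rate_gbps):
    """Bus width in bits times per-pin data rate, converted to bytes."""
    return bus_bits * data_rate_gbps / 8


def fp32_tflops(cores, clock_ghz):
    """Assumes one FMA (2 floating point operations) per core per clock."""
    return cores * 2 * clock_ghz / 1000


def fill_rate_gpixels(rops, clock_ghz):
    """Assumes one pixel per ROP per clock."""
    return rops * clock_ghz


def texture_rate_gtexels(texture_units, clock_ghz):
    """Assumes one texel per texture unit per clock."""
    return texture_units * clock_ghz


print(f"Memory bandwidth: {memory_bandwidth_gb_s(MEMORY_BUS_BITS, MEMORY_DATA_RATE_GBPS):.0f} GB/s")
print(f"FP32 compute:     {fp32_tflops(CUDA_CORES, BOOST_CLOCK_GHZ):.1f} teraflops")
print(f"Fill rate:        {fill_rate_gpixels(ROPS, BOOST_CLOCK_GHZ):.0f} gigapixels/s")
print(f"Texture rate:     {texture_rate_gtexels(TEXTURE_UNITS, BOOST_CLOCK_GHZ):.0f} gigatexels/s")
```

Rounded to the precision used in the table, these formulas give 736 GB/s, 52.2 teraflops, 286 gigapixels/s and 816 gigatexels/s.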

A direct comparison between the GeForce RTX 4080 Super and the GeForce RTX 4090 is obviously inappropriate, since the new product is significantly inferior to the flagship in the number of functional units, video memory and other parameters. The AD102 used in the GeForce RTX 4090 is noticeably larger and more powerful than the AD103 used in the GeForce RTX 4080 Super. If the GeForce RTX 4090 is an uncompromising flagship, the GeForce RTX 4080 (Super) is intended for a wider range of users. Compared to the base GeForce RTX 4080, the Super model has slightly more execution units running at a slightly higher frequency, which yields a small but real performance increase. The increase is estimated at a few percent: the GeForce RTX 4080 Super will not significantly outperform the GeForce RTX 4080, but it is a definite improvement.

This refreshing of the graphics card line between generations benefits both Nvidia itself (by stimulating the market and motivating those waiting for better offers) and its partners producing video cards. Now they can offer not just overclocked old models, but new ones. There are more attractive price offers, and Super graphics cards are most likely aimed at those who have legacy GPUs that clearly do not belong to the GeForce RTX 40 family. These owners of older solutions may be interested in upgrading, especially given the more attractive Super options.

While the GeForce RTX 4080 Super doesn't offer significant performance gains over the GeForce RTX 4080, its lower MSRP makes it attractive. The Super model can be seen as a more affordable version of the base model. If we compare the GeForce RTX 4080 Super with its predecessor from the GeForce RTX 30 family, the new product significantly exceeds its performance, even in comparison with the top-end GeForce RTX 3090 Ti. At the same time, the new GPU consumes much less energy. The main competitor of the new product from AMD is the Radeon RX 7900 XTX, which is close to the GeForce RTX 4080 in performance without ray tracing, and the Super model should strengthen Nvidia’s position in this price range, especially with the active use of hardware ray tracing.

According to Nvidia's own tests, the GeForce RTX 4080 Super is significantly superior to its counterparts from the two previous (full) generations, and for obvious reasons, comparisons with the GeForce RTX 4080 are not made due to minor differences. Analyzing the chart, you can see that most of the games tested use DLSS 3 frame generation technology, which may cause mixed reactions from some players. Overall, the GeForce RTX 4080 Super is one and a half times faster than the GeForce RTX 3080 Ti, and when using frame generation, the difference doubles or more. At the same time, the GeForce RTX 2080 Super lags significantly, especially in modern games with active ray tracing.

The GeForce RTX 4080 Super is also highly effective in professional tasks related to digital content creation. The graphics card has sufficient performance, tensor cores accelerate AI-based tools, and hardware ray tracing units work effectively in 3D packages and engines such as Blender Cycles, Redshift, V-Ray, Octane and others. This allows you to speed up the rendering of complex scenes and increase the efficiency of your projects. According to test results, the GeForce RTX 4080 Super outperforms the GeForce RTX 3080 Ti in these tasks by one and a half times, which is a very significant achievement.

The capabilities of the GeForce RTX 4080 Super in the field of video processing are similar to the functionality of the flagship model. The eighth generation dedicated NVEnc hardware encoder now supports AV1 video encoding. The AV1 encoder in the Ada architecture is 40% to 50% more efficient than the H.264 encoder used in previous generation GPUs. As a result, the new AV1 format allows you to increase the resolution of the video stream when streaming from 1080p to 1440p at the same bitrate.

Notably, almost all Ada GPUs are equipped with two NVEnc hardware encoders, which makes it possible to encode 8K video at 60 FPS or to process four 4K video streams at 60 FPS at once. Processing speed is more than double that of the GeForce RTX 3080 Ti, a significant benefit for video work.
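
The two workloads named above are equivalent in raw pixel throughput, which a quick calculation shows. This is a minimal sketch that assumes the standard 7680×4320 and 3840×2160 frame sizes for 8K and 4K. The helper name is illustrative.

```python
def pixels_per_second(width, height, fps, streams=1):
    """Raw pixel throughput an encoder must sustain for the given streams."""
    return width * height * fps * streams


one_8k60 = pixels_per_second(7680, 4320, 60)
four_4k60 = pixels_per_second(3840, 2160, 60, streams=4)

print(one_8k60, four_4k60, one_8k60 == four_4k60)
```

Both cases come to just under two billion pixels per second, so a single 8K stream at 60 FPS and four 4K streams at 60 FPS place the same load on the encoders.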

Ada architecture GPUs integrate the fifth-generation NVDec hardware decoder, which first appeared in the Ampere architecture. This decoder provides hardware-accelerated decoding of video data in various formats, including MPEG-2, VC-1, H.264 (AVCHD), H.265 (HEVC), VP8, VP9 and AV1. Full support for video decoding in 8K resolution at 60 FPS is also provided. These technologies significantly expand the video processing capabilities of GPUs.

Thus, in addition to NVEnc hardware encoding, Ada architecture GPUs have a built-in fifth-generation NVDec hardware decoder, which provides high efficiency in processing video data and supports a wide range of formats and resolutions, including support for 8K at 60 FPS.

Video card Gigabyte Aorus GeForce RTX 4080 Super Master 16G 16 GB

Gigabyte Technology (the Gigabyte brand) was founded in 1986 in the Republic of China and is headquartered in Taipei, Taiwan. It started as a group of developers and researchers. In 2004, the company became the basis for the Gigabyte holding, which united several divisions, such as Gigabyte Technology (development and production of video cards and motherboards for PCs) and Gigabyte Communications (which has produced communicators and smartphones under the GSmart brand since 2006).

The object of the study is the commercially produced Gigabyte Aorus GeForce RTX 4080 Super Master 16G graphics accelerator with 16 GB of GDDR6X memory and a 256-bit data bus.

Gigabyte Aorus GeForce RTX 4080 Super Master 16G 16GB 256-bit GDDR6X
Parameter	Value	Reference value
GPU	GeForce RTX 4080 Super (AD103)
Interface	PCI Express x16 4.0
GPU operating frequency (ROPs), MHz	BIOS OC: 2625(Boost)—2775(Max) BIOS Silent: 2625(Boost)—2775(Max)	2550(Boost)—2705(Max)
Memory operating frequency (physical (effective)), MHz	2875 (23000)	2875 (23000)
Memory bus width, bits	256
Number of computational units in the GPU	80
Number of operations (ALU/CUDA) in block	128
Total number of ALU/CUDA blocks	10240
Number of texturing units (BLF/TLF/ANIS)	320
Number of rasterization units (ROP)	112
Number of Ray Tracing blocks	80
Number of tensor blocks	320
Dimensions, mm	355×165×75	310×130×70
Number of slots in the system unit occupied by a video card	4	4
PCB color	black	black
Peak power consumption in 3D, W (BIOS OC/BIOS Silent)	310/310	320
Power consumption in 2D mode, W	42	42
Power consumption in sleep mode, W	11	11
Noise level in 3D (maximum load), dBA (BIOS OC/BIOS Silent)	30.2/29.2	32.0
Noise level in 2D (video viewing), dBA	18.0	18.0
Noise level in 2D (idle), dBA	18.0	18.0
Video outputs	1×HDMI 2.1, 3×DisplayPort 1.4a	1×HDMI 2.1, 3×DisplayPort 1.4a
Multiprocessing support	No
Maximum number of receivers/monitors for simultaneous image output	4	4
Power: 8-pin connectors	0	0
Power: 6-pin connectors	0	0
Power: 16-pin connectors	1	1
Weight of the card with delivery set (gross), kg	3.34	3.0
Card weight (net), kg	2.43	2.2
Maximum resolution/frequency, DisplayPort	3840×2160@144 Hz, 7680×4320@60 Hz
Maximum resolution/frequency, HDMI	3840×2160@144 Hz, 7680×4320@60 Hz

Memory

The card has 16 GB of GDDR6X SDRAM in eight 16 Gbit chips on the front side of the PCB. The Micron memory chips (MT61K512M32KPA-24 / D8BZF) are rated for a nominal operating frequency of 3000 (24000) MHz.
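
The memory figures can be cross-checked with simple arithmetic. The sketch below assumes that each chip sits on its own 32-bit controller, which matches the eight 32-bit controllers listed in the GPU table, and compares the rated effective frequency of the chips with the 23000 MHz used on this card. The constant names are illustrative.

```python
CHIP_COUNT = 8
CHIP_DENSITY_GBIT = 16
CHIP_INTERFACE_BITS = 32  # assumption: one chip per 32-bit memory controller
RATED_MHZ = 24000         # effective frequency the chips are rated for
ACTUAL_MHZ = 23000        # effective frequency used on this card

capacity_gb = CHIP_COUNT * CHIP_DENSITY_GBIT / 8       # gigabits to gigabytes
bus_width_bits = CHIP_COUNT * CHIP_INTERFACE_BITS
headroom_percent = (RATED_MHZ - ACTUAL_MHZ) / ACTUAL_MHZ * 100

print(f"Capacity: {capacity_gb:.0f} GB, bus: {bus_width_bits} bits, "
      f"rated headroom: {headroom_percent:.1f}%")
```

Under these assumptions, the chips are rated a little over 4% above the frequency at which the card actually runs them.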

Card features and comparison with Gigabyte GeForce RTX 4080 Gaming OC 16G (16 GB)

Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) front view

Gigabyte GeForce RTX 4080 Gaming OC 16G (16 GB) front view

Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) rear view

Gigabyte GeForce RTX 4080 Gaming OC 16G (16 GB) rear view

The GPU on this video card is marked AD103-400, unlike the “-300/301” variants used on the GeForce RTX 4080. The chip was manufactured in week 44 of 2023.

The total number of power phases on the Gigabyte GeForce RTX 4080 Gaming OC 16G (16 GB) card is 21, while our Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) video card has 23 phases.

It is important to note that the phase distribution in these models differs. The Gigabyte GeForce RTX 4080 Gaming OC 16G (16 GB) has 18 phases for the core and 3 for memory chips. And on the Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) card, the phases are distributed in a ratio of 20 to 3, allocated for the core and memory chips, respectively.

The core power supply circuit is marked in green, and the memory in red. All controllers are located on the reverse side of the PCB. The core power phases are controlled by the uP9512R PWM controller (maximum 8-12 phases, depending on the modification).

Since this controller handles at most 12 phases, the 20 core power phases evidently operate in parallel.

The three-phase power supply circuit for the memory chips is managed by another PWM controller from the same company, uPI Semi — uP9529Q (up to 3 phases).

As is traditional for Nvidia video cards, the power converter uses DrMOS transistor assemblies, in this case SiC653A (Vishay), each rated for a maximum of 50 A.

Also on the back of the card there is a uS5650Q (uPI Semi) controller, which is responsible for monitoring the card (monitoring voltages and temperatures).

Backlight control is traditionally assigned to the Holtek controller.

The printed circuit board is dust and moisture protected.

This card has two operating modes, OC and Silent, defined by two BIOS versions. A switch on the top edge of the card selects between them. The modes differ mainly in fan speed, while the power consumption limit is the same 350 W in both BIOS versions.

The memory frequencies match the reference values, but the Boost and maximum core frequencies in both BIOS versions exceed the reference values by 2.2%-3%. As a result, the real-world performance gain in games over reference models is limited to 2%.
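
The size of the factory overclock can be checked against the frequencies in the specification table (2625/2775 MHz for this card versus 2550/2705 MHz for the reference design). The following is a small sketch with illustrative names.

```python
def percent_over(value, reference):
    """Relative increase of value over reference, in percent."""
    return (value - reference) / reference * 100


REFERENCE_MHZ = {"boost": 2550, "max": 2705}
GIGABYTE_MHZ = {"boost": 2625, "max": 2775}

uplift = {name: percent_over(GIGABYTE_MHZ[name], REFERENCE_MHZ[name])
          for name in REFERENCE_MHZ}

for name, value in uplift.items():
    print(f"{name}: +{value:.1f}%")
```

Both values land inside the range quoted above: a little under 3% for the Boost frequency and about 2.6% for the maximum frequency.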

The power consumption of the Gigabyte card in tests of both BIOS versions reaches 310 W, with a peak value of 339 W.

When performing manual overclocking with an increase in the consumption limit to 133%, maximum frequencies of 2880/24920 MHz were reached. However, even with such an increase in the consumption limit, the performance increase in games at 4K resolution was only 5.3% relative to the reference values, since Nvidia drivers do not allow a significant increase in the actual consumption limit. The power consumption of the card increased to 318 W.

Power for the Gigabyte card is provided via a 16-pin PCIe 5.0 connector.

There is an indicator next to the power connector on the board. A lit LED indicates a power failure (the cable is not inserted completely or there is a problem with the power supply).

Note the card's considerable size, especially its thickness of about 7.5 cm. As a result, the video card occupies 4 slots in the system unit.

The GeForce RTX 4080, like the GeForce RTX 4090 (unlike the GeForce RTX 3090/Ti), does not support multi-graphics configuration via SLI technology, and is not equipped with a special connector at the top end.

In terms of video outputs, the card is equipped with a standard set of three DP 1.4a ports and one HDMI 2.1.

The video card's operating parameters are controlled using the proprietary Gigabyte Control Center utility. This software can control the fans, adjust card frequencies and core voltage, and monitor the status of the video card.

Heating and cooling

The cooling system of the Gigabyte card is built around a massive, sectioned, nickel-plated plate heatsink with integrated heat pipes. The heat pipes distribute heat across the heatsink fins, helping to dissipate heat from hot components. They are connected to a large copper base, underneath which is a vapor chamber.

This design demonstrates outstanding heat dissipation efficiency, which is an important factor in maintaining low graphics card operating temperatures under high loads.

The memory chips are cooled by the same large copper base through thermal pads. The VRM power converters are cooled by a separate base plate, which is also part of the same heatsink.

The back plate of the video card does not participate in cooling the back side of the board, but rather serves as an element of protection and strengthening the rigidity of the PCB. In addition, an illuminated brand logo is located on this plate, giving additional style to the video card.

Every other heatsink fin has beveled edges to improve airflow through the fin stack.

A casing with three ∅95 mm fans with grooved blades is installed on top of the radiator.

The technology of rotating the central fan in the opposite direction is no longer a novelty; this operating concept was also used in many models of the GeForce RTX 30 series. In theory, it helps reduce air flow turbulence, which leads to more efficient cooling of all existing controllers and chips on the PCB.

At low load, the fans stop when the GPU temperature drops below 50 °C and the temperature of the memory chips drops below 80 °C. When the computer starts, the fans spin up, but once the video driver has loaded, the current temperature is monitored, and if it is below these values, the fans are turned off automatically. A video describing this feature is available.
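
The fan-stop condition can be expressed as a simple rule. This is only a sketch of the behavior described in this review, not Gigabyte's firmware logic: the function name is hypothetical, and any hysteresis the real card may apply is not modeled.

```python
GPU_STOP_BELOW_C = 50     # threshold quoted in the review
MEMORY_STOP_BELOW_C = 80  # threshold quoted in the review


def fans_may_stop(gpu_temp_c, memory_temp_c):
    """True when both temperatures are below their fan-stop thresholds."""
    return gpu_temp_c < GPU_STOP_BELOW_C and memory_temp_c < MEMORY_STOP_BELOW_C


print(fans_may_stop(33, 40))  # cool card: fans may stop
print(fans_may_stop(60, 62))  # GPU too warm: fans keep running
```

Both conditions must hold at once, so a cool GPU with hot memory chips still keeps the fans spinning.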

Temperature monitoring

BIOS OC mode:

After two hours of testing under maximum load, the maximum core temperature did not exceed 60 °C, and the temperature of the memory chips reached 62 °C. These are excellent results for a high-end video card. The power consumption of the card reached 310 W. Note that the safe temperature limit for GDDR6X memory chips is 105 °C. The maximum GPU hotspot temperature was 70 °C.

The maximum heating is near the GPU and at the power connector.

We filmed 9 minutes of heating and sped it up 50 times.

During manual overclocking (BIOS OC mode) with the consumption limit set at 133%, the heating and noise parameters changed little. Despite raising the limit, Nvidia drivers limited consumption, so GPU operating frequencies in gaming tests did not rise above 2880 MHz.

BIOS Silent Mode:

In this case, the card's operating parameters remained virtually unchanged: the maximum core temperature was 60 °C and the maximum temperature of the memory chips was 62 °C. The card's power consumption was the same 310 W. The maximum GPU hotspot temperature was 69 °C.

Noise

Noise was measured in a specially prepared, sound-insulated room with minimal reverberation. The test system had no fans and produced no mechanical noise when idle. The background noise level, including the sound level meter's own noise, was 18 dBA.

The measurements were taken at a distance of 50 cm from the video card, at the level of the cooling system. Measurement modes included:

Idle mode in 2D: an Internet browser with the site iXBT.com, a Microsoft Word window, and several Internet communicators are launched.
2D mode with movie viewing: SmoothVideo Project (SVP) was used with hardware decoding and insertion of intermediate frames.
3D mode with maximum load on the accelerator: the FurMark test was used.

The noise level was assessed as follows:

less than 20 dBA: relatively silent
from 20 to 25 dBA: very quiet
from 25 to 30 dBA: quiet
from 30 to 35 dBA: clearly audible
from 35 to 40 dBA: loud, but tolerable
above 40 dBA: very loud
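
The scale above maps directly to a lookup function. The sketch below is illustrative: the scale does not say which category a boundary value belongs to, so the code assumes each boundary belongs to the louder category (exactly 30 dBA counts as clearly audible, exactly 40 dBA as very loud).

```python
from bisect import bisect_right

# Upper boundaries of the first five categories, in dBA.
THRESHOLDS_DBA = [20, 25, 30, 35, 40]
LABELS = [
    "relatively silent",
    "very quiet",
    "quiet",
    "clearly audible",
    "loud, but tolerable",
    "very loud",
]


def classify_noise(level_dba):
    """Return the category for a measured noise level in dBA."""
    return LABELS[bisect_right(THRESHOLDS_DBA, level_dba)]


for level in (18.0, 29.2, 30.2):
    print(f"{level} dBA: {classify_noise(level)}")
```

With this mapping, the 18.0 dBA idle reading is relatively silent, 29.2 dBA (Silent BIOS under load) is quiet, and 30.2 dBA (OC BIOS under load) just crosses into clearly audible.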

At idle in 2D, in both BIOS modes, the temperature did not exceed 33 °C, the fans did not activate, and the noise level remained at a background level of 18 dBA. When watching a movie with hardware decoding, the situation also did not change.

BIOS OC mode:

At maximum load in 3D, temperatures reached 60/70/62 °C (core/hot spot/memory). The fans spun up to 1320 rpm and the noise rose to 30.2 dBA, which is right at the threshold of being clearly audible.

The noise spectrogram shows that there were no annoying peaks during the study.

BIOS Silent Mode:

In fact, there are almost no differences between the Silent version and the OC version. At maximum load in 3D, temperatures reached 60/69/62 °C (core/hot spot/memory). At the same time, the fans spun up to 1176 rpm, the noise increased to 29.2 dBA: this is quiet.

Backlight

The brand logos on the ends and on the back of the card are illuminated, and there is also lighting along the edges of the fan blades. The backlight is activated only when the fans are rotating; at rest there is no glow.

Management is traditionally carried out through the Gigabyte Control Center utility. It is worth noting the interesting glow mode at the edges of the blades, when the speed of color change corresponds to the fan speed.

It is possible to save the selected mode in the card itself, that is, if you wish, you can configure the backlight once and not run the program again.

The card also has a small LCD screen on the top end, where you can display preset animations, card monitoring data, custom labels, pictures and animations in GIF format.

This screen is also controlled using Gigabyte Control Center.

Delivery and packaging

In addition to the traditional quick user manual, the package also includes a dismountable metal stand-bracket for the card with a set of fasteners and a power adapter.

The stand is not the usual one in the form of a tripod, and not even in the form of the card support bracket previously used by many Nvidia partners.

This is also a bracket, but it is mounted on the case wall either outside the motherboard (in the case of an ATX form factor model) or above it (in the case of an E-ATX form factor), using the supplied mounting kit.

Testing: synthetic tests

We tested the updated Nvidia graphics card in our standard synthetic benchmark suite, which is constantly updated with new tests added and outdated ones removed. We've now added several new benchmarks to measure the performance of ray tracing and resolution scaling technologies such as DLSS, FSR and XeSS. Semi-synthetic tests include subtests from the 3DMark package, such as Time Spy, Port Royal, DX Raytracing, Speed Way, etc.

We compared the performance of the following video cards:

GeForce RTX 4080 Super with standard parameters.
GeForce RTX 4090 with standard parameters.
GeForce RTX 4080 with standard parameters.
GeForce RTX 3090 Ti with standard parameters.
Radeon RX 7900 XTX with standard parameters.

For a more detailed analysis of the performance of the GeForce RTX 4080 Super video card, three video cards from the same company were also used: RTX 3090 Ti, RTX 4080 and RTX 4090. This allows us to evaluate how close the new model is to the top options and what its performance is in comparison with the previous generation. A comparison with the top-end AMD Radeon RX 7900 XTX video card gives us an understanding of the competitiveness of the new product on the market in a similar price category.

3DMark Vantage tests

DirectX 10-focused 3DMark Vantage feature tests provide an opportunity to look at aspects of performance that may be missed in more modern tests.

Feature Test 1: Texture Fill.

This test focuses on measuring the performance of texture fetch units. During the test, the rectangle is filled with values read from a small texture. This happens using multiple texture coordinates that change every frame. Analysis of the results of this test allows us to more deeply understand how the video card copes with texturing operations, in particular, with frequent changes in coordinates.

Even with its age, Feature Test 1 remains a useful tool for identifying subtle details in performance, making it an important component when testing new graphics cards.

AMD and Nvidia video cards usually perform well in Futuremark's texture test, with results close to the theoretical figures, although some GPUs fall slightly short. The full version of the AD103 graphics processor showed fairly high performance in the texture test. While the RTX 4080 cards are comparable to the RX 7900 XTX, the higher-end RTX 4090 is clearly ahead, which is in line with expectations.

Feature Test 2: Color Fill

The second task is a fill rate test using a simple pixel shader that does not limit performance. In this test, the interpolated color value is written to an off-screen buffer using alpha blending. It is noteworthy that it uses a 16-bit off-screen buffer of the FP16 format, common in games with HDR rendering, which makes this test modern and relevant.
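
The per-pixel work in such a test can be sketched in a few lines of Python. This is an illustration only: the blend factors (source alpha and one minus source alpha) and the four-channel layout are assumptions, since the review does not document the test's actual blend state. Storage in FP16 is emulated with the half-precision format code of the standard struct module.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def blend_into_fp16(src, dst):
    """Blend src over dst and store the result as FP16 channels.

    src and dst are (r, g, b, a) tuples. The blend equation is an assumed,
    commonly used one, not the documented state of the benchmark.
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    blended = (
        sr * sa + dr * (1.0 - sa),
        sg * sa + dg * (1.0 - sa),
        sb * sa + db * (1.0 - sa),
        sa + da * (1.0 - sa),
    )
    return tuple(to_fp16(channel) for channel in blended)


# Half-transparent red over opaque blue.
print(blend_into_fp16((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))
# FP16 keeps only about three decimal digits of precision.
print(to_fp16(0.1))
```

Each blended pixel requires reading the destination value and writing the result back, which is why this kind of test stresses the ROP units rather than the shader cores.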

The results of the second 3DMark Vantage subtest reflect the performance of the ROP units; video memory bandwidth typically has little effect on them. The new GeForce RTX 4080 Super and the base RTX 4080 have ROP subsystems of similar speed, so their results are close. The top-end RTX 4090 demonstrates significantly higher ROP performance, thanks to its larger number of ROP units.

Unfortunately, all Nvidia video cards in this test are inferior to AMD's flagship, the Radeon RX 7900 XTX, which outperforms even the RTX 4090 in this task. Historically, GeForce video cards have often been uncompetitive in tests of peak scene fill rate, and the current comparison confirms this.

Feature Test 3: Parallax Occlusion Mapping

Feature Test 3: Parallax Occlusion Mapping is one of the most interesting feature tests, since this technique has long been used in games. It uses parallax occlusion mapping to simulate complex geometry, which involves resource-intensive ray tracing through a high-resolution depth map. Shading uses the Strauss lighting model. The result is a complex, GPU-intensive pixel shader test that combines many texture samples during ray tracing, dynamic branching, and complex Strauss lighting calculations.
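
The core loop of parallax occlusion mapping can be sketched as follows. This is a simplified one-dimensional illustration of the general technique, not the shader used in 3DMark Vantage: the function and parameter names are hypothetical, the depth map is a plain Python callable, and the Strauss lighting step is omitted.

```python
def parallax_offset(depth_at, u, view_x, view_z, depth_scale=0.1, num_layers=32):
    """Return the texture coordinate where the view ray first goes below the surface.

    depth_at(u) returns a depth in [0, 1] (0 = top of the surface, 1 = deepest).
    (view_x, view_z) is the view direction in tangent space, with view_z > 0.
    """
    # Shift in texture space each time the ray descends by one depth layer.
    step_u = (view_x / view_z) * depth_scale / num_layers
    for layer in range(num_layers + 1):
        ray_depth = layer / num_layers
        sample_u = u - layer * step_u
        # One depth-map sample per step; the early exit is a dynamic branch.
        if ray_depth >= depth_at(sample_u):
            return sample_u
    return u - num_layers * step_u


# Flat surface at half depth, viewed at 45 degrees.
print(parallax_offset(lambda u: 0.5, 0.0, 1.0, 1.0, num_layers=8))
```

Each loop iteration costs one texture sample and one branch, which is why the load in this kind of shader depends on texturing speed, branching efficiency and arithmetic at the same time.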

The results of this 3DMark Vantage test depend on several parameters at once and are not limited solely by the speed of mathematical calculations, the efficiency of branch execution, or the speed of texture fetching. The test highlights the importance of proper balance within the GPU and of efficient execution of complex shaders. It is useful because its results often correlate with performance in gaming tests.

For the new GeForce RTX 4080 Super, the results were close to the base RTX 4080, which is to be expected given the small improvements in the new product. The RTX 4090 remains significantly more powerful, as expected. The new product's main competitor, the Radeon RX 7900 XTX, outperforms both Nvidia models, but the gap is not large and has narrowed slightly.

Feature Test 4: GPU Cloth

Feature Test 4: GPU Cloth is the fourth test that calculates the physical interactions of simulating cloth using the GPU. Vertex simulation is used, which includes the combined operation of vertex and geometry shaders with several passes. Stream out is used to transfer vertices between simulation passes, which allows you to evaluate the performance of vertex and geometry shaders, as well as the speed of stream out.

The rendering speed in this test depends on several parameters, primarily on geometry processing performance and the efficiency of geometry shaders. However, the results of this test became unreliable due to obvious problems with driver optimization on both Nvidia and AMD. Over time, AMD drivers also show poor results, and in general, all video cards show incorrect data in this test. This situation does not meet theoretical expectations, and the main reason is the lack of optimizations for the legacy test suite.

Feature Test 5: GPU Particles

Feature Test 5: GPU Particles is a physics simulation test of effects based on particle systems that are calculated using the GPU. The test uses a vertex simulation, where each vertex represents a separate particle. Stream out is used for the same purpose as in the previous test. The calculation of several hundred thousand particles, their individual animation, as well as collisions with a height map are being tested. Particle rendering uses a geometry shader that generates four vertices that form a particle from each point. Vertex calculations place a large load on shader units, and the efficiency of stream out is also checked.

As a result of the second geometry test from 3DMark Vantage, we received results that are very different from what was theoretically expected for the new video card. However, these results are closer to reality than in the previous test. In the case of the video card in question, if these results are considered correct, it demonstrated performance close to the base model, which can be explained by the fact that their frequencies are approximately the same and the number of blocks does not differ much. On the other hand, the competing Radeon RX 7900 XTX was inferior to all presented video cards in comparison, which can only be explained by ineffective driver optimization. In previous tests, the results were significantly higher for everyone.

Feature Test 6: Perlin Noise

Feature Test 6: Perlin Noise is the last test in the Vantage suite and is a mathematically intensive GPU test. It calculates several octaves of Perlin noise in the pixel shader. Each color channel uses its own noise function, placing a high mathematical load on the video chip. Perlin noise is a standard algorithm widely used in procedural texturing that requires a lot of mathematical calculations.
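
To show what "several octaves of Perlin noise" means in practice, here is a compact two-dimensional sketch in Python. It is a generic textbook-style formulation, not the benchmark's shader: the permutation seed, the eight gradient directions, the octave parameters and the per-channel offsets are all arbitrary choices made for illustration.

```python
import math
import random


def _make_permutation(seed=0):
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table + table  # doubled so that lookups never run past the end


_PERM = _make_permutation()
_GRADIENTS = [(math.cos(math.pi * i / 4), math.sin(math.pi * i / 4)) for i in range(8)]


def _fade(t):
    """Perlin's smoothing curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def _lerp(a, b, t):
    return a + t * (b - a)


def _corner(ix, iy, x, y):
    """Dot product of a lattice corner's gradient with the offset to (x, y)."""
    gx, gy = _GRADIENTS[_PERM[_PERM[ix & 255] + (iy & 255)] & 7]
    return gx * (x - ix) + gy * (y - iy)


def perlin2d(x, y):
    x0, y0 = math.floor(x), math.floor(y)
    u, v = _fade(x - x0), _fade(y - y0)
    near = _lerp(_corner(x0, y0, x, y), _corner(x0 + 1, y0, x, y), u)
    far = _lerp(_corner(x0, y0 + 1, x, y), _corner(x0 + 1, y0 + 1, x, y), u)
    return _lerp(near, far, v)


def fractal_noise(x, y, octaves=4, lacunarity=2.0, gain=0.5):
    """Sum several octaves: each one doubles the frequency and halves the amplitude."""
    total, amplitude, frequency = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amplitude * perlin2d(x * frequency, y * frequency)
        amplitude *= gain
        frequency *= lacunarity
    return total


def noise_color(x, y):
    """One noise value per color channel, using hypothetical coordinate offsets."""
    return tuple(fractal_noise(x + offset, y + offset) for offset in (0.0, 17.3, 41.9))


print(noise_color(0.3, 0.7))
```

Every octave repeats the full gradient lookup and interpolation, and every channel repeats the full octave sum, so the arithmetic cost per pixel grows with the number of octaves multiplied by the number of channels.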

In this mathematical test, the performance of all solutions, although not fully consistent with theoretical expectations, is usually close to the peak performance of video chips in extreme tasks. The test uses floating point operations, and the new Ada Lovelace and RDNA3 architectures were expected to reveal their unique capabilities in working with the corresponding dual execution instructions. However, the legacy nature of this benchmark may limit its ability to fully demonstrate the new capabilities of modern GPUs, as the comparative results show.

The Super model of the GeForce RTX 40 family showed the expected results, being slightly ahead of the base model. The difference between them is not significant. Obviously, both GPUs are inferior to the more powerful RTX 4090 in all aspects of performance. Compared to the similarly priced competing Radeon RX 7900 XTX, the results are again close, with the new GeForce showing a slight edge. We'll wait for results in more modern synthetic tests with increased GPU load.

Direct3D 12 tests

Examples from Microsoft's DirectX SDK and from AMD's SDK that use the Direct3D12 graphics API were excluded from our tests because they have long shown incorrect results in most cases. The only computational test with Direct3D12 support that remains in this section is the famous Time Spy benchmark from 3DMark. In this case, we are interested not only in a general comparison of GPU power, but also in the difference in performance with the asynchronous computing capabilities that appeared in DirectX 12 enabled and disabled. For reliability, we tested the video cards in two graphics tests.

If we compare the new GeForce RTX 4080 Super in this task with the base video card, which uses a slightly cut-down version of the same GPU, the new card turned out to be only slightly faster. This is in line with the theory: the difference between them cannot be significant. Both cards are significantly behind the RTX 4090, although the gap in this test is not so great.

Radeon performance generally looks slightly better in this test than that of similarly priced GeForce cards, which is worth bearing in mind. This time, the GeForce RTX 4080 Super was slightly slower than its similarly priced competitor, the Radeon RX 7900 XTX, which usually turns out to be a little slower. The results of this test are not always an accurate predictor of performance in real games, so we treat them with the understanding that AMD's solution may show a slight advantage in rasterization tasks. Now let's move on to ray tracing tests, where the situation will be completely different.

Ray tracing tests

One of the first tests of ray tracing performance is the Port Royal benchmark from the creators of the famous 3DMark series tests. This test works on all GPUs that support the DirectX Raytracing API. We tested several video cards at a resolution of 2560x1440 at various settings, when reflections are calculated using ray tracing in two modes, as well as the traditional rasterization method.

The benchmark demonstrates several new aspects of using ray tracing through the DXR API, including algorithms for rendering reflections and shadows using ray tracing. Although the test is not perfectly optimized and poses a significant load on even powerful GPUs, it is suitable for comparing the performance of different video cards in this particular task.

The test results clearly highlight the differences in the approaches of AMD and Nvidia to integrating hardware acceleration of ray tracing. While the RDNA3 architecture brings a slight improvement for AMD, the RTX 4080 Super delivers the expected performance, slightly ahead of the base RTX 4080. Both cards are significantly superior to the RTX 3090 Ti, which until recently was top of the line. AMD's top model, the RX 7900 XTX, wasn't too bad in this test, beating the RTX 3090 Ti, although the RTX 4080 remains faster.

Later, another 3DMark subtest appeared, focused on testing the performance of ray tracing using DirectX Raytracing. Unlike the previous hybrid test, this one does not use rasterization at all, focusing solely on ray tracing. The scene in this benchmark is familiar from other 3DMark subtests and, being small, allows you to better evaluate the capabilities of new video cards, taking into account the possibility of placing the BVH structure in a large cache, which affects performance.

Of course, all GeForce video cards demonstrate noticeably higher performance in these conditions compared to Radeon. This is because Nvidia's dedicated RT cores do most of the work and are more versatile, so performance drops less when ray tracing is enabled than with the competitor's Ray Accelerator units, which work together with the regular SIMD units. In most ray tracing games, the load on the RT units is lower, and the Radeon position becomes more competitive. However, in this test, Nvidia video cards continue to demonstrate a clear advantage.

As expected, the new RTX 4080 Super outperforms the base RTX 4080, and the difference between the two is more pronounced than expected. The speed increase in ray tracing tasks is probably slightly higher than in pure rasterization. Both cards based on the AD103 GPU, although far from the top AD102 model, are noticeably superior to the RTX 3090 Ti and the competitor's flagship. This time the Radeon was slower even than the previous-generation video card, and the RTX 4080 Super is more than one and a half times faster. However, it is worth noting that this is a purely synthetic test, and results may vary in real games, especially those that actively use ray tracing, such as Portal RTX, Quake II RTX, Cyberpunk 2077, Alan Wake 2 and similar projects.

With the release of new generations of Nvidia and AMD GPUs in 2022, another test with a serious load on ray tracing has been added to the 3DMark suite — Speed Way. This test represents a more realistic load on various GPU units and is closer to the use cases for ray tracing in modern games.

Top GPUs demonstrate acceptable frame rates at both resolutions, and the difference between Radeon and GeForce video cards, although still noticeable, is decreasing. The only AMD video card in this comparison, the Radeon RX 7900 XTX, is no longer so far behind its price competitor. As for the pair of video cards on the AD103 chip, the difference between them turned out to be greater than expected. Under high load on the ray tracing units, the RTX 4080 Super showed a larger lead over the base model. As expected, the top-end RTX 4090 model turned out to be significantly more powerful.

Another interesting benchmark is Boundary, created on a real game engine with support for DXR and DLSS. This Chinese project is a GPU-intensive benchmark that makes heavy use of ray tracing for complex reflections, soft shadows and global illumination. It is important to note that DLSS technology cannot be used in Radeon tests.

Without the use of DLSS technology, even in Full HD resolution, only the most powerful video cards demonstrate stable performance. AMD's top model, although lagging behind all GeForces, including the previous generation RTX 3090 Ti, still provides more than 60 FPS. At 4K resolution without scaling, the picture remains playable only on the top-end RTX 4090 and to some extent on the RTX 4080, including the new Super model, but with minimal playability. The video card in question, although slightly superior to the base model, this time turns out to be close to it, and the main limiting factor is video memory bandwidth. The performance of the only Radeon video card in this comparison indicates that in ray tracing tests, AMD solutions cannot even compete with the competitor's outdated GPUs, let alone the new generation Ada.

In this benchmark, the updated model, built on the full version of the AD103 GPU, showed results consistent with expectations, slightly exceeding the level of the base RTX 4080 (despite the latter's strange failure in Full HD). A similar trend is visible in the previous diagram. The RTX 3090 Ti also performed well in this test, especially in high resolution, which may be due to the advantages of memory bandwidth or the amount of video memory that was more significant in the top-end GPU of the previous generation. In general, the results for the new Super model are pleasant — in ray tracing tests it added a few percentage points compared to the RTX 4080, strengthening Nvidia's position in the price range of around $1000.

Computational tests

We keep looking for benchmarks that use OpenCL for current computing tasks to include in our suite of synthetic tests. At the moment, this section covers a rather old and poorly optimized ray tracing test (without hardware acceleration), LuxMark 3.1. This cross-platform benchmark is based on LuxRender and relies on OpenCL.

The new GeForce RTX 4080 Super model, based on the full version of the AD103 GPU, has a significant number of compute units and a slightly increased clock speed. As a result, it's no surprise that it easily outperforms the previous generation RTX 3090 Ti in this test. However, the improvement over the base RTX 4080 model is small, which is easily explained by theoretical assumptions. Compared to the competitor's top video card, the results of the new product in all subtests were higher than those of the best Radeon. In the most difficult subtest, the difference reached twofold, although in other cases it was not so great. As you'd expect, the RTX 4090 is ahead of all other competitors.

Let's also look at another GPU computing performance test — V-Ray Benchmark. This test, based on ray tracing without hardware acceleration, allows you to evaluate the capabilities of the GPU in complex calculations. In previous tests, we used different versions of the benchmark, which provide the result in the form of time spent on rendering.

In this test, focused on software ray tracing, the GeForce RTX 4080 Super again demonstrates a noticeable superiority over the RTX 3090 Ti. However, it could not beat the base RTX 4080: their results are identical. Software ray tracing strongly depends on the speed of the cache memory and main video memory, and since the memory bandwidth of both models is almost equal, the result was the same.

Let's move on to another rendering application — OctaneRender. Compatible with most 3D content creation applications, this popular renderer uses CUDA and RTX capabilities. OctaneRender 2020.1.5 now supports the Ampere architecture. The benchmark based on this renderer provides the ability to disable RTX acceleration and test performance in several test scenes with different loads. Unfortunately it doesn't support OpenCL. Let's give the total number of points:

The new GeForce RTX 4080 Super is noticeably ahead of the previous-generation card, as expected, especially with RTX hardware acceleration enabled. The gain from RTX acceleration is noticeable across all Nvidia GPUs, with the Ada Lovelace architecture showing significant improvements in ray tracing and compute. In computing tests, the new Super model demonstrated strong results, outperforming the base RTX 4080 by more than expected. The RTX 3090 Ti was significantly inferior to the new product, lagging behind by almost one and a half times.

Maxon recently released a new version of Cinebench 2024, a popular benchmark for testing 3D rendering that provides an assessment of the hardware capabilities of the processor and video card. Cinebench 2024 is based on the Redshift rendering engine used in the 3D graphics and animation program Cinema 4D. This benchmark allows you to compare results between CPU and GPU using the same algorithms and scenes.

Today's new GeForce RTX 4080 Super is a slight improvement over the base model, which uses the same GPU but in a slightly stripped-down configuration. At the same time, it is also superior to the previous generation RTX 3090 Ti. The top model RTX 4090 leads the ranking, but the gap between it and the new product being reviewed today is not so great. Unlike the GeForce cards, the competing Radeon RX 7900 XTX was unable to demonstrate strong results. Perhaps the increased instruction rate of the new RDNA3 graphics architecture did not work effectively in this test, so overall performance did not increase much and the Radeon fell behind by almost half.

Tests of DLSS/XeSS/FSR technologies

In this section, we analyze additional tests related to performance enhancement technologies. Initially, we only considered resolution scaling technologies (DLSS 1.x and 2.x, FSR 1.0 and 2.0, XeSS), but then another innovation was added to them: the technology for generating intermediate frames, known as DLSS 3. To begin with, we added a separate test of the second version of DLSS technology to our materials. While we've previously run tests using DLSS in ray tracing applications, we felt it was important to take additional measurements at 8K resolution. Let's look at the results of Nvidia's GPUs running at the highest 8K resolution using DLSS technology at various quality levels and both versions.

Without DLSS 2.0 enabled, rendering occurs at full resolution, which significantly impacts performance. Even the flagship RTX 4090 only achieves 13 FPS at 8K, which is not enough. Both RTX 4080 models run into a video memory limitation with their 16 GB and fall behind the RTX 3090 Ti with its 24 GB. However, with DLSS enabled in the maximum performance mode, the flagship RTX 4090 achieves comfortable frame rates, and the new RTX 4080 Super achieves a good 45 FPS. This may not be enough for the most comfortable gaming experience, but it is quite suitable for unhurried single-player games.
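The reason DLSS helps so much at 8K is simple arithmetic: the GPU renders far fewer pixels internally and the upscaler reconstructs the rest. The sketch below assumes the commonly cited per-axis scale factors for DLSS 2 quality modes (about 0.667, 0.58, 0.5 and 0.333). These values are not taken from this review, and individual games may use different ones, so treat the table as an assumption.

```python
# Assumed per-axis render scale for each DLSS 2 mode (commonly cited values,
# not measured in this review; games may override them).
DLSS_SCALE = {
    "Quality": 2 / 3,
    "Balanced": 0.58,
    "Performance": 0.5,
    "Ultra Performance": 1 / 3,
}

def render_resolution(out_w, out_h, mode):
    """Internal render resolution for a given output resolution and DLSS mode."""
    s = DLSS_SCALE[mode]
    return round(out_w * s), round(out_h * s)

def pixel_fraction(mode):
    """Share of output pixels that are actually rendered (scale applies to both axes)."""
    return DLSS_SCALE[mode] ** 2

if __name__ == "__main__":
    for mode in DLSS_SCALE:
        w, h = render_resolution(7680, 4320, mode)  # 8K output
        print(f"{mode}: {w}x{h}, {pixel_fraction(mode):.0%} of the output pixels")
```

Under these assumptions, 8K output in Performance mode is rendered internally at 4K, and in Ultra Performance mode at 2560x1440, which is why frame rates rise from unplayable to usable.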

Don't forget that all modern Nvidia video cards have one more trump card — support for DLSS 3.0 technology. In Ada Lovelace architecture video cards, this technology adds intermediate frame generation to the existing capabilities of DLSS 2.x. Enabling the generation of intermediate frames brings an increase in FPS by one and a half times, as the results show in practice:

With the new technology enabled, the GeForce RTX 4080 Super reaches 30 FPS in quality mode, providing minimal comfort, and 60 FPS in performance mode, thanks to the generation of intermediate frames. DLSS 3.0 significantly improves the smoothness of the image at the cost of a slight increase in control latency, leaving the choice to the user. This technology can be especially useful at high resolutions, where it provides playable frame rates. It is important to note that the frame rate without frame generation should not drop below 30 FPS, since in that case DLSS 3.0 will not add the required responsiveness.
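Why is the practical gain about one and a half times rather than a full doubling? A rough way to reason about it is sketched below. This is a deliberately simplified model, not a description of how DLSS 3 is implemented: it assumes exactly one generated frame follows each rendered frame, and that the whole shortfall comes from generation overhead lowering the rendered frame rate. The function names and the 40 FPS input are hypothetical.

```python
def displayed_fps(rendered_fps):
    """One generated frame per rendered frame doubles the displayed rate."""
    return 2.0 * rendered_fps

def rendered_fps_for_net_gain(base_fps, net_gain):
    """Rendered rate with frame generation on, given the observed net gain
    over base_fps (the rate with frame generation off)."""
    return base_fps * net_gain / 2.0

if __name__ == "__main__":
    base = 40.0  # hypothetical frame rate without frame generation
    rendered = rendered_fps_for_net_gain(base, 1.5)
    print(f"rendered: {rendered:.0f} FPS, displayed: {displayed_fps(rendered):.0f} FPS")
```

In this model, input is sampled only on rendered frames, so responsiveness follows the rendered rate rather than the displayed one. That is consistent with the advice above not to rely on frame generation when the base frame rate is already below 30 FPS.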

XeSS is another performance improvement method that uses resolution scaling and image restoration using artificial intelligence. This method, proposed by Intel, differs from DLSS 2.0 and works on a wider range of GPUs, including Intel graphics cards. The tests used a specialized benchmark from the 3DMark package.

Enabling XeSS more than doubled the frame rate. Given its versatility, it can be considered a viable alternative, bearing in mind that each company has its own technology and specialized acceleration units. DLSS is the most advanced, but is limited to Nvidia GPUs. FSR provides a general solution, but does not use specialized blocks as efficiently and is considered a less complex method. What's surprising in the tests is that the RTX 4080 Super performed slightly better than the base RTX 4080, possibly due to optimizations in Nvidia's drivers. Overall, the new product performed better than the Radeon RX 7900 XTX, and, despite a small lead over the RTX 4080, it is significantly inferior to the RTX 4090.

Another rendering scaling technology is FSR 2.0 from AMD. Interestingly, this technology was the last to appear in 3DMark tests. Comparing the performance of upscaling technologies is complicated by differences in scenes and image quality. Regardless, AMD's FSR 2.0 provides an alternative solution to accommodate different rendering resolutions and qualities, although comparisons with other technologies require further analysis.

Because FSR is a universal technology, its performance is roughly the same across different GPUs, and the FSR 2.0 test results did not show any significant variation. The Radeon RX 7900 XTX once again underperformed, even compared to the RTX 4080. With FSR disabled, the Radeon was faster, but its performance relative to the Nvidia cards decreased as the rendering resolution dropped. The RTX 4080 Super was, accordingly, slightly faster still, which is in line with expectations given the slight changes in GPU configuration and frequency. It's important to note that the new RTX 4080 Super model comes at a noticeably reduced price, making it a more attractive option for consumers. Now let's move on to real gaming tests, although they probably won't bring us any surprises.

Testing: gaming tests

Test bench configuration

Computer based on Intel Core i9-13900K processor (Socket LGA1700):
- Platform:
  - Intel Core i9-13900K processor (overclocked to 5.4 GHz on all cores);
  - Cougar Helor 360 liquid cooling system;
  - Asus ROG Strix Z790-A Gaming WiFi D4 motherboard based on the Intel Z790 chipset;
  - RAM TeamGroup Xtreem ARGB White (TF13D416G5333HC22ADC01, CL22-32-32-52) 32 GB (2×16) DDR4 5333 MHz;
  - SSD Intel 760p NVMe 1 TB PCIe;
  - SSD Intel 860p NVMe 2 TB PCIe;
  - Thermaltake Toughpower GF3 1000W power supply;
  - Thermaltake Level20 XT case;
- operating system Windows 11 Pro 64-bit;
- TV LG 55Nano956 (55″ 8K HDR, HDMI 2.1);
- AMD drivers version 24.1.1/2;
- Nvidia drivers version 546.65/551.22;
- Intel drivers version 101.5125;
- VSync is disabled.

3D gaming performance in a nutshell

Before demonstrating detailed tests, we provide brief information about the performance of the family to which the particular accelerator under study belongs, as well as its rivals. We evaluate all this subjectively on a scale of seven gradations.

Games without ray tracing (classic rasterization):

Modern top-end video cards are so fast that even at 4K resolution in many games, the overall performance is no longer limited by the video card, but most often by the capabilities of the central processor. As for the GeForce RTX 4080 Super card, it predictably turns out to be the second in performance and copes well with any of the above resolutions, provided that the game is played with maximum graphics settings (without RT and/or DLSS/FSR/XeSS).

Games using ray tracing and DLSS/FSR/XeSS:

Of course, turning on RT reduces performance, but the Nvidia DLSS, AMD FSR and Intel XeSS scaling technologies already implemented in almost all games with ray tracing greatly help compensate for the drop in speed from using RT. So, in the end, the previous conclusions remain valid.

Conclusions and comparison of energy efficiency

Nvidia GeForce RTX 4080 Super (16 GB) is a new gaming flagship that uses the AD103 core, similar to the GeForce RTX 4080, but with full use of all blocks. Despite the availability of the GeForce RTX 4090, political restrictions and high demand for accelerators for artificial intelligence computing make the RTX 4090 less affordable and less practical for everyday users. Additionally, the prices of these cards continue to rise, making them unaffordable for most enthusiast gamers.

The GeForce RTX 4080 Super offers excellent performance, sufficient for comfortable gaming at resolutions up to 4K with maximum graphics settings. It is intended to replace the previous GeForce RTX 4080 model, offering a more affordable price. However, at the initial stage, some sellers may charge inflated prices, and the performance difference between the RTX 4080 and RTX 4080 Super is small, in the range of 2-6%, with an average of about 4%.
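For readers who want to reproduce this kind of "X% faster on average" figure from their own measurements, a minimal sketch follows. The frame rates in it are made-up placeholders, not results from this review, and the choice of the geometric mean for averaging ratios is our assumption about a reasonable method rather than a description of how the figure above was obtained.

```python
from statistics import geometric_mean

def uplift_percent(new_fps, old_fps):
    """Relative gain of the new card over the old one in a single test, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0

def mean_uplift_percent(pairs):
    """Average gain across tests, using the geometric mean of the per-test ratios."""
    ratios = [new / old for new, old in pairs]
    return (geometric_mean(ratios) - 1.0) * 100.0

# Placeholder (new_fps, old_fps) pairs, purely illustrative
placeholder_results = [(102.0, 100.0), (104.0, 100.0), (106.0, 100.0)]

if __name__ == "__main__":
    for new, old in placeholder_results:
        print(f"{uplift_percent(new, old):.1f}%")
    print(f"average: {mean_uplift_percent(placeholder_results):.1f}%")
```

The geometric mean keeps one unusually favorable test from dominating the average, which matters when per-test gains are as small as a few percent.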

At the time of preparing our review, sales of the GeForce RTX 4080 Super had not yet begun, so our conclusions are based on estimated and expected prices.

The Ada Lovelace architecture significantly improves execution unit capabilities over the previous generation, especially in the area of hardware ray tracing. The 2x speedup in determining ray-triangle intersections in third-generation RT cores is a significant improvement. Additional hardware units, such as the Opacity Micromap Engine and Displaced Micro-Mesh Engine, are designed to optimize the processing of translucent objects and speed up the construction of BVH structures for complex objects.
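To show what a ray-triangle intersection test actually involves, here is a software sketch of the well-known Möller-Trumbore algorithm. RT cores perform this kind of test in fixed-function hardware, and we make no claim that they use this exact formulation; the code only illustrates the arithmetic that has to be repeated for every ray and every candidate triangle.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test. Returns the distance t along the ray to the hit
    point, or None if the ray misses the triangle."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None    # ignore hits behind the ray origin

if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
    print(ray_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri))
```

A BVH exists precisely to avoid running this test against every triangle in the scene: the ray first walks a tree of bounding boxes and only reaches the triangle test for a handful of candidates.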

Shader Execution Reordering, another innovation, allows you to optimize the execution of shaders during ray tracing on the fly. This results in a potential two to three times speedup for many ray tracing algorithms.

The technology of the GeForce RTX 40 generation, DLSS 3, is actively being introduced into games. It uses the Optical Flow Accelerator, enhanced by the Ada Lovelace architecture. DLSS 3 combines the resolution scaling of DLSS 2 with frame rate doubling via frame insertion using the optical flow field. This allows players to get twice as many frames per second while maintaining high visual quality.

Regarding the energy efficiency of the new GeForce RTX 4080 Super accelerator, specific tests and data are expected after the official start of sales and a more detailed analysis of performance and power consumption.

Note that the GeForce RTX 4080 and RTX 4080 Super occupy leading positions in the ranking, both in the top five. They are among the most powerful graphics cards of the current generation, significantly surpassing models of the same level from the previous GeForce RTX 30 and Radeon RX 6000 generations. In a direct comparison between the AMD and Nvidia lines, the GeForce RTX 40 absolutely dominates: it holds 7 places in the top ten, and when ray tracing technologies and scaling methods are taken into account, it takes the first 7 places in a row.

The tested model — Gigabyte Aorus GeForce RTX 4080 Super Master 16G (16 GB) — is a large flagship gaming accelerator of the latest generation. Its dimensions are impressive: 35.5 cm in length, 16.5 cm in height, and the video card occupies 4 slots in the case! At the same time, energy consumption does not exceed 310-320 W, the cooling system is quiet, and temperature parameters are within normal limits with a margin.

The Gigabyte Aorus card is equipped with a 16-pin PCIe 5.0 power connector (the kit includes an adapter for connecting 3 PCIe 2.0 power connectors, but it is recommended to use a modern power supply with PCIe 5.0 support).

The highlight of this card is the stylish backlighting of the fans and the presence of an LCD screen at the end, which is useful for displaying monitoring data. It is also worth noting that the kit includes a reliable bracket that firmly fixes the heavy card in the PC case.

The manufacturer provides a 4-year warranty on this card with mandatory registration on the company’s website.

Let us note once again that the GeForce RTX 4080 Super is excellent for gaming at 4K resolution with maximum graphics quality with ray tracing even without DLSS, and in some games even at 8K resolution (but DLSS support is required).