
Performance testing of modern NVMe SSDs with the interface limited to one PCIe lane

11.01.2024 10:44

Many users are surprised that the average computer has fewer than 1.1 drives installed. The explanation is simple, though: it is hard to have fewer than one drive, and most of the time there is no need for more than one. There was a time when you had to choose between speed and capacity, and comfortable work required both a solid-state drive and a hard drive. But over time SSDs became more affordable, and mass-market users began to abandon hard drives.

However, there are times when one drive is not enough, or when relying on it alone is inconvenient. Historically, disk controllers were designed to handle several hard drives. With the arrival of SSDs, installing several drives became less and less common, but some scenarios still require it. The usual arrangement, one “main” drive plus one or more additional ones, with extra controllers installed if necessary, offered plenty of flexibility. However, as technology developed and SSD capacities grew, the SATA interface, originally adopted for SSDs as a temporary solution, began giving way to more suitable interfaces. SATA SSDs are still produced and sold, but they are mostly bought to service older equipment; manufacturers focus on more modern devices, and new decent SATA drives are becoming rare.

The idea of installing several NVMe drives is attractive, but in the past it was not always justified: NVMe drives were faster, but they often cost more than comparable SATA models. Now that decent SATA alternatives are hard to find, the idea is becoming more attractive still.

Doing this directly is not always easy, though. Modern motherboards usually have few M.2 slots, rarely more than four and sometimes only a couple, while older models often have none at all. Adapters can solve this, but they usually require a PCIe x4 (or wider) slot, and such slots may be missing or occupied by a video card.

That leaves PCIe x1 slots, which often sit unused, and the question of whether an SSD is compatible with them. Such a slot may limit performance, but in some cases the limitation may go unnoticed. We will look at this in detail in the testing part of the article; for now, let's discuss compatibility.

Lanes and standards

PCI Express is a serial point-to-point interface, unlike the parallel buses that preceded it. Instead of all devices sharing one bus, each device gets its own link made up of one or more lanes, and bandwidth is distributed by allocating lanes between devices. Bandwidth per lane increased with each generation, which proved more efficient than simply adding lanes: for example, the throughput of one PCIe Gen5 lane is roughly equivalent to that of 16 PCIe Gen1 lanes.
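For reference, here is a minimal Python sketch of the per-lane arithmetic behind that comparison. It uses the standard signalling rates and line codes of each PCIe generation (8b/10b for Gen1 and Gen2, 128b/130b from Gen3 on) and deliberately ignores packet and protocol overhead, so real-world figures are somewhat lower. The function names are our own, chosen for illustration.

```python
# Signalling rate per lane (GT/s) and line-code efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def lane_mb_per_s(gen: int) -> float:
    """Approximate one-direction bandwidth of a single lane in MB/s.

    Only line-code overhead is counted; TLP headers, flow control and
    other protocol overhead are ignored, so real transfers are slower.
    """
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    # One transfer carries one bit per lane; divide by 8 for bytes, by 1e6 for MB.
    return gt_per_s * 1e9 * efficiency / 8 / 1e6


def link_mb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a link with the given width."""
    return lane_mb_per_s(gen) * lanes


if __name__ == "__main__":
    for gen in PCIE_GENERATIONS:
        print(f"Gen{gen} x1:  {lane_mb_per_s(gen):7.1f} MB/s")
    print(f"Gen5 x1:  {link_mb_per_s(5, 1):7.1f} MB/s")
    print(f"Gen1 x16: {link_mb_per_s(1, 16):7.1f} MB/s")
```

A Gen5 lane signals 12.8 times faster than a Gen1 lane (32 vs 2.5 GT/s), and the more efficient line code lifts the payload ratio to about 15.75, which is why “one Gen5 lane is worth 16 Gen1 lanes” is a fair rounding.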

Each new version of the standard raised the data transfer rate. As a result, video cards, even modern high-end models, usually do not use the full bandwidth of an x16 slot, so x8 or even x4 may be sufficient. Distributing lanes sensibly between devices is therefore important for good overall performance.

Compatibility between PCIe versions is usually not an issue: devices work in slots of an older version. Problems can occasionally arise when a device gets fewer lanes than it expects. Some manufacturers also create artificial compatibility restrictions for their devices; Sony, for example, does so with the PlayStation 5.

In most cases, using devices with fewer lanes or on an older version of the PCIe standard causes no problems, although it may reduce throughput.
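The fallback behaviour can be sketched in a few lines. This is a deliberately simplified model, assuming that link training settles on the lower of the two generations and on the widest common link width that neither side exceeds; real link training involves more states and details, and the helper below is hypothetical.

```python
# Common PCIe link widths (simplified list for this sketch).
VALID_WIDTHS = (1, 2, 4, 8, 16)


def negotiated_link(device_gen: int, device_lanes: int,
                    slot_gen: int, slot_lanes: int) -> tuple[int, int]:
    """Return the (generation, width) a device and slot would settle on.

    Simplified: the link runs at the lower of the two generations and at
    the widest listed width that does not exceed either side's lane count.
    """
    gen = min(device_gen, slot_gen)
    limit = min(device_lanes, slot_lanes)
    width = max(w for w in VALID_WIDTHS if w <= limit)
    return gen, width


# A Gen4 x4 SSD in a Gen2 x1 slot still works, just over a narrower, slower link.
print(negotiated_link(4, 4, 2, 1))   # (2, 1)
# A Gen3 x4 SSD in a Gen4 x16 slot is limited by the SSD itself.
print(negotiated_link(3, 4, 4, 16))  # (3, 4)
```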

Why do most SSDs use four PCIe lanes?

The first experiments with the PCIe interface in SSDs began before the NVMe protocol appeared. For example, the Marvell 88SS9183 controller supported PCIe Gen2 x2, and the 88SS9293 supported PCIe Gen2 x4. PCIe Gen3 was not yet available in the mass market, and even when it arrived in 2012, for example on the Intel LGA1155 platform, it provided only 16 processor lanes, usually reserved for the video card. Full support for PCIe Gen3, including the chipset lanes, first appeared on the LGA1151 platform in 2015. Some AMD chipsets supported only PCIe Gen2 as late as 2019: even after the X570 chipset introduced PCIe Gen4, it was expensive and ran hot, and many buyers preferred the more affordable B450, whose chipset lanes are only Gen2. In general, many systems, especially AMD-based ones, offer limited PCIe Gen3 or Gen4 connectivity.

This matters for SSDs, since older systems support only older PCIe versions. In the early days, NVMe controllers often supported PCIe Gen2 x4, as that was the best option available at the time, and even budget models used Gen3 x2. However, many buyers did not look closely at actual speeds, so controllers targeting older PCIe versions were hard to market.

Today PCIe Gen4 x4 has become the standard, yet thanks to backward compatibility even the newest SSD will work when connected to PCIe Gen2 x1. This makes NVMe an alternative to SATA, especially in older systems with empty PCIe x1 slots. Note that for one-directional transfers PCIe Gen2 x1 can be slower than SATA600, but because PCIe can read and write at high speed simultaneously, NVMe SSDs remain an attractive option.
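Why a link that is nominally slower than SATA600 can still win under mixed load is easy to show with a back-of-the-envelope model. The sketch assumes SATA is half-duplex (one 600 MB/s budget shared by reads and writes), while a PCIe lane carries its full bandwidth in each direction at once; protocol overhead is ignored, and the function name is our own.

```python
SATA600_MB_S = 6.0e9 * 8 / 10 / 8 / 1e6       # 600 MB/s, shared by both directions
PCIE_GEN2_X1_MB_S = 5.0e9 * 8 / 10 / 8 / 1e6  # 500 MB/s in each direction


def pcie_total_ceiling(read_fraction: float, per_direction_mb_s: float) -> float:
    """Total read+write ceiling of a full-duplex link for a given read share.

    With total throughput T, reads need read_fraction * T and writes need
    (1 - read_fraction) * T; the busier direction hits its limit first.
    """
    busiest_share = max(read_fraction, 1.0 - read_fraction)
    return per_direction_mb_s / busiest_share


for mix in (1.0, 0.7, 0.5):
    print(f"reads {mix:.0%}: SATA600 {SATA600_MB_S:.0f} MB/s, "
          f"Gen2 x1 {pcie_total_ceiling(mix, PCIE_GEN2_X1_MB_S):.0f} MB/s")
```

For a pure read or pure write load, the Gen2 lane's 500 MB/s ceiling is below SATA600; with an even mix the theoretical total doubles to 1000 MB/s. Protocol overhead brings the practical total closer to 800 MB/s, which is still above anything SATA can deliver.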

Adapters rush to the rescue

Some systems, especially laptops, have M.2 slots wired with only a couple of PCIe lanes. Such slots used to be common on desktop boards too, for example those with Intel H97 and Z97 chipsets (LGA1150 platform), and many AMD AM4 boards also had an additional slot supporting PCIe Gen3 x2. These lanes were originally meant for SATA Express connectors, but SATA Express became obsolete, and board manufacturers found creative uses for the freed resources, including extra USB3 Gen2 controllers or additional M.2 slots.

However, to create a universal solution, you can use a PCIe x1 slot adapter. These x1 slots are common in many systems and often go unused.

Ordinary adapters are not suitable here. For example, the mass-market type designed for x4 to x16 slots is inconvenient: it holds well in x8 or x16 slots even without additional fixation, but it does not fit x1 slots. In this case the extra connector segments serve more to keep the card in place through friction than for any functional purpose. To use an x1 slot, you would have to modify the adapter by cutting off the part intended for the x4 connector, sacrificing its full functionality in the slots it was designed for.

Special models with only an x1 connector solve this problem. They are not really intended for long-term use, since they are held only by the short connector, which does not provide a secure fit. However, they are fine for testing, and for practical use they can be secured with some modifications. Such adapters turn any PCIe slot into a single-lane M.2 slot, and the PCIe version depends on the slot the adapter is plugged into. Gen2 slots are already rare on modern boards, but Gen2 mode can be forced in UEFI for a Gen3 or Gen4 slot. This is also of interest for testing, since Gen2 slots are still found in some systems.

Testing

Testing methodology

Our test bench consisted of an Intel Core i9-11900K processor and an Asus ROG Maximus XIII Hero motherboard based on the Intel Z590 chipset. This bench offers two options for connecting SSDs: PCIe Gen4 lanes from the processor, and PCIe Gen3 lanes from the chipset. Tests have shown that even for Gen3-limited SSDs a processor-attached slot is preferable because of lower latency. For x1 mode this can be neglected, since other restrictions dominate. We mainly used the processor slot, switching it to Gen3 or Gen2 mode as needed. Gen1 did not interest us: such systems are extremely outdated and cannot boot from NVMe drives without considerable effort. Even if that problem is solved, Gen1 x1 makes no sense because the interface is so slow (about 250 MB/s, below even SATA300), so in that case it would be easier to stick with SATA.

Test objects

For our tests, we chose three SSDs of different classes: the top-end Kingston KC3000 with a capacity of 1 TB, the budget Kingston NV2 of the same capacity, and the older, even more budget-oriented Kingston NV1 with 500 GB. The KC3000 shows how a powerful drive behaves in unsuitable conditions, although buying one for such a setup may be overkill. Priced between «decent» and «indecent» SATA drives, the Kingston NV2 is a practical choice in terms of price/quality ratio. Finally, the Kingston NV1, an older but affordable option, lets us establish a no-frills minimum level of performance. Using an even more modern, high-end drive on a single PCIe lane would probably be pointless; here we are looking for a sensible balance without going to extremes.

Since we are focusing on Kingston products today, the company's two half-terabyte SATA drives serve as the main points of comparison. The Kingston KC600 is one of the surviving decent SATA drives, while the Kingston A400, despite its mediocre reputation, is one of the best-selling models thanks to its low price. Such low-cost solutions dominate the mainstream market, and this is important to keep in mind.

For an upper reference point, we also include results for the Kingston NV1, NV2 and KC3000 in their “native” modes: Gen3 x4 for the first and Gen4 x4 for the other two. Intermediate modes are of little interest today, since Gen4 x1, Gen3 x2, Gen2 x4 and Gen1 x8 offer approximately equal throughput. This equivalence holds for throughput, although latency increases as the same throughput is spread across more, slower lanes; in some degenerate scenarios the difference may be negligible.
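The equivalence can be checked with the same line-code arithmetic. This sketch (our own helper, protocol overhead ignored, latency not modelled) lists every generation and width combination whose one-direction bandwidth lies within 5% of a reference link.

```python
# (signalling rate in GT/s, line-code efficiency) per PCIe generation
RATES = {1: (2.5, 0.8), 2: (5.0, 0.8), 3: (8.0, 128 / 130), 4: (16.0, 128 / 130)}
WIDTHS = (1, 2, 4, 8, 16)


def link_gbit_s(gen: int, lanes: int) -> float:
    """One-direction payload bandwidth of a link in Gbit/s (line code only)."""
    rate, efficiency = RATES[gen]
    return rate * efficiency * lanes


def equivalents(gen: int, lanes: int, tolerance: float = 0.05) -> list[str]:
    """Configurations whose bandwidth is within `tolerance` of the reference link."""
    target = link_gbit_s(gen, lanes)
    return [f"Gen{g} x{w}" for g in RATES for w in WIDTHS
            if abs(link_gbit_s(g, w) - target) / target <= tolerance]


print(equivalents(4, 1))  # ['Gen1 x8', 'Gen2 x4', 'Gen3 x2', 'Gen4 x1']
```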

Filling with data

Let's first refresh our data on “decent” and “indecent” SATA drives. Keep in mind that our Kingston A400 is a revision that, while not considered the worst, still has its limitations despite using TLC memory. The results speak for themselves: the A400 takes four times longer to complete this test (strictly speaking, the raw numbers are a little worse still, though capacity has to be taken into account). It would seem a simple scenario: flash memory keeps getting faster, and top-end SSDs handle such a load with ease. Yet even today, budget models cannot sustain a write speed of even a few hundred megabytes per second. This has to be masked by making the most of SLC caching, which inevitably exacts a penalty later. Even a more advanced controller can hardly hide these limitations.

It is important to understand that changing the interface and protocol is not a universal solution. For example, the Silicon Motion SM2263XT controller used in the Kingston NV1 is not very different from the SM2259XT/XT2 used in the Kingston A400, so similar problems persist. Nevertheless, even in Gen2 x1 mode (which, remember, is slower than SATA), the total test time decreased. Gen3 x1 mode speeds up writes both to the cache and beyond it, widening the NV1's lead over the A400. Still, it cannot catch up with the KC600: the NV1 overtakes it only within the cache.
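The pattern described above, a fast burst into the SLC cache followed by a much slower phase once the cache is exhausted, with the interface capping both, can be captured in a toy model. All parameters below are hypothetical placeholders, not measured values for any of the drives tested. Real caches are also dynamic (they shrink as the drive fills, and folding data out of SLC costs extra time), which this sketch ignores.

```python
def fill_time_s(data_gb: float, cache_gb: float, cache_mb_s: float,
                post_cache_mb_s: float, link_mb_s: float) -> float:
    """Seconds needed to write data_gb with a static SLC cache of cache_gb.

    Both phases are capped by the interface: the drive can never accept
    data faster than the link delivers it.
    """
    cached_gb = min(data_gb, cache_gb)
    uncached_gb = data_gb - cached_gb
    fast = min(cache_mb_s, link_mb_s)
    slow = min(post_cache_mb_s, link_mb_s)
    return cached_gb * 1000 / fast + uncached_gb * 1000 / slow  # 1 GB = 1000 MB


# Hypothetical budget drive: 20 GB cache at 2000 MB/s, 150 MB/s after the cache.
for link_name, link in (("Gen2 x1", 500), ("Gen3 x1", 985), ("Gen3 x4", 3940)):
    print(f"{link_name}: {fill_time_s(200, 20, 2000, 150, link):6.0f} s")
```

With these made-up numbers the post-cache phase dominates the total, so widening the link changes little: the caching behaviour of a budget drive can matter as much as the interface it sits on.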

But it is too early to draw final conclusions. The choice in the SATA segment is very limited, whereas NVMe drives are far more diverse, so there is no need to stop at the Kingston NV1 and similar drives, which, it should be recalled, are quite outdated. Our KC3000 uses the same caching scheme but has a much more powerful controller. In Gen2 x1 mode it trails the KC600 only slightly, because of the interface bandwidth limit. With Gen3 x1 you can already guess where the cache ends, though it is hard to say for sure, since the speed stays almost at the maximum for a single PCIe Gen3 lane, which is out of reach for SATA anyway. On Gen4 x1 the cache boundary becomes more visible and the post-cache memory speed almost reaches the interface limit, yet the overall speed is already more than double even the theoretical capabilities of SATA.

Now back to the budget segment, in its modern incarnation. The Kingston NV2 cannot show its full potential on Gen2 x1, and SATA would not be enough for it either. Nevertheless, it is noticeably ahead of the A400, and on a faster interface it matches the KC600 on average. To repeat, the main point is that choice returns, unlike in the SATA segment, where there is no real choice: even the best surviving SATA drives honestly have little to offer. Moving to another segment entirely can justify the cost by opening up more options. Yes, this requires some financial investment, but if a problem can be solved with money, it is no longer a problem, just an expense.

Maximum speed characteristics

Modern low-level benchmarks, including CrystalDiskMark 8.0.1, often cannot give a complete picture because of SLC caching: typically they measure only cache performance. The speeds manufacturers quote are likewise cache speeds. It is therefore always worth testing devices in practice to see how they perform in real-world conditions. Caching optimizations are aimed precisely at making the workload hit the cache as often as possible, so that the device shows high speeds even as memory costs are cut.

The first part of the test measures the practical throughput of the interface, and its final part tactfully hints that high bandwidth matters under heavy load. Essentially, one PCIe Gen2 lane is slightly slower than SATA600, while one Gen3 lane is already slightly faster. Even budget SSDs, long discontinued and outdated, can exploit this in low-level tests. These facts are well known to anyone who has thought seriously about data storage, but not everyone pays attention to them, and sometimes it is worth stopping to think about them.

Particularly interesting is the third column, where the bidirectional load is not as simple as reading from or writing to the cache. Even with a budget controller, such as the Phison E21T in the Kingston NV2, you are immediately reminded that PCIe is a bidirectional interface: even a single Gen2 lane easily outperforms SATA without breaking a sweat.

It is commonly believed that in small-block operations SATA/AHCI is no match for PCIe/NVMe. This is mostly true, although the two can overlap, and today we found one such overlap: the KC600 and the NV1 on a single Gen2 lane deliver almost the same performance. On the other hand, the NV1's native Gen3 x4 mode was not much faster than Gen3 x1 or even Gen2 x1. For the NV2 or KC3000, however, the interface mode matters more.

So what do we end up with? The lower overhead of the NVMe protocol helps. Higher interface bandwidth also helps, even under such loads, since data must not only be found but also transmitted. However, the key factor remains the SSD itself: it must be able to exploit these capabilities. This is also the final nail in the coffin for SATA, not only because its ceiling was set long ago and nothing has improved since, but also because all the “decent” models are essentially aging survivors. The Silicon Motion SM2259 controller, for example, was already in use in 2017 and has not improved noticeably since. The NVMe segment, by contrast, keeps changing and innovating. Even the relatively old Silicon Motion SM2263XT, still used in DRAM-less drives at the bottom of the budget segment, can in some cases compete with the best SATA drives, even in non-standard operating modes.
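The point that data must be both found and transmitted can be illustrated with a simple queue-depth-1 latency model: each request costs a fixed device latency plus the time to move the block over the link. The 60 µs device latency below is a hypothetical figure, and real NVMe and AHCI software stacks add overhead that is not modelled here.

```python
def qd1_iops(device_latency_us: float, block_bytes: int, link_mb_s: float) -> float:
    """IOPS at queue depth 1: one request in flight, so IOPS = 1 / latency.

    1 MB/s equals 1 byte per microsecond, so block_bytes / link_mb_s is the
    transfer time in microseconds.
    """
    transfer_us = block_bytes / link_mb_s
    return 1e6 / (device_latency_us + transfer_us)


# Hypothetical device latency of 60 us, 4 KiB blocks, three link speeds.
for name, link in (("Gen2 x1", 500), ("Gen3 x1", 985), ("Gen4 x4", 7877)):
    print(f"{name}: {qd1_iops(60, 4096, link):8.0f} IOPS")
```

A faster link shaves a few microseconds off every request, so it helps even with 4K blocks, but the device's own latency dominates; that is why the SSD itself remains the key factor.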

With random access, writing is usually easier than reading. A read must find the data and fetch it from exactly where it is stored. A write, thanks to dynamic address translation, can place the data wherever is most convenient and fast, simply by updating the translation table. This opens up wide possibilities for optimization. It turns out that the SM2263XT controller, even on a narrow PCIe link, differs significantly from the SM2259 and its other direct SATA predecessors. One PCIe Gen3 lane is still not enough to compete with the best SATA drives here, but a wider link improves the result. Newer controllers, such as the Phison E21T in the Kingston NV2, do not have this problem and perform better even on Gen2.
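The read/write asymmetry comes from the address translation layer. The following is a toy page-mapped translation layer written purely for illustration; real controllers (including the SM2263XT and E21T mentioned here) use far more elaborate, proprietary schemes with garbage collection, wear levelling and mapping-table caching, none of which is modelled. Writes to arbitrary logical pages simply land on the next free physical page, while reads must look up exactly where each page ended up.

```python
class ToyFTL:
    """Hypothetical page-mapped flash translation layer (no GC, no wear levelling)."""

    def __init__(self, physical_pages: int) -> None:
        self.flash: list[object] = [None] * physical_pages
        self.valid = [False] * physical_pages
        self.l2p: dict[int, int] = {}  # logical page number -> physical page number
        self.next_free = 0

    def write(self, lpn: int, data: object) -> int:
        """Append the data at the next free physical page and remap lpn to it."""
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages left; a real FTL would garbage-collect")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False  # the old copy becomes stale; nothing is rewritten in place
        ppn = self.next_free
        self.flash[ppn] = data
        self.valid[ppn] = True
        self.l2p[lpn] = ppn
        self.next_free += 1
        return ppn

    def read(self, lpn: int) -> object:
        """Reads cannot choose a location: they must follow the mapping."""
        return self.flash[self.l2p[lpn]]


ftl = ToyFTL(physical_pages=8)
# "Random" logical addresses still end up on sequential physical pages.
print([ftl.write(lpn, data) for lpn, data in ((7, "a"), (2, "b"), (5, "c"), (2, "d"))])  # [0, 1, 2, 3]
print(ftl.read(2), ftl.valid[1])  # d False
```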

Such operations have a much greater impact on the performance of real software. In practice, as already noted, long queues are not that common, but blocks other than 4 KB are quite common. Although «large» blocks yield somewhat fewer operations per second, each block carries more data, so the resulting speed in megabytes per second is higher; this is why many prefer to work with such blocks where possible. Some controllers, such as the Phison E21T, show a certain aversion to 16K blocks, but this happens regardless of the interface. What stands out most here is the near-identical performance of the KC600 and the NV1 on PCIe Gen2 x1. Since the NV1 is the counterpart of the A400, and the two cost about the same, this is a notable result.
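The link between operations per second and megabytes per second is plain multiplication, which is why a modest drop in IOPS with larger blocks still yields a higher transfer rate. The IOPS figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
def throughput_mb_s(iops: float, block_kib: int) -> float:
    """Transfer rate in MB/s (10^6 bytes/s) for a given IOPS and block size in KiB."""
    return iops * block_kib * 1024 / 1e6


# Hypothetical: fewer operations per second with 16K blocks, yet more data moved.
print(f"4K:  {throughput_mb_s(20_000, 4):.2f} MB/s")   # 81.92 MB/s
print(f"16K: {throughput_mb_s(15_000, 16):.2f} MB/s")  # 245.76 MB/s
```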

Writes are more complicated, for the reasons described above: a lot depends on the controller, and the first budget NVMe products usually lacked sophisticated firmware. This was a source of disappointment, since such products could not always convincingly demonstrate the benefits of the new technology across different use cases. However, much depends on the operating conditions, and one of the goals of the transition was to change those conditions. In its standard mode, for example, the Kingston NV1 is generally faster than the KC600, which is quite satisfactory. Problems arise when “non-standard” modes are required, and then you have to make choices. Either way, in this case the user does have options to choose from.

Mixed mode is also important because in real conditions (unlike in test utilities) data rarely needs to be only written or only read, especially when multitasking, given the rich inner life of modern operating systems. There are no surprises here, though: all the results are as expected. As for the main topic, it is worth stressing once again that the first budget NVMe drives were often not very fast, and limiting their interface bandwidth makes the problem even more acute. However, a second wave of faster models followed, then a third, and so on, while the SATA segment stays the same, losing its best models. Even those that remain are unimpressive compared to modern budget SSDs, let alone non-budget ones.

Working with large files

Impressive as the results of low-level utilities are, in real use the same speeds are not always achievable, for several reasons. The first is greater complexity: utilities such as CrystalDiskMark work with small pieces of data within a single file, which almost always stays in the SLC cache during testing, and they ignore file system overhead. Actually writing a file involves updating the MFT and maintaining journals (most file systems in use, including NTFS, are journaled), which means several writes to different locations in sequence, many of them in small blocks. In this respect the Intel NAS Performance Toolkit is more realistic: it can test not only the cache but also a device with minimal free space, which is closer to real conditions.
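A toy model shows why this bookkeeping weighs most on small files. It assumes each file write costs the data transfer plus a fixed number of small metadata and journal writes of fixed latency; the counts and latencies are hypothetical and do not describe how NTFS actually schedules its MFT and log updates.

```python
def effective_write_mb_s(file_bytes: int, seq_mb_s: float,
                         metadata_writes: int, metadata_write_us: float) -> float:
    """Effective speed of writing one file, including metadata/journal overhead.

    1 MB/s == 1 byte per microsecond, so bytes / (MB/s) gives microseconds
    and bytes / microseconds gives MB/s.
    """
    data_us = file_bytes / seq_mb_s
    overhead_us = metadata_writes * metadata_write_us
    return file_bytes / (data_us + overhead_us)


# Hypothetical: 500 MB/s sequential speed, 3 extra small writes of 50 us each.
for size in (4 * 1024, 1024 * 1024, 1024 * 1024 * 1024):
    print(f"{size:>11} bytes: {effective_write_mb_s(size, 500, 3, 50):6.1f} MB/s")
```

Large files approach the sequential figure, while for a 4 KiB file the fixed overhead dominates, which is one reason benchmark numbers measured inside a single preallocated file overstate everyday speeds.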

As mentioned before (more than once), the throughput of a single PCIe Gen2 lane lies between SATA300 and SATA600. If that is the only slot you have, do not expect a new level of speed for ordinary operations. However, this is about the worst case you can encounter, except perhaps for AMD AM4 systems with early chipsets or Intel LGA1150. Even there you may have to make do with narrower slots, but there are likely to be «wider» ones offering higher throughput as well. PCIe Gen3 x1 slots, which appeared in mainstream desktops with the first LGA1151 platform in 2015, are already faster than SATA600.
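The ordering follows from signalling rates and line codes alone (8b/10b for SATA and PCIe Gen2, 128b/130b for PCIe Gen3). These are theoretical one-direction ceilings before protocol overhead, so practical figures are lower:

```latex
\begin{aligned}
\text{SATA300}      &= 3\,\text{Gbit/s} \cdot \tfrac{8}{10} / 8 = 300\ \text{MB/s} \\
\text{PCIe Gen2 x1} &= 5\,\text{GT/s} \cdot \tfrac{8}{10} / 8 = 500\ \text{MB/s per direction} \\
\text{SATA600}      &= 6\,\text{Gbit/s} \cdot \tfrac{8}{10} / 8 = 600\ \text{MB/s} \\
\text{PCIe Gen3 x1} &= 8\,\text{GT/s} \cdot \tfrac{128}{130} / 8 \approx 985\ \text{MB/s per direction}
\end{aligned}
```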

The problem for budget SSDs, and sometimes not-so-budget ones, is getting the most out of their native interface. For example, the Kingston NV1 performs well even on two PCIe Gen3 lanes, although it is designed for four. Its successor, the NV2, needs four lanes but supports PCIe Gen4. However, a full-width slot is not always available, and with a single lane we hit a bottleneck. For consistent superiority over SATA, Gen3 x1, or Gen2 x2 with the same throughput, is sufficient. Recall that some late LGA1150 boards already had M.2 slots with such interfaces. Today, with a limited choice of SATA devices and their higher cost, modern SSDs deserve attention even for small upgrades of old computers.

When it comes to recording, the problem remains mainly at the level of the SSDs themselves, which is quite expected. In this context, SLC caching settings in the budget segment remain no less important than hardware characteristics, as confirmed by the AIDA64 test results. Even limiting the interface bandwidth does not solve the problem of running out of cache space. There are also drives that do not face this problem, since they are able to write data faster than the bus can “pump” it. The main factors here are the same as for SATA devices, with the difference that the interface is still limited, and even with “fast” models there is a limit on data transfer speed.

The multithreaded results are no surprise either, since writing inside an SSD is parallel anyway. External multithreading is sometimes more efficient and sometimes less, but the order of magnitude stays the same. So the main difference between the segments is simply that there are more NVMe drives with high write speeds. If you target a speed below the peak figures (which are reached only within the SLC cache), it is easier to find a drive that guarantees it. The main thing is that this is possible at all, unlike in other segments.

As already mentioned, PCIe is a bidirectional interface, which people sometimes forget. In such scenarios, therefore, even one Gen2 lane is faster than SATA. But not every SSD can take advantage of this: the old budget model had problems here, while the new one has them less often. Write speed also matters, and it depends on the specific SSD and its state, but there is nothing new in that. For some drives it barely matters (under these conditions, of course); they may be fairly expensive, but at least such drives exist.

Such scenarios are arguably more important in real life than one-directional ones, if only because of multitasking in modern systems: most of the time, a snapshot of disk activity looks exactly like this from the outside (especially when the system has a single drive on which everything lives). This is precisely why SSDs differ from hard drives in a way visible to the naked eye: for hard drives such situations are deadly, although they can still manage purely sequential transfers in one direction. This applies to any SSD, and the interfaces make only a quantitative, not a qualitative, difference. It is hard to push more than 800 MB/s in total through one PCIe Gen2 lane, but that is still more than SATA can manage in principle, and faster interfaces raise the peak results further. Real-world results, again, depend heavily on the specific SSD, with writes being the “problematic” part. As for the main topic, what matters is that even having to work over a “narrowed” bus does not cancel out the benefits of fast models. Ideally, you should not have to think about SLC caching at all. It can mask problems, and with enough free space it sometimes does so very effectively, but it may also fail, especially when there is no idle time for housekeeping to flush the cache, so that it has no free space when it is needed.

Comprehensive performance

At the moment, the best comprehensive benchmark for storage devices is PCMark 10 Storage, briefly described in our review. As we noted there, the three tests in the suite are not equally useful. The best choice is the “full” Full System Drive test, which covers almost all common scenarios, from booting the operating system to simple copying of data (both internal and “external”). The other two are merely subsets of it and, in our opinion, not very “interesting”. It is also useful because it accurately measures not only actual throughput on practical tasks but also the latencies that arise. Averaging these metrics across scenarios and reducing them to a single number is, of course, a little synthetic, but only a little: estimates closer to reality “in general”, rather than in particular cases, are not available at the moment. So these results are well worth a look.

Note that PCMark 10 Storage reacts noticeably to SLC caching algorithms. That is not surprising, since about 200 GB of data is written during the test. On a nominally empty disk (the test's working files themselves occupy at least 50 GB, and it requires 80 GB of free space so that there is still room for writing), some caching settings can provide so much carefully prepared free cache that everything runs very fast. This hits the “decent” SATA models hardest: in this situation they turn out to be only slightly faster than the “indecent” SATA drives and strictly slower than the “indecent” (and noticeably cheaper) NVMe drives. And if there is no spare space and no pauses to flush the cache, Groundhog Day sets in.

The result is somewhat unexpected. Although the Kingston NV1 and A400 are close relatives, the NV1 is consistently faster even when limited to one PCIe Gen2 lane, and on Gen3, which is far more widely available, it leaves the KC600 behind as well. The Kingston NV2 outperforms the KC600 even on Gen2. This holds on average, of course: above we saw particular scenarios where it does not, which is both the strength and the weakness of comprehensive benchmarks that boil everything down to a single score.

Among modern models, the one closest to the Kingston KC600, both ideologically and technically, is the Kingston KC3000. It seems the latter can be made slower than the KC600 (and even then not always) only by squeezing it into PCIe Gen1, which is of no interest at all, even theoretically. One might also think that buying a top-end PCIe Gen4 device, capable of fully using four lanes of the modern standard, only to plug it into a PCIe x1 slot of an ancient version makes no sense either, but… prices sometimes behave bizarrely.

Summary

For those who traditionally read only the end of a review, skipping all the boring numbers and graphs, let's say right away: in practice, these results matter little. Not because anything fails: modern (and not so modern) SSDs do work when connected over a single PCIe lane. The simple fact is that, as we said, the average computer has about 1.1 drives. It follows that mass-market users do not wonder how to install a bunch of drives, let alone which interface to choose. Sometimes it is not even physically possible to install the drive of your choice: if you want to liven up an old laptop that has only one 2.5″ SATA bay, you will have to buy exactly that kind of SSD and no other. And a new laptop most likely has no room for a 2.5-inch SSD at all, lacking the corresponding bay (some inexpensive and larger models do have one, but paired with one or even two M.2 slots, which solves the problem anyway).

If you are able to install a device of your choice, you need to consider more than performance, and more than performance relative to price. In fact, giving abstract advice on choosing a drive for a laptop or mini-PC without seeing the specific machine is risky. On desktops everything is much simpler, and a variety of adapters come to the rescue. But again, in practice it may never come to that, at least not to such specific adapters as PCIe x1 ones: as a rule, there is somewhere to plug in one drive, and in most cases even two. That already covers the needs of 99% of users. The remaining 1% can usually solve their problems on their own; otherwise they would not have created those problems for themselves in the first place.

But even once you have settled on the type of drive, you still need to choose a specific SSD. Here some buyers are confused by interface compatibility, and it is also unclear whether an expensive device is worth buying if it cannot be given a proper slot. As the tests show, such a purchase is sometimes justified even in very degenerate cases. Yes, the bandwidth of PCIe Gen2 x1 is 16 times lower than that of PCIe Gen4 x4. So what? Does this slow the drive down? Of course! It could not be otherwise. Everything about a person should be beautiful: face, clothes, soul and thoughts. But a “good” NVMe SSD, even under such extreme conditions, can compete with lesser SSDs that are less severely limited and, despite their head start, come out on top. Under equal conditions there is even less to discuss: even when conditions are equally far from ideal for all the drives compared, the inherent characteristics of each SSD still show. So interface bandwidth is an important factor in realizing the speed of NVMe drives, but not the only one and certainly not the decisive one. You can't jump higher than your head, but everyone “jumps” differently. And limiting cases are useful precisely because they are illustrative. That is what prompted us to run this testing, and everything else followed.