October 22, 2024

OCP 2024: Challenges in Enabling FDP in High Capacity QLC SSDs Targeting AI


As big data and artificial intelligence continue to advance rapidly, the demand for storage solutions is steadily growing. AI-driven technologies, in particular, place higher demands on storage capacity and durability.

2024 OCP Global Summit

To address these challenges, Silicon Motion presented its latest research at the 2024 OCP Global Summit. David Wang, SSD firmware architect at Silicon Motion, participated in the Flexible Data Placement (FDP) panel discussion and delivered a presentation titled "Challenges in Enabling FDP in High Capacity QLC SSDs Targeting AI," offering an in-depth look at the technology's applications.

David Wang, SSD firmware architect at Silicon Motion, seated in the middle.

With its large capacity and high energy efficiency, QLC has become a key option for data-intensive applications. However, in AI servers, data processing involves multiple stages, such as data collection, preparation, training, and inference. During these processes, data access patterns vary (e.g., sequential vs. random access), operations are complex (read, write, update), data sizes range from small to large files, and concurrency levels are high.

Challenges faced by QLC SSDs in AI servers:

  • Performance degradation: Traditional host writes carry no data placement information, so the SSD mixes data with different lifetimes in the same blocks. This can lead to a high Write Amplification Factor (WAF), reducing IOPS and overall throughput.
  • Durability concerns: QLC NAND generally tolerates fewer Program/Erase (PE) cycles, resulting in a lower Drive Writes Per Day (DWPD) rating, which shortens the SSD's lifespan.

To address these issues, Silicon Motion has adopted Flexible Data Placement (FDP) as a key solution to improve both performance and durability in QLC SSDs.

According to Silicon Motion's test data, based on the MonTitan™ 16TB PCIe Gen5 QLC SSD, implementing FDP technology delivers the following benefits:

  • Reduced Write Amplification Factor (WAF)
    FDP significantly reduces garbage collection write operations, lowering the WAF from 5.5 to 1 while boosting IOPS from 120K to 725K.
  • Improved Drive Writes Per Day (DWPD)
    FDP increases DWPD, allowing the SSD to reach 0.96 DWPD under specific conditions.
  • Maximized performance with minimized impact
    FDP enhances performance while mitigating the negative effects of low PE cycles in QLC NAND, ensuring long-term stability.

In the test chart, SSDs with FDP show both a lower WAF and a higher write throughput than those without FDP. A lower WAF means the drive rewrites less data internally for each host write, which improves both endurance and performance. The higher write throughput shows that FDP enables faster writes, making QLC SSDs more suitable for write-intensive applications while extending their lifespan.
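The link between WAF and endurance can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the common simplification that the host data a drive can absorb scales with its PE cycle budget divided by WAF, and it ignores over-provisioning. The PE cycle count and the five-year warranty period are hypothetical values chosen for the example, not figures from Silicon Motion's tests, so the printed DWPD numbers are not expected to match the 0.96 DWPD result above.

```python
def write_amplification(host_bytes: float, nand_bytes: float) -> float:
    """WAF = bytes physically written to NAND / bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


def sustainable_dwpd(pe_cycles: float, waf: float, warranty_years: float = 5.0) -> float:
    """Rough DWPD estimate: full-drive host writes per day over the warranty.

    Simplification (assumption): each full-drive host write costs `waf`
    PE cycles, and over-provisioning is ignored.
    """
    if waf < 1.0:
        raise ValueError("WAF cannot be below 1 without compression")
    return pe_cycles / (waf * warranty_years * 365)


# Hypothetical QLC PE cycle budget, for illustration only.
PE_CYCLES = 1500

without_fdp = sustainable_dwpd(PE_CYCLES, waf=5.5)  # WAF reported without FDP
with_fdp = sustainable_dwpd(PE_CYCLES, waf=1.0)     # WAF reported with FDP

print(f"DWPD at WAF 5.5: {without_fdp:.2f}")
print(f"DWPD at WAF 1.0: {with_fdp:.2f}")
print(f"Endurance gain:  {with_fdp / without_fdp:.1f}x")
```

Under this model the endurance gain equals the ratio of the two WAF values, whatever PE cycle budget is assumed.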

Given the increasing demand for high-performance storage solutions in modern data centers—especially for AI training and inference—Silicon Motion has introduced several key design elements to ensure an optimal user experience:

Reduced DRAM usage to control costs

  • A configurable Indirection Unit (IU) design, such as 16K IU, suited for large-capacity drives.
  • L2P table space savings through hardware-assisted bit-packing: Using 33-bit entries (instead of 40 bits) to address 8G IUs, reducing L2P table size by 17.5%.
  • Minimizing WAF from small writes on large IUs: Supporting a mix of 4K-IU based and 16K-IU based Reclaim Unit Handles (RUHs) in a QLC SSD, so applications can direct small writes to a 4K-IU RUH and avoid the write amplification caused by read-modify-write operations on a larger IU.
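To illustrate the bit-packing idea, here is a minimal software sketch of an L2P table that stores 33-bit entries back to back instead of rounding each entry up to 5 bytes (40 bits). It is not Silicon Motion's implementation, which the presentation describes as hardware-assisted; the class name `PackedL2P` and its methods are hypothetical and exist only for this example.

```python
class PackedL2P:
    """Logical-to-physical table with fixed-width, bit-packed entries."""

    def __init__(self, num_entries: int, bits: int = 33):
        self.bits = bits
        self.mask = (1 << bits) - 1
        # Total bits rounded up to whole bytes.
        self.buf = bytearray((num_entries * bits + 7) // 8)

    def _span(self, index: int):
        """Return (first byte, bit offset in that byte, bytes covered)."""
        byte, shift = divmod(index * self.bits, 8)
        nbytes = (shift + self.bits + 7) // 8
        return byte, shift, nbytes

    def set(self, index: int, physical: int) -> None:
        if not 0 <= physical <= self.mask:
            raise ValueError("physical address does not fit in entry width")
        byte, shift, nbytes = self._span(index)
        word = int.from_bytes(self.buf[byte:byte + nbytes], "little")
        word &= ~(self.mask << shift)   # clear the old entry
        word |= physical << shift       # write the new entry
        self.buf[byte:byte + nbytes] = word.to_bytes(nbytes, "little")

    def get(self, index: int) -> int:
        byte, shift, nbytes = self._span(index)
        word = int.from_bytes(self.buf[byte:byte + nbytes], "little")
        return (word >> shift) & self.mask


# Eight 33-bit entries occupy 33 bytes; eight 40-bit entries would need 40.
table = PackedL2P(num_entries=8)
table.set(0, 0x1_FFFF_FFFF)  # largest 33-bit value
table.set(1, 12345)
table.set(7, 0x1_0000_0001)

saving = 1 - 33 / 40
print(len(table.buf), "bytes packed vs", 8 * 5, "bytes unpacked")
print(f"Table size reduction: {saving:.1%}")
```

The trade-off is that entries no longer start on byte boundaries, so every lookup needs a shift and a mask. That extra work is presumably why the design pairs the packing with hardware assistance.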

QoS and performance consistency across multiple namespaces

Utilizing Silicon Motion's proprietary PerformaShape™ technology to optimize read/write performance and QoS for each Namespace, reducing performance fluctuation caused by resource contention and noise from neighboring workloads.

Tests with emulated AI workloads (combining data ingest, transformation, training, and checkpointing) on a QLC SSD configured with 4 namespaces and 4 RUHs show that enabling PerformaShape™ technology improves read and write consistency by 21% and 31%, respectively. This means checkpoint write throughput to the SSD can be guaranteed while training data is being read from the same SSD.

As this innovative solution is widely adopted, we believe future data centers will become more efficient and cost-effective, meeting the rapidly growing demand for data. Silicon Motion will continue to push the boundaries of storage technologies, delivering stronger data processing capabilities across industries.
