Using SQL; Server partition table function improves database read/write performance
Generally speaking, the first bottleneck point in a system is likely to be the database. For example, our production system has a high concurrency
Generally speaking, the first bottleneck point in a system is likely to be the database. For example, our production system has a high concurrency. After running for a period of time, the data volume of certain tables in the database will become larger and larger. Massive data can seriously affect the read and write performance of databases.
At this point, we will start optimizing the system, usually going through several processes:
Identify slow SQL queries and optimize them, such as improving SQL writing, reviewing execution plans, and indexing fields scanned throughout the table
Introducing cache to load a portion of read pressure into memory
Read Write Separation
Introducing queues to serialize concurrent requests to alleviate transient system pressure
Sub Table/Sub Database
Let's elaborate on the fifth optimization plan. There are usually two dimensions for splitting tables and databases: 1. Vertical splitting, which is often strongly related to business. For example, migrating some uncommon fields of a table, such as the detailed data of an order, can be separated into a table and only read when needed. 2. Horizontal splitting, such as splitting by year, divides the database into multiple tables by year or by certain rules and time periods.
After splitting the table, the amount of data in each table will decrease, which brings self-evident benefits. Both full table scans and index queries will have a relatively high improvement. If different table files are placed on multiple disks, the IO performance of the database can be further improved.
If it is purely manual splitting, such as splitting into multiple tables by year, then the upper level business code also needs to be adjusted. Every time I read and write, I have to determine which table to use. It is even more difficult to perform pagination queries across multiple years. It is basically impossible to achieve human flesh segmentation, which is a nightmare for upper level coding. So for table and database partitioning, we usually use certain middleware, such as Mycat, Sharding JDBC, and so on. Using these components can indeed achieve table and database partitioning, and shield the business layer code from changes to the database architecture, but the configuration is slightly cumbersome. If you are using a SQL Server database, and you do not need a sub database at present, but only a sub table, then the built-in partition table function is the simplest solution. Just open SQL Server Management Studio and make a few simple settings. It is completely insensitive to your upper level application, and your code and database connection string do not need to be modified.
Below, we will conduct two simple tests to demonstrate how to perform table partitioning operations and test the performance changes before and after partitioning.
Test Write Performance
Our testing plan: Create a new logs table and write data by year. In 2019, 1000000 data was written, and in 2020, 100000 data was also written. In order to accelerate the writing speed, 10 threads are written in parallel each year, with each thread writing 100000 data, totaling 1000000 data. Then change the logs table to a partition table and write 2000000 data in the same way. Record the time spent comparing the time spent twice.
The hardware is a 14 year old laptop with an OS of win10. Mount 2 hard drives, 1 mechanical hard drive with 5400 RPM, and 1 SSD with 15 years of experience. Disk performance can be said to be extremely garbage. When not partitioned, the table files will fall on the mechanical hard drive.
Testing without partitioning
Create a table using a script:
CREATETABLE[dbo]. [logs]([id][uniqueidentifier]NOTNULL,[log_txt][varchar](200)NULL,[log_time][datetime]NULL,CONSTRAINT[PK_logs]PRIMARYKEYCLUSTERED([id]ASC)WITH(PAD_INDEX=OFF,STATISTICS_NORECOMPUTE=OFF,IGNORE_DUP_KEY=OFF,ALLOW_ROW_LOCKS=ON,ALLOW_PAGE_LOCKS=ON)ON[PRIMARY])
Create a new console program to write code:
classProgram{staticvoidMain(string[]args){Console.WriteLine("HelloWorld!");Task.Run(()=>{InsertData(2019);}); Task.Run(()=> {InsertData(2020);}); Console.ReadLine();} staticvoidInsertData(intyear){vartasks=newList<Task>();Stopwatchsw=newStopwatch();sw.Start();for(inti=0;i<10;i++){tasks.Add(Task.Run(()=>{using(varconn=newSqlConnection()){conn.ConnectionString="PersistSecurityInfo=False;UserID=sa;Password= dev@123 ; InitialCatalog=fq_ test; Server=. mssql2016 "; conn. Open(); intindex=0; for (intj=0; j< 100000; j++) {varlogtime=newDateTime (year, newRandom(). Next (1,12), newRandom(). Next (1,28)); conn. Execute (" insertintologs2values (newid(), 'place order', @ logtime) ", new {logtime}); Console. WriteLine (" logtime: {0} index {1} ", logtime, index ex++);}}}});} Task. WaitAll (tasks. ToArray()); sw. Stop(); Console. WriteLine (" Year {0} complete, totaltime: {1} ",year,sw.ElapsedMilliseconds);}}
Writing 2000000 data took 1369454 milliseconds.
Start testing with partitioning
There are about five steps to set a table as a partition table:
Add Filegroup
Adding Files to a Filegroup
New Partition Function
New Partition Scheme
Starting partition
The following demonstrates how to use the SQLSERVERManagementStudio manager for table partitioning:
Select Database=> Property=> Add two file groups, group1 and group2, to the file group.
Select Database=> Property=> File. Add file1, select group1 for the file group, and select a file directory for the path. Select the data directory of the E drive here. Add file2, select group2 for the file group, and select a file directory for the path. Select the data directory of the X drive here. This way, when partitioning, the data will fall into these two directories. The path here can be selected on the same hard drive, but for higher read and write performance, it is recommended to directly specify it on a different hard drive if possible.
Select the logs table=> Storage=> Create a partition and launch the Partition Wizard tool.
Create a new partition function and click Next.
Create a new partition scheme and click Next.
Select a partition column, and the data will be horizontally split based on that column.
Choose logtime here because time is a more suitable dimension for horizontal segmentation.
The range of data worth splitting.
Range selection; Right Border”.
The difference between the right and left boundaries lies in the handling of boundary values.
The right boundary is<, The left boundary is<=, That is, including boundary values.
We set up group1 to store data for 2019 and group2 to store data for 2020.
So the boundary value of group 1 is set to 2020 01 01, and the boundary value of group 2 is set to 2021 01 01.
After setting it up, it looks like this, requiring 3 file groups.
When data outside the scope of group1 appears, data within the scope of group2 will be stored in the third file group.
After creating partition functions and schemes, you can choose to generate scripts or execute them immediately.
Select&quo; here; Execute immediately;.
After the execution is completed, the data in the table will be scattered across multiple files according to the boundaries set by the partition scheme.
Testing under partition conditions
Clear all data in the logs table first, and then use the same code for testing.
The test results show that writing 2000000 data takes:
568903 milliseconds.
It can be seen that the database write performance has significantly improved, approximately doubling the performance.
This is also in line with the expected simultaneous IO of two disks.
Test read performance
Our testing plan: Create a new log2 table and use the above code to write 2000000 data by year. Then use the select statement to simultaneously read the data from 2019 to 2020. Convert the log table to a partition table, and retest the select time. Compare the time between two reads of data.
SQL statement:
select *& nbsp; from log2 where (logtime > '2019-05-01' and logtime < '2019-06-01') or (logtime > '2020-05-01' and logtime < '2020-06-01')
Firstly, test query performance on unpartitioned tables, taking 3 seconds.
Partition and split the table according to the previous method, and the query takes 1 second.
The read performance is approximately three times that of unpartitioned data.
summary
Through simple tests, the partition table function of SQL Server can greatly improve the read and write performance of the database. Through the simple settings of SQL Server Management Stuio, database tables can be partitioned and the application layer code is completely insensitive, which is much simpler than using table and database middleware.
This is the end of this article about using the partition table function of SQL Server to improve the read-write performance of the database. For more information about SQL Server partition table, please search the previous articles of Script Home or continue to browse the following articles. I hope you can support Script Home more in the future!
Tag: Using SQL Server partition table function improves database read
Disclaimer: The content of this article is sourced from the internet. The copyright of the text, images, and other materials belongs to the original author. The platform reprints the materials for the purpose of conveying more information. The content of the article is for reference and learning only, and should not be used for commercial purposes. If it infringes on your legitimate rights and interests, please contact us promptly and we will handle it as soon as possible! We respect copyright and are committed to protecting it. Thank you for sharing.