Pages

Friday, March 12, 2010

General SQL Optimization Tips and Tricks

SQL and You
SQL can get complicated.  You are simple.  That's just the way it is.  You like issuing commands like...

select * from orders
...when all you need is a few columns.  But the SQL Server will dutifully fetch all the data you requested, no matter the cost to the performance of the server, the network, and your fellow database users.

The above query might not be too bad, but when you see what an order looks like...

...you figure out you need to join in some additional data to make a meaningful report.
so you join to the customer table, and your query becomes:
select o.*, b.billto_firstname+' '+b.billto_lastname as CustName from orders o
join buyerdata b on b.OrderID = o.OrderID
...giving us a concatenated name...

Ok, time goes on and you add joins to categorize the orders by state, product type, salesman, etc. etc. etc. until your SQL gets into a state we like to call gnarly.  Another thing that happens is that your SQL is written to make life easier for you, not for the database server.  Soon, your query gives you the perfect data, and your report/web site/whatever looks amazing - after minutes of crunching to gather the data.

This article is written to help you find and correct some common things that make queries easy for us, but difficult for your database server.  This process in general, is called query optimization and companies pay big bucks for people who are experts at this.  Imagine a business with 800 users entering data on their desktops.  Some are on the phone with vendors or clients, and system performance is mission critical.  Suddenly you launch off the un-optimized query for your weekly TPS report, and the entire database starts chugging like The Little Engine That Could.  This means hundreds of your business clients all start hearing words like "Geez, hang on - this screen is loading...".  You can see the business need here.

Even if your business is small, why tax your systems needlessly if you can take a few steps to write better SQL that gives you the data you need faster?  I promise - if your small business grows big, that query that generates a report out of your 1MB database in 3 seconds now will probably take 30 seconds when you reach 10MB and 5 minutes at 100MB. 

Tip 1:  Never use Select *
Select * is great for when you just want to get a feel for what data is in a table.  But it's the least efficient thing you can do in a production query.  Think about it.  First, the parser has to access the data dictionary and fetch the names and types of all the columns in the table (or multiple, joined tables).  Then your '*' is replaced with all the names of the columns - which (by the way) are not guaranteed to be in any particular order under ANSI SQL specs.  If you are writing a query for production use, one that will potentially execute thousands of times a day, take the time to tell it which columns to return.

Tip 2: Indexes
Indexes are designed to help queries. If you're summarizing sales by agent, then adding a secondary index to the agent column in your sales table will save you a lot of time.  Technically the time is being spent, but it's being used in tiny slices whenever you change data in the sales table.  That triggers the maintenance of the index, which can then be searched much faster than the whole table.

Tip 3: Case Insensitive Searches
Consider:
Select fname, lname, email from Customers where lastname='McHenry'
This query might work, unless you're looking for all people who entered any of these: McHenry, mchenry, Mchenry, MCHENRY, McHENRY.  ok - so you want to just find the customers with a case insensitive search.  So you do this:
Select fname, lname, email from Customers where lowercase(lastname)=lowercase(@searchname)
What happens now? The query is executed not against the field "lastname" (and any indexes applied to it), it is instead executed against the function lowercase(), which has no index.  There are two common workable solutions to this issue.   Neither of the solutions is to make sure all data is entered using strict capitalization standards. No one wants to be the Capitalization Nazi!
  1. Use a function index.  Oracle supports these. you can actually make an index on "lowercase(field)" or "getmonth(datefield)".  This is not a widespread feature, but is really useful in these cases.








  2. Add a lower or upper column in your data to use for joining and searching.  If you have an ASP.NET membership provider database, note that the aspnet_users table has a column named LoweredUserName.  If this approach is good enough for Microsoft, it's good enough for me.
Tip 4: Data Warehousing
Sometimes you just *must* have that report that performs a 23 table join against a 40 million record database.  There's just no way you can put out your TPS reports without it.  What many companies do is perform certain data mining and harvesting tasks once per day, rolling up and summarizing certain data into a "data warehouse" for offline reporting needs.  This data is often de-normalized to make faster queries and it's usually stored on a different database server and even a different network to prevent reporting from slowing down the workday.
The idea is that you create a query that performs certain common joins and relations... rolls up hourly/daily/monthly or whatever kind of totals, creates various types of crosstabs, and stores the rolled up data in a reporting database.
These summaries are run during off hours (or infrequently during the workday) to prevent the database servers from clogging with work while you're open for business.
Managers and their lackeys can hammer away at the reporting database all day long and your main operation could not care less.

Tip 5: Joins vs. Sub-Selects
Sometimes when you just want a single lookup from a table (say decoding a County ID to a name) , it's faster to not join in the county table.  Some SQL servers can detect this kind of lookup and perform this optimization for you.  Note...

select U.UserName, RR.RoleName from aspnet_Users U
join aspnet_UsersInRoles R on R.UserId=U.UserId
join aspnet_Roles RR on RR.RoleId=R.RoleId
We have joined the aspnet_UsersInRoles table to get the ROLEID of the user's roles, and joined the aspnet_Roles table to get the name of the roles for the user.

Note the difference here:

select U.UserName,
(select RR.RoleName from aspnet_Roles RR where RR.RoleId=R.RoleId) as Role
from aspnet_Users U
join aspnet_UsersInRoles R on R.UserId=U.UserId
See how we're using the sub-select to pick up one field?  The thing about this tip is that sometimes the former way is faster and sometimes the latter.  You have to experiment.

Tip 6: Explain Plan/Query Optimizers
Ok, most SQL Servers have a module called a Query Optimizer.  This module can take your weak, human SQL and tweak it before execution to improve performance.  But there is also usually a UI that you can use to performance tune your queries.  I know Oracle has this, and have used it many times.  It breaks your query up into chunks called a query plan, and assigns an estimated cost to each part of the plan.  You can then alter the text of your query to improve the overall performance.

Tip 7:  Query Hints
On the rare occasions when your SQL is tricking the optimizer into taking the wrong approach to your query, Oracle and SQL Server have the ability for you to add query hints to your SQL.  These hints can tell the optimizer to use a specific kind of cursor, or prefer a different index, and alter the query plan used during execution.
The syntax differs greatly between the various SQL servers (Oracle/MSSQL/etc.), and will be shown only as an example here.
  1. SQL Server
  2. More SQL Server
  3. Oracle
Tip 8: Views
Sometimes creating a view can help optimize your query.  The idea is to optimize the SQL for certain common transactions and save it as a view, making the need to reinvent the wheel significantly lower.  But there is also such a thing as a materialized view. This is a type of view that is maintained like an index, as the live data is stored in the main tables.  A normal view does not store it's own data, but a materialized view does.  The view's data can then be queried without having to do whatever complex joins were needed to generate the data.  Of course the trade-off here is storage space.

Share This!

Contact Us

Name

Email *

Message *