Tag Archives: Yapp

新的Oracle性能神话?

很多 DBA 应该都记得这篇文章吧 ? Myths & Folklore About Oracle8i Performance Tuning. 这篇文章的出现, 粉碎了当时的不少图书中标榜的实际上没有什么作用的优化”技巧”.

来自 OraPub 的 Craig A. Shallahamer 在一篇新的论文 Modern Performance Myths 试图定义新的 Oracle 性能神话.包括如下四条:

Myth #1. Decreasing wait event time will always decrease Oracle response time.
Myth #2. Decreasing wait event time will always decrease end-to-end response time.
Myth #3. Profiling sessions is the best way to diagnose performance problems.
Myth #4. Focusing on where most of the time is spent is always the best approach.

老实说, Craig 这篇论文写的非常”绕”.完全看明白要费点时间.因为第一条和第二条 Myth, 说的都是”always”, 只需要举出一个反面例子即可. 非常有趣的是第三条, Profiling sessions , 因为这是 Hotsos 的 Cary Millsap 在 Optimizing Oracle Performance 一书中 Method R 方法(参见:Oracle 数据库优化的R方法)所提倡的手段. 要反驳第三条 Myth 倒也不难, Profiling sessions 只能做到针对特定 Session(or User) 进行优化,这个优化能从全局的角度上看是否是成功的? 就不能简单的下判断. Craig 的建议是在系统级和会话级进行响应时间分析(RTA).

那么如何避免这些所谓的 Myth 呢? Craig 的答案是 The Holistic Problem Isolation Method (整体问题隔离方法,HPIM), 识别 Oracle,Application,OS (三环法)每个子系统的瓶颈,并且理解各个子系统之间的关系.

Cary Millsap 在Oracle 性能优化一书中提出的 Method R 的时候应该是自信满满, 但是 Craig 的这篇文档无疑也说明了 Method R 的一些遗漏之处.方法论是一个不断进化的过程, 没有所谓完美的方法,随着对Oracle优化认识的不断深入,相信也会有号称更为优秀的方法出现.但是能否更有效用在实践中,这是一个主要问题.

—-
BTW:
小道消息:Craig A. Shallahamer 将在 07 年推出一本名为 Forecasting Oracle Performance 的 Oracle 图书.期待.

Oracle 数据库优化的R方法(Method R)

好长时间没怎么看 Oracle 技术文档了，今天阅读了一篇 Oracle Response Time Optimization with Method R. 这是 Optimizing Oracle Performance 经典图书这本经典图书的主旨方法。R 代表响应时间(response time).具体的定义如下：

1. Target the tasks that are critical to the business.
2. Collect properly scoped, un-aggregated profile data for each task
while the task is exhibiting the behavior you want to record.
3. React with the candidate repair that will have the greatest net
payoff to the business.

a. Stop if the cost of the repair exceeds the cost of the problem.
4. Go to step 1.

这里面的核心元素是 Profile .Profile 要提供应用程序到最终用户的响应时间的详细描述.体现到 Oracle 数据库这一层，就是要得到扩展的 SQL Trace 数据。
是不是感觉有些”虚”, R 方法和一些我们已知的数据库优化方法颇一些相似之处，但是 Cary Millsap 宣称 R 方法是目前已知 Oracle 优化方法中的最优秀的、最全面的。我们来看看一些简单比较：

继续阅读 →

YAPP 发布后的十年

因为某些原因，国外的很多 Blog 站点在国内都是看不到的，包括 Blogspot．BlogSpot 上有很多不错的 Oracle 专家的 Blog .比如 Thomas Kyte. 此外还有 Anjo Kolk 的Oraperf. Oraperf 的三位大牛Anjo Kolk、Shari Yamaguchi、Jim Viscusi曾经在10年前提出 Oracle 数据库优化的 YAPP(Yet Another Performance Profiling Method)方法。在这个Blog第一篇就是回顾YAPP发布后的十年(YAPP Ten years later: What has changed? ).以下内容为引用：

In 1995 I started to work on the wait events paper based on Oracle 7.3, and that formed the basis for the YAPP white paper. That in turn formed the basis for the wave of response time tuning books and presentations. Now it is funny to see how the same thing is rehashed over and over again and nothing really new is being added. In fact I believe that all the attention on wait events and response time or resource tuning in the Oracle RDBMS, is taking away the way the focus of performance problems that actually originate outside the database. The word “over exposure” comes to mind. Does this mean that wait events are no longer important? Well yes and no. Believe it or not, but most databases suffer from the same performance problems. They differ in the symptoms that they show. For example, many databases suffer from I/O performance problems and Oracle has quite a number of wait events that are directly and indirectly associated with I/O. So instead of approaching each of these events individualy, they could be grouped together to just show the symptom. In fact, Oracle 10g has finally done this and has introduced wait event classes (not new, a company Precise (now part of Veritas) did that first in their product back in 1998). So Oracle is still expanding the number of wait events, but at the same it is grouping them together again.

Before the response time tuning (and even today) people actually based on best practices (BP). Each best practice has a ratio associated with it. For example the Buffer Cache Hit Ratio, basically is the Best Practice that tells people to cache frequently used data (which is a good thing in principle). The problem with tuning today is that many Best Practices exists and they all have some kind of ratio associated with them. So if a performance problem occurs DBAs starting working on this list of Best Practices to check if something applies to their problem. There are a couple of problems with that:

1) Not every body may have the same list of Best Practices or the same threshold for the ratios.
2) The list of Best Practices may not be sorted in the same way for different people

The result is that the problem finding process takes a long time and is not really repeatable by different people (different lists, different ratios). So starting the problem finding process with Best Practices is like shooting a gun and hooping that you will hit some thing.

The Response Time tuning process is basically telling you what Best Practice to use. For example if the Response Time consists mostly of I/O relatated waits we should start looking at the Best Practices for I/O. If CPU is the most common resource consumption, we should start looking at the Logical I/Os .

In this approach the different wait event groups play an important role, because each group is basically assoicated with one or more Best Practices. We still do care about what wait events are actually in the group, but for selecting the right Best Practice it doesn’t matter. For solving the problem, it may be important.

I believe that people should start thinking this way instead of worrying about the individual events. I am not saying that the individual events are not important, but keep an eye on the complete picture before diving into the indivual events.

So one of these days (may this year) I should write an update to the YAPP paper and hopefully it can be basisses for a wave in the response time tuning. Oh yeah, I don’t see my self as the inventor of all this. As far as I am concerned, Response time Tuning has always been there, it was just not really accepted in the Oracle world as the way of tuning your Oracle database. So may be I started that, but I just wrote the paper(s) and other people like Mogens Norgaard made it popular. It is kind of funny that I actually still use the same method(s) almost 10 years later (and it doesn’t matter if they are on instance, session or SQL statement level it is the same methodology, with different result but still solving the same problems; think about that one …..)

提到了所谓的 BP(best practices) 方法.想到曾经在一个站点上看到所谓的 HP 专家最欣赏”最佳实践”之类的说法不由得有点好笑.

还在 Oraperf 上看到了一则关于 Outer Edge of Disk 的帖子。
Oraperf’s Blog: Outer Edge of Disk

The question is why one should place data on the outside of the disk and if that helps performance. I just like to add my view on that discussion. The number of physical I/Os per Disk is limited by the (full/average) seek time. The seek time is the biggest component of the I/O time (with normal Oracle blocksize). The Full Seek Time (seeking from the outermost track to the innermost track or vice versa) can be greatly reduced by only using the outermost tracks on a disk. Why the outermost tracks? Well there is this statistic that on the 1/3 of the outermost tracks there is around 50 percent of the disk capacity stored. If I only have to seek 1/3 of the distance to reach 50 percent of the data all the time, the average seek time will be greatly reduced. If the Seek time is greatly reduced, the total I/O time is greatly reduced and we can do more random I/O operations per second. So to summarize:
1) Use around 50 percent of the disk capacity (outer 1/3 of the tracks)
2) That will increase the number of random I/Os greatly.