Hello Robert,
We are having a terrible time at a bunch of client sites. I am away this week but will be back in the office next week. When I return, I will open a support call directly with SAP to resolve it at a high level, but I need some help/suggestions now so I can be prepared.
Yes, it sounds like we will need to work with you directly in support to investigate this issue. Opening an incident (or multiple incidents for each end-customer) is best, to help keep the information organized.
We have thousands of sites and we never had issues. Since we started upgrading client sites to the latest Sybase, their nightly processes are killing us. For example, the nightly process used to take about 8 minutes. Now it is taking 3 to 4 hours!!! A process that used to take 3 seconds is now taking 10 minutes. The logs show that processing IS happening (it's not frozen) but for some reason things have been pretty bad. In fact, one process ran from 6:01 AM to 6:59 AM.
Yes, this is quite possible and a general hallmark of a performance issue - query execution can go from typical seconds to inexplicable minutes (or minutes to hours in your case).
There are a number of factors operating here. The database optimizer must work out the proper ordering of tables when creating a plan, and with N joined tables in a query there are N! possible orderings for joining them. The more tables in the join, the more difficult it is to come up with a stable and optimal ordering. In earlier versions of SQL Anywhere, there were fewer optimization nodes/algorithms/options, which meant a smaller search space for possible plans (and indirectly meant more stable plans). User-defined functions used to join tables can also cause instability in plans, because we generally do not know the join cardinality of a function and need to make a guess. Combined with a changing cache status and changing data statistics in the database, some tables may be placed in different orders in plans, leading to drastically different performance numbers due to the number of rows being processed in the internal plan nodes. Poor table join order choices can become large performance problems very quickly.
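To give a feel for how quickly the N! search space grows, here is a minimal Python sketch. It only counts orderings; it is not how the SQL Anywhere optimizer enumerates or prunes plans, and the function name `join_order_count` is made up for illustration.

```python
import math


def join_order_count(n_tables: int) -> int:
    """Count the possible orderings of n_tables joined tables (n!).

    Illustration only: a real optimizer prunes this space rather than
    enumerating every ordering.
    """
    if n_tables < 1:
        raise ValueError("need at least one table")
    return math.factorial(n_tables)


# Print the number of orderings for a few join sizes.
for n in (2, 4, 8, 12):
    print(f"{n:>2} tables -> {join_order_count(n):,} possible orderings")
```

Even at a dozen tables the count is in the hundreds of millions, which is why small shifts in estimates can push the optimizer toward a very different ordering.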
All of these changing conditions have to be looked at for each individual query you can identify that has a performance discrepancy. To solve the issue, sometimes breaking up the query, recreating statistics, adding an index, or changing the costed join conditions (as you have above with the temp table solution from before) is a viable way to ensure proper table ordering and join estimates in the optimizer. If you are not able to change the query, that does limit our options for resolution somewhat, and you may need a workaround tailored to each case (e.g. changing the schema, adding an index, creating statistics, etc.).
We do note that there are also interim updates to the optimizer in Support Packages, made to achieve correctness or better performance on real customer queries that come in to our development group. While these changes may be advantageous to most queries overall, there may be outlier queries whose performance is significantly impacted, and we then need to understand those queries better in response. Our internal QA tests try to minimize these impacts before Support Packages are released to customers.
The general way to prevent this scenario is to create a performance test (see "Performance and Tuning - SQL Anywhere" on the SAP SQL Anywhere SCN Wiki) under 'production' load conditions that mimic your real-life users and data sets, before deploying any software changes. See also our whitepaper on capacity planning, "Capacity Planning for SQL Anywhere".
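As a rough sketch of what such a test could report, the Python below compares per-step timings from a baseline run against a run on the upgraded software and flags the steps that slowed down. The step names, the `flag_regressions` helper, and the 2x threshold are all hypothetical; the sample timings simply echo the magnitudes you described (8 minutes to 3 hours, 3 seconds to 10 minutes).

```python
def flag_regressions(baseline, candidate, max_ratio=2.0):
    """Return (step, baseline_s, candidate_s, ratio) for each step whose
    candidate time exceeds max_ratio times its baseline time.

    Both arguments map a step name to elapsed seconds. Steps missing
    from the candidate run, or with a non-positive baseline, are skipped.
    """
    flagged = []
    for step, base_s in baseline.items():
        cand_s = candidate.get(step)
        if cand_s is None or base_s <= 0:
            continue
        ratio = cand_s / base_s
        if ratio > max_ratio:
            flagged.append((step, base_s, cand_s, ratio))
    # Worst slowdown first, so the most urgent queries surface at the top.
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Hypothetical timings in seconds, before and after the upgrade.
baseline = {"nightly_batch": 480.0, "single_update": 3.0, "report": 60.0}
upgraded = {"nightly_batch": 10800.0, "single_update": 600.0, "report": 62.0}

for step, base_s, cand_s, ratio in flag_regressions(baseline, upgraded):
    print(f"{step}: {base_s:.0f}s -> {cand_s:.0f}s ({ratio:.1f}x slower)")
```

Running a comparison like this against production-sized data before rollout narrows thousands of statements down to the handful worth examining plan by plan.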
---
What log would be the best to give you right now? It's not a SINGLE PLAN / query: this is thousands of queries that run during a nightly process (read a record, update, etc.).
Database Tracing (see "SQL Anywhere Trace Database Setup via Database Tracing Wizard" on the SAP SQL Anywhere SCN Wiki) is the best method to collect information for multiple statements over a longer period of time. Enabling a 'High' level of trace (i.e. including plans with statistics) is the best way to collect detailed performance data so that we can then drill down into the actual queries that are slow. If this level of monitoring slows the system down too much, try removing counters that are not related to capturing the plans and performance.
We would then encourage you to review the collected performance information yourself, as it will most likely help you understand the areas/queries of concern. We can then work with you to go through the identified queries individually in the tracing database and look for resolutions to the performance issues.
From a system perspective, understanding the CPU, I/O queue, and memory usage for each customer during these slow times would also be beneficial to see if there is a common system limitation that SQL Anywhere is encountering (in possible combination with an optimizer issue).
Regards,
Jeff Albion
SAP Active Global Support