
incorrect set of records returned when using order on a field with little variation
Reported by ReggieB | January 14th, 2009 @ 07:33 PM
I think I have found a bug this time :o) . However, it may be a MS SQL only error as I haven't tested the problem with MySQL.
I have a table of "support cases" and each case can have a status of one of five options (e.g. 'open', 'closed', 'suspended' etc.). If I sort by the case status in a paginated view I get records repeated on different pages and some records don't appear at all. The correct total of records is always shown.
I believe I have found the problem: it is the way will_paginate pulls out the small number of records it needs for a single page view. If you have a page size of 20 and 65 records total, to get the page two records it first pulls out the first 40 records (order DESC), then flips them over (order ASC) and pulls the first 20 records, then flips them over again (order DESC) to present the set of 20 records in the desired order.
This works fine if the field you are sorting on is unique for each record, but falls down if you have a lot of records with the same field value. SQL seems to be inconsistent in the ordering of individual records within one unique field value.
That is the order of ten records with the status "open" sorted using "order DESC" don't seem to be mirror images of the same records when you do the next "order ASC" step. That is, they may be 1,2,3,4,5,6,7,8,9,10 on the "order DESC" but may well be 10,8,7,3,4,5,1,2,9,6 on next "order ASC"
This results in inconsistency when you reorder the records and remove some at each step.
What seems to fix the problem is if you add a field to the sort that is unique for each record. id is the obvious one.
So
SupportCase.paginate(:page => params[:page],
                :per_page => 20,
                :order => "order DESC"
Results in inconsistent behaviour.
SupportCase.paginate(:page => params[:page],
                :per_page => 20,
                :order => "order DESC, id DESC"
Creates the correct results.
I suggest you append ", id DESC" to then end of all order statements unless id is already used in the sort.
Comments and changes to this ticket
- 
         Mislav January 15th, 2009 @ 02:31 PM- State changed from new to invalid
 What you said there -- This works fine if the field you are sorting on is unique for each record, but falls down if you have a lot of records with the same field value. SQL seems to be inconsistent in the ordering of individual records within one unique field value. Exactly. It's inconsistent. MS SQL and Postgres exhibit this behavior quite nicely. MySQL is consistent in most cases because they aren't following the spec closely and have implemented some sort of implicit ordering by ID. Still, the user decides on the ordering, not this library. I understand what you're suggesting, but will_paginate has this rule of never changing your queries (apart from inserting LIMIT and OFFSET), so I will continue with this philosophy. The user decides. 
- 
            
         ReggieB January 15th, 2009 @ 04:04 PMMy argument would be that it isn't the user who has chosen to flip the data back and forth. It is the way will_paginate processes the data that makes this an issue. However, a simple option would be to update the RDoc comments: http://mislav.uniqpath.com/stati... And add a comment to the "The importance of the :order parameter" section along the lines of: "Ordering by a field for which many records have the same value (e.g. status) can result in incorrect selection of records for each page with some databases (MS SQL and Postgres for instance). With these databases it is recommend that you also order by id. That is, instead of ordering by "status DESC", use the alternative "status DESC, id DESC" and this will give better results." 
- 
         Mislav January 15th, 2009 @ 05:28 PMI think what is there in the docs now is quite fine: Without the ORDER BY clause, databases aren‘t required to do consistent ordering when performing SELECT queries 
- 
            
         ReggieB January 15th, 2009 @ 06:05 PMThat just says you need an ORDER BY clause, whereas in fact you need more than that - you need an ORDER BY clause that uniquely identifies and orders each record. 'ORDER BY status' is an ORDER BY statement and therefore would satisfy "Without the ORDER BY clause, databases aren‘t required to do consistent ordering when performing SELECT queries" but if there are far more records than there are statuses, this can result in incorrect pagination. Looks like we'll have to agree to disagree, and as its your code - you win. 
- 
         Mislav January 16th, 2009 @ 12:13 PM- State changed from invalid to resolved
 Well, it can't hurt http://github.com/mislav/will_pa... :P 
Please Sign in or create a free account to add a new ticket.
With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.
 Create new ticket
                    Create new ticket
 Mislav
      Mislav
 ReggieB
      ReggieB