You are viewing...

Recorderable lists a.k.a. Scoring items for Sorting is Pretty Hard

Updated on January 30, 2020 at the 14th hour
Posted under:

DISCLAIMER: Expressed views on this blog are my own.

Let's say you have a list of items that you want to sort, but you want it such that a user can change items in the list.

Naively you would give every item a score in an integer range. Typically starting from 0 though some make use of the negative numbers. Alright, sounds good what happens when I perform an add, insert, delete, swap then?

On add, you are just adding to the end of the list, so score of that item is the max(list scores) + 1.

On insert, you are adding to the middle of the list, so score of that item is scoreOf(list position) while the item in that spot and every item afterwards have to be incremented by 1.

On delete, you are removing from anywhere on the list. Assuming you don't want to reuse score, this is easy. Just remove. Assuming you do want to reuse scores then you have to decrement every item after the item you just deleted by 1.

On swap, this is easy right just swap the scores. Nothing complicated.

You can see where the most expensive operations are: insert and delete. It could be a linear order run time depending on how the database performs these operations. These writes eventually have to be persisted, so that linear order run time could be amplified.

Ok, so how do I decrease the number of operations for insert and delete then?

I figure if you can keep the list as a linked list then this is easy to do as operations will be node manipulation. Linked list could be persisted in a way that isn't expensive to store onto disk that does not involves wiping a file. It would involve figuring out how to index the linked list positions and store that index as well. Too expensive to write.

Next idea is to use an existing database, but in such a way that the scores are spaced away enough from each other that they don't cause conflicts. Ok, how do you calculate those scores? No idea.

Another idea is to use strings as the scores where each gets a "space". item at position 1 get "a", item at position 2 gets "b". If you insert an item between 1 and 2 then score would be "a1" or "a:a". With deletes, you would just remove the item without touching any other one. When you wan to reuse you just go in and calculate "a:x" where x is some incrementing alphanumeric character. Plausible idea, but I don't like it as it is not NUMERIC.

What if we could apply the idea of each item getting a space to the real number space where each integer number has practically infinite space of decimal numbers. Technically, it is not infinite as the IEEE 754 spec says, but for my purpose it is. What if those numbers were pre-computable? That would mean no calculations would be needed. Just give position 0 and it gives instant score x. If you insert a new value then depending on where you wanted it, can score it between two existing numbers by adding a decimal digit or using the mid point of the two. Scores could be recalculated on the background if needed.

You were looking for me to give you my answer to this here? Wolfram Alpha is a good place to come up with an equation that suits your needs. My needs involved two variables, so I have a 2D equation (two free variables). Took me a while to figure out the components as it should for you. Any answer I give would probably be wrong for your use case as such I don't offer consulting to you.

Why go through this anyway? User sort-able lists are going to be small so you could set acceptable limits for lists and call it a day. Database will only be IO limited and so how much IO will you have for a small set of users negligible right? Can always hire someone else to do a better job with sorting and scoring than you. Personally, I think getting a lot of things to perform really really well keeps cost low, so I emphasize minimizing as much CPU and IO as necessary, so that if traffic scales up then the servers can scale up as linearly as possible. Having reorder-able lists in a DB like MySQL is expensive. Use a better database that will give you sorted lists easily with backups.

You just read "Recorderable lists a.k.a. Scoring items for Sorting is Pretty Hard". Please share if you liked it!
You can read more recent posts here.