Runtime Complexity of Java Code
To evaluate obtained data structures, I considered two types of scenarios: adding/removal scenario and adding/search scenario. Each scenario has two parameters: size --- the total number of elements in the multiset before starting scenario, ratio --- ratio of adding operation (for example, if adding ratio equals 0.5, then the number of adding operations performed equals to the number of other operations performed).
Parameter size is also equal to the number of total performed operations. To look precisely at the dependency of scenario performance time on operations number, I varied this parameter from 10,000 to 170,000 with step 40,000. The ratio varied from 1.0 to 0.0 with step 0.25 for both types of scenarios. The involved datatype was Integers from 0 to size/10.
Adding/removal scenario:
- INPUT: MultiSet with size/2 elements;
- Adding ratio*size random elements to multiset;
- Removing size-ratio*size random elements from multiset.
Adding/search scenario:
- INPUT: MultiSet with size/2 elements;
- Adding ratio*size random elements to multiset;
- Searching size-ratio*size random elements from multiset.
Also, there was a special type of scenario when elements were added and deleted in ascending/descending order. After running scenarios several times and varying involved parameters, I took the average and obtained the following result:
Analysis
LinkedList multiset
The LinkedList multiset achieved the worst results on almost all runs of all scenarios. Let’s look at the performance of operations for LinkedList multiset in detail.
Adding an element. If multiset already contains at least an entry for every possible integer, then the expected time complexity of adding operation is O(N/2), where N --- is the length of the list. If we try to insert a new element to the list, firstly we must look through the whole list to ensure, that there is no entry of this element in the multiset already. In all cases: the complexity of the operation is linear (O(N)).
- Best case: we can add only a small number of integers many times. In this case, the complexity is “almost” constant;
- Worst case: every added integer is new for the multiset. So, we must look through the list every time. Strictly O(N).
Removing one/ removing all. It does not matter if we want to delete one entry of an integer or all entries. Complexity is just the same as in the adding case. If we are sure that there is an entry of an element in the multiset, then the expected time is O(N/2). But in worst cases, we must look through the whole list with zero results. If we already find the desired element, removing one entry and removing all entries both have constant complexity, so the total complexity of the operation is linear again (O(N)).
- Best case: we remove elements from the “small” set of integers. The same as for adding: “almost” constant.
- Worst case: each time we try to remove the element, which is not in the multiset. We look through the whole list – O(N).
Searching. Just the same as for removing. Deleting elements from the list (or decreasing their multiplicity) takes constant time, so the complexity does not differ. Bad cases and worst cases are similar.
SortedLinkedList multiset
This implementation has the same time exponent as the previous, but a little bit better. Adding an element. In this case, it does not matter, if the element is in multiset or not --- anyway we will look through a fixed number of elements. Even if there is no entry of new elements in the list, we won’t look through the whole list. So the expected time is O(N/2). That’s why the obtained results are a little better, then for the usual linked list.
- Best case. We add new elements every time in descending order (if the list is sorted in ascending order). In this case every time we will put an element to the beginning of the list – so the complexity of each operation is O(1) (constant).
- Worst case. We add new elements every time in ascending order (if the list is sorted in ascending order). In this case every time we will put an element to the end of the list. To make it we need O(N) time.
Removing one/ removing all. It does not matter if we want to delete one entry of an integer or all entries. Firstly, we must find a place for the element. As in the adding case, the expected time, in this case, is O(N/2).
- Best case. We remove elements in ascending order(if the list is sorted in ascending order). In this case every time we will delete the start element of the list – so the complexity of each operation is O(1) (constant).
- Worst case. We delete elements every time in descending order (if the list is sorted in ascending order). In this case every time we will delete the finishing element of the list. To make it we need O(N) time.
Just the same as for removing. Deleting elements from the list (or decreasing their multiplicity)takes constant time, so the complexity does not differ. Bad cases and worst cases are similar.
BinarySearchTree multiset
BinarySearchTree has a very big performance improvement in comparison to previous ones.
Adding an element. To add an element to the binary tree, we must go through from its root to one of its leaves. So the complexity linearly depends on the height of the tree: O(h). As this tree is not balanced, we can not guarantee that height of the tree is rather good, so it varies from logN to N. But in almost all cases it is significantly less than N. That’s why we obtained such a performance time gap from the previous implementations.
- Best case. In the best case, we add elements in such a way, that every time we have a balanced tree, so the height of the tree equals logN.
- Worst case. In the worst case, we obtain a binary tree with height N. We can get such a tree, for example, by adding elements in ascending order: every time a new element will be the right son of the right-most leaf.
Removing one/ removing all. First of all, we have to find the desired element in the tree. It will take O(h) time, where h --- height of the tree. If we don’t need to delete the node from the tree ---we just decrease the node label. Otherwise, we must find another node to put it in the place of the deleted node (method find successor() does it in BstMutliset class). The complexity of this search is O(h’), where h’ --- height of the deleted node. As h>=h’, the total complexity of operation anyway equals O(h).
- Best case. The node deletion does not happen. We just decrease node labels. In this case, we can execute the operation in h steps (or even less).
- Worst case. Every time we delete the node from the tree.
Searching. Any search can be finished in O(h) steps, as we deal with the binary tree.
- Best case. The tree is balanced. Complexity is O(logN).
- Worst case. A tree is a chain. Complexity is O(N).
BalancedTree multiset
This implementation has approximately the same time results, as the usual binary tree: some tests are better, some are not.
Adding an element. This operation is closely related to the balancing procedure, as after adding a new element as the leaf of the tree, it may become unbalanced. Both procedures have O(logN)time complexity in summary.
- Best case. In the best case, we don’t need to create an additional node in the tree – just increase the node label.
- Worst case. Each time creating a new node.
Removing one/ removing all. The search of the desired element in the balanced tree will take O(logN) time. If we don’t need to delete the node from the tree --- we just decrease the node label. Otherwise, we must delete the node and call the balancing procedure with complexity O(logN). So the total complexity is also O(logN).
- Best case. In the best case we don’t need to delete the node from the tree – just decrease the node label.
- Worst case. Each time deleting a new node.
Searching. As it was said above, any search can be performed in logN steps.
Hash multiset
Hash multiset reached the best results among all the implementations. The reason for it, in my opinion, is the insufficiently big size of possible multiset values. That’s why every operation was performed in “almost” constant time.
Adding an element. To add an element, we just have to put this element into the appropriate hash bucket. The calculation of the bucket index is constant. If the number of possible elements is not sufficiently big, then every bucket contains zero, one, or a very small number of elements. That’s why we can say, that we can put the new element to the bucket in “almost” constant time. If the number of elements is large, the time complexity becomes O(N/b), where b is the number of hash buckets.
Removing one/ removing all. To remove elements, we have to find the necessary hash bucket and remove the element from the list. Again, if lists in buckets are short, then we can say, that complexity of this operation will be “almost” constant. Otherwise, it will become linear with complexity O(N/b).
Searching. The same situation as with the previous operations. Calculating the bucket index and looking through the list.
Summary
The practical results have matched theoretical analysis. For a wide variety of purposes, Hash multiset is the best choice. However, when the number of elements becomes incredibly huge, it is better to use tree multisets, which give logarithm complexity for all operations. You have to prefer a balanced tree if you want to execute a large number of search operations. SortedLinkedList in all cases is preferable to LinkedList. It can be used when you deal with a small number of elements with big multiplicities.