Additional overload to System.Concurrent.ConcurrentDictionary GetOrAdd
As it stands, the ConcurrentDictionary has a GetOrAdd method that accepts a valueFactory delegate that is not included within the scope of the lock, hence it can be executed multiple times and as a result adding the value to the dictionary and executing the valueFactory is not an atomic operation. This behaviour seems to be misleading in relation to what the API leads one would assume that the benifit of having a method accepting a function overload would be. Let's see an example:
For example, the intention of the following code is clear:
Object value = ValueFactory(); myConcurrentDictionary.GetOrAdd(key, value)
From the overload that accepts only the value it is crystal clear that when I invoke the valueFactory, it is not thread safe.
Now suppose we have
private string ValueFactory()
// some logic for creating a value
The developer is left thinking, "now what could the added value of this overload be!?" - and then comes the false assumption "aha, the cool guys developing the concurrent dictionary thought that I might need to enlist some additional actions within the lock for adding as an atomic operation, great!" Of course, this is not the case. So as a result there appears to be little added value of having this method overload provided by the platform, given that I can already invoke my value factory in an unsafe way.
Understandably, when the team desiged this feature in this way, we can assume it was decided to 'protect' the developer by avoiding the risk of locks acquired during a long period of time due to slow running valueFactories. But why add this overload then, if I already have a way of invoking valueFactory in a thread unsafe way that is more explicit?
To address this issue, In the future, I would really like to see an overload with an option of lockBehaviour for the valueFactory (if it should be included or excluded from the lock) - By adding such a switch, it will allow for backwards compatibility (with a default of e.g. ValueFactoryLockBehaviour.ExcludeFromLock) and it would also empower the developer to choose if he wants to have valueFactories execute as atomic operations with the add and at the process allow for shooting himself in the foot by potentially acquiring a looooong running lock. (After all with great power comes responsibility)
A couple of links:
Thanks for taking the time to share this suggestion. This item has been around for a couple of versions of Visual Studio and we haven’t acted on it. Looking at the VS “15” plans, we’re not going to take action on this item, so we’re going to close it. If the suggestion is still relevant, please either take a look to see if there’s another suggestion that’s similar that you can vote on, or open a new suggestion.
- The Visual Studio Team
AFAIK ConcurrentDictionary does not even have a single monitor that can be "abused" to protect user level code. The locking in this class is fine grained and maybe some implementations might become even lock-free. At least it guarantees forward progress (i.e. dead-lock free), which is impossible if user code from the factory gets involved.
From my experience in practical applications it is not even a recommended option to protect the factory by the same monitor than the dictionary. Usual applications are application caches. Protecting the factory this way would eliminate any parallelism from the factory. No big deal if it is just new T(), but in this case a duplicate object is no big deal either. If in contrast the factory is expensive then it would be counterproductive to serialize all factory calls.
A use case with an expensive factories is e.g. a data cache, maybe together with the multiton pattern where no two objects with the same key must co-exist in memory. In this cases you need to add a placeholder object into the cache which indicates that the data is to be loaded. This object emerges to the final object once the data is available. The synchronization takes place using a conditional variable (or something like that). But the synchronization takes place at object level. It will not prevent other threads from inserting other objects. You can get really high throughput this way.
If on the other side massive parallelism is not required, an ordinary Dictionary with an ordinary lock object will be a much better choice than ConcurrentDictionary. In this case it is no big deal to protect your factory by the lock object too.
I have to agree with - Florin Neamtu COMPETELY 100% and re-iterate his rant on http://geekswithblogs.net/. There is no point in running the delegate outside the lock. There is no point in gaining concurrency at the expense of data integrity!!
It is rare that I run an update that does not have some check in it. e.g. "Does he have a non zero balance"... How the blazes are you supposed to include this in an atomic fashion?!?!?
The ConcurrentDictionary is waste of time!
Keith Robertson commented
I also was completely sure that GetOrAdd was atomic, the alternative didn't cross my mind, and I also wasted a morning debugging just to learn that it isn't. I agree with previous statement, "otherwise the whole concept of a concurrent dictionary is useless." Don't add an overload, just make the existing method work in a way that makes sense. Overhead is far less important than conceptual correctness and simplicity.
Joshua A. Schaeffer commented
I never ran into the bug but I did find it when examining with .NET Reflector.
Florin Neamtu commented
I just came across this - it is a nasty bug, the valueFactory must always be included in the lock, otherwise the whole concept of a concurrent dictionary is useless. The operation MUST be atomic, otherwise how can I add or retrieve a value in an atomic manner? I will have to lock externally the dictionary myself, same as I'd do with the ordinary dictionaries ... I wasted a whole morning trying to figure wtf is going on!!! Is it fixed in .NET 4.5?