I think this is more of a problem of knowing when a specific tool should be used. Probably most people familiar with hadoop are aware of all the overhead it creates. At the same time you hit a point in dataset sizes (I guess even more with “real time” data processing) where it’s not even feasible with a single machine. (at the same time I’m not too knowledgeable about hadoop and bigdata, so anyone else feel free to chime in)
Some context though is that this article was written when cloud computing was all the buzz like crypto just was and AI is now. So many people used cloud just for the buzz and without understanding the tool.
I think this is more of a problem of knowing when a specific tool should be used. Probably most people familiar with hadoop are aware of all the overhead it creates. At the same time you hit a point in dataset sizes (I guess even more with “real time” data processing) where it’s not even feasible with a single machine. (at the same time I’m not too knowledgeable about hadoop and bigdata, so anyone else feel free to chime in)
Some context though is that this article was written when cloud computing was all the buzz like crypto just was and AI is now. So many people used cloud just for the buzz and without understanding the tool.