Why opening up its Cosmos big data system would be the right decision for Microsoft

A rumor has been circulating since August (first reported, and later followed up, by Mary Jo Foley at ZDNet) that Microsoft is preparing to launch its Cosmos big data system as a service on its Azure cloud platform, possibly as an alternative to Hadoop. This would be not only a bold move for Microsoft but also, quite possibly, a smart one.

We will come back to this shortly, but first a brief history of Cosmos. It is Microsoft’s internal big data system, used to store and process data for applications such as Bing, Internet Explorer, and email. Cosmos’ batch computing element is called Dryad, and it is similar to, although apparently much more flexible than, the MapReduce framework associated with Hadoop and with early-2000s Google, where MapReduce was invented. Cosmos also has a SQL-like query engine called SCOPE.

In early 2011, Microsoft claimed that it was storing around 62 petabytes of data in Cosmos. At that time, Microsoft was also planning commercial versions of Cosmos / Dryad and promoting them as a better alternative to Hadoop. Overall, the reviews were positive. You can read more about Cosmos here and here.

A graphic showing the place of Cosmos in the architecture of the application, circa 2011.

Around October 2011, Microsoft began to invest heavily in making Hadoop run on Windows, an early (and wise) indication of the company’s trend toward embracing both open source technologies and the technologies that businesses and developers have expressed interest in using. The Dryad work was moved to the Windows high-performance computing product line, where it ultimately died. In February 2012, Microsoft fully embraced Hadoop, announcing a partnership with Hortonworks and upcoming products for Windows Server and Azure.

Since then, the company has been pretty quiet about Cosmos, Dryad, and the whole shebang, but in August, ZDNet’s Foley reported on a Microsoft job posting suggesting that the company is developing a version of Cosmos for external consumption. At the end of January, she expanded on the original report with information that Microsoft is recruiting testers for the Cosmos service, along with details of improvements to the system’s SQL engine and of new storage and compute components.

Microsoft has declined to comment on any plans to release products based on Cosmos.

Microsoft CEO Satya Nadella talks about open source at a Microsoft cloud event. Photo by Jonathan Vanian / Gigaom

If Microsoft is indeed preparing a Cosmos service on Azure, it’s easy to see why. Processing and analyzing data is going to be a major driver of IT spending in the decades to come, and smart businesses are going to have their bases covered when it comes to meeting customer needs. Hadoop is simply the platform that every cloud provider, database provider, and analytics provider needs to support, because its community is so large and so many workloads already run on it.

But that doesn’t mean that Hadoop is necessarily the best technology for every task, especially for cloud providers that want to control every aspect of a new service, from the back end to the user interface. Google’s Compute Engine platform supports Hadoop, but the company all but said “Hadoop is out of date” when it rolled out its post-Hadoop Cloud Dataflow service in June. Databricks, a startup built around Apache Spark, works closely to integrate Spark into the Hadoop ecosystem, but its own offering is a cloud service centered on Spark.

If the Apache Storm stream processing project were as popular as Hadoop, maybe Amazon Web Services would have built something around it rather than launching its own stream processing technologies, Kinesis and Lambda. Microsoft, in fact, is now also touting its own stream processing engine, called Trill, which already underpins the company’s Azure Stream Analytics service as well as the streaming workloads for the back-end systems powering Bing and Halo.

Compare Trill to other streaming engines.

We will discuss the big data business in detail at our Structure Data conference, which will take place March 18-19 in New York. Speakers include CEOs from Hadoop vendors Cloudera, Hortonworks, and MapR, as well as executives from Google, Microsoft, and Amazon Web Services. And, of course, some of the most advanced users in the world will be talking about the tools they use and what they would like to see from companies selling software.

New data services, especially from cloud providers, are also aimed at showing off a company’s technological strengths, just as providers boast about how many data centers they have. Engineers love to work on the biggest and best systems out there, and developers love to build apps on them. Just as Google has open sourced chunks of its infrastructure in the form of Kubernetes and some Cloud Dataflow libraries, it wouldn’t surprise me if Microsoft decides to open up parts of Cosmos and Trill at some point, maybe to help generate more interest around its recently open sourced .NET development framework.

There is too much money to be made in the cloud computing and big data markets to leave good technology locked away inside a company’s own walls. As Microsoft, Google, and Amazon look to grab as many cloud workloads as possible and hire as many talented engineers as possible – in a competitive market that also includes very open source-friendly companies like Facebook and Netflix – expect to see a lot more openness about the things they build, as well as a lot more services based on them.

