IT leaders within large enterprises are investing heavily in cloud-based analytics technologies for obvious reasons: flexibility, time to value, access to innovation, data gravity and more. We hear the mantra “use the right tool for the right job” at nearly every cloud technology conference, with IT leaders pushing their teams to build data pipelines that use the tools best suited to each type of analytics workload. This often drives cost and operational efficiency. However, it is at odds with another growing trend in the industry: providing self-service support for analysts and data scientists who want to use the toolset they are already familiar with. The good news is that a few cloud analytics vendors are already addressing both considerations effectively, and far exceeding expectations of what a “cloud analytics” vendor can offer.
Let’s discuss how.
Until recently, cloud analytics vendors offered either relational database solutions or advanced analytics/machine learning solutions. Gone are the days of IT leaders investing in separate point solutions for reporting, predictive and prescriptive analytics, and machine learning. Large enterprises should expect their cloud analytics vendor to deliver capabilities that go well beyond SQL in a powerful database. The leading solutions provide:
- Data integration, with support for multiple data sets from across the entire organization. This requires streaming in from cloud data pipelines (think Kinesis and Lambda) as well as reaching out to a variety of data stores and transactional databases and joining them in a single query.
- Machine Learning, with powerful analytic functions and engines built directly into the database.
- The tools and languages that analysts and scientists in your organization already use. You should expect to be able to point your Jupyter Notebook directly at the database and run R or Python against data already in the database.
This is particularly important in the cloud, where the prevalence of streaming data pipelines and cheap, scalable storage options means vendors need to get on board. Fortunately, enterprises using Teradata Vantage in the cloud can already take advantage of these capabilities. This allows large enterprises to get actionable answers, faster.
Let’s use data science teams working primarily in R as an example. Until recently, running an analysis against both customer records residing in a data warehouse and clickstream data residing in a cloud storage solution meant they first had to run painful exports or ETL jobs to move this data into R before they could get started.
With cloud solutions such as Teradata Vantage, data scientists can point RStudio directly at Teradata and access both. Not only can they execute their R code against the customer records in Teradata, they can also reach out to low-cost data stores to pull in clickstream data and run advanced analytics against both datasets joined in real time. Further, they can kick off machine learning processes that execute efficiently by leveraging Teradata’s powerful architecture.
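To make the pattern concrete, here is a minimal sketch in Python, which the article names alongside R. It is not Teradata's API. It assumes an in-memory SQLite database as a stand-in for the warehouse and an inline CSV string as a stand-in for clickstream files in low-cost cloud storage. The table names, columns, and sample rows are all hypothetical. The point is only the shape of the workflow: expose the external data to the database, then join and aggregate in a single query instead of exporting everything to the analyst's machine first.

```python
import csv
import io
import sqlite3

# Stand-in for the data warehouse: an in-memory SQLite database (assumption;
# a real deployment would use the vendor's own driver and connection details).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "retail"), (2, "enterprise"), (3, "retail")],
)

# Stand-in for clickstream files sitting in cloud storage (hypothetical data).
CLICKSTREAM_CSV = """customer_id,page
1,/home
1,/pricing
2,/home
3,/pricing
3,/checkout
3,/home
"""


def load_clickstream(conn, csv_text):
    """Expose external clickstream rows to the database as a temporary table."""
    conn.execute("CREATE TEMP TABLE clickstream (customer_id INTEGER, page TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["customer_id"]), r["page"]) for r in reader]
    conn.executemany("INSERT INTO clickstream VALUES (?, ?)", rows)


def clicks_per_segment(conn):
    """Join both data sets in one query and aggregate inside the database."""
    query = """
        SELECT c.segment, COUNT(*) AS clicks
        FROM customers AS c
        JOIN clickstream AS k ON k.customer_id = c.customer_id
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return conn.execute(query).fetchall()


load_clickstream(conn, CLICKSTREAM_CSV)
print(clicks_per_segment(conn))  # [('enterprise', 1), ('retail', 5)]
```

Only the small aggregated result travels back to the client, which is the practical difference from the export-then-analyze workflow described earlier.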
This integration of data across a variety of platforms (whether cloud data storage, low-cost data stores, or transactional databases), coupled with machine learning and advanced analytics functions, gives businesses a competitive edge by enabling them to leverage all of the relevant data, all of the time, accelerating time to business-critical insights and answers.
As Director of Teradata's Cloud Solutions, Scott Dykstra is responsible for a team of architects who design and deploy analytic environments in the Cloud for Teradata's customers across the Americas. His team also defines best practices for Cloud analytics and provides direction for product development from a field perspective. Scott is a results-oriented technology executive who led Teradata's transformation in the Cloud, and continues to set strategy and market positioning today.