FINAL WORD
“The majority of companies perform data
preparation and ETL in one system and
machine learning in another, which can
cause extra work for IT.”
François Sergot, Dataiku
are functioning as they should and maintain
security across the organization. It is worth
noting that while data democratization
can cause more work for IT architects, it
is not their primary concern. Integration
is. IT architects work with teams across
the organization, including data teams, to
manage integration globally.
The good news is that IT architects can
leverage data science tools to make their
jobs easier. These tools, in turn, allow people
to gather data themselves (even from
multiple sources) and merge it without IT
intervention. For that to be possible,
though, IT architects need to ensure the
proper security protocols and systems are
in place, so they still have a critical role
to play in the process.
Data orchestration involves automating the
process of taking siloed data from multiple
data storage systems and locations,
combining it, and making it available
for analysis and insight extraction.
This orchestration is pivotal in today’s
enterprise, given the rising complexity
of the data landscape. That complexity
stems mainly from the variety of data
repositories, the growing use of alternative
and unconventional data pools, and the use
of hybrid infrastructure, not to mention the
fact that data orchestration likely carries a
different meaning for different companies.
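To make the idea concrete, here is a minimal sketch of orchestration in Python, using only the standard library. It is an illustration under stated assumptions, not a description of how any particular tool works: the two "silos" (a CSV export and an orders database), the sample rows and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical silo 1: a CSV export from a CRM system.
CRM_CSV = """customer_id,name
1,Acme
2,Globex
"""


def read_crm(csv_text):
    """Extract customer names from the CSV silo, keyed by integer id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["customer_id"]): row["name"] for row in reader}


def build_orders_db():
    """Create hypothetical silo 2: an in-memory orders database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 120.0), (1, 80.0), (2, 45.5)],
    )
    return conn


def orchestrate(csv_text, conn):
    """Combine both silos into one list of records ready for analysis."""
    names = read_crm(csv_text)
    totals = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return [
        {"customer_id": cid, "name": names.get(cid, "unknown"), "total": total}
        for cid, total in totals
    ]


combined = orchestrate(CRM_CSV, build_orders_db())
```

In a real deployment the same three steps (read each source, join on a shared key, publish one combined view) would typically run on a schedule and against governed connections rather than in-memory samples.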
Extract, Transform, Load (ETL)
IT architects usually bear the brunt of ETL
tasks. However, it doesn’t need to be that
way. By equipping data science users with a
data science tool that can handle everything
data-related, IT architects are putting data
integration into the hands of the many,
including those working closely with the data
for a given data science project.
According to Dataiku’s survey of IT leaders,
the majority of companies perform data
preparation and ETL in one system and
machine learning in another, which can
cause extra work for IT.
There are many benefits to implementing
a data science tool with ETL capabilities,
notably the fact that it enables IT architects
to avoid going through multiple tools for a
snapshot of where and how data is flowing.
It can also help organizations maintain a
consolidated view of the data that fuels
business decisions (making it easier for folks
like analysts to examine and report on data
relevant to their projects). Further, it fosters
consistency by ensuring that data policies,
and even the data itself, are consistent
across every data location.
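As a rough illustration of what checking that consistency could look like, the Python sketch below fingerprints the copy of a dataset held in each location and flags any copy that differs from the first. The location names, sample rows and helper functions are hypothetical, and a production check would also need to handle large datasets and schema differences.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 digest that ignores row order and key order."""
    canonical = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def find_inconsistent(locations):
    """Return names of locations whose data differs from the first one."""
    names = list(locations)
    reference = fingerprint(locations[names[0]])
    return [n for n in names[1:] if fingerprint(locations[n]) != reference]


# Hypothetical copies of the same dataset in three locations.
locations = {
    "warehouse": [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}],
    "cloud_bucket": [{"id": 2, "region": "US"}, {"id": 1, "region": "EU"}],
    "team_export": [{"id": 1, "region": "EU"}, {"id": 2, "region": "APAC"}],
}

drifted = find_inconsistent(locations)
```

Here the cloud copy matches the warehouse despite a different row order, while the team export has drifted and would be flagged for review.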
Security in the age of data democratization
IT architects are heavily involved in
compliance and cybersecurity initiatives across
the organization. As more data becomes
accessible, their role becomes increasingly
critical in making sure policies are enforced
and audits (for GDPR or ISO 27001, for
example) can be carried out easily and
without unpleasant surprises. Without a
unified workspace, those policies and audits
can become extremely complex and
time-consuming for both IT architects and
auditors.
A unified data science tool can make a
significant difference when it comes to
securing data. An additional important
advantage is that such a tool greatly
mitigates the risk added by using multiple
cloud solutions. It’s hard to deny that the
shift from pure on-premises infrastructure
to hybrid or pure cloud has brought many
advantages to companies.
With data accessed through a unified,
secured and audited workspace, companies
can benefit from cloud platforms without
increasing their risk. As an example, an IBM
report estimates the average cost of a data
breach to be US$3.92 million, demonstrating
the importance of a robust data governance
plan (which should always include data
quality and security).
Successful data orchestration is the key to extracting impactful insights
There’s no arguing that, individually,
instruments sound nice when they are
played by someone who knows how to play
them. However, when the instruments of an
orchestra are played together, they go a step
further than just sounding nice – they give
the conductor a job and allow the audience
to enjoy a one-of-a-kind performance, all
from one stage.
Similarly, data orchestration takes
disparate datasets from different systems
and locations, brings them together,
and makes them available for the extraction
of deeply informed insights (which can then
help business leaders make decisions
that have real-world impact).
In an era of data democratization, the role
of IT architects in supporting and providing
transparency around both SSA and o16n
initiatives, ensuring elasticity and security,
and providing data accessibility to a wide
variety of users is pivotal to the true
success of data projects.
84 INTELLIGENTCIO www.intelligentcio.com