Maximizing Your Return on Big Data (Part 2)
Some of the Main Steps in Leveraging Big Data
Access needs to be delegated throughout the organization and maintained. If you have an email marketing provider, do the right people have access to download the click-through results and load them into your CRM system?
Your SaaS (software as a service) applications need to be integrated so that data is pulled and loaded automatically. There are appliances and software that can help with this. Your ETL (extract, transform, load) tool should be robust enough to work with SaaS interfaces directly so you don’t waste expensive staff time on intermediate steps.
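The click-through example above boils down to a small transform step: reshape what the provider's API returns into rows your CRM can load. A minimal sketch, assuming a hypothetical provider response format (the field names are illustrative, not any real vendor's schema):

```python
# Hedged sketch: shape click-through records from a hypothetical email
# provider API response into rows ready for a CRM bulk load.
# All field names here are assumptions for illustration.

def transform_clickthroughs(api_records):
    """Map raw SaaS API records to CRM-ready rows."""
    rows = []
    for rec in api_records:
        rows.append({
            "constituent_id": rec["subscriber_id"],
            "campaign": rec["campaign_name"],
            "clicked_at": rec["event_time"],
        })
    return rows

sample = [{"subscriber_id": "A17", "campaign_name": "Spring Appeal",
           "event_time": "2024-03-01T09:30:00Z"}]
print(transform_clickthroughs(sample))
```

In a real pipeline this step would be wired into your ETL tool so the pull, transform, and load run on a schedule without anyone exporting files by hand.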
Standards should be developed. Example: if you’re working with multiple telemarketing vendors, insist that the data returned from all of them adhere to your formats and definitions. This makes it much easier to integrate and manage the data you’re receiving.
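A standard is only useful if you check incoming files against it. A minimal sketch of validating one row of vendor data, with required columns and allowed disposition codes that are assumptions for illustration:

```python
# Hedged sketch: validate a telemarketing vendor row against your own
# standard. Column names and disposition codes are illustrative.

REQUIRED_COLUMNS = {"constituent_id", "call_date", "disposition", "pledge_amount"}
VALID_DISPOSITIONS = {"pledged", "refused", "no_answer", "callback"}

def validate_vendor_row(row):
    """Return a list of problems; an empty list means the row conforms."""
    errors = []
    missing = REQUIRED_COLUMNS - row.keys()
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
    if row.get("disposition") not in VALID_DISPOSITIONS:
        errors.append(f"unknown disposition: {row.get('disposition')!r}")
    return errors
```

Rejecting nonconforming files at the door, with a report back to the vendor, is far cheaper than untangling bad data after it lands in your warehouse.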
You need to simplify data discovery as you add more and more data sources. The key is to have structured entity relationships. Discovery should include a pre-load profiling analysis before you load anything into your system. One-off data migrations are no fun; discovery is key to understanding what you’re getting and to designing a process that’s systematic and repeatable.
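Pre-load profiling doesn't need to be elaborate: for each column, how many rows, how many nulls, how many distinct values, and what the most common values are. A minimal sketch using only the standard library:

```python
# Hedged sketch: a quick pre-load profile of one column before loading.
from collections import Counter

def profile_column(values):
    """Summarize a column: row count, nulls, distinct values, top values."""
    non_null = [v for v in values if v not in (None, "", "NULL")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(3),
    }
```

Running this over a sample of each new source surfaces surprises (unexpected codes, heavy null rates) before they become load failures.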
It seems obvious, but cleansing is a critical strategy. If you have too many outliers, don’t include the data; it will just confuse decision making. Set up a formal schedule to review your cleansing rules to make sure they’re still valid.
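One concrete cleansing rule is flagging outliers before they skew averages. A minimal sketch using median absolute deviation, which stays stable even when the outliers themselves are large (the threshold `k` is an illustrative choice):

```python
# Hedged sketch: flag outliers using median absolute deviation (MAD),
# which is more robust than mean/stdev when outliers are present.
import statistics

def flag_outliers(values, k=5.0):
    """Return values more than k MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:  # all values (nearly) identical: nothing to flag
        return []
    return [v for v in values if abs(v - med) > k * mad]

print(flag_outliers([9, 10, 10, 11, 12, 1000]))  # → [1000]
```

Whether flagged values are excluded or routed to a human for review is a business decision; the rule itself should be written down and revisited on that formal schedule.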
Sometimes a source system such as a student information system changes and you aren’t informed – and a change like this calls for more than a new cleansing rule. You’ll need to consider what happened in the registrar’s office that might have changed the way the data is collected and what it means. When evaluating or re-evaluating cleansing, look at the data, but don’t forget the human elements.
One strategy for making integration easier is to consider replication as opposed to direct access. When was the last time you asked someone for access to their mainframe – and got a response in a timely fashion? Replication is quick, and data can be streamed in real time. It’s easier to grab the raw data and crunch all of it in your own data warehouse – where you have direct control – than to modify or change an external system.
The integration of social data and customer data is one of the newest horizons of data integration. Consider retailers who have wish lists: other users can see their friends’ wish lists. MIT developed a philanthropy wish list a number of years ago so donors could keep track of projects they were interested in giving to. If we move this to a social app, we need to integrate data in – what users are wishing for – and information out – what projects we have available – to complete the donor cycle. Fundraising and engagement have a lot of use cases for social media.
Integration should include streaming data. At one hospital admission desk, one of the admission questions for patients is “Do you want to opt in to receive communications from the Foundation?” Whether the question is asked depends on the person behind the desk. If data from the on-screen check box were streamed in real time, the fundraising organization could instantly see where an admissions staff person might not be asking the question, and corrective action could be taken immediately rather than after the fact – when it’s too late to ask.
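The monitoring step in the admissions example reduces to counting, per staffer, how often the opt-in question was asked as events stream in. A minimal sketch, assuming a hypothetical event shape with a staff ID and an asked/not-asked flag:

```python
# Hedged sketch: track opt-in question coverage per admissions staffer
# from a stream of check-box events. The event fields are assumptions.
from collections import defaultdict

def coverage_by_staffer(events):
    """events: dicts with 'staff_id' and 'question_asked' (bool).
    Returns the fraction of admissions where the question was asked."""
    asked = defaultdict(int)
    total = defaultdict(int)
    for e in events:
        total[e["staff_id"]] += 1
        if e["question_asked"]:
            asked[e["staff_id"]] += 1
    return {s: asked[s] / total[s] for s in total}
```

Staffers whose coverage falls below a threshold could trigger an alert the same day, which is the whole point of streaming rather than batch-loading this data.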
Integration tools continue to improve. Wherever possible, select tools that work graphically rather than having to do a lot of coding.
What tools are you using to prototype quickly? Going through a “traditional” report development cycle can be very time consuming, and by the time you get the first report, you may have forgotten the questions you needed to ask. Utilize visual analytics or similar tools to help design the outputs. Get these out quickly to your users and ask for their feedback and any additional questions.
Part of your analysis should consider archiving data that’s no longer being used. As disk space has become less expensive, our tendency has been to save everything forever. Big data makes this more complex. If we’re storing tweets, how long are they valid? A change in employment status reported on LinkedIn has a more “permanent” value because it indicates a change in life status – which has obvious wealth implications, whether positive or negative. A change in employment data for key constituents should trigger research into whether a profile needs to be updated.
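The tweet-versus-employment-change distinction can be made explicit as a per-source retention policy. A minimal sketch, where the record types and cutoff periods are illustrative assumptions, not recommendations:

```python
# Hedged sketch: per-source retention rules. Record types and cutoff
# periods below are illustrative assumptions only.
from datetime import datetime, timedelta

RETENTION = {
    "tweet": timedelta(days=90),        # short-lived signal
    "employment_change": None,          # life-status change: keep indefinitely
    "web_visit": timedelta(days=365),
}

def should_archive(record_type, recorded_at, now=None):
    """True when a record's age exceeds its retention cutoff."""
    now = now or datetime.utcnow()
    cutoff = RETENTION.get(record_type)
    if cutoff is None:  # unknown types and keep-forever types both stay
        return False
    return now - recorded_at > cutoff
```

Writing the policy down as data, as here, means the cutoffs can be reviewed on the same formal schedule as your cleansing rules.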
Analysis is not just about presenting pieces of information, analysis is also about what systems we need to think about creating based on the information that we’re receiving.
Sometimes, complex events are difficult to understand and manage. Complex event processing has a past, present and future. In any event there can be gains, losses, problems, incidents and opportunities. Rule-based engines help to define where we need to focus.
A donor logs into our online community, navigates through a few pages, and makes no gift. We send out a number of email newsletters. The donor has had a visit from a major gift fundraiser. The stock market has declined. Real estate is up. What are the business rules we need to have in place to understand the complexity of the event and inform us what to do? It’s a little akin to data mining – these are the analysis directions we’ll eventually need to go in. Most of us are a long way from having any automation built into these complex events – but we should start planning.
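The business rules in that scenario can be sketched as predicates over a donor context, with an action suggested whenever a predicate fires. A minimal rule-engine sketch; the context fields, thresholds, and suggested actions are all illustrative assumptions:

```python
# Hedged sketch of a tiny rule engine for the donor scenario above.
# Context fields, rule conditions, and actions are illustrative only.

RULES = [
    (lambda c: c["site_visits"] > 0 and c["gifts"] == 0 and c["officer_visit"],
     "escalate: engaged but not giving after an officer visit"),
    (lambda c: c["market_delta"] < 0 and c["real_estate_delta"] > 0,
     "consider real-estate-based giving vehicles in the next touch"),
]

def evaluate(context):
    """Return the suggested actions for every rule whose condition holds."""
    return [action for predicate, action in RULES if predicate(context)]

ctx = {"site_visits": 3, "gifts": 0, "officer_visit": True,
       "market_delta": -0.02, "real_estate_delta": 0.04}
print(evaluate(ctx))
```

Even a rule table this simple forces the useful conversation: which combinations of events do we care about, and what should happen when they occur together?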
The Love of Big Data
One of the largest dating and relationship sites marries more than 500 people a day out of its 40 million customers. It is very good at managing very large amounts of data in real time. Not only is the management of big data critical – it may be one of the key foundations of ensuring the continuity of civilization.
Make sure you understand the trends so you can plan your own Big Data strategy.