- Observe how data moves along different dimensions over time; record the paths to understand past trends and project future patterns from them
- Visualize data using a reasonable measure, such as population
- Show graphs of both aggregate data and split data
- It is dangerous to use only averages (you can't see the big picture), since the details can differ greatly from the average. An average can also be distorted by outliers and misrepresent the truth.
- Clearly show patterns over time and by region
- Link data with the Internet to draw the whole picture and tell a story about the data
- Explain the behavior of the data - not just how data is moving but why
- Volume of data makes a difference: "big data" is more likely to be representative
- Access to data and cleaning of data are important
- Present data in a visually appealing way
- Data is not always inside or about a company; data from outside the company, such as industry trends, is also important for predicting the future.
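The point about averages and outliers can be illustrated with a small sketch. The incomes and region names below are made-up numbers, not data from the talk; the example only shows how one outlier pulls the mean away from the median, and how splitting the data reveals where the distortion comes from.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands) for two made-up regions.
incomes = {
    "region_a": [30, 32, 35, 31, 33],
    "region_b": [28, 30, 29, 31, 500],  # one extreme outlier
}


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


# Aggregate view: all regions pooled together.
all_values = [v for group in incomes.values() for v in group]
overall = summarize(all_values)

# Split view: one summary per region.
by_region = {name: summarize(vals) for name, vals in incomes.items()}

print("overall mean/median:", overall)
for name, (avg, med) in by_region.items():
    print(name, "mean:", avg, "median:", med)
```

In the split view, region_b's mean is far above its median, which points directly at the outlier that the pooled average hides.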
Saturday, April 21, 2012
Lecture 25 & 26 Visualization and social media monetization
Key takeaways from the video "Hans Rosling shows the best stats you've ever seen"
Takeaway from GOMC Competition
From a company's perspective, the performance of a campaign is the most important thing. It is therefore critical to identify the goal of the campaign and to estimate its costs and benefits in order to measure the overall ROI. When evaluating performance, compare the campaign period not only with the pre-campaign period but also with the post-campaign period, to show the campaign's contribution. If conversions during the campaign are much higher than in both the pre-campaign and post-campaign periods, the campaign did increase conversions and helped the organization achieve its goal.
For the pre-campaign comparison, we should compare the campaign period not only with the previous month but also with the same month of the previous year. Comparing with the same month in a different year lets us capture seasonality that cannot be seen by comparing months within the same year, so we know whether an increase in conversions is due to the campaign or just to the high season. On the other hand, comparing with the months just before the campaign gives us the overall scale of visits and conversions, so we can tell whether a year-over-year increase is actually attributable to the campaign rather than just to business expansion.
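A minimal sketch of these comparisons follows. The monthly conversion counts, the cost and benefit figures, and the function names are all hypothetical, chosen only to show the arithmetic; they are not GOMC data.

```python
def pct_change(current, baseline):
    """Percentage change of current relative to baseline."""
    return (current - baseline) * 100.0 / baseline


def roi(benefit, cost):
    """Return on investment as a percentage of cost."""
    return (benefit - cost) * 100.0 / cost


# Hypothetical monthly conversions; the campaign ran in 2012-04.
conversions = {
    "2011-03": 96,
    "2011-04": 100,
    "2012-03": 120,
    "2012-04": 180,
    "2012-05": 125,
}

campaign = conversions["2012-04"]
vs_prior_month = pct_change(campaign, conversions["2012-03"])    # pre-campaign
vs_post_month = pct_change(campaign, conversions["2012-05"])     # post-campaign
vs_same_month_last_year = pct_change(campaign, conversions["2011-04"])

# Year-over-year growth of the month before the campaign: a rough
# estimate of business expansion that is unrelated to the campaign.
expansion = pct_change(conversions["2012-03"], conversions["2011-03"])

print(vs_prior_month, vs_post_month, vs_same_month_last_year, expansion)
print("ROI %:", roi(benefit=1500.0, cost=1000.0))  # hypothetical figures
```

If the year-over-year lift for the campaign month is well above the expansion estimate, and the campaign month also beats the months immediately before and after it, the increase is more plausibly due to the campaign than to seasonality or growth.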
Friday, April 20, 2012
Lecture 23 & 24 Dimensional modeling and balanced scorecard
- Dimensional modeling: miscellaneous details
  - Slowly changing dimension
    - Purpose: handle attribute values (e.g. brand, department, …) that change over time, not frequently but once in a while
    - Type 0: nothing is ever going to change (this is not typical)
    - Type 1: update the attribute directly; simple, but history is completely lost
    - Type 2: in order to preserve history, split tables
      - Every time the value of an attribute changes, add a new tuple with a new surrogate key, and add a start time and an end time to the new tuple
    - Type 3: change the table itself; add a new column/attribute called "new" territory, and keep the previous one as "old" territory
  - Role-playing dimension
  - Junk dimension
    - Put miscellaneous data all together
  - Semi-additive: can be added along some dimensions but not others, e.g. the balance on a bank statement (not additive over time)
  - Non-additive: average, minimum, maximum; can't be added together
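The Type 2 approach described above can be sketched in a few lines. This is an illustration under assumptions, not the lecture's implementation: the dimension is held as a list of dicts, and the column names (`surrogate_key`, `customer_id`, `territory`, `start_date`, `end_date`) and the function name are hypothetical.

```python
from datetime import date


def apply_type2_change(rows, customer_id, new_territory, change_date):
    """Close the current row for customer_id and append a new version.

    rows is a list of dicts standing in for a dimension table; the row
    whose end_date is None is the current version of that customer.
    """
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            if row["territory"] == new_territory:
                return rows  # value unchanged, nothing to record
            row["end_date"] = change_date  # close the old version
    rows.append({
        "surrogate_key": next_key,       # new surrogate key per version
        "customer_id": customer_id,      # natural key stays the same
        "territory": new_territory,
        "start_date": change_date,
        "end_date": None,                # open-ended: current version
    })
    return rows


# Hypothetical starting state: one customer in the "West" territory.
dim_customer = [{
    "surrogate_key": 1,
    "customer_id": "C1",
    "territory": "West",
    "start_date": date(2011, 1, 1),
    "end_date": None,
}]

apply_type2_change(dim_customer, "C1", "East", date(2012, 4, 1))
for row in dim_customer:
    print(row)
```

Facts recorded before the change keep pointing at surrogate key 1, so historical reports still show the old territory, while new facts use key 2.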
Wednesday, March 28, 2012
Lecture 19 & 20 Develop dimensional modeling for AYFG case
AYFG is a gym with several branches nationwide. Given the ER diagram, we are going to develop a dimensional model step by step to answer the business questions and address the business problems.
The first idea is a star schema. Figure 1 below is the conceptual schema of the business. We can see several primary entities, including members, memberships, salesinvoices, merchandises, etc. One interesting thing we noticed is that only merchandise sales have a quantity attribute. This rests on the assumption that the sales quantity of other products, such as memberships and one-day passes, is defined as one for every record.
Lecture 17 & 18 Dimensional Modeling
Data cleansing and data profiling take around 80% of the data analysis process and play an extremely important role. The first thing to do is to look at the schema and try to understand what information the schema and its attributes convey. We can draw a graph to visualize the relationships among attributes. Before analyzing the data, we can investigate each attribute of each table and the business rules that govern every attribute, such as referential/data integrity, combinations of attributes, and cardinality. The most common situation that calls for profiling is data from different tables in different formats, resulting from mergers and acquisitions or from different information systems.
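As a rough illustration of such checks, the sketch below profiles one attribute (row count, nulls, number of distinct values) and looks for foreign keys that violate referential integrity. The tables, column names, and helper functions are hypothetical and kept in plain Python lists for simplicity.

```python
def profile_column(rows, column):
    """Count rows, nulls, and distinct non-null values for one attribute."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }


def orphaned_keys(child_rows, fk, parent_rows, pk):
    """Return foreign-key values in child_rows with no matching parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return sorted({
        row[fk]
        for row in child_rows
        if row[fk] is not None and row[fk] not in parent_keys
    })


# Hypothetical tables for illustration only.
members = [
    {"member_id": 1, "state": "AZ"},
    {"member_id": 2, "state": None},
    {"member_id": 3, "state": "AZ"},
]
invoices = [
    {"invoice_id": 10, "member_id": 1},
    {"invoice_id": 11, "member_id": 4},   # no such member
    {"invoice_id": 12, "member_id": None},
]

print(profile_column(members, "state"))
print(orphaned_keys(invoices, "member_id", members, "member_id"))
```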
We will create a system catalog in the data warehouse. Each database has a list of tables, each table has a list of attributes, and each attribute has a list of constraints. It is good practice to define a default value for each constrained attribute. Otherwise, when we want to merge multiple tables that have null values in those attributes, we need to turn off the constraints to make the merge work. A common problem is that people then forget to turn the constraints back on; since the constraints are there to enforce data integrity, this again leaves the data in a mess.
Monday, March 19, 2012
Lecture 15 & 16 BI infrastructure
OLAP is online analytical processing, which is used in most BI infrastructures to look at trends such as purchases, aggregating and analyzing large amounts of data.
OLTP is online transaction processing, which is regularly used in operational databases, handling single queries over small amounts of data.
There are two primary BI infrastructures. The first is the data warehouse, first proposed by Bill Inmon. The concept is to start from one big enterprise system and eventually divide it into small divisional data marts. All supply chain systems are pulled into the data warehouse. The other is data marts, proposed by Ralph Kimball. The idea is to start from small data marts that come together into a big enterprise system. The two approaches differ but produce the same result. Today, Kimball's approach is more common.
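The difference in query shape can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it is convenient for a self-contained example; the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, month TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (month, product, amount) VALUES (?, ?, ?)",
    [
        ("2012-01", "shirt", 20.0),
        ("2012-01", "bottle", 10.0),
        ("2012-02", "shirt", 25.0),
    ],
)

# OLTP-style access: look up one transaction by its key.
one_sale = conn.execute(
    "SELECT product, amount FROM sales WHERE id = ?", (2,)
).fetchone()

# OLAP-style access: aggregate many rows to see a trend over time.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
).fetchall()

print(one_sale)
print(monthly_totals)
```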
Sunday, March 4, 2012
Lecture 13 & 14 Network Analysis
Before analyzing a network, we need to know its basic structural properties. Centrality measures are used to examine how closely related the nodes in a network are. Different centrality measures capture different aspects of how the network is connected.
- Degree centrality: counts the number of links, i.e., how many people a particular person can reach directly. In- and out-links are counted the same in degree centrality. If every node in a network has both in- and out-links, we call it a fully connected network. A clique means that every node connects to every other node in the network. If a network is not a clique, there must be some bridges between nodes that are not directly connected.
- Betweenness centrality: how likely a node is to lie on the direct route between two other nodes
- Closeness centrality: the distance from a node to all other nodes in the network
- Eigenvector centrality: how well connected a person's network is overall. It assigns each node a score based on the influence of the nodes it connects to.
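Two of the measures above, degree and closeness centrality, can be sketched in plain Python. The small undirected graph is made up, and closeness is computed with one common normalization, (n - 1) divided by the sum of shortest-path distances, which assumes the graph is connected; other definitions exist.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct links for each node (undirected graph)."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def bfs_distances(graph, start):
    """Hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph):
    """(reachable nodes - 1) / total distance; higher means more central."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical friendship network stored as adjacency lists.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}

print(degree_centrality(graph))
print(closeness_centrality(graph))
```

Here node A has both the highest degree and the highest closeness, while E, which hangs off the network through D, scores lowest on closeness.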
Wednesday, February 22, 2012
Lecture 12: Graph Theory
As in the Seven Bridges of Königsberg problem, graph theory studies how the nodes in a network connect to each other in the most efficient way. The two primary elements of a graph are vertices and edges. Today we are interested in information networks, which move information from one node to another. There is some terminology we need to know before going into the details of networks. First, a collaboration graph visualizes how people cooperate with each other, linking every pair of people who have worked together. A path is a sequence of edges that links nodes. On the Internet, we choose the shortest (minimum) path, the one that goes through the fewest nodes.
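For an unweighted graph, the path through the fewest nodes can be found with breadth-first search. The sketch below is one standard way to do it; the graph and the function name are hypothetical.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    previous = {source: None}  # node -> node we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back from the target to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is not reachable from source


# Hypothetical network stored as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(shortest_path(graph, "A", "E"))
```

Breadth-first search explores nodes in order of hop count, so the first time it reaches the target it has found a path with the minimum number of edges.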
Lecture 11: Long Tail Keywords
In 2006, Chris Anderson published a book called "The Long Tail". He argued that society is moving away from a market dominated by a few high-demand products toward a large number of small "hits", which together make up a huge share of the market in the niche tail.
Source: http://www.longtail.com/the_long_tail/about.html
Wednesday, February 15, 2012
Lecture 9&10: Pre-Campaign Report & Optimized Publishing in Social Spaces
Generally, the pre-campaign report is used by the judges to understand the campaign structure and strategy, the company background, etc. Given the company's current situation, we can measure a baseline and compare it with the expected benefits to show the value of the campaign. We should avoid abstract phrases, provide precise quantitative figures to measure the campaign, and make sure we know exactly what we mean by our objectives: for example, how many visitors the campaign will attract, or what the conversion rate will be. With concrete target figures, we can examine performance in real time and adjust our actions to the situation. When analyzing the client's industry and current situation, we can not only research industry data but also incorporate the website's traffic information to predict future trends.
Thursday, February 9, 2012
Lecture 8: Design an Ad Campaign
Advertising is part of marketing. The fundamental idea of marketing is to convey your message to customers and expect conversions. The same idea applies to both traditional and online ads. Initial analysis is a non-trivial part of a marketing campaign. We need to thoroughly understand the client's business, industry, and products through market research and constant communication with the client. Most important is knowing the business objective the client wants to achieve and the potential customers/audiences. Industry information is critical as well: understanding industry trends and competitors helps us design the content of the ads.
Lecture 7: Advertising
Traditional advertising, such as TV, radio, billboards, and print ads, usually uses the impression as the unit for measuring price and value. Impressions count how often an ad is shown. Traditional commercial media charge sponsors by cost per thousand impressions, or cost per mille (CPM). The Super Bowl is an extreme case in which the cost per impression is very high. All these forms of traditional advertising focus on how often an ad appears rather than on how effective it is. Sponsors can spend a great deal of money without obtaining the expected return, since the budget spent on advertising is not positively related to income; this puts a firm at great risk when it spends millions of dollars advertising through traditional media.
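The CPM arithmetic is simple enough to sketch directly. The impression count and rate below are made-up figures, not actual media prices, and the function names are hypothetical.

```python
def cpm_cost(impressions, cpm):
    """Total cost of a buy priced per thousand impressions."""
    return impressions / 1000 * cpm


def effective_cpm(total_cost, impressions):
    """Cost per thousand impressions actually paid."""
    return total_cost * 1000 / impressions


# Hypothetical buy: 2 million impressions at a CPM of 15.
cost = cpm_cost(2_000_000, 15.0)
print(cost)
print(effective_cpm(cost, 2_000_000))
```

Note that nothing in this calculation depends on how many viewers responded to the ad, which is why impression-based pricing says little about effectiveness.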
Wednesday, February 1, 2012
Lecture 6: Network Analysis
Social media networks are growing rapidly. Businesses, especially marketing teams, want to use Big Data to deliver customized messages to potential customers. People want to connect with each other to strengthen their individual social networks and build relationships. To achieve these objectives, demand for network analysis keeps increasing. To do network analysis, we should first decide what kind of network we are going to construct and what we define as a node in that network. Then, how do we connect those nodes? Social media connect people as nodes. This post focuses on two currently popular social media in the United States: LinkedIn and Twitter.
LinkedIn is a professional social network that lets users upload a resume and connect with others in related fields.
Lecture 5: Website Analysis Process
Ten years ago, websites were only for web presence. Now we want people not only to visit our websites but also to convert, that is, to visit our website and do what we want them to do. To improve a website, we should analyze it and see what we can do. Before starting to analyze visit traffic, there are some critical steps to take.
First of all, thoroughly understand the website in both content and structure. Knowing the content helps us become familiar with the organization's business and core competences, and with the message it wants to convey. From the structure of the website, we can see how the organization guides visitors through the site and come up with improvements that make the website friendlier and more goal-oriented.
Sunday, January 29, 2012
Lecture 4: Analysis Presentation of Eller MIS department website
The presentations focused on recommendations for improving the alumni and partnership sections of the Eller MIS website. Most teams merely presented general findings from Google Analytics and made recommendations without considering the target audiences and the goal of the website. As Dr. Ram and Ms. Anji Seigel mentioned, we should identify the goal and the target audiences before we start our analysis and make recommendations. Otherwise, we don't actually know what we are working on, since recommendations should be tied to objectives.
Lecture 3: Google Analytics
Before logging in to a Google Analytics account, we should know the goal we want to achieve and the KPI for that goal. The first time you log in to Google Analytics, set a long date range to get an overview of how traffic fluctuates over time. We should be curious about which reasons or events caused the fluctuations, since the factors behind a peak in a previous period can be an important element in the future. Several key Google Analytics terms were introduced during the lecture to help web/data analysts interpret visitors' behavior.
Friday, January 20, 2012
Lecture 2: BI and Web Metrics
-What is BI (terminology)?
Business Intelligence (BI)
Tools, technologies and techniques used for collection, measurement, understanding, analysis, and prediction using data (internal and external) in some way for performance management.
Performance Management
It’s more than measurement. First, identify the target and define key performance indicators (KPIs) to measure performance; some can be quantified, while others may only be qualitative. Then, based on the analyzed data and metrics, we can take appropriate strategic action to reach the target and manage performance. (Monitoring, Managing, Metrics)
