Showing posts with label Data Visualization. Show all posts
Showing posts with label Data Visualization. Show all posts

Storytelling with data

 
Each of these rules will help guide you to the delivery of a compelling and effective visual story



U.S. Presidential Elections 1900-2024

A picture really can depict 1000s of words



From the Industrial Revolution to WWI to the Great Depression to the New Deal to WWII to the Red Scare to the Civil Rights Movement to Neoconservatism to Neoliberalism to 9/11 to the Great Recession to MAGA, these small multiples charts beautifully encapsulate U.S. presidential elections and the attendant American political eras that fueled them in these past 124 years.



Major U.S. Economic Events

1929-1932 Great Depression

The gridlines are distracting and unnecessary but this chart, and the informational callouts, explain the stock market crash that caused the Great Depression




Post-WWII Boom

The United States was the biggest gainer among the countries whose GDP per capita spiked after WWII




Oil Prices and Inflation

The U.S. experienced huge inflation spikes in the 1970s due to oil price "shocks" and again in 2022 due to Russian oil sanctions and COVID stimulus




1980s Savings and Loan Crisis

Well over 1000 banks failed as a result of the money market and junk-bond deregulation that led to the S&L crisis; this hasn't happened before or since




Millenium Dot-Com Bubble

Between 1998 and 2002 the NASDAQ rose from ~2000 to over 5000 only to come crashing back to just over ~1000 by the end of the bubble burst




2007-2009 Great Recession

Due to internationally linked financialization, the Great Recession impacted not just the U.S. economy, but economies across the world




2020 COVID and U.S. Unemployment

This graph illustrates the impact that COVID had on U.S. unemployment, which subsided once vaccines were developed, re-opening economies worldwide 




Music Sales

Fascinating composition bar chart showing how music sales have shifted from vinyl to cassettes to CDs to mp3s to streaming



Whenever you present a story about data you will invariably be displaying some values (along a time series axis, or in static/isolation) within a categorization-based visual. The four primary chart types are Comparison, Relationship, Distribution and Composition. We will briefly pose the questions that illustrate what each of these chart types aims to answer.
  • Comparison: How much of each subgroup exists in relation to the other subgroups?
  • Relationship: How did a value change over time? (or change in relation to some other non-temporal metric like "how are various foreign currency exchange rates impacted by the movement of the U.S. federal funds (interest) rate?")
  • Distribution: What is the concentration of values within different percentiles if you chart the data along a linear scale?
  • Composition: What are the sizes of each of the constituent parts that comprise the whole of the thing you are trying to depict or explain?
  
Other key data visualization concepts to know and always be considering...:

Avoid Chartjunk: The goal should be to encode as much information as possible using near-exclusively "data ink", and as low a level of "non-data ink" as possible. Charts and graphs that contain excessive non-data ink (also called "chartjunk"), which is any chart content that does not communicate information relevant to your data visualization, are only going to confuse the consumer of your data visualization and hinder the expression of the message your visualization is meant to convey.

Chartjunk should be ruthlessly excised wherever it is found. Only include the data ink that serves the communication of your data visualization. All extra clutter will detract from and degrade your dataviz and your message- which is the entire purpose of graphs and charts.

Know Your Audience: It is important that you know your audience. A chart presented in a scientific or academic journal is often expected to contain values derived from complex ratios and formulas and labeling using esoteric language; a chart presented for general consumers in a newspaper or magazine is not. You should have an idea of what the baseline expectations are for the data you are presenting and ensure that you communicate your visualization in a way that is easy for your target audience to understand. 

Data Integrity: Many charts have been used as propaganda and to otherwise mislead people. This is done by using outright fake data or trying to elicit specious insights from thin data sets that do not provide a complete picture of what a particular data point or set of data points means within the context of other data it is a part of.  



References:











Power BI and SSRS - A complimentary symbiosis

 

"Generally, Power BI paginated reports (SSRS reports) are optimized for printing, or PDF generation. Power BI reports are optimized for exploration and interactivity."

 

So, what really IS up with MSBI these days? Is SSRS getting shuttered? PBI has paginated RS-like reports, but not a lot of the other features SSRS provides. Microsoft marketing will continue to hate people like me who would go to great lengths to help keep alive an aging reporting technology. But the thing is- SSRS simply does the job for the vast majority of reporting use cases. In the last 18-20 years, there have been no major advancements in scheduling and snapshotting and otherwise caching and managing and distributing electronic information. And SSRS has all of that built in.


Similar BI products- but they'll both be around for a while, with PBI eventually subsuming all the most useful SSRS tech


Nearly all of the advancements in reporting technology have come on the presentation and client-side. We can now create beautiful ad-hoc analysis and brilliantly composed interactive charts and other data presentations. But this all comes with a not insignificant price ($10 per user/month). And beyond the price, much like Azure SQL (vs. a genuine "Microsoft SQL Server" VM) and the extremely limited Azure SQL Workbench (vs. SSMS), there is a lot that Power BI cannot do well or at all.

You may have noticed the built-in SSRS reports in SSMS19's new QueryStore feature. These are very useful reports that can give DBAs an idea of how queries and being processed and what processes are consuming the most CPU. And it is a good example of a company "eating its own dog food".

I've seen SSRS installations that contained thousands of reports representing trillions of dollars of value, categorized and summarized with realtime security ownership, counterparty, price and other core trade data. Several of these business-ctitical reports had scheduled delivery and were cached and snapshotted programmatically.

Little old SSRS is a quiet but reliable business data spartan. To my surprise it is actually quite popular in the investment banking industries where stock valuations and company summaries on reports are a big part of the lifeblood that drives investment banking decision making.


A professionally developed PBI report looks and behaves more like an interactive BI dashboard- this is a good example of a good PBI report


And with a little bit of customization magic via things like ExtRSAuth, ExtRSNET48, ExtRS and other RS extension tools, SSRS and Power BI can be tailored-made to serve as a uniquely effective symbioses of print formatted, scheduled and data-driven management reports (SSRS) and ad-hoc or OLAP-based interactive data analysis charts with data visualizations (Power BI).

To answer the question of "when will SSRS be end of life?", I would say that SSRS isn't going away anytime soon. Microsoft has decided to combine SSRS and PBI (RS, .rdl reports are the "Paginated Reports" in PBI) in a way that serves both platforms. The PBI3.0 REST API indicates as much as the combined SSRS/PBI API offers a plethora of functionality that .NET developers can use to get the best of both worlds (SSRS and Power BI) and customize RS and/or PBI dashboards to support unique business processes. 


A print-formatted SSRS report- great for standardized, templated data reporting


The choice in what tool or tools you will enrich your printed reports and data visualizations with, is yours. Keep in mind that many organizations make use of both- with SSRS getting equal to more attention than PBI even to this day- not only because of the huge, global SSRS install base with all of these currently running SSRS reports- including many which support critical business and governmental processes across the globe. But also because Power BI requires a monthly subscription fee :( . Freeware seems to be slowly dying. Let's hope things change with the next version of SQL Server and maybe we'll get free* PBI. 

SQL Server 2025? Happy reporting and data analyzing. Always remember that PBI and SSRS serve different organizational needs- ad-hoc analyzation of data and pixel-perfect, professional, print-ready reports, respectively.


*(at least a free "tier"? I mean c'mon MSFT..... developers want to CREATE, and MSBI data visualization creativity is dying behind that paywall)- SSRS and PBI should be free and work hand in glove. Anything less is a mistake and a gigantic missed opportunity, imho.


Reference: https://learn.microsoft.com/en-us/power-bi/guidance/migrate-ssrs-reports-to-power-bi

ChartJS for Data Vizualiizations

I came across ChartJS about 2 years ago while debugging code from another, similar data visualization technology inside an AngularJS app.


ChartJS is a flexible JavaScript data visualization framework which allows for some pretty powerful integrations and customizations


The concept is you have a "<canvas>" DOM element, which you transform into a ChartJS chart via some JavaScript initialization. After that your tasks are simply finding the data you want to render and deciding the options of exactly how you want the chart visual to appear.

You can do some really neat and dynamic stuff with ChartJS.

I have used a lot of charting frameworks, and it does not get more flexible or simple than this:
 <html>  
 <head>  
 <script src="https://cdn.jsdelivr.net/npm/chart.js@2.8.0"></script>  
 </head>  
 <body>  
 <div>  
 <canvas id="myChart" style='background-color:darkgray; width:100%; height:100%;'></canvas>  
 </div>  
 <script>  
 var ctx = document.getElementById('myChart').getContext('2d');;  
 var chart = new Chart(ctx, {  
   type: 'line',  
   data: {  
     labels: ['16_Qtr1', '16_Qtr2', '16_Qtr3', '16_Qtr4', '17_Qtr1', '17_Qtr2', '17_Qtr3', '17_Qtr4', '18_Qtr1', '18_Qtr2', '18_Qtr3', '18_Qtr4', '19_Qtr1', '19_Qtr2', '19_Qtr3', '19_Qtr4', '20_Qtr1', '20_Qtr2', '20_tr3', '20_Qtr4', '21_Qtr1', '21_Qtr2', '21_Qtr3', '21_Qtr4','22_Qtr1', '22_Qtr2', '22_Qtr3', '22_Qtr4', '23_Qtr1', '23_Qtr2', '23_tr3', '23_Qtr4'],  
     datasets: [{  
       label: 'Some random quartley demo data..',  
       backgroundColor: 'black',  
       borderColor: 'lime',  
       data: [40.2, 72.88, 47.1, 22, 54.43, 52.18, 17.1, 52, 67.2, 54.88, 64.1, 78, 67.2, 55.88, 58.1, 57, 50.2, 52.88, 57.1, 62, 74.43, 62.18, 67.1, 72, 77.2, 74.88, 74.1, 78, 77.2, 75.88, 78.1, 77, 70.2, 72.88, 77.1, 62, 64.43, 62.18, 67.1, 72, 67.2, 54.88, 44.1, 28, 27.2, 25.88, 38.1, 37, 40.2, 42.88, 44.1, 52, 54.43, 52.18, 67.1, 82, 87.2, 84.88, 84.1, 88, 87.2, 95.88, 108.1, 127]  
     }]  
   },  
    "options": {  
       "legend": {"position": "bottom"}      
     }  
 });  
 </script>  
 </body>  
 </html>  


Reference: https://www.chartjs.org/

Graphical Integrity

"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the quantities represented." -Edward Tufte

It is amazing how easy it is to find highly inaccurate and misleading data graphics and charts even in this year 2019. These inaccuracies and sometimes outright perversions of the truth are of particular concern to an insta-culture who gets its news in headlines, memes, charts and other bite-sized generalizations via social media and rarely looks for the evidence beyond the headlines and the source data behind the charts.

The “Lie Factor”, first defined by American statistician Edward Tufte is defined as "a value to describe the relation between the size of effect shown in a graphic and the size of effect shown in the data." A larger Lie Factor value indicates a higher level of deception or "inaccurate scaling/weighting".


Lie Factor in Action:

The numbers do not equate to the scale of the bars and money bags... not quite as "strong" as projected.




This example mixes 2 different scales and data sets and only serves to confuse the reader...




This is a propaganda data graphic displaying a series of 5 increases using a totally nonsensical scale



This graphic shows Last Year, Last Week, and Current Week as having the same temporal scale.... O'Lie Factor.




Lie Factor Breakdown:

Lie Factor is the change shown in the graphic (say 100%) divided by the change reported in the data (say "50%") - (100/50 = a LF of 2)


There are reasons for misleading graphics that go beyond propaganda and sensationalist news articles:

  • Lack of quantitative skills on the part of the graphic creator and publication editor
  • Doctrine that statistics are boring and therefor need to be "jazzed up"
  • Doctrine that graphics are only for unsophisticated and so don't need "accuracy constraints"
  • Failure to treat graphics with the same fidelity to the truth as the written word it accompanies

Other ways that graphical information displays are corrupted include cherry-picking data, making small changes appear large by showing a small scale interval and when all else fails for information manipulators- using fake data.

It is important to not jump to conclusions when assessing graphical information displays even if it is coming from a reputable publisher. As you can see it is not always obvious that the information being communicated graphically is accurate. Wherever possible, get a look at the source data.

"When we see a chart or diagram, we generally interpret its appearance as a sincere desire on the part of the author to inform. In the face of this sincerity, the misuse of graphical material is a perversion of communication, equivalent to putting up a detour sign that leads to an abyss" - Wainer


References:

https://viz.wtf/

https://infovis-wiki.net/wiki/Lie_Factor