Monday, February 15, 2010

Making (non)sense of social data through visualization

Moreno's idea of the sociogram, if indeed it was novel at the time, is an interesting way to understand relations and associations. However, it has no notion of weight or degree of intensity for such associations. In the image of "Who Recognized Whom Among a Collection of Babies", or in a network of friends, the strength of recognition or friendship is vital to understanding the association properly. A crude way to improve it would be to use thicker, colored arrows to depict such detail.
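
To make the thicker-arrows idea concrete, here is a minimal sketch in Python. The names (Ann, Bob, Cid), the tie strengths and the width range are all made up for illustration; nothing here comes from Moreno's data. It only shows one possible way to turn a tie strength into an arrow width.

```python
# Hypothetical weighted sociogram: (chooser, chosen) -> strength of the tie.
# Names and numbers are invented for illustration only.
ties = {
    ("Ann", "Bob"): 3,
    ("Bob", "Ann"): 1,
    ("Ann", "Cid"): 2,
}

def arrow_widths(ties, min_width=1.0, max_width=5.0):
    """Scale each tie strength linearly to an arrow width for drawing."""
    lo, hi = min(ties.values()), max(ties.values())
    span = hi - lo
    widths = {}
    for edge, strength in ties.items():
        # If every tie is equally strong, draw them all at the minimum width.
        frac = (strength - lo) / span if span else 0.0
        widths[edge] = min_width + frac * (max_width - min_width)
    return widths

widths = arrow_widths(ties)
```

Note that the arrow from Ann to Bob and the one from Bob to Ann get different widths, which is exactly the asymmetry a plain sociogram cannot show.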

On a side note, this fixation with dichotomy in "categorizing" the world is, I think, way too simplistic to warrant any thought, but this method has dominated our thought process for a very long time; perhaps a result of having Descartes in our mind; or shall I say brain!! But for such a simplistic view of the world I would blame the lack of resources (computational ones) and not the acuity of mind in analysis.
And my point is corroborated by the sophisticated visualization of statistics we see these days in all areas, including social networks.

In the article by Freeman, I particularly liked the work of Michael Chan: the Spring Embedder.

More computational power and better graphics = increasingly sophisticated and refined analysis.

For example, if you don't already know, here at UF during exam season, deciding which exam will be held at what time and in which room is all done by a network flow algorithm, possibly with visualization as well, but I am not sure about the visualization part!

Doctors (real medicine ones, not PhDs) in academia are familiar with another graph/network matching problem, used to decide the distribution of MD students among various universities. This is amusingly called the "stable marriage problem"!
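
For the curious, the textbook solution to the stable marriage problem is the Gale-Shapley algorithm. Below is a minimal sketch in Python. I am assuming equal numbers on both sides and complete preference lists, and the student and program names are hypothetical. I am not claiming this is what any real matching system runs.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley sketch: proposers propose in order of preference and
    each receiver holds on to the best offer seen so far.
    Assumes equal-sized sides and complete preference lists."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to try
    engaged = {}                                  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                # receiver was unmatched
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)                # rejected; will try the next choice
    return {p: r for r, p in engaged.items()}

# Hypothetical example: two students, two programs.
students = {"s1": ["hA", "hB"], "s2": ["hA", "hB"]}
programs = {"hA": ["s2", "s1"], "hB": ["s1", "s2"]}
match = stable_matching(students, programs)
```

Here both students want hA first, but hA prefers s2, so s1 ends up at hB, and nobody can find a partner who would rather swap.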

Coming back to Moreno: in problems faced by geographers, and in some sociological studies, I can see why Moreno's idea is good. I can appreciate visualizing food chains, trade routes and such as a real value addition to our understanding of things.


And for "non-binary" problems (isn't everything non-binary?), I would say that factor analysis seems like a good way to find patterns and trends, discerning what the "majority" of a sample gravitates towards by emphasizing the first few vectors in the result. And as is highlighted in the article, it does give a good sense of the nature of the overall trend, its spread, diversity and such.

And then enters "digital visualization"! The limitations of DRAW and ENERG, or of Pajek, in understanding 3D data are now overcome by using sophisticated graphics and animation; say for time variance analysis or scenario simulations.
Running complex analysis (for example principal component analysis, which takes the eigenvector method mentioned in Freeman's article one step further) and weeding out unwanted data are now possible in real time, even on cheap mobile phones! In case you are wondering why mobile phones: a form of independent component analysis (FastICA) is used to filter out noise from the actual spoken words before they are used in voice recognition or any other software in mobile phones.
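
As a rough illustration of the eigenvector idea behind principal component analysis, here is a small pure-Python sketch that finds the first principal component by power iteration on the covariance matrix. The data points are made up, the function name is my own, and real tools would use a proper linear algebra library. It also assumes the data is not constant and that the starting vector is not orthogonal to the dominant direction.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the dominant eigenvector of the sample covariance
    matrix of `rows` using plain power iteration (illustrative only)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize to unit length
    return v

# Made-up points lying exactly on the line y = 2x
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
pc1 = first_principal_component(points)
```

For these points the returned direction is proportional to (1, 2), that is, the line the data lies on; this is the "first vector" that carries most of the trend.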
 
All these lead to visually appealing graphs, charts and animations, enabling us to actually "see" the analysis. And it is all great! But some of us have already started to protest this information overload! Not me. I believe information overload in visualization arises because we don't have a very sophisticated way to measure the knowledge or actual content in the information, and visualization adds to the already cluttered system. I am referring to Shannon's information theory. Although it is widely used in data compression and has a lot of significance in electrical and electronic engineering, Shannon's mathematical theory of communication does have its significance in measuring information content, or entropy. I am not aware of any work done to use it in social network visualization; it might be worth looking into for a sincere treatment of the matter.
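
For reference, Shannon's entropy of a source with symbol probabilities p_i is H = -sum(p_i * log2(p_i)), measured in bits. A minimal Python sketch follows; treating the labels shown in a chart as the "symbols" is my own assumption for illustration, not an established method for measuring visual clutter.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Entropy in bits of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical label sets, e.g. categories drawn in a chart
uniform = shannon_entropy(["a", "b", "c", "d"])   # four equally likely labels
binary = shannon_entropy(["a", "b", "a", "b"])    # two equally likely labels
constant = shannon_entropy(["a", "a", "a", "a"])  # no uncertainty at all
```

Four equally likely labels carry 2 bits, two carry 1 bit, and a chart that shows the same thing everywhere carries none, however pretty it looks.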

Reading the article by Linton Freeman made me worry about a big issue with visualization and data analysis in general. As we progress towards better visualizations of relations, networks and trends, I feel we must not get too carried away, and should always remember just how easy it is to produce some fancy graphical representation of the social dynamics in play. Say, for example, who got dumped on Valentine's Day; for the record, it wasn't me, because I don't have a girlfriend who could possibly dump me! How convenient! :) Anyway, coming back to visualizing who got dumped on Valentine's Day, Arturo showed me one such fancy work done in Processing. I cannot find the link upon googling, so please bear with me. It basically shows some trends and patterns on the topic, possibly by scanning for keywords in Twitter feeds.

My point is that making any visualization system too appealing has a downside: we become susceptible to taking it as true analysis without questioning it.
We already had the issue of bad statistics and cooked-up data, and now we add another layer of appealing graphics on top of things. It is like hiding scars by wearing makeup!!

It would be interesting research to measure our gullibility in believing graphical data versus plain numbers. I believe both methods (numbers and graphics) can be wrong in their own ways, but it is easier to take fancy graphics as truth than numbers and charts.

And just as statisticians can make mistakes in collecting and analysing data, programmers can add erroneous code (bugs!) to a graphical visualization and twist the facts (or lies) further!

I have always been cynical about trends and statistics in the social realm.
I believe visualizing them using newer and more sophisticated means is a good way to get some perspective, but we should be very cautious in judging the veracity of such visualizations, and should always keep a questioning mind towards such fancy visual tricks.
A recent example can be brought forth on the climate issue: you will find cooked-up data being broadcast on television with well-done graphics and trends in colors to "make you believe". Here is one such example I could find right now.



Bottom line, if you found my ranting too convoluted and unnecessary :)
In the visualization of social aspects, digital tools give us decent means to enrich our perspective of things, but at the cost of adding another seductive layer on top of potentially false data, which may mislead us further.
