Saturday, April 19, 2014

Prevalence of Obesity in USA - 2012

Objective: The main objective of this map is to study the prevalence of obesity in the USA.

Critique: Before any of the experts in data science or data visualization criticize my project in their blogs, let me say that I am using this data to learn Tile Mill, QGIS and ScapeToad. I am quite aware that:

1) Alaska looks bigger than it should be. I am not sure why, because the shapefile is the same.
2) A better way to show the prevalence of obesity is a choropleth map over time, but www.cdc.gov already does this in its viewer.
3) Another option would be to show the same map for two different years side by side, but obesity data for 1990 is not available for all states, so it would not be a fair comparison.

Usefulness of the distorted map: A distorted map like this one is a good way to show population, real estate prices or internet users in each state. The possibilities are endless.

Technology: I used QGIS (open source) to create the shapefiles, ScapeToad to create the distorted map and Tile Mill to make the map look prettier.
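ScapeToad does the hard geometric work (to my knowledge it implements the diffusion-based cartogram method of Gastner and Newman), but the goal it works toward is simple to state: each state's area should end up proportional to its value, while the total map area stays the same. The minimal sketch below only computes those target areas and scale factors; it does not distort any shapes. It assumes the mapped value is treated as a quantity to be shared out over the map, and the region names and numbers in the usage line are made up.

```python
def target_areas(areas, values):
    """Area each region should have in a cartogram.

    areas  -- region -> current map area (any consistent unit)
    values -- region -> mapped variable (e.g. obesity prevalence in %)
    The total map area is preserved; each region receives a share of it
    proportional to its value.
    """
    total_area = sum(areas.values())
    total_value = sum(values[region] for region in areas)
    return {region: total_area * values[region] / total_value for region in areas}


def scale_factors(areas, values):
    """How much each region must grow (>1) or shrink (<1) to hit its target."""
    targets = target_areas(areas, values)
    return {region: targets[region] / areas[region] for region in areas}


# Hypothetical example: a small region with a high value must grow.
print(scale_factors({"A": 100, "B": 300}, {"A": 30, "B": 10}))
```

This also hints at why a large, sparsely populated state such as Alaska can look oddly sized: its shape is pushed toward whatever area its value implies, not its geographic area.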

References:
1) www.cdc.gov - for data
2) https://www.census.gov/ - for the USA states shapefile

Thursday, April 17, 2014

Walmart spread in USA - intro 2

Idea: To use Tile Mill to show the locations of Walmart stores in the USA.

Limitations: The data for this project was downloaded from FlowingData. However, I cannot share it, as the data is not freely available. You can buy it from AggData; the cheaper and better alternative is to subscribe to the FlowingData tutorials and download the data from FlowingData itself. If possible, I will try to find some freely available geocoded data to share, but for now you will have to source the data yourself.

Step 1: Download

Download Tile Mill, an open source project. Its documentation section is also worth browsing; it is very well written.

Step 2: Data 

The data file is a .csv file with 7 columns. To plot the data in Tile Mill you need to geocode it first: the file should contain latitude and longitude columns, otherwise you will get an error.
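Before loading the file, a quick check can save a round of errors. The minimal sketch below uses Python's csv module to confirm that the file has latitude and longitude columns and that each row holds valid coordinates. The accepted column names (lat/latitude, lon/lng/long/longitude) are my assumption of what Tile Mill recognizes, so check the documentation for your version; the file name in the usage line is hypothetical.

```python
import csv

# Assumption: these header names are treated as coordinates. Adjust them if
# your version of Tile Mill expects something different.
LAT_NAMES = ("lat", "latitude")
LON_NAMES = ("lon", "lng", "long", "longitude")


def find_column(header, candidates):
    """Return the first header entry matching a candidate name (case-insensitive)."""
    for name in header:
        if name.strip().lower() in candidates:
            return name
    return None


def check_geocoded_csv(path):
    """Count rows with usable coordinates and list the file lines that fail.

    Raises ValueError if no latitude or longitude column is found.
    Line numbers assume no quoted field spans several lines.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        lat_col = find_column(header, LAT_NAMES)
        lon_col = find_column(header, LON_NAMES)
        if lat_col is None or lon_col is None:
            raise ValueError(f"no latitude/longitude columns in {header}")
        good, bad_lines = 0, []
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            try:
                lat, lon = float(row[lat_col]), float(row[lon_col])
            except (TypeError, ValueError):  # empty, missing or non-numeric
                bad_lines.append(line_no)
                continue
            if -90 <= lat <= 90 and -180 <= lon <= 180:
                good += 1
            else:
                bad_lines.append(line_no)  # outside valid coordinate ranges
    return good, bad_lines


if __name__ == "__main__":
    print(check_geocoded_csv("walmart.csv"))  # hypothetical file name
```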

Step 3: Power up your Tile Mill

An important note: by default, Tile Mill creates a folder for each project under your user folder at /documents/mapbox/projects/<name of the project>. Every time you create a new project, it creates a new folder there. Hence, if you want to add images or SVG files, you need to save them in the project folder; otherwise you will not be able to use them in Tile Mill.
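If you add images often, a tiny helper can copy them into the project folder for you. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and you pass in the project folder yourself because the exact location may differ between Tile Mill versions and operating systems.

```python
import shutil
from pathlib import Path


def copy_asset_to_project(asset, project_dir):
    """Copy an image or SVG into a Tile Mill project folder.

    Returns the bare file name, which is what you would then reference
    from the stylesheet. Raises FileNotFoundError if the project folder
    (or the asset) does not exist.
    """
    asset = Path(asset)
    project_dir = Path(project_dir)
    if not project_dir.is_dir():
        raise FileNotFoundError(f"project folder not found: {project_dir}")
    shutil.copy2(asset, project_dir / asset.name)  # keeps file timestamps
    return asset.name
```

For example, `copy_asset_to_project("logo.svg", "<path to your project folder>")` returns `"logo.svg"`, which I would expect to be usable from the stylesheet as something like `marker-file: url("logo.svg");`.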

Open Tile Mill, click New Project, enter a file name and click Add. Tile Mill creates the new project on the screen; double-click it to open the project. You will see that Tile Mill uses its default country data and launches a world map. You can change this map, but for now let's keep it as it is.

The main screen is divided into three parts. The leftmost part is a menu, which is used to add data (layers), labels, legends, etc. The center shows the world map, along with your data once you load it. The rightmost part contains the CSS editor, which is used to edit the style sheets.

Step 4: Upload Data

To upload the data, save it in CSV format on a drive or the desktop. Click the last icon in the leftmost part of Tile Mill; Tile Mill will open a new window. Enter an ID, click Browse to select your file, then click Save and Style. The Save option only saves the data, while Save and Style plots it on the map as well.

You should now see the data points plotted on the map. As you zoom in, you can observe the data at a more granular level.

Step 5: Customizing 

For the map to look like the image at the top of this post, you will have to change the colors. You can easily do this by picking colors from the palette at the bottom of the rightmost part of Tile Mill. As you select and save different colors for the background, borders and map, the changes appear on the map immediately, and the CSS code changes too. If you know how to code, you can make changes directly in the code window.
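To give an idea of what the code window contains, the minimal sketch below builds a small CartoCSS stylesheet (the CSS dialect Tile Mill uses) from a few colors. The layer names `#countries` and `#walmart` are assumptions: `#walmart` stands for whatever ID you entered when uploading the data, and the colors in the usage line are arbitrary. The properties used (`background-color`, `polygon-fill`, `line-color`, `marker-fill`, `marker-width`) are standard CartoCSS.

```python
import re

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def point_style(layer_id, background, land, marker, marker_width=4):
    """Return a small CartoCSS stylesheet: base map colors plus one point layer."""
    for color in (background, land, marker):
        if not HEX_COLOR.fullmatch(color):
            raise ValueError(f"expected a color like #1a2b3c, got {color!r}")
    return "\n".join([
        "Map {",
        f"  background-color: {background};",  # water / background
        "}",
        "",
        "#countries {",  # assumed name of the default country layer
        f"  polygon-fill: {land};",
        "  line-color: #ffffff;",  # borders
        "  line-width: 0.5;",
        "}",
        "",
        f"#{layer_id} {{",  # the layer created from the uploaded CSV
        f"  marker-fill: {marker};",
        f"  marker-width: {marker_width};",
        "  marker-line-color: #ffffff;",
        "  marker-allow-overlap: true;",
        "}",
    ])


print(point_style("walmart", "#b8dee6", "#f2efe9", "#0071ce"))
```

Pasting the printed text into the code window should have the same effect as picking the colors from the palette.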

Step 6: Export 

To export the map, click the Export drop-down and select the format you would like to export to. I usually use PNG or PDF. Tile Mill will then let you adjust the map and select the area you would like to export. Sometimes you will have trouble selecting the area; I don't have a solution for that.

If you go back to the same Export drop-down, you will see a View Exports option, which lets you save the file once the export is complete. Tile Mill will also tell you where it has stored the image. Lastly, you can open an account on Mapbox and upload your maps by exporting them in the MBTiles format. I have never used the XML export, but you can explore that option as well.
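An MBTiles file is an SQLite database, so you can peek inside an export before uploading it. The minimal sketch below uses Python's built-in sqlite3 module and assumes the layout from the MBTiles specification: a `metadata` table of name/value pairs and a `tiles` table with `zoom_level`, `tile_column`, `tile_row` and `tile_data`. The file name in the usage line is hypothetical.

```python
import sqlite3


def summarize_mbtiles(path):
    """Return (metadata dict, {zoom_level: number of tiles})."""
    con = sqlite3.connect(path)
    try:
        metadata = dict(con.execute("SELECT name, value FROM metadata"))
        counts = dict(con.execute(
            "SELECT zoom_level, COUNT(*) FROM tiles "
            "GROUP BY zoom_level ORDER BY zoom_level"))
    finally:
        con.close()
    return metadata, counts


def tms_to_xyz_row(zoom, tile_row):
    """MBTiles numbers rows from the bottom (TMS); web maps count from the top."""
    return 2 ** zoom - 1 - tile_row


if __name__ == "__main__":
    meta, counts = summarize_mbtiles("walmart.mbtiles")  # hypothetical file
    print(meta.get("name"), counts)
```

A quick look at the tile counts per zoom level tells you whether the export covers the zoom range you selected.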

Sunday, April 13, 2014

Inequality and Wealth

Motivation:

The main idea of this article is to introduce readers to Google Ngrams.

Introduction:

Since Obama mentioned the word "inequality" in one of his lengthy speeches, every newspaper, columnist and economist I follow has published an article or two on inequality, wage gaps or the measurement of inequality. So I thought: why not write one myself?

But I had something different in mind: I wanted to use the power of big data along with Google's algorithms, so I used "Ngram". If you have never used Ngrams, now is the right time; just search for "Google Ngrams". Ngram searches Google Books' huge database and produces a line chart showing how often a word has been used from 1800 to 2008.
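As far as I understand it, the Ngram chart does not plot raw counts: for each year it divides the number of times the word appears by the total number of words in the books from that year, and then smooths the line by averaging each year with a few neighboring years (the viewer's "smoothing" setting). The minimal sketch below shows both steps. It is not Google's code, and the counts in the usage line are made up.

```python
def relative_frequency(word_counts, total_counts):
    """Share of all words in each year that are the given word.

    word_counts  -- year -> number of times the word appears that year
    total_counts -- year -> total number of words printed that year
    Years with no words at all are skipped to avoid dividing by zero.
    """
    return {
        year: word_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


def smooth(series, k=3):
    """Average each year with up to k neighboring years on each side.

    Assumes the years in the series are consecutive.
    """
    years = sorted(series)
    smoothed = {}
    for i, year in enumerate(years):
        window = [series[y] for y in years[max(0, i - k): i + k + 1]]
        smoothed[year] = sum(window) / len(window)
    return smoothed


# Made-up counts for illustration only.
freq = relative_frequency({1980: 12, 1981: 15, 1982: 30},
                          {1980: 100_000, 1981: 110_000, 1982: 120_000})
print(smooth(freq, k=1))
```

Normalizing matters because far more books were printed in 2000 than in 1800; raw counts would rise for almost any word.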

The best way to interpret this graph is to follow the use of the words "inequality" and "wealth" since 1800 in the books scanned by Google. After the Second World War, the race between inequality and luxury started getting interesting, and it seems the word "inequality" has gained momentum since 1980. So it is not just an Obama speech that has caused the rise in the study of inequality; the topic has been rearing its ugly head since at least 2000.

Fugly Map - intro 1

Inspiration: I am not smart enough to come up with an original idea like this one. I came across a tweet by a professor who had asked his students to come up with an ugly USA map, and he tweeted the image. I wanted to do something very similar. The link to the professor's site: http://www.brenthecht.com/

Tile Mill: The image was created using Tile Mill, an open source project by Mapbox. It is free to download and the documentation is very well written. It is a great tool for quickly generating maps. It took me about 2 hours to come up with the image posted above, which is very similar to what the professor had tweeted.

I will try to write a short step-by-step introduction to creating this UGLY map. But if you are curious to learn now, this link may help you get started.

Of course, similar things can be done using R, but I think Tile Mill is easy to learn and very handy. If you like to play with maps and data but are not great at coding, this is a great tool for you.

Friday, January 3, 2014

Visualizing a network of good men

Summary of study:

The network of good men was created as a side project while I was taking a course on social network analysis on Coursera, which used an open source package called Gephi. I collected the data for the image in an Excel sheet, and it is also available for your use and replication. The main aim of this project was to study the links between the inspirations and influences of good men in a society. It is hard to define what is good and what is bad, as everything depends on an individual's perceptions.

Data Collection:

Data for the study was collected from Wikipedia. I started with Nelson Mandela, Castro, Gandhi, Karl Marx, etc. I would go to each person's Wikipedia page, search for the word "Influence" and copy the names that followed it. I would then go to the pages of these influences and search for the word again, so the list kept growing. For example, Nelson Mandela was influenced by Gandhi, Marx, etc., so I would then go to the Wikipedia page for Marx and see who influenced Marx. It does not matter whether the influence was a school teacher, a tailor or a famous philosopher; the idea is to study the interlinkages. I ignored cases where the influence was a school of thought, such as Romanticism. Note that I also ignored cases where, for example, Marx influenced someone else: I wanted to see who influenced these men to come up with such path-breaking ideas.
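The collection process above is essentially a breadth-first search: start from a few people, record their influences, then visit each new influence in turn. The minimal sketch below automates that loop. The `influences_of` lookup is hypothetical (in this study the lookup was done by hand on Wikipedia), the 50-row limit mirrors where I stopped, and the names in the usage example are illustrative only.

```python
from collections import deque


def collect_influences(seeds, influences_of, max_edges=50, ignore=frozenset()):
    """Breadth-first expansion of an "influenced by" list.

    seeds         -- people to start from, e.g. ["Nelson Mandela"]
    influences_of -- hypothetical function returning the names listed as
                     influences of a person
    max_edges     -- stop once this many rows have been collected
    ignore        -- names to skip, e.g. schools of thought such as "Romanticism"
    Returns a list of (person, influence) rows.
    """
    rows = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(rows) < max_edges:
        person = queue.popleft()
        for influence in influences_of(person):
            if influence in ignore:
                continue
            rows.append((person, influence))
            if len(rows) >= max_edges:
                break
            if influence not in seen:  # visit each person only once
                seen.add(influence)
                queue.append(influence)
    return rows


# Illustrative lookup table standing in for the Wikipedia pages.
table = {"Mandela": ["Gandhi", "Marx"], "Marx": ["Hegel"]}
print(collect_influences(["Mandela"], lambda p: table.get(p, [])))
```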

Data Processing:

I created two Excel sheets: one with labels and a second one that showed the connections, so you end up with a file with a source and a target column. The first file is the NODES file and the second is the EDGES file, both used in the Gephi package. Gephi is very easy to learn.
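If you keep the connections as a simple list of pairs, both files can be generated instead of maintained by hand. The minimal sketch below writes a nodes CSV with `Id` and `Label` columns and an edges CSV with `Source`, `Target` and `Type` columns, which is the layout I understand Gephi's spreadsheet import to expect. It assumes each pair means "source influenced target"; the file names and people in the usage line are illustrative.

```python
import csv


def write_gephi_csvs(edges, nodes_path, edges_path):
    """Write Gephi-style node and edge CSV files.

    edges -- iterable of (source, target) names, read as "source influenced target"
    Returns the name -> numeric Id mapping used in both files.
    """
    edges = list(edges)
    names = sorted({name for edge in edges for name in edge})
    ids = {name: i for i, name in enumerate(names)}

    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for name in names:
            writer.writerow([ids[name], name])

    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type"])
        for source, target in edges:
            writer.writerow([ids[source], ids[target], "Directed"])
    return ids


write_gephi_csvs([("Gandhi", "Mandela"), ("Marx", "Mandela")], "nodes.csv", "edges.csv")
```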

Gephi:

In Gephi you will need the two CSV files mentioned above. Once the data was imported into Gephi, I used Modularity for partitioning and betweenness centrality for ranking. I stopped when I reached 50 rows, but interested readers can keep adding to the list.
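Gephi computes betweenness centrality for you, but it helps to know what the number means: how often a person sits on the shortest paths between other people in the network. The minimal sketch below is a pure-Python version of Brandes' algorithm, the standard way to compute betweenness on an unweighted graph. It is not Gephi's code, Gephi may scale (normalize) the values differently, and Modularity is left to Gephi.

```python
from collections import deque


def betweenness(nodes, edges, directed=True):
    """Brandes' betweenness centrality for an unweighted graph.

    nodes -- iterable of node names (endpoints of edges are added automatically)
    edges -- iterable of (source, target) pairs
    """
    nodes = set(nodes) | {n for edge in edges for n in edge}
    adj = {v: [] for v in nodes}
    for s, t in edges:
        adj[s].append(t)
        if not directed:
            adj[t].append(s)

    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]

    if not directed:  # each pair was counted from both ends
        for v in bc:
            bc[v] /= 2
    return bc


print(betweenness([], [("a", "b"), ("b", "c")]))
```

In the tiny chain a → b → c, only b lies between two other people, so it is the only node with a non-zero score.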

Conclusion:

We find some interesting points:

1) Karl Marx has influenced the majority of the revolutionaries in this network.

2) None of the men on the outer edge has a recorded influence, and you will observe that these are the men who have greatly influenced society and have been called the father of a particular school of thought. Adam Smith, Kant, Sun Tzu, Plato and Henry Salt, to name a few, have all contributed by influencing other great thinkers, but they themselves were not recorded as influenced by any thinker or philosopher. These are the individuals with no match for the word "influence" on Wikipedia. Some of them may well have been influenced, but since I used only Wikipedia as my data source, I relied on the information available there.
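Both observations can be checked directly from the edge list without Gephi: count how many people each name influenced, and list the names that never appear as the person being influenced. The minimal sketch below assumes each pair means "source influenced target"; the pairs in the usage line are illustrative, not the study's data. As noted above, "no recorded influence" only means none was found in this data source.

```python
from collections import Counter


def influence_summary(edges):
    """Summarize an influence network.

    edges -- (influence, person) pairs, meaning "influence" influenced "person"
    Returns (Counter of how many people each name influenced,
             sorted names with no recorded influence of their own).
    """
    edges = list(edges)
    influenced_count = Counter(source for source, _ in edges)
    people = {name for edge in edges for name in edge}
    has_influences = {target for _, target in edges}
    return influenced_count, sorted(people - has_influences)


counts, roots = influence_summary([("Marx", "Mandela"), ("Marx", "Castro"), ("Hegel", "Marx")])
print(counts.most_common(1), roots)
```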