Part I
The UScereal data frame has 65 rows and 11 columns. The data come from the 1993 ASA Statistical Graphics Exposition, and are taken from the mandatory F&DA food label. The data have been normalized here to a portion of one American cup.
Firstly, I constructed a scatterplot to examine the relationship between grams of protein in one portion and number of calories in one portion. The horizontal axis represents the grams of protein in one portion and the vertical axis represents the number of calories in one portion. The graph suggests that there is a moderate association between the two variables. Further, the segments are banked to 45° and the aspect ratio is 1 vcm/hcm, so we can readily judge the association between these two variables.
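To make the idea of banking to 45° concrete, here is a minimal sketch of the median-absolute-slope rule: the aspect ratio (height/width) is chosen so that the median absolute slope of the plotted segments, measured on the page, equals 1. The function name `bank_to_45` is my own, and I am assuming the segments are those of a polyline given by consecutive points; the software used for the graph above may apply a different banking rule.

```python
from statistics import median

def bank_to_45(xs, ys):
    """Aspect ratio (height/width) that makes the median absolute
    physical slope of consecutive segments equal to 1 (45 degrees).

    Hypothetical helper for illustration only.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Absolute slopes of consecutive segments, in data units.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
        if x1 != x0  # skip vertical segments
    ]
    # Physical slope = data slope * (x_range / y_range) * aspect,
    # so setting the median physical slope to 1 gives:
    return y_range / (x_range * median(slopes))
```

For a straight line the rule returns an aspect ratio of 1, whatever the units of the two axes.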
Secondly, I reproduced the graph while violating a few of the principles of clear vision described in Cleveland's book. I incorporated the following attributes, which may reduce the visual clarity of the graph.
- The plotting symbols are not large enough.
- The tick marks point inward.
- There are a large number of tick marks and labels, which needlessly clutter the graph.
- The aspect ratio is 0.25 vcm/hcm, so the absolute orientations are centered on an angle much less than 45°, which interferes with our judgment of rate of change.
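The effect of the aspect ratio on orientation can be checked with a short calculation. The sketch below converts a slope in data units into the angle it makes on the page; the function name `displayed_angle` and its parameters are assumptions of mine for illustration.

```python
import math

def displayed_angle(data_slope, x_range, y_range, aspect):
    """Angle in degrees that a segment makes on the page.

    aspect is the height/width ratio of the plotting region (vcm/hcm);
    x_range and y_range are the lengths of the axis scales in data units.
    Hypothetical helper for illustration only.
    """
    physical_slope = data_slope * (x_range / y_range) * aspect
    return math.degrees(math.atan(physical_slope))

# A segment drawn at 45 degrees when the aspect ratio is 1 ...
angle_square = displayed_angle(1.0, 1.0, 1.0, 1.0)
# ... is flattened to atan(0.25) when the aspect ratio is 0.25.
angle_flat = displayed_angle(1.0, 1.0, 1.0, 0.25)
```

With an aspect ratio of 0.25 the same segment is drawn at roughly 14°, which is why changes in slope become hard to judge.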
Part II
I chose shelf as a third variable in the UScereal dataset that is associated with both grams of protein in one portion and number of calories in one portion. The shelf variable takes the values 1, 2, and 3, which correspond to the bottom, middle, and top shelf in the store. Generally, the cheaper cereals are placed on the bottom shelf, the middle shelf holds kids' cereals, and the adults' cereals are on the top shelf. The following graph portrays the association between grams of protein in one portion and number of calories in one portion for each shelf position.
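Conditioning on shelf amounts to splitting the observations into one group per shelf and examining the protein and calories relationship within each group. A minimal sketch follows; the rows are made-up illustrative values, not values from UScereal, and the helper names `by_shelf` and `pearson` are my own.

```python
from collections import defaultdict
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def by_shelf(rows):
    """Group (shelf, protein, calories) rows into one list per shelf."""
    groups = defaultdict(list)
    for shelf, protein, calories in rows:
        groups[shelf].append((protein, calories))
    return dict(groups)

# Made-up rows (shelf, protein in g, calories); NOT taken from UScereal.
rows = [
    (1, 2.0, 110.0), (1, 3.0, 120.0), (1, 4.0, 135.0),
    (2, 1.0, 110.0), (2, 2.0, 115.0), (2, 3.0, 130.0),
    (3, 3.0, 120.0), (3, 5.0, 180.0), (3, 8.0, 260.0),
]

panels = by_shelf(rows)
for shelf in sorted(panels):
    protein, calories = zip(*panels[shelf])
    print(shelf, len(protein), round(pearson(protein, calories), 3))
```

Each group corresponds to one panel of the conditioned graph, so the within-shelf associations can be compared directly.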
Further, in the following graph, I added a smooth curve to show the relationship between the variables more clearly.
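As an illustration of how such a smooth curve can be computed, the sketch below evaluates a loess-type local linear fit with tricube weights at a single point. This is an assumption on my part: the report does not state which smoother was used, and the function name `loess_point` and the `span` parameter are hypothetical.

```python
import math

def loess_point(xs, ys, x0, span=0.75):
    """Local linear fit at x0 with tricube weights on the nearest points.

    span is the fraction of the data used for each local fit.
    Illustrative sketch only, without robustness iterations.
    """
    n = len(xs)
    q = min(n, max(2, math.ceil(span * n)))    # number of neighbours
    d_max = sorted(abs(x - x0) for x in xs)[q - 1]

    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / d_max if d_max > 0 else 0.0
        if u >= 1.0 and d_max > 0:
            continue                            # outside the window
        w = (1.0 - u ** 3) ** 3                 # tricube weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y

    if sw == 0.0:
        # All neighbours sit exactly on the window edge: plain average.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= d_max]
        return sum(near) / len(near)

    denom = sw * swxx - swx * swx
    if abs(denom) <= 1e-12 * sw * swxx:
        return swy / sw                         # degenerate: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0
```

Evaluating the function on a grid of protein values and joining the results gives the curve; on data that are exactly linear the fit reproduces the line.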