Scenic spot identification:
In this exercise, you will tackle a real-world problem: extracting scenic spots from the latitude and longitude information attached to photos in archives such as Flickr. You can download the data file here; it contains three columns: longitude, latitude, and the URL of each photo. You can plot the data distribution as follows:
From the above plot, you can almost see the shape of Taiwan, and you can probably identify several big cities as well. Our goal is to identify the most visited places in Taiwan, that is, the scenic spots. One way to achieve this is to partition the whole map into a grid of squares and count the photos in each square. The squares with the highest counts are regarded as scenic spots. To do so along the x-axis, follow these general steps:
Find the center points of each square along the x-axis:
xCenter=linspace(min(x), max(x), m);
Note that m is used to determine the size of each square.
Find the boundary of each square along the x-axis:
xBoundary=(xCenter(1:end-1)+xCenter(2:end))/2;
Determine the interval of each photo's location along the x-axis. For instance, to find the interval indices of photos 3 to 5 when m=1000, use the command:
squareIndex=rangeSearchBin(x(3:5), xBoundary);
The result is [278; 199; 960]. (Note that rangeSearchBin is available within the Utility Toolbox.)
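If the Utility Toolbox is not at hand, the three steps above can be sketched with built-in functions only. This is a minimal sketch under two assumptions: that rangeSearchBin returns, for each point, the index of the interval it falls into, and that MATLAB's discretize is an acceptable stand-in for it. The longitudes are made up for illustration and m is kept small so the result is easy to check by hand.

```matlab
% Made-up longitudes standing in for the real data (assumption).
x = [120.2; 121.5; 120.9; 121.0; 120.2];
m = 5;                                    % number of intervals along the x-axis

% Step 1: center points of the intervals.
xCenter = linspace(min(x), max(x), m);

% Step 2: boundaries are the midpoints between neighboring centers (m-1 values).
xBoundary = (xCenter(1:end-1) + xCenter(2:end))/2;

% Step 3: interval index of each point. Padding with -inf and inf gives
% exactly m intervals. This stands in for rangeSearchBin (assumed behavior).
squareIndex = discretize(x, [-inf, xBoundary, inf]);
disp(squareIndex');
```

A point that lies exactly on a boundary is assigned to the upper interval here; rangeSearchBin may treat that case differently.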
Please follow the same procedure to find the interval indices along the y-axis, and then combine the two to get the counts for all the squares. When m (the resolution along the x-axis) is 1000, n (the resolution along the y-axis) is around 1778, so you should initialize a sparse matrix countMat of size m by n to store all the counts. (This is just like a 2D histogram.) You need to complete the following tasks, assuming m = 1000.
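One way to build such a 2D histogram is to let sparse do the accumulation, since it sums the values of repeated index pairs. The sketch below uses hypothetical interval indices (xIndex, yIndex) for six photos and a tiny 4-by-5 grid; neither comes from the data set.

```matlab
% Hypothetical interval indices of six photos along x and y (assumption).
xIndex = [1; 2; 2; 4; 2; 1];
yIndex = [3; 1; 1; 5; 1; 3];
m = 4;  n = 5;                            % tiny grid for illustration

% sparse(i, j, v, m, n) adds up v over repeated (i, j) pairs,
% so each entry ends up holding the number of photos in that square.
countMat = sparse(xIndex, yIndex, 1, m, n);
disp(full(countMat));
```

With the real data, xIndex and yIndex would be the interval indices computed along each axis, and m and n the two resolutions.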
Plot the sparse matrix countMat using the spy command, as shown next. What is the density of the sparse matrix?
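Assuming "density" means the fraction of nonzero entries, it can be read off with nnz and numel. The small matrix below is a made-up example, not the real count matrix.

```matlab
% Made-up 4-by-5 count matrix with three nonzero squares (assumption).
countMat = sparse([1 2 4], [3 1 5], [2 3 1], 4, 5);

% Density taken as the fraction of nonzero entries: 3 of 20 here.
density = nnz(countMat)/numel(countMat);
fprintf('Density = %g\n', density);

% spy marks the position of every nonzero entry.
spy(countMat);
```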
Plot the grid lines and the data distribution, and overlay the top-10 scenic spots with their counts, as follows:
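Picking the top squares amounts to sorting the nonzero counts. The following sketch shows one way to do this on a made-up count matrix, taking the top 2 instead of the top 10 to keep it short; the variable names are illustrative.

```matlab
% Made-up count matrix (assumption); k = 2 instead of 10 for brevity.
countMat = sparse([1 2 4 3], [3 1 5 2], [2 7 1 4], 4, 5);
k = 2;

% Row index, column index, and value of every nonzero square.
[row, col, count] = find(countMat);

% Sort the counts in descending order and keep the first k.
[sortedCount, order] = sort(count, 'descend');
topRow   = row(order(1:k));
topCol   = col(order(1:k));
topCount = sortedCount(1:k);
disp([topRow, topCol, topCount]);
```

If rows index the x-axis intervals and columns the y-axis intervals, the spot locations to overlay would be xCenter(topRow) and yCenter(topCol), with topCount as the label text.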
Note that you should be able to zoom in to view the details. For instance, you can zoom in on the scenic spot at Tainan to get the following plot:
Please find a way to specify a region as the scenic spot instead of a single dot. Here you can use convhull to find the region and then plot it on top of the data distribution.
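As a rough illustration of the convhull step, the sketch below takes a handful of hypothetical photo locations belonging to one scenic spot and draws their convex hull over the points. How the photos are assigned to a spot is left open here.

```matlab
% Hypothetical photo locations belonging to one scenic spot (assumption).
px = [0; 1; 1; 0; 0.5];
py = [0; 0; 1; 1; 0.5];

% convhull returns the indices of the hull vertices; the first index is
% repeated at the end, so the outline closes when plotted.
hullIndex = convhull(px, py);

plot(px, py, '.');
hold on
plot(px(hullIndex), py(hullIndex), 'r-');
hold off
```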
When m is 1000, the size of each square is about 200000/m = 200 meters. This indicates that we are interested in large-area scenic spots, such as Taipei 101 and its surroundings. When m is 10000, the size of each square is about 200000/m = 20 meters. This indicates that we are interested in small-area scenic spots, possibly a famous restaurant (for example, Din Tai Fung, 鼎泰豐). Please repeat the above process when m is 10000.
Improvement to scenic spot identification:
The method proposed in the previous exercise for scenic spot identification is not entirely satisfactory, since a scenic spot may be split by a grid line. Please propose your own methods and demonstrate the improved results.
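To make the problem concrete, here is a small sketch of one possible direction (an assumption, not a prescribed solution): rank squares by the sum of counts over their 3-by-3 neighborhood rather than by their own count, so that a spot split across adjacent squares is not penalized. The count matrix is made up.

```matlab
% Made-up counts: one spot split across two adjacent squares (5 and 6),
% and one isolated square with 8 photos (assumption).
countMat = sparse([2 2 5], [2 3 6], [5 6 8], 5, 6);

% Per-square ranking picks the isolated square.
singleBest = full(max(countMat(:)));

% Sum over each 3-by-3 neighborhood; conv2 needs a full matrix.
neighborSum = conv2(full(countMat), ones(3), 'same');
neighborBest = max(neighborSum(:));

fprintf('Best single square: %d, best neighborhood: %d\n', singleBest, neighborBest);
```

Here the split spot totals 11 photos and outranks the isolated square with 8, which the per-square count would have placed first. Other options, such as clustering the raw coordinates, are equally valid.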
Tree representation:
Plot a tree specified by the vector [0 1 1 2 2 1 6 6 6 7 7 11].
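Assuming the vector lists the parent of each node (with 0 marking the root), MATLAB's built-in treeplot can draw it directly. A minimal sketch:

```matlab
% nodes(i) is taken to be the parent of node i; 0 marks the root (assumption).
nodes = [0 1 1 2 2 1 6 6 6 7 7 11];

% treeplot draws the tree described by a parent-pointer vector.
treeplot(nodes);
```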