# POI-GeoHash

### POI-GeoHash

*Geographic Information Systems (GIS) is an interesting area of exploration because it poses two significant challenges: latency at scale and modeling spatial locality.*

- You want to store location points on disk such that points close to each other in space are close to each other on disk.
- You want to retrieve as few points as possible when responding to a query.

**Starting with a compound rowkey**

Concatenating X- and Y-axis values as the rowkey won’t cut it for an efficient schema.

This sorting results in lots of hopping between the northern and southern clusters because of sorting first on longitude, then latitude.

Using the geohash as a spatially aware rowkey

The geohash makes a great choice for the rowkey because it’s inexpensive to calculate and the prefix gets you a long way toward finding nearest neighbors.

Compared to the target query area, the six-character prefix match areas are huge. Worse still, the query spans two of those larger prefixes. Seen in this context, those five characters of common prefix include far more data than you need. Relying on prefix match results in scanning a huge amount of extra area. Of course, there’s a trade-off. If your data isn’t dense at this precision level, executing fewer, longer scans isn’t such a big deal. The scans don’t return too much superfluous data, and you can minimize the remote procedure call (RPC) overhead. If your data is dense, running more, shorter scans will reduce the number of excess points transported over the wire. Plus, if there’s one thing that computers are getting good at these days, it’s parallelism. Execute each of those shorter scans on its own CPU core, and the query is still as fast as the slowest scan.

https://www.elastic.co/guide/en/elasticsearch/guide/current/geohashes.html

Geohashes are a way of encoding lat/lon points as strings.

Geohashes divide the world into a grid of 32 cells—4 rows and 8 columns—each represented by a letter or number. The g cell covers half of Greenland, all of Iceland, and most of Great Britian. Each cell can be further divided into another 32 cells, which can be divided into another 32 cells, and so on. The gc cell covers Ireland and England, gcp covers most of London and part of Southern England, and gcpuuz94k is the entrance to Buckingham Palace, accurate to about 5 meters.

In other words, the longer the geohash string, the more accurate it is. If two geohashes share a prefix— and gcpuuz—then it implies that they are near each other. The longer the shared prefix, the closer they are.

That said, two locations that are right next to each other may have completely different geohashes. For instance, the Millenium Dome in London has geohash u10hbp, because it falls into the u cell, the next top-level cell to the east of the g cell.

http://www.bigfastblog.com/geohash-intro

1 |
0010110101011100011000110001101111000111 |

1 |
00101 10101 01110 00110 00110 00110 11110 00111 |

1 |
5 p f 6 6 6 y 7 |

http://blog.factual.com/how-geohashes-work

A geohash is a series of bits that repeatedly bisects a search space of latitudes and longitudes. The first bit bisects the longitude, the next one latitude, the next longitude, etc. As a result, some geohashes specify square-ish (in spherical coordinates) regions, while others specify 2:1 rectangles. Because geohashes are fundamentally binary data, they’re often written in base-32. The alphabet is a little unusual in that it omits some letters that might be confused with numbers.

**Typical encoding strategy**

`long`

`encodeGeohash(`

`double`

`lat, `

`double`

`lng, `

`int`

`bits) {`

` `

`double`

`minLat = -`

`90`

`, maxLat = `

`90`

`;`

` `

`double`

`minLng = -`

`180`

`, maxLng = `

`180`

`;`

` `

`long`

`result = `

`0`

`;`

` `

`for`

`(`

`int`

`i = `

`0`

`; i < bits; ++i)`

` `

`if`

`(i % `

`2`

`== `

`0`

`) { `

`// even bit: bisect longitude`

` `

`double`

`midpoint = (minLng + maxLng) / `

`2`

`;`

` `

`if`

`(lng < midpoint) {`

` `

`result <<= `

`1`

`; `

`// push a zero bit`

` `

`maxLng = midpoint; `

`// shrink range downwards`

` `

`} `

`else`

`{`

` `

`result = result << `

`1`

`| `

`1`

`; `

`// push a one bit`

` `

`minLng = midpoint; `

`// shrink range upwards`

` `

`}`

` `

`} `

`else`

`{ `

`// odd bit: bisect latitude`

` `

`double`

`midpoint = (minLat + maxLat) / `

`2`

`;`

` `

`if`

`(lat < midpoint) {`

` `

`result <<= `

`1`

`; `

`// push a zero bit`

` `

`maxLat = midpoint; `

`// shrink range downwards`

` `

`} `

`else`

`{`

` `

`result = result << `

`1`

`| `

`1`

`; `

`// push a one bit`

` `

`minLat = midpoint; `

`// shrink range upwards`

` `

`}`

` `

`}`

` `

`return`

`result;`

`}`

**Where’s the nearest wifi hotspot?**

You want to store location points on disk such that points close to each other in space are close to each other on disk.

You want to retrieve as few points as possible when responding to a query.

design of the rowkey is the single most important thing you can do in your HBase schema

We claimed earlier that concatenating X- and Y-axis values as the rowkey won’t cut it for an efficient schema.

longitude and latitude are equally important in defining the location of a point. In order to use them as the basis for the spatial index, you need an algorithm to combine them.

You’ll use 12 geohash characters for the precision because that’s the highest precision you can fit in an 8-byte Long and still represent a meaningful character string. By truncating characters from the end of the hash, you get a less precise geohash and a correspondingly less precise selection of the map. Where full precision represents a point, partial precision gives you an area on the map, effectively a bounding box around an area in space.

The geohashes are all represented as character strings in the Base32 encoding alphabet.

the optimal approach for the data is to scan the center tile and its eight neighbors. This approach will guarantee correct results while minimizing the amount of unnecessary network IO. Luckily, calculating those neighbors is a simple bit-manipulation away.

http://www.mitbbs.ca/article_t/JobHunting/32476139.html

里面讲的很清楚怎么样apply geohash to query.如果我没有理解错的话，用里面说的

方法可以省去前面大牛讲的design中的index server，就是存放每个grid中的PO壹的信息的

给一个point and range,应该可以计算出minimum geohash prefix that contains thearea

然后用这个a list of geohash prefix去storage layer 拿所有符合要求的POI就可以了。

https://github.com/yinqiwen/ardb/blob/master/doc/spatial-index.md

http://gis.stackexchange.com/questions/18330/would-it-be-possible-to-use-geohash-for-proximity-searches

Store Spatial Data

ZADD key <GeoHash50Bits> <value>

Encode longitude/latitude coordinate with estimate bits by geohash-int.

Find surrounding 8 neighbors’ geohash integer value by geohash-int.

For each geohash integer value, we generate a pair (GeoHashIneger, GeoHashIneger + 1). Then we got 9 pairs.

For each pair, we convert it to a score range. Any integer value in the pair should be left shift to 52 bits. The shifted value is the smallest 52 bits geohash value in the geohash box represent by unshifted GeoHashInteger.

For example, if we need search points in radius 3000m, then we should encode the longitude/latitude coordinate to a integer with 26 bits, then left shift it 26 bits, then we got a 52 bits integer.

For each score range, use ‘ZRANGEBYSCORE key min max WITHSCORES’ to retrieve all point’s value and it’s score.

For each point value and it’s score, we can decode the score to a GeoHash area by geohash-int and compute the distance with given longitude/latitude , then compare the distance with given radius value to exclude the point not in radius.

Solr Spatial Search

For indexing geodetic points (latitude and longitude), supply the pair of numbers as a string with a comma separating them in latitude then longitude order. For non-geodetic points, the order is x,y for PointType, and for RPT you must use a space instead of a comma, or use WKT.

`LatLonType`

and its non-geodetic twin`PointType`

`SpatialRecursivePrefixTreeFieldType`

(RPT for short), including`RptWithGeometrySpatialField`

, a derivative`BBoxField`

sfield a spatial indexed field`fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5`

. This filter returns all results within a circle of the given radius around the initial point:`fq={!bbox sfield=store}&pt=45.15,-93.85&d=5`

. The rectangular shape is faster to compute and so it’s sometimes used as an alternative to geofilt when it’s acceptable to return points outside of the radius.

**Coding**

**Products**

GeoFire

https://www.firebase.com/blog/2013-09-25-location-queries-geofire.html

A **point of interest**, or **POI**, is a specific point location that someone may find useful or interesting.

Q1. Given the current location, how to find the most closest k points.

Q2. Given the current location, how to find all the points within k miles.

A1. Geohash

A2. K-D tree

A3. Space-filling Curve

intersection of intervals — search points in a range.

intersection of rectangles (using bst) search overlapped intervals

https://www.youtube.com/watch?v=Igr6yONkpIQ