Job done!

This week we are closing a number of items we have been pursuing for a while. I’m writing the last post as part of a lift exercise to ‘blog more’, and have added the last few visitors to our store crawlers. Let’s quickly list what we have done.

The Vomar visitor

With store details on individual pages, the Vomar website follows the same pattern as the Jumbo visitor. The main page contains a master list of pages to visit, so we do not need to resort to brute-force searching.
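The "master list, then detail pages" pattern can be sketched roughly as follows. This is only an illustration, not the actual visitor: the function name, the link pattern and the example URLs are assumptions, and a real visitor would more likely use Cheerio selectors than a regular expression.

```javascript
// Hypothetical sketch: collect the unique store detail URLs from a master list page.
// linkPattern should be a non-global RegExp that recognises store detail links.
function extractStoreLinks(masterHtml, baseUrl, linkPattern) {
  const links = new Set();
  const hrefRegex = /href="([^"]+)"/g;
  let match;
  while ((match = hrefRegex.exec(masterHtml)) !== null) {
    if (linkPattern.test(match[1])) {
      // Resolve relative links against the master page URL.
      links.add(new URL(match[1], baseUrl).href);
    }
  }
  return [...links];
}

// Example with made-up markup: only the store links survive, duplicates are dropped.
const sampleHtml =
  '<a href="/winkels/haarlem">Haarlem</a>' +
  '<a href="/contact">Contact</a>' +
  '<a href="/winkels/haarlem">Haarlem again</a>' +
  '<a href="/winkels/alkmaar">Alkmaar</a>';
const storeLinks = extractStoreLinks(sampleHtml, 'https://example.com/winkels', /^\/winkels\//);
console.log(storeLinks);
```

Each returned URL would then be fetched and parsed individually, which is what makes brute-force searching unnecessary.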

The MCD visitor

With all stores nicely listed on a single page, the MCD site is easy to crawl with Cheerio. Simple one-liners do the job. The only thing missing is the latitude and longitude of the store locations in a parsable format. We will have to fall back on geocoding based on the address as part of a later quality improvement effort.
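As preparation for that later geocoding step, something like the following could pick out the stores that still lack coordinates and build an address string for them. The field names (`id`, `street`, `postalCode`, `city`, `latitude`, `longitude`) are assumptions for illustration only, and no particular geocoding service is implied.

```javascript
// Hypothetical sketch: list the stores that still need geocoding,
// together with the address string a geocoder could be asked about.
function geocodingQueries(stores) {
  return stores
    .filter((s) => !Number.isFinite(s.latitude) || !Number.isFinite(s.longitude))
    .map((s) => ({ id: s.id, query: `${s.street}, ${s.postalCode} ${s.city}, Netherlands` }));
}

// Example with made-up data: only the store without coordinates is selected.
const pending = geocodingQueries([
  { id: 1, street: 'Dorpsstraat 1', postalCode: '1234 AB', city: 'Voorbeeld' },
  { id: 2, street: 'Kerkweg 2', postalCode: '4321 BA', city: 'Elders', latitude: 52.1, longitude: 4.3 },
]);
console.log(pending);
```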

The Agrimarkt visitor

The Agrimarkt visitor follows the same pattern as the Vomar visitor again. Nothing special to note, and no deviation from the pattern. Easy peasy.

The Jan Linders visitor

Jan Linders is mostly found in the south-eastern part of the Netherlands. It provides a clean JSON data stream for its store locations, with enough detail that we do not have to crawl the actual website.
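Turning such a JSON stream into database entries is mostly a matter of mapping fields. The sketch below assumes a made-up record layout (`name`, `address`, `location`); the real feed will use its own field names, so treat this purely as an illustration.

```javascript
// Hypothetical sketch: map one record of an assumed JSON feed onto a flat store entry.
function toStoreEntry(record, brand) {
  return {
    brand,
    name: record.name,
    street: record.address.street,
    postalCode: record.address.postalCode,
    city: record.address.city,
    latitude: Number(record.location.lat),
    longitude: Number(record.location.lng),
  };
}

// Parse the whole feed; the assumed shape is a JSON array of store records.
function parseStoreFeed(jsonText, brand) {
  const records = JSON.parse(jsonText);
  if (!Array.isArray(records)) {
    throw new TypeError('expected a JSON array of stores');
  }
  return records.map((record) => toStoreEntry(record, brand));
}

// Example with made-up data.
const feed = JSON.stringify([
  {
    name: 'Example store',
    address: { street: 'Marktstraat 1', postalCode: '5800 AA', city: 'Voorbeeld' },
    location: { lat: '51.52', lng: '5.97' },
  },
]);
const entries = parseStoreFeed(feed, 'Jan Linders');
console.log(entries);
```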

The Nettorama visitor

Nettorama seems to position itself as a discounter on the Dutch market, like the German chains Aldi and Lidl. The store locations are embedded in data islands in a single page, which makes it really easy to index.
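Reading data islands could look roughly like this, assuming the islands are JSON blobs inside `<script type="application/json">` elements. That is an assumption for the sake of the example; the actual page may embed its data differently, and the function name is made up.

```javascript
// Hypothetical sketch: pull every JSON data island out of a page.
// Islands that do not contain valid JSON are skipped.
function extractDataIslands(html) {
  const islandRegex = /<script[^>]*type="application\/json"[^>]*>([\s\S]*?)<\/script>/gi;
  const islands = [];
  let match;
  while ((match = islandRegex.exec(html)) !== null) {
    try {
      islands.push(JSON.parse(match[1]));
    } catch (err) {
      // Not valid JSON: ignore this island and continue with the next one.
    }
  }
  return islands;
}

// Example with made-up markup: one ordinary script, one valid island, one broken island.
const page =
  '<script type="text/javascript">var x = 1;</script>' +
  '<script type="application/json">{"name":"Store A","city":"Voorbeeld"}</script>' +
  '<script type="application/json">not json</script>';
const islands = extractDataIslands(page);
console.log(islands);
```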

The Poiesz visitor

I had never heard of it before, but the Poiesz brand (I don’t really know how to pronounce it correctly) seems to be a regionally focused brand in the northern part of the Netherlands. Like the Nettorama store locator, the store webpage contains all the information we need to create entries in our database.

The Spar visitor

Last, but effort-wise not least, is the Spar brand. Spar is a concept in which local store owners collaborate under a common name, called a franchise. The site is server generated and restricts the results to a location. I’m not sure how many results are returned, or whether the cut-off is based on a maximum number of results or a maximum distance. This is a good candidate to review for quality later on as well.
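One way to cope with location-restricted results is to search around several locations and merge the overlapping answers. The sketch below shows only the merge step and assumes every store carries a stable `id`; both the function name and that field are assumptions, not something taken from the Spar site.

```javascript
// Hypothetical sketch: merge the result lists of several location searches,
// keeping the first occurrence of every store id.
function mergeSearchResults(resultLists) {
  const storesById = new Map();
  for (const results of resultLists) {
    for (const store of results) {
      if (!storesById.has(store.id)) {
        storesById.set(store.id, store);
      }
    }
  }
  return [...storesById.values()];
}

// Example with made-up data: store 2 shows up in both searches but is kept once.
const merged = mergeSearchResults([
  [{ id: 1, city: 'A' }, { id: 2, city: 'B' }],
  [{ id: 2, city: 'B' }, { id: 3, city: 'C' }],
]);
console.log(merged.map((s) => s.id));
```

Comparing the merged count across different sets of search locations would also give a hint about how the site limits its results.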

Some statistics

And with that last visitor created we finish our initial effort of indexing store locations. Here are some numbers:

  • Total number of stores in database: 4106
  • Total number of brands indexed: 25
  • Total number of visitors created: 22

From the initial list of brands in scope, Sanders was removed as it was incorporated into the EMTÉ brand in 2011. Instead, Dagwinkel was added since it was automatically brought into the list by the Attent visitor. I made a correction to the Wikipedia page I used as a reference to create my original list.

The list of stores per brand (concept) indexed as of March 9, 2014:

Brand        Stores
AH              816
Aldi            478
Jumbo           409
Lidl            382
Plus            254
C1000           251
Spar            235
Coop            143
EMTÉ            129
Troefmarkt      125
Attent          104
DekaMarkt        68
Deen             66
Poiesz           64
Hoogvliet        63
Vomar            60
Jan Linders      57
Dirk             55
CoopCompact      53
AH TOGO          46
Boni             39
MCD              35
SuperCoop        34
AH XL            34
Nettorama        31
Bas              30
Digros           17
AH DNTG          13
Dagwinkel        10
Agrimarkt         5

We will also provide this summary view through our API shortly. It will then be updated automatically when things change.
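Such a summary is a simple group-and-count over the store entries. The sketch below assumes each entry has a `brand` field and says nothing about how the API itself will be built; the function name is made up.

```javascript
// Hypothetical sketch: count stores per brand, largest brand first,
// with ties ordered alphabetically.
function summarizeByBrand(stores) {
  const counts = new Map();
  for (const store of stores) {
    counts.set(store.brand, (counts.get(store.brand) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([brand, count]) => ({ brand, stores: count }))
    .sort((a, b) => b.stores - a.stores || a.brand.localeCompare(b.brand));
}

// Example with made-up data.
const summary = summarizeByBrand([
  { brand: 'AH' },
  { brand: 'Spar' },
  { brand: 'AH' },
  { brand: 'Aldi' },
]);
console.log(summary);
```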

With the above I will now leave you for a while. I am done with my lift exercise and need to spend some more time on another interesting topic. I will blog about that as well and will be back with work on the Smarter Grocery App in the future, so stay tuned!
