There are two types of scrapers in PidPid.
Metadata scrapers discover the product name, product id, link, and sometimes the product image.
Image scrapers are used for Barneys, Size, JDSports, FootPatrol, MrPorter, and NetAPorter.
Products scraped by an image scraper do not have a name tag associated at first. For image scrapers, you'll need to manually check the scraped images in the respective indexes to discover early links or product ids. This is a limitation of the website being scraped rather than of PidPid's scraping methods. For live products, however, the metadata is attached and searchable. This tutorial focuses on searching in metadata-oriented indexes. Every index except the ones mentioned above is metadata-oriented.
Search for a product by brand, pid, or name. To search for any product containing the word yeezy in any of its fields, enter the following in the search bar:
*yeezy*
The asterisks are formally called globs. A glob matches any characters before or after the word yeezy. For example, a product with the title Yeezy Boost 350 V2 Beluga or Kanye West Yeezy will match this search.
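The glob behaviour described above can be illustrated outside of PidPid with a short Python sketch. This is not PidPid's implementation. It assumes matching is case-insensitive (as the Yeezy examples suggest) and uses Python's standard fnmatch module as a stand-in for the search engine. The helper glob_match, the third product, and the pids are hypothetical.

```python
from fnmatch import fnmatchcase

def glob_match(pattern: str, product: dict) -> bool:
    """Return True if any field value matches the glob pattern, ignoring case."""
    # Assumption: the search ignores case, so both sides are lowercased.
    pattern = pattern.lower()
    return any(fnmatchcase(str(value).lower(), pattern) for value in product.values())

# Hypothetical records; the pids are made up for illustration.
products = [
    {"name": "Yeezy Boost 350 V2 Beluga", "pid": "100001"},
    {"name": "Kanye West Yeezy", "pid": "100002"},
    {"name": "Air Max 97", "pid": "100003"},
]

matches = [p["name"] for p in products if glob_match("*yeezy*", p)]
print(matches)  # ['Yeezy Boost 350 V2 Beluga', 'Kanye West Yeezy']
```

The leading asterisk lets text appear before yeezy and the trailing asterisk lets text appear after it, which is why both titles match.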
Searches can also target individual fields such as name, pid, brand, etc. Not every index has the same fields; however, all indexes contain at least the name, pid, and timestamp fields. The variation of fields depends on the data scraped.
brand:adidas
brand:*adidas*
pid:>550000 AND pid:<600000
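To make the three field queries above concrete, here is a minimal Python sketch that applies the same kind of filters to a few in-memory records. It is only an approximation: it assumes brand:adidas matches the whole brand value, brand:*adidas* matches any brand containing adidas, and the pid query selects pids strictly between the two bounds. How PidPid actually evaluates these depends on how each index is configured. The records and the helper names are hypothetical.

```python
# Hypothetical records; real indexes may have different fields and values.
products = [
    {"name": "Ultra Boost", "brand": "adidas", "pid": 560123},
    {"name": "NMD R1", "brand": "adidas Originals", "pid": 610000},
    {"name": "Air Max 97", "brand": "Nike", "pid": 575000},
]

def brand_is(product: dict, value: str) -> bool:
    """Approximates brand:adidas (assumed: whole value, case-insensitive)."""
    return product["brand"].lower() == value.lower()

def brand_contains(product: dict, value: str) -> bool:
    """Approximates brand:*adidas* (value may appear anywhere in the brand)."""
    return value.lower() in product["brand"].lower()

def pid_between(product: dict, low: int, high: int) -> bool:
    """Approximates pid:>low AND pid:<high (both bounds exclusive)."""
    return low < product["pid"] < high

exact = [p["name"] for p in products if brand_is(p, "adidas")]
partial = [p["name"] for p in products if brand_contains(p, "adidas")]
in_range = [p["name"] for p in products if pid_between(p, 550000, 600000)]

print(exact)     # ['Ultra Boost']
print(partial)   # ['Ultra Boost', 'NMD R1']
print(in_range)  # ['Ultra Boost', 'Air Max 97']
```

The range query combines two conditions with AND, so a product must satisfy both bounds to be returned.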