Scraping Google Web Search using ScraperAPI in PHP
Scraping data from the web is tiresome when you do everything manually. For instance, in PHP you would typically go through the following steps.
- Get the website content using the file_get_contents() function
- Parse the content using the DOMDocument class
- Then load the data using the loadHTML() function
- Finally, traverse the DOM tree using the getElementsByTagName() function
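The steps above can be sketched roughly as follows. The helper name extractTagText() and the example.com URL are made up for illustration; only the PHP built-ins come from the list.

```php
<?php
// Hypothetical helper: collect the text of every <$tagName> element in an HTML string.
function extractTagText(string $html, string $tagName): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Walk the matching nodes and keep their trimmed text
    $texts = [];
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        $texts[] = trim($node->textContent);
    }

    return $texts;
}

// Usage (placeholder URL, requires network access):
// $html = file_get_contents('https://example.com');
// if ($html !== false) {
//     print_r(extractTagText($html, 'h1'));
// }
```

Keeping the fetch separate from the parsing makes the parsing part easy to try out on a hard-coded HTML string.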
While this works, it's a tedious and not very efficient process. It's also error-prone when the DOM tree is complex: if you don't parse the HTML correctly, you might not get the data you want.
That’s where a solution like ScraperAPI comes in.
- What is ScraperAPI?
- Traversing Google search results using ScraperAPI
- Using ScraperAPI’s Structured Data endpoints
- In closing
What is ScraperAPI?
Essentially, ScraperAPI provides a neat way to scrape data from the web. We'll see how we can parse Google search results in two ways.
- First, using the API the normal way, i.e. by loading the content using ScraperAPI and then parsing the HTML using DOMDocument.
- Second, using ScraperAPI's proprietary solution called "Structured Data", which is far easier and more efficient to use.
Traversing Google search results using ScraperAPI
To get started, you need to create a free account on ScraperAPI first and grab your API key from the dashboard.
This key is required to access the API.
Now, let’s see how we can use ScraperAPI to scrape Google search results.
// Function to scrape Google search results
function scrapeGoogleSearch($query) {
    // Your ScraperAPI API key
    $apiKey = "your_scraperapi_api_key";

    // Craft the Google search URL
    $url = "https://www.google.com/search?q=" . urlencode($query);

    // Craft the request URL with ScraperAPI.
    // The target URL is itself a query parameter, so it must be URL-encoded too.
    $requestUrl = "https://api.scraperapi.com?api_key={$apiKey}&url=" . urlencode($url);

    // Initialize cURL session
    $curl = curl_init();

    // Set cURL options
    curl_setopt($curl, CURLOPT_URL, $requestUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Execute the request
    $response = curl_exec($curl);

    // Check if the request was successful
    if ($response === false) {
        echo "Failed to fetch data from Google.";
    } else {
        // Parse the HTML response to extract search results
        $dom = new DOMDocument();
        libxml_use_internal_errors(true); // Suppress libxml warnings about malformed HTML
        $dom->loadHTML($response);
        libxml_clear_errors(); // Clear the collected libxml errors

        // Collect all <h3> elements, which typically hold the search result titles
        $searchResults = $dom->getElementsByTagName('h3');

        $results = [];
        $count = 0;
        foreach ($searchResults as $result) {
            $results[] = $result->textContent;
            $count++;
            if ($count >= 5) {
                break; // Stop after collecting 5 results
            }
        }

        // Output the search results (you may want to process this data further)
        print_r($results);
    }

    // Close cURL session
    curl_close($curl);
}

// Example usage
$query = "scraping data"; // Example search query
scrapeGoogleSearch($query);
/*
Output:
Array
(
[0] => What is data scraping?
[1] => What Is Data Scraping And How Can You Use It?
[2] => Images
[3] => Description
[4] => What Is Data Scraping | Techniques, Tools & Mitigation
)
*/
As you can tell, ScraperAPI provides an API URL to which we pass the API key and the URL of the website we want to scrape.
In this case, we used the “https://www.google.com/search?q=searchterm” URL to get the search results from Google.
We can call the ScraperAPI endpoint with this URL using cURL and parse the HTML response to extract the search results.
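Because every h3 on the page is collected, the sample output also contains headings such as "Images" and "Description" that are not search results. One way to narrow the selection is an XPath query. The sketch below assumes that result titles are h3 elements nested inside links, which is a guess about Google's markup that may not hold and can change at any time. The helper name extractLinkedHeadings() is made up for illustration.

```php
<?php
// Hypothetical helper: return the text of <h3> elements that sit inside a link.
// The XPath expression encodes an ASSUMPTION about the page markup.
function extractLinkedHeadings(string $html, int $limit = 5): array
{
    $dom = new DOMDocument();

    // Suppress warnings about malformed HTML while parsing
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    $titles = [];
    // Match only headings that have an <a href="..."> ancestor
    foreach ($xpath->query('//a[@href]//h3') as $node) {
        $titles[] = trim($node->textContent);
        if (count($titles) >= $limit) {
            break; // Stop once enough titles are collected
        }
    }

    return $titles;
}
```

It can be called on the cURL response in place of the getElementsByTagName('h3') loop, e.g. print_r(extractLinkedHeadings($response)).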
Using ScraperAPI’s Structured Data endpoints
However, ScraperAPI provides a sleeker way to do this. It's called Structured Data.
Essentially, ScraperAPI provides special data endpoints that can be used to gather structured data in a JSON format. So, no more traversing the DOM tree to extract data!
Here’s how we can rewrite the previous example using ScraperAPI’s Structured Data endpoint.
// Function to scrape Google search results using ScraperAPI's Structured Data Solution
function scrapeGoogleSearchStructuredData($query) {
    // Your ScraperAPI API key
    $apiKey = "your_scraperapi_api_key";

    // Craft the request URL with ScraperAPI's Structured Data endpoint
    $requestUrl = "https://api.scraperapi.com/structured/google/search?api_key={$apiKey}&country=US&query=" . urlencode($query);

    // Initialize cURL session
    $curl = curl_init();

    // Set cURL options
    curl_setopt($curl, CURLOPT_URL, $requestUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Execute the request
    $response = curl_exec($curl);

    // Check if the request was successful
    if ($response === false) {
        echo "Failed to fetch data from Google.";
    } else {
        // Parse the JSON response into an associative array
        $data = json_decode($response, true);

        // Extract the titles of the organic search results
        $results = [];
        if (isset($data['organic_results'])) {
            $count = 0;
            foreach ($data['organic_results'] as $result) {
                if (isset($result['title'])) {
                    $results[] = $result['title'];
                    $count++;
                }
                if ($count >= 5) {
                    break; // Stop after collecting 5 results
                }
            }
        }

        // Output the collected titles (you may want to process this data further)
        print_r($results);
    }

    // Close cURL session
    curl_close($curl);
}

// Example usage
$query = "OpenAI GPT-3"; // Example search query
scrapeGoogleSearchStructuredData($query);
/*
Output:
Array
(
[0] => GPT-3 powers the next generation of apps
[1] => Product
[2] => OpenAI
[3] => Images
[4] => Description
)
*/
As you can tell, we now have a new structured data endpoint in the form of the following URL.
$requestUrl = "https://api.scraperapi.com/structured/google/search?api_key={$apiKey}&country=US&query=" . urlencode($query);
The endpoint is specific to retrieving Google search results where we need to pass in the API key, country, and the search term as query parameters. The country denotes that the endpoint should return results relevant to the specified country.
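As a small sketch, the same URL can be assembled with http_build_query(), which URL-encodes every value and makes the country easy to swap. The helper name buildStructuredSearchUrl() is hypothetical, and the only parameters used are the three already shown above.

```php
<?php
// Hypothetical helper: build the Structured Data request URL from its parts.
function buildStructuredSearchUrl(string $apiKey, string $query, string $country = 'US'): string
{
    // http_build_query() URL-encodes each value and joins the pairs with "&"
    $params = http_build_query([
        'api_key' => $apiKey,
        'country' => $country,
        'query'   => $query,
    ], '', '&');

    return 'https://api.scraperapi.com/structured/google/search?' . $params;
}

// Usage:
// $requestUrl = buildStructuredSearchUrl('your_scraperapi_api_key', 'OpenAI GPT-3');
```

The resulting string can be passed to CURLOPT_URL exactly like the hand-built one.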
After that, we can hit the endpoint using cURL and parse the JSON response to extract the search results. The search results live under the organic_results key in the JSON response.
A lot of other data is returned in the response as well, such as related questions, related searches, and pagination. You can take a look at all the fields by printing the entire JSON response.
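To get a quick overview without dumping everything, a sketch like the following lists the top-level fields and pulls out the titles. The helper name summariseSearchResponse() is hypothetical, and the sample payload is made up for illustration: apart from organic_results and title, its key names are assumptions, so check a real response for the exact fields.

```php
<?php
// Hypothetical helper: decode a Structured Data response and summarise it.
function summariseSearchResponse(string $json, int $limit = 5): array
{
    // JSON_THROW_ON_ERROR raises a JsonException on malformed input instead of returning null
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object at the top level.');
    }

    // Collect up to $limit titles from the organic results, if there are any
    $titles = [];
    foreach ($data['organic_results'] ?? [] as $result) {
        if (isset($result['title'])) {
            $titles[] = $result['title'];
        }
        if (count($titles) >= $limit) {
            break;
        }
    }

    return [
        'fields' => array_keys($data), // names of all top-level fields
        'titles' => $titles,
    ];
}

// Made-up sample payload; key names other than organic_results/title are assumptions
$sample = '{"organic_results":[{"title":"First result"},{"title":"Second result"}],"related_searches":[],"pagination":{}}';
print_r(summariseSearchResponse($sample));
```

In the earlier function, the same helper could be called on $response right after the cURL request succeeds.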
In closing
And that’s about it! That’s your brief look at how to scrape data from the web using ScraperAPI. I think it’s a great tool for anyone who wants to scrape data from the web.
I found it quite convenient to use, especially the structured data endpoint. So, if you're looking for a tool to scrape data from the web, I highly recommend ScraperAPI.
👋 Hi there! I'm Amit. I write articles about all things web development.