chromedp: Working with nodes and tabs

Today we are going to have some fun with chromedp. Today's scenario is gathering all the links on a page and looping through them. It should be simple, easy, and fun. Let's get started.

As we discussed before, for this kind of action in chromedp we need to pass a pointer to our variable to the corresponding function, so that the function can fill it in.

We want to go to the Notepad++ website, print all the links, and open each of them in a different tab. So first we need to define a slice of node pointers, whose address we will pass to chromedp.

var nodes []*cdp.Node

Then we fill it with the Nodes function.

selector := "#main ul li a"
pageURL := "https://notepad-plus-plus.org/downloads/"
chromedp.Run(ctx, chromedp.Tasks{
    chromedp.Navigate(pageURL),
    chromedp.WaitReady(selector),
    chromedp.Nodes(selector, &nodes),
})
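If it is not obvious why the address (&nodes) is needed, here is a minimal sketch in plain Go, with no chromedp involved. The two helper functions are hypothetical and exist only for illustration: one receives the slice by value, the other receives a pointer to it, and only the second one changes the caller's variable.

```go
package main

import "fmt"

// fillByValue receives a copy of the slice header, so the append
// is not visible to the caller.
func fillByValue(out []string) {
	out = append(out, "a")
}

// fillByPointer writes through the pointer, so the caller's variable
// is updated. Passing &nodes to chromedp.Nodes relies on the same idea.
func fillByPointer(out *[]string) {
	*out = append(*out, "a")
}

func main() {
	var byValue, byPointer []string
	fillByValue(byValue)
	fillByPointer(&byPointer)
	fmt.Println(len(byValue), len(byPointer)) // prints "0 1"
}
```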

Let's take a look at the nodes and see what they contain.

for _, n := range nodes {
    u := n.AttributeValue("href")
    fmt.Printf("node: %s | href = %s\n", n.LocalName, u)
}

The output is:

node: a | href = https://notepad-plus-plus.org/downloads/v7.9.2/
node: a | href = https://notepad-plus-plus.org/downloads/v7.9.1/
node: a | href = https://notepad-plus-plus.org/downloads/v7.9/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.9/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.8/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.7/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.6/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.5/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.4/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.3/
node: a | href = https://notepad-plus-plus.org/downloads/v7.8.2/
...
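If you would rather keep the links in a slice of strings than work with the nodes directly, something like the sketch below could do it. collectHrefs is a hypothetical helper, not part of chromedp. It only relies on the AttributeValue method used above, so it should accept a []*cdp.Node as well. fakeNode is a stand-in so the sketch runs without a browser, and skipping empty and duplicate links is my own assumption about what you want.

```go
package main

import "fmt"

// attrNode is the one method this sketch needs. *cdp.Node has
// AttributeValue(name string) string, so it satisfies the constraint.
type attrNode interface {
	AttributeValue(name string) string
}

// collectHrefs (hypothetical helper) returns the non-empty,
// de-duplicated href values of nodes, keeping their original order.
func collectHrefs[T attrNode](nodes []T) []string {
	seen := make(map[string]bool, len(nodes))
	urls := make([]string, 0, len(nodes))
	for _, n := range nodes {
		u := n.AttributeValue("href")
		if u == "" || seen[u] {
			continue
		}
		seen[u] = true
		urls = append(urls, u)
	}
	return urls
}

// fakeNode stands in for *cdp.Node so the sketch runs without a browser.
type fakeNode map[string]string

func (f fakeNode) AttributeValue(name string) string { return f[name] }

func main() {
	nodes := []fakeNode{
		{"href": "https://notepad-plus-plus.org/downloads/v7.9.2/"},
		{"href": ""},
		{"href": "https://notepad-plus-plus.org/downloads/v7.9.2/"},
		{"href": "https://notepad-plus-plus.org/downloads/v7.9.1/"},
	}
	for _, u := range collectHrefs(nodes) {
		fmt.Println(u)
	}
}
```

With the real nodes, the call would simply be collectHrefs(nodes).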

So we have grabbed all the links. Now let's loop over them and open each one in a new tab. To open a new tab in chromedp, we need to create a new context from the context that we already have. The NewContext function does that for us.

clone, cancel := chromedp.NewContext(ctx)
defer cancel()

From now on, everything we run with the clone context happens in a new tab. The rest of the process is exactly the same, so we can easily run chromedp tasks in our new context.

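for _, n := range nodes {
    u := n.AttributeValue("href")
    // A context derived from ctx gets its own tab.
    clone, cancel := chromedp.NewContext(ctx)
    // Deferred calls run when the surrounding function returns,
    // so every tab stays open until then.
    defer cancel()
    chromedp.Run(clone, chromedp.Navigate(u))
}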
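One thing worth knowing about the loop above: a defer inside a loop does not run at the end of each iteration, it runs when the surrounding function returns. That keeps all the tabs open, which is nice for a demo, but it may not be what you want with a long list of links. Here is a small sketch in plain Go (no chromedp, the "open" and "close" events are just strings I made up) that shows the difference between deferring in the loop and wrapping the loop body in a function.

```go
package main

import "fmt"

// deferInLoop mimics the loop above: every "tab" stays open until the
// enclosing function returns, then they are closed in reverse order.
func deferInLoop(urls []string) []string {
	var events []string
	func() {
		for _, u := range urls {
			events = append(events, "open "+u)
			defer func(u string) { events = append(events, "close "+u) }(u)
		}
	}()
	return events
}

// closePerIteration wraps the loop body in a function, so each "tab"
// is closed before the next one is opened.
func closePerIteration(urls []string) []string {
	var events []string
	for _, u := range urls {
		func() {
			events = append(events, "open "+u)
			defer func() { events = append(events, "close "+u) }()
		}()
	}
	return events
}

func main() {
	urls := []string{"x", "y"}
	fmt.Println(deferInLoop(urls))       // [open x open y close y close x]
	fmt.Println(closePerIteration(urls)) // [open x close x open y close y]
}
```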

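If you disable headless mode and run the project, you will see the URLs opening one by one, each in a new tab. However, we can open all of them in a matter of seconds by using goroutines. Just remember to wait for the goroutines, otherwise the program can exit before the tabs finish loading.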

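var wg sync.WaitGroup
f := func(ctx context.Context, url string) {
    defer wg.Done()
    clone, cancel := chromedp.NewContext(ctx)
    defer cancel()
    chromedp.Run(clone, chromedp.Navigate(url))
}
for _, n := range nodes {
    u := n.AttributeValue("href")
    wg.Add(1)
    go f(ctx, u)
}
// Wait for every goroutine; otherwise main may return first.
wg.Wait()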
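Opening every link at once can be heavy when the list is long. As a variation on the goroutine approach, here is a rough sketch of a helper that limits how many calls run at the same time and returns only after all of them are done. openAll, its limit parameter, and the open callback are all hypothetical names of mine. In the real program the callback would create the tab context and run chromedp.Navigate, but here it just prints, so the sketch runs without a browser.

```go
package main

import (
	"fmt"
	"sync"
)

// openAll (hypothetical helper) calls open once per URL, with at most
// limit calls running at a time, and returns after all calls finish.
// limit must be at least 1. errs[i] holds the result for urls[i].
func openAll(urls []string, limit int, open func(url string) error) []error {
	errs := make([]error, len(urls))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			sem <- struct{}{}        // take a slot
			defer func() { <-sem }() // give it back
			errs[i] = open(u)
		}(i, u)
	}
	wg.Wait()
	return errs
}

func main() {
	urls := []string{
		"https://notepad-plus-plus.org/downloads/v7.9.2/",
		"https://notepad-plus-plus.org/downloads/v7.9.1/",
		"https://notepad-plus-plus.org/downloads/v7.9/",
	}
	errs := openAll(urls, 2, func(u string) error {
		// Placeholder for: create a tab context, then chromedp.Navigate(u).
		fmt.Println("opening", u)
		return nil
	})
	for i, err := range errs {
		if err != nil {
			fmt.Printf("%s failed: %v\n", urls[i], err)
		}
	}
}
```

Each goroutine writes only to its own index in errs, so no extra locking is needed for the results.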

That's it. Here's the complete code:

package main

import (
    "context"
    "fmt"
    "log"
    "sync"

    "github.com/chromedp/cdproto/cdp"
    "github.com/chromedp/chromedp"
)

func main() {
    ctx, cancel := chromedp.NewContext(context.Background(), chromedp.WithErrorf(log.Printf))
    defer cancel()

    // Collect every link node that matches the selector.
    var nodes []*cdp.Node
    selector := "#main ul li a"
    pageURL := "https://notepad-plus-plus.org/downloads/"
    if err := chromedp.Run(ctx, chromedp.Tasks{
        chromedp.Navigate(pageURL),
        chromedp.WaitReady(selector),
        chromedp.Nodes(selector, &nodes),
    }); err != nil {
        panic(err)
    }

    // f opens url in a new tab and reports when it is done.
    var wg sync.WaitGroup
    f := func(ctx context.Context, url string) {
        defer wg.Done()
        clone, cancel := chromedp.NewContext(ctx)
        defer cancel()
        fmt.Printf("%s is opening in a new tab\n", url)

        if err := chromedp.Run(clone, chromedp.Navigate(url)); err != nil {
            // do something nice with your errors!
            log.Printf("opening %s: %v", url, err)
        }
    }
    for _, n := range nodes {
        u := n.AttributeValue("href")
        wg.Add(1)
        go f(ctx, u)
    }
    // Wait for all tabs; without this, main returns before they load.
    wg.Wait()
}