Write your own plugins

Plugins enable the freeyourstuff.cc browser extension to retrieve your content from a supported website. We hope you'll join the freeyourstuff.cc community and add support for your favorite sites.

freeyourstuff.cc is entirely written in JavaScript. It consists of a browser extension and a Node.js web service. The browser extension loads the plugins; the Node.js service includes a simple test runner to automatically verify that they're still working. For the up-to-date list of currently supported sites, see the file sites.json.

To write a plugin, you'll need some familiarity with JavaScript, jQuery, and Git/GitHub.

There are countless free resources to help you get started with JavaScript; we can recommend Eloquent JavaScript by Marijn Haverbeke, exercism.io as an exercise platform, and Nodeschool for self-directed learning of Node.js.

Basic familiarity with ECMAScript 6 (the latest version of JavaScript) is helpful; you may wish to consult the ES6 cheatsheet for quick reference.

jQuery has its own learning center; so does GitHub. Finally, while programming, you may want to keep a DevDocs tab open. As a code editor, we recommend Atom, which has good Git integration. All of these tools are free and (except for GitHub itself) open source. Yay!

With that out of the way, let's get started. First, you'll need a Git checkout of the freeyourstuff.cc repository. If you want to skip the Git part for now, you can download a ZIP file that contains the latest version of the repository. To install the extension, unpack it into your working directory and then follow the instructions for loading unpacked extensions.

As you work on the extension, you will sometimes have to reload it from your extensions list. Plugins, however, are loaded dynamically, so you should be able to skip that step while working on your plugin. The plugins themselves live in the extension/src/plugins directory. Go ahead and look at the contents of those folders. Each plugin's index.js contains the main code, while its schema.json describes the data the plugin can handle.

To write a plugin, you don't need to worry too much about the main architecture of the extension. Still, you may want to take a look at Google's documentation of Chrome extension development. Essentially, our extension injects the plugin's JavaScript into a supported page like Yelp or Amazon.com. The plugin is a content script in Chrome extension terminology and runs in what's called an "isolated world"; it can't modify the scripts run by the site, but it can access the document, fetch pages from the same domain, and so on.

Messages, events and the Chrome API

A plugin communicates with the rest of the extension through messages, which it receives by listening for relevant events using the Chrome API. It can listen for messages from the extension's popup (the thing that opens when you click the button in the browser). These messages will typically tell it to go and fetch some data. Once it's done that, the plugin then sends a message back to a background page that's always listening. That way, we don't need to have the popup open to show the results.
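
The message flow described above can be sketched roughly as follows. This is an illustration, not the extension's actual code: the message shape ({ action, data }) and the names registerPlugin and createFakeChrome are hypothetical. Only chrome.runtime.onMessage.addListener and chrome.runtime.sendMessage are real Chrome API calls, and here they are replaced by a small stand-in so the flow can be tried outside the browser.

```javascript
// Sketch only: the message shape ({ action, data }) is hypothetical.
function registerPlugin(chromeApi, retrieve) {
  // Listen for requests coming from the popup.
  chromeApi.runtime.onMessage.addListener(function(request) {
    if (request.action !== 'retrieve') {
      return;
    }
    retrieve(function(result) {
      // Hand the finished result to whoever is listening
      // (in the extension, that would be the background page).
      chromeApi.runtime.sendMessage({ action: 'dispatch', data: result });
    });
  });
}

// Minimal stand-in for the chrome object (hypothetical, for illustration).
function createFakeChrome() {
  const listeners = [];
  const sent = [];
  return {
    sent: sent,
    simulatePopupMessage: function(message) {
      listeners.forEach(function(listener) { listener(message); });
    },
    runtime: {
      onMessage: {
        addListener: function(listener) { listeners.push(listener); }
      },
      sendMessage: function(message) { sent.push(message); }
    }
  };
}

const fakeChrome = createFakeChrome();
registerPlugin(fakeChrome, function(done) { done({ reviews: 2 }); });
fakeChrome.simulatePopupMessage({ action: 'retrieve' });
console.log(fakeChrome.sent);
```

Passing the API object in as a parameter is just a convenience for this sketch; inside the extension, the global chrome object would be used directly.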

While it's doing its thing, a plugin can send little updates back to the popup, showing progress notices or error messages. A couple of convenience functions are provided for this purpose, which you can find in extension/src/plugin.js.

Within a plugin, you'll generally want to be careful with direct calls to the Chrome extension API. The reason is that plugins can also get called in the Node.js context for testing purposes, in which case that API is not available. A sane way to check for the existence of the API is the expression typeof chrome !== 'undefined'. We typically wrap extension-only code in functions and call them after this check.
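
A minimal sketch of that pattern might look like this. The function names and the 'retrieve' message shape are hypothetical; only the typeof check comes from the text above, and chrome.runtime.onMessage.addListener is the standard Chrome API call for receiving messages.

```javascript
// Only safe to call when the Chrome extension API is present.
function setupExtensionEvents(onRetrieve) {
  chrome.runtime.onMessage.addListener(function(request) {
    // Hypothetical message shape, for illustration only.
    if (request.action === 'retrieve') {
      onRetrieve();
    }
  });
}

// Entry point that works both in the extension and under Node.js.
function init(onRetrieve) {
  if (typeof chrome !== 'undefined') {
    setupExtensionEvents(onRetrieve);
    return 'extension';
  }
  // No Chrome API available: skip the extension-only wiring.
  return 'node';
}

console.log(init(function() {}));
```

Because typeof never throws on an undeclared name, the check is safe even where chrome does not exist at all, which is the situation in the test runner.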

Schemas and datasets

Plugins handle different kinds of data. We send data around in JavaScript's lightweight JSON format, and we use very simple schemas to describe the data structure. The point of those schemas is primarily to ensure that we only stash the data we actually want, enforce type constraints, and can track schema changes.

These schemas are pretty important; they're also used by the freeyourstuff.cc web service to parse data submissions and store them in its MongoDB backend. The schema system is still pretty basic and designed for textual data right now. We'll need to make changes to it over time to support new data types and more complex data structures.

Check out extension/src/examples/yelp.js as an example of data represented by a schema, and the schema itself. If you view your local copy of extension/src/result.html in your browser, it will attempt to render this data using extension/src/result.js. You can use this method to verify that your schema and data are handled as expected -- feel free to submit pull requests for additional examples.

Data is validated against its schema by extension/src/dataset.js. A single schema describes all the kinds of data that can be stored for a given site (e.g., reviews, comments, blog posts). Together, we call this a SiteSet, while we call the sets that comprise it DataSets. Each DataSet has a header (for non-repeating information like the author) and rows of data. That's it.
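
To make the idea concrete, here is a simplified sketch of schema-based filtering and type checking. It is not the code in dataset.js, and the schema layout, the key names (head, data), and the type names (text, number) are assumptions made for illustration; consult an existing schema.json for the real format.

```javascript
// Hypothetical schema layout; the real format is defined by schema.json.
const exampleSchema = {
  reviews: {
    head: { reviewerName: 'text' },
    data: { subject: 'text', review: 'text', starRating: 'number' }
  }
};

const typeChecks = {
  text: function(value) { return typeof value === 'string'; },
  number: function(value) { return typeof value === 'number'; }
};

// Keep only the fields the schema knows about, and reject wrong types.
function filterFields(fields, input) {
  const output = {};
  Object.keys(fields).forEach(function(key) {
    if (input[key] === undefined) {
      return;
    }
    if (!typeChecks[fields[key]](input[key])) {
      throw new Error('Field "' + key + '" is not of type ' + fields[key]);
    }
    output[key] = input[key];
  });
  return output;
}

// A DataSet: one header object plus any number of rows.
function buildDataSet(setSchema, head, rows) {
  return {
    head: filterFields(setSchema.head, head),
    data: rows.map(function(row) {
      return filterFields(setSchema.data, row);
    })
  };
}

const dataSet = buildDataSet(
  exampleSchema.reviews,
  { reviewerName: 'Jane Doe' },
  [{ subject: 'A cafe', review: 'Good coffee.', starRating: 4, trackingId: 'x1' }]
);
console.log(JSON.stringify(dataSet));
```

In this sketch the trackingId field is silently dropped because the schema doesn't list it, while a starRating passed as a string would raise an error.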

Note that descriptive metadata (like column titles) isn't parsed by dataset.js, but it is used in rendering results both in the extension and on the website. Simply follow the example and make sure you add longer descriptions and shorter labels for all your data. Right now we add the language code 'en' for all labels and descriptions, so we can potentially support translations later.

The schema version is a simple integer and should be increased with each update to the schema once the first data has been published (you can keep it at 1 while in development).

Plugin registration

To create a new plugin, first add the relevant directory under extension/src/plugins and create an index.js and schema.json. Keep directory and filenames all lowercase (use a hyphen for separating words), since not all platforms store files in a case-sensitive way.

In addition, you will need to register your plugin in extensions/sites.json; the existing entries in that file can serve as a template.

Once you've added a plugin to sites.json, you will want to reload the extension to refresh the matching rules.

Retrieving data with jQuery

If you haven't worked with your browser's developer tools before, you may want to read the dev tools tutorial for Chrome/Chromium before getting started on your plugin in earnest. In many cases sites don't offer a full, unlimited API to get your data, so you'll need to process HTML. Dev tools help you analyze the page's layout, and jQuery makes it easy to process the HTML.

A plugin can do everything that the user can do, though sometimes you'll have to jump through a few hoops to do it. freeyourstuff.cc is not a web scraping tool -- it's a tool for managing data you own. So a plugin should always first check that the user is logged in (a visual element you can use a jQuery selector on is a fine indicator).

Depending on the content type you want to retrieve, you typically want to look for a page that shows all information of that type (all reviews, comments, posts, etc.). Take into account that there's usually some pagination for users with lots of content.

What you have to do next depends on how the page is built. Use your dev tools to look at the browser's network requests to see if the page already arrives in your browser fully formed (static HTML) or if it's dynamically constructed using JavaScript requests.

The former case is more common and more straightforward: you loop through the relevant elements using jQuery, stash them in an array, fetch follow-up pages as needed, and then finally send the result (schema-parsed using dataset.js) back to the extension.
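
The pagination loop for the static case can be sketched as below. The function and parameter names are hypothetical, and the page data is faked so the example runs without a browser. In a real plugin, fetchPage could wrap jQuery's $.get(url, callback), and extract would run jQuery selectors over the returned HTML to pull out the rows and the link to the next page.

```javascript
// Walk through paginated results, collecting rows until no next page is left.
function collectAllPages(firstURL, fetchPage, extract, done) {
  const rows = [];
  function processPage(url) {
    fetchPage(url, function(body) {
      const page = extract(body);
      page.rows.forEach(function(row) { rows.push(row); });
      if (page.nextURL) {
        processPage(page.nextURL);
      } else {
        done(rows);
      }
    });
  }
  processPage(firstURL);
}

// Fake pages standing in for a site's HTML (illustration only).
const fakePages = {
  '/reviews?page=1': { rows: ['first', 'second'], nextURL: '/reviews?page=2' },
  '/reviews?page=2': { rows: ['third'], nextURL: null }
};

function fakeFetch(url, callback) {
  callback(fakePages[url]);
}

let collected = null;
collectAllPages(
  '/reviews?page=1',
  fakeFetch,
  function(body) { return body; }, // the fake pages are already "extracted"
  function(rows) { collected = rows; }
);
console.log(collected);
```

Keeping the fetch and extract steps as separate parameters makes it easier to exercise the loop without network access, which helps when the same code also has to run under the test runner.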

The latter case can get fairly hairy. You may have to figure out how a website's internal API works. This also often involves some kind of generated token which you have to extract from the page source. Check out the TripAdvisor plugin's AJAX code for an example of this. On the flip side, an API may be less fragile than the page's layout.

Basic tests

Many people don't like to write tests, but since plugins are likely to break every now and then, it's a requirement for any new plugin submission to at least be basically testable. Fortunately, that may be a matter of adding a single statement to your code:

var jsonTests = {
  someData: retrieveSomeData,
  someOtherData: retrieveSomeOtherData
};

The test runner in service/tests will attempt to run the functions defined in this variable for all plugins defined in sites.json and write the results to tests/results/pluginname.description.json, where description is the key in the jsonTests variable (e.g., "someData" in the example above).

The fact that the test runner sits in the service/ directory indicates that it runs under Node.js. To run the tests, you'll first need a working Node.js installation (version 4 or later) on your system. Then you'll need to run npm install within the service/ directory in order to get all the packages the service relies on.

If you want, you can play with the service itself (for example, to test importing data obtained through your plugin), but in order to run the plugin tests you'll simply need to run node tests in the service/ directory. You can also specify individual plugins, like so: node tests pluginname.

We have to do some magic to make code that was written for the client side fully runnable in the Node.js environment. Specifically, we're using the jsdom package to simulate a page loaded in a browser, including the DOM, the window object, and so on. We inject the jQuery library and the plugin code into this simulated page. Keep this in mind when debugging -- jsdom doesn't yet behave exactly the way a browser would in every case.

Web-based login forms often trigger captchas. To avoid those, and to simply re-use existing credentials, we look for the user's local Chrome cookie database and run the tests with the cookies found there. On the first run, the JSON result of each plugin is simply stored. On later runs, we compare the stored JSON result with the new one and highlight any differences.
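
The comparison step could look something like the following. This is a simplified sketch rather than the test runner's actual implementation; findDifferences is a hypothetical helper that reports the dotted path of every value that changed between a stored result and a fresh one.

```javascript
// Return the paths of all values that differ between two JSON-like values.
function findDifferences(oldValue, newValue, path) {
  path = path || '';
  const bothObjects = oldValue !== null && newValue !== null &&
    typeof oldValue === 'object' && typeof newValue === 'object';
  if (!bothObjects) {
    return oldValue === newValue ? [] : [path || '(root)'];
  }
  // Visit every key present on either side (array indices count as keys).
  const keys = new Set(Object.keys(oldValue).concat(Object.keys(newValue)));
  let differences = [];
  keys.forEach(function(key) {
    differences = differences.concat(
      findDifferences(oldValue[key], newValue[key], path ? path + '.' + key : key)
    );
  });
  return differences;
}

const storedResult = { reviews: [{ subject: 'A cafe', starRating: 4 }] };
const freshResult = { reviews: [{ subject: 'A cafe', starRating: 5 }] };
console.log(findDifferences(storedResult, freshResult));
```

An empty list means the plugin still returns the same data; anything else points to the fields that changed, either because the content changed or because the site's layout broke the plugin.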

HTTP vs. HTTPS

One common pitfall for some sites is that they support both HTTP (unencrypted connection) and HTTPS. If the user is on HTTP but the resource we need to fetch is on HTTPS, the browser's security model won't let us fetch that resource.

There doesn't seem to be an elegant way to address this. If we redirect the user (and extension/src/popup.js provides some facilities to do so), Chrome no longer allows us to insert scripts without special permissions for that domain. We can ask for those permissions at install time, but that's very scary for the user.

Instead, in the Amazon.com plugin where we encounter this issue, we simply ask the user to visit the HTTPS version of the site. The good news is that this problem should slowly go away, as freeyourstuff.cc targets logged-in users of sites, and it's a terrible security practice to send logged-in users of a site to its unencrypted version.
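
A check along these lines could be sketched as follows. It is not taken from the Amazon.com plugin; the function names and URLs are made up for illustration, and the sketch relies only on the standard URL class available in browsers and in current Node.js versions.

```javascript
// True if the page and the resource it wants to fetch use different protocols.
function protocolMismatch(pageURL, resourceURL) {
  const page = new URL(pageURL);
  // Relative resource URLs are resolved against the page URL.
  const resource = new URL(resourceURL, pageURL);
  return page.protocol !== resource.protocol;
}

// The HTTPS address we could ask the user to visit instead.
function httpsVersion(pageURL) {
  const url = new URL(pageURL);
  url.protocol = 'https:';
  return url.href;
}

const currentPage = 'http://www.example.com/profile';
const neededResource = 'https://www.example.com/reviews';

if (protocolMismatch(currentPage, neededResource)) {
  console.log('Please visit ' + httpsVersion(currentPage) + ' and try again.');
}
```

In a plugin, the page URL would come from window.location.href, and the notice would be sent to the popup rather than logged to the console.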

Happy hacking!

freeyourstuff.cc is born out of a simple motivation -- we think users ought to be able to control the destiny of the content they contribute around the web. Whether this remains a singular experiment useful for a few websites or turns into a larger movement is entirely up to you! We hope you join us in turning this into a better tool, and look forward to your pull requests. You're welcome to subscribe to our mailing list for developers and users. You can also find us on irc.freenode.net on the #freeyourstuff channel (you can use the web interface if you're new to IRC).