Plugins enable the freeyourstuff.cc browser extension to retrieve your content for a supported website. We hope you decide to join the freeyourstuff.cc community and add support for your favorite sites.
freeyourstuff.cc is written entirely in JavaScript. It consists of a browser extension and a Node.js web service. The browser extension loads the plugins; the Node.js service lets users upload and browse datasets. For the up-to-date list of currently supported sites, see the file sites.json.
To write a plugin, you'll need the following:
There are countless free resources to help you get started with JavaScript; we can recommend Eloquent JavaScript by Marijn Haverbeke, exercism.io as an exercise platform, and Nodeschool for self-directed learning of Node.js.
Basic familiarity with ECMAScript 6 (ES6, also known as ES2015) is helpful; you may wish to consult the ES6 cheatsheet for quick reference.
jQuery has its own learning center; so does GitHub. Finally, while programming, you may want to keep a DevDocs tab open. As a code editor, we recommend Atom, which has good Git integration. All of these tools are free and (except for GitHub itself) open source. Yay!
With that out of the way, let's get started. First, you'll need a Git checkout of the freeyourstuff.cc repository. If you want to skip the Git part for now, you can download a ZIP file that contains the latest version of the repository. To install the extension, unpack it into your working directory and then follow the instructions for loading unpacked extensions.
As you work on the extension, you will sometimes have to reload it from your extensions list. Plugins, however, are loaded dynamically, so you should be able to skip that step while working on your plugin. The plugins themselves can be found in the extension/src/plugins directory. Go ahead and look at the contents of those folders. Each plugin's index.js contains the main code, while its schema.json describes the data the plugin can handle.
To write a plugin, you don't need to worry too much about the main architecture of the extension. Still, you may want to take a look at Google's documentation of Chrome extension development. Essentially, our extension injects the plugin's JavaScript into a supported page like Yelp or Amazon.com. The plugin is a content script in Chrome extension terminology and runs in what's called an "isolated world"; it can't modify the scripts run by the site, but it can access the document, fetch pages from the same domain, and so on.
A plugin communicates with the rest of the extension through messages, which it receives by listening for relevant events using the Chrome API. It can listen for messages from the extension's popup (the thing that opens when you click the button in the browser). These messages will typically tell it to go and fetch some data. Once it's done that, the plugin then sends a message back to a background page that's always listening. That way, we don't need to have the popup open to show the results.
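The message flow described above can be sketched roughly as follows. This is an illustration, not the extension's actual code: the message shapes ({ action: 'retrieve' } and { action: 'dispatch', data }) and the function names are assumptions made for this example. Only chrome.runtime.onMessage.addListener and chrome.runtime.sendMessage are standard Chrome extension APIs.

```javascript
// Hypothetical message shape; the real extension may use different fields.
function buildResultMessage(data) {
  return { action: 'dispatch', data };
}

// Decide how to react to a message from the popup. Returns the message to
// send to the background page, or null if the request is not a retrieval.
function handlePopupMessage(request, retrieve) {
  if (!request || request.action !== 'retrieve') {
    return null;
  }
  return buildResultMessage(retrieve());
}

// Only wire up the listener when the Chrome extension API is present.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(request => {
    // Placeholder retrieval function; a real plugin would collect page data.
    const reply = handlePopupMessage(request, () => ({ reviews: [] }));
    if (reply) {
      chrome.runtime.sendMessage(reply);
    }
  });
}
```

Keeping the decision logic in a plain function means it can be exercised without a browser, while the listener registration stays a thin wrapper.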
While it's doing its thing, a plugin can send little updates back to the popup, showing progress notices or error messages. A number of convenience functions are provided for this purpose, which you can find in extension/src/plugin.js.
Use the plugin.setup function to ensure that your plugin code is only loaded when the document is ready. A plugin should call plugin.busy() when it is doing work, and plugin.done() when it is done working. These functions make sure that the button state and the spinner in the popup behave as they should.
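One way to keep busy() and done() paired is to wrap the work in try/finally, so the spinner is reset even if retrieval fails. The sketch below uses a stand-in plugin object so that it runs on its own; only the names busy() and done() come from this guide, and withBusyState is a hypothetical helper, not part of extension/src/plugin.js.

```javascript
// Stand-in for the helpers in extension/src/plugin.js, so the sketch is
// self-contained. The "working" flag is an assumption for illustration.
const plugin = {
  working: false,
  busy() { this.working = true; },
  done() { this.working = false; }
};

// Hypothetical helper: run a task between busy() and done(), and make sure
// done() is called even if the task throws or rejects.
async function withBusyState(task) {
  plugin.busy();
  try {
    return await task();
  } finally {
    plugin.done();
  }
}
```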
Within a plugin, you'll generally want to be careful with direct calls to the Chrome extension API. The reason is that plugins can also get called in the Node.js context for testing purposes, in which case that API is not available. A sane way to check for the existence of the API is the expression typeof chrome !== 'undefined'. We typically wrap extension-only code in functions and call them after this check.
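As a minimal sketch of that pattern: the function names and the { action: 'notice', text } message shape are made up for this example, while the typeof check is the one given above.

```javascript
// True inside the extension, false under Node.js (e.g. in the testrunner).
function hasChromeAPI() {
  return typeof chrome !== 'undefined';
}

// Hypothetical extension-only function: it touches the Chrome API, so it is
// only ever called after the check in notify().
function reportProgress(text) {
  chrome.runtime.sendMessage({ action: 'notice', text });
}

// Safe to call in both environments. Returns whether a notice was sent.
function notify(text) {
  if (hasChromeAPI()) {
    reportProgress(text);
    return true;
  }
  return false;
}
```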
Plugins handle different kinds of data. We send data around in JavaScript's lightweight JSON format, and we use very simple schemas to describe the data structure. The point of those schemas is primarily to ensure that we only stash the data we actually want, enforce type constraints, and can track schema changes.
These schemas are pretty important; they're also used by the freeyourstuff.cc web service to parse data submissions and store them in its MongoDB backend. The schema system is still pretty basic and designed for textual data right now. We'll need to make changes to it over time to support new data types and more complex data structures.
Check out extension/src/examples/yelp.js as an example of data represented by a schema, and the schema itself. If you view your local copy of extension/src/result.html in your browser, it will attempt to render this data using extension/src/result.js. You can use this method to verify that your schema and data are handled as expected -- feel free to submit pull requests for additional examples.
Data is validated against its schema by extension/src/dataset.js. A single schema describes all the kinds of data that can be stored for a given site (e.g., reviews, comments, blog posts). Together, we call this a SiteSet, while we call the sets that comprise it DataSets. Each DataSet has a header (for non-repeating information like the author) and rows of data. That's it.
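To make the header-plus-rows idea concrete, here is a simplified sketch of schema-based filtering. It is not the dataset.js implementation, and the schema layout (head and data keys, 'text' and 'number' types) and all field names are assumptions for illustration; consult an existing plugin's schema.json for the real format.

```javascript
// Hypothetical, simplified schema for one DataSet.
const reviewSchema = {
  head: { reviewerName: 'text' },
  data: { subject: 'text', starRating: 'number' }
};

const typeChecks = {
  text: value => typeof value === 'string',
  number: value => typeof value === 'number' && !Number.isNaN(value)
};

// Keep only the keys the schema declares; throw on a type mismatch.
function filterBySchema(record, fields) {
  const result = {};
  for (const [key, type] of Object.entries(fields)) {
    if (record[key] === undefined) continue;
    if (!typeChecks[type](record[key])) {
      throw new TypeError(`Field "${key}" should be of type ${type}`);
    }
    result[key] = record[key];
  }
  return result;
}

// Build a DataSet: one filtered header plus filtered rows.
function buildDataSet(schema, head, rows) {
  return {
    head: filterBySchema(head, schema.head),
    data: rows.map(row => filterBySchema(row, schema.data))
  };
}
```

Undeclared keys are silently dropped, which mirrors the goal of only stashing the data we actually want, while type mismatches fail loudly.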
Note that descriptive metadata (like column titles) isn't parsed by dataset.js, but it is used in rendering results both in the extension and on the website. Simply follow the example and make sure you add longer descriptions and shorter labels for all your data. Right now we add the language code 'en' for all labels and descriptions, so we can potentially support translations later.
The schema version is a simple integer and should be increased with each update to the schema once the first data has been published (you can keep it at 1 while in development).
To create a new plugin, first add the relevant directory under extension/src/plugins and create an index.js and schema.json. Keep directory and filenames all lowercase (use a hyphen for separating words), since not all platforms store files in a case-sensitive way.
In addition, you will need to register the plugin in extensions/sites.json, using the following keys:
name: the proper name of the supported site, as the site itself uses it.
plugin: the name of the subdirectory in which the plugin can be found.
deps: (optional) additional JavaScript library dependencies (array). Currently supported: "papaparse" for the CSV parser PapaParse, "moment" for the date/time library Moment.js, and "numeral" for the number parser Numeral.js.
canonicalURL: when we link to the site, what URL should we link to?
regex: the regular expression the extension uses to show its icon, using a PageStateMatcher's urlMatches rule, following RE2 syntax.
supported, unsupported: the types of content we can and cannot retrieve for the given site. This is only shown in the popup.
extraPermissions: (optional, true or false) whether we require full cross-origin permissions for the site (primarily needed if we need to visibly navigate); these will be requested at runtime using Chrome's optional permissions system.
Once you've added a plugin to sites.json, you will want to reload the extension to refresh the matching rules.
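An entry with these keys might look like the object below, written as a JavaScript literal so it can be checked quickly. Every value is made up for an imaginary site, and the assumptions that supported and unsupported are arrays of strings and that all keys not marked optional are required are ours, not confirmed by sites.json. Note also that JavaScript's RegExp accepts constructs RE2 does not (such as lookaheads and backreferences), so the URL check is only an approximation of what the extension will match.

```javascript
// Hypothetical entry for an imaginary site; all values are invented.
const exampleSite = {
  name: 'Example Reviews',
  plugin: 'example-reviews',
  deps: ['moment'],
  canonicalURL: 'https://www.example.com/',
  regex: '^https?://(www\\.)?example\\.com/.*',
  supported: ['reviews'],
  unsupported: ['photos'],
  extraPermissions: false
};

// Assumed to be required because the guide does not mark them optional.
const requiredKeys = ['name', 'plugin', 'canonicalURL', 'regex', 'supported', 'unsupported'];

// List required keys that an entry is missing.
function missingKeys(site) {
  return requiredKeys.filter(key => !(key in site));
}

// Approximate check of the URL pattern using JavaScript's own RegExp.
function matchesSite(site, url) {
  return new RegExp(site.regex).test(url);
}
```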
If you've not worked with your browser's developer tools before, for Chrome/Chromium, you may want to read the dev tools tutorial before getting started on your plugin in earnest. In many cases sites don't offer a full, unlimited API to get your data, so you'll need to process HTML. Dev tools help you analyze the page's layout, and jQuery makes it easy to process the HTML.
A plugin can do everything that the user can do, though sometimes you'll have to jump through a few hoops to do it. freeyourstuff.cc is not a web scraping tool -- it's a tool for managing data you own. So a plugin should always first check that the user is logged in (a visual element you can use a jQuery selector on is a fine indicator).
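A login check can be as small as the sketch below. The selector is hypothetical; pick an element that only exists for logged-in users on the site you are targeting. The example uses the standard querySelector so that it can run against any document-like object; with jQuery, the equivalent test would be $(selector).length > 0.

```javascript
// Hypothetical selector for an element only shown to logged-in users.
const LOGGED_IN_SELECTOR = '.user-account-menu';

// Returns true if the given document contains the logged-in indicator.
function isLoggedIn(doc) {
  return doc.querySelector(LOGGED_IN_SELECTOR) !== null;
}
```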
Depending on the content type you want to retrieve, you typically want to look for a page that shows all information of that type (all reviews, comments, posts, etc.). Take into account that there's usually some pagination for users with lots of content.
What you have to do next depends on how the page is built. Use your dev tools to look at the browser's network requests to see if the page already arrives in your browser fully formed (static HTML) or if it's dynamically constructed using JavaScript requests.
The former case is more common and more straightforward: you loop through the relevant elements using jQuery, stash them in an array, fetch follow-up pages as needed, and then finally send the result (schema-parsed using dataset.js) back to the extension.
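The pagination part of that loop can be sketched independently of any particular site. In this example, fetchPage is an injected, hypothetical function that takes a URL and resolves to { rows, nextURL }; in a real plugin it would fetch the page and extract the rows and the "next" link with jQuery. The maxPages limit is an added safeguard, not something the extension prescribes.

```javascript
// Collect rows from consecutive pages until a page reports no successor.
// fetchPage(url) must resolve to { rows: Array, nextURL: string | null }.
async function collectAllPages(firstURL, fetchPage, maxPages = 100) {
  const rows = [];
  let url = firstURL;
  let pages = 0;
  // Stop when there is no next page, or when the safety limit is reached.
  while (url && pages < maxPages) {
    const page = await fetchPage(url);
    rows.push(...page.rows);
    url = page.nextURL;
    pages++;
  }
  return rows;
}
```

Injecting the fetcher keeps the loop testable with canned pages and leaves the site-specific parsing in one place.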
The latter case can get fairly hairy. You may have to figure out how a website's internal API works. This also often involves some kind of generated token which you have to extract from the page source. Check out the TripAdvisor plugin's AJAX code for an example of this. On the flip side, an API may be less fragile than the page's layout.
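Extracting such a token often comes down to a regular expression over the page source. The sketch below assumes, purely for illustration, that the page embeds a JSON-style "apiToken" property; the property name is hypothetical and this is not how the TripAdvisor plugin does it.

```javascript
// Hypothetical: assumes the page source contains something like
//   "apiToken":"abc123"
// Returns the token string, or null if no token is found.
function extractToken(pageSource) {
  const match = /"apiToken"\s*:\s*"([^"]+)"/.exec(pageSource);
  return match ? match[1] : null;
}
```

Returning null rather than throwing lets the plugin report a readable error to the popup when the site changes its markup.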
The most complex plugins are the ones that have to anticipate complex AJAX requests and DOM changes. See the Quora plugin for how crazy this can get!
Many people don't like to write tests, but since plugins are likely to break every now and then, it's a requirement for any new plugin submission to at least be basically testable. Fortunately, that may be a matter of adding a single statement to your code:
window.freeyourstuff.jsonData = { someData: retrieveSomeData };
The testrunner in the cli directory will attempt to run the functions defined in this variable for all plugins defined in sites.json and write the results to tests/results/pluginname.description.json, where description is the key of the function in that object (e.g., "someData" in the example above).
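The naming rule for result files can be illustrated as follows. This is a sketch of the convention described above, not the testrunner's code: resultPath and plannedResultFiles are hypothetical helpers, and the test function is a placeholder.

```javascript
const path = require('path');

// Test functions keyed by description, as in the statement above.
// The function body is a placeholder for real retrieval code.
const jsonData = { someData: () => ({ reviews: [] }) };

// Build the result file path for one plugin and one test description.
function resultPath(pluginName, description) {
  return path.posix.join('tests', 'results', `${pluginName}.${description}.json`);
}

// List the result files a plugin's test object would produce.
function plannedResultFiles(pluginName, tests) {
  return Object.keys(tests).map(description => resultPath(pluginName, description));
}
```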
The testrunner depends on Puppeteer, a Node.js library to control the Chromium browser in its headless mode (i.e., the browser is invisible, but otherwise behaves as a normal browser). You'll need to run npm install within the cli/ directory in order to get all the packages the testrunner relies on, which includes a copy of the Chromium browser.
Since we depend on user session data, you'll need to make a copy of your Chromium profile directory. This is the path under "Profile Path" in chrome://version/, minus the "Default" bit at the end. Copy this directory to ".chromium-testing-profile" in your home directory. You need to work with a copy, otherwise you run the risk of corrupting your profile! To run the tests, type node bin/run-tests from the cli directory, and add "--help" for some additional options. Some plugins let you specify a profile URL on the command line, but this feature is still very experimental.
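If you want to experiment with Puppeteer against the copied profile yourself, launch options along these lines should work. This is an assumption about how such a setup can be done, not a description of what bin/run-tests does internally; headless and userDataDir are standard puppeteer.launch options, while testingLaunchOptions is a hypothetical helper.

```javascript
const os = require('os');
const path = require('path');

// Build launch options that point Puppeteer at the copied profile in the
// user's home directory.
function testingLaunchOptions(homeDir = os.homedir()) {
  return {
    headless: true,
    userDataDir: path.join(homeDir, '.chromium-testing-profile')
  };
}

// Usage, assuming Puppeteer is installed (npm install in cli/):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(testingLaunchOptions());
```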
freeyourstuff.cc was born out of a simple motivation -- we think users ought to be able to control the destiny of the content they contribute around the web. Whether this remains a singular experiment useful for a few websites or turns into a larger movement is entirely up to you! We hope you join us in turning this into a better tool, and look forward to your pull requests. You're welcome to subscribe to our mailing list for developers and users. You can also find us on irc.freenode.net on the #freeyourstuff channel (you can use the web interface if you're new to IRC).