
Gatsby.js

Fast in every way that matters. Gatsby is a free and open source framework based on React that helps developers build blazing fast websites and apps.


Why is sourceNodes hook called before all onCreateNode callbacks are finished?

February 26, 2020 at 8:20am

I have a question regarding the sourceNodes API. The Gatsby documentation at https://www.gatsbyjs.org/docs/node-apis/ clearly states that “If you define this hook in gatsby-node.js it will be called exactly once after all of your source plugins have finished creating nodes.” However, when I implement the sourceNodes hook, I can see with console.log that sourceNodes is called before all onCreateNode callbacks have been called. In the current setup, over 200 onCreateNode callbacks are executed after sourceNodes has executed. I would like all onCreateNode callbacks to have been called before the sourceNodes hook is executed, exactly as the docs state. I need this in order to fetch git information for markdown files, and to avoid traversing the git history multiple times, the sourceNodes hook would be the ideal place for me to do it.

February 26, 2020 at 8:28am
Without seeing any code, my guess would be that you're executing async functions (promises) without waiting for them to return. If you don't wait for the promises to resolve, Gatsby will continue the build process on to the next hook.
Thanks for your answer, but I am not executing any async functions here that I would need to wait for. I have exported these APIs from gatsby-node.js in the root folder:
const { onCreateNode, sourceNodes } = require('./gatsby/onCreateNode');
exports.onCreateNode = onCreateNode;
exports.sourceNodes = sourceNodes;
In the file ./gatsby/onCreateNode the functions look as follows:
let counter = 0;

exports.onCreateNode = async ({ node, actions, getNode }) => {
  if (node.internal.type !== 'MarkdownRemark' && node.internal.type !== 'Mdx') {
    return;
  }
  counter++; // assumed: the posted snippet omits the increment, but the log output shows the counter rising
  console.log("counter:", counter);
};

exports.sourceNodes = async ({ actions, getNodesByType }) => {
  console.log("ENTER sourceNodes"); // <-- This line is printed BEFORE last counter log in onCreateNode
};
The log output looks like:
counter: 107
counter: 108
ENTER sourceNodes
counter: 109
counter: 110
counter: 111
counter: 112
counter: 113
counter: 114
counter: 115
and this continues until the last log entry: counter: 313
Again, I'm not 100% sure, but I think it's to do with the lack of a return statement in your sourceNodes function. See the docs here for an explanation. Essentially an async function must explicitly return. Try changing your sourceNodes function like so:
exports.sourceNodes = async ({ actions, getNodesByType }) => {
  console.log("ENTER sourceNodes"); // <-- This line is printed BEFORE last counter log in onCreateNode
  return; // This line is added
};
I suppose you meant adding the return to the onCreateNode callback? I have tested with a return in both sourceNodes and onCreateNode, but it did not change the behavior at all. My problem is that I want all 313 onCreateNode callbacks to execute before the one and only call of sourceNodes, or alternatively, if that really is not possible, to somehow wait in sourceNodes for the last callback (as a workaround). But how do I identify the last callback? It depends on the number of nodes (Markdown files), which could be any number, since the number of pages is expected to change and will most likely grow in the future.
“Essentially an async function must explicitly return”
An explicit return is not required for a function that is marked async. As soon as you mark it async, it will always return a promise; not defining an explicit return means that it is returning a Promise<void>, just like an empty return statement. When sourceNodes returns a promise, it is expected to be a Promise<void>, so that return type is perfectly valid.
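This can be checked in plain Node, outside Gatsby. The sketch below is only an illustration: the names `withoutReturn`, `withEmptyReturn`, and `compareReturns` are made up for this example, and nothing in it touches the Gatsby API.

```javascript
// Hypothetical helper names; only standard JavaScript is used here.
async function withoutReturn() {
  // No return statement at all.
}

async function withEmptyReturn() {
  return;
}

async function compareReturns() {
  const a = withoutReturn();
  const b = withEmptyReturn();
  return {
    // Both calls hand back a promise, with or without an explicit return.
    bothArePromises: a instanceof Promise && b instanceof Promise,
    // Both promises resolve to undefined.
    valueWithoutReturn: await a,
    valueWithEmptyReturn: await b,
  };
}

compareReturns().then((result) => console.log(result));
```

Both variants behave the same from the caller's point of view, which is why adding `return;` to sourceNodes made no difference.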
onCreateNode is called for each node that is created. If you look at the createNode action and the apiRunnerNode function, you'll see that the createNode action always runs asynchronously. Every single source plugin I've ever looked at calls createNode(), and then returns immediately. It doesn't do return createNode()... so it's never returning a promise waiting to be resolved... it's returning an already resolved Promise<void>. Meaning, the program will not wait for onCreateNode() to be resolved before moving on to the next node.
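To make the ordering concrete, here is a small simulation of that fire-and-forget pattern. It is a sketch under assumptions, not Gatsby's real implementation: `fakeCreateNode`, `fakeSourcePlugin`, and the listener list are invented for the example, and deferring callbacks with `setImmediate()` is just one way to model "runs later".

```javascript
// Simulation only: every name below is hypothetical, not a Gatsby internal.
async function runSimulation() {
  const log = [];
  const listeners = [(node) => log.push(`onCreateNode ${node.id}`)];

  // Schedules the listeners for later instead of running them inline.
  const fakeCreateNode = (node) => {
    setImmediate(() => listeners.forEach((listener) => listener(node)));
  };

  // Mimics a source plugin that calls createNode() and returns immediately.
  const fakeSourcePlugin = async () => {
    fakeCreateNode({ id: 'a' }); // not awaited, not returned
    fakeCreateNode({ id: 'b' });
    log.push('source plugin returned');
  };

  await fakeSourcePlugin();
  log.push('ENTER sourceNodes');

  // Give the scheduled callbacks a chance to run before reading the log.
  await new Promise((resolve) => setImmediate(resolve));
  return log;
}

runSimulation().then((log) => console.log(log.join('\n')));
```

In this simulation the "ENTER sourceNodes" entry lands before both "onCreateNode" entries, which matches the shape of the log in the question.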
Also important to note is that Gatsby uses Bluebird promises, which use setImmediate() for resolving promises.
Standard ES promises work a little bit outside of the event loop phases: their callbacks go into the microtask queue, in the same general area as process.nextTick(). These queues are given priority between the other stages of the event loop, and Node will process everything in them (synchronously, on the main thread) before allowing the next stage of the loop to continue. Work that has been delegated to the worker pool, like filesystem/network/DNS requests, is not affected until it has returned from the pool. This means you can overload the promise/tick queue to the degree that the rest of your program will not be able to run until those queues are empty. That's not a very good way to run an asynchronous program, and is also why things like Node web servers should generally use setImmediate() instead of standard promises. With standard promises, a lot of incoming requests can cause some (or many) of them to time out before the response is resolved, resulting in an unreliable/flaky web server.
setImmediate() works differently. When you queue a callback with setImmediate(), it is added to a queue, but it is not processed until the event loop reaches its check phase (in practice, the current or the next loop iteration). This frees up the event loop to process other requests, instead of leaving them on the back burner.
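The difference in scheduling is easy to observe in Node. The following is a minimal sketch; `eventLoopOrder` is a made-up name for this example, and it only compares a native promise callback with a `setImmediate()` callback (it does not involve Bluebird or Gatsby).

```javascript
// Hypothetical demo function: records the order in which callbacks run.
function eventLoopOrder() {
  const order = [];
  return new Promise((done) => {
    // Queued for the check phase of the event loop.
    setImmediate(() => {
      order.push('setImmediate');
      done(order);
    });
    // Queued as a microtask, which runs before the loop moves on.
    Promise.resolve().then(() => order.push('promise'));
    // Plain synchronous code always runs first.
    order.push('sync');
  });
}

eventLoopOrder().then((order) => console.log(order.join(' -> ')));
// prints "sync -> promise -> setImmediate"
```

Even though `setImmediate()` is called first, its callback runs last, after the synchronous code and the promise callback.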
All of this in combination means that the Gatsby lifecycle is able to resolve all of your plugin's sourceNodes() before all onCreateNode() events for those nodes have also been resolved. What this also means is that you can rely on onCreateNode() to be called for every createNode() that is called, but you cannot rely on your sourceNodes() to be called after every onCreateNode() is called.
One thing you can count on is that all File nodes have been created by the time your sourceNodes() is called. You could count all of the .mdx files, and when onCreateNode() counts the last Mdx transform, resolve a promise that triggers sourceNodes() to complete processing.
This works for me, but may need a bit more work:
let counter = 0;
let mdxCount = -1;
let resolver;
let sourceNodesPromise = new Promise((resolve) => {
  resolver = resolve;
});

exports.onCreateNode = ({ node }) => {
  if (node.internal.type !== 'Mdx') return;
  counter++;
  console.log(`${counter} ${node.internal.type}`);
  if (mdxCount > -1 && counter === mdxCount) {
    resolver();
  }
};

exports.sourceNodes = ({ getNodesByType }) => {
  const files = getNodesByType('File');
  mdxCount = files.filter((fileNode) => fileNode.ext === '.mdx').length;
  console.log(`ENTER sourceNodes(); expect count = ${mdxCount}`);
  const finished = () => console.log(`FINISH sourceNodes()`);
  if (mdxCount === 0 || mdxCount === counter) {
    return finished();
  }
  return sourceNodesPromise.then(() => {
    finished();
  });
};

March 5, 2020 at 11:10am
Thanks for your detailed answer. I will try it as soon as I have time.

March 10, 2020 at 4:03pm
One thing I noticed in another project where I needed to do something similar: this part
let sourceNodesPromise = new Promise((resolve) => {
  resolver = resolve;
});
is unreliable, because it may not always have fired by the time the resolver is accessed. I found this to be more reliable:
let counter = 0;
let mdxCount = -1;
let resolver;

const tryResolve = () => {
  if (mdxCount === 0 || counter === mdxCount) {
    if (resolver) resolver();
  }
};

exports.onCreateNode = ({ node }) => {
  if (node.internal.type !== 'Mdx') return;
  counter++;
  console.log(`${counter} ${node.internal.type}`);
  tryResolve();
};

exports.sourceNodes = ({ getNodesByType }) => {
  const files = getNodesByType('File');
  mdxCount = files.filter((fileNode) => fileNode.ext === '.mdx').length;
  console.log(`ENTER sourceNodes(); expect count = ${mdxCount}`);
  const finished = () => console.log(`FINISH sourceNodes()`);
  return new Promise((resolve) => {
    resolver = resolve;
    tryResolve();
  }).then(() => {
    finished();
  });
};
Basically, the resolver doesn't get set until sourceNodes is ready for it. The logic to test the counts is deduplicated, and it makes sure it is ready to go from all angles before resolving.
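The same idea can be pulled out into a small reusable helper and tried without a Gatsby build. This is only a sketch: `createLatch`, `hit`, `expect`, and `latchDemo` are hypothetical names, not part of Gatsby, and the comparison uses `>=` instead of `===` as a defensive assumption in case more items arrive than were counted.

```javascript
// Hypothetical helper: resolves once the expected number of hits has been seen.
function createLatch() {
  let seen = 0;
  let expected = -1; // unknown until expect() is called
  let resolver;

  const tryResolve = () => {
    if (resolver && expected > -1 && seen >= expected) resolver();
  };

  return {
    // Call once per processed item (for example from onCreateNode).
    hit() {
      seen++;
      tryResolve();
    },
    // Call when the total is known (for example from sourceNodes).
    expect(total) {
      expected = total;
      return new Promise((resolve) => {
        resolver = resolve;
        tryResolve();
      });
    },
  };
}

// Stand-alone demo: one hit arrives early, two arrive after expect().
async function latchDemo() {
  const latch = createLatch();
  const events = [];
  latch.hit();
  const done = latch.expect(3).then(() => events.push('released'));
  events.push('waiting');
  setImmediate(() => latch.hit());
  setImmediate(() => latch.hit());
  await done;
  return events;
}

latchDemo().then((events) => console.log(events.join(' -> ')));
```

In a gatsby-node.js file, `latch.hit()` would take the place of the counter increment in onCreateNode, and sourceNodes would return `latch.expect(mdxCount)`.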