Think gatsby core should fulfill webpack with some cache aspects #291

Closed
cusspvz opened this issue May 17, 2016 · 5 comments
Labels
stale? Issue that may be closed soon due to the original author not responding any more.

Comments

@cusspvz

cusspvz commented May 17, 2016

@KyleAMathews we already talked on a PR about the pluggable API system. I'm waiting anxiously for it because I have some ideas that could really push gatsby forward.

Until that happens, I came across an issue related to caching (I'm setting up a long-lived cache for our assets).

Webpack can put a hash in the output file names to bypass the browser cache (appending a query string after the script's public path is a bad idea, since it has no effect on some proxies).
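
To make the distinction concrete, here is a minimal sketch of the two cache-busting styles. The helper names are hypothetical and are not part of webpack or gatsby. In a real webpack config the file-name style comes from placeholders such as [hash] or [chunkhash] in output.filename.

```javascript
// Hypothetical helpers, for illustration only.

// Query-string style: the file name is unchanged, so a proxy that
// ignores the query string may keep serving the stale file.
function withQuery(file, hash) {
  return `${file}?v=${hash}`
}

// File-name style: the hash is part of the path itself, so every
// cache layer sees a different URL after each change.
// Assumes `file` has an extension.
function withHashedName(file, hash) {
  const dot = file.lastIndexOf('.')
  return `${file.slice(0, dot)}-${hash}${file.slice(dot)}`
}

console.log(withQuery('bundle.js', 'abc123'))
console.log(withHashedName('bundle.js', 'abc123'))
```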

I came across a nice solution for getting caching to work flawlessly on gatsby, and I think it could be part of its core.

First I tried to use a different build path on each build, just for the build-javascript (previously build) stage. Example: ${__dirname}/public/assets/${hash}/

I ran into issues because build-css and build-html (previously static) weren't using the new hashed folder, so I set ASSETS_BASE as a shell environment variable, generated on every npm run build in our project. I set the same publicPath for all three stages but set output.path only for build-javascript.

Unfortunately, some requests used the assets folder correctly and others did not, because of differences between the CDN and our staging environment.
I wrote a few bash scripts and told nginx to use non-caching redirects whenever it receives an asset request for the root folder, just in case, and things worked out.

Here's our current config regarding those changes:

gatsby-node.js

  if ( env === 'build-css' || env === 'build-html' || env === 'build-javascript' ) {
    config.merge({
      output: {
        publicPath: `http://${BUILD_DOMAIN}/${ASSETS_BASE}/`
      }
    })
  }

  if ( env === 'build-javascript' ) {
    config.merge({
      output: {
        path: `${__dirname}/public/${ASSETS_BASE}/`,
      }
    })
  }

package.json - scripts

    "build": "export ASSETS_BASE=assets/$(date | md5sum | cut -c -10); for x in clean gatsby sync symlinks minify-images scramble-js; do npm run build-$x || exit 1; done;",
    "build-clean": "if [ -d public/ ]; then rm -fR public/; fi;",
    "build-gatsby": "gatsby build",
    "build-sync": "cd public; for file in $(ls -p | grep -v / | grep -v .html | grep -v .js); do cp $file ./$ASSETS_BASE/$file; done;",
    "build-symlinks": "cd public; for file in $(ls ./$ASSETS_BASE/*); do basefile=./$(basename $file); [ -f $basefile ] && rm $basefile; echo \"Symlinking $basefile from root to assets.\"; ln -s $file $basefile; done;",

Now that the story is told and you've somehow understood my point of view, here's my proposal:

  • Automatically generate a sitemap (I know it could be done in postBuild, but I think it should be implemented in the core so that everyone using gatsby gets better SEO).
  • Addition of public-endpoint configuration on config.toml (Needed for link's base and sitemap generation)
  • Addition of cdn-endpoint configuration on config.toml (This would default to public-endpoint when undefined, needed for assets publicPath but for me it seems more practical and semantic to have both public-endpoint and cdn-endpoint instead of public-path only)
  • Output all assets into a different hashed folder on each non-dev build for caching purposes.
  • Symlink assets dir, assets files or both from the root
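
As a rough sketch of how the two proposed settings could interact: the function below is hypothetical and assumes config.toml has already been parsed into a plain object. Only the key names public-endpoint and cdn-endpoint come from the proposal above; everything else is an assumption for illustration.

```javascript
// Hypothetical resolver for the proposed config.toml keys.
// `config` is assumed to be the already-parsed config.toml object.
function resolveEndpoints(config, assetsBase) {
  // Drop trailing slashes so the joins below stay predictable.
  const trim = url => url.replace(/\/+$/, '')

  const publicEndpoint = trim(config['public-endpoint'])
  // cdn-endpoint falls back to public-endpoint when undefined.
  const cdnEndpoint = trim(config['cdn-endpoint'] || config['public-endpoint'])

  return {
    // Base for links and sitemap entries.
    linkBase: `${publicEndpoint}/`,
    // Value that would be handed to webpack's output.publicPath.
    publicPath: `${cdnEndpoint}/${assetsBase}/`,
  }
}

console.log(resolveEndpoints({ 'public-endpoint': 'https://example.com/' }, 'assets/abc'))
```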

Semi-related questions:

  • Does gatsby allow config.toml settings to be overridden by environment variables?
  • If not, can I implement it?
@benstepp
Contributor

I'm working on asset hashes for js/css right now. I'm not sold on a public API yet, but the implementation seems good and is working right now. I have a lot of restructuring to do in the build stage to put css and js before html, ideally in separate processes, before this is ready for merge. I have a strong feeling that callback hell might become a problem in build.js.

Would the following implementation of the [chunkhash] and [contenthash] features of webpack solve your problems? Or is there something else that I'm not understanding that you are trying to accomplish?


A quick summary of how it works: Assets get hashed and stored in the main node process in an 'Asset Manager' after webpack returns the stats object from compilation. Webpack config requests the assets as json from the Asset Manager and injects them into the build-html stage of the build process. An example of this json by named chunk looks like:

{
  gatsby: [
    'gatsby-ebd1d09457bbf6196c377b93ebaf8274.css',
    'gatsby-8b4fed66cae0a2499e30.js'
  ]
}
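
A minimal sketch of how such an in-memory store could be filled, assuming it reads the assetsByChunkName field of webpack's stats.toJson() output (a real field whose values may be a single string or an array). The AssetManager class and its method names are hypothetical; the thread does not show the actual implementation.

```javascript
// Hypothetical in-memory store; class and method names are assumptions.
class AssetManager {
  constructor() {
    this.assets = {}
  }

  // `statsJson` is expected to look like the result of stats.toJson().
  addFromStats(statsJson) {
    const byChunk = statsJson.assetsByChunkName || {}
    for (const chunk of Object.keys(byChunk)) {
      const files = byChunk[chunk]
      // webpack reports a string for one file and an array for several.
      const list = Array.isArray(files) ? files : [files]
      this.assets[chunk] = (this.assets[chunk] || []).concat(list)
    }
    return this
  }

  toJSON() {
    return this.assets
  }
}

// One call per webpack compilation (css stage, then js stage).
const manager = new AssetManager()
manager.addFromStats({ assetsByChunkName: { gatsby: 'gatsby-ebd1d09457bbf6196c377b93ebaf8274.css' } })
manager.addFromStats({ assetsByChunkName: { gatsby: ['gatsby-8b4fed66cae0a2499e30.js'] } })
console.log(JSON.stringify(manager))
```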

Then the user would request the assets like:

import { assets } from 'gatsby-helpers'
render() {
  return (
    <html>
      <head>
        {/* defaults to an inline style tag */}
        {assets.css()}

        {/* link rel=stylesheet tag */}
        {assets.css({ inline: false })}

        {/* link rel=stylesheet tag of a custom named chunk */}
        {assets.css({ inline: false, chunk: 'vendor' })}
      </head>
      <body>

        {/* script tag with hash for production, no hash for development */}
        {assets.js()}

        {/* script tag of named custom chunk */}
        {assets.js({ chunk: 'vendor' })}
      </body>
    </html>
  )
}

Basically all of the if process.env.NODE_ENV and prefixLinks stuff that goes on in the starters' html.js files would now happen under the hood for most projects. I want to keep the public API very simple, but also open so that a lot of stuff can happen under the hood in the future (more options passed to the helpers, more assets being returned as an array for code split chunks, config.assetHost, etc.)

@cusspvz
Author

cusspvz commented May 20, 2016

@benstepp thanks for commenting on this!

Would the following implementation of the [chunkhash] and [contenthash] features of webpack solve your problems? Or is there something else that I'm not understanding that you are trying to accomplish?

I've already solved my cache problems. Webpack has mechanisms to apply hashes to invalidate caches, but they won't work since there are 3 different build stages instead of one.

My solution was to create a hash based on an environment variable. My proposal is to add a mechanism to the core that generates a hash on each gatsby build and applies it to all of the build stages.
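
A minimal Node sketch of that idea, mirroring the date | md5sum | cut -c -10 step from the build script earlier in the thread. The function name is hypothetical; the point is only that the value is computed once per build and then reused by every stage.

```javascript
const crypto = require('crypto')

// Hypothetical helper: returns one assets base per build run.
// If ASSETS_BASE is already exported (as in the npm script), reuse it
// so that all build stages agree on the same folder.
function resolveAssetsBase(env = process.env, now = Date.now()) {
  if (env.ASSETS_BASE) return env.ASSETS_BASE
  const hash = crypto
    .createHash('md5')
    .update(String(now))
    .digest('hex')
    .slice(0, 10) // same length as `cut -c -10`
  return `assets/${hash}`
}

console.log(resolveAssetsBase({}))
```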

A quick summary of how it works: Assets get hashed and stored in the main node process in an 'Asset Manager'

Thanks for explaining, but I know how the webpack assets plugin works.
As for your assets API, I kind of liked the inline css part. :-D

Webpack generates a hash per build, which would produce three different paths in one gatsby build, and that's why I used an external environment variable to hash the paths and code instead of using the assets-plugin.

As the hash is injected on the code via Webpack's Define, I have access to the path on the html build even if it isn't built, so I don't see the need to use the assets-plugin with my solution.

I like your API, but, for example, I have my head as a string because I need to inject comments for IE, so I think it should also be able to return just the path.

Let me rewrite the summary of my proposal:

These are the changes I'm proposing to apply to gatsby:

  • Add public-endpoint and cdn-endpoint on config.toml
  • Inject an ASSETS_PATH replacement into every build stage using webpack's DefinePlugin.
  • Add a helper to make inserting assets easier. The helper would base its paths on ASSETS_PATH, so it would know where things sit (or will sit) in the public assets folder.
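
A small sketch of the DefinePlugin part, assuming the replacement is exposed as a bare ASSETS_PATH identifier. DefinePlugin substitutes values as code fragments, so a string has to be passed through JSON.stringify. The function name is hypothetical.

```javascript
// Hypothetical helper that builds the definitions object for
// webpack's DefinePlugin. The value must be a code fragment, so the
// path is JSON-stringified to become a quoted string literal.
function assetsPathDefinitions(assetsPath) {
  return { ASSETS_PATH: JSON.stringify(assetsPath) }
}

// Intended use inside each stage's webpack config (not executed here):
//   plugins: [new webpack.DefinePlugin(assetsPathDefinitions(path))]
console.log(assetsPathDefinitions('/assets/0123456789/'))
```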

cusspvz pushed a commit that referenced this issue May 20, 2016
As things can get pretty complex while explaining, gonna apply here
things proposed over this branch.
@cusspvz
Author

cusspvz commented May 20, 2016

@benstepp I've created a branch to apply things I've proposed over here. Want to discuss the assets method API? cc @KyleAMathews

@benstepp
Contributor

I've already solved my cache problems. Webpack has mechanisms to apply hashes to invalidate caches, but they won't work since there are 3 different build stages instead of one.

This was the problem I was aiming to solve. Basically, we know the only place we need the hashed assets is the build-html stage, so as long as we build the css and js before the html, these assets should be available somewhere. Rather than write to a json file using the webpack-assets-plugin, I thought it best to just keep the assets in memory between webpack compilations.

I would prefer to use webpack's built-in [chunkhash] and [contenthash] over a hash based on the build time. This way, as long as the files don't change across rebuilds, the user won't have to re-download them. There is very little benefit to this now, but in the future, with code splitting, I can see a real benefit here.
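
The trade-off can be sketched with plain Node crypto. This does not reproduce webpack's own hashing; both function names are hypothetical and only illustrate why a content-derived hash stays stable across rebuilds while a time-derived one does not.

```javascript
const crypto = require('crypto')

const md5 = input => crypto.createHash('md5').update(input).digest('hex')

// Depends only on the file contents: unchanged files keep their URL.
function contentHash(source) {
  return md5(source).slice(0, 20)
}

// Depends only on when the build ran: every build changes every URL.
function buildTimeHash(buildTime) {
  return md5(String(buildTime)).slice(0, 10)
}

const source = 'console.log("unchanged bundle")'
console.log(contentHash(source) === contentHash(source)) // same contents, same hash
console.log(buildTimeHash(1000) === buildTimeHash(2000)) // different builds, different hash
```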


As the hash is injected on the code via Webpack's Define, I have access to the path on the html build even if it isn't built....

This is the same solution I came to, but I opted to build the css and js before the html. Ideally we would run the css and js at the same time in separate processes, and that is one change I would like to tackle before merging asset hashes.


I like your API, but, for example, I have my head as a string because I need to inject comments for IE, so I think it should also be able to return just the path.

My proposed solution would have all assets available via the process.env.GATSBY_ASSETS variable for more advanced usage.

Does Gatsby plan on supporting non-react html.js files? If so, should we be aiming more for an api like react-helmet, with toString and toComponent methods on the returned assets?

{assets.css({ inline: false }).toComponent()}
const html = `
<html>
  <head>
    ${assets.css({ inline: false }).toString()}
  </head>
  <body>
    ${assets.js().toString()}
  </body>
</html>
`
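
A rough sketch of the string side of such an API, using the manifest shape shown earlier in the thread. createAssets and the paths property are hypothetical, the inline option is ignored, and toComponent is left out because it would need React. It is only meant to show that a toString method lets the helpers drop straight into a template string, and that bare paths can be exposed alongside it.

```javascript
// Manifest in the shape shown earlier in the thread.
const manifest = {
  gatsby: [
    'gatsby-ebd1d09457bbf6196c377b93ebaf8274.css',
    'gatsby-8b4fed66cae0a2499e30.js',
  ],
}

// Hypothetical factory; not the API that was eventually shipped.
function createAssets(manifest, publicPath = '/') {
  const find = (chunk, ext) =>
    (manifest[chunk] || [])
      .filter(file => file.endsWith(ext))
      .map(file => publicPath + file)

  const wrap = (paths, toTag) => ({
    paths, // bare paths, for heads assembled by hand
    toString: () => paths.map(toTag).join('\n'),
  })

  return {
    js: ({ chunk = 'gatsby' } = {}) =>
      wrap(find(chunk, '.js'), p => `<script src="${p}"></script>`),
    css: ({ chunk = 'gatsby' } = {}) =>
      wrap(find(chunk, '.css'), p => `<link rel="stylesheet" href="${p}">`),
  }
}

const assets = createAssets(manifest)
// Template literals call toString implicitly.
console.log(`<head>${assets.css()}</head><body>${assets.js()}</body>`)
```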

@jbolda jbolda added the stale? Issue that may be closed soon due to the original author not responding any more. label Jun 11, 2017
@stale

stale bot commented Oct 22, 2017

This issue has been automatically closed due to inactivity.

3 participants