What’s the difference between artifacts and cache in GitHub Actions?
GitHub Actions has a couple of ways to store files: artifacts and cache. They have enough functional overlap that it can be difficult to decide which is best to use. Rather than look at the capabilities of these actions (there are enough articles out there on that), let’s look at situations where you would want to use each one.
- Use artifacts if you want to share files between jobs in the same workflow run or view/download files after a workflow run has completed.
- Use cache if you want to share files between workflow runs (e.g. between the runs triggered by different PRs or commits).
Artifacts essentially let you do two things: store files that persist after a job completes, or share files with another job in the same workflow run. You would use artifacts if you want to:
- Access files after a job has completed, such as logs, test coverage reports, or screen recordings of E2E tests. GitHub Actions lets you view or download these files.
- Produce files in one job and use them in another job in the same workflow run. This is the more interesting use case, and the one that can seem to overlap with the cache action.
An example of using artifacts to share files between jobs is a project build step. Let’s say you’re using GitHub Actions to test your application and you have unit tests, API tests, and E2E tests. Each of these test suites needs its own environment and can run in parallel as long as those environments are isolated. So you decide to split your workflow into three jobs:
- Unit tests
- API tests
- E2E tests
As you set up this workflow, you realize that each job needs to build the application. So why not create another job for the build step and then share the build output with the other jobs? You can do this with artifacts.
The “Project Build” job uploads the build files as an artifact, and the three test jobs download them before running. Each test job declares the build job in its “needs” list, so it waits until the upload has finished. That way the build only happens once.
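This job layout (one build job that three test jobs depend on) can be modeled as a small dependency graph. The Python sketch below is not GitHub Actions syntax: the job names are hypothetical, and each job’s set of dependencies stands in for the jobs listed under its “needs” key in a workflow file. Grouping the graph into stages shows which jobs can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job names. Each job maps to the set of jobs it depends on,
# playing the role of the "needs" list in a workflow file.
JOBS = {
    "build": set(),
    "unit-tests": {"build"},
    "api-tests": {"build"},
    "e2e-tests": {"build"},
}


def stages(jobs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into stages; every job in a stage can run in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        result.append(ready)
        sorter.done(*ready)  # mark them finished to unlock their dependents
    return result


if __name__ == "__main__":
    for number, stage in enumerate(stages(JOBS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
    # stage 1: build
    # stage 2: api-tests, e2e-tests, unit-tests
```

The build job runs alone in the first stage, and the three test suites share the second stage. The build therefore happens once, and its artifact serves all three test jobs.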
You can do this with cache as well. However, because cache files are shared between workflow runs (not just within a single run), you would need to know that the files used to build your project have not changed. Otherwise another run could restore an outdated build. This can be a challenge and might not be something you want anyway. Since artifacts are scoped to a single workflow run by default, you don’t need to worry about other runs grabbing them by accident.
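One common way to know whether the build inputs have changed is to make them part of the cache key. You hash the files, so any change to any input produces a different key, and a later run cannot restore a build made from different sources. In workflow files, GitHub Actions provides a hashFiles() expression for this purpose. The Python sketch below only illustrates the idea and does not reproduce that function’s exact algorithm. The function name cache_key and the example file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(prefix: str, root: Path, inputs: list[str]) -> str:
    """Derive a cache key from the names and contents of build-input files.

    Files are hashed in sorted order, so the key depends only on which files
    are listed and what they contain, not on the order they are passed in.
    """
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode() + b"\0")       # the file name is part of the key
        digest.update((root / name).read_bytes())  # and so is its content
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "package-lock.json").write_text('{"lockfileVersion": 3}')
        (root / "index.js").write_text("console.log(1);")
        inputs = ["package-lock.json", "index.js"]

        before = cache_key("build", root, inputs)
        (root / "index.js").write_text("console.log(2);")
        after = cache_key("build", root, inputs)
        print(before == after)  # False: a changed input produces a new key
```

Because the key changes whenever any input changes, a cache keyed this way only pays off when the same inputs recur across runs. That is why per-commit build output is usually a poor fit for cache.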
GitHub Actions cache lets you store files that can be reused across workflow runs. It works best for files that won’t change often, or at least won’t change on every run of the workflow. Still, if you have a reason to share files between workflow runs, even files that change on every run, cache is the way to go. You can even use it to share files between jobs in the same workflow run, though, as we’ve seen, artifacts are likely to work better for that.
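The difference in scope can be summarized with a toy in-memory model. Artifacts are addressed by the run that produced them, while cache entries are addressed by their key alone, so any later run that computes the same key gets the same files. ToyStorage and its method names are hypothetical. The model describes default behavior only and ignores details of the real service, such as branch-based cache access rules, retention periods, and eviction.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyStorage:
    """Toy model of the two storage scopes; not the real GitHub service."""

    artifacts: dict = field(default_factory=dict)  # (run_id, name) -> bytes
    cache: dict = field(default_factory=dict)      # key -> bytes

    def upload_artifact(self, run_id: int, name: str, data: bytes) -> None:
        # Artifacts are stored under the run that produced them.
        self.artifacts[(run_id, name)] = data

    def download_artifact(self, run_id: int, name: str) -> Optional[bytes]:
        # A job only sees artifacts from its own run.
        return self.artifacts.get((run_id, name))

    def save_cache(self, key: str, data: bytes) -> bool:
        # Entries are write-once (as with Actions cache): an existing key
        # is never overwritten, so a new key is needed for new contents.
        if key in self.cache:
            return False
        self.cache[key] = data
        return True

    def restore_cache(self, key: str) -> Optional[bytes]:
        # Lookup ignores the run entirely; only the key matters.
        return self.cache.get(key)


if __name__ == "__main__":
    storage = ToyStorage()
    storage.upload_artifact(run_id=1, name="build", data=b"dist")
    storage.save_cache("deps-abc123", b"node_modules")

    # A second workflow run (run_id=2) sees the cache entry but not the artifact.
    print(storage.download_artifact(run_id=2, name="build"))  # None
    print(storage.restore_cache("deps-abc123"))               # b'node_modules'
```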
The most frequently cited use case for cache is storing project dependencies (e.g. npm, pip, or Gradle packages). These often don’t change between PRs or commits, so caching them and sharing them between workflow runs (i.e. the runs triggered by PRs and commits) makes sense.
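Dependency caches are usually keyed on a hash of the lockfile, with a fallback for when the lockfile changes. The actions/cache action accepts restore-keys for this. If the exact key misses, it looks for entries whose keys start with a restore key and restores the most recently created one. The Python sketch below is a simplified model of that lookup and ignores branch scoping and other details of the real matching rules. The key format npm-Linux-<hash> is a hypothetical example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created: int  # stand-in for the entry's creation time (higher = newer)


def restore(entries: list[CacheEntry], key: str,
            restore_keys: list[str]) -> tuple[Optional[str], bool]:
    """Simplified lookup: return (matched key, exact_hit).

    An exact key match is a full hit. Otherwise each restore key is tried in
    order as a prefix, and the newest entry starting with it is returned.
    """
    for entry in entries:
        if entry.key == key:
            return entry.key, True
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created).key, False
    return None, False


if __name__ == "__main__":
    entries = [
        CacheEntry("npm-Linux-aaa111", created=1),
        CacheEntry("npm-Linux-bbb222", created=2),
    ]
    # The lockfile changed, so its hash (and therefore the exact key) is new.
    print(restore(entries, "npm-Linux-ccc333", ["npm-Linux-"]))
    # ('npm-Linux-bbb222', False): the newest npm cache is restored as a starting point
```

A partial hit still saves time, because the package manager starts from the most recent set of dependencies and only fetches what changed. In a real workflow, the cache-hit output of actions/cache is true only for an exact key match, which is how later steps can tell the two cases apart.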
But you can be more creative with cache if it solves your problem. For example, if you need to download a large database that doesn’t change frequently between builds, you can use cache to store it. Basically, whenever you need to access the same files across workflow runs, use cache.
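The database example follows a “download only on a cache miss” pattern. In a workflow, you would restore the cache first and run the download step only when the cache action reports no exact hit (its cache-hit output). The local Python sketch below shows the same logic, with a directory standing in for the cache. The function get_database, the file naming scheme and the version strings are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_database(cache_dir: Path, version: str, download: Callable[[], bytes]) -> Path:
    """Return a local copy of the database, calling `download` only on a miss.

    The version is part of the file name, so bumping it acts like changing
    the cache key and forces a fresh download.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"database-{version}.bin"
    if not path.exists():  # cache miss: fetch once and keep it for later runs
        path.write_bytes(download())
    return path


if __name__ == "__main__":
    calls = []

    def fake_download() -> bytes:
        calls.append(1)  # count how often the expensive download runs
        return b"large dataset"

    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp)
        get_database(cache_dir, "2024-01", fake_download)
        get_database(cache_dir, "2024-01", fake_download)  # served from the cache
        get_database(cache_dir, "2024-02", fake_download)  # new version, new download
        print(len(calls))  # 2
```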