-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to not save the current state of the object #1099
Comments
@seanlinsley and I discussed this recently in #1046. Can you address some of the concerns I gave there? Of course, Sean suggested omitting How would you handle the situation where someone turned on this config option for one week, then turned it off, then tried to How would you handle compatibility with the experimental association tracking feature? I think it would be OK to say that this option is incompatible, but we have to raise a helpful error if people try to combine them. Would you have people drop their |
@jaredbeck Reification seems to be equivalent to projection. The way to derive the projection of an event sourced record is to start from the beginning and process the changes of all events in the order in which they are created. This can be an expensive operation when a record has a lot of changes, so derived state is usually stored in a separate table.
If reify were to apply changes, rather than using materialized
No, this column can be used like a projection table and increase reification performance. |
Thinking over the |
It seems to me we are talking about two possible designs for a versioning library:
So, I feel like we're discussing a big change. In fact, it feels like the biggest and most fundamental change. You can see why I am nervous about it :) I think it would be cleaner to fork, or just write a different library. I'm concerned about maintaining the complexity of having both "patterns" in one library. In my mind, these two patterns are not compatible. So, if we were to support both patterns in the same library, I would want a really clear separation between them. That is why I suggested dropping the |
I agree with that approach. I don't use it myself very often personally because schema changes between the snapshots can cause problems. |
@seanlinsley what do you think? |
Over the years of running Rails applications, there's only been a handful of times when The ability to accurately For simplicity's sake I think it'd be fine to disable |
Thanks for the thoughtful analysis, Sean. It sounds like we agree on dropping the |
Volkan, sound good? What steps are you going to take next in your implementation of this new feature? |
@jaredbeck Yup, that's good by me. As far as the implementation, I have a patch for RecordTrail module that eliminates the use of object method. I could submit that as a PR, but it would also need to be amplified by a configuration option and other niceties, like an error message. Given my time constraints, and lack of familiarity with the codebase, I may not be able to do all those. But let me know how I can help! |
It appears that while Paper Trail populates both It's common for us to perform analytics on destroyed objects based on e.g. a foreign key. Dropping the With that in mind, it seems there are two paths forward:
{
"id": [1, null],
} Changing the behavior of destroy events might be considered a breaking change, so I think adding a setting to disable use of |
Once we agree on a course of action, I'd be happy to submit a PR. Alternatively, @volkanunsal I'd be more than willing to pair with you |
I'm opposed to this approach. I've stated my concerns here and in #1046. I believe such a setting would confuse users. I'm worried about people turning it on for a little while, then off, after which their database will be inconsistent. Then, they'll try to By contrast, dropping the
That sounds like a great idea. It should be controlled by a per-request setting (in
Volkan, I hope you get the chance to pair with Sean on this. Sean has contributed to PT many times, and has worked on some other big ruby projects. You'll have fun. |
That scenario seems unlikely to me, but I'm imagining a world where we've properly documented the tradeoffs of both approaches so the option for any given application should be clear. However, seeing how the
Looking through the code, it seems like Another option would be for That's persuasive for me b/c it presents a clear choice with simple steps to take. Want to be able to Which raises the question... why do we currently have the ability to disable
Which is odd... b/c That's a big issue when maintaining a long-lived OSS project: past decisions make today's API design much more difficult than it needs to be 💀 |
Expanding out the feature matrix, you could:
That seems reasonable. The remaining question becomes what to do with storing For simplicity's sake I think I'd rather do a major version bump and start storing data in |
I like your analysis, Sean.
Makes sense to me, but I think I prefer the simplicity of your other suggestion: to start storing data in
I guess it allows them to save some disk space. I'd like to keep the If the
Yeah, it's a challenge. Generally speaking, I'm not opposed to breaking changes as long as we provide a convenient, well-documented (and well-deprecated) path to upgrade.
I don't like the idea of introducing a That said, maybe there's a person out there who wants to keep the
I'm OK with that. I'm guessing we don't store I'm hoping this can be broken up into a few PRs to make review easier:
|
I'm not so worried about the novice user situation b/c even if that does happen, it seems like it'd happen early enough in their trial usage of Paper Trail that they wouldn't have lost anything important. I think #507 which introduced If we agree that
Whichever direction we take, these two things would definitely need PRs:
So we can at least work on those in the meantime. |
Here's the migration we ran. After Note that There's at least one spot where this migration is buggy: class DropObjectFromVersions < ActiveRecord::Migration[5.1]
disable_ddl_transaction!
disable_statement_timeout!
def up
# we perform these updates outside of a transaction (& in batches) to avoid locking the table
batch_size = 1_000_000
start = 0
approx_count = execute(<<-SQL.squish).to_a.dig 0, 'reltuples'
select reltuples::int from pg_class where relname = 'versions'
SQL
loop do
break if start > approx_count
stop = start + batch_size
# 50% of the table size is `object`, which we don't use
# https://github.com/paper-trail-gem/paper_trail/issues/1046
#
# before dropping the `object` column, we backfill `object_changes` on destroy
# events so we'll still be able to do analytics queries on destroyed objects
# https://github.com/paper-trail-gem/paper_trail/pull/1123
update <<-SQL.squish
update versions set object_changes = changes
from (
select id,
json_object_agg(object.key, array[object.value, null]) as changes
from versions
join lateral jsonb_each(object) as object on true
where event = 'destroy'
and object_changes is null
and id >= #{start} and id < #{stop}
group by id
) as v
where versions.id = v.id
SQL
# updated_at represents 21% of `object_changes` space usage, but isn't useful for us
# https://github.com/paper-trail-gem/paper_trail/issues/1129
update <<-SQL.squish
update versions
set object_changes = object_changes - 'updated_at'
where object_changes ? 'updated_at'
and id >= #{start} and id < #{stop}
SQL
start += batch_size
end
remove_column :versions, :object
execute 'create extension pg_repack'
# then after the deploy: `pg_repack -j 2 -t versions your_db`
end
def down
fail ActiveRecord::IrreversibleMigration
end
end This code should be added to an initializer to prevent class SkipUpdatedAt
def diff(changes)
changes.except! 'updated_at'
end
end
PaperTrail.config.object_changes_adapter = SkipUpdatedAt.new |
What, no
Nice!
That query is really interesting. I think I understand like 20% of it. I wonder if it's possible in MySQL/MariaDB. |
yeah the lateral join is a little mind-bending. it basically splits each key value pair in |
I think we should have a new top-level section in the readme titled "Large Databases". Can you paste all this stuff in there? We can't re-number readme sections because that breaks links, so this new section would be number 11. |
Thank you to everyone who made this possible! I can now make paper_trail use an incremental change design just by omitting the This makes paper_trail viable for big databases. 👍 |
Is your feature suggestion related to a problem? Please describe.
I'd like to use PaperTrail for an event sourcing project. I only want to keep the object_changes and I don't want to store the current state of the object in the events table.
Describe the solution you'd like to build
I'd like a configuration option that would tell PaperTrail to skip the
object
column, similar to the option to not recordobject_changes
Describe alternatives you've considered
Monkey patching the library, which I'm already doing.
The text was updated successfully, but these errors were encountered: