Memory issues in 1.5 #1148

Open
mikethecalamity opened this issue Jan 28, 2025 · 6 comments

Comments

@mikethecalamity
Contributor

When trying to upgrade from 1.4.0 to 1.5.4, we're seeing memory issues: the memory used by com.networknt.schema.JsonNodePath grows each time an endpoint is run, until the service OOMs. See the memory profiling screenshots below. The first was taken after the endpoint was hit once, the second after it was hit 4 times (the 5th time OOMs).

1st run:
Image

4th run:
Image

@justin-tay
Contributor

What is the endpoint doing? Does it load schemas? Or does it just validate against a set of preloaded schemas?

By default, any schemas that are loaded get cached. This is configurable, but not caching the schemas reduces performance quite significantly. Your dumps just show that more schemas are cached by the 4th run, and hence more locations and paths are cached, which by itself doesn't really say anything.

If your endpoint is taking a schema as input data then you shouldn't be caching the JsonSchemaFactory instance, but there just isn't enough information here to tell what is going on.

I don't really see any change between 1.4.0 and 1.5.4 that would make a difference. There is a change in JsonNodePath to cache the hash value, but that shouldn't be the issue. It would be helpful if you could run a bisect and find out whether there really is a commit that caused a regression.

@mikethecalamity
Contributor Author

mikethecalamity commented Jan 29, 2025

We use a custom SchemaLoader implementation to read the schemas as needed, so they are not pre-loaded. When validating, we reference the schema via URL, but it is actually read from the resources. So essentially the SchemaLoader converts https://myorganization.com/schema.json to resource://schema.json. See the example below.

@Dependent
public class JsonValidator {

    private static final Logger LOGGER = LogManager.getLogger(JsonValidator.class);

    private static final JsonSchemaFactory VALIDATOR;

    static {
        final JsonSchemaFactory factory = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V202012);
        final SchemaLoader schemaLoader = new ResourceSchemaLoader();
        VALIDATOR = JsonSchemaFactory.builder(factory).schemaLoaders(c -> c.add(schemaLoader)).build();
    }

    private final ObjectMapper mapper;

    @Inject
    public JsonValidator(final ObjectMapper mapper) {
        this.mapper = mapper;
    }

    public void validate(String schemaId, String json) throws JsonMappingException, JsonProcessingException {
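        // URI.create avoids the checked URISyntaxException that new URI(...) throws,
        // which this method does not declare; it throws IllegalArgumentException instead.
        final URI schemaUri = URI.create(schemaId);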
        final JsonSchema schema = VALIDATOR.getSchema(schemaUri);
        final JsonNode jsonNode = mapper.readTree(json);
        final Set<ValidationMessage> errors = schema.validate(jsonNode);
        if (!errors.isEmpty()) {
            final StringBuilder sb = new StringBuilder().append("JSON invalid for schema \"").append(schemaId)
                    .append("\"").append(System.lineSeparator());
            for (ValidationMessage message : errors) {
                sb.append("\t").append(message.toString()).append(System.lineSeparator());
            }
            throw new RuntimeException(sb.toString());
        }
    }

    private static class ResourceSchemaLoader implements SchemaLoader {
        private static final List<String> SUPPORTED_SCHEMES = List.of("http", "https");

        @Override
        public InputStreamSource getSchema(AbsoluteIri absoluteIri) {
            if (SUPPORTED_SCHEMES.contains(absoluteIri.getScheme())) {
                try {
                    final URI uri = new URI(absoluteIri.toString());
                    final String filename = FilenameUtils.getName(uri.getPath());
                    Enumeration<URL> urls = getClass().getClassLoader().getResources(filename);
                    while (urls.hasMoreElements()) {
                        URL url = urls.nextElement();
                        if (hasMatchingSchemaId(url, absoluteIri.toString())) {
                            return () -> url.openStream();
                        }
                    }
                    throw new RuntimeException("Unable to find JSON schema for " + absoluteIri.toString());
                }
                catch (IOException | URISyntaxException e) {
                    LOGGER.error("Error trying to read JSON schema for " + absoluteIri.toString(), e);
                    throw new RuntimeException(e);
                }
            }

            throw new RuntimeException("Unsupported scheme " + absoluteIri.getScheme());
        }

        private boolean hasMatchingSchemaId(final URL url, final String requestedSchemaId) {
            try {
                String jsonSchema = IOUtils.toString(url, Charset.defaultCharset());
                try (JsonReader reader = Json.createReader(new StringReader(jsonSchema))) {
                    JsonObject schemaJson = reader.readObject();
                    String id = schemaJson.getString("$id");
                    return requestedSchemaId.equals(id);
                }
            }
            catch (Exception e) {
                LOGGER.error("Error attempting to read resource {}", url.toString(), e);
                return false;
            }
        }
    }
}

The JsonNodePath could be a red herring. I just know it's related to validation because when I turn off the validation, there are no memory issues. See the memory profiling graphs below.

Validation OFF w/ 1gb of memory (after hitting the endpoint 4 times)
Image

Validation ON w/ 1gb of memory (after hitting the endpoint 4 times)
Image

Validation ON w/ 5gb of memory (after hitting the endpoint 10 times)
Image

@mikethecalamity
Contributor Author

I'm going to attempt to bisect the commits like you suggested and see if I can find where it all began.

@justin-tay
Contributor

Is the schemaId the same for all the runs, or is it different for each run? If it is the same for all the runs, then it's likely a bug. If it's different, then it could just be that loading and caching the different schemas takes too much memory. As you have a static JsonSchemaFactory, the schemas are cached between runs and never released.

@mikethecalamity
Contributor Author

It is the same on each run. So I'd expect after the first run that all the relevant schemas are cached. Then subsequent runs use the cached data and no more memory is used by the validator.
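
To make that expectation concrete, here is a minimal sketch of a load-once cache keyed by schema URI. It is a toy stand-in and not the library's implementation: `LoadOnceCache`, `loadCount` and `size` are hypothetical names used only for illustration. With the same key on every call, the loader runs once and the map stays at one entry.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Toy stand-in for a schema cache (hypothetical, not the library's code). */
final class LoadOnceCache<V> {
    private final Map<URI, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final Function<URI, V> loader;

    LoadOnceCache(Function<URI, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for id, invoking the loader only on the first request. */
    V get(URI id) {
        return cache.computeIfAbsent(id, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Number of times the loader was actually invoked. */
    int loadCount() {
        return loads.get();
    }

    /** Number of entries currently held by the cache. */
    int size() {
        return cache.size();
    }
}
```

If the real cache behaves like this for a fixed schemaId, memory attributable to cached schemas should plateau after the first run rather than keep growing.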

@justin-tay
Contributor

I tried but was unable to replicate the out of memory issue. A reproducer unit test will be needed.
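
As a possible starting point for such a reproducer, the sketch below samples used heap after each repetition of an action. `HeapGrowthProbe` and its methods are hypothetical helpers, not part of any library, and `System.gc()` is only a hint to the JVM, so the samples are rough indicators rather than exact measurements.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that records used heap after each run of an action. */
final class HeapGrowthProbe {

    /** Approximate used heap in bytes; System.gc() is a request, not a guarantee. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the action the given number of times and samples used heap after each run. */
    static List<Long> measure(Runnable action, int runs) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            action.run();
            samples.add(usedHeapBytes());
        }
        return samples;
    }
}
```

The action would wrap the validation call with the same schemaId and payload each time (catching its checked exceptions inside the Runnable). Samples that keep climbing run after run would support a leak; samples that level off after the first run would point elsewhere.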

I did notice that the cache can't be directly used as the config is potentially different. You should use the same config instance. This shouldn't be the cause of the out of memory issue though.

public class JsonValidator {
    ...
    private static final SchemaValidatorsConfig CONFIG = SchemaValidatorsConfig.builder().build();
    ...
    public void validate(String schemaId, String json) throws JsonMappingException, JsonProcessingException {
        ...
        final JsonSchema schema = VALIDATOR.getSchema(SchemaLocation.of(schemaId), CONFIG);
        ...
    }
}
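
One way to picture why the config instance matters is the toy model below. It assumes, purely for illustration, that a cached schema is tied to the config instance it was built with and is copied when a different instance is passed in. `ToyConfig`, `ToySchema` and `ToyFactory` are hypothetical classes and do not reflect the library's actual internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical config type; compared by identity since it defines no equals(). */
final class ToyConfig { }

/** Hypothetical schema that remembers which config instance it was built with. */
final class ToySchema {
    final String id;
    final ToyConfig config;

    ToySchema(String id, ToyConfig config) {
        this.id = id;
        this.config = config;
    }

    /** Reuses this instance for the same config; otherwise builds a fresh copy. */
    ToySchema withConfig(ToyConfig other) {
        return other == config ? this : new ToySchema(id, other);
    }
}

/** Hypothetical factory with an id-keyed cache (assumed behavior, for illustration only). */
final class ToyFactory {
    private final Map<String, ToySchema> cache = new ConcurrentHashMap<>();

    ToySchema getSchema(String id, ToyConfig config) {
        return cache.computeIfAbsent(id, key -> new ToySchema(key, config)).withConfig(config);
    }
}
```

In this model, passing one shared config returns the cached object directly, while passing a new config on every call produces a new copy each time even though the id is already cached.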
