Character Limit With Issue with Urdu Slugs #3514

Closed
andrewfairlie opened this issue Dec 3, 2018 · 7 comments

Comments

@andrewfairlie

andrewfairlie commented Dec 3, 2018

Description

In English, we're limited to 255 characters in a slug.

With Urdu, the limit seems to be significantly lower, and the error message isn't helpful:

Could not generate a unique URI based on the URI format.

I suspect this is because Urdu uses more bytes per character than ASCII does.
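To make the character/byte distinction concrete, here is a minimal Python sketch that reports both lengths for a string. It assumes the text is encoded as UTF-8, where Arabic-script letters such as Urdu's take two bytes each while spaces and periods take one. The helper name `length_report` is hypothetical, and "characters" here means Unicode code points.

```python
def length_report(text: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a string."""
    # len() on a Python str counts Unicode code points, not bytes.
    chars = len(text)
    # Encoding to UTF-8 exposes the multi-byte representation.
    utf8_bytes = len(text.encode("utf-8"))
    return chars, utf8_bytes


if __name__ == "__main__":
    for sample in ["a long title", "یہ ایک"]:
        chars, utf8_bytes = length_report(sample)
        print(f"{sample!r}: {chars} characters, {utf8_bytes} bytes")
```

For ASCII text the two numbers match; for the two Urdu words, five of the six characters are two-byte letters, so the byte count is almost double the character count.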

Steps to reproduce

  1. Make a new entry with a very long Urdu title, for example "یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ایک بہت طویل عنوان یہ ایک بہت طویل عنوان ہے. یہ واقعی ایک طویل عنوان ہے"
  2. Try to save; you'll see the error.

More detail...

A long title

یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ایک بہت طویل عنوان یہ ایک بہت طویل عنوان ہے. یہ واقعی ایک طویل عنوان ہے

This title is 202 characters, so it should be sluggable, but because it's 354 bytes (https://mothereff.in/byte-counter), it seems to be disallowed.

Shorten that to 230 bytes (to allow a few bytes for dashes)...

یہ ایک بہت طویل عنوان ہے یہ ایک بہت طویل عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ ایک بہت لمبی عنوان ہے. یہ واقعی ایک طویل عنوان ہے. ا

And it works.
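Shortening text to a byte budget, as in the workaround above, can split a two-byte character in half if done naively. Below is a minimal sketch of byte-budget truncation that discards any incomplete trailing character, assuming UTF-8. `truncate_utf8` is a hypothetical helper, not part of Craft.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Trim text so its UTF-8 encoding is at most max_bytes long,
    without cutting a multi-byte character in half."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing the bytes may leave a partial character at the end;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    title = "یہ ایک بہت طویل عنوان ہے"
    short = truncate_utf8(title, 9)
    print(short, len(short.encode("utf-8")))
```

Cutting on the encoded bytes and then decoding leniently keeps the result at or under the budget while always returning valid text.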

Additional info

  • Craft version: Craft Pro 3.0.31
  • PHP version: 7.2.12
  • Database driver & version: MySQL 5.7.24
@brandonkelly
Member

brandonkelly commented Dec 3, 2018

There is a validation rule ensuring that slugs and URIs are at most 255 characters, so they fit into their varchar database columns. However, the rule should be looking at byte length rather than character count, so that it gets triggered in cases like this one, where some characters take more than one byte (your sample string is 354 bytes). I’ve fixed that for the next release.

The validation error message still says “characters”, because most people who run into it won’t be using multi-byte characters and wouldn’t know how to limit text to “255 bytes”. Even if it’s not technically accurate, it at least points you in the right direction.

[Screenshot: validation error on a Slug field stating that the slug must be at most 255 characters]

@brandonkelly
Member

I take that back. Looking into this further, I was wrong: a varchar(255) column limits its value to 255 characters, not 255 bytes.

So I’m not sure there’s a good course of action for us here. Reopening…
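The difference between the two candidate rules can be illustrated with a small sketch: one check counts characters, matching how a varchar(255) column limits its contents, and the other counts UTF-8 bytes, which rejects multi-byte text much earlier. The function names and the 255 default are illustrative assumptions, not Craft's actual validator.

```python
MAX_LENGTH = 255  # varchar(255) limit, counted in characters


def fits_by_characters(value: str, limit: int = MAX_LENGTH) -> bool:
    # Mirrors a varchar(N) column, which limits characters, not bytes.
    return len(value) <= limit


def fits_by_bytes(value: str, limit: int = MAX_LENGTH) -> bool:
    # A stricter byte-based check; unnecessary for a varchar column.
    return len(value.encode("utf-8")) <= limit


if __name__ == "__main__":
    slug = "یہ" * 100  # 200 characters, 400 UTF-8 bytes
    print(fits_by_characters(slug), fits_by_bytes(slug))
```

A 200-character Urdu slug fits the column, yet a byte-based rule would reject it, which is why switching the rule to bytes turned out to be the wrong fix.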

@brandonkelly brandonkelly reopened this Dec 4, 2018
@brandonkelly
Member

brandonkelly commented Dec 4, 2018

One thing you can do is limit the length of the slug in your section’s URI Format. For example:

news/{slug[0:250]}
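The `[0:250]` part slices the slug before it is inserted into the URI. A rough Python analogue of that URI Format, assuming the slice counts characters rather than bytes (`build_news_uri` is a hypothetical name, not a Craft API):

```python
def build_news_uri(slug: str, max_slug_chars: int = 250) -> str:
    # Analogue of the URI format news/{slug[0:250]}:
    # keep at most max_slug_chars characters of the slug.
    return f"news/{slug[:max_slug_chars]}"


if __name__ == "__main__":
    uri = build_news_uri("a" * 300)
    print(len(uri))
```

With the five-character `news/` prefix, the longest possible URI is 5 + 250 = 255 characters.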

@brandonkelly
Member

There was already code in place to auto-shrink slugs so that the URI stays at or under 255 characters, but it had some logic bugs. Those are fixed now for the next release.
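The general idea of auto-shrinking a slug can be sketched as follows. This is a hypothetical illustration, not Craft's implementation: `render_uri` stands in for rendering the section's URI Format with a given slug, and lengths are counted in characters to match the varchar(255) column.

```python
from typing import Callable

MAX_URI_LENGTH = 255


def shrink_slug_to_fit(
    slug: str,
    render_uri: Callable[[str], str],
    max_length: int = MAX_URI_LENGTH,
) -> str:
    """Shorten slug until render_uri(slug) is at most max_length characters."""
    shortened = slug
    while shortened and len(render_uri(shortened)) > max_length:
        # Drop one character from the end, then strip any dash
        # left dangling at the cut point.
        shortened = shortened[:-1].rstrip("-")
    if not shortened:
        raise ValueError("Could not shorten the slug enough to fit the URI format")
    return shortened


if __name__ == "__main__":
    slug = "a" * 249 + "-" + "b" * 10
    result = shrink_slug_to_fit(slug, lambda s: f"news/{s}")
    print(len(result))
```

Re-rendering the URI on every step is slow for long slugs, but it stays correct even when the slug appears more than once in the URI Format.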

@narration-sd
Contributor

Wow, that is a hot one, Brandon. I can't find any docs for this; should there be?

I was watching this one because of my experience with languages that need longer sizes. Not jumping in, though.

@brandonkelly
Member

This seems a little too in the weeds to warrant documentation, especially since Craft will generally just do the right thing going forward.

@narration-sd
Contributor

narration-sd commented Dec 5, 2018

Yeah, I understand, but I also know from years of close experience how much things like this bite in a multilingual environment. German text alone runs roughly 150% of the length of English, while still using Latin-1 characters.

The real saving grace is probably the size of the field relative to reasonably sized slugs, even after dividing it by 2 or 4 for multi-byte characters.

Anyway, thanks for thinking it through, as ever, Brandon. I'm having to think very carefully myself about what complexity to present in the docs, after reducing it as much as possible in the application (you know where). The focus-group beta will determine whether, and in what form, this thing sensibly flies.

You and Brad weren't wrong to suggest the alternative of consulting it in, though I can't see exactly how that path, with open source, would get me out of the hot seat of intense support. Then again, just this moment I'm reconsidering; maybe I missed a point while falling into the gravity of long perfecting ;)
