This is a migrated thread and some comments may be shown as answers.

Best way to Clean HTML / Clear Formatting?

Scott asked on 15 Jul 2013, 05:14 AM
When pasting from Word into the editor, a lot of extra formatting styles are left behind that don't get cleaned automatically, e.g.:
<p style="text-align:justify;"></p>
<ul style="margin-top:0cm;" type="disc">
<li style="text-align:justify;tab-stops:list 36.0pt;"><span lang="EN-GB" style="font-family:'Arial','sans-serif';">
Also, depending on the jQuery code used in the page, Sizzle attributes can end up in the saved HTML, e.g.:
<ul sizcache07677761204452295="38 16 10" sizset="false"><li sizcache07677761204452295="38 16 10" sizset="false" value="0">
What is the best way to clean this HTML: on paste, during editing (a clear-formatting tool) or on save?

We want to save just the structural formatting (H1-H6, UL, OL, LI, STRONG, EM, A, IMG) with no styling (we have removed the font, size and colour tools).

Thanks

Alex Gyoshev
Telerik team
answered on 15 Jul 2013, 07:45 AM
Hello Scott,

The Sizzle attributes should never be present in the DOM. Please list the steps required to get them into the editor content.

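Regarding the full format cleaning, you can post-process the editor content, removing the style attributes whenever you get the editor value. The most robust way to do so is through HTML parsing (rather than regex matching), for example: var container = $("<div>").html(editor.value()); container.find("[style]").removeAttr("style"); var cleanHtml = container.html(); Wrapping the value in a container element ensures that top-level elements are searched too, and container.html() returns the whole cleaned markup rather than the inner HTML of the first matched element.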
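To make the "parse, don't regex" idea concrete without a browser, here is a minimal sketch that applies a whitelist to a hypothetical plain-object tree (an element is { tag, attrs, children }, a text node is a plain string) instead of real DOM nodes. The names cleanTree and structuralWhitelist are illustrative only, not part of Kendo UI or jQuery; the whitelist loosely follows the structural tags listed in the question. Elements not on the list are unwrapped (their children are kept), and every attribute that is not explicitly allowed, including style, is dropped.

```javascript
// Hypothetical whitelist: allowed tags, plus allowed attributes per tag.
// "style" is never listed, so it is always removed.
const structuralWhitelist = {
  tags: ["h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li", "strong", "em", "a", "img"],
  attrs: { a: ["href"], img: ["src", "alt"] },
};

// Returns the cleaned replacement(s) for one node of the plain-object tree.
// An element on the whitelist keeps only its whitelisted attributes; any
// other element is unwrapped, i.e. replaced by its own cleaned children.
function cleanTree(node, allow) {
  if (typeof node === "string") return [node]; // text nodes pass through
  // Clean the children first, so unwrapped grandchildren bubble up.
  const children = node.children.flatMap(child => cleanTree(child, allow));
  const tag = node.tag.toLowerCase();
  if (!allow.tags.includes(tag)) return children;
  const keep = allow.attrs[tag] || [];
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs)) {
    const lower = name.toLowerCase(); // compare attribute names case-insensitively
    if (keep.includes(lower)) attrs[lower] = value;
  }
  return [{ tag, attrs, children }];
}
```

Because attribute names are lower-cased before the lookup, the whitelist entries themselves must be written in lower case.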

Regards,
Alex Gyoshev
Telerik
Join us on our journey to create the world's most complete HTML 5 UI Framework - download Kendo UI now!
Scott
answered on 24 Jul 2013, 05:30 AM
Thanks Alex,

I now have my Clean Formatting command working. It does a depth-first traversal of the DOM, cleaning each node as it comes back up the tree:
var cleanCommand = Command.extend({
    exec: function () {
        Editor.cleanDOM(this.editor.body);
    }
});
 
EditorUtils.registerTool("clean", new Editor.Tool({ command: cleanCommand, template: new Editor.ToolTemplate({ template: EditorUtils.buttonTemplate, title: 'Clean Formatting' }) }));
 
Editor.cleanDOM = function (node) {
    var next = null, depth = 0, toClean = false;
    if (node.childNodes.length > 0)
        do {
            if (!toClean && (next = node.firstChild)) {
                depth++;
            } else {
                toClean = false;
                if (!(next = node.nextSibling)) {
                    next = node.parentNode;
                    depth--;
                    toClean = true;
                };
                cleanNode(node);
            }
            node = next;
        } while (depth > 0);
}
It can then be called when a user pastes content into the editor, as follows:
paste: function (e) {
    var Editor = kendo.ui.editor;
    var placeholder = Editor.Dom.create(document, 'div', { innerHTML: e.html });
    Editor.cleanDOM(placeholder);
    e.html = placeholder.innerHTML;
}
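The walk in cleanDOM is iterative: it descends through firstChild, cleans a node only after its whole subtree has been handled (post-order), then moves to the next sibling or climbs back to the parent, and it never cleans the root itself. Below is a minimal standalone sketch of the same loop over a hypothetical node type with parent and children fields; makeNode, firstChild, nextSibling and postOrder are illustrative helpers, not DOM or Kendo APIs. Capturing next before calling visit is what lets the visitor remove the current node safely.

```javascript
// Hypothetical minimal node type: a name, a parent link and ordered children,
// just enough to mirror the DOM's firstChild / nextSibling / parentNode.
function makeNode(name, children = []) {
  const node = { name, parent: null, children };
  children.forEach(child => { child.parent = node; });
  return node;
}

const firstChild = node => node.children[0] || null;

function nextSibling(node) {
  if (!node.parent) return null;
  const siblings = node.parent.children;
  return siblings[siblings.indexOf(node) + 1] || null;
}

// Iterative post-order walk in the style of cleanDOM. "climbing" plays the
// role of toClean: it is set when we return to a parent whose children are done.
function postOrder(root, visit) {
  if (root.children.length === 0) return;
  let node = root, depth = 0, climbing = false, next;
  do {
    if (!climbing && (next = firstChild(node))) {
      depth++; // go down into the first child
    } else {
      climbing = false;
      if (!(next = nextSibling(node))) {
        next = node.parent; // no more siblings: climb back up
        depth--;
        climbing = true;
      }
      visit(node); // next is already captured, so visit may detach node
    }
    node = next;
  } while (depth > 0); // depth 0 means we are back at the root
}
```

For a tree root(a(b, c), d) the visit order is b, c, a, d, which is why cleanNode always sees children that have already been cleaned.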
I have since seen that there is a list of 'cleaners' that you run internally when pasting content, so I tried writing my own cleaner instead of handling the paste event for each editor. However, I am not sure how to register the cleaner. Also, do the cleaners run before the paste event is raised?
Editor.DOMCleaner = Cleaner.extend({
    applicable: function (html) {
        return true;
    },
 
    clean: function (html) {
        var placeholder = Editor.Dom.create(document, 'div', { innerHTML: html });
        Editor.cleanDOM(placeholder);
        return placeholder.innerHTML;
    }
});
Any suggestions would be appreciated ...
Thanks
Alex Gyoshev
Telerik team
answered on 26 Jul 2013, 08:07 AM
Hello Scott,

See the following jsBin for an example of how to register the cleaner. The key takeaway is the line:

editor.clipboard.cleaners.push(new DOMCleaner());

This functionality is not documented, as we have not had many requests for custom cleaners; if the need arises, we will introduce a public API for it. I don't expect many breaking changes to this code, but please keep in mind that it is not an official API.
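To illustrate what pushing onto a cleaners list implies, here is a hypothetical standalone pipeline. It assumes, without confirmation from the Kendo UI source, that each cleaner exposes applicable(html) and clean(html) like the DOMCleaner in this thread, and that applicable cleaners run in array order, each receiving the previous one's output. The names runCleaners, trimCleaner and nbspCleaner are made up for the example, and the toy cleaners use plain string operations only to keep the sketch short.

```javascript
// Hypothetical stand-in for the editor's internal cleaner loop.
// Assumption: cleaners run in array order; a cleaner whose applicable()
// returns false is skipped and the HTML passes through unchanged.
function runCleaners(html, cleaners) {
  return cleaners.reduce(
    (current, cleaner) => (cleaner.applicable(current) ? cleaner.clean(current) : current),
    html
  );
}

// Toy cleaner 1: strip surrounding whitespace.
const trimCleaner = {
  applicable: html => html !== html.trim(),
  clean: html => html.trim(),
};

// Toy cleaner 2: turn &nbsp; entities into ordinary spaces.
const nbspCleaner = {
  applicable: html => html.includes("&nbsp;"),
  clean: html => html.split("&nbsp;").join(" "),
};

// "Registering" a custom cleaner is just appending it to the list,
// analogous to editor.clipboard.cleaners.push(new DOMCleaner()).
const cleaners = [trimCleaner];
cleaners.push(nbspCleaner);
```

Under that assumption, a cleaner pushed to the end of the list would see HTML that the earlier cleaners have already processed.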

All the best,
Alex Gyoshev
Telerik
Adam
answered on 15 Sep 2013, 09:08 PM
Hello,

I am testing with the latest internal build (2013.2.912) and wonder whether there have been any further additions to the built-in cleaner. I am trying to accomplish the same thing as in this post (removing style attributes from HTML tags).

Is there a better way to do this now, or is the provided example still the best approach?

Thank you for any info you can provide.
Adam
Atanas Korchev
Telerik team
answered on 16 Sep 2013, 07:32 AM
Hi Adam,

There has been no further development in this regard. You can use the code shown in the jsBin demo.

Regards,
Atanas Korchev
Telerik
Jan Hansen
answered on 05 Feb 2014, 11:00 PM
Hello Scott,

I'm looking for something very much like what you describe here: a paste handler that strips everything but the structural formatting and, if possible, a custom button that does the same. How did your final solution turn out? Would you be able to share the actual cleaning code (the cleanNode function)?

Best regards,
/Jan
Scott
answered on 14 Feb 2014, 05:33 AM
Hi Jan,

Here is what I am currently using - please share any updates/improvements or comments back here :-)

(function ($, undefined) {
 
    /* CLEAN FORMATTING */
 
    if (!Array.prototype.indexOf) {
        Array.prototype.indexOf = function (elt /*, from*/) {
            // Must return the index, otherwise every lookup yields undefined
            return $.inArray(elt, this);
        };
    }
 
    var Editor = kendo.ui.editor,
        Command = Editor.Command,
        Cleaner = Editor.Cleaner,
        EditorUtils = Editor.EditorUtils;
 
    var cleanCommand = Command.extend({
        exec: function () {
            Editor.cleanDOM(this.editor.body);
        }
    });
 
    EditorUtils.registerTool("clean", new Editor.Tool({ command: cleanCommand, template: new Editor.ToolTemplate({ template: EditorUtils.buttonTemplate, title: 'Clean Formatting' }) }));
 
    Editor.cleanDOM = function (node, allow) {
        var next = null, depth = 0, toClean = false;
        if (node && node.childNodes.length > 0)
            do {
                if (!toClean && (next = node.firstChild)) {
                    depth++;
                } else {
                    toClean = false;
                    if (!(next = node.nextSibling)) {
                        next = node.parentNode;
                        depth--;
                        toClean = true;
                    };
                    cleanNode(node, allow || allowWhiteList);
                }
                node = next;
            } while (depth > 0);
    }
 
    // White List Of Allowed Tags + Attributes
 
    var allowWhiteList = {
        tags: 'h1,h2,h3,h4,h5,h6,blockquote,ol,ul,li,p,br,b,strong,em,i,u,span,del,a,img,table,thead,tbody,tfoot,tr,th,td'.split(','),
        // Attribute names must be lowercase: cleanNode lower-cases each name before the lookup
        attr: { 'img': ['src', 'alt', 'height', 'width', 'align'], 'a': ['href', 'target', 'name'], 'table': ['class', 'contenteditable'], 'th': ['contenteditable', 'colspan', 'rowspan'], 'td': ['contenteditable', 'colspan', 'rowspan'], 'span': ['class'] },
        style: { 'span': ['text-decoration'] },
        empty: ['img', 'th', 'td', 'br']
    };
 
    function cleanNode(node, allow) {
        var parent = node.parentNode;
        if (node.nodeType == 1) {
            // Remove If Not In White List
            var tag = node.nodeName.toLowerCase();
            if (allow.tags.indexOf(tag) === -1) {
                while (node.childNodes.length > 0) {
                    parent.insertBefore(node.childNodes[0], node);
                }
                parent.removeChild(node);
                return;
            };
 
            // Remove Empty Tags
            if (node.childNodes.length == 0 && allow.empty.indexOf(tag) == -1) {
                parent.removeChild(node);
                return;
            }
 
            // Clean Attributes
            var attrs = allow.attr[tag] || [];
            for (var x = node.attributes.length - 1; x >= 0; x--) {
                var attr = node.attributes[x].name.toLowerCase();
                if (attr == 'style') {
                    // Clean Style
                    var allowed = allow.style[tag] || [];
                    if (allowed.length == 0)
                        node.removeAttribute('style');
                    else {
                        var cssText = '';
                        for (var i = 0; i < allowed.length; i++) {
                            var defined = node.style[allowed[i]];
                            if (defined)
                                cssText += allowed[i] + ':' + defined + ';';
                        }
                        if (cssText)
                            node.style.cssText = cssText;
                        else
                            node.removeAttribute('style');
                    };
                } else if (attrs.indexOf(attr) === -1) {
                    node.removeAttribute(node.attributes[x].name);
                }
            };
 
            // Remove Span With No Attributes (Keeping Children)
            if (tag == 'span' && node.attributes.length == 0) {
                while (node.childNodes.length > 0) {
                    parent.insertBefore(node.childNodes[0], node);
                }
                parent.removeChild(node);
                return;
            };
 
            // Combine Adjacent Text Sub-Nodes
            node.normalize();
 
            // Remove Adjacent Breaks
            if (tag == 'p') {
                // Remove Trailing Breaks And Split On Double Breaks
                var lastChild = true;
                var lineBreaks = 0;
                for (var i = node.childNodes.length - 1; i >= 0; i--) {
                    var child = node.childNodes[i];
                    if (child.nodeName == 'BR') {
                        if (lastChild)
                            node.removeChild(child);
                        else {
                            lineBreaks++;
                        }
                    } else {
                        lastChild = false;
                        if (lineBreaks > 1) {
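                            var newNode = parent.insertBefore(document.createElement('P'), node.nextSibling);
                            var skip = true;
                            // Move everything after this child into the new paragraph,
                            // dropping the run of breaks that triggered the split
                            while (child.nextSibling) {
                                if (skip && child.nextSibling.nodeName == 'BR') {
                                    node.removeChild(child.nextSibling);
                                } else {
                                    newNode.appendChild(child.nextSibling);
                                    skip = false;
                                }
                            }
                            // lastChild stays false: breaks before this child are not trailing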
                        }
                        lineBreaks = 0;
                    }
                }
 
                // Remove Breaks At Start Of Paragraph
                while (node.childNodes.length > 0 && node.childNodes[0].nodeName == 'BR')
                    node.removeChild(node.firstChild);
 
                // If Only Breaks Remove Paragraph
                var onlyBreaks = true;
                for (var i = 0; i < node.childNodes.length; i++)
                    if (node.childNodes[i].nodeName != 'BR')
                        onlyBreaks = false;
                if (onlyBreaks)
                    parent.removeChild(node);
            }
 
 
        } else if (node.nodeType != 3) {
            // Not An Element Or Text Node
            parent.removeChild(node);
        } else {
            node.nodeValue = node.nodeValue.replace(/\xA0/g, ' ').replace(/\n/g, " ").replace(/\s{2,}/g, ' ');
            // Remove Whitespace Only Text Nodes
            if (!/\S/.test(node.nodeValue))
                parent.removeChild(node);
        };
    };
 
})(window.kendo.jQuery);
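The paragraph rules in cleanNode (drop leading and trailing breaks, split on runs of two or more BR elements, drop paragraphs that contain only breaks) are the hardest part to follow, because they walk the child list backwards while moving nodes around. As a sketch of the intended behaviour, assuming a hypothetical token model where the string "BR" stands for a break and any other string for inline content, a single forward pass is enough; splitParagraph is an illustrative name, not part of the code above.

```javascript
// Hypothetical token model of one paragraph's children:
// "BR" is a line break, any other string is inline content.
// Returns the resulting paragraphs as arrays of tokens.
function splitParagraph(tokens) {
  const paragraphs = [];
  let current = [];
  let pendingBreaks = 0;
  for (const token of tokens) {
    if (token === "BR") {
      pendingBreaks++; // decide what a run of breaks means once content follows
      continue;
    }
    if (pendingBreaks > 1 && current.length > 0) {
      paragraphs.push(current); // double (or longer) break: start a new paragraph
      current = [];
    } else if (pendingBreaks === 1 && current.length > 0) {
      current.push("BR"); // a single break inside a paragraph is kept
    }
    pendingBreaks = 0; // breaks before any content are simply dropped
    current.push(token);
  }
  if (current.length > 0) paragraphs.push(current); // trailing breaks are dropped
  return paragraphs; // a paragraph of only breaks yields no paragraphs at all
}
```

For example, the children BR, a, BR, b, BR, BR, c, BR become two paragraphs: (a, BR, b) and (c).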
Jan Hansen
answered on 27 Feb 2014, 11:40 AM
Thanks a lot! The only thing I've changed is the last part, where you remove whitespace-only text nodes. If you format text in Word like

This is a text with a blue word.

and the word "blue" is coloured blue, then the space between "blue" and "word" is also removed when pasting.

Not a big issue; I just removed the whitespace check at the end of the clean method. But thanks a million, this is a great help.
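The trade-off here is easiest to see with the text-node rule pulled out into a pure function. In this sketch, cleanTextValue and its keepWhitespaceOnly flag are hypothetical names: false reproduces the original rule, which deletes a text node containing only whitespace, and true reproduces the change described above, which keeps such a node after normalisation (usually as a single space). The function returns the new node value, or null when the node should be removed.

```javascript
// Text-node rule from cleanNode as a pure function over the node's value.
// Returns the new value, or null if the node should be removed.
function cleanTextValue(value, keepWhitespaceOnly) {
  const normalized = value
    .replace(/\xA0/g, " ")    // non-breaking spaces become ordinary spaces
    .replace(/\n/g, " ")      // newlines become spaces
    .replace(/\s{2,}/g, " "); // runs of whitespace collapse to one space
  // Original rule: a node with no visible characters is removed.
  if (!/\S/.test(normalized) && !keepWhitespaceOnly) return null;
  return normalized;
}
```

Keeping whitespace-only nodes can leave stray spaces between block elements in the saved HTML; browsers generally do not render whitespace between block-level elements, so the cost is mostly a few extra characters.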

Another issue is that I can't seem to add a text-only button the way you create the clean button. I end up with a button that has a "random" portion of the icon sprite as its background, not a regular text-based button. Then again, the users don't need it, so I've skipped it :o) Nonetheless, I would like to know how it's done, if you know.

Best regards

Jan

Scott
answered on 28 Feb 2014, 05:49 AM
When you use buttonTemplate, you then need to style the button with CSS. Each tool is added with the class k-{name}, so you can style the clean button with:

.k-clean {
    background-image: url('/img/editor/clean.png') !important;
    background-size: 24px 24px;
}
Jan Hansen
answered on 28 Feb 2014, 07:16 AM
Thanks :o) Actually, I had that part figured out, but since I'm terrible at drawing icons, I went for an old-style text button, which I can't find a way to create. There are dropdowns and other things, but a text-only button?

/Jan