Removal of Non-ASCII Characters Using Python

Published June 5, 2021

Full stack developer with 13+ years of experience. I have also worked as an architect, building solutions and products that support automation. Solution-oriented, hands-on technical utility player. More than 4 years of experience in each of the e-commerce and finance domains. Experienced in driving business automation and marketing through technology. Strong follower of open source technology. I use PHP, Python, AWS and Angular as the technology stack for building products.

Hello Devs,

I am going to explain how to remove non-ASCII characters from input text or content. Let's first get to know what non-ASCII characters are.

What are non-ASCII characters?

You might have faced an issue while copy-pasting text from a document (docx) into an HTML input element or an editor. Sometimes the symbols used in the document are not supported by the input area. For example, the double quote used in a docx file is different from the one used in a code editor or input element; see below 👇🏻

“Example Text” - in docx file
"Example Text" - in editor or HTML input element
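
To see the difference concretely, here is a small sketch that lists the non-ASCII characters in each sample string. It assumes Python 3.7 or later (for str.isascii), and the helper name non_ascii_chars is only illustrative.

```python
# Compare the curly quotes from a docx file with the straight quotes typed in an editor.
docx_text = "\u201cExample Text\u201d"   # curly double quotes, as pasted from docx
editor_text = '"Example Text"'           # plain ASCII double quotes


def non_ascii_chars(text):
    """Return (character, code point) pairs for every character above U+007F."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if ord(ch) > 127]


print(docx_text.isascii())          # False
print(editor_text.isascii())        # True
print(non_ascii_chars(docx_text))   # [('“', 'U+201C'), ('”', 'U+201D')]
```

The two strings look almost identical on screen, but only the second one consists purely of ASCII characters.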

When you paste docx-formatted text into HTML, these symbols arrive as non-ASCII characters, often called junk characters. Generally they can be saved into the database, but sometimes an encoding step or a signature calculation will throw an error because the string is not supported. One real scenario I faced was calculating an AWS signature before passing a request to API Gateway. The signature calculated in my code did not match the signature calculated by AWS, so the request failed with an error. The AWS signature calculation mechanism removes those characters before calculating the signature, so if your code does not do the same, the two signatures will not match.
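
The mismatch described above can be illustrated with a generic HMAC. This is only a sketch and not the AWS Signature Version 4 algorithm; the sign and strip_non_ascii helpers and the demo key are hypothetical names chosen for illustration.

```python
import hashlib
import hmac


def sign(message: str, key: bytes = b"demo-key") -> str:
    """Hypothetical signing helper: HMAC-SHA256 over the UTF-8 bytes of the message."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


raw = "\u201cExample Text\u201d"     # text pasted from a docx file
cleaned = strip_non_ascii(raw)       # 'Example Text'

client_signature = sign(raw)                          # client signs the raw text
server_signature = sign(cleaned)                      # server cleans first, then signs
fixed_client_signature = sign(strip_non_ascii(raw))   # client cleans before signing

print(client_signature == server_signature)           # False
print(fixed_client_signature == server_signature)     # True
```

Whatever cleaning the other side applies, the point is that both sides must sign exactly the same bytes.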

How to solve this issue then?

Below is a Python script to remove those non-ASCII characters or junk characters.

Prerequisite:

Python, any version (3.x recommended)
Regular expression operations library (re) - part of the Python standard library, so no pip install is needed

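import re

ini_string = "'technews One lone dude awaits iPad 2 at Apple\x89Ûªs SXSW store"

# Option 1: replace every run of characters other than letters and digits
# with a single space. Note that this also removes ASCII punctuation.
res1 = " ".join(re.split("[^A-Za-z0-9]+", ini_string))
print(res1)

# Check whether the text contains anything other than printable ASCII,
# tab, carriage return or newline. re.search scans the whole string,
# whereas re.match would only look at the start of it.
if re.search("[^\t\r\n\x20-\x7E]+", ini_string):
    print("found")

# Option 2: decode the UTF-8 bytes as ASCII; every byte that is not ASCII
# becomes U+FFFD, which is then replaced with a backtick.
result = ini_string.encode().decode('ascii', 'replace').replace('\ufffd', '`')
print(result)

# Option 3: replace one known junk sequence with a single backtick.
result2 = ini_string.replace("\x89Ûª", "`")
print(result2)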
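
If the goal is to keep the text readable instead of dropping or masking characters, one possible alternative is to map common typographic punctuation to ASCII look-alikes first and only then discard whatever is left. The sketch below is not part of the original script: the PUNCTUATION_MAP table and the to_ascii function are hypothetical names, and the mapping covers only a few characters chosen for illustration.

```python
import unicodedata

# Hypothetical mapping of common typographic characters to ASCII look-alikes.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",     # left single quote
    "\u2019": "'",     # right single quote / apostrophe
    "\u201c": '"',     # left double quote
    "\u201d": '"',     # right double quote
    "\u2013": "-",     # en dash
    "\u2014": "-",     # em dash
    "\u2026": "...",   # horizontal ellipsis
})


def to_ascii(text: str) -> str:
    """Map known punctuation to ASCII, then drop any remaining non-ASCII characters."""
    text = text.translate(PUNCTUATION_MAP)
    # NFKD splits an accented letter into a base letter plus a combining mark,
    # so the base letter survives when the mark is dropped below.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("\u201cCaf\u00e9\u201d \u2013 it\u2019s open\u2026"))   # "Cafe" - it's open...
```

Characters that have no entry in the table and no ASCII decomposition are silently removed, so extend the mapping for any symbols that matter in your content.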

References:

https://gist.github.com/aviboy2006/ca1e50f1cb1a32f7544f2f0af1fb928d
