« MasterWriter- Songwriting software (Kim Gentes Worship Tech Blog) | Main | How To Blog & Podcast Your Life Away!! (Kim Gentes) »

Regex Pattern for Parsing CSV files with Embedded commas, double quotes and line breaks

While you have stumbled on KimGentes.com, you might be coming for a few different reasons. Some of you are interested in articles and resources on Christianity, music, worship and such. Others of you are interested in technology information related to church worship settings. Some other folks are programmers who are looking for helpful information on technical challenges. This particular post is a bleed over from some of my technical work in programming. Specifically, this is a post to present a solution to parsing CSV files.

Programmers understand that CSV files are simply text data files that have information stored in value fields in the file. Each of the fields is separated by commas to delimit when one value/field ends and the next begins. This is why they are called "Comma Separated Values" files (CSV for short). Anyone who is new to this concept or programming might think that writing a program to extract data from files wherein the commas separate the data fields, should be an easy task. And if that was the total sum of it, it would be quick and simple in virtually any language you could choose to do it in. But that is not the end of it. CSV files are written by a host of popular applications and read by thousands of programs as well, including almost every spreadsheet program in existence, including Microsoft Excel. When the first CSV file user started outputting values to fields and reading them in another destination, they quickly realized a limitation- if you wanted to include the literal character of a comma (,) inside of a field value itself, this could not be done, since it would be interpreted as a field separator and its value wouldn't be understood (as well as the field in which it appeared being literally chopped in half).

To overcome this problem, it's assumed that some Neanderthal software developers (back in the Jurassic era of programming) came up with an idea to allow programs to insert and read commas inside of comma separated fields. They would allow fields to be encased in double quotes as a signature that the value inside this field should be read literally (including commas) from the first double quote to the ending double quote. This worked fine and commas could now be embedded in CSV field values. But, as you can guess, these cause further problems for programs- now, the commas of the world had safe haven usage inside of comma separated values, however, double quotes now could not be included inside of a double quote encased field value. Programmers quickly realized that they couldn't keep adding special characters to allow for current special characters to be escaped (which is a way of saying interpreted as literal data without functional consequence in the interpretation of the data).

So, to avoid using other characters to escape current special meaning characters, CSV file progenitors harkened that users could escape double quotes inside of double quote encased CSV fields by placing two double quotes together in the text. This would the standard way of escaping a double quote character ("), by simply placing to double quote characters next to each other, as in "".

All this is fine for the people and programs writing the data- its simple straightforward programming to output such information. But reading CSV files that have embedded double quotes, commas and can include embedded line breaks is a complicated concept. Such is the life of a programmer :). To meet this challenge, we often use a pattern parsing language called Regex (which stands for Regular Expressions).

Regex maybe the most popular language in the programming world. It is used in literally every high level programming language we know of in the world, including Visual Basic, C#, Javascript,  Java, PHP, Perl, Ruby and dozens more. It is included in several utilities such as search functions inside of UltraEdit and Ace Text. And it is included in most revisions of Unix (and other) OSes in command lines functions such as grep, Windows utilities powerGrep and so forth. Technically speaking Regex isn't a programming language on its own. It's a pattern matching engine that is often embedded inside of other languages. It became widely popular due to its inclusion primary in the Unix/Linux command line function of grep and the early web standard language of Perl. Now, most programmers can't conceive of a language that doesn't include some flavor of regex.

That all said, I have chosen to write a regex pattern that can handle parsing the fields of a CSV with all the conditions I mentioned above. There are plenty of other examples of CSV parsers around, but none seem to do the trick I was looking for, which is grandly frustrating when Excel can import and export a CSV with all the listed nuances quickly and easily. So, not finding a good solution, I have written a short CSV parsing pattern. It is below.

CSV-parser (regex pattern below)

 

^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$

 

*YOU ARE FREE to use this algorithm in any application (commercial or personal or whatever). It comes with no warrantees.  If you DO end up using this REGEX pattern, please do so with the following considerations:

  • Please make this notation in your source code:  ©2008 Kim Anthony Gentes - FREE TO USE ANYWHERE. No Warrantees are implied or offered. This software is offered "as-is". Usable by anyone (freeware, non-commercial or personal). No support or service is offered or implied by your usage. Use of the software implies your own assumption of maintenance, liability and operability of the same. Only restriction for us: you must include this copyright notice (full text) with the code. 
  • Please post a response on this blog entry below (you do that by clicking on the "Comments" link at the bottom of this entry), saying you found this and are using it. I'd just like to know if its helping people and how people are using it.

 

When using the regex, some important things to know:

Options (turned on in your language/utility): ^ and $ match at line breaks

Description: below is a textual description of the regex pattern that may be helpful to programmers who want to understand what is happening in the regex.

 

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)»
   Match the regular expression below and capture its match into backreference number 2 «("(?:[^"]|"")*"|[^,]*)»
      Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
         Match the character “"” literally «"»
         Match the regular expression below «(?:[^"]|"")*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
            Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
               Match any character that is NOT a “"” «[^"]»
            Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
               Match the characters “""” literally «""»
         Match the character “"” literally «"»
      Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
         Match any character that is NOT a “,” «[^,]*»
            Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the regular expression below and capture its match into backreference number 3 «(,("(?:[^"]|"")*"|[^,]*))*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «*»
      Match the character “,” literally «,»
      Match the regular expression below and capture its match into backreference number 4 «("(?:[^"]|"")*"|[^,]*)»
         Match either the regular expression below (attempting the next alternative only if this one fails) «"(?:[^"]|"")*"»
            Match the character “"” literally «"»
            Match the regular expression below «(?:[^"]|"")*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
               Match either the regular expression below (attempting the next alternative only if this one fails) «[^"]»
                  Match any character that is NOT a “"” «[^"]»
               Or match regular expression number 2 below (the entire group fails if this one fails to match) «""»
                  Match the characters “""” literally «""»
            Match the character “"” literally «"»
         Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^,]*»
            Match any character that is NOT a “,” «[^,]*»
               Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
Thank you for all the additional information/examples and samples from various languages! Keep posting your ideas that can help others!
thanks
Kim

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments (39)

All hail regular expressions! I might have to do actual work without them.

October 15, 2008 | Unregistered CommenterMatt Self

ha! I hear ya Matt... ya.. they are good.. its nice that you can get something like PowerGrep for the PC and use it locally.. btw, if you ever need a good Regex tool (to help test patterns), check out Regex Buddy.. it's great at debugging them.

October 15, 2008 | Registered CommenterKim Gentes

Couldn't get this to work in .net :(

December 1, 2008 | Unregistered Commenterjason

Jason,

be sure to check whether your regex .net flavor has the Options turned on in your language/utility for ^ and $ to match to line breaks.

also, the core part of the pattern that matches a single field is this
("(?:[^"]|"")*"|[^,]*)

if I had a 3 field pattern all the time, I would do the following
^("(?:[^"]|"")*"|[^,]*),("(?:[^"]|"")*"|[^,]*),("(?:[^"]|"")*"|[^,]*)$

some patterns are better set as predetermined scope (such as 3 reps like above than using the unknown * iterator if you aren't sure how to unwind the stack once you find the pattern).

Kim

December 2, 2008 | Registered CommenterKim Gentes

you mentioned the .net implementation... if I used VB.Net, i would write my 3 field iterator as follows:

Try
Dim RegexObj As New Regex("^(""(?:[^""]|"""")*""|[^,]*),(""(?:[^""]|"""")*""|[^,]*),(""(?:[^""]|"""")*""|[^,]*)$", RegexOptions.Multiline)
Dim MatchResults As Match = RegexObj.Match(SubjectString)
While MatchResults.Success
' matched text: MatchResults.Value
' match start: MatchResults.Index
' match length: MatchResults.Length
MatchResults = MatchResults.NextMatch()
End While
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try

December 2, 2008 | Registered CommenterKim Gentes

I use biterscripting a lot for parsing of CSV files. They have a sample script at http://www.biterscripting.com/Download/SS_CSV.txt . You can parse by string, line, word (separated by commas), character. Further, you can change the delimiter values, even between lines of the same file, even depending on previous lines, by changing variables $wsep and $lsep.

Patrick

January 9, 2009 | Unregistered CommenterPatrick Mc

Kim--

Using your Regex for some in-house code. Made the following changes for it to work on C#:

string pattern = @"(""(?:[^""]|"""")*""|[^,]*)(?:,(""(?:[^""]|"""")*""|[^,]*))*";

Note removed the start and end line anchors since I'm processing line-by-line. Easy enough to add them back in.

Thanks for posting your Regex. The english-breakdown helped!

James

July 27, 2009 | Unregistered CommenterJames

I found your parser, and was about to use it (Thanks!) until I found this site:
http://www.programmersheaven.com/user/Jonathan/blog/73-Splitting-CSV-with-regex/
whose author, along with Guillaume Roderick, came up with the wonderful:

Ruby: string.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)

Perl: $fields = preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", $string)

Splitting a string (or line) means putting every piece into an array (or list), which is almost always what you want to do. These lookahead regexps are explained on Jonathan's site. They are fully general.

July 30, 2009 | Unregistered CommenterC. Jack Bamford

Thanks for the original idea.
I could not make the repeating group work in C#, so have created this modified one and use Matches to find the values. use with the ExplicitCapture flag.
Groups[1] contains the result for each match.

(((?<x>(?=[,\r\n]+))|"(?<x>([^"]|"")+)"|(?<x>[^,\r\n]+)),?)

Explanation:

Match type 1: a lookahead checks if we have an empty element - if it finds a comma or a newline. this results in an empty match.
or
Match type 2: if 1 fails, check for a quoted string
or
Match type 3: then for a non-quoted string. this is taken from the original regex.

Finally, consume a comma if there is one.

Behavior:

- newline characters inside "" will become part of the value.
- empty lines outside "" result in one empty value
- escaped quotes inside quoted strings ("") need to be replaced afterwards!

note: if the last line is empty AND the previous line had a comma at the end, no extra result is given. I have no fix for this yet.

C# code to create arrays from csv lines: (uses linq)


string[] SplitCsvLine(string line)
{
return (from System.Text.RegularExpressions.Match m in System.Text.RegularExpressions.Regex.Matches(line,
@"(((?<x>(?=[,\r\n]+))|""(?<x>([^""]|"""")+)""|(?<x>[^,\r\n]+)),?)",
System.Text.RegularExpressions.RegexOptions.ExplicitCapture)
select m.Groups[1].Value).ToArray();
}

Good luck.

November 24, 2009 | Unregistered Commenterh. brouwer

I had become intrigued by the use of Regular Expressions as a way to parse CSV files. I began a project to test these formulas as soon as I learned of their potential as a solution. I was hoping to find some sample code or even a distributable DLL that could be employed for this purpose. So far the most definitive work I’ve seen was the regular expression formulas shared on this blog.

The formula provided by the original author of this blog for some reason does not work for me. When I use this formula on a single string record, it returns the entire record back to me in the first member of the collection created by the MS Regx process. I’m using the Microsoft vbscript.dll version 5.5 in a VB6 program. I’m referring to the following formula:

^(("(?:[^"]|"")*"|[^,]*)(,("(?:[^"]|"")*"|[^,]*))*)$

I then tried the shorter version offered by Kim and had much better results.

("(?:[^"]|"")*"|[^,]*)

The problem with this second formula is that it creates an extra empty string field between each data column. I could live with this except I wasn’t sure if this would be consistent for any type of CSV format or if there was something about my test data that was causing this. I tried running more exotic string combinations and unfortunately the formula didn’t work. Here is the exotic string to which I was referring:

"000523"",1",",HOLTSVILLE"",""40.81518"",""-73.0455"",""5"",""25"",""103"",""Y"",""V13916"",""U"",""""",,A

(note: The above two lines should be a single line)

The correct and actual field breaks (as can be verified with Excel) create four fields and occur as follows:

000523",1
,HOLTSVILLE","40.81518","-73.0455","5","25","103","Y","V13916","U",""
(empty field)
A

I realize this is a tough string to parse but of course, that was the intention. In any event, so far I’ve yet to find a RegX formula that would pass the test. And yet, Excel has no problem with it at all, which still amazes me.

I finally decided to write my own DLL which I intend to use whenever again faced with trying to read and process a correctly formatted CVS file. I realize XML is the chosen way to go for data exchange these days and in many ways find XML superior to flat text files, but there are still legacy processes around and it is nice to know I can handle the situation should it become necessary. Also, there are some files so large that I find XML to be impractical. I’m referring to flat files that or 10 or 15 gig in size. I’ve never tried to process an XML file this large and wonder whether or not it is possible.

Testing my new DLL, I learned that this process is highly susceptible to attain self-serving results. That is why I’ve tried to develop some very unfriendly data to challenge the logic.

As a way to maintain sanity while testing, I chose a simple approach. My “proof of process” was that if Excel can parse the record correctly then I should be able to parse it in exactly the same way with the same results. So I used Excel to verify every string I’ve subjected to my parsing routine. If I attained the same results, I took it to mean that my process was correct.

I’ve used test media from a number of sources, including some I've created using Excel. These were chosen with the sole purpose of trying to confound the logic with highly improbably data. Nevertheless I've no doubt there are probably perfectly valid files out there that my program might not be able to handle. I certainly have not exercised all the conditions that have been included in the logic. If anyone would care to contribute some complicated CSV format files for me to test, I would appreciate hearing from you. You may email them to me but please limit the size of the file(s) to under 25K. Again, I stress, these data must be able to pass the “Excel” test for me to consider them as a viable challenge.

As time permits, I will return the results and if the DLL proves viable, it can be shared should there still be anyone out there still using VB6. Eventually, I will make this a VB.Net program after I am sure it is working.

December 27, 2009 | Unregistered CommenterVisual Basic Programmer

Thanks for your notes all.. Especially all those who have chosen to post your updates and share your code here so others could benefit. VBasic, if you feel free to, we'd love to see what you came up with too... Thanks for posting your thoughts and notes.. Feel free to add your own flavors and updates all, as you have them for your specific apps.

Kim

December 28, 2009 | Registered CommenterKim Gentes

Thanks so much for posting this, and all the helpful comments evrybody's contributed. It's just what I needed and saved me a load of time!

January 7, 2010 | Unregistered CommenterMike

Thanks for the post! I am trying to convert a nasty CSV file into SQL INSERT statements and this helped a great deal.

April 15, 2010 | Unregistered CommenterRob Scott

Thanks for the post! It helped me tremendously in parsing the CSV format produced by the schtasks</> command in Windows.

April 21, 2010 | Unregistered Commenteranon

Thank you so much for making this available. A DEFINITE life saver.

June 8, 2010 | Unregistered CommenterStuck In RegEx

Thanks a lot this is great!

June 16, 2010 | Unregistered CommenterJoe

This version doesn't capture spaces outside of quotes, and treats an empty quoted field (..., "foo", "", "bar"...) as an empty string rather than as a single quote.

// Not escaped: \s?((?<x>(?=[,]+))|"(?<x>([^"]|"")+)"|"(?<x>)"|(?<x>[^,]+)),?

var matches = Regex.Matches(line,
@"\s?((?<x>(?=[,]+))|""(?<x>([^""]|"""")+)""|""(?<x>)""|(?<x>[^,]+)),?",
RegexOptions.ExplicitCapture);

var values = (from Match m in matches
select m.Groups["x"].Value.Trim().Replace("\"\"", "\"")).ToArray();

Thanks for getting me started!

June 17, 2010 | Unregistered Commenterjgr4

Great Pattern; using for classic ASP-based website. Thanks.

July 12, 2010 | Unregistered CommenterTravis Clark

CSV's never go away.
Hopefully the insights gained from this page may result in a match.

August 11, 2010 | Unregistered CommenterSal

LOL! CSV humor.. very funny Sal..!

August 12, 2010 | Registered CommenterKim Gentes

I use the Vim text editor gvim to interactively develop regexps. Vim is helpful because you can tell it to highlight and jump to intermediate results as you enter and edit the regexp on the search command line. I then translate the finished product to the regexp flavor for the language I am programming in.

I translated your ideas into a Vim regex that matches individual CSV fields
"\%(\_[^"]\+\|""\)*"\|[^",]*
The \_ prefix to the [^"] character class allows newlines to be included
\%( ... \) is the Vim idiom for non-capture grouping
Note, the quoted-field handling part must come first!

The only major difference I have is to append \+ after the first character class, and to exclude commas as well as double quotes in the final character class. Not sure if that breaks some special case, but it does test out well using this command...
%s/"\%(\_[^"]\+\|""\)*"\|[^",]*/<&>/g
...on this challenging example from Wikipedia's article on Comma-separated values:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

September 28, 2010 | Unregistered CommenterJim Christensen

well done, Jim. Thanks so much for adding your contribution to the discussion. I honestly hadn't heard of VIM, but it sounds intriguing. A number of regex building tools exist now that really take the syntax to more of a work output, and you can use wizards to develop the pattern themes, but it sounds like VIM is an interactive tool for building them real time.. cool concept. Regex Buddy has some of those components and since I fancy myself as a bit of a purest, I like trying to work out the regex in raw form rather than wizards. But it is nice, even when doing this, to see instant testing results, especially when the depth of groupings of captured and uncaptured segments get deeper and deeper.

Keep on regexing, people! Thanks for all the contributions on this post. I hear regularly from people that it continues to help folks to find what they are looking for in this area of CSV files specifically.

September 28, 2010 | Registered CommenterKim Gentes

Brilliant ! Thanks for posting

October 20, 2010 | Unregistered CommenterLevanter

I'm so happy I found this, this was going to be a nightmare for me.

March 2, 2011 | Unregistered CommenterReini Grauer

Anybody has regex for php for text comma separated file for eg: firstname,lastname,address,city,province,postalcode

March 21, 2011 | Unregistered CommenterRam

What would be the pattern for the following?
Either empty string, or.
csv with a twist that there would be comma before first entry and after last entry as well.
Example 1: Empty string
Example 2: ,abc,
Example 3: ,abc,def,ghi,

June 8, 2011 | Unregistered Commentercsver

Do you seriously think you can copyright a regular expression? Dream on, buddy!

August 23, 2011 | Unregistered CommenterJ Smith

@J Smith

Actually, any software you write is not only copyrightable, but patentable. Copyrights don't even have to be proven to be unique, although patents do. In fact copryights are constitutionally guaranteed on creation of the work, not requiring registration (though that helps with later legal proving, if needed).

I wrote this software, this regex and not only am I within my rights to copyright it, but fools to who don't claim copyright of their software might as well just work for free for whatever companies are interested in their services. The entire reason a software developer is paid is to develop intellectual solutions to real world problems. Most of the time today, those solutions are already derived in some program or module, and a lot of current software is "gluing" components together. But when you derive a unique solution to a unique problem, protecting that solution via a copyright has been the foundation of Intellectual property rights for the last 250 years.

This, of course, does not negate people from stealing your ideas, copying them, selling them to others or even making money off of them.

There are some ideas which I prefer to give away to the public, as is evident here in this blog post. This particular blog post has become the #1 search result in Google for CSV regex (and various other forms of the search query). Why? Precisely because I valued my work and decided to present it nicely, professionally and share it with the world. That doesn't mean I gave up copyright for it. In fact, it means my work spreads as goodwill throughout the community of folks who might otherwise have toiled to get basically the same solution I came up with.

So "dream on". Oh I do. But not from a point of ignorance of not knowing the law, not knowing the profession or not knowing the online world wide community of software developers (open source, for example) that all hold well to the precepts I have described here of making great software, sharing it, and being a responsible contributor to business and society. So while I may dream, I also accomplish.

Perhaps if you find this contribution to software so detestable, you might post your source code for others to copy, utilize, share, use and mock? Or instead of doing that you might just be silent and accept that others might know more about a subject than you do.

Blessings
Kim Gentes

August 23, 2011 | Registered CommenterKim Gentes

The response to 'J Smith' looks like it’s straight from a textbook.

You have absolutely no chance of being able to enforce this in the real world. There isn’t a court in any country that matters that would uphold your claim.

You may as well have included an instruction to dance naked in the street upon use of your code.

August 26, 2011 | Unregistered CommenterDan Wotherington

Dan,

two points..

1. I never plan on trying to enforce it. That is the whole point. I gave it away. Hello? But I gave away what was mine, and did have value.

2. Why would I take anything from a textbook on law? I have spent about 15 years closely working in legal related matters, copyrights, patents.. I have been to court, negotiated issues and understand how little/much it can sometimes take to enforce any claim. But again, what is to claim? I am not claiming anything or ever filing anything against anyone for using the code I posted here.. that is the point..

The reason I replied to J SMITH at all was that he claimed a regex was uncopyrightable --- that point is absolutely false. I have worked on patented systems that some key regex formulas literally are main parts of the value. protecting them certainly can be done if you wanted to (although posting on the internet is a pretty stupid way to "protect" anything)... But copyrighting a regex? Absolutely legal, and upholdable. Proving someone copied yours ? well, ya, that would be hard. But if you had a patent you wouldn't even have to prove they copied it, you'd just have to prove they use one you already own a patent for.

You're right Dan, enforcement is tough, but copyrighting and enforcing code rights are not impossible, it just has to be done well. I have a copyright on a regex that I know I could sell for a good sum of money, because no one has a solution for that problem online anywhere, except me. The best protection is secrecy and management, not proofing in the law. But put together, your intellectual property has value, and that is the point I was making....

Again J SMITH mocked copyrighting a piece of code.. not protecting or enforcing a copyright. If I was trying to protect it, I would have never posted it to the internet.

Kim

August 26, 2011 | Registered CommenterKim Gentes

The work from H Brouwer gave me success.

November 28, 2011 | Unregistered CommenterChris Marisic

I too find it a little ridiculous that to use this regex I MUST include your blurb in MY software - a regular expression is NOT software or a program that can stand on its own - it's a pattern to match against so it's more like an equation if you ask me (and I know you didn't) - you don't see anyone claiming ownership of 2X + 3Y do you? no, because it's kinda lame to try to... so here's a regex for you (that I claim copyright to so noone else in the world can use unless they dance around naked) ^Lame$

December 2, 2011 | Unregistered CommenterJamie

Sorry you find it restrictive, Jamie. In the balance of giving away something for free, I felt it was a reasonable trade-off. All I was trying to do here was provide a solution for those willing to accept it as it is. My rationale for doing it this way has already been stated. If you find it too onerous, you are, of course, welcome to not use it, or to develop your own solution. I understand your points, and I appreciate that you presented them kindly. Thanks

As to your second point, if my regex here was simple as 2x+3y, you wouldn't be here searching for it, nor would anyone else. It's not simple. I am not declaring it is brilliant or anything. In fact, I think some of the follow up comments have better regexes for the problem for certain uses. I have plenty of other regexes (than the one I presented here) that are much more complex, mind you, but the fact is, this regex is useful and complex enough for people to care about finding it and not figuring it out on their own. I "lay claim" to this one for just one reason- I wrote it. Nothing else. It may be the dumbest thing ever, but I did it, so I claim that I did it. That is all.

Thanks for the discussion..

Kim

December 2, 2011 | Registered CommenterKim Gentes

In re-reading Jamie's post, I have decided to change the wording on the post to now only request that you acknowledge copyright of the code segment (regex). The main goal of the post was to help others. A secondary goal was to protect myself from attacks by people who would sue me if they used my code and it didn't work for their application. I established copyright of the regex as a means to give away the code for free and it be free for everyone (including being free for me from legal attacks). That said, the change does not remove my claim to ownership of the code, but does reflect the freedom of others to use it without the need to acknowledge the source. I would hope people would acknowledge the source out of courtesy, but I can see that people feel it was a restrictive application to actually allowing the code to be used with more prevalence. That said, no one should now have a conflict with using the code in their own contexts.

I want to especially point out Jamie's gentle reminder that my approach seemed "kinda lame" to him. Far be it from me to be kinda lame. Thanks to Jamie as well for using a bit of humor to make his point. "^Lame$". Very nice.

There you go folks. Go crazy.

kg

December 2, 2011 | Registered CommenterKim Gentes

I'd like to use this with sed. As a test, I try to put AAAA and ZZZZ as obvious field markers. But

sed 's/("(?:[^"]|"")*"|[^,]*)/AAAA & XXXX/g' inputfile

seems to just output the original file. No AAAA and ZZZZ markers.
it's as if the regex doesn't match anything.

Hints?

December 15, 2011 | Unregistered CommenterLindsay Morris

Thanks for the internet, and thanks for sharing.

July 1, 2012 | Unregistered CommenterHugo van den Berg

Hi, I would like to break Read CSV file into an array by splitting it on \n. But the problem is that when i split the CSV read string, it also breaks the Multi Line Text into multiple lines which is totally wrong. Can you please give me a VBA version how to escape \n while reading CSV.

Regards
Murtaza

October 19, 2012 | Unregistered Commentermurtaza

I'm using a modified version of this in my source code.

October 12, 2013 | Unregistered CommenterSam O

Hi, thanks for the hits of the regex!

March 4, 2014 | Unregistered Commenterwild on powershell

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>