# Regular Expression Assignment Practice Lab

## How to construct and debug regular expressions

The next cell defines a large HTML string that we will use to test some regular expressions.  When writing a program that depends on accurately extracting instances of certain patterns from text or HTML, you need to create the regular expressions first, testing them on realistic example strings.  You need your expressions to do two things:

1. Match the strings you are trying to extract, and possibly some context around them, to guarantee you
   are extracting the right information;
2. If your expression matches context as well as the information you are trying to extract
   (and often it will have to), you need to identify the target part of the expression.  This is done by placing the target part of 
   the pattern in parentheses (illustrated below).
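
As a minimal sketch of these two requirements, the snippet below uses a made-up HTML fragment; the `price` class and all variable names are assumptions for illustration and are not part of the assignment. The pattern matches the surrounding tag as context and puts only the target digits in parentheses.

```python
import re

# Hypothetical fragment: it contains two numbers, only one of which is the price.
fragment = '<td width="25">Widget</td><span class="price">$12</span>'

# The span tag is the context that pins down the right number; the
# parentheses mark the part we actually want to retrieve.
price_re = re.compile(r'<span\s+class="price">\$(\d+)</span>')

bare = re.search(r'\d+', fragment)        # no context: finds the first number
with_context = price_re.search(fragment)  # context plus a group

print(bare.group())           # the width value, not the price
print(with_context.group(1))  # only the digits inside the parentheses
```

Without context the search stops at the first run of digits it sees, which here is the table width rather than the price.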
   
The homework assignment asks you to extract the baby name year from the HTML file.  The line containing the relevant information looks like this:
     
     <h3 align="center">Popularity in 1990</h3>
     
One regular expression that will match the year is the following:

       '\d\d\d\d'

The code below tries out this idea.  Evaluate it and report on the  success of the idea in the markdown cell below the code cell.  

In [1]:
import re

html_string = """
<head><title>Popular Baby Names</title>
<meta name="dc.language" scheme="ISO639-2" content="eng">
<meta name="dc.creator" content="OACT">
<meta name="lead_content_manager" content="JeffK">
<meta name="coder" content="JeffK">
<meta name="dc.date.reviewed" scheme="ISO8601" content="2005-12-30">
<link rel="stylesheet" href="../OACT/templatefiles/master.css" type="text/css" media="screen">
<link rel="stylesheet" href="../OACT/templatefiles/custom.css" type="text/css" media="screen">
<link rel="stylesheet" href="../OACT/templatefiles/print.css" type="text/css" media="print">
</head>
<body bgcolor="#ffffff" text="#000000" topmargin="1" leftmargin="0">
<table width="100%" border="0" cellspacing="0" cellpadding="4">
  <tbody>
  <tr><td class="sstop" valign="bottom" align="left" width="25%">
      Social Security Online
    </td><td valign="bottom" class="titletext">
      <!-- sitetitle -->Popular Baby Names
    </td>
  </tr>
  <tr bgcolor="#333366"><td colspan="2" height="2"></td></tr>
  <tr><td class="graystars" width="25%" valign="top">
       <a href="../OACT/babynames/">Popular Baby Names</a></td><td valign="top"> 
      <a href="http://www.ssa.gov/"><img src="/templateimages/tinylogo.gif"
      width="52" height="47" align="left"
      alt="SSA logo: link to Social Security home page" border="0"></a><a name="content"></a>
      <h1>Popular Names by Birth Year</h1>September 12, 2007</td>
  </tr>
  <tr bgcolor="#333366"><td colspan="2" height="1"></td></tr>
</tbody></table>
<table width="100%" border="0" cellspacing="0" cellpadding="4" summary="formatting">
  <tr valign="top"><td width="25%" class="greycell">
      <a href="../OACT/babynames/background.html">Background information</a>
      <p><br />
      &nbsp; Select another <label for="yob">year of birth</label>?<br />      
      <form method="post" action="/cgi-bin/popularnames.cgi">
      &nbsp; <input type="text" name="year" id="yob" size="4" value="1990">
      <input type="hidden" name="top" value="1000">
      <input type="hidden" name="number" value="">
      &nbsp; <input type="submit" value="   Go  "></form>
    </td><td>
<h3 align="center">Popularity in 1990</h3>
<p align="center">
"""
re1 = r'\d\d\d\d\d\d\d\d\d\d\d'
re1_revised = r'[12]\d\d\d'
match = re.search(re1,html_string)
match_two = re.search(re1_revised,html_string)
# A match object records where in the string the match begins and ends
# (match.start() and match.end()).  Let's look at this span.
if match is None:
   print(match)
#print(html_string[match.start():match.end()])
#print(html_string[match_two.start():match_two.end()])

None


Discuss how well this regular expression worked at extracting the year. If it failed, explain why.
You may edit this cell.

This exercise should have convinced you that you need to amend the regular expression to provide some context; 4 digits in a row, even if the first is required to be 1 or 2, won't do it.  In the next cell, define and test a new regular expression that does
the job. You may want to try some of the exercises in the following sections first, to get some practice with regular expressions.

In [2]:
#Match <h3 align="center">Popularity in 1990</h3> and variants to retrieve year
re1_revised = r'<h3\s+align\s*=\s*"center"\s*>Popularity\s+in\s+(\d\d\d\d)</h3>'
match_revised = re.search(re1_revised,html_string)
print(html_string[match_revised.start():match_revised.end()])
print(match_revised.groups())

<h3 align="center">Popularity in 1990</h3>
('1990',)
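
To see concretely why bare digit patterns fall short, the sketch below runs `re.findall` on an abbreviated excerpt of the page (a few lines copied from `html_string`; the variable names are assumptions for illustration). It then shows that the pattern with context picks out only the year in the heading.

```python
import re

# Abbreviated excerpt of the page used above.
excerpt = '''
<meta name="dc.language" scheme="ISO639-2" content="eng">
<meta name="dc.date.reviewed" scheme="ISO8601" content="2005-12-30">
<input type="text" name="year" id="yob" size="4" value="1990">
<input type="hidden" name="top" value="1000">
<h3 align="center">Popularity in 1990</h3>
'''

# Every run of four digits, and every run that starts with 1 or 2.
four_digits = re.findall(r'\d\d\d\d', excerpt)
year_like = re.findall(r'[12]\d\d\d', excerpt)

# The same digits, but only when they sit inside the heading we care about.
in_context = re.search(
    r'<h3\s+align\s*=\s*"center"\s*>Popularity\s+in\s+(\d\d\d\d)</h3>',
    excerpt)

print(four_digits)
print(year_like)
print(in_context.group(1))
```

Both bare patterns pick up digits from the `meta` tags and form fields before they ever reach the heading, so `re.search` with either of them returns the wrong string first.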


## Regular expression practice

Edit this cell and after each regular expression, describe the class of strings it matches.  Check your answer by examining the output of the code cell that follows.

1.  `[a-zA-Z]+`     # Set of alphabetic strings of length 1 or greater
2.  `[A-Z][a-z]+`   # Set of alphabetic strings beginning with a capital letter and continuing with 1 or more lower case letters
3.  `\d+(\.\d+)?`   # Set of decimal numbers, including integers like 108 or 0, excluding a decimal point with no
                  # following digits, e.g., `10.`
4.  `([bcdfghjklmnpqrstvwxyz][aeiou][bcdfghjklmnpqrstvwxyz])*`  # Set of lower case alphabetic strings consisting of
                  # a consonant, a vowel, and a consonant, repeated 0 or more times, so `babab` is out and `babbab` is in.
5.  `\w+|[^\w\s]+`  # Any sequence of word characters or any sequence of non-word, non-space characters

In [5]:
re.match(r'^\w+$', 'b_c')

<_sre.SRE_Match object; span=(0, 3), match='b_c'>

In [24]:
########################################
###     Some regular expressions     ###
########################################

re2 = r'[a-zA-Z]+'
re3 = r'[A-Z][a-z]+'
re4 = r'\d+(\.\d+)?'
re5 = r'([bcdfghjklmnpqrstvwxyz][aeiou][bcdfghjklmnpqrstvwxyz])*'
re6 = r'\w+|[^\w\s]+'
res = [re2,re3,re4,re5,re6]

########################################
###     Some example strings         ###
########################################

example1 = 'abracadabra'
example2 = '1billygoat'
example3 = 'billygoat1'
example4 = '43.1789'
example5 = '43.'
example6 = '43'
example7 = 'road_runner'
example8 = ' road_runner'
example9 = 'bathos'
example10 = "The little dog laughed to see such a sight."
example11 = 'socrates'
example12 = 'Socrates'
example13 = '*&%#!?'
example14 = 'IBM'
example15 = 'iBm'

examples = [example1,example2,example3,example4,example5,example6,
            example7,example8,example9,example10,example11,example12,example13,
            example14, example15]

########################################
###     Trying some matches          ###
########################################

for i,re_pat in enumerate(res):
    banner = 're%d %s' % (i+2,re_pat)
    print() 
    print(banner)
    print('=' * len(banner))
    print()
    # Use a separate index j so the inner loop does not overwrite i
    for (j,ex) in enumerate(examples):
        match = re.match(re_pat,ex)
        if match:
            print('  %2d. %-45s  %s' % (j+1,ex,ex[match.start():match.end()]))
        else:
            print('  %2d. %-45s  %s' %(j+1,ex,None))


re2 [a-zA-Z]+

   1. abracadabra                                    abracadabra
   2. 1billygoat                                     None
   3. billygoat1                                     billygoat
   4. 43.1789                                        None
   5. 43.                                            None
   6. 43                                             None
   7. road_runner                                    road
   8.  road_runner                                   None
   9. bathos                                         bathos
  10. The little dog laughed to see such a sight.    The
  11. socrates                                       socrates
  12. Socrates                                       Socrates
  13. *&%#!?                                         None
  14. IBM                                            IBM
  15. iBm                                            iBm

re3 [A-Z][a-z]+

   1. abracadabra                                    None
   2. 1billygoat   

Make sure you can answer the following questions about the results of testing these regular expressions on the examples:

1. Why does `re2` fail on `example8`?
   *The first character of `example8` is a space, and `re.match` only tries to match at the start of the string, where `re2` requires a letter.*
1. Why does `re3` only succeed on `example10` and `example12`?  Be sure to explain why it fails
   on `example14`.  *The first character of any match must be upper case, and it must be followed by 1 or more lower case letters; only examples 10, 12 and 14 start with upper case letters, but example 14 
   fails because no lower case letter follows its first character.*
1. When `re4` matches `example5`, why isn't the decimal point part of the match? *`re4` says that when the decimal point is matched it must be followed by one or more digits.*
1. All of the regular expressions except `re5` report a `None` with at least
   one of the examples.  Why doesn't `re5` report any `None`s?  *`re5` matches the empty string because it uses `*`, so instead of reporting a failure (`None`) on many of the examples, it reports a match with an empty string.*
1. Why does `re6` match all the characters in `example13`?  *Because `re6` can match either a sequence of word characters (`\w`: letters, digits, and underscore) or a sequence of characters that are neither word characters nor white space, and all the characters in `example13` are punctuation marks.*
1. Why doesn't `re6` match `example8`? *`re6` matches either a sequence of word characters (`\w`) or a sequence of non-white-space, non-word characters; `example8` starts with white space, so its first character fits neither.*
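
Several of these answers turn on how `re.match` behaves: it only tries to match at the start of the string, and it is satisfied with a matching prefix. The following sketch contrasts it with `re.search` and `re.fullmatch`, which are standard functions in Python's `re` module; the variable names are chosen here for illustration.

```python
import re

re2 = r'[a-zA-Z]+'
re4 = r'\d+(\.\d+)?'
re5 = r'([bcdfghjklmnpqrstvwxyz][aeiou][bcdfghjklmnpqrstvwxyz])*'

# re.match only tries position 0, so the leading space defeats re2 ...
at_start = re.match(re2, ' road_runner')
# ... while re.search scans forward and finds the first run of letters.
anywhere = re.search(re2, ' road_runner')

# re.match accepts a matching prefix; re.fullmatch demands the whole string.
prefix = re.match(re4, '43.')
whole = re.fullmatch(re4, '43.')

# Because of the trailing *, re5 can succeed by matching nothing at all.
empty = re.match(re5, '1billygoat')

print(at_start, anywhere.group())
print(prefix.group(), whole)
print(repr(empty.group()), empty.span())
```

An empty match is still a match object, which is why `re5` never shows `None` in the table above.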

In [26]:
re.match(r'[^\w\s]+','43.')

## An example that requires NLTK to be installed

In [34]:
# From http://www.nltk.org/book/ch03.html
# Find the most common vowel sequences in English.  Note: be patient.  Evaluating this may take a while.
from nltk.corpus import brown
from collections import Counter
bw = sorted(set(brown.words()))
# Find every run of two or more consecutive vowels in each distinct word, and count occurrences of each run.
ctr = Counter(vs for word in bw for vs in re.findall(r'[aeiou]{2,}', word))
ctr.most_common(25)

[('io', 2787),
 ('ea', 2249),
 ('ou', 1855),
 ('ie', 1799),
 ('ia', 1400),
 ('ee', 1289),
 ('oo', 1174),
 ('ai', 1145),
 ('ue', 541),
 ('au', 540),
 ('ua', 502),
 ('ei', 485),
 ('ui', 483),
 ('oa', 466),
 ('oi', 412),
 ('eo', 250),
 ('iou', 225),
 ('eu', 187),
 ('oe', 181),
 ('iu', 128),
 ('ae', 85),
 ('eau', 54),
 ('uo', 53),
 ('eou', 52),
 ('uou', 37)]
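
If NLTK is not installed, the same counting technique can be tried on any list of words. The sketch below uses a tiny hand-picked word list (an assumption for illustration, not data from the Brown corpus), so the counts say nothing about English in general.

```python
import re
from collections import Counter

# Hypothetical stand-in for the corpus vocabulary.
words = ['audio', 'radio', 'ratio', 'boat', 'coat', 'tree']

# Same idea as above: collect every run of two or more vowels in each
# distinct word and count how often each run occurs.
vowel_runs = Counter(vs
                     for word in sorted(set(words))
                     for vs in re.findall(r'[aeiou]{2,}', word))

print(vowel_runs.most_common(2))
```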

## Poker examples

Suppose you are writing a poker program where a player’s hand is represented as a 5-character string with each character representing a card, “a” for ace, “k” for king, “q” for queen, “j” for jack, “t” for 10, and “2” through “9” representing the card with that value.  (We will ignore a card's suit for now, and simplify the problem by not trying to recognize a flush).

To see if a given string is a valid hand, one could run the code in the next cell.

In [2]:
import re
def displaymatch(regex,text):
    match = regex.match(text)
    if match is None:
        matchstring = None
    else:
        matchstring = '%s[%s]%s' % (text[:match.start()],text[match.start():match.end()],text[match.end():])
    print('%-10s %s' % (text,matchstring))

valid = re.compile(r"^[a2-9tjqk]{5}$")

## Some examples
displaymatch(valid, "akt5q")  # Valid.
displaymatch(valid, "akt5e")  # Invalid.
displaymatch(valid, "akt")    # Invalid.
displaymatch(valid, "727ak")  # Valid.
displaymatch(valid, "727aka")  # Invalid.

akt5q      [akt5q]
akt5e      None
akt        None
727ak      [727ak]
727aka     None


The hand "727ak" contains a pair, and we would like to recognize such hands as special, so that we can push all our chips into the pot.  We can do this using regular expression groups and backreferences.  The match for each parenthesized part of a regular expression is called a **group**.  We can refer back to the particular match associated with a group with `\integer`, where `integer` is any integer from 1 through 99.  `\1` refers to the first group, `\2` to the second, and so on.  So to match poker hands with pairs, we do the following.

In [3]:
pair = re.compile(r".*(.).*\1.*")
displaymatch(pair,"727ak")
displaymatch(pair,"723ak")
displaymatch(pair,"772ak")
displaymatch(pair,"72ak7")

727ak      [727ak]
723ak      None
772ak      [772ak]
72ak7      [72ak7]
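
Backreferences are not specific to poker hands. As a small additional illustration (the pattern and sentences below are made up for this sketch), the same `\1` mechanism can find a word that is accidentally typed twice in a row.

```python
import re

# (\w+) captures a word; \1 then requires the very same text again,
# separated only by white space.  \b keeps us on word boundaries.
doubled = re.compile(r'\b(\w+)\s+\1\b')

hit = doubled.search('it was the the best of times')
miss = doubled.search('it was the best of times')

print(hit.group(0))   # the whole doubled stretch
print(hit.group(1))   # just the captured word
print(miss)
```

Note that `\1` matches the text the group actually captured, not the group's pattern: `(\w+)\s+\w+` would accept any two words, while `(\w+)\s+\1` accepts only a repetition.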


Of course, the regex `pair` does not require the text string to be a poker hand.  We could revise it to do that, but if you think about it a little, doing so would make the regex **a lot** more complicated.  What we could do instead is first apply `valid` to guarantee we've got a valid poker hand and then apply `pair` to find out if it contains a pair. This keeps both regexes simple and easy to understand while still enforcing all the constraints we want.  Often a good strategy in applying regexes to enforce some complicated constraints is to divide the constraints up into separate categories and apply them **in succession**.
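
One way to apply the two checks in succession is sketched below. The helper name `has_pair` is hypothetical; the two regexes are the ones defined in the cells above.

```python
import re

valid = re.compile(r"^[a2-9tjqk]{5}$")
pair = re.compile(r".*(.).*\1.*")

def has_pair(hand):
    """Hypothetical helper: True only for a valid hand that contains a pair."""
    # First constraint: five characters, all of them legal cards.
    if valid.match(hand) is None:
        return False
    # Second constraint: some character occurs at least twice.
    return pair.match(hand) is not None

for hand in ["727ak", "723ak", "xx3ak"]:
    print(hand, has_pair(hand))
```

The string `xx3ak` contains a repeated character, so `pair` alone would accept it; checking `valid` first is what rules it out.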

A problem with `pair` is that it doesn't tell us what we've got a pair of.  Actually, the match object contains this information.  It has a method called `groups` which returns a tuple of all the portions of the string that matched a group.  We can use a revised version of `displaymatch` to print this, when requested:

In [28]:
import re
def displaymatch(regex,text, print_groups=False):
    match = regex.match(text)
    if match is None:
        matchstring = None
    else:
        matchstring = '%s[%s]%s' % (text[:match.start()],text[match.start():match.end()],text[match.end():])
    if print_groups and match:
        print('%-10s %s %s' % (text,matchstring,match.groups()))
    else:
        print('%-10s %s' % (text,matchstring))

# Re for recognizing pair hands
pair = re.compile(r".*(.).*\1")
displaymatch(pair,"723ak",print_groups=True)
displaymatch(pair,"723a7",print_groups=True)
print()
## Write your regex for recognizing two pair below, then test it.
## two_pair = ??
two_pair = re.compile(r"(?:.*(.).*(.).*\1.*\2.*)|(?:.*(.).*(.).*\4.*\3.*)|(?:.*(.).*\5.*(.).*\6.*)")
displaymatch(two_pair,"722a7",print_groups=True)
displaymatch(two_pair,"722ak",print_groups=True)  # Should fail on this one
displaymatch(two_pair,"7a722",print_groups=True)
displaymatch(two_pair,"727a2",print_groups=True)
displaymatch(two_pair,"723a7",print_groups=True)
displaymatch(two_pair,"aaak2",print_groups=True)  # Should fail on this one
displaymatch(two_pair,"aaaa2",print_groups=True)  # Will also succeed on this one, but that's ok

723ak      None
723a7      [723a7] ('7',)

722a7      [722a7] (None, None, '7', '2', None, None)
722ak      None
7a722      [7a722] (None, None, None, None, '7', '2')
727a2      [727a2] ('7', '2', None, None, None, None)
723a7      None
aaak2      None
aaaa2      [aaaa2] ('a', 'a', None, None, None, None)


## Questions

1.  Write regexes that match three-of-a-kind hands and four-of-a-kind hands.  Follow the model of `pair` and don't bother to
    guarantee that it's a valid poker hand.
2.  It's quite complex to write a regular expression that checks to see if you've got a straight, but you can try the 
    following strategy.  First, verify you've got a valid poker hand; then verify you haven't got a pair, three-of-a-kind, or
    four-of-a-kind.  So you have a valid poker hand with no repetitions and you don't need the regex that checks for straights
    to rule those out.
    
    Now write a regex that will check to see if a valid poker hand 
    with no repetitions is a straight beginning with '2'.  It should succeed on `23456` and `25643` and `32654` and it should fail on
    `24357`.  To deal with all possible straights in this way, how many cases are there to take care of?  Write a single regular
    expression that will identify any straight, given that it is a valid poker hand with no repetitions.  Test it on the 
    straights above and on straights like `akqjt` and on the non-straight `24357`.
3.  Write a regex that matches a two pair hand. This is tricky and the most natural answer will also match four-of-a-kind. 
    Assume we've eliminated that possibility by failing to match the four-of-a-kind pattern from 1.  You should 
    test `722a7`, `7a722` and `727a2`.  You will need a pattern that is a big disjunction using `|`, and you will need to
    enclose the disjuncts of this big disjunction in parentheses, but for that purpose you will need parentheses that don't
    count as defining a retrievable group.  The notation for that is `(?:` instead of `(` [the same right paren is used 
    in both cases]. See [Python regex docs.](http://docs.python.org/2/library/re.html)

In [24]:
# Valid hand
valid = re.compile(r"^[a2-9tjqk]{5}$")
# Re for recognizing pair hands
pair = re.compile(r".*(.).*\1")
# Three of a kind and four of a kind, following the model of pair
three_of_a_kind = re.compile(r".*(.).*\1.*\1.*")
#displaymatch(three_of_a_kind,"27a22",print_groups=True)
four_of_a_kind =  re.compile(r".*(.).*\1.*\1.*\1.*")
#displaymatch(four_of_a_kind,"2222a",print_groups=True)
# Simple straight check: assumes a valid hand with no repeated cards
straight_simple = re.compile(r"[2-6]{5}|[3-7]{5}|[4-8]{5}|[5-9]{5}|[6-9t]{5}|[7-9tj]{5}|[89tjq]{5}|[9tjqk]{5}|[tjqka]{5}")
displaymatch(straight_simple,"25436",print_groups=True)
displaymatch(straight_simple,"57436",print_groups=True)
displaymatch(straight_simple,"27436",print_groups=True)
displaymatch(straight_simple,"274536",print_groups=True) # not a valid hand, doesn't fit
displaymatch(straight_simple,"245367",print_groups=True) # not a valid hand, but its first five characters do fit
displaymatch(straight_simple,"74536",print_groups=True)

25436      [25436] ()
57436      [57436] ()
27436      None
274536     None
245367     [24536]7 ()
74536      [74536] ()
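
The strategy described in the questions, applying simple regexes one after another, can be collected into a single function. The following is only a sketch under stated assumptions: the function name `classify` and the order of the checks are choices made here, the regexes are copied from the cells above, and a full house is simply reported as three of a kind because no separate pattern for it was written.

```python
import re

valid = re.compile(r"^[a2-9tjqk]{5}$")
pair = re.compile(r".*(.).*\1")
three_of_a_kind = re.compile(r".*(.).*\1.*\1.*")
four_of_a_kind = re.compile(r".*(.).*\1.*\1.*\1.*")
two_pair = re.compile(r"(?:.*(.).*(.).*\1.*\2.*)|(?:.*(.).*(.).*\4.*\3.*)|(?:.*(.).*\5.*(.).*\6.*)")
straight_simple = re.compile(r"[2-6]{5}|[3-7]{5}|[4-8]{5}|[5-9]{5}|[6-9t]{5}|[7-9tj]{5}|[89tjq]{5}|[9tjqk]{5}|[tjqka]{5}")

def classify(hand):
    """Hypothetical helper: run the checks in succession, strongest first."""
    if valid.match(hand) is None:
        return "invalid"
    # Each later check may assume the earlier ones failed.  In particular
    # straight_simple is only reached for hands with no repeated card.
    checks = [
        ("four of a kind", four_of_a_kind),
        ("three of a kind", three_of_a_kind),  # also matches a full house
        ("two pair", two_pair),
        ("pair", pair),
        ("straight", straight_simple),
    ]
    for label, regex in checks:
        if regex.match(hand):
            return label
    return "high card"

for hand in ["aaaa2", "27a22", "722a7", "727ak", "25436", "24357", "akt5e"]:
    print(hand, classify(hand))
```

Ordering the checks from strongest to weakest is what lets each regex stay simple: `two_pair` does not need to exclude four of a kind, and `straight_simple` does not need to exclude repetitions.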


## How to do extraction

The following example is from [the Weather Underground page for San Diego](http://www.wunderground.com/weather-forecast/US/CA/San_Diego.html).  The temperature is regularly given in a page division (HTML tag `div`) with ID (HTML attribute `id`) `nowTemp`.  If we can find that division and the temperature inside it, we have what we want.  The pattern needs to be compiled with a flag that allows it to match across multiple lines, because the context that identifies the temperature does not occur on the same line as the temperature; `re.DOTALL` does this by letting `.` match a newline.  Compiling regular expressions also makes them more efficient when reused.  A key point is that we place the actual temperature we want inside parentheses, the `(\d{1,3}\.\d)` part of the pattern.  Portions of a pattern that occur in parentheses and are matched are returned by the `groups` method of the match object, as a tuple of all the matched strings in parentheses in the pattern.

In [7]:
html_string = """
<div class="br10" id="stationSelect">
		<a class="br10" id="stationselector_button" href="javascript:void(0);" onclick="_gaq.push(['_trackEvent', 'Station Select', 'Opened']);"><span>Station Select</span></a>
		</div>
		</div>
		<div id="conds_dashboard">
		<div id="hour00">
		<div id="nowCond">
		<div class="titleSubtle">Now</div>
		<div id="curIcon"><a href="" class="iconSwitchBig"><img src="http://icons-ak.wxug.com/i/c/k/nt_partlycloudy.gif" width="44" height="44" alt="Scattered Clouds" class="condIcon" /></a></div>
		<div id="curCond">Scattered Clouds</div>
		</div>
		<div id="nowTemp">
		<div class="titleSubtle">Temperature</div>
		<div id="tempActual"><span id="rapidtemp" class="pwsrt" pwsid="KCASANDI123" pwsunit="english" pwsvariable="tempf" english="&deg;F" metric="&deg;C" value="55.8">
  <span class="nobr"><span class="b">55.8</span>&nbsp;&deg;F</span>
</span></div>
		<div id="tempFeel">Feels Like
  <span class="nobr"><span class="b">55.1</span>&nbsp;&deg;F</span>
</div>
		</div>
"""
pattern = r'<div\s+id\s*=\s*\"nowTemp\"\s*>.*?(\d{1,3}\.\d).*?</div>'
pattern_re = re.compile(pattern,re.MULTILINE | re.DOTALL)
m = re.search(pattern_re,html_string)
m.groups()

('55.8',)

The pattern in the example above was built up piece by piece.  First we built a regular expression matching the `<div id="nowTemp">` part of the pattern.  That piece looked like this:

     subpattern = r'<div\s+id\s*=\s*\"nowTemp\"\s*>'

The `\s*` aren't needed for this particular string, but there is considerable variation in how actual HTML is generated, and since
white space in the `\s*` positions wouldn't be meaningful, it is allowed.  Next we tested the core part of the pattern on its own:

     corepattern = r'(\d{1,3}\.\d)'

Finally we tested the last part:

     lastpattern = r'</div>'
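
The piece-by-piece approach can be sketched as follows. The `snippet` string is a shortened, hand-edited version of the page fragment above, and the names `full` and `single_line` are assumptions for illustration. Each piece is tested on its own before the pieces are joined with `.*?`.

```python
import re

# Shortened version of the page fragment above.
snippet = '''<div id="nowTemp">
<div class="titleSubtle">Temperature</div>
<span class="b">55.8</span>&nbsp;&deg;F
</div>'''

subpattern = r'<div\s+id\s*=\s*"nowTemp"\s*>'
corepattern = r'(\d{1,3}\.\d)'
lastpattern = r'</div>'

# Step 1: check that every piece matches somewhere on its own.
for piece in (subpattern, corepattern, lastpattern):
    assert re.search(piece, snippet) is not None, piece

# Step 2: join the pieces, letting .*? skip over whatever lies between them.
joined = subpattern + r'.*?' + corepattern + r'.*?' + lastpattern
full = re.compile(joined, re.DOTALL)   # . may cross line breaks
single_line = re.compile(joined)       # . stops at a newline

print(full.search(snippet).groups())
print(single_line.search(snippet))
```

Testing the pieces separately makes it much easier to see which part is at fault when the joined pattern fails, as it does here when `re.DOTALL` is left out.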

## Tokenization  (NLTK assumed)

Tokenization is the process of breaking up a text into words.  We have in some cases used `split()` for this purpose, uniformly splitting a text up into words on the spaces, but this doesn't always yield the right results, as the next examples show.

In [7]:
# From http://www.nltk.org/book/ch03.html
import re

text = """
"That," said  Fred, "is what
you ... get in the U.S.A. for $5.29."
"""
try1 = text.split()

# Notice the use of special NONCAPTURING parens (?:...)
# All parens in the regexp must be non capturing.
pattern = r""" 
   (?:[A-Z]\.)+        # abbreviations, e.g. U.S.A.
  |\w+(?:-\w+)*        # words with optional internal hyphens
  |\$?\d+(?:\.\d+)?%?  # numbers, money and percents, e.g. 3.14, $12.40, 82%
  |\.\.\.            # ellipsis
  |[][.,;"'?():-_`]  # keep punctuation, delimiters as separate word tokens
"""
re_flags = re.UNICODE | re.MULTILINE | re.DOTALL | re.X
pattern_re = re.compile(pattern,re_flags)
try2 = pattern_re.findall(text)
# Or equivalently, let nltk do some of the work.
import nltk
try3 = nltk.regexp_tokenize(text,pattern,flags=re_flags)

In [6]:
try1

['"That,"',
 'said',
 'Fred,',
 '"is',
 'what',
 'you',
 '...',
 'get',
 'in',
 'the',
 'U.S.A.',
 'for',
 '$5.29."']

The `split` tokenized sentence has some very strange words, for example the 7-character string `"That,"`, the 5-character string `Fred,`, and the 3-character string `"is`. What's being missed here is that certain characters (like comma and quotation mark) unambiguously mark a word boundary.  Regular expressions are very good at enforcing this sort of generalization, as we can see by comparing the results of tokenizing the same sentence with a regexp that does not allow words to continue past boundary markers.

In [3]:
print(try2 == try3)
try2

True


[u'"',
 u'That',
 u',',
 u'"',
 u'said',
 u'Fred',
 u',',
 u'"',
 u'is',
 u'what',
 u'you',
 u'...',
 u'get',
 u'in',
 u'the',
 u'U.S.A.',
 u'for',
 u'$5.29',
 u'.',
 u'"']

Python regular expressions use parentheses for two different things: defining retrievable groups, which, as we saw, is useful for extraction, and defining the scope of some regular expression operator (like `*` or `+`). Sometimes these two roles get in each other's way.  This is what happens in `pattern` above: when a pattern contains groups, Python's `findall` returns the text matched by the groups rather than the whole match, which is not what we want for tokenizing; so we use the regular expression convention of changing `(` to `(?:`.  The `(?:` functions unambiguously to scope an operator and does not define a retrievable group.  In `pattern` above this change has been made by hand, as the code comment notes, rather than by calling the NLTK function `convert_regexp_to_nongrouping`.

We then compile the regular expression using various regular expression compiling flags.  `re.MULTILINE` changes where `^` and `$` can match and `re.DOTALL` lets `.` match a newline; both matter for patterns that must match across lines, although this particular `pattern` uses neither `^`, `$` nor an unescaped `.`.  `re.UNICODE` allows our definition of word, which depends on the interpretation of `\w`, to apply to Unicode characters.  Finally, `re.X` is the most directly relevant to this example.  This allows regular expressions that intersperse white space and comments, which makes them much more readable.  See [Python.org re docs](http://docs.python.org/2/library/re.html) for more details.
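
The effect of capturing parentheses on `findall` is easy to see on a small case. In this sketch the sample string and variable names are made up; the two patterns differ only in `(` versus `(?:`.

```python
import re

sample = 'pay $5.29 or 3.14'

# With a capturing group, findall returns what the group matched.
capturing = re.findall(r'\$?\d+(\.\d+)?', sample)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r'\$?\d+(?:\.\d+)?', sample)

print(capturing)
print(non_capturing)
```

For tokenizing we want the whole match every time, which is why every parenthesis in the tokenizer pattern is written as `(?:`.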


In [20]:
# Twitter
text = """RT @RealDonaldTrump #metoo Lightweight Senator Kirsten Gillibrand, 
a total flunky for Chuck Schumer and someone who would come to my office 
“begging” for campaign contributions not so long ago (and would do anything for them), 
is now in the ring fighting against Trump. Very disloyal to Bill & Crooked-USED!"""
tokenized = nltk.regexp_tokenize(text,pattern,flags=re_flags)
#tokenized = nltk.regexp_tokenize(text,r"[][\.,;\"'?():-_`]")

In [18]:
pattern2 = r""" 
   (?:[A-Z]\.)+        # abbreviations, e.g. U.S.A.
  |\w+(?:-\w+)*        # words with optional internal hyphens
  |\$?\d+(?:\.\d+)?%?  # numbers, money and percents, e.g. 3.14, $12.40, 82%
  |\.\.\.            # ellipsis
  |\#[][.,;"'?():-_`]  # keep punctuation, delimiters as separate word tokens
"""
tokenized2 = nltk.regexp_tokenize(text,pattern2,flags=re_flags)

In [21]:
tokenized

['RT',
 '@',
 'RealDonaldTrump',
 'metoo',
 'Lightweight',
 'Senator',
 'Kirsten',
 'Gillibrand',
 ',',
 'a',
 'total',
 'flunky',
 'for',
 'Chuck',
 'Schumer',
 'and',
 'someone',
 'who',
 'would',
 'come',
 'to',
 'my',
 'office',
 'begging',
 'for',
 'campaign',
 'contributions',
 'not',
 'so',
 'long',
 'ago',
 '(',
 'and',
 'would',
 'do',
 'anything',
 'for',
 'them',
 ')',
 ',',
 'is',
 'now',
 'in',
 'the',
 'ring',
 'fighting',
 'against',
 'Trump',
 '.',
 'Very',
 'disloyal',
 'to',
 'Bill',
 'Crooked-USED']