SCALA DUMP ZONE

Helper page for Scala snippets

  • XML is primarily used as a way of storing data, and looks a bit like HTML but with custom tags:

    <ParentTag>
        <ChildTag>
            <BabyTag name="Billy">Inner Text</BabyTag>
        </ChildTag>
    </ParentTag>

    Scala 2 supports XML literals directly in source code (since Scala 2.11 the classes live in the separate scala-xml module), so writing XML is very easy. You will mostly assign XML to one of two types in Scala: Elem and NodeSeq. There is also Node, which is one single "tag" inside a NodeSeq.

    scala> val xml = <ParentTag>
         |     <ChildTag>
         |         <BabyTag name="Billy">Inner Text</BabyTag>
         |     </ChildTag>
         | </ParentTag>
    xml: scala.xml.Elem =
    <ParentTag>
        <ChildTag>
            <BabyTag name="Billy">Inner Text</BabyTag>
        </ChildTag>
    </ParentTag>
    
    scala> val xml2: scala.xml.NodeSeq = <ParentTag>
         |     <ChildTag>
         |         <BabyTag name="Billy">Inner Text</BabyTag>
         |     </ChildTag>
         | </ParentTag>
    xml2: scala.xml.NodeSeq =
    <ParentTag>
        <ChildTag>
            <BabyTag name="Billy">Inner Text</BabyTag>
        </ChildTag>
    </ParentTag>
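
    As a rough sketch of how the three types relate (the value names below are made up for illustration): an Elem is a Node, a Node is itself a NodeSeq of length one, and a search returns a NodeSeq that may hold several Nodes.

```scala
import scala.xml.{Elem, Node, NodeSeq}

// A small document with two <b> children
val elem: Elem = <a><b>1</b><b>2</b></a>

// Elem is a subtype of Node, and Node is a subtype of NodeSeq,
// so both assignments compile without any conversion
val asNode: Node = elem
val asSeq: NodeSeq = elem

// Searching returns a NodeSeq; each item in it is a single Node
val bs: NodeSeq = elem \ "b"
val firstB: Node = bs.head
```

    This is why a val typed as NodeSeq can hold either a single element or the result of a search.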

    You can search through XML in Scala for inner text, attributes, nested Nodes and namespaces:

    Inner Text

    scala> (xml \\ "BabyTag").text
    res0: String = Inner Text

    If there is no match (i.e. the Node doesn't exist or is empty), an empty String is returned rather than an error being thrown. You can then check that a value was found with something like if (value.nonEmpty).
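
    If you would rather not compare against an empty String everywhere, one option is to wrap the lookup in an Option. The sketch below assumes a hypothetical helper called textOf (not part of scala-xml) that returns the text of the first matching Node, or None when nothing useful was found.

```scala
import scala.xml.{Elem, NodeSeq}

val doc: Elem =
  <ParentTag>
    <ChildTag>
      <BabyTag name="Billy">Inner Text</BabyTag>
    </ChildTag>
  </ParentTag>

// Hypothetical helper: text of the first descendant with the given label.
// Returns None when the label is missing or its text is empty.
def textOf(xml: NodeSeq, label: String): Option[String] =
  (xml \\ label).headOption.map(_.text).filter(_.nonEmpty)

val found: Option[String] = textOf(doc, "BabyTag")
val missing: Option[String] = textOf(doc, "NoSuchTag")
```

    Here found holds the inner text and missing is None, so callers can pattern match instead of checking for "".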

    Tag Attributes (with @)

    scala> (xml \\ "BabyTag" \ "@name").text
    res1: String = Billy

    Note: (xml \\ "BabyTag" \ "@name").text == xml \\ "BabyTag" \@ "name"
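
    The same empty-String behaviour applies to attributes. As a small sketch (the second BabyTag is invented for the example), mapping \@ over several Nodes gives one entry per Node, with "" wherever the attribute is absent:

```scala
import scala.xml.Elem

val doc: Elem =
  <ParentTag>
    <BabyTag name="Billy">Inner Text</BabyTag>
    <BabyTag>No name here</BabyTag>
  </ParentTag>

// One entry per BabyTag; a missing attribute yields an empty String
val allNames: Seq[String] = (doc \\ "BabyTag").map(_ \@ "name")

// Keep only the Nodes that actually carry the attribute
val presentNames: Seq[String] = allNames.filter(_.nonEmpty)
```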

    Getting the inner tags back

    scala> (xml \\ "ChildTag")
    res2: scala.xml.NodeSeq =
    NodeSeq(<ChildTag>
            <BabyTag name="Billy">Inner Text</BabyTag>
        </ChildTag>)

    You can then .map over each individual node if you want to do things like creating a List of values:

    scala> val xmlWithChildren = <ParentTag>
         |     <ChildTag>My Text</ChildTag>
         |     <ChildTag>My Text2</ChildTag>
         | </ParentTag>
    xmlWithChildren: scala.xml.Elem =
    <ParentTag>
        <ChildTag>My Text</ChildTag>
        <ChildTag>My Text2</ChildTag>
    </ParentTag>
    
    scala> (xmlWithChildren \\ "ChildTag").map(_.text)
    res3: scala.collection.immutable.Seq[String] = List(My Text, My Text2)

    Getting a Node's namespace

    scala> val xmlWithNamespace = <ParentTag>
         |     <ChildTag xmlns="https://www.example.com">My Text</ChildTag>
         | </ParentTag>
    xmlWithNamespace: scala.xml.Elem =
    <ParentTag>
        <ChildTag xmlns="https://www.example.com">My Text</ChildTag>
    </ParentTag>
    
    scala> (xmlWithNamespace \\ "ChildTag").map(_.namespace)
    res4: scala.collection.immutable.Seq[String] = List(https://www.example.com)
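
    Building on this, a minimal sketch of filtering Nodes by namespace is shown below. The second ChildTag and the value names are assumptions made for the example; the lookup is wrapped in Option because, as far as I can tell, namespace is null for a Node that has no namespace bound.

```scala
import scala.xml.Elem

val doc: Elem =
  <ParentTag>
    <ChildTag xmlns="https://www.example.com">My Text</ChildTag>
    <ChildTag>Other Text</ChildTag>
  </ParentTag>

val target = "https://www.example.com"

// Keep only the ChildTags bound to the target namespace.
// Option(...) guards against a null namespace on unbound Nodes.
val inNamespace: Seq[String] =
  (doc \\ "ChildTag")
    .filter(node => Option(node.namespace).contains(target))
    .map(_.text)
```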

    Searching through all Nodes, including nested Nodes

    Sometimes, you will want to get back all nodes (nested or not). To do this, search with the XML search wildcard _, i.e. (xml \\ "_").

    For example, to count how many times each Node label occurs without knowing the structure of the XML beforehand:

    scala> val xml = <family>
         |     <mother name="julie" />
         |     <father name="harold" />
         |     <child name="billy" status="good child" />
         |     <child name="charlie" status="good child" />
         |     <child name="mandy" status="bad child" />
         |     <child name="nigel" status="bad child" />
         |     <extendedfamily>
         |         <uncle name="jeff" />
         |         <auntie name="vicky" />
         |         <cousin name="little boy 1" />
         |         <cousin name="little boy 2" />
         |     </extendedfamily>
         | </family>
    xml: scala.xml.Elem =
    <family>
        <mother name="julie"/>
        <father name="harold"/>
        <child name="billy" status="good child"/>
        <child name="charlie" status="good child"/>
        <child name="mandy" status="bad child"/>
        <child name="nigel" status="bad child"/>
        <extendedfamily>
            <uncle name="jeff"/>
            <auntie name="vicky"/>
            <cousin name="little boy 1"/>
            <cousin name="little boy 2"/>
        </extendedfamily>
    </family>
    
    scala> val familyMap = (xml \\ "_").groupBy(_.label).map { case (k, v) => (k, v.size) }
    familyMap: scala.collection.immutable.Map[String,Int] = Map(mother -> 1, auntie -> 1, uncle -> 1, child -> 4, extendedfamily -> 1, father -> 1, cousin -> 2, family -> 1)
    
    scala> familyMap foreach {
         |     case (k, v) => println(s"$k count: $v")
         | }
    mother count: 1
    auntie count: 1
    uncle count: 1
    child count: 4
    extendedfamily count: 1
    father count: 1
    cousin count: 2
    family count: 1

    View my answer on StackOverflow for more context and info on searching through XML.
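
    Two related searches are worth sketching with a cut-down version of the family XML (the trimmed document and value names are only for illustration): filtering Nodes by an attribute value, and the difference between \ (direct children only) and \\ (all descendants).

```scala
import scala.xml.Elem

val family: Elem =
  <family>
    <child name="billy" status="good child"/>
    <child name="mandy" status="bad child"/>
    <extendedfamily>
      <cousin name="little boy 1"/>
    </extendedfamily>
  </family>

// Names of the children whose status attribute is "good child"
val goodChildren: Seq[String] =
  (family \\ "child").filter(_ \@ "status" == "good child").map(_ \@ "name")

// \ only looks at direct children, so the nested cousin is not found...
val directCousins = family \ "cousin"

// ...whereas \\ searches every level
val allCousins = family \\ "cousin"
```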
  • In order to parse XML in a Scala Play application, you can use either the parse.xml or the parse.anyContent body parser. Example POST methods which take in an XML request body and return it in the response are below:

    def index: Action[NodeSeq] = Action(parse.xml) {
      implicit request =>
        val xml = request.body
        Ok(xml)
    }
    def handlePost(): Action[AnyContent] = Action(parse.anyContent) {
      implicit request =>
        // asXml is None when the body was not parsed as XML
        request.body.asXml match {
          case Some(xml) => Ok(xml)
          case None => UnsupportedMediaType
        }
    }

    It is recommended to handle the request as anyContent so that you can deal with a missing or non-XML body yourself (e.g. log the error, then redirect or return the appropriate response code, such as 415 Unsupported Media Type).

  • Modifying nodes can become quite ugly and complicated, so the best way to do this cleanly is with a helper function. Below is a function which has multiple uses, along with the XML we will manipulate. Obviously, this can be split into multiple functions too.

    def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty): NodeSeq = xml.foldLeft(NodeSeq.Empty){
      (acc: NodeSeq, curr: Node) => curr match {
        case elem: Elem if elem.label == search.label => acc ++ newNode
        case elem: Elem => acc ++ Elem(
          elem.prefix,
          elem.label,
          elem.attributes,
          elem.scope,
          elem.minimizeEmpty,
          rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode):_*)
        case node => acc ++ node
      }
    }
    val xml = <node1>
        <node2 xmlns="http://www.example.com">
            <node3>James dislikes XML</node3>
            <node4>XML is useful</node4>
        </node2>
    </node1>
    1. Changing a Node's text

      This searches for a Node and replaces it with any NodeSeq we give it. Input:

      rewriteXml(xml, <node3/>, <node3>James loves XML</node3>)

      Output:

      res0: scala.xml.NodeSeq = <node1>
          <node2 xmlns="http://www.example.com">
              <node3>James loves XML</node3>
              <node4>XML is useful</node4>
          </node2>
      </node1>
    2. Removing a Node

      This strips a given Node out of the XML by replacing the given Node with nothing. Input:

      rewriteXml(xml, <node3/>)

      Output:

      res0: scala.xml.NodeSeq = <node1>
          <node2 xmlns="http://www.example.com">
      
              <node4>XML is useful</node4>
          </node2>
      </node1>
    3. Duplicating a Node

      To duplicate a Node, we will have to modify the original function (this modification will not break the other examples).

      def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty, duplicateNode: Boolean = false): NodeSeq = xml.foldLeft(NodeSeq.Empty){
        (acc: NodeSeq, curr: Node) => curr match {
          case elem: Elem if elem.label == search.label && duplicateNode => acc ++ elem ++ elem
          case elem: Elem if elem.label == search.label => acc ++ newNode
          case elem: Elem => acc ++ Elem(
            elem.prefix,
            elem.label,
            elem.attributes,
            elem.scope,
            elem.minimizeEmpty,
            rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode):_*)
          case node => acc ++ node
        }
      }

      Input:

      rewriteXml(xml, <node4/>, duplicateNode = true)

      Output:

      res0: scala.xml.NodeSeq = <node1>
          <node2 xmlns="http://www.example.com">
              <node3>James dislikes XML</node3>
              <node4>XML is useful</node4><node4>XML is useful</node4>
          </node2>
      </node1>
    4. Adding a Node

      To add a Node, we will have to modify the original function again. The new Node will be added immediately after the search Node.

      def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty, duplicateNode: Boolean = false, addNewNode: Boolean = false): NodeSeq = xml.foldLeft(NodeSeq.Empty){
        (acc: NodeSeq, curr: Node) => curr match {
          case elem: Elem if elem.label == search.label && duplicateNode => acc ++ elem ++ elem
          case elem: Elem if elem.label == search.label && addNewNode => acc ++ elem ++ newNode
          case elem: Elem if elem.label == search.label => acc ++ newNode
          case elem: Elem => acc ++ Elem(
            elem.prefix,
            elem.label,
            elem.attributes,
            elem.scope,
            elem.minimizeEmpty,
            rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode):_*)
          case node => acc ++ node
        }
      }

      Input:

      rewriteXml(xml, <node4/>, <node5>Here's an additional Node</node5>, addNewNode = true)

      Output:

      res0: scala.xml.NodeSeq = <node1>
          <node2 xmlns="http://www.example.com">
              <node3>James dislikes XML</node3>
              <node4>XML is useful</node4><node5>Here's an additional Node</node5>
          </node2>
      </node1>
    5. Changing a Namespace

      To change or add a Namespace, we will once again have to modify the original function.

      def rewriteXml(xml: NodeSeq, search: Elem, newNode: NodeSeq = NodeSeq.Empty, duplicateNode: Boolean = false, addNewNode: Boolean = false, namespace: String = ""): NodeSeq = xml.foldLeft(NodeSeq.Empty){
        (acc: NodeSeq, curr: Node) => curr match {
          case elem: Elem if elem.label == search.label && duplicateNode => acc ++ elem ++ elem
          case elem: Elem if elem.label == search.label && addNewNode => acc ++ elem ++ newNode
          case elem: Elem if elem.label == search.label && namespace.nonEmpty => acc ++ Elem(
            elem.prefix,
            elem.label,
            elem.attributes,
            NamespaceBinding(null, namespace, elem.scope),
            elem.minimizeEmpty,
            rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode, namespace):_*
          )
          case elem: Elem if elem.label == search.label => acc ++ newNode
          case elem: Elem => acc ++ Elem(
            elem.prefix,
            elem.label,
            elem.attributes,
            elem.scope,
            elem.minimizeEmpty,
            rewriteXml(NodeSeq.fromSeq(elem.child), search, newNode, duplicateNode, addNewNode, namespace):_*)
          case node => acc ++ node
        }
      }

      Input:

      rewriteXml(xml, <node4/>, namespace = "https://james-work-account.github.io/")

      Output:

      res0: scala.xml.NodeSeq = <node1>
          <node2 xmlns="http://www.example.com">
              <node3>James dislikes XML</node3>
              <node4 xmlns="https://james-work-account.github.io/">XML is useful</node4>
          </node2>
      </node1>
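
    As an alternative to the hand-written fold, scala-xml also ships scala.xml.transform.RewriteRule and RuleTransformer. The sketch below is a different technique from the rewriteXml helper above and only covers the replace and remove cases; replaceByLabel is a made-up name and the namespace has been left off the sample XML to keep it short.

```scala
import scala.xml.{Elem, Node, NodeSeq}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Hypothetical helper: a rule that swaps every Elem with the given label
// for the replacement (an empty NodeSeq removes the Elem altogether).
def replaceByLabel(label: String, replacement: NodeSeq): RewriteRule =
  new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case elem: Elem if elem.label == label => replacement
      case other => other
    }
  }

val doc: Elem =
  <node1>
    <node2>
      <node3>James dislikes XML</node3>
      <node4>XML is useful</node4>
    </node2>
  </node1>

// RuleTransformer walks the whole tree and applies the rule to each Node
val replacer = new RuleTransformer(replaceByLabel("node3", <node3>James loves XML</node3>))
val updated: Node = replacer(doc)

val remover = new RuleTransformer(replaceByLabel("node3", NodeSeq.Empty))
val removed: Node = remover(doc)
```

    The trade-off is that each behaviour becomes its own small rule rather than another flag on one function.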
  • It is probably easier to show how to transform XML with an XSLT than to explain it step by step.

    import java.io.{ByteArrayOutputStream, StringReader}
    
    import javax.xml.transform.stream.{StreamResult, StreamSource}
    import javax.xml.transform.{Result, Source, TransformerFactory}
    
    import scala.util.{Failure, Success, Try}
    import scala.xml._
    
    val xml: Elem =
      <Message>
        <Header>
          <MessageDetails>
            <Name>James</Name>
            <Timestamp>2006-01-05T15:31:59.000</Timestamp>
          </MessageDetails>
          <SenderDetails/>
        </Header>
        <Body>
        </Body>
      </Message>
    
    val ns = "http://www.example.com"
    
    val xslt =
    s"""
    |<xsl:stylesheet version="1.0"
    | xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    |    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />
    |
    |    <xsl:template match="Message">
    |        <xsl:element name="{local-name()}" namespace="$ns">
    |            <xsl:apply-templates select="node() | @*"  />
    |        </xsl:element>
    |    </xsl:template>
    |    <xsl:template match="node() | @*">
    |        <xsl:copy>
    |            <xsl:apply-templates select="node() | @*" />
    |        </xsl:copy>
    |    </xsl:template>
    |
    |</xsl:stylesheet>
    """.stripMargin
    
    
    
    def transformXml(xml: String): NodeSeq = Try {
    
      val xmlSource: Source = new StreamSource(new StringReader(xml))
    
      val outputStream = new ByteArrayOutputStream()
      val result: Result = new StreamResult(outputStream)
    
      val transformerFactory = TransformerFactory.newInstance
      val transformer = transformerFactory.newTransformer(new StreamSource(new StringReader(xslt)))
    
      transformer.transform(xmlSource, result)
      outputStream.toString.replaceAll(":?ns0:?", "")
    } match {
      case Success(xmlAsString) =>
        XML.loadString(xmlAsString)
      case Failure(ex: Throwable) =>
        throw ex
    }
    
    transformXml(xml.toString())

    This example uses an XSLT which is loaded from a String, but loading from an external file isn't too different. This specific XSLT adds a namespace to the Message Node.

    There are ways to transform XML using Scala libraries, but the simplest way I've found is to use the Java TransformerFactory. Most of the work is done for you behind the scenes; all you need to do is set it up correctly.

    What you will need:

    • XML to be transformed, as a String
    • XSLT to be used, loaded using a StringReader in this case but can be loaded from an external file using a Java FileReader

    In my example I have added a replaceAll, because the actual output of this specific XSLT changes the Message node to <ns0:Message xmlns:ns0="http://www.example.com"> rather than <Message xmlns="http://www.example.com">. The replaceAll only tidies up the generated prefix; it is not necessary for the transformation itself.

    After the transformation is complete, you can turn the transformed XML into a NodeSeq using XML.loadString(your XML). I have put this transformation in a Try block so that if it fails at any point, the exception will be caught. In this example there is just one generic Failure(ex: Throwable) catch, but you can specify which different exceptions you wish to catch if you want to handle them all differently (e.g. with different bespoke logging messages).
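
    To illustrate handling different failures differently, here is a minimal sketch that returns an Either instead of rethrowing. The names transformToString and identityXslt are made up for the example, the stylesheet is a plain identity transform rather than the namespace one above, and the error messages are placeholders for whatever logging you prefer.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.{TransformerException, TransformerFactory}
import javax.xml.transform.stream.{StreamResult, StreamSource}
import scala.util.{Failure, Success, Try}

// Identity stylesheet, used here only so the example is self-contained
val identityXslt: String =
  """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    |  <xsl:output method="xml" omit-xml-declaration="yes"/>
    |  <xsl:template match="node() | @*">
    |    <xsl:copy><xsl:apply-templates select="node() | @*"/></xsl:copy>
    |  </xsl:template>
    |</xsl:stylesheet>""".stripMargin

// Hypothetical helper: Right(transformed XML) on success, Left(message) on failure
def transformToString(xml: String, xslt: String): Either[String, String] =
  Try {
    val transformer = TransformerFactory.newInstance
      .newTransformer(new StreamSource(new StringReader(xslt)))
    // A StringWriter avoids having to pick a charset for a byte stream
    val writer = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer))
    writer.toString
  } match {
    case Success(out) => Right(out)
    // Problems with the stylesheet or the input XML
    case Failure(ex: TransformerException) => Left(s"XSLT transformation failed: ${ex.getMessage}")
    // Anything else
    case Failure(ex) => Left(s"Unexpected error: ${ex.getMessage}")
  }

val ok = transformToString("<a><b/></a>", identityXslt)
val broken = transformToString("<a>", identityXslt)
```

    Here ok is a Right and broken is a Left, because "<a>" is not well-formed XML.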

  • Any application which accepts external XML containing DocType Declarations (DTDs) may be vulnerable to XXE (XML External Entity) attacks unless its parser is configured to reject or restrict them. Thankfully, the Play Framework handles these attacks automatically by not allowing any DTD in XML POSTed to your application. Below is an example of some XML containing a DTD.

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE foo [
    <!ELEMENT foo (bar)>
    	<!ELEMENT bar (#PCDATA)>
    ]>
    <foo>
    	<bar>string</bar>
    </foo>

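    This example is harmless, but a DTD can also declare external entities that make the parser read local files or call out to remote URLs, which is much more serious.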

    In order to test that your Play application is secure against XXE attacks, you can write the following Routes test:

    import org.scalatestplus.play.PlaySpec
    import org.scalatestplus.play.guice.GuiceOneAppPerSuite
    import play.api.mvc.Call
    import play.api.test.FakeRequest
    import play.api.test.Helpers.{POST => POST_REQUEST, _}
    
    class RoutesSpec extends PlaySpec with GuiceOneAppPerSuite {
    
      /**
        * Test to verify that POSTing XML with a DTD fails due to default application behaviour.
        * By default, Play blocks all XML with any DTD in it due to the potential XXE vulnerability.
        * This means that the call will fall over at `Action.async(parse.xml)` (or equivalent).
        * `scala.xml.XML.loadString` can itself be vulnerable to XXE, so the XML is built as
        * `scala.xml.Unparsed`, which is POSTed as raw text without being parsed on the test side.
        */
    
      "The Play Application" must {
        "not handle XXE XML" in {
    
          lazy val xmlWithDTD = scala.xml.Unparsed(
            """<?xml version="1.0" encoding="utf-8"?>
              |<!DOCTYPE foo [
              |<!ELEMENT foo (bar)>
              |	<!ELEMENT bar (#PCDATA)>
              |]>
              |<foo>
              |	<bar>string</bar>
              |</foo>
            """.stripMargin)
    
          val Some(result) = route(app, FakeRequest(Call(POST_REQUEST, "/your-app-route"))
            .withXmlBody(xmlWithDTD))
          status(result) mustEqual 400
          contentAsString(result) mustBe ""
        }
      }
    }

    Importantly, the usual loadString method can itself be vulnerable to XXE attacks, so the test builds the XML as a scala.xml.Unparsed, which is sent as raw text without being parsed first.

    If you are accepting XML without the security of the Play Framework, there are other ways to accept XML safely. My personal preference is to use a secure SAX parser when calling the loadString method.

    import javax.xml.parsers.{SAXParser, SAXParserFactory}
    import scala.xml.XML
    
    def secureSAXParser: SAXParser = {
      val saxParserFactory = SAXParserFactory.newInstance()
      // Do not resolve external general entities
      saxParserFactory.setFeature("http://xml.org/sax/features/external-general-entities", false)
      // Reject any document that contains a DOCTYPE declaration
      saxParserFactory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
      // Never fetch an external DTD
      saxParserFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
      saxParserFactory.newSAXParser()
    }
    
    XML.withSAXParser(secureSAXParser).loadString(your XML)

    This rejects any DOCTYPE declaration and disables external entities, which protects against things like XXE attacks.
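
    A quick way to convince yourself the parser behaves as intended is sketched below. It assumes the JDK's bundled Xerces-based parser (which recognises the feature URLs used above); safeLoad and the sample strings are made-up names for the example.

```scala
import javax.xml.parsers.{SAXParser, SAXParserFactory}
import scala.util.Try
import scala.xml.{Elem, XML}

def secureSAXParser: SAXParser = {
  val factory = SAXParserFactory.newInstance()
  factory.setFeature("http://xml.org/sax/features/external-general-entities", false)
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
  factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
  factory.newSAXParser()
}

// Hypothetical helper: parse with the secure parser and capture any failure
def safeLoad(xmlString: String): Try[Elem] =
  Try(XML.withSAXParser(secureSAXParser).loadString(xmlString))

val plainXml = "<foo><bar>string</bar></foo>"
val xmlWithDtd =
  """<?xml version="1.0" encoding="utf-8"?>
    |<!DOCTYPE foo [
    |<!ELEMENT foo (bar)>
    |<!ELEMENT bar (#PCDATA)>
    |]>
    |<foo><bar>string</bar></foo>""".stripMargin

val plainResult = safeLoad(plainXml)   // ordinary XML should load normally
val dtdResult = safeLoad(xmlWithDtd)   // the DOCTYPE should be rejected
```

    With disallow-doctype-decl switched on, plainResult should be a Success and dtdResult a Failure, even though the DTD in this sample is harmless.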