
Excel VBA Web Scraping Returning Wrong Text in MSXML2.XMLHTTP Method

I am trying to extract the movie description from this URL: 'https://ssl.ofdb.de/plot/138627,271359,I-Am-Legend'. When I use the CreateObject("InternetExplorer.Application") method it g

Solution 1:

You want to decode the returned byte string as UTF-8 yourself rather than rely on the default conversion to a VBA Unicode string. You can use the helper functions shown below, which I have taken from elsewhere. This is the 64-bit version; I will leave the 32-bit declaration at the bottom. You can also use a more targeted CSS selector to obtain your node; this will be quicker and avoids additional string-cleaning function calls.

Option Explicit

''' Maps a character string to a UTF-16 (wide character) string
Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As LongPtr, _
    ByVal cchMultiByte As Long, _
    ByVal lpWideCharStr As LongPtr, _
    ByVal cchWideChar As Long _
    ) As Long

' CodePage constant for UTF-8
Private Const CP_UTF8 = 65001

''' Return length of byte array or zero if uninitialized
Private Function BytesLength(abBytes() As Byte) As Long
    ' Trap error if array is uninitialized
    On Error Resume Next
    BytesLength = UBound(abBytes) - LBound(abBytes) + 1
End Function

''' Return VBA "Unicode" string from byte array encoded in UTF-8
Public Function Utf8BytesToString(abUtf8Array() As Byte) As String
    Dim nBytes As Long
    Dim nChars As Long
    Dim strOut As String
    Utf8BytesToString = ""
    ' Catch uninitialized input array
    nBytes = BytesLength(abUtf8Array)
    If nBytes <= 0 Then Exit Function
    ' Get number of characters in output string
    nChars = MultiByteToWideChar(CP_UTF8, 0&, VarPtr(abUtf8Array(0)), nBytes, 0&, 0&)
    ' Dimension output buffer to receive string
    strOut = String(nChars, 0)
    nChars = MultiByteToWideChar(CP_UTF8, 0&, VarPtr(abUtf8Array(0)), nBytes, StrPtr(strOut), nChars)
    Utf8BytesToString = Left$(strOut, nChars)
End Function

Public Sub test()

    Dim xhr As MSXML2.XMLHTTP60: Set xhr = New MSXML2.XMLHTTP60
    Dim html As MSHTML.HTMLDocument: Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", "https://ssl.ofdb.de/plot/138627,271359,I-Am-Legend", False
        .send
        html.body.innerHTML = Utf8BytesToString(.responseBody)
    End With

    [A1] = html.querySelector("p.Blocksatz").innerText
 
End Sub
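
One fragile spot in test() is its last line: if the selector matches nothing, querySelector returns Nothing and reading .innerText fails with run-time error 91. A minimal sketch of a guard follows, assuming the same Microsoft HTML Object Library reference that test() relies on. SelectorText and DemoSelectorText are hypothetical names, not part of the original answer, and the HTML string is made-up sample input.

```vba
Option Explicit

' Hypothetical helper (not from the original answer).
' Requires a reference to "Microsoft HTML Object Library", as test() does.
Public Function SelectorText(ByVal doc As MSHTML.HTMLDocument, _
                             ByVal cssSelector As String) As String
    Dim node As Object

    ' querySelector yields Nothing when no element matches the selector
    Set node = doc.querySelector(cssSelector)

    If node Is Nothing Then
        SelectorText = vbNullString      ' no match: return an empty string
    Else
        SelectorText = node.innerText    ' match: return the element's text
    End If
End Function

Public Sub DemoSelectorText()
    Dim doc As MSHTML.HTMLDocument: Set doc = New MSHTML.HTMLDocument

    ' Made-up sample markup standing in for the downloaded page
    doc.body.innerHTML = "<p class=""Blocksatz"">Plot text</p>"

    Debug.Print SelectorText(doc, "p.Blocksatz")          ' text of the matching paragraph
    Debug.Print Len(SelectorText(doc, "p.NoSuchClass"))   ' length of the empty fallback
End Sub
```

In test(), the final line could then read [A1] = SelectorText(html, "p.Blocksatz"), which leaves the cell empty instead of halting when the page layout changes.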

32-bit:

Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, _
    ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, _
    ByVal cchMultiByte As Long, _
    ByVal lpWideCharStr As Long, _
    ByVal cchWideChar As Long _
    ) As Long
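
If the same workbook has to run on both kinds of installation, the two declarations can sit side by side behind conditional compilation. This is a sketch, assuming the standard VBA7 compiler constant, which is defined in Office 2010 and later, where PtrSafe and LongPtr are available in 32-bit and 64-bit hosts alike. DemoCountChars is a hypothetical name added for illustration only.

```vba
Option Explicit

#If VBA7 Then
    ' Office 2010 and later: PtrSafe and LongPtr are available
    Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As LongPtr, _
        ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As LongPtr, _
        ByVal cchWideChar As Long _
        ) As Long
#Else
    ' Office 2007 and earlier: pointers are plain 32-bit Longs
    Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
        ByVal CodePage As Long, _
        ByVal dwFlags As Long, _
        ByVal lpMultiByteStr As Long, _
        ByVal cchMultiByte As Long, _
        ByVal lpWideCharStr As Long, _
        ByVal cchWideChar As Long _
        ) As Long
#End If

' Hypothetical demo: ask the API how many UTF-16 characters two UTF-8 bytes need
Public Sub DemoCountChars()
    Dim b(0 To 1) As Byte
    b(0) = &HC3: b(1) = &HA4    ' UTF-8 encoding of U+00E4 (a with diaeresis)

    ' With a null output buffer the call only reports the required length
    Debug.Print MultiByteToWideChar(65001, 0&, VarPtr(b(0)), 2, 0&, 0&)
End Sub
```

With this in place, BytesLength and Utf8BytesToString need no changes, since they only pass VarPtr and StrPtr results straight through to the API.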
